public inbox for [email protected]  
Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
34+ messages / 6 participants

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-02-20 17:59 Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Andres Freund @ 2026-02-20 17:59 UTC
  To: Melanie Plageman <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2026-01-28 18:16:10 -0500, Melanie Plageman wrote:
> > > +      */
> > > +     if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> > > +     {
> > > +             vacrel->vm_new_visible_pages++;
> > > +             if (presult.all_frozen)
> > > +             {
> > > +                     vacrel->vm_new_visible_frozen_pages++;
> > > +                     *vm_page_frozen = true;
> >
> > Not this patch's fault, but I find "vm_new_visible_pages" and
> > "vm_new_visible_frozen_pages" pretty odd names. The concept is all-visible and
> > frozen. The page itself isn't visible or invisible...
> 
> I thought having the extra word "all" in there made it too long. And
> since "vm" is there, that isn't set unless the page is
> _all_-visible/all-frozen. But if you think it gives people the wrong
> idea, I am willing to change it. I can omit vm and make it:
> new_all_visible_all_frozen_pages
> new_all_visible_pages
> new_all_frozen_pages
> 
> Is that clearer?

Yes, I think so.


> > It's also a bit odd that a function that sounds rather read-only does stuff
> > like clearing VM/all-visible.
> 
> I thought about this a lot. Ultimately, I ended up keeping it the way it is.
> I think the other option is changing from this:
> 
>     do_set_vm = heap_page_will_set_vm(&prstate,
>                                       params->relation,
>                                       blockno, buffer, page,
>                                       vmbuffer,
>                                       params->reason,
>                                       do_prune, do_freeze,
>                                       prstate.lpdead_items,
>                                       &old_vmbits, &new_vmbits);
>
> to this:
> 
>     heap_page_prepare_vm_set(&prstate,
>                                 params->relation,
>                                 blockno, buffer, page,
>                                 vmbuffer,
>                                 params->reason,
>                                 do_prune, do_freeze,
>                                 prstate.lpdead_items,
>                                 &old_vmbits, &new_vmbits);
> 
>     do_set_vm = (new_vmbits & VISIBILITYMAP_VALID_BITS) != 0;
> 
> or heap_page_plan_vm_set()

> heap_page_will_set_vm() has symmetry with heap_page_will_freeze(), the
> helper that decides whether or not we will freeze tuples. I like that
> symmetry since heap_page_will_set_vm() decides whether or not to set
> the VM.
> 
> Now, heap_page_plan/prepare_vm_set() does indirectly hint that
> something like clearing VM/all-visible could happen -- if you
> understand that preparing the VM to have bits set also includes
> clearing any existing corruption. And "prepare" or "plan" has more
> symmetry with prune_freeze_plan() -- though that function does not
> make changes on the page.
> 
> Ultimately, clearing the VM/page of corruption is pretty anomalous
> compared to the rest of the code in heap_page_prune_and_freeze(). All other
> changes to the page are done in a single critical section at the
> bottom of the function.
> 
> I could see an argument for moving identify_and_fix_vm_corruption()
> out of the helper and into heap_page_prune_and_freeze() but then we'd
> have to move visibilitymap_get_status() out too. And that takes away a
> lot of the benefit of encapsulating all that logic.

I was wondering about that option. Relatedly, I also was wondering if we ought
to do identify_and_fix_vm_corruption() regardless of ->attempt_update_vm.


> > Why are we not doing fixing up of the page *before* we prune it?  It's a bit
> > insane that we do the WAL logging for pruning, which in turn will often
> > include an FPI, before we do the fixups. The fixes aren't WAL logged, so this
> > actually leads to the standby getting further out of sync.
> >
> > I realize this isn't your mess, but brrr.
> 
> Well, after this patch set, clearing the VM does happen before we emit
> WAL for pruning.

That I think is a substantial improvement, the current (i.e. before your
series) placement really is pretty insane due to the guaranteed divergence it
causes.

I wonder if we actually should just force an FPI whenever we detect such
corruption; that way it would reliably be fixed on the standby as well.


> It wouldn't be hard to move the corruption fixups to the beginning of
> heap_page_prune_and_freeze() in the new code structure.

As identify_and_fix_vm_corruption() needs lpdead_items, I'm not sure that's
true?

I wonder if at least the warning for the "(PageIsAllVisible(heap_page) &&
nlpdead_items > 0)" test should be moved to
heap_prune_record_dead_or_unused(). That way the WARNING could include the
offset number and it'd also work in the mark_unused_now case.

Perhaps it also should trigger for RECENTLY_DEAD, INSERT_IN_PROGRESS,
DELETE_IN_PROGRESS?
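A minimal sketch of the per-item check suggested above, assuming a toy status enum whose member names follow HTSV_Result and a hypothetical helper name (check_item_against_all_visible() is not an existing function). Because it would run where each item's fate is recorded, the warning can name the offset; a caller in the mark_unused_now case would simply pass HEAPTUPLE_DEAD:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint16_t OffsetNumber;      /* PostgreSQL's OffsetNumber is a uint16 */

/* Member names (and order) follow PostgreSQL's HTSV_Result. */
typedef enum
{
    HEAPTUPLE_DEAD,
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,
    HEAPTUPLE_INSERT_IN_PROGRESS,
    HEAPTUPLE_DELETE_IN_PROGRESS
} HTSV_Result;

/*
 * Hypothetical per-item check.  Every status other than LIVE contradicts
 * PD_ALL_VISIBLE.  (A LIVE tuple can too, if its xmin is too new, but that
 * needs a horizon test that is not modeled here.)  On a mismatch, returns
 * true and writes a warning naming the offset into msg.
 */
static bool
check_item_against_all_visible(bool page_all_visible, OffsetNumber offnum,
                               HTSV_Result status, char *msg, size_t msglen)
{
    assert(msg != NULL && msglen > 0);
    msg[0] = '\0';

    if (!page_all_visible || status == HEAPTUPLE_LIVE)
        return false;

    snprintf(msg, msglen,
             "page is marked all-visible, but item %u has status %d",
             (unsigned) offnum, (int) status);
    return true;
}
```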


At that point the !page_all_visible && vm_all_visible part could indeed be
moved to the start of heap_page_prune_and_freeze()


> But it would split visibility map-related logic into two parts of
> heap_page_prune_and_freeze().

I'm not convinced that that's an issue. new/old_vmbits are already variables at
the level of heap_page_prune_and_freeze(), determining it a bit earlier seems
not a problem.  And I think doing this check even if !attempt_update_vm might
very well make sense.


> Would it be worth it? What benefit would we get? Do you just feel that it
> should logically come first?

One insanity is that right now we will process all frozen pages over and over
due to the skip pages threshold, wasting a *lot* of CPU and memory bandwidth.
It'd be quite defensible to just skip processing the page once we determined
it's already all frozen.  But for that we'd probably want to do the
"page_all_visible && vm_all_visible" check before returning...




> > Do we actually foresee a case where only one of HEAP_PAGE_PRUNE_FREEZE |
> > HEAP_PAGE_PRUNE_UPDATE_VM would be set?
> 
> Yes, when setting the VM on-access, it is too expensive to call
> heap_prepare_freeze_tuple() on each tuple. I could work on trying to
> optimize it, but it isn't currently viable.

Is it too expensive to do so even when we already decided to do some pruning?
I am not surprised it's too expensive when there's not even a dead tuple on
the page.  But I am mildly surprised if it's too expensive to do when we'd WAL
log anyway?





> > > From cdf5776fadeae3430c692999b37f8a7ec944bda1 Mon Sep 17 00:00:00 2001
> > > From: Melanie Plageman <[email protected]>
> > > Date: Tue, 2 Dec 2025 16:16:22 -0500
> 
> > > +static TransactionId
> > > +get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
> > > +                              uint8 old_vmbits, uint8 new_vmbits,
> > > +                              TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
> > > +                              TransactionId visibility_cutoff_xid)
> > > +{
> >
> > The logic for horizons is now split between this and "Calculate what the
> > snapshot conflict horizon should be for a record" in heap_page_will_freeze().
> 
> That is true in master too. We determine frz_conflict_horizon in
> heap_page_will_freeze() and later before emitting the WAL record
> decide which of the latest_xid_removed and frz_conflict_horizon that
> we should use as the snapshot conflict horizon for the combined
> record.

I'm not sure that confusing code in master is a particularly good reason for
anything...


> All I've done is expand that part (the part before emitting the WAL
> record) a bit because now we have to consider what the horizon would
> be if we set the VM.
> 
> If I really wanted to calculate it only in a single place, I could
> maintain a new variable, all_frozen_except_dead, and remove the
> frz_conflict_horizon from heap_page_will_freeze(). Then, in
> get_conflict_xid(), I could have the following logic:
> 
>     if (do_set_vm)
>         conflict_xid = visibility_cutoff_xid;
>     else if (do_freeze)
>     {
>         if (all_frozen_except_dead)
>             conflict_xid = visibility_cutoff_xid;
>         else
>         {
>             conflict_xid = OldestXmin;
>             TransactionIdRetreat(conflict_xid);
>         }
>     }
>     else
>         conflict_xid = InvalidTransactionId;
> 
> I think using all_frozen_except_dead while maintaining
> visibility_cutoff_xid (in heap_prune_record_unchanged_lp_normal()) has
> the potential to be confusing, though. We'd need to keep updating
> visibility_cutoff_xid when all_visible is false but
> all_frozen_except_dead is true as well as when all_visible is true.
> And because we don't care about all_visible_except_dead, it gets even
> more confusing to make sure we are maintaining the right variables in
> the right situations.

I suspect we should just track all of the horizons/cutoffs all the time. This
whole stuff about optimizing out a few conditional assignments complicates the
code substantially and feels extremely error prone to me.

I probably complained about this before, and it's not this patch's fault, but
PruneState->{all_visible,all_frozen} are imo confusingly named, due to
sounding like they describe the current state, rather than the possible state
after pruning.  It's not helped by this comment:

	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
	 * That's convenient for heap_page_prune_and_freeze() to use them to
	 * decide whether to opportunistically freeze the page or not.  The
	 * all_visible and all_frozen values ultimately used to set the VM are
	 * adjusted to include LP_DEAD items after we determine whether or not to
	 * opportunistically freeze.

"all-visible ... are adjusted to include LP_DEAD" ... - just reading that it's
hard to know what it means.



> > Although I guess I don't understand that code:
> >
> >                 /*
> >                  * Calculate what the snapshot conflict horizon should be for a record
> >                  * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
> >                  * for conflicts when the whole page is eligible to become all-frozen
> >                  * in the VM once we're done with it. Otherwise, we generate a
> >                  * conservative cutoff by stepping back from OldestXmin.
> >                  */
> >                 if (prstate->all_frozen)
> >                         prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
> >                 else
> >                 {
> >                         /* Avoids false conflicts when hot_standby_feedback in use */
> >                         prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
> >                         TransactionIdRetreat(prstate->frz_conflict_horizon);
> >                 }
> >
> > Why does it make sense to use OldestXmin? Consider e.g. the case where there
> > is one very old tuple that needs to be frozen and one new live tuple on a
> > page. Because of the new tuple we can't mark the page all-frozen. But there's
> > also no reason to not use much less aggressive horizon than OldestXmin, namely
> > the newer of xmin,xmax of the old frozen tuple?
> 
> We don't track the newest frozen xmin right now. Doing so wouldn't be
> free (i.e. more comparisons which may matter in a query without much
> other overhead).

I have an extremely hard time believing that tracking the Min(xmin) over all
rows is going to be noticeable in comparison to all the overheads in pruning
(including pruning where we do nothing).

There are some checks that would be different, in particular,
heap_pre_freeze_checks() is quite expensive, because it forces an access to
the SLRU, without using hint bits. But that's a check that we wisely
(79d4bf4eff14) do only after we already decided to freeze.


The first thing to improve pruning performance that I would do is to introduce
a fastpath for pages that a) are already frozen, b) do not have dead items (if
we're not freezing). Iterating through HOT chains is far from cheap, and if
all rows are live, there's not really a point in doing so.  This is
particularly important for VACUUMs where we end up freezing a ton of pages that
are already frozen, due to the silly skip_pages_threshold thing.
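A sketch of the decision such a fast path could make, assuming the VM bit values from visibilitymapdefs.h and a hypothetical choose_prune_path() helper. It relies on an all-visible page having no dead items unless it is corrupted, and it falls through to the full pass whenever the page-level bit is not set, so VM/page mismatches are still seen:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

typedef enum
{
    PRUNE_FULL_PASS,    /* walk line pointers and HOT chains as today */
    PRUNE_SKIP_PAGE     /* nothing to prune or freeze: skip the walk */
} PrunePath;

/* Hypothetical helper, not an existing PostgreSQL function. */
static PrunePath
choose_prune_path(uint8_t vmbits, bool page_all_visible, bool want_freeze)
{
    assert((vmbits & ~VISIBILITYMAP_VALID_BITS) == 0);

    /*
     * PD_ALL_VISIBLE not set: either there is real work to do or the VM
     * disagrees with the page.  The full pass handles both.
     */
    if (!page_all_visible)
        return PRUNE_FULL_PASS;

    /* a) freezing callers can skip only pages that are already all-frozen */
    if (want_freeze)
        return (vmbits & VISIBILITYMAP_ALL_FROZEN) ? PRUNE_SKIP_PAGE : PRUNE_FULL_PASS;

    /*
     * b) non-freezing callers only care about dead items, which an
     * uncorrupted all-visible page does not have
     */
    return (vmbits & VISIBILITYMAP_ALL_VISIBLE) ? PRUNE_SKIP_PAGE : PRUNE_FULL_PASS;
}
```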


> The only purpose it would serve is to make the snapshot conflict
> horizon more accurate/more aggressive when we freeze tuples, which
> would lead to canceling fewer queries than master -- which is outside
> the purview of this patch.

I think it'd also serve to actually make the choice of the horizon a heck of a
lot easier to understand.  Using approximations of the accurate value means
you have to think about why that approximation is correct, whether it's too
approximate etc.


> There's also a set of complications around maintaining this number
> accurately mentioned by Peter in [1].
>
> [1] https://www.postgresql.org/message-id/CAH2-WzkB-Pt3zPeTXvMik6jcJn%2BdcpUqO-tt_hc13bD6sGRLPg%40mail.g...

Maybe I'm confused, but what does that email have to do with determining an
accurate horizon? It argues that we can remove / freeze xmax in some
situations, but doesn't seem to talk about an accurate horizon, except in an
aside about not needing to take aborted xmins into account?



> From e591bf061ee673f3750d1180673e1ab48be43bb8 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 27 Jan 2026 16:53:11 -0500
> Subject: [PATCH v34 03/14] Move VM assert into prune/freeze code and simplify
>  returned values
> 
> After pruning and freezing, we do an assert-only validation that the
> page's visibility status matches what we found during the pruning and
> freezing pass over the page.
> 
> There's no reason to wait until lazy_scan_prune() to do this validation,
> as all of the VM setting logic has already been moved to
> heap_page_prune_and_freeze().

It's a bit funny to say "wait until lazy_scan_prune()", when lazy_scan_prune()
is what calls heap_page_prune_and_freeze().  I'd just say that moving it to
heap_page_prune_and_freeze() avoids having to pass information around and
will, in the future, allow the check to be performed when freezing during
on-access.


> From a94267babeedec6705fd7f3b43242c6ba0e458c0 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 2 Dec 2025 16:16:22 -0500
> Subject: [PATCH v34 04/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
>  prune/freeze

> @@ -804,6 +809,62 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
>  	return do_freeze;
>  }
>  
> +/*
> + * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
> + * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
> + */
> +static TransactionId
> +get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
> +				 uint8 old_vmbits, uint8 new_vmbits,
> +				 TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
> +				 TransactionId visibility_cutoff_xid)
> +{
> +	TransactionId conflict_xid;
> +
> +	/*
> +	 * We can omit the snapshot conflict horizon if we are not pruning or
> +	 * freezing any tuples and are setting an already all-visible page
> +	 * all-frozen in the VM.

Maybe mention when this can happen, because it's not immediately obvious.


>  In this case, all of the tuples on the page must
> +	 * already be seen as frozen by all MVCC snapshots on the standby.

Maybe + " (any conflict would have been handled in reaction to the WAL record
for freezing those tuples)" or such?



> +	 */
> +	if (!do_prune &&
> +		!do_freeze &&
> +		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
> +		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> +		return InvalidTransactionId;

Those != 0 checks seem vestigial to me. Just checking for
& VISIBILITYMAP_ALL_VISIBLE is sufficient.
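A small illustration, assuming the flag values from visibilitymapdefs.h and hypothetical function names: in a boolean context the masked value alone already behaves like the explicit != 0 comparison, so both forms below return the same result for every input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Condition as written in the v34 hunk */
static bool
omit_horizon_explicit(bool do_prune, bool do_freeze,
                      uint8_t old_vmbits, uint8_t new_vmbits)
{
    return !do_prune && !do_freeze &&
        (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
        (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;
}

/* Same condition without the != 0 comparisons */
static bool
omit_horizon_short(bool do_prune, bool do_freeze,
                   uint8_t old_vmbits, uint8_t new_vmbits)
{
    return !do_prune && !do_freeze &&
        (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
        (new_vmbits & VISIBILITYMAP_ALL_FROZEN);
}
```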


> +	/*
> +	 * The snapshot conflict horizon for the whole record should be the most
> +	 * conservative of all the horizons calculated for any of the possible
> +	 * modifications.  If this record will prune tuples, any transactions on
> +	 * the standby older than the youngest xmax of the most recently removed
> +	 * tuple this record will prune will conflict.

Why just xmax? You can have cases where xmin is a newer xid than xmax.


> If this record will freeze
> +	 * tuples, any transactions on the standby with xids older than the
> +	 * youngest tuple this record will freeze will conflict.

Transactions on the standby have no xid. Did you mean xmin?



> +	 * If we are setting the VM, the conflict horizon is almost always the
> +	 * visibility cutoff XID, except in the situation described above.
> +	 *
> +	 * By picking the newest of all of those, we can ensure that all changes
> +	 * in the record have been taken into account.
> +	 */

Comment seems better than before!


> 	if (do_set_vm)
> 		conflict_xid = visibility_cutoff_xid;
> 	else if (do_freeze)
> 		conflict_xid = frz_conflict_horizon;
> 	else
> 		conflict_xid = InvalidTransactionId;

Could it be worth checking that if (do_set_vm && do_freeze) the
frz_conflict_horizon won't be "violated" by using visibility_cutoff_xid instead?
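One possible shape for that check, as a sketch: vm_horizon_covers_freeze() and its wrapper are hypothetical, and the comparison imitates TransactionIdPrecedes(). If frz_conflict_horizon came from the OldestXmin fallback, the check would fire whenever visibility_cutoff_xid is older than OldestXmin - 1, so whether a failure indicates a real problem or just the fallback's conservatism would still need to be worked out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;             /* stand-in for PostgreSQL's type */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/* Modulo-2^32 comparison in the style of TransactionIdPrecedes(). */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical: true unless one record both freezes and sets the VM and the
 * horizon it uses (visibility_cutoff_xid) is older than what freezing alone
 * would have required (frz_conflict_horizon).
 */
static bool
vm_horizon_covers_freeze(bool do_set_vm, bool do_freeze,
                         TransactionId visibility_cutoff_xid,
                         TransactionId frz_conflict_horizon)
{
    if (!(do_set_vm && do_freeze))
        return true;
    return !xid_precedes(visibility_cutoff_xid, frz_conflict_horizon);
}

/* Would sit next to the if/else chain quoted above, as an Assert(). */
static void
check_conflict_horizons(bool do_set_vm, bool do_freeze,
                        TransactionId visibility_cutoff_xid,
                        TransactionId frz_conflict_horizon)
{
    assert(vm_horizon_covers_freeze(do_set_vm, do_freeze,
                                    visibility_cutoff_xid,
                                    frz_conflict_horizon));
}
```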


> Subject: [PATCH v34 06/14] Remove XLOG_HEAP2_VISIBLE entirely

I did not look at this again, as it seems like it ought not to be
interesting... LMK if I should have.



> From a64707f1f2fa88d7292f7a2f2a760c613eea4950 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 17 Dec 2025 13:57:16 -0500
> Subject: [PATCH v34 07/14] Simplify heap_page_would_be_all_visible visibility
>  check
> 
> heap_page_would_be_all_visible() doesn't care about the distinction
> between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
> that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
> us to return false.
> 
> Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
> includes an extra step to distinguish between dead and recently dead
> tuples using OldestXmin. Replace it with the more minimal
> HeapTupleSatisfiesVacuumHorizon().
> 
> This has the added benefit of making it easier to replace uses of
> OldestXmin in heap_page_would_be_all_visible() in the future.

Seems reasonable. Can probably be pulled forward and just committed?
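To illustrate the reasoning in the commit message, a simplified model: the enum member names follow HTSV_Result, apply_oldest_xmin() imitates the step HeapTupleSatisfiesVacuum() adds on top of HeapTupleSatisfiesVacuumHorizon(), and status_permits_all_visible() stands in for the status test in heap_page_would_be_all_visible() (its further per-tuple checks for LIVE tuples are omitted). Whether or not the extra step runs, only LIVE passes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;             /* stand-in for PostgreSQL's type */

/* Member names (and order) follow PostgreSQL's HTSV_Result. */
typedef enum
{
    HEAPTUPLE_DEAD,
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,
    HEAPTUPLE_INSERT_IN_PROGRESS,
    HEAPTUPLE_DELETE_IN_PROGRESS
} HTSV_Result;

/* Modulo-2^32 comparison in the style of TransactionIdPrecedes(). */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    if (a < 3 || b < 3)
        return a < b;
    return (int32_t) (a - b) < 0;
}

/*
 * The extra step: a RECENTLY_DEAD result whose deleter (dead_after)
 * precedes OldestXmin is upgraded to DEAD.
 */
static HTSV_Result
apply_oldest_xmin(HTSV_Result res, TransactionId dead_after,
                  TransactionId oldest_xmin)
{
    if (res == HEAPTUPLE_RECENTLY_DEAD && xid_precedes(dead_after, oldest_xmin))
        return HEAPTUPLE_DEAD;
    return res;
}

/* Anything other than LIVE means the page cannot be all-visible. */
static bool
status_permits_all_visible(HTSV_Result res)
{
    return res == HEAPTUPLE_LIVE;
}
```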



> From 85ab0d4eb681eaba4668ee23602d425c27f56d07 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Mon, 22 Dec 2025 10:46:45 -0500
> Subject: [PATCH v34 08/14] Remove table_scan_analyze_next_tuple unneeded
>  parameter OldestXmin
> 
> heapam_scan_analyze_next_tuple() doesn't distinguish between dead and
> recently dead tuples when counting them, so it doesn't need OldestXmin.

This part seems quite obviously the right thing to do, it's quite obviously
just wasted effort right now.



> Looking at other table AMs implementing table_scan_analyze_next_tuple(),
> it appears most do not use OldestXmin either.

Does "most" mean some do?  Not sure removing a parameter that's unused by
heapam is worth breakage...



> From 8d350868206456f631883a40a955dff480e408d3 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 17 Dec 2025 16:51:05 -0500
> Subject: [PATCH v34 09/14] Use GlobalVisState in vacuum to determine page
>  level visibility
>
> [...]
>
> Because comparing a transaction ID against GlobalVisState is more
> expensive than comparing against a single XID, we defer this check until
> after scanning all tuples on the page.

Curious, is this a precaution or was this a measurable bottleneck?


>  	 * The visibility cutoff xid is the newest xmin of live, committed tuples
> -	 * older than OldestXmin on the page. This field is only kept up-to-date
> -	 * if the page is all-visible. As soon as a tuple is encountered that is
> -	 * not visible to all, this field is unmaintained. As long as it is
> -	 * maintained, it can be used to calculate the snapshot conflict horizon
> -	 * when updating the VM and/or freezing all the tuples on the page.
> +	 * on the page older than the visibility horizon represented in the
> +	 * GlobalVisState. This field is only kept up-to-date if the page is
> +	 * all-visible. It is invalid if there are any tuples on the page that are
> +	 * not visible to all. As long as it is maintained, it can be used to
> +	 * calculate the snapshot conflict horizon when updating the VM and/or
> +	 * freezing all the tuples on the page.
>  	 */
>  	prstate->visibility_cutoff_xid = InvalidTransactionId;
>  }
> @@ -1077,6 +1078,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  	prune_freeze_plan(RelationGetRelid(params->relation),
>  					  buffer, &prstate, off_loc);
>  
> +	/*
> +	 * After processing all the live tuples on the page, if the newest xmin
> +	 * amongst them may be considered running by any snapshot, the page cannot
> +	 * be all-visible.
> +	 */
> +	if (prstate.all_visible &&
> +		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&

Any reason to test IsNormal rather than just IsValid()?  There should never be
a reason it's a valid but not "normal" xid, right?


> @@ -1794,28 +1812,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
>  				}
>  
>  				/*
> -				 * The inserter definitely committed.  But is it old enough
> -				 * that everyone sees it as committed?  A FrozenTransactionId
> -				 * is seen as committed to everyone.  Otherwise, we check if
> -				 * there is a snapshot that considers this xid to still be
> -				 * running, and if so, we don't consider the page all-visible.
> +				 * The inserter definitely committed. But we don't know if it
> +				 * is old enough that everyone sees it as committed. Later,
> +				 * after processing all the tuples on the page, we'll check if
> +				 * there is any snapshot that still considers the newest xid
> +				 * on the page to be running. If so, we don't consider the
> +				 * page all-visible.
>  				 */
>  				xmin = HeapTupleHeaderGetXmin(htup);
>  
> -				/*
> -				 * For now always use prstate->cutoffs for this test, because
> -				 * we only update 'all_visible' and 'all_frozen' when freezing
> -				 * is requested. We could use GlobalVisTestIsRemovableXid
> -				 * instead, if a non-freezing caller wanted to set the VM bit.
> -				 */
> -				Assert(prstate->cutoffs);
> -				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
> -				{
> -					prstate->all_visible = false;
> -					prstate->all_frozen = false;
> -					break;
> -				}
> -
>  				/* Track newest xmin on page. */
>  				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
>  					TransactionIdIsNormal(xmin))

Kinda wonder if this code should be in something like
heap_prune_record_freezable() or such, rather than be inside
heap_prune_record_unchanged_lp_normal().


> Subject: [PATCH v34 10/14] Unset all_visible sooner if not freezing
> 
> In the prune/freeze path, we currently delay clearing all_visible and
> all_frozen in the presence of dead items to allow opportunistic
> freezing.
> 
> However, if no freezing will be attempted, there’s no need to delay.
> Clearing the flags earlier avoids extra bookkeeping in
> heap_prune_record_unchanged_lp_normal(). This currently has no runtime
> effect because all callers that consider setting the VM also prepare
> freeze plans, but upcoming changes will allow on-access pruning to set
> the VM without freezing. The extra bookkeeping was noticeable in a
> profile of on-access VM setting.

What workload was that?


Theoretically, even if we don't freeze, the page still may be all-visible or
all frozen after the removal of dead items, no? Practically that won't happen,
because we don't remove dead items in any of the relevant paths, but from the
commit message and comments that's not entirely clear.



> Subject: [PATCH v34 11/14] Track which relations are modified by a query
> 
> Save the relids in a bitmap in the estate. A later commit will pass this
> information down to scan nodes to control whether or not the scan allows
> setting the visibility map while on-access pruning. We don't want to set
> the visibility map if the query is just going to modify the page
> immediately after.
> 
> Reviewed-by: Chao Li <[email protected]>

> diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
> index f8053d9e572..1e3cd73cf27 100644
> --- a/src/include/nodes/execnodes.h
> +++ b/src/include/nodes/execnodes.h
> @@ -678,6 +678,12 @@ typedef struct EState
>  									 * ExecDoInitialPruning() */
>  	const char *es_sourceText;	/* Source text from QueryDesc */
>  
> +	/*
> +	 * RT indexes of relations modified by the query through a
> +	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
> +	 */
> +	Bitmapset  *es_modified_relids;
> +

Other EState fields are initialized in CreateExecutorState(); this one isn't, afaict?


Wonder if it's worth adding a crosscheck somewhere, verifying that if a
relation is modified, it's in es_modified_relids. Otherwise this could very
well silently get out of date.
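A toy version of such a crosscheck, assuming a 64-bit mask in place of a Bitmapset and hypothetical ToyEState/toy_* names; the real hook point on the modification path is left open. It also initializes the field explicitly at creation time, as CreateExecutorState() does for the other EState fields:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for EState; a 64-bit mask replaces the Bitmapset. */
typedef struct ToyEState
{
    uint64_t modified_relids;   /* bit i set: RT index i is modified */
} ToyEState;

static void
toy_create_executor_state(ToyEState *estate)
{
    estate->modified_relids = 0;    /* explicit, like the other fields */
}

static void
toy_mark_modified(ToyEState *estate, int rti)
{
    assert(rti > 0 && rti < 64);    /* RT indexes start at 1 */
    estate->modified_relids |= UINT64_C(1) << rti;
}

static bool
toy_is_modified(const ToyEState *estate, int rti)
{
    assert(rti > 0 && rti < 64);
    return ((estate->modified_relids >> rti) & 1) != 0;
}

/* Crosscheck: call wherever a tuple of relation rti is actually changed. */
static void
toy_crosscheck_modification(const ToyEState *estate, int rti)
{
    assert(toy_is_modified(estate, rti));
}
```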


Also, there's some overlap between the information collected this way and
AcquireExecutorLocks(), ScanQueryForLocks(), which determine the needed lock
modes via rte->rellockmode.



> From 8205b2d7da0c3ad3cbc5cead336ced677996b37d Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 3 Dec 2025 15:12:18 -0500
> Subject: [PATCH v34 12/14] Pass down information on table modification to scan
>  node
> 
> Pass down information to sequential scan, index [only] scan, and bitmap
> table scan nodes on whether or not the query modifies the relation being
> scanned. A later commit will use this information to update the VM
> during on-access pruning only if the relation is not modified by the
> query.

Perhaps worth splitting up, so the addition of the 0 flag is separate from
the read-only hint aspect.


Unfortunately ran out of time for the last two patches.

Greetings,

Andres Freund







* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
@ 2026-03-02 23:38 ` Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-02 23:38 UTC
  To: Andres Freund <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Feb 20, 2026 at 12:59 PM Andres Freund <[email protected]> wrote:
>
> On 2026-01-28 18:16:10 -0500, Melanie Plageman wrote:
>
> > I could see an argument for moving identify_and_fix_vm_corruption()
> > out of the helper and into heap_page_prune_and_freeze() but then we'd
> > have to move visibilitymap_get_status() out too. And that takes away a
> > lot of the benefit of encapsulating all that logic.
>
> I was wondering about that option. Relatedly, I also was wondering if we ought
> to do identify_and_fix_vm_corruption() regardless of ->attempt_update_vm.

Attached v35 does this. I always pin the vmbuffer if we are going to
prune in heap_page_prune_opt(). In many cases, because it's saved in
the scan descriptor, it won't actually need to take a new pin. During
pruning, I check for VM corruption even if I am not considering
setting the VM.

> > Well, after this patch set, clearing the VM does happen before we emit
> > WAL for pruning.
>
> That I think is a substantial improvement, the current (i.e. before your
> series) placement really is pretty insane due to the guaranteed divergence it
> causes.
>
> I wonder if we actually should just force an FPI whenever we detect such
> corruption; that way it would reliably be fixed on the standby as well.

Only problem is we would have to do an FPI of the VM page as well if
we wanted the corruption to be reliably fixed on the standby.

> > It wouldn't be hard to move the corruption fixups to the beginning of
> > heap_page_prune_and_freeze() in the new code structure.
>
> As identify_and_fix_vm_corruption() needs lpdead_items, I'm not sure that's
> true?
>
> I wonder if at least the warning for the "(PageIsAllVisible(heap_page) &&
> nlpdead_items > 0)" test should be moved to
> heap_prune_record_dead_or_unused(). That way the WARNING could include the
> offset number and it'd also work in the mark_unused_now case.
>
> Perhaps it also should trigger for RECENTLY_DEAD, INSERT_IN_PROGRESS,
> DELETE_IN_PROGRESS?
>
> At that point the !page_all_visible && vm_all_visible part could indeed be
> moved to the start of heap_page_prune_and_freeze()

I've done all this. There is now a heap page/VM corruption check at the
beginning of heap_page_prune_and_freeze() and then checking for
corruption during pruning in the previously covered case (lpdead
items) as well as the mark_unused_now case, and
RECENTLY_DEAD/INSERT_IN_PROGRESS/DELETE_IN_PROGRESS.

> > Would it be worth it? What benefit would we get? Do you just feel that it
> > should logically come first?
>
> One insanity is that right now we will process all frozen pages over and over
> due to the skip pages threshold, wasting a *lot* of CPU and memory bandwidth.
> It'd be quite defensible to just skip processing the page once we determined
> it's already all frozen.  But for that we'd probably want to do the
> "page_all_visible && vm_all_visible" check before returning...

I've added a fast path to bypass pruning/freezing when the page is
already all-visible. And I check for pg_all_visible && vm_all_visible
beforehand. The one downside is that if a page is marked all-frozen
but has dead tuples on it, we'll never get to fix that corruption or
clean up the dead tuples. But the fast path kind of
seems worth it to me.

> > > Do we actually foresee a case where only one of HEAP_PAGE_PRUNE_FREEZE |
> > > HEAP_PAGE_PRUNE_UPDATE_VM would be set?
> >
> > Yes, when setting the VM on-access, it is too expensive to call
> > heap_prepare_freeze_tuple() on each tuple. I could work on trying to
> > optimize it, but it isn't currently viable.
>
> Is it too expensive to do so even when we already decided to do some pruning?
> I am not surprised it's too expensive when there's not even a dead tuple on
> the page.  But I am mildly surprised if it's too expensive to do when we'd WAL
> log anyway?

It's not really possible in the current code structure to only call
heap_prepare_freeze_tuple() when there are at least some prunable
tuples. We go through the line pointers and record them as prunable at
the same time we call heap_prepare_freeze_tuple(), so we won't know
until we've examined all line pointers that there are no prunable
tuples, at which point we will have called heap_prepare_freeze_tuple()
for every tuple.

> > I think using all_frozen_except_dead while maintaining
> > visibility_cutoff_xid (in heap_prune_record_unchanged_lp_normal()) has
> > the potential to be confusing, though. We'd need to keep updating
> > visibility_cutoff_xid when all_visible is false but
> > all_frozen_except_dead is true as well as when all_visible is true.
> > And because we don't care about all_visible_except_dead, it gets even
> > more confusing to make sure we are maintaining the right variables in
> > the right situations.
>
> I suspect we should just track all of the horizons/cutoffs all the time. This
> whole stuff about optimizing out a few conditional assignments complicates the
> code substantially and feels extremely error prone to me.

I've done this in v35. I posted the freeze horizon tracking patch
separately in [1] but it is in v35 as 0004. Tracking the newest live
xid is in 0009. This also always tracks all_visible for all callers
since I unconditionally pass the vmbuffer now. I still don't set the
VM if the query is modifying the relation, though.

> I probably complained about this before, and it's not this patch's fault, but
> PruneState->{all_visible,all_frozen} are imo confusingly named, due to
> sounding like they describe the current state, rather than the possible state
> after pruning.  It's not helped by this comment:
>
>          * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
>          * That's convenient for heap_page_prune_and_freeze() to use them to
>          * decide whether to opportunistically freeze the page or not.  The
>          * all_visible and all_frozen values ultimately used to set the VM are
>          * adjusted to include LP_DEAD items after we determine whether or not to
>          * opportunistically freeze.
>
> "all-visible ... are adjusted to include LP_DEAD" ... - just reading that it's
> hard to know what it means.

0003 does the rename.

> The first thing to improve pruning performance that I would do is to introduce
> a fastpath for pages that a) are already frozen b) do not have dead items (if
> we're not freezing). Iterating through HOT chains is far from cheap, and if
> all rows are live, there's not really a point in doing so.  This is
> particularly important for VACUUMs where we end up freezing a ton of pages that
> are already frozen, due to the silly skip_pages_threshold thing.

0007 adds a fast path.

> > +static TransactionId
> > +get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
> > +                              uint8 old_vmbits, uint8 new_vmbits,
> > +                              TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
> > +                              TransactionId visibility_cutoff_xid)
> > +{
> > +     TransactionId conflict_xid;
> > +
> > +     /*
> > +      * We can omit the snapshot conflict horizon if we are not pruning or
> > +      * freezing any tuples and are setting an already all-visible page
> > +      * all-frozen in the VM.
>
> Maybe mention when this can happen, because it's not immediately obvious.

I've added this to my TODO. I honestly can't think of a scenario where
it can happen. But I remember spending quite a bit of time thinking
about it on another occasion. The current code (in master) does
specifically account for this scenario, which is why I kept the logic,
but I'm not sure how it can happen.

I made all the other changes to specific comments you mentioned in
your mail but I won't bore you with itemization.

> >       if (do_set_vm)
> >               conflict_xid = visibility_cutoff_xid;
> >       else if (do_freeze)
> >               conflict_xid = frz_conflict_horizon;
> >       else
> >               conflict_xid = InvalidTransactionId;
>
> Could it be worth checking that if (do_set_vm && do_freeze) the
> frz_conflict_horizon isn't "violated" by using visibility_cutoff_xid instead?

Yes, as you mentioned off-list, this wasn't right. The new code looks like this:

TransactionId conflict_xid = InvalidTransactionId;
...
    if (do_set_vm)
        conflict_xid = newest_live_xid;
    if (do_freeze && TransactionIdFollows(newest_frozen_xid, conflict_xid))
        conflict_xid = newest_frozen_xid;
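To make the intent of the new rule concrete, here is a standalone sketch. xid_follows() is modeled on PostgreSQL's TransactionIdFollows() (a modulo-2^32 comparison for normal XIDs, a plain comparison otherwise); sketch_conflict_xid() and the local TransactionId typedef and macros are illustrative assumptions so the block compiles on its own. The freeze horizon only wins when it is newer than newest_live_xid, so the VM horizon can never undercut it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Modeled on TransactionIdFollows(): wraparound-aware for normal XIDs. */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

/* Start invalid, take the VM horizon, let a newer freeze horizon win. */
static TransactionId
sketch_conflict_xid(bool do_set_vm, bool do_freeze,
                    TransactionId newest_live_xid,
                    TransactionId newest_frozen_xid)
{
    TransactionId conflict_xid = InvalidTransactionId;

    if (do_set_vm)
        conflict_xid = newest_live_xid;
    if (do_freeze && xid_follows(newest_frozen_xid, conflict_xid))
        conflict_xid = newest_frozen_xid;

    return conflict_xid;
}
```

For example, with both flags set, newest_live_xid = 100 and newest_frozen_xid = 200 yield 200, while 300 and 200 yield 300.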

> > From 8d350868206456f631883a40a955dff480e408d3 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Wed, 17 Dec 2025 16:51:05 -0500
> > Subject: [PATCH v34 09/14] Use GlobalVisState in vacuum to determine page
> >  level visibility
> >
> > [...]
> >
> > Because comparing a transaction ID against GlobalVisState is more
> > expensive than comparing against a single XID, we defer this check until
> > after scanning all tuples on the page.
>
> Curious, is this a precaution or was this a measurable bottleneck?

I did see GlobalVisTestXidMaybeRunning() in a profile I did when it
was still called for every HEAPTUPLE_LIVE tuple in
heap_prune_record_unchanged_lp_normal(), but I don't have the profile
or test case around anymore.

However, since I now unconditionally maintain the newest_live_xid,
moving GlobalVisTestXidMaybeRunning() back into
heap_prune_record_unchanged_lp_normal() wouldn't help us avoid any
work. It would just make the values of prstate.set_all_visible and
prstate.set_all_frozen more accurate sooner. But I don't think it's
worth the extra function call since set_all_frozen and set_all_visible
won't be totally "done" until after we decide whether or not to
opportunistically freeze anyway.
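A sketch of the deferred check: track the newest normal xmin with cheap XID comparisons during the scan, then do one visibility test per page. sketch_xid_maybe_running() is a hypothetical stand-in for GlobalVisTestXidMaybeRunning(); it compares against a single fixed horizon XID, whereas the real check consults GlobalVisState, which is the more expensive part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Modeled on TransactionIdPrecedes(): wraparound-aware for normal XIDs. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 < id2;
    return (int32_t) (id1 - id2) < 0;
}

/* Hypothetical stand-in: "maybe running" unless it precedes the horizon. */
static bool
sketch_xid_maybe_running(TransactionId xid, TransactionId horizon)
{
    return !xid_precedes(xid, horizon);
}

static bool
sketch_page_all_visible(const TransactionId *xmins, int n,
                        TransactionId horizon)
{
    TransactionId newest_live_xid = InvalidTransactionId;

    /* Cheap per-tuple work: remember the newest normal xmin. */
    for (int i = 0; i < n; i++)
    {
        if (TransactionIdIsNormal(xmins[i]) &&
            xid_precedes(newest_live_xid, xmins[i]))
            newest_live_xid = xmins[i];
    }

    /* Only special XIDs (e.g. frozen): nothing can still be running. */
    if (!TransactionIdIsNormal(newest_live_xid))
        return true;

    /* One comparatively expensive check per page. */
    return !sketch_xid_maybe_running(newest_live_xid, horizon);
}
```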

> > @@ -1077,6 +1078,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
> >       prune_freeze_plan(RelationGetRelid(params->relation),
> >                                         buffer, &prstate, off_loc);
> >
> > +     /*
> > +      * After processing all the live tuples on the page, if the newest xmin
> > +      * amongst them may be considered running by any snapshot, the page cannot
> > +      * be all-visible.
> > +      */
> > +     if (prstate.all_visible &&
> > +             TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
>
> Any reason to test IsNormal rather than just IsValid()?  There should never be
> a reason it's a valid but not "normal" xid, right?

Well, the reason I did this was that the existing code in master
tracking visibility_cutoff_xid only advances it if
TransactionIdIsNormal(). I'm a bit confused about it too because it
seems like we would still want to do it for bootstrap mode xids. But I
see PageSetPrunable() only allows normal xids.
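One way to see why IsValid() and IsNormal() should agree for visibility_cutoff_xid, assuming it starts out as InvalidTransactionId as the quoted tracking code implies: it only ever advances to normal XIDs, so it can never hold BootstrapTransactionId or FrozenTransactionId. The constants below match access/transam.h; track_newest_xmin() is a simplified, hypothetical version that uses a plain > comparison instead of the wraparound-aware TransactionIdFollows().

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Special XIDs as defined in access/transam.h. */
#define InvalidTransactionId     ((TransactionId) 0)
#define BootstrapTransactionId   ((TransactionId) 1)
#define FrozenTransactionId      ((TransactionId) 2)
#define FirstNormalTransactionId ((TransactionId) 3)

#define TransactionIdIsValid(xid)  ((xid) != InvalidTransactionId)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

static TransactionId
track_newest_xmin(const TransactionId *xmins, int n)
{
    TransactionId newest = InvalidTransactionId;

    for (int i = 0; i < n; i++)
    {
        /* Bootstrap and frozen xmins never advance the cutoff. */
        if (TransactionIdIsNormal(xmins[i]) && xmins[i] > newest)
            newest = xmins[i];
    }

    /* So the result is invalid or normal, never "valid but special". */
    assert(TransactionIdIsValid(newest) == TransactionIdIsNormal(newest));
    return newest;
}
```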

> > @@ -1794,28 +1812,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
> >                               }
> >
> >                               /*
> > -                              * The inserter definitely committed.  But is it old enough
> > -                              * that everyone sees it as committed?  A FrozenTransactionId
> > -                              * is seen as committed to everyone.  Otherwise, we check if
> > -                              * there is a snapshot that considers this xid to still be
> > -                              * running, and if so, we don't consider the page all-visible.
> > +                              * The inserter definitely committed. But we don't know if it
> > +                              * is old enough that everyone sees it as committed. Later,
> > +                              * after processing all the tuples on the page, we'll check if
> > +                              * there is any snapshot that still considers the newest xid
> > +                              * on the page to be running. If so, we don't consider the
> > +                              * page all-visible.
> >                                */
> >                               xmin = HeapTupleHeaderGetXmin(htup);
> >
> > -                             /*
> > -                              * For now always use prstate->cutoffs for this test, because
> > -                              * we only update 'all_visible' and 'all_frozen' when freezing
> > -                              * is requested. We could use GlobalVisTestIsRemovableXid
> > -                              * instead, if a non-freezing caller wanted to set the VM bit.
> > -                              */
> > -                             Assert(prstate->cutoffs);
> > -                             if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
> > -                             {
> > -                                     prstate->all_visible = false;
> > -                                     prstate->all_frozen = false;
> > -                                     break;
> > -                             }
> > -
> >                               /* Track newest xmin on page. */
> >                               if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
> >                                       TransactionIdIsNormal(xmin))
>
> Kinda wonder if this code should be in something like
> heap_prune_record_freezable() or such, rather than be inside
> heap_prune_record_unchanged_lp_normal().

I played around with it, but it all felt a bit awkward. I wrote it
down for a future enhancement idea.

> > Subject: [PATCH v34 10/14] Unset all_visible sooner if not freezing
> >
> > In the prune/freeze path, we currently delay clearing all_visible and
> > all_frozen in the presence of dead items to allow opportunistic
> > freezing.
> >
> > However, if no freezing will be attempted, there’s no need to delay.
> > Clearing the flags earlier avoids extra bookkeeping in
> > heap_prune_record_unchanged_lp_normal(). This currently has no runtime
> > effect because all callers that consider setting the VM also prepare
> > freeze plans, but upcoming changes will allow on-access pruning to set
> > the VM without freezing. The extra bookkeeping was noticeable in a
> > profile of on-access VM setting.
>
> What workload was that?

It was a select * offset all query with a few fat tuples on each page
and none of them prunable. I'm planning on digging up the
case/creating a new one to see if it is reproducible. This was with an
older version of the code that had more conditionals as well. This
commit is actually dropped in v35 because I now always keep
newest_live_xid up-to-date (0009) which means unsetting
set_all_visible sooner has no benefit.

> Theoretically, even if we don't freeze, the page still may be all-visible or
> all frozen after the removal of dead items, no? Practically that won't happen,
> because we don't remove dead items in any of the relevant paths, but from the
> commit message and comments that's not entirely clear.

Yea, it's clearer with the commit dropped.

> > @@ -678,6 +678,12 @@ typedef struct EState
> >                                                                        * ExecDoInitialPruning() */
> >       const char *es_sourceText;      /* Source text from QueryDesc */
> >
> > +     /*
> > +      * RT indexes of relations modified by the query through a
> > +      * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
> > +      */
> > +     Bitmapset  *es_modified_relids;
> > +
>
> Other EState fields are initialized in CreateExecutorState, this isn't afaict?

Oops, yes. I based it on es_unpruned_relids which wasn't initialized
there either. I've added a commit (0013) to initialize a few EState
fields that weren't initialized in CreateExecutorState() as well.

> Wonder if it's worth adding a crosscheck somewhere, verifying that if a
> relation is modified, it's in es_modified_relids. Otherwise this could very
> well silently get out of date.

Done in v35 (0014).
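A minimal sketch of such a cross-check, assuming a 64-bit mask as a stand-in for the Bitmapset of 1-based RT indexes; SketchRelidSet and the helpers are hypothetical. In the executor, the equivalent would presumably be an assertion built on bms_is_member() against es_modified_relids at the point where a relation is modified. If the set fell out of date, on-access VM setting could happen for a relation the query is in fact modifying.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a Bitmapset of range-table indexes.  RT indexes
 * are 1-based; this sketch supports at most 64 of them.
 */
typedef uint64_t SketchRelidSet;

static SketchRelidSet
relid_set_add(SketchRelidSet set, int rti)
{
    assert(rti >= 1 && rti <= 64);
    return set | ((uint64_t) 1 << (rti - 1));
}

/* Cross-check at modification time: the relation must have been recorded. */
static bool
relid_set_is_member(SketchRelidSet set, int rti)
{
    assert(rti >= 1 && rti <= 64);
    return (set & ((uint64_t) 1 << (rti - 1))) != 0;
}
```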

> Also, there's some overlap between the information collected this way, and
> AcquireExecutorLocks(), ScanQueryForLocks(), which determine the needed lock
> modes via rte->rellockmode.

Those are in parser/planner, so it doesn't seem like a good fit. I
populate es_modified_relids in the executor.

I don't know exactly what the overlap would be between RTEs with an
exclusive rellockmode and es_modified_relids. It seems possible to
have RTEs that never actually get modified but hold a lock level
suggesting that they would be.

But were you imagining a substitution or a cross-check?

> > From 8205b2d7da0c3ad3cbc5cead336ced677996b37d Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Wed, 3 Dec 2025 15:12:18 -0500
> > Subject: [PATCH v34 12/14] Pass down information on table modification to scan
> >  node
>
> Perhaps worth splitting up, so the addition of the 0 flag is separate from the
> the read only hint aspect.

Done.

[1] https://www.postgresql.org/message-id/CAAKRu_bbaUV8OUjAfVa_iALgKnTSfB4gO3jnkfpcFgrxEpSGJQ%40mail.gma...


Attachments:

  [text/x-patch] v35-0001-Move-commonly-used-context-into-PruneState-and-s.patch (16.4K, 2-v35-0001-Move-commonly-used-context-into-PruneState-and-s.patch)
From 7526e2a0e7d1a013cb9f4d95dff8a4feabd7035b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 26 Feb 2026 10:09:55 -0500
Subject: [PATCH v35 01/18] Move commonly used context into PruneState and
 simplify helpers

heap_page_prune_and_freeze() and many of its helpers use the heap
buffer, block number, and page. Other helpers took the heap page and
didn't use it. Initializing these values once during
prune_freeze_setup() simplifies the helpers' interfaces and avoids any
repeated calls to BufferGetBlockNumber() and BufferGetPage().

While updating PruneState, also reorganize its fields to make layout and
documentation more consistent.
---
 src/backend/access/heap/pruneheap.c | 136 +++++++++++++++-------------
 1 file changed, 72 insertions(+), 64 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 632c2427952..3c5d33834fc 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -45,6 +45,16 @@ typedef struct
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
 	struct VacuumCutoffs *cutoffs;
+	Relation	relation;
+
+	/*
+	 * Keep the buffer, block, and page handy so that helpers needing to
+	 * access them don't need to make repeated calls to BufferGetBlockNumber()
+	 * and BufferGetPage().
+	 */
+	BlockNumber block;
+	Buffer		buffer;
+	Page		page;
 
 	/*-------------------------------------------------------
 	 * Fields describing what to do to the page
@@ -98,11 +108,19 @@ typedef struct
 	 */
 	int8		htsv[MaxHeapTuplesPerPage + 1];
 
-	/*
-	 * Freezing-related state.
+	/*-------------------------------------------------------
+	 * Working state for freezing
+	 *-------------------------------------------------------
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*
+	 * The snapshot conflict horizon used when freezing tuples. The final
+	 * snapshot conflict horizon for the record may be newer if pruning
+	 * removes newer transaction IDs.
+	 */
+	TransactionId frz_conflict_horizon;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -129,13 +147,6 @@ typedef struct
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
 
-	/*
-	 * The snapshot conflict horizon used when freezing tuples. The final
-	 * snapshot conflict horizon for the record may be newer if pruning
-	 * removes newer transaction IDs.
-	 */
-	TransactionId frz_conflict_horizon;
-
 	/*
 	 * all_visible and all_frozen indicate if the all-visible and all-frozen
 	 * bits in the visibility map can be set for this page after pruning.
@@ -162,14 +173,12 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
-static void prune_freeze_plan(Oid reloid, Buffer buffer,
-							  PruneState *prstate,
+static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
-											   HeapTuple tup,
-											   Buffer buffer);
+											   HeapTuple tup);
 static inline HTSV_Result htsv_get_valid_status(int status);
-static void heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
+static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
 static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
 static void heap_prune_record_redirect(PruneState *prstate,
@@ -181,15 +190,14 @@ static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber o
 											 bool was_normal);
 static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_normal);
 
-static void heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumber offnum);
-static void heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_unused(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum);
+static void heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum);
 static void heap_prune_record_unchanged_lp_redirect(PruneState *prstate, OffsetNumber offnum);
 
 static void page_verify_redirects(Page page);
 
-static bool heap_page_will_freeze(Relation relation, Buffer buffer,
-								  bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
+static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
 
 
@@ -342,6 +350,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
 	prstate->cutoffs = params->cutoffs;
+	prstate->relation = params->relation;
+	prstate->block = BufferGetBlockNumber(params->buffer);
+	prstate->buffer = params->buffer;
+	prstate->page = BufferGetPage(params->buffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -455,16 +467,15 @@ prune_freeze_setup(PruneFreezeParams *params,
  * *off_loc is used for error callback and cleared before returning.
  */
 static void
-prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
-				  OffsetNumber *off_loc)
+prune_freeze_plan(PruneState *prstate, OffsetNumber *off_loc)
 {
-	Page		page = BufferGetPage(buffer);
-	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+	Page		page = prstate->page;
+	BlockNumber blockno = prstate->block;
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	OffsetNumber offnum;
 	HeapTupleData tup;
 
-	tup.t_tableOid = reloid;
+	tup.t_tableOid = RelationGetRelid(prstate->relation);
 
 	/*
 	 * Determine HTSV for all tuples, and queue them up for processing as HOT
@@ -505,7 +516,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 		/* Nothing to do if slot doesn't contain a tuple */
 		if (!ItemIdIsUsed(itemid))
 		{
-			heap_prune_record_unchanged_lp_unused(page, prstate, offnum);
+			heap_prune_record_unchanged_lp_unused(prstate, offnum);
 			continue;
 		}
 
@@ -518,7 +529,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 			if (unlikely(prstate->mark_unused_now))
 				heap_prune_record_unused(prstate, offnum, false);
 			else
-				heap_prune_record_unchanged_lp_dead(page, prstate, offnum);
+				heap_prune_record_unchanged_lp_dead(prstate, offnum);
 			continue;
 		}
 
@@ -539,8 +550,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 		tup.t_len = ItemIdGetLength(itemid);
 		ItemPointerSet(&tup.t_self, blockno, offnum);
 
-		prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, &tup,
-															buffer);
+		prstate->htsv[offnum] = heap_prune_satisfies_vacuum(prstate, &tup);
 
 		if (!HeapTupleHeaderIsHeapOnly(htup))
 			prstate->root_items[prstate->nroot_items++] = offnum;
@@ -571,7 +581,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 		*off_loc = offnum;
 
 		/* Process this item or chain of items */
-		heap_prune_chain(page, blockno, maxoff, offnum, prstate);
+		heap_prune_chain(maxoff, offnum, prstate);
 	}
 
 	/*
@@ -627,7 +637,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 			}
 		}
 		else
-			heap_prune_record_unchanged_lp_normal(page, prstate, offnum);
+			heap_prune_record_unchanged_lp_normal(prstate, offnum);
 	}
 
 	/* We should now have processed every tuple exactly once  */
@@ -648,7 +658,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
 
 /*
  * Decide whether to proceed with freezing according to the freeze plans
- * prepared for the given heap buffer. If freezing is chosen, this function
+ * prepared for the current heap buffer. If freezing is chosen, this function
  * performs several pre-freeze checks.
  *
  * The values of do_prune, do_hint_prune, and did_tuple_hint_fpi must be
@@ -660,8 +670,7 @@ prune_freeze_plan(Oid reloid, Buffer buffer, PruneState *prstate,
  * page, and false otherwise.
  */
 static bool
-heap_page_will_freeze(Relation relation, Buffer buffer,
-					  bool did_tuple_hint_fpi,
+heap_page_will_freeze(bool did_tuple_hint_fpi,
 					  bool do_prune,
 					  bool do_hint_prune,
 					  PruneState *prstate)
@@ -709,18 +718,19 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 			 * Freezing would make the page all-frozen.  Have already emitted
 			 * an FPI or will do so anyway?
 			 */
-			if (RelationNeedsWAL(relation))
+			if (RelationNeedsWAL(prstate->relation))
 			{
 				if (did_tuple_hint_fpi)
 					do_freeze = true;
 				else if (do_prune)
 				{
-					if (XLogCheckBufferNeedsBackup(buffer))
+					if (XLogCheckBufferNeedsBackup(prstate->buffer))
 						do_freeze = true;
 				}
 				else if (do_hint_prune)
 				{
-					if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+					if (XLogHintBitIsNeeded() &&
+						XLogCheckBufferNeedsBackup(prstate->buffer))
 						do_freeze = true;
 				}
 			}
@@ -733,7 +743,7 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
 		 * Validate the tuples we will be freezing before entering the
 		 * critical section.
 		 */
-		heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+		heap_pre_freeze_checks(prstate->buffer, prstate->frozen, prstate->nfrozen);
 
 		/*
 		 * Calculate what the snapshot conflict horizon should be for a record
@@ -822,8 +832,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 						   TransactionId *new_relfrozen_xid,
 						   MultiXactId *new_relmin_mxid)
 {
-	Buffer		buffer = params->buffer;
-	Page		page = BufferGetPage(buffer);
 	PruneState	prstate;
 	bool		do_freeze;
 	bool		do_prune;
@@ -842,8 +850,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * Prepare queue of state changes to later be executed in a critical
 	 * section.
 	 */
-	prune_freeze_plan(RelationGetRelid(params->relation),
-					  buffer, &prstate, off_loc);
+	prune_freeze_plan(&prstate, off_loc);
 
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
@@ -861,15 +868,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * pd_prune_xid field or the page was marked full, we will update the hint
 	 * bit.
 	 */
-	do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
-		PageIsFull(page);
+	do_hint_prune = ((PageHeader) prstate.page)->pd_prune_xid != prstate.new_prune_xid ||
+		PageIsFull(prstate.page);
 
 	/*
 	 * Decide if we want to go ahead with freezing according to the freeze
 	 * plans we prepared, or not.
 	 */
-	do_freeze = heap_page_will_freeze(params->relation, buffer,
-									  did_tuple_hint_fpi,
+	do_freeze = heap_page_will_freeze(did_tuple_hint_fpi,
 									  do_prune,
 									  do_hint_prune,
 									  &prstate);
@@ -901,14 +907,14 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * Update the page's pd_prune_xid field to either zero, or the lowest
 		 * XID of any soon-prunable tuple.
 		 */
-		((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
+		((PageHeader) prstate.page)->pd_prune_xid = prstate.new_prune_xid;
 
 		/*
 		 * Also clear the "page is full" flag, since there's no point in
 		 * repeating the prune/defrag process until something else happens to
 		 * the page.
 		 */
-		PageClearFull(page);
+		PageClearFull(prstate.page);
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
@@ -916,7 +922,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 * the buffer dirty below.
 		 */
 		if (!do_freeze && !do_prune)
-			MarkBufferDirtyHint(buffer, true);
+			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
 	if (do_prune || do_freeze)
@@ -924,21 +930,21 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
 		{
-			heap_page_prune_execute(buffer, false,
+			heap_page_prune_execute(prstate.buffer, false,
 									prstate.redirected, prstate.nredirected,
 									prstate.nowdead, prstate.ndead,
 									prstate.nowunused, prstate.nunused);
 		}
 
 		if (do_freeze)
-			heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
-		MarkBufferDirty(buffer);
+		MarkBufferDirty(prstate.buffer);
 
 		/*
 		 * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
 		 */
-		if (RelationNeedsWAL(params->relation))
+		if (RelationNeedsWAL(prstate.relation))
 		{
 			/*
 			 * The snapshotConflictHorizon for the whole record should be the
@@ -958,7 +964,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
-			log_heap_prune_and_freeze(params->relation, buffer,
+			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
 									  InvalidBuffer,	/* vmbuffer */
 									  0,	/* vmflags */
 									  conflict_xid,
@@ -1018,12 +1024,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
  * Perform visibility checks for heap pruning.
  */
 static HTSV_Result
-heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
+heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup)
 {
 	HTSV_Result res;
 	TransactionId dead_after;
 
-	res = HeapTupleSatisfiesVacuumHorizon(tup, buffer, &dead_after);
+	res = HeapTupleSatisfiesVacuumHorizon(tup, prstate->buffer, &dead_after);
 
 	if (res != HEAPTUPLE_RECENTLY_DEAD)
 		return res;
@@ -1100,13 +1106,14 @@ htsv_get_valid_status(int status)
  * based on that outcome.
  */
 static void
-heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
-				 OffsetNumber rootoffnum, PruneState *prstate)
+heap_prune_chain(OffsetNumber maxoff, OffsetNumber rootoffnum,
+				 PruneState *prstate)
 {
 	TransactionId priorXmax = InvalidTransactionId;
 	ItemId		rootlp;
 	OffsetNumber offnum;
 	OffsetNumber chainitems[MaxHeapTuplesPerPage];
+	Page		page = prstate->page;
 
 	/*
 	 * After traversing the HOT chain, ndeadchain is the index in chainitems
@@ -1235,7 +1242,7 @@ heap_prune_chain(Page page, BlockNumber blockno, OffsetNumber maxoff,
 		/*
 		 * Advance to next chain member.
 		 */
-		Assert(ItemPointerGetBlockNumber(&htup->t_ctid) == blockno);
+		Assert(ItemPointerGetBlockNumber(&htup->t_ctid) == prstate->block);
 		offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
 		priorXmax = HeapTupleHeaderGetUpdateXid(htup);
 	}
@@ -1270,7 +1277,7 @@ process_chain:
 			i++;
 		}
 		for (; i < nchain; i++)
-			heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
+			heap_prune_record_unchanged_lp_normal(prstate, chainitems[i]);
 	}
 	else if (ndeadchain == nchain)
 	{
@@ -1296,7 +1303,7 @@ process_chain:
 
 		/* the rest of tuples in the chain are normal, unchanged tuples */
 		for (int i = ndeadchain; i < nchain; i++)
-			heap_prune_record_unchanged_lp_normal(page, prstate, chainitems[i]);
+			heap_prune_record_unchanged_lp_normal(prstate, chainitems[i]);
 	}
 }
 
@@ -1421,7 +1428,7 @@ heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum, bool was_norm
  * Record an unused line pointer that is left unchanged.
  */
 static void
-heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_unused(PruneState *prstate, OffsetNumber offnum)
 {
 	Assert(!prstate->processed[offnum]);
 	prstate->processed[offnum] = true;
@@ -1432,9 +1439,10 @@ heap_prune_record_unchanged_lp_unused(Page page, PruneState *prstate, OffsetNumb
  * update bookkeeping of tuple counts and page visibility.
  */
 static void
-heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
 	prstate->processed[offnum] = true;
@@ -1615,7 +1623,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
  * Record line pointer that was already LP_DEAD and is left unchanged.
  */
 static void
-heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber offnum)
+heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 {
 	Assert(!prstate->processed[offnum]);
 	prstate->processed[offnum] = true;
-- 
2.43.0



  [text/x-patch] v35-0002-Add-PageGetPruneXid-helper.patch (1.9K, 3-v35-0002-Add-PageGetPruneXid-helper.patch)
From aad49496321243eaab94d288da021c537b96f652 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 14:09:11 -0500
Subject: [PATCH v35 02/18] Add PageGetPruneXid helper

This is in line with other page header accessors. It improves readability
and avoids long lines.
---
 src/backend/access/heap/pruneheap.c | 4 ++--
 src/include/storage/bufpage.h       | 6 ++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3c5d33834fc..1d61b336193 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -234,7 +234,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 	 * determining the appropriate horizon is a waste if there's no prune_xid
 	 * (i.e. no updates/deletes left potentially dead tuples around).
 	 */
-	prune_xid = ((PageHeader) page)->pd_prune_xid;
+	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
 		return;
 
@@ -868,7 +868,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * pd_prune_xid field or the page was marked full, we will update the hint
 	 * bit.
 	 */
-	do_hint_prune = ((PageHeader) prstate.page)->pd_prune_xid != prstate.new_prune_xid ||
+	do_hint_prune = PageGetPruneXid(prstate.page) != prstate.new_prune_xid ||
 		PageIsFull(prstate.page);
 
 	/*
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index ae3725b3b81..92a6bb9b0c0 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -441,6 +441,12 @@ PageClearAllVisible(Page page)
 	((PageHeader) page)->pd_flags &= ~PD_ALL_VISIBLE;
 }
 
+static inline TransactionId
+PageGetPruneXid(const PageData *page)
+{
+	return ((const PageHeaderData *) page)->pd_prune_xid;
+}
+
 /*
  * These two require "access/transam.h", so left as macros.
  */
-- 
2.43.0



  [text/x-patch] v35-0003-Rename-PruneState-all_visible-all_frozen.patch (13.7K, 4-v35-0003-Rename-PruneState-all_visible-all_frozen.patch)
From 7038ae8d57ff2d5f63c2a306e34703a4b54c047a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sun, 1 Mar 2026 15:59:04 -0500
Subject: [PATCH v35 03/18] Rename PruneState->all_visible/all_frozen

to set_all_visible and set_all_frozen to clarify that this is the
proposed state of the all-visible and all-frozen bits for a heap page in
the visibility map, not the current state.

Author: Melanie Plageman <[email protected]>
Suggested-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c | 144 ++++++++++++++--------------
 1 file changed, 74 insertions(+), 70 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 1d61b336193..fa5aa2a63f2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -148,22 +148,24 @@ typedef struct
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page after pruning.
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
 	 *
 	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
 	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and all_frozen is
+	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
 	 * true.
 	 *
-	 * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
-	 * That's convenient for heap_page_prune_and_freeze() to use them to
-	 * decide whether to freeze the page or not.  The all_visible and
-	 * all_frozen values returned to the caller are adjusted to include
-	 * LP_DEAD items after we determine whether to opportunistically freeze.
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to freeze the page or not.  The
+	 * set_all_visible and set_all_frozen values returned to the caller are
+	 * adjusted to include LP_DEAD items after we determine whether to
+	 * opportunistically freeze.
 	 */
-	bool		all_visible;
-	bool		all_frozen;
+	bool		set_all_visible;
+	bool		set_all_frozen;
 	TransactionId visibility_cutoff_xid;
 } PruneState;
 
@@ -419,22 +421,22 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * setting the VM bits.
 	 *
 	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'all_visible' and 'all_frozen' for our own decision-making. If
-	 * the whole page would become frozen, we consider opportunistically
-	 * freezing tuples.  We will not be able to freeze the whole page if there
-	 * are tuples present that are not visible to everyone or if there are
-	 * dead tuples which are not yet removable.  However, dead tuples which
-	 * will be removed by the end of vacuuming should not preclude us from
-	 * opportunistically freezing.  Because of that, we do not immediately
-	 * clear all_visible and all_frozen when we see LP_DEAD items.  We fix
-	 * that after scanning the line pointers. We must correct all_visible and
-	 * all_frozen before we return them to the caller, so that the caller
-	 * doesn't set the VM bits incorrectly.
+	 * also use 'set_all_visible' and 'set_all_frozen' for our own
+	 * decision-making. If the whole page would become frozen, we consider
+	 * opportunistically freezing tuples.  We will not be able to freeze the
+	 * whole page if there are tuples present that are not visible to everyone
+	 * or if there are dead tuples which are not yet removable.  However, dead
+	 * tuples which will be removed by the end of vacuuming should not
+	 * preclude us from opportunistically freezing.  Because of that, we do
+	 * not immediately clear set_all_visible and set_all_frozen when we see
+	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
+	 * correct set_all_visible and set_all_frozen before we return them to the
+	 * caller, so that the caller doesn't set the VM bits incorrectly.
 	 */
 	if (prstate->attempt_freeze)
 	{
-		prstate->all_visible = true;
-		prstate->all_frozen = true;
+		prstate->set_all_visible = true;
+		prstate->set_all_frozen = true;
 	}
 	else
 	{
@@ -442,8 +444,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 		 * Initializing to false allows skipping the work to update them in
 		 * heap_prune_record_unchanged_lp_normal().
 		 */
-		prstate->all_visible = false;
-		prstate->all_frozen = false;
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
 	}
 
 	/*
@@ -683,8 +685,8 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	 */
 	if (!prstate->attempt_freeze)
 	{
-		Assert(!prstate->all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
+		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -710,9 +712,9 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 		 * anymore.  The opportunistic freeze heuristic must be improved;
 		 * however, for now, try to approximate the old logic.
 		 */
-		if (prstate->all_frozen && prstate->nfrozen > 0)
+		if (prstate->set_all_frozen && prstate->nfrozen > 0)
 		{
-			Assert(prstate->all_visible);
+			Assert(prstate->set_all_visible);
 
 			/*
 			 * Freezing would make the page all-frozen.  Have already emitted
@@ -752,7 +754,7 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 		 * in the VM once we're done with it. Otherwise, we generate a
 		 * conservative cutoff by stepping back from OldestXmin.
 		 */
-		if (prstate->all_frozen)
+		if (prstate->set_all_frozen)
 			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
 		else
 		{
@@ -769,7 +771,7 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 		 */
 		Assert(!prstate->pagefrz.freeze_required);
 
-		prstate->all_frozen = false;
+		prstate->set_all_frozen = false;
 		prstate->nfrozen = 0;	/* avoid miscounts in instrumentation */
 	}
 	else
@@ -804,11 +806,12 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
  * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set.  They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
+ * presult->set_all_visible and presult->set_all_frozen after determining
+ * whether or not to opportunistically freeze, to indicate if the VM bits can
+ * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed, because at the moment only callers that also freeze
+ * need that information.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -882,21 +885,21 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	/*
 	 * While scanning the line pointers, we did not clear
-	 * all_visible/all_frozen when encountering LP_DEAD items because we
-	 * wanted the decision whether or not to freeze the page to be unaffected
-	 * by the short-term presence of LP_DEAD items.  These LP_DEAD items are
-	 * effectively assumed to be LP_UNUSED items in the making.  It doesn't
-	 * matter which vacuum heap pass (initial pass or final pass) ends up
-	 * setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 * set_all_visible/set_all_frozen when encountering LP_DEAD items because
+	 * we wanted the decision whether or not to freeze the page to be
+	 * unaffected by the short-term presence of LP_DEAD items.  These LP_DEAD
+	 * items are effectively assumed to be LP_UNUSED items in the making.  It
+	 * doesn't matter which vacuum heap pass (initial pass or final pass) ends
+	 * up setting the page all-frozen, as long as the ongoing VACUUM does it.
 	 *
 	 * Now that we finished determining whether or not to freeze the page,
-	 * update all_visible and all_frozen so that they reflect the true state
-	 * of the page for setting PD_ALL_VISIBLE and VM bits.
+	 * update set_all_visible and set_all_frozen so that they reflect the true
+	 * state of the page for setting PD_ALL_VISIBLE and VM bits.
 	 */
 	if (prstate.lpdead_items > 0)
-		prstate.all_visible = prstate.all_frozen = false;
+		prstate.set_all_visible = prstate.set_all_frozen = false;
 
-	Assert(!prstate.all_frozen || prstate.all_visible);
+	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -984,8 +987,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.all_visible;
-	presult->all_frozen = prstate.all_frozen;
+	presult->all_visible = prstate.set_all_visible;
+	presult->all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
 
 	/*
@@ -1365,9 +1368,9 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
 	prstate->ndead++;
 
 	/*
-	 * Deliberately delay unsetting all_visible and all_frozen until later
-	 * during pruning. Removable dead tuples shouldn't preclude freezing the
-	 * page.
+	 * Deliberately delay unsetting set_all_visible and set_all_frozen until
+	 * later during pruning. Removable dead tuples shouldn't preclude freezing
+	 * the page.
 	 */
 
 	/* Record the dead offset for vacuum */
@@ -1489,14 +1492,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->all_visible)
+			if (prstate->set_all_visible)
 			{
 				TransactionId xmin;
 
 				if (!HeapTupleHeaderXminCommitted(htup))
 				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
+					prstate->set_all_visible = false;
+					prstate->set_all_frozen = false;
 					break;
 				}
 
@@ -1511,15 +1514,16 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 
 				/*
 				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'all_visible' and 'all_frozen' when freezing
-				 * is requested. We could use GlobalVisTestIsRemovableXid
-				 * instead, if a non-freezing caller wanted to set the VM bit.
+				 * we only update 'set_all_visible' and 'set_all_frozen' when
+				 * freezing is requested. We could use
+				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
+				 * caller wanted to set the VM bit.
 				 */
 				Assert(prstate->cutoffs);
 				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
 				{
-					prstate->all_visible = false;
-					prstate->all_frozen = false;
+					prstate->set_all_visible = false;
+					prstate->set_all_frozen = false;
 					break;
 				}
 
@@ -1532,8 +1536,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 
 		case HEAPTUPLE_RECENTLY_DEAD:
 			prstate->recently_dead_tuples++;
-			prstate->all_visible = false;
-			prstate->all_frozen = false;
+			prstate->set_all_visible = false;
+			prstate->set_all_frozen = false;
 
 			/*
 			 * This tuple will soon become DEAD.  Update the hint field so
@@ -1552,8 +1556,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * assumption is a bit shaky, but it is what acquire_sample_rows()
 			 * does, so be consistent.
 			 */
-			prstate->all_visible = false;
-			prstate->all_frozen = false;
+			prstate->set_all_visible = false;
+			prstate->set_all_frozen = false;
 
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
@@ -1571,8 +1575,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * will commit and update the counters after we report.
 			 */
 			prstate->live_tuples++;
-			prstate->all_visible = false;
-			prstate->all_frozen = false;
+			prstate->set_all_visible = false;
+			prstate->set_all_frozen = false;
 
 			/*
 			 * This tuple may soon become DEAD.  Update the hint field so that
@@ -1614,7 +1618,7 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 		 * definitely cannot be set all-frozen in the visibility map later on.
 		 */
 		if (!totally_frozen)
-			prstate->all_frozen = false;
+			prstate->set_all_frozen = false;
 	}
 }
 
@@ -1637,10 +1641,10 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 	 * hastup/nonempty_pages as provisional no matter how LP_DEAD items are
 	 * handled (handled here, or handled later on).
 	 *
-	 * Similarly, don't unset all_visible and all_frozen until later, at the
-	 * end of heap_page_prune_and_freeze().  This will allow us to attempt to
-	 * freeze the page after pruning.  As long as we unset it before updating
-	 * the visibility map, this will be correct.
+	 * Similarly, don't unset set_all_visible and set_all_frozen until later,
+	 * at the end of heap_page_prune_and_freeze().  This will allow us to
+	 * attempt to freeze the page after pruning.  As long as we unset it
+	 * before updating the visibility map, this will be correct.
 	 */
 
 	/* Record the dead offset for vacuum */
-- 
2.43.0
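Patch 0003 only renames fields, but the comments it touches describe an ordering that is easy to misread: set_all_visible and set_all_frozen start out true when freezing is attempted, are cleared as soon as a tuple rules them out, and are cleared for LP_DEAD items only after the freeze decision has been made. The following is a minimal sketch of that ordering in plain C. `ItemKind`, `PlanResult`, `plan_page()` and the freeze heuristic used here (freeze only if that would make the whole page all-frozen) are hypothetical simplifications, not PostgreSQL's structures or its actual opportunistic-freeze heuristic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified classification of a heap page's line pointers. */
typedef enum
{
    ITEM_FROZEN,            /* visible to all, already frozen */
    ITEM_FREEZABLE,         /* visible to all, totally frozen if we freeze */
    ITEM_NOT_FREEZABLE,     /* visible to all, cannot be totally frozen yet */
    ITEM_RECENTLY_DEAD,     /* dead, but not yet removable */
    ITEM_LP_DEAD            /* LP_DEAD stub that VACUUM will make LP_UNUSED */
} ItemKind;

typedef struct
{
    bool set_all_visible;   /* proposed, not current, VM all-visible bit */
    bool set_all_frozen;    /* proposed, not current, VM all-frozen bit */
    bool do_freeze;
    int  nfreezable;
    int  lpdead_items;
} PlanResult;

static PlanResult
plan_page(const ItemKind *items, int nitems, bool attempt_freeze)
{
    /* Start optimistic only if the caller asked us to consider freezing. */
    PlanResult r = {attempt_freeze, attempt_freeze, false, 0, 0};

    for (int i = 0; i < nitems; i++)
    {
        switch (items[i])
        {
            case ITEM_FROZEN:
                break;
            case ITEM_FREEZABLE:
                r.nfreezable++;
                break;
            case ITEM_NOT_FREEZABLE:
                r.set_all_frozen = false;
                break;
            case ITEM_RECENTLY_DEAD:
                r.set_all_visible = r.set_all_frozen = false;
                break;
            case ITEM_LP_DEAD:
                /* Deliberately leave the flags alone for now. */
                r.lpdead_items++;
                break;
        }
    }

    /* Simplified heuristic: freeze only if the page would become all-frozen. */
    r.do_freeze = attempt_freeze && r.set_all_frozen && r.nfreezable > 0;
    if (!r.do_freeze && r.nfreezable > 0)
        r.set_all_frozen = false;   /* unfrozen tuples stay behind */

    /* Only after the freeze decision do LP_DEAD items veto the VM bits. */
    if (r.lpdead_items > 0)
        r.set_all_visible = r.set_all_frozen = false;

    assert(!r.set_all_frozen || r.set_all_visible);
    return r;
}
```

With a page holding one frozen tuple, one freezable tuple and one LP_DEAD item, the sketch still decides to freeze, but neither proposed VM bit survives. That matches the patch comments, which treat LP_DEAD items as "LP_UNUSED items in the making" that the ongoing VACUUM must reclaim before the page can be marked.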



  [text/x-patch] v35-0004-Use-the-newest-to-be-frozen-xid-as-the-conflict-.patch (8.0K, 5-v35-0004-Use-the-newest-to-be-frozen-xid-as-the-conflict-.patch)
From a3b91ab430e7af8b459c169181c1dc3f0f04c8bf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 13:55:45 -0500
Subject: [PATCH v35 04/18] Use the newest to-be-frozen xid as the conflict
 horizon for freezing

Previously, WAL records that froze tuples used OldestXmin (stepped back
by one XID) as the snapshot conflict horizon, unless the page would
become all-frozen, in which case visibility_cutoff_xid was used.
However, OldestXmin is newer than the newest frozen tuple's xid. By
tracking the newest to-be-frozen xid and using it as the snapshot
conflict horizon instead, we end up with an older horizon that will
result in fewer query cancellations on the standby.
---
 src/backend/access/heap/heapam.c    | 16 +++++++++++
 src/backend/access/heap/pruneheap.c | 44 ++++++++---------------------
 src/include/access/heapam.h         |  8 ++++++
 3 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a231563f0df..76f94fdfa5b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6781,6 +6781,10 @@ heap_inplace_unlock(Relation relation,
  * NB: Caller should avoid needlessly calling heap_tuple_should_freeze when we
  * have already forced page-level freezing, since that might incur the same
  * SLRU buffer misses that we specifically intended to avoid by freezing.
+ *
+ * We won't update the FreezePageConflictXid because any lockers don't affect
+ * visibility on the standby, and we don't have to worry about the update XID
+ * because the only way it can be older than OldestXmin is if it aborted.
  */
 static TransactionId
 FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
@@ -7173,7 +7177,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 
 		/* Verify that xmin committed if and when freeze plan is executed */
 		if (freeze_xmin)
+		{
 			frz->checkflags |= HEAP_FREEZE_CHECK_XMIN_COMMITTED;
+			if (TransactionIdFollows(xid, pagefrz->FreezePageConflictXid))
+				pagefrz->FreezePageConflictXid = xid;
+		}
 	}
 
 	/*
@@ -7192,6 +7200,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 		 */
 		replace_xvac = pagefrz->freeze_required = true;
 
+		if (TransactionIdFollows(xid, pagefrz->FreezePageConflictXid))
+			pagefrz->FreezePageConflictXid = xid;
+
 		/* Will set replace_xvac flags in freeze plan below */
 	}
 
@@ -7316,7 +7327,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 		 * independent of this, since the lock is released at xact end.)
 		 */
 		if (freeze_xmax && !HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
+		{
 			frz->checkflags |= HEAP_FREEZE_CHECK_XMAX_ABORTED;
+			if (TransactionIdFollows(xid, pagefrz->FreezePageConflictXid))
+				pagefrz->FreezePageConflictXid = xid;
+		}
 	}
 	else if (!TransactionIdIsValid(xid))
 	{
@@ -7499,6 +7514,7 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 	cutoffs.MultiXactCutoff = MultiXactCutoff;
 
 	pagefrz.freeze_required = true;
+	pagefrz.FreezePageConflictXid = InvalidTransactionId;
 	pagefrz.FreezePageRelfrozenXid = FreezeLimit;
 	pagefrz.FreezePageRelminMxid = MultiXactCutoff;
 	pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fa5aa2a63f2..07868dbcc17 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -114,13 +114,6 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
-	/*
-	 * The snapshot conflict horizon used when freezing tuples. The final
-	 * snapshot conflict horizon for the record may be newer if pruning
-	 * removes newer transaction IDs.
-	 */
-	TransactionId frz_conflict_horizon;
-
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -377,6 +370,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/* initialize page freezing working state */
 	prstate->pagefrz.freeze_required = false;
+	prstate->pagefrz.FreezePageConflictXid = InvalidTransactionId;
 	if (prstate->attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
@@ -407,7 +401,6 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * PruneState.
 	 */
 	prstate->deadoffsets = presult->deadoffsets;
-	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
 	 * Vacuum may update the VM after we're done.  We can keep track of
@@ -746,22 +739,8 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(prstate->buffer, prstate->frozen, prstate->nfrozen);
-
-		/*
-		 * Calculate what the snapshot conflict horizon should be for a record
-		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
-		 * for conflicts when the whole page is eligible to become all-frozen
-		 * in the VM once we're done with it. Otherwise, we generate a
-		 * conservative cutoff by stepping back from OldestXmin.
-		 */
-		if (prstate->set_all_frozen)
-			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
-		else
-		{
-			/* Avoids false conflicts when hot_standby_feedback in use */
-			prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
-			TransactionIdRetreat(prstate->frz_conflict_horizon);
-		}
+		Assert(TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid,
+											 prstate->cutoffs->OldestXmin));
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -886,11 +865,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	/*
 	 * While scanning the line pointers, we did not clear
 	 * set_all_visible/set_all_frozen when encountering LP_DEAD items because
-	 * we wanted the decision whether or not to freeze the page to be
-	 * unaffected by the short-term presence of LP_DEAD items.  These LP_DEAD
-	 * items are effectively assumed to be LP_UNUSED items in the making.  It
-	 * doesn't matter which vacuum heap pass (initial pass or final pass) ends
-	 * up setting the page all-frozen, as long as the ongoing VACUUM does it.
+	 * we wanted the decision whether or not to opportunistically freeze the
+	 * page to be unaffected by the short-term presence of LP_DEAD items.
+	 * These LP_DEAD items are effectively assumed to be LP_UNUSED items in
+	 * the making. It doesn't matter which vacuum heap pass (initial pass or
+	 * final pass) ends up setting the page all-frozen, as long as the ongoing
+	 * VACUUM does it.
 	 *
 	 * Now that we finished determining whether or not to freeze the page,
 	 * update set_all_visible and set_all_frozen so that they reflect the true
@@ -953,7 +933,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * The snapshotConflictHorizon for the whole record should be the
 			 * most conservative of all the horizons calculated for any of the
 			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
+			 * transactions on the standby older than the youngest xid of the
 			 * most recently removed tuple this record will prune will
 			 * conflict.  If this record will freeze tuples, any transactions
 			 * on the standby with xids older than the youngest tuple this
@@ -961,9 +941,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			TransactionId conflict_xid;
 
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
+			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
 									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
+				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3c0961ab36b..fae79b37f0d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -208,6 +208,14 @@ typedef struct HeapPageFreeze
 	TransactionId FreezePageRelfrozenXid;
 	MultiXactId FreezePageRelminMxid;
 
+	/*
+	 * The youngest XID that will be frozen or removed during freezing. It is
+	 * used to calculate the snapshot conflict horizon for a WAL record
+	 * freezing tuples. Because it is only used if we do end up freezing
+	 * tuples, there is no need for a "no freeze" version.
+	 */
+	TransactionId FreezePageConflictXid;
+
 	/*
 	 * "No freeze" NewRelfrozenXid/NewRelminMxid trackers.
 	 *
-- 
2.43.0
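Patch 0004 turns the freeze conflict horizon into a running maximum: heap_prepare_freeze_tuple() advances FreezePageConflictXid whenever it plans to overwrite a newer xid, and the combined prune/freeze record uses whichever of that value and latest_xid_removed is newer. A minimal sketch of that bookkeeping follows. The comparison mirrors the wraparound-aware logic of PostgreSQL's TransactionIdFollows() (assuming FirstNormalTransactionId is 3, as in transam.h), while `xid_follows()`, `track_frozen_xid()` and `record_conflict_xid()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Modeled on TransactionIdFollows(): special XIDs compare as plain unsigned
 * values, while normal XIDs are compared modulo 2^32 to survive wraparound.
 */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical: remember the newest XID that freezing will overwrite. */
static void
track_frozen_xid(TransactionId *conflict_xid, TransactionId xid)
{
    if (xid_follows(xid, *conflict_xid))
        *conflict_xid = xid;
}

/* Hypothetical: the record's horizon is the newer of freeze and prune. */
static TransactionId
record_conflict_xid(TransactionId freeze_conflict_xid,
                    TransactionId latest_xid_removed)
{
    return xid_follows(freeze_conflict_xid, latest_xid_removed)
        ? freeze_conflict_xid
        : latest_xid_removed;
}
```

Starting from InvalidTransactionId works because every normal XID compares as following it, which is why the patch can initialize FreezePageConflictXid to InvalidTransactionId in prune_freeze_setup() and heap_freeze_tuple() and let the first frozen xid replace it.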



  [text/x-patch] v35-0005-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch (5.9K, 6-v35-0005-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch)
From 09b9cc477d8d9b689888566b9d4dced5eefea208 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:23:57 -0500
Subject: [PATCH v35 05/18] Save vmbuffer in heap-specific scan descriptors for
 on-access pruning

Future commits will use the visibility map in on-access pruning to avoid
pruning when a page is all-visible, fix VM corruption, and set the VM if
the page is all-visible.

Saving the vmbuffer in the scan descriptor reduces the number of times
it would need to be pinned and unpinned, making the overhead of doing so
negligible.
---
 src/backend/access/heap/heapam.c         | 12 +++++++++++-
 src/backend/access/heap/heapam_handler.c | 12 ++++++++++--
 src/backend/access/heap/pruneheap.c      |  2 +-
 src/include/access/heapam.h              | 19 ++++++++++++++++---
 4 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 76f94fdfa5b..e19209f180d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1310,6 +1310,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1348,6 +1349,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1380,6 +1387,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3ff36f59bf8..47624194f93 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								&hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2533,7 +2541,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 07868dbcc17..5ce3e54a036 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -209,7 +209,7 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fae79b37f0d..4e2e71be558 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. The current heap block's
+	 * corresponding page in the visibility map.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +122,14 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/* Current heap block's corresponding page in the visibility map */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -417,7 +429,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
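The commit message for patch 0005 argues that caching the VM buffer in the scan descriptor makes the pinning overhead negligible. The arithmetic behind that claim can be sketched as below, assuming the default 8 kB BLCKSZ, a 24-byte page header and 2 VM bits per heap page, which gives 32672 heap pages per VM page. `vm_pin_cached()` is a hypothetical stand-in modeled on the reuse check in visibilitymap_pin() (keep the pin if it already covers the heap block, otherwise swap it), using block numbers in place of real Buffers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Assumed defaults: 8 kB pages, 24-byte page header, 2 VM bits per heap page. */
#define BLCKSZ                 8192
#define PAGE_HEADER_SIZE       24
#define BITS_PER_HEAPBLOCK     2
#define HEAPBLOCKS_PER_BYTE    (8 / BITS_PER_HEAPBLOCK)
#define MAPSIZE                (BLCKSZ - PAGE_HEADER_SIZE)
#define HEAPBLOCKS_PER_PAGE    (MAPSIZE * HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)

#define NO_VM_PAGE UINT32_MAX   /* hypothetical stand-in for InvalidBuffer */

/*
 * Hypothetical: reuse the cached VM page if it covers heapblk; otherwise
 * "release" it and "pin" the right one.  Returns 1 if a new pin was taken.
 */
static int
vm_pin_cached(BlockNumber heapblk, BlockNumber *cached_vm_page)
{
    BlockNumber mapblock = HEAPBLK_TO_MAPBLOCK(heapblk);

    if (*cached_vm_page == mapblock)
        return 0;
    *cached_vm_page = mapblock;
    return 1;
}

/* Count VM pins for a sequential scan that keeps one cached VM pin. */
static int
count_vm_pins(BlockNumber nblocks)
{
    BlockNumber cached = NO_VM_PAGE;
    int         pins = 0;

    for (BlockNumber blk = 0; blk < nblocks; blk++)
        pins += vm_pin_cached(blk, &cached);
    return pins;
}
```

For a 100000-block sequential scan, count_vm_pins() reports 4 pins instead of one per heap page. In the patch itself, the cached pin is only dropped in heap_rescan(), heap_endscan() or heapam_index_fetch_reset().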



  [text/x-patch] v35-0006-Fix-visibility-map-corruption-in-more-cases.patch (18.3K, 7-v35-0006-Fix-visibility-map-corruption-in-more-cases.patch)
From c6a1fa5c8319779b800f903e24d3f239e16c1cc1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v35 06/18] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap compared
to the cost of actually pruning. The vmbuffer is saved in the scan
descriptor, so a query should only need to pin each VM page once, and a
single VM page covers a large number of heap pages.
---
 src/backend/access/heap/pruneheap.c  | 174 +++++++++++++++++++++++++--
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 100 deletions(-)
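Patch 0006 widens what counts as VM corruption: a page whose VM bits claim all-visible must not contain LP_DEAD items, dead tuples, or tuples inserted or updated by in-progress transactions. A minimal sketch of that consistency rule follows. `ItemObservation` and `repair_vm_bits()` are hypothetical, and the sketch only computes which bits to keep, whereas the patch's heap_fix_vm_corruption() must also clear the VM page and the heap page while holding the required locks.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in PostgreSQL's visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical summary of what pruning observed for one line pointer. */
typedef enum
{
    OBS_VISIBLE_TO_ALL,     /* consistent with an all-visible page */
    OBS_LP_DEAD,            /* LP_DEAD item */
    OBS_DEAD,               /* dead tuple */
    OBS_IN_PROGRESS         /* inserted or updated by an in-progress xact */
} ItemObservation;

/*
 * Return the VM bits to keep.  If the VM claims all-visible but some item
 * contradicts that, clear both bits and report the first offending index
 * (the patch's heap_fix_vm_corruption() is likewise passed an OffsetNumber).
 */
static uint8_t
repair_vm_bits(uint8_t vmbits, const ItemObservation *items, int nitems,
               int *first_bad)
{
    assert(nitems >= 0);
    *first_bad = -1;

    /* Nothing is claimed, so nothing can be contradicted. */
    if ((vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
        return vmbits;

    for (int i = 0; i < nitems; i++)
    {
        if (items[i] != OBS_VISIBLE_TO_ALL)
        {
            *first_bad = i;
            return 0;       /* clear all-visible and all-frozen together */
        }
    }
    return vmbits;
}
```

Clearing both bits at once reflects that all-frozen implies all-visible, so an all-frozen bit can never be left behind on a page that has lost its all-visible bit.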

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5ce3e54a036..fa470f663b7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -168,6 +183,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +191,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -207,6 +224,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not yet pinned and pruning is performed, vmbuffer will be
+ * pinned. If we find VM corruption during pruning, we will fix it.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -273,6 +293,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -280,14 +310,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -350,6 +373,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -766,6 +795,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page cannot have dead items, nor can
+ * it have tuples that are not visible to all running transactions. It clears
+ * the VM bits (and PD_ALL_VISIBLE, if set) and zeroes prstate->vmbits.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
+ * page, but it does not need to be done in a critical section because
+ * clearing the VM is not WAL-logged.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("LP_DEAD item found on page marked as all-visible"),
+					 errdetail("relation \"%s\", page %u, tuple %u",
+							   RelationGetRelationName(prstate->relation),
+							   prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all found on page marked as all-visible"),
+					 errdetail("relation \"%s\", page %u, tuple %u",
+							   RelationGetRelationName(prstate->relation),
+							   prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear.  However, it's possible that the bit
+		 * got cleared after heap_vac_scan_next_block() was called, so we must
+		 * recheck with buffer lock before concluding that the VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
+						prstate->block,
+						RelationGetRelationName(prstate->relation))));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -826,6 +939,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -970,6 +1087,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->all_visible = prstate.set_all_visible;
 	presult->all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1292,7 +1410,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1302,6 +1421,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1385,6 +1511,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1524,7 +1659,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1539,6 +1675,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1563,7 +1703,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1629,6 +1770,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5b6f2441f6b..0a0aa8e5a9e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -424,11 +424,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1963,81 +1958,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2069,6 +1989,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2178,18 +2099,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.all_visible || !(*has_lpdead_items));
 	Assert(!presult.all_frozen || presult.all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4e2e71be558..9db92c7db8a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -258,6 +258,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers must provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -319,6 +325,12 @@ typedef struct PruneFreezeResult
 	bool		all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits holds this heap page's VM bits as read at the start of
+	 * pruning. It is zeroed if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
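A simplified, standalone model of where the corruption checks in 0006 fire may help when reviewing it. This is only a sketch: SketchPage, SketchItemKind, sketch_fix_vm_corruption() and sketch_prune_scan() are hypothetical names, the heap page and its VM bits are reduced to a flag and a bitmask (the bit values mirror VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN), and locking, WAL and the WARNING messages are left out. It shows the up-front check catching VM bits set without PD_ALL_VISIBLE, the per-item checks catching contradicting items on a page with PD_ALL_VISIBLE set, and the PageIsAllVisible() guard limiting the fix to at most one run per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the VM bit values in visibilitymap.h. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02
#define SKETCH_VM_VALID_BITS  (SKETCH_VM_ALL_VISIBLE | SKETCH_VM_ALL_FROZEN)

/* Hypothetical model of one heap page plus its VM bits. */
typedef struct SketchPage
{
	bool		pd_all_visible;	/* page-level PD_ALL_VISIBLE hint */
	uint8_t		vmbits;			/* this page's bits in the VM */
	bool		dirty;			/* heap buffer has been marked dirty */
} SketchPage;

/* What the pruning scan found at one offset (simplified). */
typedef enum SketchItemKind
{
	ITEM_LIVE,					/* does not contradict all-visible */
	ITEM_DEAD,					/* LP_DEAD, or being marked dead/unused */
	ITEM_PRUNABLE,				/* recently dead or delete in progress */
	ITEM_INSERT_IN_PROGRESS		/* inserting transaction still running */
} SketchItemKind;

/* Modeled on heap_fix_vm_corruption(). */
static void
sketch_fix_vm_corruption(SketchPage *page)
{
	if (page->pd_all_visible)
	{
		/* The patch emits a WARNING here, then clears the page-level hint. */
		page->pd_all_visible = false;
		page->dirty = true;
	}
	/* else: VM bits were set without PD_ALL_VISIBLE (WARNING in the patch). */

	/* As in the patch, the VM bits are cleared in either case. */
	page->vmbits = 0;
}

/* Returns how many times the fix routine ran for this page. */
static int
sketch_prune_scan(SketchPage *page, const SketchItemKind *items, int nitems)
{
	int			nfixes = 0;

	/* Up-front check, as in heap_page_prune_and_freeze(). */
	if ((page->vmbits & SKETCH_VM_VALID_BITS) && !page->pd_all_visible)
	{
		sketch_fix_vm_corruption(page);
		nfixes++;
	}

	for (int i = 0; i < nitems; i++)
	{
		if (items[i] == ITEM_LIVE)
			continue;
		/* Guarded like PageIsAllVisible() in the patch: fires at most once. */
		if (page->pd_all_visible)
		{
			sketch_fix_vm_corruption(page);
			nfixes++;
		}
	}
	return nfixes;
}
```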



  [text/x-patch] v35-0007-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (4.3K, 8-v35-0007-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From 7e8ea684a4c6ee5d4b7169ec3195be75e76172e9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v35 07/18] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work to do. To avoid the full cost of pruning in that case, exit
early if a page is already all-frozen, or if it is all-visible and no
freezing will be attempted.
---
 src/backend/access/heap/pruneheap.c | 73 +++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fa470f663b7..73db45f8dfd 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -184,6 +184,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -880,6 +881,66 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can skip the
+ * full cost of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	presult->vmbits = prstate->vmbits;
+
+	if (!PageIsEmpty(page))
+		presult->hastup = true;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -943,6 +1004,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can exit early. Do this after fixing any
+	 * discrepancy between the page-level visibility hint and the VM.
+	 */
+	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
-- 
2.43.0
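Stripped of buffer handling, the fast path in 0007 comes down to a predicate on the VM bits plus one pass over the line pointers. The sketch below is a hypothetical standalone model, not code from the patch: sketch_can_bypass() and sketch_bypass() are illustrative names, the line pointer states mirror LP_UNUSED/LP_NORMAL/LP_REDIRECT/LP_DEAD, a prune XID of 0 stands in for InvalidTransactionId, and "page is empty" is modeled as "has no line pointers", which is what PageIsEmpty() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring visibilitymap.h bit values. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Line pointer states, mirroring the LP_* flags in itemid.h. */
typedef enum SketchLpFlags
{
	SK_LP_UNUSED,
	SK_LP_NORMAL,
	SK_LP_REDIRECT,
	SK_LP_DEAD
} SketchLpFlags;

typedef struct SketchBypassResult
{
	int			live_tuples;	/* count of LP_NORMAL items */
	bool		hastup;			/* page has any line pointers */
	bool		cleared_prune_hint;	/* a stale prune XID was reset */
} SketchBypassResult;

/* The early-exit condition used in heap_page_prune_and_freeze(). */
static bool
sketch_can_bypass(uint8_t vmbits, bool attempt_freeze)
{
	return (vmbits & SKETCH_VM_ALL_FROZEN) != 0 ||
		((vmbits & SKETCH_VM_ALL_VISIBLE) != 0 && !attempt_freeze);
}

/* Modeled on heap_page_bypass_prune_freeze(); *prune_xid == 0 means no hint. */
static SketchBypassResult
sketch_bypass(const SketchLpFlags *lps, int nlps, uint32_t *prune_xid)
{
	SketchBypassResult result = {0, false, false};

	/* On an all-visible page, every normal item is a live tuple. */
	for (int i = 0; i < nlps; i++)
	{
		if (lps[i] == SK_LP_NORMAL)
			result.live_tuples++;
	}

	/* Clear a stale prune hint so the page is not revisited for nothing. */
	if (*prune_xid != 0)
	{
		*prune_xid = 0;
		result.cleared_prune_hint = true;
	}

	/* Like !PageIsEmpty(): any line pointer at all counts. */
	result.hastup = nlps > 0;
	return result;
}
```

Because the predicate accepts an all-frozen page even when freezing is requested, a VACUUM that reaches such a page via SKIP_PAGES_THRESHOLD only pays for the line pointer pass.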



  [text/x-patch] v35-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.4K, 9-v35-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From f271209e3feb75f79e94b83c3d564e5d14d1b9bf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v35 08/18] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare cases where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page, so it is performed only once per
page. This is safe because visibility_cutoff_xid records the newest live
xmin on the page: if no snapshot can still consider that XID running,
none can consider any older live xmin on the page running either.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 75ae268d753..aee88947393 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1060,6 +1060,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for "transaction still running" are identical to
+ * those used to decide whether an XID is removable.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 73db45f8dfd..7b72804a3e5 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1024,6 +1024,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1692,29 +1703,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 0a0aa8e5a9e..6c7807d5bd3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -460,13 +460,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2053,13 +2053,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2815,7 +2812,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3576,14 +3573,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3604,7 +3601,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3623,7 +3620,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3704,7 +3701,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3713,16 +3710,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3751,6 +3749,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9db92c7db8a..e401dd52e25 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -474,6 +474,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
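The once-per-page check in 0008 relies on two properties: the newest xmin is tracked with wraparound-aware comparisons, and the visibility test is monotone in the XID, so if the newest live xmin passes it, every older one does too. The sketch below models this under a loud simplification: the hypothetical SketchVisState holds a single fixed horizon, whereas the real GlobalVisState keeps separate maybe_needed and definitely_needed bounds and may be recomputed. All input xmins are assumed to be committed. The XID helpers follow the rules of TransactionIdPrecedes()/TransactionIdFollows() (plain comparison if either XID is not normal, modulo-2^32 otherwise), and XID 2 plays the role of FrozenTransactionId. All sketch_* names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SKETCH_INVALID_XID		((SketchXid) 0)
#define SKETCH_FROZEN_XID		((SketchXid) 2)
#define SKETCH_FIRST_NORMAL_XID ((SketchXid) 3)

/* Hypothetical stand-in for GlobalVisState: one fixed horizon. */
typedef struct SketchVisState
{
	SketchXid	horizon;		/* XIDs preceding this are running for no one */
} SketchVisState;

static bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* Same rules as TransactionIdPrecedes(). */
static bool
sketch_xid_precedes(SketchXid a, SketchXid b)
{
	if (!sketch_xid_is_normal(a) || !sketch_xid_is_normal(b))
		return a < b;
	return (int32_t) (a - b) < 0;
}

/* Same rules as TransactionIdFollows(). */
static bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	if (!sketch_xid_is_normal(a) || !sketch_xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Analogue of GlobalVisTestXidMaybeRunning(): the negation of "removable". */
static bool
sketch_xid_maybe_running(const SketchVisState *vis, SketchXid xid)
{
	return !sketch_xid_precedes(xid, vis->horizon);
}

/*
 * Track the newest normal xmin among live tuples with no per-tuple horizon
 * test, then perform a single horizon test after the scan.
 */
static bool
sketch_live_xmins_visible_to_all(const SketchXid *xmins, int n,
								 const SketchVisState *vis,
								 SketchXid *cutoff)
{
	*cutoff = SKETCH_INVALID_XID;

	for (int i = 0; i < n; i++)
	{
		if (sketch_xid_follows(xmins[i], *cutoff) &&
			sketch_xid_is_normal(xmins[i]))
			*cutoff = xmins[i];
	}

	if (sketch_xid_is_normal(*cutoff) &&
		sketch_xid_maybe_running(vis, *cutoff))
		return false;
	return true;
}
```

Dropping the per-tuple early exit means the loop visits every live tuple's xmin, which is the extra cost the commit message expects to be insignificant.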



  [text/x-patch] v35-0009-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.8K, 10-v35-0009-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From f70d52103e8f665de92bd531ff3a261b0142d20d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v35 09/18] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page that are visible to all running and future transactions, so that,
if the page turns out to be all-visible, we can later use it as the
snapshot conflict horizon when setting the VM.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. It also guarantees that the value is never stale.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Earlier version reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7b72804a3e5..dd731f64bc6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,11 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -174,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -433,53 +430,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate.all_frozen to false,
+	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze ? true : false;
 }
 
 /*
@@ -709,7 +688,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -962,9 +940,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1030,9 +1007,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1184,7 +1161,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1644,6 +1621,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1691,32 +1669,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6c7807d5bd3..b5370ec26da 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -462,7 +462,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -470,7 +470,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2788,7 +2788,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2814,14 +2814,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2862,7 +2862,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3575,7 +3575,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3583,7 +3583,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3606,7 +3606,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3624,7 +3624,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3634,7 +3634,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3723,9 +3723,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3755,8 +3755,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0

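The newest-live-XID bookkeeping in the patch above can be modeled outside the backend. What follows is a minimal standalone sketch, not PostgreSQL code. The names `xid_t`, `xid_follows()`, `xid_maybe_running()` and `track_newest_live_xid()` are hypothetical. The comparison mirrors the modulo-2^32 rule used by `TransactionIdFollows()`. `xid_maybe_running()` replaces `GlobalVisTestXidMaybeRunning()` with a single fixed horizon, which is a simplifying assumption. The sketch shows the newest xmin being tracked even after the page has been ruled out as all-visible, followed by a single "maybe running" test once the scan is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for TransactionId and its special values. */
typedef uint32_t xid_t;
#define INVALID_XID      ((xid_t) 0)
#define FIRST_NORMAL_XID ((xid_t) 3)

static bool
xid_is_normal(xid_t xid)
{
	return xid >= FIRST_NORMAL_XID;
}

/* Modulo-2^32 comparisons, following TransactionIdFollows/Precedes. */
static bool
xid_follows(xid_t a, xid_t b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

static bool
xid_precedes(xid_t a, xid_t b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a < b;
	return (int32_t) (a - b) < 0;
}

/*
 * Assumed simplification of GlobalVisTestXidMaybeRunning(): one fixed
 * horizon; any XID not older than it may still be running.
 */
static bool
xid_maybe_running(xid_t xid, xid_t horizon)
{
	return !xid_precedes(xid, horizon);
}

/*
 * xmins[] holds the xmin of each committed live tuple on the page.
 * The newest normal xmin is tracked even if *all_visible is already
 * false; the "maybe running" test runs once, after the scan.
 */
static xid_t
track_newest_live_xid(const xid_t *xmins, size_t n, xid_t horizon,
					  bool *all_visible)
{
	xid_t		newest = INVALID_XID;

	assert(xmins != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		if (xid_follows(xmins[i], newest) && xid_is_normal(xmins[i]))
			newest = xmins[i];
	}

	if (*all_visible && xid_is_normal(newest) &&
		xid_maybe_running(newest, horizon))
		*all_visible = false;

	return newest;
}
```

For example, with xmins {100, 2, 4000000000, 5} the sketch reports 100 as the newest live XID. The value 2 is the frozen XID and is skipped, and 4000000000 is older than 100 under the wraparound comparison.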

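The patch below moves the VM decision into heap_page_prune_and_freeze() through heap_page_will_set_vm() and get_conflict_xid(). The following is a hedged, standalone restatement of that decision logic, not the patch code itself. The names `will_set_vm()`, `conflict_xid()` and `xid_t`, along with the macros, are hypothetical. The bit values 0x01 and 0x02 are assumed to match VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN, and the PRUNE_ON_ACCESS early exit is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;			/* hypothetical TransactionId stand-in */
#define INVALID_XID      ((xid_t) 0)
#define FIRST_NORMAL_XID ((xid_t) 3)

/* Assumed to match VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define VM_ALL_VISIBLE   0x01
#define VM_ALL_FROZEN    0x02

/* Modulo-2^32 "a is newer than b", as in TransactionIdFollows(). */
static bool
xid_follows(xid_t a, xid_t b)
{
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Decide the new VM bits; return false if nothing would change. */
static bool
will_set_vm(bool set_all_visible, bool set_all_frozen,
			uint8_t old_vmbits, uint8_t *new_vmbits)
{
	*new_vmbits = 0;
	if (!set_all_visible)
		return false;

	*new_vmbits = VM_ALL_VISIBLE;
	if (set_all_frozen)
		*new_vmbits |= VM_ALL_FROZEN;

	if (*new_vmbits == old_vmbits)
	{
		*new_vmbits = 0;
		return false;
	}
	return true;
}

/* Newest of the prune, freeze, and VM horizons for a single record. */
static xid_t
conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
			 uint8_t old_vmbits, uint8_t new_vmbits,
			 xid_t latest_removed, xid_t newest_frozen, xid_t newest_live)
{
	xid_t		horizon = INVALID_XID;

	/* Only upgrading an all-visible page to all-frozen: no conflict. */
	if (!do_prune && !do_freeze &&
		(old_vmbits & VM_ALL_VISIBLE) && (new_vmbits & VM_ALL_FROZEN))
		return INVALID_XID;

	if (do_set_vm)
		horizon = newest_live;
	if (do_freeze && xid_follows(newest_frozen, horizon))
		horizon = newest_frozen;
	if (xid_follows(latest_removed, horizon))
	{
		assert(do_prune);
		horizon = latest_removed;
	}
	return horizon;
}
```

Taking the newest of the candidate horizons keeps the single record's conflict horizon at least as conservative as the horizon of each individual change it carries.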

  [text/x-patch] v35-0010-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (27.3K, 11-v35-0010-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 479ab7c11c1e48c938934706acf21cff460297c0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v35 10/18] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 321 ++++++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 107 +--------
 src/include/access/heapam.h          |  37 ++-
 3 files changed, 266 insertions(+), 199 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dd731f64bc6..d41e1c6fce4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -209,6 +213,12 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 old_vmbits, uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId newest_frozen_xid,
+									  TransactionId newest_live_xid);
 
 
 /*
@@ -373,9 +383,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -445,7 +456,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/*
 	 * Currently, only VACUUM performs freezing, but other callers may in the
-	 * future. Other callers must initialize prstate.all_frozen to false,
+	 * future. Other callers must initialize prstate.set_all_frozen to false,
 	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
 	 *
 	 * We only consider opportunistic freezing if the page would become
@@ -774,6 +785,66 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 uint8 old_vmbits, uint8 new_vmbits,
+				 TransactionId latest_xid_removed,
+				 TransactionId newest_frozen_xid,
+				 TransactionId newest_live_xid)
+{
+	TransactionId conflict_xid = InvalidTransactionId;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be seen as frozen by all MVCC snapshots on the standby (any
+	 * conflict would have been handled in reaction to the WAL record freezing
+	 * those tuples).
+	 */
+	if (!do_prune &&
+		!do_freeze &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+		(new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshot conflict horizon for the whole record should be the most
+	 * conservative (newest) of all the horizons calculated for any of the
+	 * possible modifications. If this record will prune tuples, any queries
+	 * on the standby with xmin older than the youngest XID of the most
+	 * recently removed tuple this record will prune will conflict.  If this
+	 * record will freeze tuples, any queries on the standby with xmin older
+	 * than the youngest tuple this record will freeze will conflict.
+	 *
+	 * If we are setting the VM, the conflict horizon is almost always the
+	 * newest live XID, except in the situation described above.
+	 *
+	 * By picking the newest of all of those, we can ensure that all changes
+	 * in the record have been taken into account.
+	 */
+	if (do_set_vm)
+		conflict_xid = newest_live_xid;
+	if (do_freeze && TransactionIdFollows(newest_frozen_xid, conflict_xid))
+		conflict_xid = newest_frozen_xid;
+
+	/*
+	 * If we are removing tuples with a younger XID than our so far calculated
+	 * conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+	{
+		Assert(do_prune);
+		conflict_xid = latest_xid_removed;
+	}
+
+	return conflict_xid;
+}
+
 /*
  * Helper to fix visibility-related corruption on a heap page and its
  * corresponding VM page. An all-visible page cannot have dead items nor can
@@ -839,7 +910,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -856,7 +927,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -885,8 +992,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
@@ -913,15 +1020,14 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
 
-	presult->vmbits = prstate->vmbits;
-
 	if (!PageIsEmpty(page))
 		presult->hastup = true;
 }
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -936,12 +1042,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed, and if the page
+ * is found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -969,15 +1073,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -986,8 +1092,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * is not being attempted, we can exit early. Do this after fixing any
 	 * discrepancy between the page-level visibility hint and the VM.
 	 */
-	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	if (prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
 	{
 		heap_page_bypass_prune_freeze(&prstate, presult);
 		return;
@@ -1058,6 +1164,25 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.old_vmbits, prstate.new_vmbits,
+									prstate.latest_xid_removed,
+									prstate.pagefrz.FreezePageConflictXid,
+									prstate.newest_live_xid);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1079,14 +1204,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1100,6 +1228,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1107,29 +1256,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xid of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1139,33 +1271,64 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen, using logic similar to ours. Now that
+	 * pruning and freezing are complete, assert that our results agree with
+	 * heap_page_is_all_visible(), which shares that logic.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * The page may consist entirely of frozen tuples without being marked
+		 * all-frozen in the VM, and the caller may not have passed
+		 * HEAP_PAGE_PRUNE_FREEZE. In that case heap_page_is_all_visible() can
+		 * report the page as all-frozen even though prstate.set_all_frozen is
+		 * false, so only check the implication in one direction.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->all_visible = prstate.set_all_visible;
-	presult->all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b5370ec26da..4678e0b9c26 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -458,13 +458,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1995,8 +1988,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2037,29 +2028,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2080,6 +2048,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Report whether the page was newly set all-frozen in the VM */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2093,71 +2069,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.all_visible || !(*has_lpdead_items));
-	Assert(!presult.all_frozen || presult.all_visible);
-
-	if (!presult.all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3571,7 +3482,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e401dd52e25..7ef4cbbfb1e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -260,7 +260,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -276,8 +277,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -311,25 +311,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * all_visible and all_frozen indicate if the all-visible and all-frozen
-	 * bits in the visibility map can be set for this page, after pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
-	 * true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		all_visible;
-	bool		all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Whether phase I of vacuuming newly set the page all-visible, newly set
+	 * it all-visible and all-frozen, or set an all-visible page all-frozen.
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -466,7 +453,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *newest_live_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
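The page-level VM accounting that this patch moves into heap_page_prune_and_freeze() decides which counters a page contributes to. A newly all-visible and all-frozen page bumps two of them. What follows is a minimal, PostgreSQL-free sketch of that classification. It assumes do_set_vm is true and that the flag values match visibilitymapdefs.h. VmCounts and count_vm_transition() are hypothetical names, and the boolean result stands in for the *vm_page_frozen output of lazy_scan_prune().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical stand-in for the three new per-page/per-VACUUM counters */
typedef struct VmCounts
{
    unsigned int new_all_visible_pages;
    unsigned int new_all_visible_frozen_pages;
    unsigned int new_all_frozen_pages;
} VmCounts;

/*
 * Hypothetical helper: account for one page whose VM bits are being set.
 * old_vmbits are the page's VM bits before pruning; set_all_frozen says
 * whether the page is also being marked all-frozen.  Returns true if the
 * page was newly marked all-frozen.
 */
static bool
count_vm_transition(unsigned char old_vmbits, bool set_all_frozen,
                    VmCounts *counts)
{
    assert(counts != NULL);

    if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
    {
        /* Page was not all-visible before; it is now */
        counts->new_all_visible_pages++;
        if (set_all_frozen)
        {
            counts->new_all_visible_frozen_pages++;
            return true;
        }
    }
    else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && set_all_frozen)
    {
        /* Already all-visible; only the all-frozen bit is new */
        counts->new_all_frozen_pages++;
        return true;
    }
    return false;
}
```

Feeding it the four possible transitions (unset to all-visible, unset to all-visible and all-frozen, all-visible to all-frozen, and already all-frozen) yields counts of 2, 1 and 1. The last case contributes nothing.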



  [text/x-patch] v35-0011-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 12-v35-0011-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 410c0e06c85c4d686f114635b0044549dc22eceb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v35 11/18] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4678e0b9c26..68fa77b5318 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1902,9 +1902,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1922,13 +1925,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0
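The ordering that this patch establishes for empty pages can be modelled with a toy, PostgreSQL-free program. Previously, visibilitymap_set() took the VM buffer lock itself and emitted an XLOG_HEAP2_VISIBLE record. Now the caller holds the lock across the critical section and emits a single XLOG_HEAP2_PRUNE_VACUUM_SCAN record covering both the heap page and the VM page. This is only a sketch: ToyState and toy_mark_empty_page_all_visible() are hypothetical. The assertions encode only the rules stated in the patch and its surrounding code, and the comments name the real call each step stands for.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical toy model of the state touched for one empty heap page */
typedef struct ToyState
{
    bool          vm_locked;      /* exclusive lock held on the VM buffer */
    int           crit_depth;     /* > 0 inside a critical section */
    bool          heap_dirty;     /* heap buffer marked dirty */
    bool          pd_all_visible; /* PD_ALL_VISIBLE set on the heap page */
    unsigned char vmbits;         /* this heap block's bits in the VM */
    int           wal_records;    /* WAL records emitted for this page */
} ToyState;

/* Hypothetical: mark an empty page all-visible and all-frozen */
static void
toy_mark_empty_page_all_visible(ToyState *s, bool needs_wal)
{
    /* LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE), before the crit section */
    assert(s->crit_depth == 0);
    s->vm_locked = true;

    s->crit_depth++;                        /* START_CRIT_SECTION() */

    s->heap_dirty = true;                   /* MarkBufferDirty(buf) */
    s->pd_all_visible = true;               /* PageSetAllVisible(page) */

    /* visibilitymap_set_vmbits(): caller must hold the VM buffer lock */
    assert(s->vm_locked);
    s->vmbits |= VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN;

    if (needs_wal)                          /* RelationNeedsWAL(rel) */
    {
        /* log_heap_prune_and_freeze(): one record for heap and VM pages */
        assert(s->crit_depth > 0 && s->heap_dirty);
        s->wal_records++;
    }

    s->crit_depth--;                        /* END_CRIT_SECTION() */

    s->vm_locked = false;   /* LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK) */
}
```

After a call, the VM lock is released, no critical section is open, both bits are set, and at most one WAL record was counted. None is counted for a relation that does not need WAL.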



  [text/x-patch] v35-0012-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 13-v35-0012-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 44626ffe27eddbd1dea7851b10079c150069faf7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v35 12/18] Remove XLOG_HEAP2_VISIBLE entirely

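No remaining code emits XLOG_HEAP2_VISIBLE records, so remove the record
type entirely. This includes deleting the xl_heap_visible struct and all
functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
visibilitymap_set_vmbits() is renamed to visibilitymap_set(), replacing
the old WAL-logging function of that name.

This changes the visibility map API, so any external callers of
visibilitymap_set() and consumers of the VM-only WAL record will need to
be updated.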

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)
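
Now that visibilitymap_set() is only the bit-setting primitive, it may help to recall how a heap block number maps to its two bits in the VM. The removed function did this with HEAPBLK_TO_MAPBYTE() and HEAPBLK_TO_OFFSET(). Below is a minimal sketch assuming the default BLCKSZ of 8192 and a 24-byte MAXALIGN'd page header. The addressing macros mirror those in visibilitymap.c. toy_vm_set() and toy_vm_get() are hypothetical helpers that work on the contents of a single VM page rather than on a pinned, locked buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: default 8 kB block size and a 24-byte page header */
#define BLCKSZ                8192
#define PAGE_HEADER_SIZE      24

/* Flag values as defined in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Mirrors the addressing macros in visibilitymap.c */
#define BITS_PER_HEAPBLOCK    2
#define HEAPBLOCKS_PER_BYTE   (8 / BITS_PER_HEAPBLOCK)
#define MAPSIZE               (BLCKSZ - PAGE_HEADER_SIZE)
#define HEAPBLOCKS_PER_PAGE   (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x)  (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x)   (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

/* Hypothetical: OR flags into heapBlk's bits; never clears bits */
static void
toy_vm_set(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
    assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
    /* Never set all-frozen without also setting all-visible */
    assert(flags != VISIBILITYMAP_ALL_FROZEN);

    map[HEAPBLK_TO_MAPBYTE(heapBlk)] |=
        (uint8_t) (flags << HEAPBLK_TO_OFFSET(heapBlk));
}

/* Hypothetical: read back the two VM bits for heapBlk */
static uint8_t
toy_vm_get(const uint8_t *map, uint32_t heapBlk)
{
    return (uint8_t) ((map[HEAPBLK_TO_MAPBYTE(heapBlk)] >>
                       HEAPBLK_TO_OFFSET(heapBlk)) & VISIBILITYMAP_VALID_BITS);
}
```

Under these assumptions one VM page covers 32672 heap blocks. Heap block 5 lives in byte 1 at bit offset 2, so setting both flags on it turns a zeroed byte into 0x0C.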

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 1a9e7bea5d2..bce767d7b71 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e19209f180d..2f9ef87463e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8894,50 +8894,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 6d39a5fff7c..df89f93edb4 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1367,9 +1230,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d41e1c6fce4..b66d49f4d60 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1245,8 +1245,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 68fa77b5318..ef607945a93 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1925,11 +1925,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2793,9 +2793,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 3047bd46def..fc74e39e069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,112 +219,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -341,9 +239,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index d83afbfb9d6..afacc1b8e0d 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -476,12 +476,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index a0166c5b410..001afb037f3 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 77e3c04144e..f5cbcf084a4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4356,7 +4356,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
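After the rename in the patch above, visibilitymap_set() only ORs status bits into a VM buffer the caller has already locked, at two bits per heap block. The standalone sketch below models that addressing and the `flags != status` check. It assumes the default 8 kB BLCKSZ and a 24-byte page header. The helper names (`vm_locate`, `vm_get_status`, `vm_set_bits`) are hypothetical, and this is not the visibilitymap.c code.

```c
#include <stdint.h>

/* Assumed constants: default 8 kB blocks, 24-byte MAXALIGNed page header. */
#define BLCKSZ 8192
#define PAGE_HEADER_SIZE 24
#define MAPSIZE (BLCKSZ - PAGE_HEADER_SIZE)

#define BITS_PER_HEAPBLOCK 2
#define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03

/* Hypothetical helper type: where a heap block's bits live in the VM. */
typedef struct VmLocation
{
	uint32_t	mapBlock;		/* which VM page */
	uint32_t	mapByte;		/* byte within that page's contents */
	uint8_t		mapOffset;		/* bit shift within that byte */
} VmLocation;

static VmLocation
vm_locate(uint32_t heapBlk)
{
	VmLocation	loc;

	loc.mapBlock = heapBlk / HEAPBLOCKS_PER_PAGE;
	loc.mapByte = (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
	loc.mapOffset = (uint8_t) ((heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
	return loc;
}

/* Read the two status bits for heapBlk from one VM page's contents. */
static uint8_t
vm_get_status(const uint8_t *map, uint32_t heapBlk)
{
	VmLocation	loc = vm_locate(heapBlk);

	return (map[loc.mapByte] >> loc.mapOffset) & VISIBILITYMAP_VALID_BITS;
}

/*
 * Mirrors the "if (flags != status)" test: returns 1 when the caller would
 * dirty the VM page after ORing in flags, 0 when nothing is done.
 */
static int
vm_set_bits(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	VmLocation	loc = vm_locate(heapBlk);

	if (flags == vm_get_status(map, heapBlk))
		return 0;
	map[loc.mapByte] |= (uint8_t) (flags << loc.mapOffset);
	return 1;
}
```

For example, heap block 5 maps to byte 1 of the first VM page at bit shift 2. Setting ALL_VISIBLE there leaves the bits for block 4, which share the same byte, untouched.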



  [text/x-patch] v35-0013-Initialize-missing-fields-in-CreateExecutorState.patch (924B, 14-v35-0013-Initialize-missing-fields-in-CreateExecutorState.patch)
From f24da3eaa6c3587bb0621817b78c148af0393349 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sun, 1 Mar 2026 16:48:19 -0500
Subject: [PATCH v35 13/18] Initialize missing fields in CreateExecutorState()

d47cbf474ecbd449a4 forgot to initialize a few fields it introduced in
the EState, so do that now.
---
 src/backend/executor/execUtils.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index a7955e476f9..cd4d5452cfb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,9 @@ CreateExecutorState(void)
 	estate->es_rteperminfos = NIL;
 	estate->es_plannedstmt = NULL;
 	estate->es_part_prune_infos = NIL;
+	estate->es_part_prune_states = NIL;
+	estate->es_part_prune_results = NIL;
+	estate->es_unpruned_relids = NULL;
 
 	estate->es_junkFilter = NULL;
 
-- 
2.43.0



  [text/x-patch] v35-0014-Track-which-relations-are-modified-by-a-query.patch (5.4K, 15-v35-0014-Track-which-relations-are-modified-by-a-query.patch)
From 382cdd7f98291e00e0fe11c53a32e2b64396fd8e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v35 14/18] Track which relations are modified by a query

Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map during on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/executor/execMain.c  | 13 +++++++++++++
 src/backend/executor/execUtils.c | 32 ++++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 54 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..6f51b82a364 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -920,6 +920,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -990,6 +994,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3027,6 +3035,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_range_table_size = parentestate->es_range_table_size;
 	rcestate->es_relations = parentestate->es_relations;
 	rcestate->es_rowmarks = parentestate->es_rowmarks;
+	rcestate->es_modified_relids = parentestate->es_modified_relids;
 	rcestate->es_rteperminfos = parentestate->es_rteperminfos;
 	rcestate->es_plannedstmt = parentestate->es_plannedstmt;
 	rcestate->es_junkFilter = parentestate->es_junkFilter;
@@ -3165,6 +3174,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cd4d5452cfb..b4e95644404 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -123,6 +123,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -871,6 +873,34 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+	Index		rti;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -896,6 +926,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..05f032baeaa 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -703,6 +703,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..610385df12b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -679,6 +679,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
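The assertion in patch 0014 reduces to a subset check. Every opened result relation and every RT index that carries a rowmark must already be recorded in es_modified_relids. The model below assumes all RT indexes are below 64, so a single uint64_t can stand in for Bitmapset. `RelidSet`, `relset_add`, `relset_is_subset` and `modified_relids_ok` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: RT indexes 1..63 in one word. */
typedef uint64_t RelidSet;

static RelidSet
relset_add(RelidSet set, unsigned rti)
{
	return set | ((RelidSet) 1 << rti);
}

/* True if every member of a is also a member of b. */
static bool
relset_is_subset(RelidSet a, RelidSet b)
{
	return (a & ~b) == 0;
}

/*
 * Same shape as CrossCheckModifiedRelids(): collect the result-relation
 * RT indexes plus every RT index with a non-NULL rowmark, then require
 * that set to be contained in "modified".  rowmarks[] is indexed by
 * rti - 1, as es_rowmarks is; a NULL array means no rowmarks at all.
 */
static bool
modified_relids_ok(RelidSet modified,
				   const unsigned *result_rtis, size_t nresult,
				   const void *const *rowmarks, unsigned range_table_size)
{
	RelidSet	expected = 0;

	for (size_t i = 0; i < nresult; i++)
		expected = relset_add(expected, result_rtis[i]);
	if (rowmarks != NULL)
	{
		for (unsigned rti = 1; rti <= range_table_size; rti++)
			if (rowmarks[rti - 1] != NULL)
				expected = relset_add(expected, rti);
	}
	return relset_is_subset(expected, modified);
}
```

A subset test, not equality, is used because the patch's bms_is_subset() check lets es_modified_relids contain additional entries.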



  [text/x-patch] v35-0015-Make-begin_scan-functions-take-a-flags-argument.patch (21.2K, 16-v35-0015-Make-begin_scan-functions-take-a-flags-argument.patch)
From 61ce0d481c14b6203efdb7fa77949e777505d613 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v35 15/18] Make begin_scan() functions take a flags argument

This lets us pass more information from the executor to use when
building the scan descriptor. A future commit will use this to tell the
scan descriptor whether or not its relation is read-only in the current
query.
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  6 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 ++++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 ++++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 ++++----
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      |  4 ++--
 src/backend/executor/nodeSeqscan.c        |  6 +++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/tableam.h              | 17 +++++++++--------
 22 files changed, 50 insertions(+), 46 deletions(-)
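In table_beginscan_parallel(), the only behavioral change is that caller-supplied bits are now preserved. The fixed sequential-scan options are ORed into `flags` instead of overwriting it. The sketch below shows that pattern with made-up bit values. The `SK_*` names and `SketchScanDesc` are hypothetical. `SK_REL_READ_ONLY` in particular is only a stand-in for the read-only hint the commit message says a later commit will add.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bits standing in for tableam.h's scan options; values assumed. */
enum
{
	SK_TYPE_SEQSCAN = 1 << 0,
	SK_ALLOW_STRAT = 1 << 6,
	SK_ALLOW_SYNC = 1 << 7,
	SK_ALLOW_PAGEMODE = 1 << 8,
	/* hypothetical executor-supplied hint, not defined by this patch */
	SK_REL_READ_ONLY = 1 << 10,
};

typedef struct SketchScanDesc
{
	uint32_t	rs_flags;
} SketchScanDesc;

/*
 * Same shape as the patched table_beginscan_parallel(): whatever the
 * caller passed is kept, and the fixed seqscan options are ORed in.
 */
static SketchScanDesc
sketch_beginscan_parallel(uint32_t flags)
{
	SketchScanDesc scan;

	flags |= SK_TYPE_SEQSCAN | SK_ALLOW_STRAT | SK_ALLOW_SYNC | SK_ALLOW_PAGEMODE;
	scan.rs_flags = flags;
	return scan;
}

/* Hypothetical consumer: only a read-only relation may set the VM on access. */
static bool
sketch_scan_may_set_vm(const SketchScanDesc *scan)
{
	return (scan->rs_flags & SK_REL_READ_ONLY) != 0;
}
```

Passing 0, as every call site in this patch does, preserves the old behavior exactly. Only a caller that opts in sets the extra bit.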

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 9cd563fd0c3..eea24eb7116 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index ee9b6106922..977308f7282 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2060,7 +2060,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 47624194f93..ebe2e87a28b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,7 +79,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -761,7 +761,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -770,7 +770,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 43f64a0e721..1827208396c 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index fd9d4087b5a..cc486e66793 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1926,7 +1926,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..b3aeee36ce6 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9ceeff6d99e..c5cbc5b4e1f 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1158,7 +1158,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b04b0dbd2a0..654cc7db175 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6388,7 +6388,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13765,7 +13765,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22623,7 +22623,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23087,7 +23087,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index c68c26cbf38..106bcd3301c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -107,7 +107,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c2d09374517..cf4d9a4f832 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a616abff04c..a7af2f6628a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -111,7 +111,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index af3c788ce8b..d9d7ec0516a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -374,7 +374,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -407,5 +407,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index dd7e11c0ca5..3da2db74e88 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7186,7 +7186,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4c0429cc613..3934fa44793 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,7 +156,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..e881e4f82a0 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -420,7 +420,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +894,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +939,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -1139,7 +1139,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1175,7 +1176,7 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1186,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
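With this patch, `table_beginscan()` and `table_beginscan_bm()` OR their fixed scan-type bits into a caller-supplied `flags` word instead of building a fresh local one. A hint bit such as `SO_HINT_REL_READ_ONLY`, added later in the series, therefore reaches `table_beginscan_common()`. The standalone sketch below models only that bit arithmetic. The `DEMO_` names are hypothetical stand-ins for the `ScanOptions` values in tableam.h. Apart from the hint's position (`1 << 10`, taken from patch 16), the bit positions are assumed for illustration. The assertion that callers never pass a scan-type bit is also a hypothetical sanity check, not something the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for ScanOptions bits (tableam.h). Only the
 * SO_HINT_REL_READ_ONLY position (1 << 10) is taken from this series. */
#define DEMO_SO_TYPE_SEQSCAN        (1u << 0)
#define DEMO_SO_TYPE_BITMAPSCAN     (1u << 1)
#define DEMO_SO_ALLOW_STRAT         (1u << 6)
#define DEMO_SO_ALLOW_SYNC          (1u << 7)
#define DEMO_SO_ALLOW_PAGEMODE      (1u << 8)
#define DEMO_SO_HINT_REL_READ_ONLY  (1u << 10)

#define DEMO_SO_TYPE_MASK (DEMO_SO_TYPE_SEQSCAN | DEMO_SO_TYPE_BITMAPSCAN)

/* Same shape as the patched table_beginscan(): keep the caller's bits and
 * add the seqscan type and behavior bits on top. */
static uint32_t
demo_seqscan_flags(uint32_t flags)
{
	/* Hypothetical check: callers supply hints only, never a scan type. */
	assert((flags & DEMO_SO_TYPE_MASK) == 0);

	flags |= DEMO_SO_TYPE_SEQSCAN |
		DEMO_SO_ALLOW_STRAT | DEMO_SO_ALLOW_SYNC | DEMO_SO_ALLOW_PAGEMODE;
	return flags;
}

/* Same shape as the patched table_beginscan_bm(). */
static uint32_t
demo_bitmapscan_flags(uint32_t flags)
{
	assert((flags & DEMO_SO_TYPE_MASK) == 0);

	flags |= DEMO_SO_TYPE_BITMAPSCAN | DEMO_SO_ALLOW_PAGEMODE;
	return flags;
}
```

A caller that passes `0`, as every call site updated in this patch does, ends up with exactly the flags it had before. A caller that passes the hint keeps it alongside the type bits.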



  [text/x-patch] v35-0016-Pass-down-information-on-table-modification-to-s.patch (8.0K, 17-v35-0016-Pass-down-information-on-table-modification-to-s.patch)
From 1b41b0a89323c45652965d2e11afd729bdb2c1c7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v35 16/18] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, and bitmap
table scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 +++++++-
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              |  2 ++
 7 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index ebe2e87a28b..3a8eb9d8b61 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -86,6 +86,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 106bcd3301c..1017676fce0 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -103,11 +103,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index cf4d9a4f832..2fe724a323f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -84,6 +84,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -94,7 +100,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a7af2f6628a..8730dab7469 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -198,6 +205,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -207,7 +220,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index d9d7ec0516a..336354922a2 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7ef4cbbfb1e..c20218f8190 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -130,6 +130,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e881e4f82a0..599011ba567 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0
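Patch 16 repeats the same `bms_is_member()` test in seven places: the bitmap heap scan, index-only scan, both index scan paths, and the serial, DSM and worker seqscan paths. As a sketch only, that computation could be factored into one helper. The name `demo_scan_hint_flags` and the 64-bit mask standing in for a `Bitmapset` of range-table indexes are hypothetical, chosen so the example compiles on its own. In PostgreSQL itself such a helper would presumably take the `ScanState` and call `bms_is_member()` on `es_modified_relids`, but that signature is an assumption and is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SO_HINT_REL_READ_ONLY (1u << 10)

/* Hypothetical stand-in for a Bitmapset of range-table indexes: bit i is
 * set when relid i is modified by the query. Only relids 1..63 fit. */
typedef uint64_t DemoRelidSet;

static bool
demo_relid_is_member(int relid, DemoRelidSet set)
{
	assert(relid > 0 && relid < 64);
	return ((set >> relid) & 1u) != 0;
}

/*
 * Hypothetical helper for the check each scan node repeats: if the scanned
 * relid is not among the modified relids, pass the read-only hint.
 */
static uint32_t
demo_scan_hint_flags(int scanrelid, DemoRelidSet modified_relids)
{
	return demo_relid_is_member(scanrelid, modified_relids) ?
		0 : DEMO_SO_HINT_REL_READ_ONLY;
}
```

One argument for a single helper is that the serial path and the parallel DSM and worker paths could not drift apart in how they compute the hint.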



  [text/x-patch] v35-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.9K, 18-v35-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 88a86dbfc54db38c890718d74419d94f15dade18 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v35 17/18] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 41 +++++++++++++++----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 ++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2f9ef87463e..5539bb8c10b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3a8eb9d8b61..eb5a1b7bd21 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -147,7 +147,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2542,7 +2543,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b66d49f4d60..fc2ddcb5ab4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -213,7 +215,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
 									  uint8 old_vmbits, uint8 new_vmbits,
 									  TransactionId latest_xid_removed,
@@ -237,7 +240,8 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
  * pinned. If we find VM corruption during pruning, we will fix it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -319,6 +323,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = 0;
+			if (rel_read_only)
+				params.options = HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -375,6 +381,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -937,21 +944,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1166,7 +1189,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ef607945a93..ab76800b4df 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2007,7 +2007,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c20218f8190..0a3e3df9b2d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -96,7 +97,8 @@ typedef struct HeapScanDescData
 
 	/*
 	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * corresponding page in the visibility map. If the relation is not
+	 * modified by the query, on-access pruning may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -128,7 +130,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -435,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
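Patch 17 replaces the unconditional `PRUNE_ON_ACCESS` early return in `heap_page_will_set_vm()` with a check on `attempt_set_vm`, plus a cost guard for on-access calls that neither prune nor freeze. The standalone sketch below restates that decision as a small function so the cases can be exercised directly. `DemoPruneState` and its fields are hypothetical simplifications. In particular, the two booleans `buffer_dirty` and `needs_fpi` stand in for `BufferIsDirty()` and `XLogCheckBufferNeedsBackup()`. The real function also computes `new_vmbits`.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	DEMO_PRUNE_ON_ACCESS,
	DEMO_PRUNE_VACUUM_SCAN,
} DemoPruneReason;

typedef struct
{
	bool		attempt_set_vm;		/* HEAP_PAGE_PRUNE_SET_VM was requested */
	bool		set_all_visible;	/* page was found all-visible */
	bool		set_all_frozen;		/* page was found all-frozen */
	bool		buffer_dirty;		/* stands in for BufferIsDirty() */
	bool		needs_fpi;			/* stands in for XLogCheckBufferNeedsBackup() */
} DemoPruneState;

static bool
demo_will_set_vm(DemoPruneState *st, DemoPruneReason reason,
				 bool do_prune, bool do_freeze)
{
	/* Same invariant the caller asserts: all-frozen implies all-visible. */
	assert(!st->set_all_frozen || st->set_all_visible);

	if (!st->attempt_set_vm || !st->set_all_visible)
		return false;

	/*
	 * On access with nothing else to do: skip if setting the VM would newly
	 * dirty the heap page or force a full-page image into WAL.
	 */
	if (reason == DEMO_PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!st->buffer_dirty || st->needs_fpi))
	{
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return false;
	}

	return true;
}
```

Vacuum always passes `HEAP_PAGE_PRUNE_SET_VM`. On-access pruning passes it only when the scan carries the read-only hint, so `attempt_set_vm` is what ties this function back to the executor changes in patch 16.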



  [text/x-patch] v35-0018-Set-pd_prune_xid-on-insert.patch (9.3K, 19-v35-0018-Set-pd_prune_xid-on-insert.patch)
From 815a2d10ebc6f672be5508a0c4a98ff866d0d71b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v35 18/18] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This lets heap_page_prune_and_freeze() run the first time a page filled
with newly inserted tuples is read, and set the page all-visible in the
VM.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect an additional
buffer hit in some access steps. Because pd_prune_xid is now set by the
fill/insert step, on-access pruning can happen during the first access
step (before the DELETE), and that is when the VM is extended. After the
DELETE, the next access hits the existing VM block instead of extending
it, so one additional buffer hit is counted for the table.

Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam.c              | 31 +++++++++++++------
 src/backend/access/heap/heapam_xlog.c         | 17 +++++++++-
 src/backend/access/heap/pruneheap.c           | 14 ++++-----
 .../modules/index/expected/killtuples.out     |  8 ++---
 4 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5539bb8c10b..bb124bc767b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,29 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would otherwise not be pruned until the next vacuum.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2240,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2604,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index df89f93edb4..edd5c946c6a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fc2ddcb5ab4..72a1c311bd0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1904,16 +1904,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark the page all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-03 07:32   ` Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Chao Li @ 2026-03-03 07:32 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Mar 3, 2026, at 07:38, Melanie Plageman <[email protected]> wrote:
> 
> On Fri, Feb 20, 2026 at 12:59 PM Andres Freund <[email protected]> wrote:
>> 
>> On 2026-01-28 18:16:10 -0500, Melanie Plageman wrote:
>> 
>>> I could see an argument for moving identify_and_fix_vm_corruption()
>>> out of the helper and into heap_page_prune_and_freeze() but then we'd
>>> have to move visibilitymap_get_status() out too. And that takes away a
>>> lot of the benefit of encapsulating all that logic.
>> 
>> I was wondering about that option. Relatedly, I also was wondering if we ought
>> to do identify_and_fix_vm_corruption() regardless of ->attempt_update_vm.
> 
> Attached v35 does this. I always pin the vmbuffer if we are going to
> prune in heap_page_prune_opt(). In many cases, because it's saved in
> the scan descriptor, it won't actually need to take a new pin. During
> pruning, I check for VM corruption even if I am not considering
> setting the VM.
> 
>>> Well, after this patch set, clearing the VM does happen before we emit
>>> WAL for pruning.
>> 
>> That I think is a substantial improvement, the current (i.e. before your
>> series) placement really is pretty insane due to the guaranteed divergence it
>> causes.
>> 
>> I wonder if we actually should just force an FPI whenever we detect such
>> corruption, that way it would be reliably fixed on the standby as well.
> 
> Only problem is we would have to do an FPI of the VM page as well if
> we wanted the corruption to be reliably fixed on the standby.
> 
>>> It wouldn't be hard to move the corruption fixups to the beginning of
>>> heap_page_prune_and_freeze() in the new code structure.
>> 
>> As identify_and_fix_vm_corruption() needs lpdead_items, I'm not sure that's
>> true?
>> 
>> I wonder if at least the warning for the "(PageIsAllVisible(heap_page) &&
>> nlpdead_items > 0)" test should be moved to
>> heap_prune_record_dead_or_unused(). That way the WARNING could include the
>> offset number and it'd also work in the mark_unused_now case.
>> 
>> Perhaps it also should trigger for RECENTLY_DEAD, INSERT_IN_PROGRESS,
>> DELETE_IN_PROGRESS?
>> 
>> At that point the !page_all_visible && vm_all_visible part could indeed be
>> moved to the start of heap_page_prune_and_freeze()
> 
> I've done all this. There is heap page/VM corruption check at the
> beginning of heap_page_prune_and_freeze() and then checking for
> corruption during pruning in the previously covered case (lpdead
> items) as well as the mark_unused_now case, and
> RECENTLY_DEAD/INSERT_IN_PROGRESS/DELETE_IN_PROGRESS.
> 
>>> Would it be worth it? What benefit would we get? Do you just feel that it
>>> should logically come first?
>> 
>> One insanity is that right now we will process all frozen pages over and over
>> due to the skip pages threshold, wasting a *lot* of CPU and memory bandwidth.
>> It'd be quite defensible to just skip processing the page once we determined
>> it's already all frozen.  But for that we'd probably want to do the
>> "page_all_visible && vm_all_visible" check before returning...
> 
> I've added a fast path to bypass pruning/freezing when the page is
> already all-visible. And I check for pg_all_visible && vm_all_visible
> beforehand. The one downside this has is that if there is a page marked
> all-frozen that has dead tuples on it, we'll never get to fix that
> corruption nor clean up the dead tuples. But the fast path kind of
> seems worth it to me.
> 
>>>> Do we actually foresee a case where only one of HEAP_PAGE_PRUNE_FREEZE |
>>>> HEAP_PAGE_PRUNE_UPDATE_VM would be set?
>>> 
>>> Yes, when setting the VM on-access, it is too expensive to call
>>> heap_prepare_freeze_tuple() on each tuple. I could work on trying to
>>> optimize it, but it isn't currently viable.
>> 
>> Is it too expensive to do so even when we already decided to do some pruning?
>> I am not surprised it's too expensive when there's not even a dead tuple on
>> the page.  But I am mildly surprised if it's too expensive to do when we'd WAL
>> log anyway?
> 
> It's not really possible in the current code structure to only call
> heap_prepare_freeze_tuple() when there are at least some prunable
> tuples. We go through the line pointers and record them as prunable at
> the same time we call heap_prepare_freeze_tuple(), so we won't know
> until we've examined all line pointers that there are no prunable
> tuples, at which point we will have called heap_prepare_freeze_tuple()
> for every tuple.
> 
>>> I think using all_frozen_except_dead while maintaining
>>> visibility_cutoff_xid (in heap_prune_record_unchanged_lp_normal()) has
>>> the potential to be confusing, though. We'd need to keep updating
>>> visibility_cutoff_xid when all_visible is false but
>>> all_frozen_except_dead is true as well as when all_visible is true.
>>> And because we don't care about all_visible_except_dead, it gets even
>>> more confusing to make sure we are maintaining the right variables in
>>> the right situations.
>> 
>> I suspect we should just track all of the horizons/cutoffs all the time. This
>> whole stuff about optimizing out a few conditional assignments complicates the
>> code substantially and feels extremely error prone to me.
> 
> I've done this in v35. I posted the freeze horizon tracking patch
> separately in [1] but it is in v35 as 0004. Tracking the newest live
> xid is in 0009. This also always tracks all_visible for all callers
> since I unconditionally pass the vmbuffer now. I still don't set the
> VM if the query is modifying the relation, though.
> 
>> I probably complained about this before, and it's not this patch's fault, but
>> PruneState->{all_visible,all_frozen} are imo confusingly named, due to
>> sounding like they describe the current state, rather than the possible state
>> after pruning.  It's not helped by this comment:
>> 
>>         * NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
>>         * That's convenient for heap_page_prune_and_freeze() to use them to
>>         * decide whether to opportunistically freeze the page or not.  The
>>         * all_visible and all_frozen values ultimately used to set the VM are
>>         * adjusted to include LP_DEAD items after we determine whether or not to
>>         * opportunistically freeze.
>> 
>> "all-visible ... are adjusted to include LP_DEAD" ... - just reading that it's
>> hard to know what it means.
> 
> 0003 does the rename.
> 
>> The first thing to improve pruning performance that I would do is to introduce
>> a fastpath for pages that a) are already frozen b) do not have dead items (if
>> we're not freezing). Iterating through HOT chains is far from cheap, and if
>> all rows are live, there's not really a point in doing so.  This is
>> particularly important for VACUUMs where we end up freezing a ton of pages that
>> are already frozen, due to the silly skip_pages_threshold thing.
> 
> 0007 adds a fast path.
> 
>>> +static TransactionId
>>> +get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
>>> +                              uint8 old_vmbits, uint8 new_vmbits,
>>> +                              TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
>>> +                              TransactionId visibility_cutoff_xid)
>>> +{
>>> +     TransactionId conflict_xid;
>>> +
>>> +     /*
>>> +      * We can omit the snapshot conflict horizon if we are not pruning or
>>> +      * freezing any tuples and are setting an already all-visible page
>>> +      * all-frozen in the VM.
>> 
>> Maybe mention when this can happen, because it's not immediately obvious.
> 
> I've added this to my TODO. I honestly can't think of a scenario where
> it can happen. But I remember spending quite a bit of time thinking
> about it on another occasion. The current code (in master) does
> specifically account for this scenario, which is why I kept the logic,
> but I'm not sure how it can happen.
> 
> I made all the other changes to specific comments you mentioned in
> your mail but I won't bore you with itemization.
> 
>>>      if (do_set_vm)
>>>              conflict_xid = visibility_cutoff_xid;
>>>      else if (do_freeze)
>>>              conflict_xid = frz_conflict_horizon;
>>>      else
>>>              conflict_xid = InvalidTransactionId;
>> 
>> Could it be worth checking that if (do_set_vm && do_freeze) the
>> frz_conflict_horizon won't be "violated" by using visibility_cutoff_xid instead?
> 
> Yes, as you mentioned off-list, this wasn't right. New code is like this
> 
> TransactionId conflict_xid = InvalidTransactionId;
> ...
>    if (do_set_vm)
>        conflict_xid = newest_live_xid;
>    if (do_freeze && TransactionIdFollows(newest_frozen_xid, conflict_xid))
>        conflict_xid = newest_frozen_xid;
> 
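A standalone sketch of the revised conflict-horizon rule quoted above may help. It reimplements PostgreSQL's modulo-2^32 TransactionIdFollows() locally instead of using the real headers, and compute_conflict_xid() is a hypothetical name, not a function from the patch. The point it illustrates: taking the newer of newest_live_xid and newest_frozen_xid means neither horizon is violated, including across XID wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Local copy of the comparison used by PostgreSQL's TransactionIdFollows():
 * special (non-normal) XIDs compare as plain integers, normal XIDs compare
 * by signed 32-bit distance so the result survives wraparound.
 */
static bool
TransactionIdFollows(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;
    diff = (int32_t) (id1 - id2);
    return diff > 0;
}

/*
 * Hypothetical helper: snapshot conflict horizon for one WAL record that
 * may both set the VM and freeze tuples. Starting from the VM horizon and
 * advancing to the freeze horizon only if it is newer yields the newer of
 * the two.
 */
static TransactionId
compute_conflict_xid(bool do_set_vm, bool do_freeze,
                     TransactionId newest_live_xid,
                     TransactionId newest_frozen_xid)
{
    TransactionId conflict_xid = InvalidTransactionId;

    if (do_set_vm)
        conflict_xid = newest_live_xid;
    if (do_freeze && TransactionIdFollows(newest_frozen_xid, conflict_xid))
        conflict_xid = newest_frozen_xid;
    return conflict_xid;
}
```

For example, setting the VM with newest_live_xid = 100 while freezing with newest_frozen_xid = 150 yields 150; near wraparound, XID 10 is treated as newer than 4294967290.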
>>> From 8d350868206456f631883a40a955dff480e408d3 Mon Sep 17 00:00:00 2001
>>> From: Melanie Plageman <[email protected]>
>>> Date: Wed, 17 Dec 2025 16:51:05 -0500
>>> Subject: [PATCH v34 09/14] Use GlobalVisState in vacuum to determine page
>>> level visibility
>>> 
>>> [...]
>>> 
>>> Because comparing a transaction ID against GlobalVisState is more
>>> expensive than comparing against a single XID, we defer this check until
>>> after scanning all tuples on the page.
>> 
>> Curious, is this a precaution or was this a measurable bottleneck?
> 
> I did see GlobalVisTestXidMaybeRunning() in a profile I did when it
> was still called for every HEAPTUPLE_LIVE tuple in
> heap_prune_record_unchanged_lp_normal(), but I don't have the profile
> or test case around anymore.
> 
> However, since I now unconditionally maintain the newest_live_xid,
> moving GlobalVisTestXidMaybeRunning() back into
> heap_prune_record_unchanged_lp_normal() wouldn't help us avoid any
> work. It would just make the values of prstate.set_all_visible and
> prstate.set_all_frozen more accurate sooner. But I don't think it's
> worth the extra function call since set_all_frozen and set_all_visible
> won't be totally "done" until after we decide whether or not to
> opportunistically freeze anyway.
> 
>>> @@ -1077,6 +1078,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>>>      prune_freeze_plan(RelationGetRelid(params->relation),
>>>                                        buffer, &prstate, off_loc);
>>> 
>>> +     /*
>>> +      * After processing all the live tuples on the page, if the newest xmin
>>> +      * amongst them may be considered running by any snapshot, the page cannot
>>> +      * be all-visible.
>>> +      */
>>> +     if (prstate.all_visible &&
>>> +             TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
>> 
>> Any reason to test IsNormal rather than just IsValid()?  There should never be
>> a reason it's a valid but not "normal" xid, right?
> 
> Well the reason I did this was that the existing code in master
> tracking visibility_cutoff_xid only advances it if
> TransactionIdIsNormal(). I'm a bit confused about it too because it
> seems like we would still want to do it for bootstrap mode xids. But I
> see PageSetPrunable() only allows normal xids.
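The IsNormal-versus-IsValid question comes down to which XIDs are valid but not normal. In PostgreSQL those are BootstrapTransactionId (1) and FrozenTransactionId (2). The standalone sketch below defines the relevant macros locally (it does not include PostgreSQL headers) and models the "track newest xmin on page" code with a hypothetical helper, newest_normal_xmin(), showing that such XIDs never advance the tracked value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Special XIDs, with the values used in PostgreSQL's access/transam.h. */
#define InvalidTransactionId     ((TransactionId) 0)
#define BootstrapTransactionId   ((TransactionId) 1)
#define FrozenTransactionId      ((TransactionId) 2)
#define FirstNormalTransactionId ((TransactionId) 3)

#define TransactionIdIsValid(xid)  ((xid) != InvalidTransactionId)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Local copy of the wraparound-aware comparison. */
static bool
TransactionIdFollows(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

/*
 * Hypothetical helper modeled on the newest-xmin tracking: only normal
 * XIDs can advance the tracked value, so a page whose tuples all carry
 * FrozenTransactionId (or BootstrapTransactionId) leaves it invalid.
 */
static TransactionId
newest_normal_xmin(const TransactionId *xmins, int n)
{
    TransactionId newest = InvalidTransactionId;

    for (int i = 0; i < n; i++)
    {
        if (TransactionIdFollows(xmins[i], newest) &&
            TransactionIdIsNormal(xmins[i]))
            newest = xmins[i];
    }
    return newest;
}
```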
> 
>>> @@ -1794,28 +1812,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
>>>                              }
>>> 
>>>                              /*
>>> -                              * The inserter definitely committed.  But is it old enough
>>> -                              * that everyone sees it as committed?  A FrozenTransactionId
>>> -                              * is seen as committed to everyone.  Otherwise, we check if
>>> -                              * there is a snapshot that considers this xid to still be
>>> -                              * running, and if so, we don't consider the page all-visible.
>>> +                              * The inserter definitely committed. But we don't know if it
>>> +                              * is old enough that everyone sees it as committed. Later,
>>> +                              * after processing all the tuples on the page, we'll check if
>>> +                              * there is any snapshot that still considers the newest xid
>>> +                              * on the page to be running. If so, we don't consider the
>>> +                              * page all-visible.
>>>                               */
>>>                              xmin = HeapTupleHeaderGetXmin(htup);
>>> 
>>> -                             /*
>>> -                              * For now always use prstate->cutoffs for this test, because
>>> -                              * we only update 'all_visible' and 'all_frozen' when freezing
>>> -                              * is requested. We could use GlobalVisTestIsRemovableXid
>>> -                              * instead, if a non-freezing caller wanted to set the VM bit.
>>> -                              */
>>> -                             Assert(prstate->cutoffs);
>>> -                             if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
>>> -                             {
>>> -                                     prstate->all_visible = false;
>>> -                                     prstate->all_frozen = false;
>>> -                                     break;
>>> -                             }
>>> -
>>>                              /* Track newest xmin on page. */
>>>                              if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
>>>                                      TransactionIdIsNormal(xmin))
>> 
>> Kinda wonder if this code should be in something like
>> heap_prune_record_freezable() or such, rather than be inside
>> heap_prune_record_unchanged_lp_normal().
> 
> I played around with it, but it all felt a bit awkward. I wrote it
> down for a future enhancement idea.
> 
>>> Subject: [PATCH v34 10/14] Unset all_visible sooner if not freezing
>>> 
>>> In the prune/freeze path, we currently delay clearing all_visible and
>>> all_frozen in the presence of dead items to allow opportunistic
>>> freezing.
>>> 
>>> However, if no freezing will be attempted, there’s no need to delay.
>>> Clearing the flags earlier avoids extra bookkeeping in
>>> heap_prune_record_unchanged_lp_normal(). This currently has no runtime
>>> effect because all callers that consider setting the VM also prepare
>>> freeze plans, but upcoming changes will allow on-access pruning to set
>>> the VM without freezing. The extra bookkeeping was noticeable in a
>>> profile of on-access VM setting.
>> 
>> What workload was that?
> 
> It was a select * offset all query with a few fat tuples on each page
> and none of them prunable. I'm planning on digging up the
> case/creating a new one to see if it is reproducible. This was with an
> older version of the code that had more conditionals as well. This
> commit is actually dropped in v35 because I now always keep
> newest_live_xid up-to-date (0009) which means unsetting
> set_all_visible sooner has no benefit.
> 
>> Theoretically, even if we don't freeze, the page still may be all-visible or
>> all frozen after the removal of dead items, no? Practically that won't happen,
>> because we don't remove dead items in any of the relevant paths, but from the
>> commit message and comments that's not entirely clear.
> 
> Yea, it's clearer with the commit dropped.
> 
>>> @@ -678,6 +678,12 @@ typedef struct EState
>>>                                                                       * ExecDoInitialPruning() */
>>>      const char *es_sourceText;      /* Source text from QueryDesc */
>>> 
>>> +     /*
>>> +      * RT indexes of relations modified by the query through a
>>> +      * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
>>> +      */
>>> +     Bitmapset  *es_modified_relids;
>>> +
>> 
>> Other EState fields are initialized in CreateExecutorState, this isn't afaict?
> 
> Oops, yes. I based it on es_unpruned_relids which wasn't initialized
> there either. I've added a commit (0013) to initialize a few EState
> fields that weren't initialized in CreateExecutorState() as well.
> 
>> Wonder if it's worth adding a crosscheck somewhere, verifying that if a
>> relation is modified, it's in es_modified_relids. Otherwise this could very
>> well silently get out of date.
> 
> Done in v35 (0014).
> 
>> Also, there's some overlap between the information collected this way, and
>> AcquireExecutorLocks(), ScanQueryForLocks(), which determine the needed lock
>> modes via rte->rellockmode.
> 
> Those are in parser/planner, so it doesn't seem like a good fit. I
> populate es_modified_relids in the executor.
> 
> I don't know exactly what the overlap would be between RTEs with an
> exclusive rellockmode and es_modified_relids. It seems like you could
> have RTEs which don't end up getting modified that have a lock level
> that would have made you think that they would be modified.
> 
> But were you imagining a substitution or a cross-check?
> 
>>> From 8205b2d7da0c3ad3cbc5cead336ced677996b37d Mon Sep 17 00:00:00 2001
>>> From: Melanie Plageman <[email protected]>
>>> Date: Wed, 3 Dec 2025 15:12:18 -0500
>>> Subject: [PATCH v34 12/14] Pass down information on table modification to scan
>>> node
>> 
>> Perhaps worth splitting up, so the addition of the 0 flag is separate from the
>> read-only hint aspect.
> 
> Done.
> 
> [1] https://www.postgresql.org/message-id/CAAKRu_bbaUV8OUjAfVa_iALgKnTSfB4gO3jnkfpcFgrxEpSGJQ%40mail.gma...
> <v35-0001-Move-commonly-used-context-into-PruneState-and-s.patch><v35-0002-Add-PageGetPruneXid-helper.patch><v35-0003-Rename-PruneState-all_visible-all_frozen.patch><v35-0004-Use-the-newest-to-be-frozen-xid-as-the-conflict-.patch><v35-0005-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch><v35-0006-Fix-visibility-map-corruption-in-more-cases.patch><v35-0007-Add-pruning-fast-path-for-all-visible-and-all-fr.patch><v35-0008-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v35-0009-Keep-newest-live-XID-up-to-date-even-if-page-not.patch><v35-0010-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v35-0011-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v35-0012-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v35-0013-Initialize-missing-fields-in-CreateExecutorState.patch><v35-0014-Track-which-relations-are-modified-by-a-query.patch><v35-0015-Make-begin_scan-functions-take-a-flags-argument.patch><v35-0016-Pass-down-information-on-table-modification-to-s.patch><v35-0017-Allow-on-access-pruning-to-set-pages-all-visible.patch><v35-0018-Set-pd_prune_xid-on-insert.patch>

1 - 0001
```
+prune_freeze_plan(PruneState *prstate, OffsetNumber *off_loc)
 {
-	Page		page = BufferGetPage(buffer);
-	BlockNumber blockno = BufferGetBlockNumber(buffer);
-	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+	Page		page = prstate->page;
+	BlockNumber blockno = prstate->block;
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
```

As there is a local “page”, maybe just use the local one for PageGetMaxOffsetNumber.

0002 looks good.

2 - 0003 - Does it make sense to also do the same renaming in PruneFreezeResult?

3 - 0004
```
-
-		/*
-		 * Calculate what the snapshot conflict horizon should be for a record
-		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
-		 * for conflicts when the whole page is eligible to become all-frozen
-		 * in the VM once we're done with it. Otherwise, we generate a
-		 * conservative cutoff by stepping back from OldestXmin.
-		 */
-		if (prstate->set_all_frozen)
-			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
-		else
-		{
-			/* Avoids false conflicts when hot_standby_feedback in use */
-			prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
-			TransactionIdRetreat(prstate->frz_conflict_horizon);
-		}
+		Assert(TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid,
+											 prstate->cutoffs->OldestXmin));
```

At the point of this Assert, can prstate->pagefrz.FreezePageConflictXid be InvalidTransactionId? My understanding is no; in that case, would it make sense to also Assert(prstate->pagefrz.FreezePageConflictXid != InvalidTransactionId)?

Otherwise, if prstate->pagefrz.FreezePageConflictXid can still be InvalidTransactionId, then the Assert should be changed to something like:

Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
  TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin));

I will continue with 0005 tomorrow.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/










* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
@ 2026-03-03 15:52     ` Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-03 15:52 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Mar 3, 2026 at 2:33 AM Chao Li <[email protected]> wrote:
>
> 2 - 0003 - Does it make sense to also do the same renaming in PruneFreezeResult?

I could do that. Later commits remove them, so I thought it didn't
make sense. If only this commit goes in though, it would make sense.

> -                * Calculate what the snapshot conflict horizon should be for a record
> -                * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
> -                * for conflicts when the whole page is eligible to become all-frozen
> -                * in the VM once we're done with it. Otherwise, we generate a
> -                * conservative cutoff by stepping back from OldestXmin.
> -                */
> -               if (prstate->set_all_frozen)
> -                       prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
> -               else
> -               {
> -                       /* Avoids false conflicts when hot_standby_feedback in use */
> -                       prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
> -                       TransactionIdRetreat(prstate->frz_conflict_horizon);
> -               }
> +               Assert(TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid,
> +                                                                                        prstate->cutoffs->OldestXmin));
> ```
>
> At the point of this Assert, can prstate->pagefrz.FreezePageConflictXid be InvalidTransactionId? My understanding is no; in that case, would it make sense to also Assert(prstate->pagefrz.FreezePageConflictXid != InvalidTransactionId)?

I think it is possible if we are doing some kind of freezing to a
multixact that we reach here and FreezePageConflictXid is
InvalidTransactionId.

> Otherwise, if prstate->pagefrz.FreezePageConflictXid can still be InvalidTransactionId, then the Assert should be changed to something like:
>
> Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
>   TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin));

This is covered by TransactionIdPrecedesOrEquals because
InvalidTransactionId is 0. We assume that in many places throughout
the code.
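This relies on how TransactionIdPrecedesOrEquals() treats special XIDs: when either argument is not a normal XID it falls back to a plain unsigned comparison, so InvalidTransactionId (0) precedes-or-equals every XID. The standalone sketch below reimplements that logic locally; check_implicit() and check_explicit() are hypothetical names for the two Assert conditions under discussion. Both accept and reject exactly the same inputs, so the choice between them is one of readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Local copy of the logic in PostgreSQL's TransactionIdPrecedesOrEquals():
 * special XIDs compare as plain unsigned integers, normal XIDs by signed
 * 32-bit distance.
 */
static bool
TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 <= id2;
    return (int32_t) (id1 - id2) <= 0;
}

/* The condition in the v35-0004 Assert. */
static bool
check_implicit(TransactionId conflict_xid, TransactionId oldest_xmin)
{
    return TransactionIdPrecedesOrEquals(conflict_xid, oldest_xmin);
}

/* The more explicit form proposed in review. */
static bool
check_explicit(TransactionId conflict_xid, TransactionId oldest_xmin)
{
    return conflict_xid == InvalidTransactionId ||
        TransactionIdPrecedesOrEquals(conflict_xid, oldest_xmin);
}
```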

> I will continue with 0005 tomorrow.

Thanks for the review!

I noticed a serious bug in v35-0017: I pass hscan->modifies_base_rel
to heap_page_prune_opt() as rel_read_only, which is the opposite of
what I want to do -- it should be !hscan->modifies_base_rel. I'm going
to wait to fix it though and post a new v36 once I've batched up more
fixups.

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-04 08:59       ` Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Chao Li @ 2026-03-04 08:59 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Mar 3, 2026, at 23:52, Melanie Plageman <[email protected]> wrote:
> 
> 
>> Otherwise, if prstate->pagefrz.FreezePageConflictXid can still be InvalidTransactionId, then the Assert should be changed to something like:
>> 
>> Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
>>  TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin));
> 
> This is covered by TransactionIdPrecedesOrEquals because
> InvalidTransactionId is 0. We assume that in many places throughout
> the code.
> 

I understand that TransactionIdPrecedesOrEquals(InvalidTransactionId, prstate->cutoffs->OldestXmin) is true, but the Assert as written leaves code readers with the impression that prstate->pagefrz.FreezePageConflictXid cannot be InvalidTransactionId. My version states explicitly that prstate->pagefrz.FreezePageConflictXid can be InvalidTransactionId at this point.


>> I will continue with 0005 tomorrow.
> 

4 - 0005
```
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
```

I don’t see why vmbuffer has to be a pointer. Buffer is an int underneath, and in the last commit vmbuffer only passes data into the function without passing anything out.

Since we add the new vmbuffer parameter here even though it’s not used in this commit, I think it’d be better to update the header comment to explain what this parameter will do.
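For context on the pointer question, here is a sketch of the usual in/out shape, assuming (from the v35 description earlier in the thread) that later patches pin the VM page inside heap_page_prune_opt() and keep it in the scan descriptor. It mirrors visibilitymap_pin(rel, heapBlk, &vmbuf), whose Buffer * argument lets the callee swap in a newly pinned buffer and hand it back for reuse. Every name below is a hypothetical stand-in, and "buffers" are simulated integers rather than real pins.

```c
#include <assert.h>
#include <stdint.h>

typedef int Buffer;               /* PostgreSQL's Buffer is an int */
typedef uint32_t BlockNumber;

#define InvalidBuffer 0

/* Heap blocks covered by one VM page with the default 8 kB block size. */
#define HEAP_BLOCKS_PER_VM_PAGE 32672

static int new_pins = 0;          /* counts simulated pins taken */

/*
 * Hypothetical analogue of visibilitymap_pin(): keep the caller's buffer
 * if it already holds the right VM page, otherwise replace it and write
 * the new Buffer back through the pointer. Here a "buffer" is simply the
 * VM block number + 1, so InvalidBuffer stays 0.
 */
static void
pin_vm_page(BlockNumber heapblk, Buffer *vmbuf)
{
    Buffer wanted = (Buffer) (heapblk / HEAP_BLOCKS_PER_VM_PAGE) + 1;

    if (*vmbuf == wanted)
        return;                   /* reuse: no new pin needed */
    /* a real implementation would release the old pin here */
    *vmbuf = wanted;
    new_pins++;
}

/* Hypothetical stand-in for heap_page_prune_opt(rel, buffer, &vmbuffer). */
static void
prune_opt_sketch(BlockNumber heapblk, Buffer *vmbuffer)
{
    pin_vm_page(heapblk, vmbuffer);
    /* ... pruning would consult *vmbuffer here ... */
}
```

Because the caller owns the Buffer variable, consecutive calls on heap blocks covered by the same VM page reuse a single pin; passing a plain Buffer by value would make the callee's new pin invisible to the caller.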

5  - 0006
```
+ *
+ * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
+ * page, but it does not need to be done in a critical section because
+ * clearing the VM is not WAL-logged.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
```

Nit: why does the last paragraph of the header comment use the function name instead of “this function”? It looks like a copy-pasto.

6 - 0006
```
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("LP_DEAD item found on page marked as all-visible"),
+					 errdetail("relation \"%s\", page %u, tuple %u",
+							   RelationGetRelationName(prstate->relation),
+							   prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all found on page marked as all-visible"),
+					 errdetail("relation \"%s\", page %u, tuple %u",
+							   RelationGetRelationName(prstate->relation),
+							   prstate->block, offnum)));
+		}
```

I recently learned that detail messages should use complete sentences, end each with a period, and capitalize the first word of each sentence. See https://www.postgresql.org/docs/current/error-style-guide.html.
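As a small illustration of the style rule, the sketch below builds a detail string that is a complete, capitalized sentence ending in a period, plus a rough mechanical check. The wording is only a suggestion, not text from the patch; in PostgreSQL it would be an errdetail() format string, and format_detail() and detail_looks_styled() are hypothetical helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build an example detail message: a complete sentence, capitalized,
 * ending with a period. Illustrative wording only.
 */
static void
format_detail(char *buf, size_t len, const char *relname,
              unsigned int block, unsigned int offnum)
{
    snprintf(buf, len,
             "The item is at offset %u of block %u in relation \"%s\".",
             offnum, block, relname);
}

/* Rough check of two mechanical rules: leading capital, final period. */
static bool
detail_looks_styled(const char *s)
{
    size_t n = strlen(s);

    return n > 0 && isupper((unsigned char) s[0]) && s[n - 1] == '.';
}
```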

7 - 0006
```
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear.  However, it's possible that the bit
+		 * got cleared after heap_vac_scan_next_block() was called, so we must
+		 * recheck with buffer lock before concluding that the VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
+						prstate->block,
+						RelationGetRelationName(prstate->relation))));
+	}
```

The comment says “we must recheck with buffer lock before…”, but the code only logs a warning. Is the comment stale?

8 - 0007
```
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	presult->vmbits = prstate->vmbits;
+
+	if (!PageIsEmpty(page))
+		presult->hastup = true;
+}
```

* Since this function already checks PageIsEmpty(page), if that is true we don’t need to count live_tuples, right? That could be a tiny optimization.
* I see heap_page_bypass_prune_freeze() is only called in one place, immediately after prune_freeze_setup() and heap_fix_vm_corruption(), so prstate->vmbits must be 0. Do we then need presult->vmbits = prstate->vmbits;?
* Do we need to set all_visible and all_frozen to presult?
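A rough sketch of the first suggestion, using a toy page model (ToyPage, ToyResult and toy_bypass_prune_freeze() are hypothetical names, not PostgreSQL code). It assumes the stale prune hint still needs clearing on an empty page, so that step stays ahead of the early return:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy line pointer states; not PostgreSQL's ItemId layout. */
typedef enum { TOY_LP_UNUSED, TOY_LP_NORMAL, TOY_LP_REDIRECT } ToyLpFlags;

typedef struct ToyPage
{
    int         nitems;         /* like PageGetMaxOffsetNumber(); 0 == empty */
    uint32_t    prune_xid;      /* like pd_prune_xid; 0 == invalid */
    ToyLpFlags  items[16];
} ToyPage;

typedef struct ToyResult
{
    int         live_tuples;
    bool        hastup;
} ToyResult;

/*
 * Zero the result, do the bookkeeping that applies to every page, then
 * return early for an empty page so the line pointer loop is skipped.
 */
void
toy_bypass_prune_freeze(ToyPage *page, ToyResult *res)
{
    memset(res, 0, sizeof(*res));

    /* Clear any stale prune hint; this must not sit behind the early return. */
    if (page->prune_xid != 0)
        page->prune_xid = 0;

    if (page->nitems == 0)
        return;                 /* empty: live_tuples stays 0, hastup false */

    res->hastup = true;
    for (int off = 0; off < page->nitems; off++)
    {
        if (page->items[off] == TOY_LP_NORMAL)
            res->live_tuples++;
    }
}
```

In this toy model the loop already runs zero times on an empty page (as far as I can tell the same holds in PostgreSQL, where PageGetMaxOffsetNumber() returns 0 for an empty page), so the gain is mostly in making the empty case explicit; the thing to watch is that nothing placed after the early return is still needed for empty pages.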

0008 LGTM

I will continue with 0009 tomorrow.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
@ 2026-03-05 08:52         ` Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Chao Li @ 2026-03-05 08:52 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Mar 4, 2026, at 16:59, Chao Li <[email protected]> wrote:
> 
> 
> 
>> On Mar 3, 2026, at 23:52, Melanie Plageman <[email protected]> wrote:
>> 
>> 
>>> Otherwise, if prstate->pagefrz.FreezePageConflictXid may still be InvalidTransactionId, then the Assert should be changed to something like:
>>> 
>>> Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
>>> TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin));
>> 
>> This is covered by TransactionIdPrecedesOrEquals because
>> InvalidTransactionId is 0. We assume that in many places throughout
>> the code.
>> 
> 
> I understand that TransactionIdPrecedesOrEquals(InvalidTransactionId, prstate->cutoffs->OldestXmin) is true, but relying on that may give code readers the impression that prstate->pagefrz.FreezePageConflictXid cannot be InvalidTransactionId. My version makes it explicit that prstate->pagefrz.FreezePageConflictXid may be InvalidTransactionId at this point.
> 
> 
>>> I will continue with 0005 tomorrow.
>> 
> 
> 4 - 0005
> ```
>  * Caller must have pin on the buffer, and must *not* have a lock on it.
>  */
> void
> -heap_page_prune_opt(Relation relation, Buffer buffer)
> +heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
> ```
> 
> I don’t see why vmbuffer has to be a pointer. Buffer is just an int underneath, and from what I checked in the last commit, vmbuffer only passes data into the function and never passes anything out.
> 
> Since we are adding the new parameter vmbuffer, even though it’s not used in this commit, I think it’d be better to update the header comment to explain what this parameter will do.
> 
> 5  - 0006
> ```
> + *
> + * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
> + * page, but it does not need to be done in a critical section because
> + * clearing the VM is not WAL-logged.
> + */
> +static void
> +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
> ```
> 
> Nit: why does the last paragraph of the header comment use the function name instead of “this function”? It looks like a copy-pasto.
> 
> 6 - 0006
> ```
> + if (prstate->lpdead_items > 0)
> + {
> + ereport(WARNING,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("LP_DEAD item found on page marked as all-visible"),
> + errdetail("relation \"%s\", page %u, tuple %u",
> +   RelationGetRelationName(prstate->relation),
> +   prstate->block, offnum)));
> + }
> + else
> + {
> + ereport(WARNING,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("tuple not visible to all found on page marked as all-visible"),
> + errdetail("relation \"%s\", page %u, tuple %u",
> +   RelationGetRelationName(prstate->relation),
> +   prstate->block, offnum)));
> + }
> ```
> 
> I recently learned that a detail message should use complete sentences, end each sentence with a period, and capitalize the first word of each sentence. See https://www.postgresql.org/docs/current/error-style-guide.html.
> 
> 7 - 0006
> ```
> + else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
> + {
> + /*
> + * As of PostgreSQL 9.2, the visibility map bit should never be set if
> + * the page-level bit is clear.  However, it's possible that the bit
> + * got cleared after heap_vac_scan_next_block() was called, so we must
> + * recheck with buffer lock before concluding that the VM is corrupt.
> + */
> + ereport(WARNING,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
> + prstate->block,
> + RelationGetRelationName(prstate->relation))));
> + }
> ```
> 
> The comment says “we must recheck with buffer lock before…”, but the code only logs a warning. Is the comment stale?
> 
> 8 - 0007
> ```
> +static void
> +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
> +{
> + OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
> + Page page = prstate->page;
> +
> + Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
> +   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
> + !prstate->attempt_freeze));
> +
> + /* We'll fill in presult for the caller */
> + memset(presult, 0, sizeof(PruneFreezeResult));
> +
> + /*
> + * Since the page is all-visible, a count of the normal ItemIds on the
> + * page should be sufficient for vacuum's live tuple count.
> + */
> + for (OffsetNumber off = FirstOffsetNumber;
> + off <= maxoff;
> + off = OffsetNumberNext(off))
> + {
> + if (ItemIdIsNormal(PageGetItemId(page, off)))
> + prstate->live_tuples++;
> + }
> +
> + presult->live_tuples = prstate->live_tuples;
> +
> + /* Clear any stale prune hint */
> + if (TransactionIdIsValid(PageGetPruneXid(page)))
> + {
> + PageClearPrunable(page);
> + MarkBufferDirtyHint(prstate->buffer, true);
> + }
> +
> + presult->vmbits = prstate->vmbits;
> +
> + if (!PageIsEmpty(page))
> + presult->hastup = true;
> +}
> ```
> 
> * Given that this function already checks PageIsEmpty(page), when that is true we don’t need to count live_tuples, right? That could be a tiny optimization.
> * I see heap_page_bypass_prune_freeze() is only called in one place, immediately after prune_freeze_setup() and heap_fix_vm_corruption(), so prstate->vmbits must be 0. Do we then need presult->vmbits = prstate->vmbits;?
> * Do we need to set all_visible and all_frozen in presult?
> 
> 0008 LGTM
> 
> I will continue with 0009 tomorrow.
> 

9 - 0009
```
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate.all_frozen to false,
```

Nit: prstate.all_frozen -> prstate.set_all_frozen

I see you fixed this in 0010, but I think it’s better to also fix it here.

10 - 0010
```
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
```

These 3 fields are counts rather than block numbers, so using type BlockNumber for them is quite confusing, even though BlockNumber is just uint32 underneath. I think they can simply be int.

11 - 0010
```
+ BlockNumber new_all_visible_pages;
+ BlockNumber new_all_visible_frozen_pages;
+ BlockNumber new_all_frozen_pages;
```

I don’t see where these 3 fields are initialized. In lazy_scan_prune(), presult is defined as:
```
    PruneFreezeResult presult;
```
So, those fields will hold random values.
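Two common ways to avoid that, sketched with a hypothetical trimmed-down ToyPruneFreezeResult (not the real struct): zero the whole result in the callee before filling it in, as heap_page_bypass_prune_freeze() already does with memset(), or zero-initialize it at the declaration in the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t BlockNumber;   /* PostgreSQL typedefs BlockNumber as uint32 */

/* Hypothetical, trimmed stand-in for PruneFreezeResult. */
typedef struct ToyPruneFreezeResult
{
    int         live_tuples;
    BlockNumber new_all_visible_pages;
    BlockNumber new_all_visible_frozen_pages;
    BlockNumber new_all_frozen_pages;
} ToyPruneFreezeResult;

/*
 * Callee-side fix: zero every output field up front, so fields the callee
 * does not explicitly assign read as 0 instead of leftover stack contents.
 */
void
toy_prune_and_freeze(ToyPruneFreezeResult *presult, int live, int newly_visible)
{
    memset(presult, 0, sizeof(*presult));
    presult->live_tuples = live;
    if (newly_visible)
        presult->new_all_visible_pages++;
}
```

The caller-side form is ToyPruneFreezeResult presult = {0};, which C guarantees zero-initializes every member.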

12 - 0010
```
+	 * conflict would ahve been handled in reaction to the WAL record freezing
```

Nit: ahve -> have

0011 LGTM

13 - 0012 - bufmask.c
```
+	 * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+	 * for more details.
```

I can’t find a function named heap_xlog_prune_and_freeze().

14 - 0012 - heapam_xlog.c
```
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_and_freeze()).
```

Same as 13.

0013 LGTM

I will try to finish the remaining 5 commits tomorrow.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
@ 2026-03-06 02:40           ` Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Chao Li @ 2026-03-06 02:40 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>



> On Mar 5, 2026, at 16:52, Chao Li <[email protected]> wrote:
> 
> 
> 
>> On Mar 4, 2026, at 16:59, Chao Li <[email protected]> wrote:
>> 
>> 
>> 
>>> On Mar 3, 2026, at 23:52, Melanie Plageman <[email protected]> wrote:
>>> 
>>> 
>>>> Otherwise, if prstate->pagefrz.FreezePageConflictXid may still be InvalidTransactionId, then the Assert should be changed to something like:
>>>> 
>>>> Assert(prstate->pagefrz.FreezePageConflictXid == InvalidTransactionId ||
>>>> TransactionIdPrecedesOrEquals(prstate->pagefrz.FreezePageConflictXid, prstate->cutoffs->OldestXmin));
>>> 
>>> This is covered by TransactionIdPrecedesOrEquals because
>>> InvalidTransactionId is 0. We assume that in many places throughout
>>> the code.
>>> 
>> 
>> I understand that TransactionIdPrecedesOrEquals(InvalidTransactionId, prstate->cutoffs->OldestXmin) is true, but relying on that may give code readers the impression that prstate->pagefrz.FreezePageConflictXid cannot be InvalidTransactionId. My version makes it explicit that prstate->pagefrz.FreezePageConflictXid may be InvalidTransactionId at this point.
>> 
>> 
>>>> I will continue with 0005 tomorrow.
>>> 
>> 
>> 4 - 0005
>> ```
>> * Caller must have pin on the buffer, and must *not* have a lock on it.
>> */
>> void
>> -heap_page_prune_opt(Relation relation, Buffer buffer)
>> +heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
>> ```
>> 
>> I don’t see why vmbuffer has to be a pointer. Buffer is just an int underneath, and from what I checked in the last commit, vmbuffer only passes data into the function and never passes anything out.
>> 
>> Since we are adding the new parameter vmbuffer, even though it’s not used in this commit, I think it’d be better to update the header comment to explain what this parameter will do.
>> 
>> 5  - 0006
>> ```
>> + *
>> + * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
>> + * page, but it does not need to be done in a critical section because
>> + * clearing the VM is not WAL-logged.
>> + */
>> +static void
>> +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
>> ```
>> 
>> Nit: why does the last paragraph of the header comment use the function name instead of “this function”? It looks like a copy-pasto.
>> 
>> 6 - 0006
>> ```
>> + if (prstate->lpdead_items > 0)
>> + {
>> + ereport(WARNING,
>> + (errcode(ERRCODE_DATA_CORRUPTED),
>> + errmsg("LP_DEAD item found on page marked as all-visible"),
>> + errdetail("relation \"%s\", page %u, tuple %u",
>> +   RelationGetRelationName(prstate->relation),
>> +   prstate->block, offnum)));
>> + }
>> + else
>> + {
>> + ereport(WARNING,
>> + (errcode(ERRCODE_DATA_CORRUPTED),
>> + errmsg("tuple not visible to all found on page marked as all-visible"),
>> + errdetail("relation \"%s\", page %u, tuple %u",
>> +   RelationGetRelationName(prstate->relation),
>> +   prstate->block, offnum)));
>> + }
>> ```
>> 
>> I recently learned that a detail message should use complete sentences, end each sentence with a period, and capitalize the first word of each sentence. See https://www.postgresql.org/docs/current/error-style-guide.html.
>> 
>> 7 - 0006
>> ```
>> + else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
>> + {
>> + /*
>> + * As of PostgreSQL 9.2, the visibility map bit should never be set if
>> + * the page-level bit is clear.  However, it's possible that the bit
>> + * got cleared after heap_vac_scan_next_block() was called, so we must
>> + * recheck with buffer lock before concluding that the VM is corrupt.
>> + */
>> + ereport(WARNING,
>> + (errcode(ERRCODE_DATA_CORRUPTED),
>> + errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
>> + prstate->block,
>> + RelationGetRelationName(prstate->relation))));
>> + }
>> ```
>> 
>> The comment says “we must recheck with buffer lock before…”, but the code only logs a warning. Is the comment stale?
>> 
>> 8 - 0007
>> ```
>> +static void
>> +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
>> +{
>> + OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
>> + Page page = prstate->page;
>> +
>> + Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
>> +   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
>> + !prstate->attempt_freeze));
>> +
>> + /* We'll fill in presult for the caller */
>> + memset(presult, 0, sizeof(PruneFreezeResult));
>> +
>> + /*
>> + * Since the page is all-visible, a count of the normal ItemIds on the
>> + * page should be sufficient for vacuum's live tuple count.
>> + */
>> + for (OffsetNumber off = FirstOffsetNumber;
>> + off <= maxoff;
>> + off = OffsetNumberNext(off))
>> + {
>> + if (ItemIdIsNormal(PageGetItemId(page, off)))
>> + prstate->live_tuples++;
>> + }
>> +
>> + presult->live_tuples = prstate->live_tuples;
>> +
>> + /* Clear any stale prune hint */
>> + if (TransactionIdIsValid(PageGetPruneXid(page)))
>> + {
>> + PageClearPrunable(page);
>> + MarkBufferDirtyHint(prstate->buffer, true);
>> + }
>> +
>> + presult->vmbits = prstate->vmbits;
>> +
>> + if (!PageIsEmpty(page))
>> + presult->hastup = true;
>> +}
>> ```
>> 
>> * Given that this function already checks PageIsEmpty(page), when that is true we don’t need to count live_tuples, right? That could be a tiny optimization.
>> * I see heap_page_bypass_prune_freeze() is only called in one place, immediately after prune_freeze_setup() and heap_fix_vm_corruption(), so prstate->vmbits must be 0. Do we then need presult->vmbits = prstate->vmbits;?
>> * Do we need to set all_visible and all_frozen in presult?
>> 
>> 0008 LGTM
>> 
>> I will continue with 0009 tomorrow.
>> 
> 
> 9 - 0009
> ```
> +  * Currently, only VACUUM performs freezing, but other callers may in the
> +  * future. Other callers must initialize prstate.all_frozen to false,
> ```
> 
> Nit: prstate.all_frozen -> prstate.set_all_frozen
> 
> I see you fixed this in 0010, but I think it’s better to also fix it here.
> 
> 10 - 0010
> ```
> +  * Whether or not the page was newly set all-visible and all-frozen during
> +  * phase I of vacuuming.
>  */
> - uint8 vmbits;
> + BlockNumber new_all_visible_pages;
> + BlockNumber new_all_visible_frozen_pages;
> + BlockNumber new_all_frozen_pages;
> ```
> 
> These 3 fields are counts rather than block numbers, so using type BlockNumber for them is quite confusing, even though BlockNumber is just uint32 underneath. I think they can simply be int.
> 
> 11 - 0010
> ```
> + BlockNumber new_all_visible_pages;
> + BlockNumber new_all_visible_frozen_pages;
> + BlockNumber new_all_frozen_pages;
> ```
> 
> I don’t see where these 3 fields are initialized. In lazy_scan_prune(), presult is defined as:
> ```
>    PruneFreezeResult presult;
> ```
> So, those fields will hold random values.
> 
> 12 - 0010
> ```
> +  * conflict would ahve been handled in reaction to the WAL record freezing
> ```
> 
> Nit: ahve -> have
> 
> 0011 LGTM
> 
> 13 - 0012 - bufmask.c
> ```
> +  * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
> +  * for more details.
> ```
> 
> I can’t find a function named heap_xlog_prune_and_freeze().
> 
> 14 - 0012 - heapam_xlog.c
> ```
> +  * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
> +  * heap_xlog_prune_and_freeze()).
> ```
> 
> Same as 13.
> 
> 0013 LGTM
> 
> I will try to finish the remaining 5 commits tomorrow.
> 

15 - 0014 - execMain.c
```
@@ -3027,6 +3035,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_range_table_size = parentestate->es_range_table_size;
 	rcestate->es_relations = parentestate->es_relations;
 	rcestate->es_rowmarks = parentestate->es_rowmarks;
+	rcestate->es_modified_relids = parentestate->es_modified_relids;
```

Here it just assigns the BMS pointer to rcestate->es_modified_relids. I am not sure whether further bms_add_member() calls will happen; if so, it might be safer to use bms_copy(parentestate->es_modified_relids), because a later bms_add_member() may allocate new memory and leave the old pointer stale.
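A toy illustration of the hazard. ToySet, toy_add_member() and toy_copy() are hypothetical stand-ins that assume only that adding a member may return a different allocation, as bms_add_member() may; this is not the Bitmapset implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BITS_PER_WORD ((int) (8 * sizeof(unsigned long)))

/* Hypothetical toy bitmapset; not PostgreSQL's Bitmapset. */
typedef struct ToySet
{
    int           nwords;
    unsigned long words[];      /* flexible array member */
} ToySet;

int toy_regrow_count = 0;       /* times add_member replaced the storage */

static ToySet *
toy_alloc(int nwords)
{
    ToySet *s = calloc(1, sizeof(ToySet) + nwords * sizeof(unsigned long));

    assert(s != NULL);
    s->nwords = nwords;
    return s;
}

/* Like bms_add_member(): the result may live at a different address. */
ToySet *
toy_add_member(ToySet *s, int x)
{
    int wordnum = x / BITS_PER_WORD;

    if (s == NULL || wordnum >= s->nwords)
    {
        ToySet *grown = toy_alloc(wordnum + 1);

        if (s != NULL)
        {
            memcpy(grown->words, s->words, s->nwords * sizeof(unsigned long));
            free(s);            /* any other pointer to s now dangles */
            toy_regrow_count++;
        }
        s = grown;
    }
    s->words[wordnum] |= 1UL << (x % BITS_PER_WORD);
    return s;
}

/* Like bms_copy(): an independent copy the child state can keep. */
ToySet *
toy_copy(const ToySet *s)
{
    ToySet *c;

    if (s == NULL)
        return NULL;
    c = toy_alloc(s->nwords);
    memcpy(c->words, s->words, s->nwords * sizeof(unsigned long));
    return c;
}

bool
toy_is_member(const ToySet *s, int x)
{
    int wordnum = x / BITS_PER_WORD;

    return s != NULL && wordnum < s->nwords &&
        ((s->words[wordnum] >> (x % BITS_PER_WORD)) & 1UL);
}
```

With a plain pointer assignment, the child would be left holding freed storage once the parent's set grows; the copy stays valid and independent of later additions.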

16 - 0014 - execUtils.c
```
for (rti = 1; rti <= estate->es_range_table_size; rti++)
```

Nit: I have seen several recent cleanup commits that switch to declaring the loop variable inside the for statement, like:
```
for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
```

17 - 0015

The commit message subject line says “Make begin_scan() functions take a flags argument”, but begin_scan() seems inaccurate; for example, table_index_fetch_begin() is not a “begin scan” function.

Otherwise 0015 LGTM.

18 - 0016 - tableam.h
```
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
```

Nit: maybe add an empty line before the new flag.

19 - 0017 - heapam_handler.c
```
@@ -147,7 +147,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								hscan->modifies_base_rel);
```

This looks like a bug. The new heap_page_prune_opt() parameter is rel_read_only, but hscan->modifies_base_rel means the relation is not read-only, so here we should pass “!hscan->modifies_base_rel”.

Oh, rereading your previous email, I see you had already found this bug.

20 - 0018
In heap_insert(), you do:
```
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
```

But in heap_multi_insert(), you do:
```
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
```

Is the option check “!(options & HEAP_INSERT_FROZEN)” also needed in heap_multi_insert()?

~~ Done with this round of review ~~

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
@ 2026-03-06 23:33             ` Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-06 23:33 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Thanks for the review! Attached is v36. I've pushed some of the early
patches in the set and this is what is left. I've also done some of
the performance evaluation and microbenchmarking of the "worst case"
scenario promised in my earlier reply to Andres [1].

I used the following to test the worst-case performance of my patch:

pgbench -n -r -t 9 -f - <<'SQL'
checkpoint;
DROP TABLE IF EXISTS foo;
CREATE TABLE foo(a int, cnt int, data text) WITH (autovacuum_enabled =
false, fillfactor = 10);
ALTER TABLE foo ALTER COLUMN data SET STORAGE PLAIN;
CREATE INDEX ON foo(a);
INSERT INTO foo
SELECT i, 1, repeat(' ', 8192/10)
FROM generate_series(1,100000) i;
vacuum (freeze) foo;
update foo set cnt = cnt + 1;
select * from foo offset 100000000;
update foo set cnt = cnt + 1;
SQL

What I see is an expected slowdown for the SELECT * FROM foo OFFSET,
because it emits slightly more WAL and pins and dirties a few more
buffers, and a slight slowdown for the UPDATE following the SELECT,
because it then must clear those VM bits. (This is no different than
if you had run a vacuum before doing the update.)

These slowdowns are expected since this microbenchmark is designed to
be a worst case. Every buffer has a single tuple and the SELECT needs
to access no tuples because of the OFFSET. This minimizes all other
overheads to magnify the overhead of setting and clearing the VM.

I also tested whether unconditionally pinning the VM, even when we don't
set it, had any impact on the performance of on-access pruning for logged
tables. I used the setup above but patched the code to not set the VM
on-access. I found that there is no negative performance impact on the
SELECT * OFFSET. If foo is an unlogged table, I do see a very slight
overhead for the SELECT * OFFSET.

And in all cases, with the patch, the vacuum above is faster because
of using the combined WAL record.

I believe I've addressed all of your review feedback. Below are
combined inline remarks to all three of your emails:

On Wed, Mar 4, 2026 at 4:00 AM Chao Li <[email protected]> wrote:
>
> ```
>   * Caller must have pin on the buffer, and must *not* have a lock on it.
>   */
>  void
> -heap_page_prune_opt(Relation relation, Buffer buffer)
> +heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
> ```
>
> I don’t see why vmbuffer has to be a pointer. Buffer is just an int underneath, and from what I checked in the last commit, vmbuffer only passes data into the function and never passes anything out.

We want to save the vmbuffer in the scan descriptor so we can use it
across calls to heap_page_prune_opt(). Therefore we have to pass it by
reference. We pin the VM in heap_page_prune_opt() and if we don't save
a reference to it, we'll have to pin it again on the next call (see
visibilitymap_pin() code).
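A rough sketch of why the pointer is needed, modeled loosely on visibilitymap_pin(). Buffer reads and pins are simulated by hypothetical toy_* helpers, and the 32672 heap-blocks-per-VM-page constant assumes 8kB pages with two VM bits per heap block; none of this is the real buffer manager:

```c
#include <assert.h>
#include <stdint.h>

typedef int Buffer;             /* PostgreSQL's Buffer is an int */
#define InvalidBuffer 0

/* (8192 - 24) payload bytes * 4 heap blocks per byte; illustrative only. */
#define TOY_HEAPBLOCKS_PER_VM_PAGE 32672u

int toy_pin_count = 0;          /* counts simulated VM page reads + pins */

/* Hypothetical stand-ins for reading, identifying and releasing a buffer. */
static Buffer toy_read_vm_page(uint32_t vm_block) { toy_pin_count++; return (Buffer) vm_block + 1; }
static uint32_t toy_buffer_block(Buffer buf) { return (uint32_t) (buf - 1); }
static void toy_release(Buffer buf) { (void) buf; }

/*
 * Keep the caller's pin if it already covers the needed VM page, otherwise
 * swap it.  Writing through the pointer is what lets the pin survive until
 * the caller's next call.
 */
void
toy_vm_pin(uint32_t heap_block, Buffer *vmbuffer)
{
    uint32_t vm_block = heap_block / TOY_HEAPBLOCKS_PER_VM_PAGE;

    if (*vmbuffer != InvalidBuffer)
    {
        if (toy_buffer_block(*vmbuffer) == vm_block)
            return;             /* already pinned: no new read */
        toy_release(*vmbuffer);
    }
    *vmbuffer = toy_read_vm_page(vm_block);
}
```

With Buffer passed by value, the callee could not hand the pinned buffer back to the scan descriptor, so every call would have to pin again.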

> Since we are adding the new parameter vmbuffer, even though it’s not used in this commit, I think it’d be better to update the header comment to explain what this parameter will do.

Thanks, I've updated the header comment.

> + * heap_fix_vm_corruption() makes changes to the VM and, potentially, the heap
> + * page, but it does not need to be done in a critical section because
> + * clearing the VM is not WAL-logged.
> + */
> +static void
> +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
>
> Nit: why does the last paragraph of the header comment use the function name instead of “this function”? It looks like a copy-pasto.

Fixed.

> 6 - 0006
> ```
> +               if (prstate->lpdead_items > 0)
> +               {
> +                       ereport(WARNING,
> +                                       (errcode(ERRCODE_DATA_CORRUPTED),
> +                                        errmsg("LP_DEAD item found on page marked as all-visible"),
> +                                        errdetail("relation \"%s\", page %u, tuple %u",
> +                                                          RelationGetRelationName(prstate->relation),
> +                                                          prstate->block, offnum)));
> +               }
> +               else
> +               {
> +                       ereport(WARNING,
> +                                       (errcode(ERRCODE_DATA_CORRUPTED),
> +                                        errmsg("tuple not visible to all found on page marked as all-visible"),
> +                                        errdetail("relation \"%s\", page %u, tuple %u",
> +                                                          RelationGetRelationName(prstate->relation),
> +                                                          prstate->block, offnum)));
> +               }
> ```
>
> I recently learned that a detail message should use complete sentences, end each sentence with a period, and capitalize the first word of each sentence. See https://www.postgresql.org/docs/current/error-style-guide.html.

Ah thanks for noticing. I've gone ahead and changed them to errcontext
instead of errdetail. I think the messages are more compliant now.

> +       else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
> +       {
> +               /*
> +                * As of PostgreSQL 9.2, the visibility map bit should never be set if
> +                * the page-level bit is clear.  However, it's possible that the bit
> +                * got cleared after heap_vac_scan_next_block() was called, so we must
> +                * recheck with buffer lock before concluding that the VM is corrupt.
> +                */
> +               ereport(WARNING,
> +                               (errcode(ERRCODE_DATA_CORRUPTED),
> +                                errmsg("page %u in \"%s\" is not marked all-visible but visibility map bit is set",
> +                                               prstate->block,
> +                                               RelationGetRelationName(prstate->relation))));
> +       }
>
> The comment says “we must recheck with buffer lock before…”, but the code only logs a warning. Is the comment stale?

We have the buffer lock here. The comment means that we need to check
now, while we hold the buffer lock, because when we checked in
heap_vac_scan_next_block() we did not. I've updated the comment to try
to make that clearer.
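A toy model of that reasoning (ToyHeapBlock and the toy_* functions are hypothetical, not PostgreSQL code). It assumes, for illustration, that a concurrent writer clears the page-level bit and the VM bit together while holding the lock, and that the VM bits being tested are read while the lock is held:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VM_ALL_VISIBLE 0x01

/* Hypothetical toy state: page-level PD_ALL_VISIBLE plus the page's VM bits. */
typedef struct ToyHeapBlock
{
    bool    pd_all_visible;
    uint8_t vm_bits;
    bool    locked;             /* stands in for holding the buffer lock */
} ToyHeapBlock;

/* Toy rule: a writer clears both bits together under the lock. */
void
toy_concurrent_update(ToyHeapBlock *blk)
{
    blk->pd_all_visible = false;
    blk->vm_bits = 0;
}

/*
 * Only conclude "corrupt" from state read while the lock is held; a VM
 * reading taken earlier without the lock may already be stale.
 */
bool
toy_vm_corrupt(const ToyHeapBlock *blk)
{
    assert(blk->locked);
    return !blk->pd_all_visible && (blk->vm_bits & TOY_VM_ALL_VISIBLE) != 0;
}
```

Judging corruption from the unlocked reading would raise a false WARNING for a block whose bits were legitimately cleared in between.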

> +static void
> +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
> +{
>
> * Given that this function already checks PageIsEmpty(page), when that is true we don’t need to count live_tuples, right? That could be a tiny optimization.

Okay, I've tried this. I didn't want a lot of indentation, so I
reorganized the code. I'm not sure if it is more error-prone now,
though...

> * I see heap_page_bypass_prune_freeze() is only called in one place, immediately after prune_freeze_setup() and heap_fix_vm_corruption(), so prstate->vmbits must be 0. Do we then need presult->vmbits = prstate->vmbits;?

Actually vmbits can't be zero, otherwise we won't reach the fast path
code. Or do you mean something else?

> * Do we need to set all_visible and all_frozen in presult?

I memset to 0 the other fields, so it isn't needed.

On Thu, Mar 5, 2026 at 3:53 AM Chao Li <[email protected]> wrote:
>
> 9 - 0009
> ```
> +        * Currently, only VACUUM performs freezing, but other callers may in the
> +        * future. Other callers must initialize prstate.all_frozen to false,
> ```
>
> Nit: prstate.all_frozen -> prstate.set_all_frozen
>
> I see you fixed this in 0010, but I think it’s better to also fix it here.

Done.

> 10 - 0010
> ```
> +        * Whether or not the page was newly set all-visible and all-frozen during
> +        * phase I of vacuuming.
>          */
> -       uint8           vmbits;
> +       BlockNumber new_all_visible_pages;
> +       BlockNumber new_all_visible_frozen_pages;
> +       BlockNumber new_all_frozen_pages;
> ```
>
> These 3 fields are counts rather than block numbers, so using type BlockNumber for them is quite confusing, even though BlockNumber is just uint32 underneath. I think they can simply be int.

Covered in [2].

> + BlockNumber new_all_visible_pages;
> + BlockNumber new_all_visible_frozen_pages;
> + BlockNumber new_all_frozen_pages;
>
> I don’t see where these 3 fields are initialized. In lazy_scan_prune(), presult is defined as:
>     PruneFreezeResult presult;
> So, those fields will hold random values.

Yes, thank you. I've fixed that.

> 13 - 0012 - bufmask.c
> ```
> +        * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
> +        * for more details.
> ```
>
> I can’t find a function named heap_xlog_prune_and_freeze().

Fixed in both places (-> heap_xlog_prune_freeze()).

On Thu, Mar 5, 2026 at 9:41 PM Chao Li <[email protected]> wrote:
>
> 15 - 0014 - execMain.c
> ```
> @@ -3027,6 +3035,7 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
>         rcestate->es_range_table_size = parentestate->es_range_table_size;
>         rcestate->es_relations = parentestate->es_relations;
>         rcestate->es_rowmarks = parentestate->es_rowmarks;
> +       rcestate->es_modified_relids = parentestate->es_modified_relids;
> ```
>
> Here it just assigns the BMS pointer to rcestate->es_modified_relids. I am not sure whether further bms_add_member() calls will happen; if so, it might be safer to use bms_copy(parentestate->es_modified_relids), because a later bms_add_member() may allocate new memory and leave the old pointer stale.

Yes, it's at least a bit of future-proofing. Done in v36.

> 16 - 0014 - execUtils.c
> for (rti = 1; rti <= estate->es_range_table_size; rti++)
>
> Nit: I have seen several recent cleanup commits that switch to declaring the loop variable inside the for statement, like:
> for (Index rti = 1; rti <= estate->es_range_table_size; rti++)

Updated.

> 17 - 0015
>
> The commit message subject line says “Make begin_scan() functions take a flags argument”, but begin_scan() seems inaccurate; for example, table_index_fetch_begin() is not a “begin scan” function.
>
> Otherwise 0015 LGTM.

I've rewritten the commit message.

> 20 - 0018
> In heap_insert(), you do:
> +       if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
> +               PageSetPrunable(page, xid);
>
> But in heap_multi_insert(), you do:
> +               if (!all_frozen_set && TransactionIdIsNormal(xid))
> +                       PageSetPrunable(page, xid);
>
> Is the option check " !(options & HEAP_INSERT_FROZEN))” also needed by heap_multi_insert?

heap_multi_insert() incorporates that into the variable
all_frozen_set, so it is not needed.

I've now also added setting the prune hint for the new page on updates,
which I had forgotten before.

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_a1V7TUUYM7qO2c5Z-JyTKOsrryQBrk7Eu69ESzhqgd9w%40mail.gma...
[2] https://www.postgresql.org/message-id/flat/CA%2BFpmFdrM%3DL5f%3De7%2BwqOkFkYK6r_S%3DTdKrHQ5qPbTNaoVG...


Attachments:

  [text/x-patch] v36-0001-Use-the-newest-to-be-frozen-xid-as-the-conflict-.patch (6.5K, 2-v36-0001-Use-the-newest-to-be-frozen-xid-as-the-conflict-.patch)
From 6fe999048e0c3d5b268e5b34fb1af8a4621d24fe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 11:39:28 -0500
Subject: [PATCH v36 01/16] Use the newest to-be-frozen xid as the conflict
 horizon for freezing

Previously WAL records that froze tuples used OldestXmin as the snapshot
conflict horizon. However, OldestXmin is newer than the newest frozen
tuple's xid. By tracking the newest to-be-frozen xid and using it as the
snapshot conflict horizon instead, we end up with an older horizon that
will result in fewer query cancellations on the standby.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Peter Geoghegan <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/CAAKRu_bbaUV8OUjAfVa_iALgKnTSfB4gO3jnkfpcFgrxEpSGJQ%40mail.gmail.com
---
 src/backend/access/heap/heapam.c    | 12 ++++++++++
 src/backend/access/heap/pruneheap.c | 34 +++++++++--------------------
 src/include/access/heapam.h         |  8 +++++++
 3 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a231563f0df..649ee6e7669 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6781,6 +6781,10 @@ heap_inplace_unlock(Relation relation,
  * NB: Caller should avoid needlessly calling heap_tuple_should_freeze when we
  * have already forced page-level freezing, since that might incur the same
  * SLRU buffer misses that we specifically intended to avoid by freezing.
+ *
+ * We won't update the FreezePageConflictXid because any lockers don't affect
+ * visibility on the standby, and we don't have to worry about the update XID
+ * since the only way it can be older than OldestXmin is if it is aborted.
  */
 static TransactionId
 FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
@@ -7173,7 +7177,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 
 		/* Verify that xmin committed if and when freeze plan is executed */
 		if (freeze_xmin)
+		{
 			frz->checkflags |= HEAP_FREEZE_CHECK_XMIN_COMMITTED;
+			if (TransactionIdFollows(xid, pagefrz->FreezePageConflictXid))
+				pagefrz->FreezePageConflictXid = xid;
+		}
 	}
 
 	/*
@@ -7192,6 +7200,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 		 */
 		replace_xvac = pagefrz->freeze_required = true;
 
+		if (TransactionIdFollows(xid, pagefrz->FreezePageConflictXid))
+			pagefrz->FreezePageConflictXid = xid;
+
 		/* Will set replace_xvac flags in freeze plan below */
 	}
 
@@ -7501,6 +7512,7 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 	pagefrz.freeze_required = true;
 	pagefrz.FreezePageRelfrozenXid = FreezeLimit;
 	pagefrz.FreezePageRelminMxid = MultiXactCutoff;
+	pagefrz.FreezePageConflictXid = InvalidTransactionId;
 	pagefrz.NoFreezePageRelfrozenXid = FreezeLimit;
 	pagefrz.NoFreezePageRelminMxid = MultiXactCutoff;
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 65c9f393f41..eebd6cf57ea 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -377,6 +377,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	/* initialize page freezing working state */
 	prstate->pagefrz.freeze_required = false;
+	prstate->pagefrz.FreezePageConflictXid = InvalidTransactionId;
 	if (prstate->attempt_freeze)
 	{
 		Assert(new_relfrozen_xid && new_relmin_mxid);
@@ -407,7 +408,6 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * PruneState.
 	 */
 	prstate->deadoffsets = presult->deadoffsets;
-	prstate->frz_conflict_horizon = InvalidTransactionId;
 
 	/*
 	 * Vacuum may update the VM after we're done.  We can keep track of
@@ -746,22 +746,8 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 		 * critical section.
 		 */
 		heap_pre_freeze_checks(prstate->buffer, prstate->frozen, prstate->nfrozen);
-
-		/*
-		 * Calculate what the snapshot conflict horizon should be for a record
-		 * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
-		 * for conflicts when the whole page is eligible to become all-frozen
-		 * in the VM once we're done with it. Otherwise, we generate a
-		 * conservative cutoff by stepping back from OldestXmin.
-		 */
-		if (prstate->set_all_frozen)
-			prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
-		else
-		{
-			/* Avoids false conflicts when hot_standby_feedback in use */
-			prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
-			TransactionIdRetreat(prstate->frz_conflict_horizon);
-		}
+		Assert(TransactionIdPrecedes(prstate->pagefrz.FreezePageConflictXid,
+									 prstate->cutoffs->OldestXmin));
 	}
 	else if (prstate->nfrozen > 0)
 	{
@@ -953,17 +939,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 * The snapshotConflictHorizon for the whole record should be the
 			 * most conservative of all the horizons calculated for any of the
 			 * possible modifications.  If this record will prune tuples, any
-			 * transactions on the standby older than the youngest xmax of the
-			 * most recently removed tuple this record will prune will
-			 * conflict.  If this record will freeze tuples, any transactions
-			 * on the standby with xids older than the youngest tuple this
-			 * record will freeze will conflict.
+			 * queries on the standby older than the youngest xid of the most
+			 * recently removed tuple this record will prune will conflict. If
+			 * this record will freeze tuples, any queries on the standby with
+			 * xids older than the youngest tuple this record will freeze will
+			 * conflict.
 			 */
 			TransactionId conflict_xid;
 
-			if (TransactionIdFollows(prstate.frz_conflict_horizon,
+			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
 									 prstate.latest_xid_removed))
-				conflict_xid = prstate.frz_conflict_horizon;
+				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
 			else
 				conflict_xid = prstate.latest_xid_removed;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 24a27cc043a..d083f825b39 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -208,6 +208,14 @@ typedef struct HeapPageFreeze
 	TransactionId FreezePageRelfrozenXid;
 	MultiXactId FreezePageRelminMxid;
 
+	/*
+	 * The youngest XID that will be frozen or removed during freezing. It is
+	 * used to calculate the snapshot conflict horizon for a WAL record
+	 * freezing tuples. Because it is only used if we do end up freezing
+	 * tuples, there is no need for a "no freeze" version.
+	 */
+	TransactionId FreezePageConflictXid;
+
 	/*
 	 * "No freeze" NewRelfrozenXid/NewRelminMxid trackers.
 	 *
-- 
2.43.0
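The tracking that 0001 adds to heap_prepare_freeze_tuple() amounts to keeping a running "newest xid" under wraparound-aware comparison. The standalone sketch below is illustrative only. sketch_xid_follows() is a local copy of the modulo-2^32 rule used by TransactionIdFollows(), and the other sketch_* helpers are hypothetical. It shows why initializing the tracker to InvalidTransactionId works (any normal xid compares as following it) and how the record's conflict xid is the newer of the freeze and prune horizons.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Local copy of the comparison rule used by TransactionIdFollows(). */
static bool
sketch_xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special xids (including Invalid) compare as plain unsigned values */
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 > id2;
	/* Normal xids compare modulo 2^32 */
	return (int32_t) (id1 - id2) > 0;
}

/* Remember the newest xid among the tuples we plan to freeze. */
static void
sketch_track_frozen_xid(TransactionId *conflict_xid, TransactionId xid)
{
	if (sketch_xid_follows(xid, *conflict_xid))
		*conflict_xid = xid;
}

/* The WAL record's conflict xid is the newer of the freeze and prune xids. */
static TransactionId
sketch_record_conflict_xid(TransactionId freeze_conflict_xid,
						   TransactionId latest_xid_removed)
{
	if (sketch_xid_follows(freeze_conflict_xid, latest_xid_removed))
		return freeze_conflict_xid;
	return latest_xid_removed;
}
```

Because the tracked value is an xid that actually appeared on the page, it is generally older than OldestXmin, which is what lets the standby cancel fewer queries.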



  [text/x-patch] v36-0002-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch (6.2K, 3-v36-0002-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch)
From 9feb39bc384053606879563e81c83920ab6c5568 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:23:57 -0500
Subject: [PATCH v36 02/16] Save vmbuffer in heap-specific scan descriptors for
 on-access pruning

Future commits will use the visibility map in on-access pruning to avoid
pruning when a page is all-visible, fix VM corruption, and set the VM if
the page is all-visible.

Saving the vmbuffer in the scan descriptor reduces the number of times
it would need to be pinned and unpinned, making the overhead of doing so
negligible.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam.c         | 12 +++++++++++-
 src/backend/access/heap/heapam_handler.c | 12 ++++++++++--
 src/backend/access/heap/pruneheap.c      |  6 +++++-
 src/include/access/heapam.h              | 19 ++++++++++++++++---
 4 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 649ee6e7669..54cd8d6a497 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1310,6 +1310,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1348,6 +1349,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1380,6 +1387,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3ff36f59bf8..47624194f93 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								&hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2533,7 +2541,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index eebd6cf57ea..8b5044567bf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -214,9 +214,13 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * This function may pin *vmbuffer. It's passed by reference so the caller can
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
+ * responsible for unpinning it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index d083f825b39..281cdd5ee59 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. The current heap block's
+	 * corresponding page in the visibility map.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +122,14 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/* Current heap block's corresponding page in the visibility map */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -418,7 +430,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
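To get a rough sense of why keeping the VM pin in the scan descriptor makes the cost negligible, here is a hypothetical model of a sequential scan. It assumes the default 8 kB BLCKSZ, a 24-byte MAXALIGNed page header, and 2 VM bits per heap block, which works out to 32672 heap blocks per VM page. sketch_vm_pin() follows the same idea as visibilitymap_pin(), which keeps an existing pin when it already covers the requested heap block; all sketch_* names are made up for illustration.

```c
#include <stdint.h>

typedef uint32_t BlockNumber;

#define SKETCH_INVALID_VM_BLOCK ((BlockNumber) 0xFFFFFFFF)

/* Assumed layout: 8 kB pages, 24-byte header, 4 heap blocks per VM byte. */
#define SKETCH_BLCKSZ				8192
#define SKETCH_PAGE_HEADER_SIZE		24
#define SKETCH_HEAPBLOCKS_PER_BYTE	4
#define SKETCH_HEAPBLOCKS_PER_VM_PAGE \
	((SKETCH_BLCKSZ - SKETCH_PAGE_HEADER_SIZE) * SKETCH_HEAPBLOCKS_PER_BYTE)

/* Hypothetical stand-in for a vmbuffer kept across calls. */
typedef struct SketchVmPin
{
	BlockNumber vm_block;		/* VM block currently pinned, if any */
	long		npins;			/* number of times we had to (re)pin */
} SketchVmPin;

/* Re-pin only when the heap block maps to a different VM block. */
static void
sketch_vm_pin(SketchVmPin *pin, BlockNumber heapblk)
{
	BlockNumber vm_block = heapblk / SKETCH_HEAPBLOCKS_PER_VM_PAGE;

	if (pin->vm_block == vm_block)
		return;					/* reuse the existing pin */
	/* a real caller would release the old pin here */
	pin->vm_block = vm_block;
	pin->npins++;
}

/* Count VM pins for a sequential scan, with or without a reused pin. */
static long
sketch_count_pins(BlockNumber nblocks, int keep_pin_in_scan)
{
	SketchVmPin scan_pin = {SKETCH_INVALID_VM_BLOCK, 0};
	long		total = 0;

	for (BlockNumber blk = 0; blk < nblocks; blk++)
	{
		if (keep_pin_in_scan)
			sketch_vm_pin(&scan_pin, blk);
		else
		{
			/* a fresh, unpinned buffer for every call */
			SketchVmPin local_pin = {SKETCH_INVALID_VM_BLOCK, 0};

			sketch_vm_pin(&local_pin, blk);
			total += local_pin.npins;
		}
	}
	return keep_pin_in_scan ? scan_pin.npins : total;
}
```

Under these assumptions, scanning 100000 heap blocks takes 4 VM pins when the pin is carried in the scan descriptor, versus one per block when it is not.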



  [text/x-patch] v36-0003-Fix-visibility-map-corruption-in-more-cases.patch (18.6K, 4-v36-0003-Fix-visibility-map-corruption-in-more-cases.patch)
From fb34ed4466dab85fb16c948fda3773f5a590014c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v36 03/16] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap when
compared to the cost of actually pruning. The vmbuffer is saved in the
scan descriptor, so a query should only need to pin each VM page once,
and a single VM page covers a large number of heap pages.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 176 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8b5044567bf..6eca1474a2f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -121,6 +121,21 @@ typedef struct
 	 */
 	TransactionId frz_conflict_horizon;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -175,6 +190,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -182,7 +198,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -216,8 +233,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -284,6 +302,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -291,14 +319,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -361,6 +382,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -777,6 +804,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page cannot have dead items nor can
+ * it have tuples that are not visible to all running transactions. It clears
+ * the VM corruption as well as resetting the vmbits used during pruning.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear. However, it's possible that the bit
+		 * got cleared after heap_vac_scan_next_block() was called, so we must
+		 * recheck now that we have the buffer lock before concluding that the
+		 * VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set"),
+				 errcontext("relation \"%s\", page %u",
+							relname, prstate->block)));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -837,6 +948,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -980,6 +1095,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1302,7 +1418,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1312,6 +1429,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1395,6 +1519,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1534,7 +1667,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1549,6 +1683,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1573,7 +1711,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1639,6 +1778,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..957322648ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -425,11 +425,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1964,81 +1959,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,6 +1990,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2179,18 +2100,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 281cdd5ee59..568358a060a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -258,6 +258,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -320,6 +326,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
+	 * pruning. It is cleared if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
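
The tail of 0003 above moves the LP_DEAD / PD_ALL_VISIBLE consistency check into pruning. Pruning now reports the possibly-cleared VM bits back through presult.vmbits, so lazy_scan_prune() uses them as old_vmbits instead of calling visibilitymap_get_status() itself. The following is a minimal, non-authoritative sketch of that hand-off. It rests on several assumptions:

- SketchPage, SketchResult and the sketch_* functions are hypothetical simplifications, not PostgreSQL APIs.
- Only the LP_DEAD branch shown in the hunk is modeled.
- Buffer dirtying and visibilitymap_clear() appear only as comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Same values as VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define SK_VM_ALL_VISIBLE	0x01
#define SK_VM_ALL_FROZEN	0x02

/* Hypothetical stand-in for the parts of a heap page the check reads. */
typedef struct SketchPage
{
	bool		pd_all_visible;	/* page-level PD_ALL_VISIBLE hint */
	int			nlpdead_items;	/* LP_DEAD line pointers on the page */
} SketchPage;

/* Hypothetical stand-in for PruneFreezeResult. */
typedef struct SketchResult
{
	uint8_t		vmbits;			/* VM bits at start of pruning, after repair */
} SketchResult;

/* Pruning side: repair the LP_DEAD inconsistency, then report the bits. */
static void
sketch_prune_check_vm(SketchPage *page, uint8_t vmbits, SketchResult *res)
{
	if (page->pd_all_visible && page->nlpdead_items > 0)
	{
		fprintf(stderr, "WARNING: page containing LP_DEAD items is marked as all-visible\n");
		page->pd_all_visible = false;	/* real code: PageClearAllVisible() */
		vmbits = 0;		/* real code: visibilitymap_clear(..., VALID_BITS) */
	}
	res->vmbits = vmbits;
}

/*
 * Caller side: reuse res->vmbits as old_vmbits instead of reading the VM
 * again. Returns the bits to set; *newly_all_visible reports whether the
 * page was not already all-visible in the VM.
 */
static uint8_t
sketch_new_vmbits(const SketchResult *res, bool set_all_visible,
				  bool set_all_frozen, bool *newly_all_visible)
{
	uint8_t		new_vmbits;

	/* Mirrors Assert(!presult.set_all_frozen || presult.set_all_visible) */
	assert(!set_all_frozen || set_all_visible);

	*newly_all_visible = false;
	if (!set_all_visible)
		return 0;				/* nothing to set */

	new_vmbits = SK_VM_ALL_VISIBLE;
	if (set_all_frozen)
		new_vmbits |= SK_VM_ALL_FROZEN;
	*newly_all_visible = (res->vmbits & SK_VM_ALL_VISIBLE) == 0;
	return new_vmbits;
}
```

The `newly_all_visible` flag corresponds to the `(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0` test that drives the new-all-visible page counters discussed earlier in the thread.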



  [text/x-patch] v36-0004-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (4.5K, 5-v36-0004-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From ed9509518d8b5a0772133d13e9714153dd526858 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v36 04/16] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work to do. To avoid that wasted effort, we exit early if a page
is already all-frozen, or if it is all-visible and no freezing will be
attempted.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 75 +++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6eca1474a2f..2cd684873c0 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -191,6 +191,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -889,6 +890,68 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->vmbits = prstate->vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	presult->hastup = true;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -952,6 +1015,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can exit early. Do this after fixing any
+	 * discrepancy between the page-level visibility hint and the VM.
+	 */
+	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
-- 
2.43.0
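
The fast path in 0004 boils down to two things. The first is a predicate on the VM bits and attempt_freeze. The second is a cheap pass that clears any stale prune hint and counts LP_NORMAL line pointers as live tuples. The sketch below isolates both under simplifying assumptions. The SketchLp enum and the sketch_* helpers are hypothetical. An empty page is modeled as "no line pointers", which is what PageIsEmpty() checks. MarkBufferDirtyHint() appears only as a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define SK_VM_ALL_VISIBLE	0x01
#define SK_VM_ALL_FROZEN	0x02
#define SK_INVALID_XID		((uint32_t) 0)

/* Hypothetical, simplified line pointer states (cf. LP_* in itemid.h). */
typedef enum SketchLp
{
	SK_LP_UNUSED,
	SK_LP_NORMAL,
	SK_LP_REDIRECT,
	SK_LP_DEAD
} SketchLp;

/* The predicate tested before calling heap_page_bypass_prune_freeze(). */
static bool
sketch_can_bypass(uint8_t vmbits, bool attempt_freeze)
{
	return (vmbits & SK_VM_ALL_FROZEN) != 0 ||
		((vmbits & SK_VM_ALL_VISIBLE) != 0 && !attempt_freeze);
}

/*
 * Bypass path: clear any stale prune hint, then count normal line pointers
 * as live tuples. An empty page (no line pointers) reports no tuples.
 */
static int
sketch_bypass(const SketchLp *lps, int nlps, uint8_t vmbits,
			  bool attempt_freeze, uint32_t *prune_xid, bool *hastup)
{
	int			live_tuples = 0;

	assert(sketch_can_bypass(vmbits, attempt_freeze));

	if (*prune_xid != SK_INVALID_XID)
		*prune_xid = SK_INVALID_XID;	/* real code also calls MarkBufferDirtyHint() */

	*hastup = false;
	if (nlps == 0)
		return 0;				/* PageIsEmpty() */

	*hastup = true;
	for (int i = 0; i < nlps; i++)
		if (lps[i] == SK_LP_NORMAL)
			live_tuples++;
	return live_tuples;
}
```

Redirect pointers can legitimately remain on an all-visible page, so they count toward hastup but not toward the live tuple count.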



  [text/x-patch] v36-0005-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.4K, 6-v36-0005-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 89a76ceedd251e74742452f1b6fa57653c7219b9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v36 05/16] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare cases where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page;
if it is globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 75ae268d753..aee88947393 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1060,6 +1060,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2cd684873c0..f7e9fd51ac9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1035,6 +1035,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1702,29 +1713,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 957322648ca..ad85e1e1738 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -461,13 +461,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2054,13 +2054,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2816,7 +2813,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3577,14 +3574,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3605,7 +3602,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3624,7 +3621,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3705,7 +3702,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3714,16 +3711,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3752,6 +3750,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 568358a060a..849ed82bcf2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -475,6 +475,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v36-0006-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.8K, 7-v36-0006-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From eedd45cba83b0ff220b03235bbb48af661e7dc92 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v36 06/16] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f7e9fd51ac9..0de14a468f6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -136,6 +136,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -167,11 +170,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -181,7 +179,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -442,53 +439,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate.set_all_frozen to false,
+	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze ? true : false;
 }
 
 /*
@@ -718,7 +697,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -973,9 +951,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1041,9 +1018,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1194,7 +1171,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1654,6 +1631,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1701,32 +1679,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ad85e1e1738..23402e7e26c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,7 +463,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -471,7 +471,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2789,7 +2789,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2815,14 +2815,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2863,7 +2863,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3576,7 +3576,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3584,7 +3584,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3607,7 +3607,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3625,7 +3625,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3635,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3724,9 +3724,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3756,8 +3756,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0

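The renamed output parameter is maintained by the loop in heap_page_would_be_all_visible() above. For each live tuple, the loop keeps the xmin that TransactionIdFollows() reports as newest, and it ignores the special (non-normal) XIDs.

The standalone sketch below restates that loop under stated assumptions. TransactionId, the reserved-XID macros and xid_follows() are simplified stand-ins written for illustration, not the PostgreSQL headers. sketch_newest_live_xid() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified stand-ins (assumptions) for PostgreSQL's TransactionId type and
 * macros: XIDs 0-2 are reserved (invalid, bootstrap, frozen).
 */
typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define XidIsNormal(xid)			((xid) >= FirstNormalTransactionId)

/*
 * Wraparound-aware "id1 is newer than id2", following the rule used by
 * TransactionIdFollows(): normal XIDs compare modulo 2^32, while special
 * XIDs compare as plain integers.
 */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (!XidIsNormal(id1) || !XidIsNormal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* Newest normal xmin among the live tuples' xmins, or InvalidTransactionId */
static TransactionId
sketch_newest_live_xid(const TransactionId *xmins, int nxmins)
{
	TransactionId newest = InvalidTransactionId;

	assert(nxmins >= 0);
	for (int i = 0; i < nxmins; i++)
	{
		/* Same test order as the patch: newer first, then normal */
		if (xid_follows(xmins[i], newest) && XidIsNormal(xmins[i]))
			newest = xmins[i];
	}
	return newest;
}
```

The wraparound case shows why a plain integer comparison would be wrong. After XID wraparound, 10 counts as newer than 4294967290. The frozen XID 2 is never reported.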


  [text/x-patch] v36-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (27.1K, 8-v36-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From a96825f9026c4e9d8c8f55633b0e6dcf6f83c156 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v36 07/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 325 ++++++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 107 +--------
 src/include/access/heapam.h          |  38 ++--
 3 files changed, 271 insertions(+), 199 deletions(-)

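With this patch, heap_page_will_set_vm() decides which VM bits to set. heap_page_prune_and_freeze() then reports through presult what kind of change it made.

The sketch below covers both decisions. It assumes the caller passes the relevant flags directly rather than a PruneState. sketch_new_vmbits(), sketch_count_new_bits() and SketchVmCounts are hypothetical names. Only the two bit values are taken from visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical per-page counters, mirroring presult->new_all_* */
typedef struct SketchVmCounts
{
	int			new_all_visible_pages;
	int			new_all_visible_frozen_pages;
	int			new_all_frozen_pages;
} SketchVmCounts;

/*
 * Follows heap_page_will_set_vm(): return the VM bits to set, or 0 if the VM
 * should be left alone. That happens for on-access pruning, when the page is
 * not all-visible, or when the bits are already in the desired state.
 */
static uint8_t
sketch_new_vmbits(bool on_access, bool set_all_visible, bool set_all_frozen,
				  uint8_t old_vmbits)
{
	uint8_t		new_vmbits;

	/* Same invariant the patch asserts before deciding */
	assert(!set_all_frozen || set_all_visible);

	if (on_access || !set_all_visible)
		return 0;

	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
	if (set_all_frozen)
		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;

	return new_vmbits == old_vmbits ? 0 : new_vmbits;
}

/* Classify a VM update for VACUUM's logging counters */
static SketchVmCounts
sketch_count_new_bits(uint8_t old_vmbits, uint8_t new_vmbits)
{
	SketchVmCounts c = {0, 0, 0};
	bool		sets_frozen = (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;

	if (new_vmbits == 0)
		return c;				/* VM not touched */

	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
	{
		c.new_all_visible_pages = 1;
		if (sets_frozen)
			c.new_all_visible_frozen_pages = 1;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && sets_frozen)
		c.new_all_frozen_pages = 1;

	return c;
}
```

Pages that are already all-frozen in the VM never reach this logic, because heap_page_prune_and_freeze() takes the bypass path for them.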
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 0de14a468f6..ec58f717c0b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -129,12 +144,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -164,21 +183,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -216,6 +220,12 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+									  uint8 old_vmbits, uint8 new_vmbits,
+									  TransactionId latest_xid_removed,
+									  TransactionId newest_frozen_xid,
+									  TransactionId newest_live_xid);
 
 
 /*
@@ -382,9 +392,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -783,6 +794,66 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+				 uint8 old_vmbits, uint8 new_vmbits,
+				 TransactionId latest_xid_removed,
+				 TransactionId newest_frozen_xid,
+				 TransactionId newest_live_xid)
+{
+	TransactionId conflict_xid = InvalidTransactionId;
+
+	/*
+	 * We can omit the snapshot conflict horizon if we are not pruning or
+	 * freezing any tuples and are setting an already all-visible page
+	 * all-frozen in the VM. In this case, all of the tuples on the page must
+	 * already be seen as frozen by all MVCC snapshots on the standby (any
+	 * conflict would have been handled in reaction to the WAL record freezing
+	 * those tuples).
+	 */
+	if (!do_prune &&
+		!do_freeze &&
+		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+		(new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+		return InvalidTransactionId;
+
+	/*
+	 * The snapshot conflict horizon for the whole record should be the most
+	 * conservative (newest) of all the horizons calculated for any of the
+	 * possible modifications. If this record will prune tuples, any queries
+	 * on the standby with xmin older than the youngest XID of the most
+	 * recently removed tuple this record will prune will conflict.  If this
+	 * record will freeze tuples, any queries on the standby with xmin older
+	 * than the youngest tuple this record will freeze will conflict.
+	 *
+	 * If we are setting the VM, the conflict horizon is almost always the
+	 * newest live XID, except in the situation described above.
+	 *
+	 * By picking the newest of all of those, we can ensure that all changes
+	 * in the record have been taken into account.
+	 */
+	if (do_set_vm)
+		conflict_xid = newest_live_xid;
+	if (do_freeze && TransactionIdFollows(newest_frozen_xid, conflict_xid))
+		conflict_xid = newest_frozen_xid;
+
+	/*
+	 * If we are removing tuples with a younger XID than our so far calculated
+	 * conflict_xid, we must use this as our horizon.
+	 */
+	if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+	{
+		Assert(do_prune);
+		conflict_xid = latest_xid_removed;
+	}
+
+	return conflict_xid;
+}
+
 /*
  * Helper to fix visibility-related corruption on a heap page and its
  * corresponding VM page. An all-visible page cannot have dead items nor can
@@ -847,7 +918,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -865,7 +936,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for the heap block using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -894,15 +1001,13 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->vmbits = prstate->vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -932,7 +1037,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -947,12 +1053,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed, and if the page
+ * is found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -980,15 +1084,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -997,8 +1103,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * is not being attempted, we can exit early. Do this after fixing any
 	 * discrepancy between the page-level visibility hint and the VM.
 	 */
-	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	if (prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
 	{
 		heap_page_bypass_prune_freeze(&prstate, presult);
 		return;
@@ -1068,6 +1174,25 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+									prstate.old_vmbits, prstate.new_vmbits,
+									prstate.latest_xid_removed,
+									prstate.pagefrz.FreezePageConflictXid,
+									prstate.newest_live_xid);
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1089,14 +1214,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1110,6 +1238,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1117,29 +1266,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications.  If this record will prune tuples, any
-			 * queries on the standby older than the youngest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the youngest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1149,33 +1281,70 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	/*
+	 * Report which VM bits this page newly had set. Zero all three counters
+	 * first so that the ones not assigned below are well-defined.
+	 */
+	presult->new_all_visible_pages = 0;
+	presult->new_all_visible_frozen_pages = 0;
+	presult->new_all_frozen_pages = 0;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23402e7e26c..6b5210d6393 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,13 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1996,8 +1989,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2038,29 +2029,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2081,6 +2049,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Capture whether the page was newly set all-frozen in the VM */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2094,71 +2070,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3572,7 +3483,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 849ed82bcf2..7ef4cbbfb1e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -260,7 +260,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -276,8 +277,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -311,26 +311,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Whether the page was newly set all-visible, newly set all-visible and
+	 * all-frozen, or newly set all-frozen in the VM during phase I of vacuum.
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -467,7 +453,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *newest_live_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0

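get_conflict_xid() replaces the inline choice between FreezePageConflictXid and latest_xid_removed with a single rule. The record's horizon is the newest of every horizon that applies to it. The one exception is an already all-visible page that is only gaining the all-frozen bit, which needs no horizon.

The sketch below restates that rule with simplified types. These are assumptions for illustration, not the PostgreSQL headers, and sketch_conflict_xid() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins (assumptions) for PostgreSQL types and constants */
typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Same comparison rule as TransactionIdFollows() */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

static TransactionId
sketch_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					uint8_t old_vmbits, uint8_t new_vmbits,
					TransactionId latest_xid_removed,
					TransactionId newest_frozen_xid,
					TransactionId newest_live_xid)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* All-visible page only gaining the all-frozen bit: no horizon needed */
	if (!do_prune && !do_freeze &&
		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
		(new_vmbits & VISIBILITYMAP_ALL_FROZEN))
		return InvalidTransactionId;

	/* Otherwise keep the newest of the applicable horizons */
	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && xid_follows(newest_frozen_xid, conflict_xid))
		conflict_xid = newest_frozen_xid;
	if (xid_follows(latest_xid_removed, conflict_xid))
	{
		assert(do_prune);		/* only pruning removes tuples */
		conflict_xid = latest_xid_removed;
	}
	return conflict_xid;
}
```

Consider a call that neither prunes nor freezes, sets the VM, and has no live tuples. It yields InvalidTransactionId, which is the same conflict XID that the empty-page change in the next patch passes explicitly.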


  [text/x-patch] v36-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 9-v36-0008-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 89fde835f59045ae7490cbbdcfc461bef5c24841 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v36 08/16] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

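For empty pages, this patch follows a fixed order:

- lock the VM buffer before entering the critical section;
- mark the heap buffer dirty before any WAL is written;
- release the VM lock after the critical section ends.

The toy model below records those steps as named entries in a trace so that the ordering can be checked. It is a hypothetical illustration that makes no PostgreSQL calls. The step labels and sketch_set_empty_page_all_visible() are invented names for the calls shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical trace standing in for the real buffer and WAL calls */
#define MAX_STEPS 16

static const char *trace[MAX_STEPS];
static int	ntrace = 0;

static void
step(const char *name)
{
	assert(ntrace < MAX_STEPS);
	trace[ntrace++] = name;
}

/* Position of a step in the trace, or -1 if it never happened */
static int
step_pos(const char *name)
{
	for (int i = 0; i < ntrace; i++)
		if (strcmp(trace[i], name) == 0)
			return i;
	return -1;
}

/* Order used by lazy_scan_new_or_empty() in this patch for an empty page */
static void
sketch_set_empty_page_all_visible(bool page_all_visible, bool needs_wal)
{
	if (page_all_visible)
		return;					/* nothing to do */

	step("lock vmbuffer");		/* before the critical section */
	step("start crit section");
	step("mark heap dirty");	/* before any WAL is written */
	step("set PD_ALL_VISIBLE");
	step("clear prune hint");
	step("set VM bits");		/* ALL_VISIBLE | ALL_FROZEN */
	if (needs_wal)
		step("log prune/freeze record");	/* conflict xid: Invalid */
	step("end crit section");
	step("unlock vmbuffer");
}
```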
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6b5210d6393..1451c943644 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1903,9 +1903,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v36-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 10-v36-0009-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From d9acfde0775edefb463df2b373b24cafdd8ba531 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v36 09/16] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

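This changes the visibility map API: visibilitymap_set() no longer emits
WAL and now takes the signature of the former visibilitymap_set_vmbits().
External users of that API and consumers of the VM-only WAL record will
need to be updated.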

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 1a9e7bea5d2..a7005b57e61 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 54cd8d6a497..149cffd1a57 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8890,50 +8890,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 6d39a5fff7c..a83f6b03d69 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1367,9 +1230,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ec58f717c0b..184d7e98064 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1255,8 +1255,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1451c943644..8bd178ae7e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1926,11 +1926,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2794,9 +2794,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 3047bd46def..fc74e39e069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -220,112 +219,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -341,9 +239,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..66ed51a8aa1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index a0166c5b410..001afb037f3 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3250564d4ff..3bbbdc62743 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4357,7 +4357,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0


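After patch 0009, the surviving visibilitymap_set() only sets bits in an already pinned and exclusively locked VM buffer; the caller logs the change together with the heap page. The bit manipulation itself is visible in the removed function: status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS, then map[mapByte] |= (flags << mapOffset). The sketch below models that addressing with 2 bits per heap block (4 heap blocks per map byte). MODEL_MAPSIZE is a hypothetical small value chosen for illustration; in PostgreSQL the map size is derived from the block size minus the page header. model_vm_set() and model_vm_get() are hypothetical helpers and omit the real function's buffer and lock checks.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	4	/* 8 bits / 2 bits per heap block */
#define MODEL_MAPSIZE		16	/* hypothetical; real size depends on BLCKSZ */
#define HEAPBLOCKS_PER_PAGE	(MODEL_MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define HEAPBLK_TO_MAPBLOCK(x)	((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x)	(((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x)	(((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/*
 * Set flags for heapBlk in one VM page's map array and return the previous
 * status. No WAL is emitted here: the caller logs the VM change together
 * with the heap page change.
 */
static uint8_t
model_vm_set(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
	uint8_t		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
	uint8_t		status;

	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	/* Must never set all-frozen without also setting all-visible */
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
	map[mapByte] |= (uint8_t) (flags << mapOffset);
	return status;
}

/* Read back the 2-bit status for heapBlk */
static uint8_t
model_vm_get(const uint8_t *map, uint32_t heapBlk)
{
	return (map[HEAPBLK_TO_MAPBYTE(heapBlk)] >> HEAPBLK_TO_OFFSET(heapBlk)) &
		VISIBILITYMAP_VALID_BITS;
}
```

For example, heap block 5 lives in map byte 1 at bit offset 2, so marking it all-visible and then all-frozen leaves that byte equal to 0x0C while its neighbor, block 4, stays 0.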

  [text/x-patch] v36-0010-Initialize-missing-fields-in-CreateExecutorState.patch (1.0K, 11-v36-0010-Initialize-missing-fields-in-CreateExecutorState.patch)
From 81e2ffc119e0409e40da26b5ad6cd145eecc6ac3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sun, 1 Mar 2026 16:48:19 -0500
Subject: [PATCH v36 10/16] Initialize missing fields in CreateExecutorState()

d47cbf474ecbd449a4 forgot to initialize a few fields it introduced in
the EState, so do that now.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/executor/execUtils.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index a7955e476f9..cd4d5452cfb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,9 @@ CreateExecutorState(void)
 	estate->es_rteperminfos = NIL;
 	estate->es_plannedstmt = NULL;
 	estate->es_part_prune_infos = NIL;
+	estate->es_part_prune_states = NIL;
+	estate->es_part_prune_results = NIL;
+	estate->es_unpruned_relids = NULL;
 
 	estate->es_junkFilter = NULL;
 
-- 
2.43.0



  [text/x-patch] v36-0011-Track-which-relations-are-modified-by-a-query.patch (5.5K, 12-v36-0011-Track-which-relations-are-modified-by-a-query.patch)
From 9f7098111d8520a955b0b4d6d4d62c4a79a5497c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v36 11/16] Track which relations are modified by a query

Save the RT indexes of modified relations in a Bitmapset in the EState.
A later commit will pass this information down to scan nodes to control
whether the scan may set the visibility map during on-access pruning. We
don't want to set the visibility map if the query is going to modify the
page immediately afterward.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c  | 18 ++++++++++++++++++
 src/backend/executor/execUtils.c | 31 +++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 58 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..57dcdeda056 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -920,6 +920,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -990,6 +994,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3033,6 +3041,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_output_cid = parentestate->es_output_cid;
 	rcestate->es_queryEnv = parentestate->es_queryEnv;
 
+	/*
+	 * Use a deep copy to avoid stale pointers since bms_add_member() may
+	 * reallocate the bitmap.
+	 */
+	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
+
 	/*
 	 * ResultRelInfos needed by subplans are initialized from scratch when the
 	 * subplans themselves are initialized.
@@ -3165,6 +3179,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cd4d5452cfb..0f8364b8720 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -123,6 +123,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -871,6 +873,33 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -896,6 +925,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..05f032baeaa 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -703,6 +703,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..610385df12b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -679,6 +679,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0



  [text/x-patch] v36-0012-Thread-flags-through-begin-scan-APIs.patch (21.5K, 13-v36-0012-Thread-flags-through-begin-scan-APIs.patch)
From 29cef07ed9ec1858fd81957f2b7a8b422ec81969 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v36 12/16] Thread flags through begin-scan APIs

Add a flags parameter to the index_fetch_begin() table AM callback and
the begin-scan helpers so the executor can pass context for building
scan descriptors. This introduces an extension point for follow-up work
to mark relations as read-only for the current query, without changing
behavior in this patch.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  6 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 ++++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 ++++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 ++++----
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      |  4 ++--
 src/backend/executor/nodeSeqscan.c        |  6 +++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/tableam.h              | 17 +++++++++--------
 22 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 146ee97a47d..de835604cbd 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2845,7 +2845,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c7e38dbe193..d48c85e895c 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2061,7 +2061,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 47624194f93..ebe2e87a28b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,7 +79,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -761,7 +761,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -770,7 +770,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 43f64a0e721..1827208396c 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 69ef1527e06..bc4eedba4ac 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1927,7 +1927,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..b3aeee36ce6 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d6ef7275a64..900199dbe29 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1159,7 +1159,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 85242dcc245..09796fa4307 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6388,7 +6388,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13765,7 +13765,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22667,7 +22667,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23131,7 +23131,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index e0b6df64767..b3b6da3d7e4 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -108,7 +108,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c2d09374517..cf4d9a4f832 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a616abff04c..a7af2f6628a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -111,7 +111,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index af3c788ce8b..d9d7ec0516a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -374,7 +374,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -407,5 +407,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..5b2165c267d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7161,7 +7161,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4c0429cc613..3934fa44793 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,7 +156,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..e881e4f82a0 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -420,7 +420,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +894,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +939,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -1139,7 +1139,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1175,7 +1176,7 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1186,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v36-0013-Pass-down-information-on-table-modification-to-s.patch (8.0K, 14-v36-0013-Pass-down-information-on-table-modification-to-s.patch)
From 099ea3c46847196eba132344ce861f9d74b01be0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v36 13/16] Pass down information on table modification to scan
 node

Tell sequential scan, index [only] scan, and bitmap table scan nodes
whether the query modifies the relation being scanned, and pass that
down to the table AM as the SO_HINT_REL_READ_ONLY scan flag. A later
commit will use this information to update the VM during on-access
pruning only if the query does not modify the relation.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 +++++++-
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              |  3 +++
 7 files changed, 65 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index ebe2e87a28b..3a8eb9d8b61 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -86,6 +86,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index b3b6da3d7e4..9bcf9a68183 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -104,11 +104,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index cf4d9a4f832..2fe724a323f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -84,6 +84,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -94,7 +100,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a7af2f6628a..8730dab7469 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -198,6 +205,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -207,7 +220,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index d9d7ec0516a..336354922a2 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 7ef4cbbfb1e..c20218f8190 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -130,6 +130,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e881e4f82a0..51dfd122307 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0



  [text/x-patch] v36-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.9K, 15-v36-0014-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From f597b757c55fee445b5f7f08d5cde55a38e197ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v36 14/16] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 41 +++++++++++++++----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 ++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 149cffd1a57..8273414b430 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3a8eb9d8b61..673f6599613 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -147,7 +147,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2542,7 +2543,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 184d7e98064..064264af1e1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -220,7 +222,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
 									  uint8 old_vmbits, uint8 new_vmbits,
 									  TransactionId latest_xid_removed,
@@ -246,7 +249,8 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -328,6 +332,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = 0;
+			if (rel_read_only)
+				params.options = HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -384,6 +390,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -946,21 +953,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1176,7 +1199,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8bd178ae7e6..d2cae77b52a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2008,7 +2008,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c20218f8190..0a3e3df9b2d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -96,7 +97,8 @@ typedef struct HeapScanDescData
 
 	/*
 	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * corresponding page in the visibility map. If the relation is not
+	 * modified by the query, on-access pruning may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -128,7 +130,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -435,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0

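For reviewers following the v36-0014 change above, here is a standalone C sketch of how the new HEAP_PAGE_PRUNE_SET_VM option and the on-access guard in heap_page_will_set_vm() combine. This is a model, not the patch's code. PruneStateStub, on_access_prune_options(), attempt_set_vm_from_options() and will_set_vm() are hypothetical names. The buffer_dirty and needs_fpi booleans stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup(). The patch excerpt is cut off after `if (prstate->set_all_frozen)`, so OR-ing in the all-frozen bit at that point is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the patch / visibilitymapdefs.h */
#define HEAP_PAGE_PRUNE_SET_VM    (1 << 2)
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

typedef enum { PRUNE_ON_ACCESS, PRUNE_VACUUM_SCAN } PruneReason;

/* Hypothetical, trimmed-down stand-in for PruneState */
typedef struct
{
    bool    attempt_set_vm;  /* HEAP_PAGE_PRUNE_SET_VM was passed */
    bool    set_all_visible; /* page found all-visible while pruning */
    bool    set_all_frozen;  /* page found all-frozen while pruning */
    bool    buffer_dirty;    /* stand-in for BufferIsDirty() */
    bool    needs_fpi;       /* stand-in for XLogCheckBufferNeedsBackup() */
    uint8_t new_vmbits;
} PruneStateStub;

/* heap_page_prune_opt(): only read-only queries ask to set the VM */
static int
on_access_prune_options(bool rel_read_only)
{
    return rel_read_only ? HEAP_PAGE_PRUNE_SET_VM : 0;
}

/* prune_freeze_setup(): derive attempt_set_vm from the options bitmask */
static bool
attempt_set_vm_from_options(int options)
{
    return (options & HEAP_PAGE_PRUNE_SET_VM) != 0;
}

/* Model of heap_page_will_set_vm() after v36-0014 */
static bool
will_set_vm(PruneStateStub *s, PruneReason reason,
            bool do_prune, bool do_freeze)
{
    /* Same invariant the caller asserts before calling */
    assert(!s->set_all_frozen || s->set_all_visible);

    if (!s->attempt_set_vm || !s->set_all_visible)
        return false;

    /*
     * On access, with nothing pruned or frozen: don't newly dirty the heap
     * page, and don't force a full-page image just to set the VM.
     */
    if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
        (!s->buffer_dirty || s->needs_fpi))
    {
        s->set_all_visible = false;
        s->set_all_frozen = false;
        return false;
    }

    s->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
    if (s->set_all_frozen)
        s->new_vmbits |= VISIBILITYMAP_ALL_FROZEN; /* assumed continuation */
    return true;
}
```

Under this model, vacuum (PRUNE_VACUUM_SCAN) is never subject to the dirty/FPI restriction. On-access pruning sets the VM freely whenever it is already pruning or freezing, because in that case the page gets dirtied and WAL-logged anyway.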


  [text/x-patch] v36-0015-Avoid-BufferGetPage-calls-in-heap_update.patch (5.6K, 16-v36-0015-Avoid-BufferGetPage-calls-in-heap_update.patch)
From bc15d26cb8ee817131e49cdc8f34eee9c1fb7cdc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 6 Mar 2026 16:46:01 -0500
Subject: [PATCH v36 15/16] Avoid BufferGetPage() calls in heap_update()

BufferGetPage() isn't cheap and heap_update() calls it multiple times
when it could just save the page from a single call. Do that.
While we are at it, make separate variables for old and new page in
heap_xlog_update(). It's confusing to reuse "page" for both pages.
---
 src/backend/access/heap/heapam.c      | 17 ++++++++------
 src/backend/access/heap/heapam_xlog.c | 34 ++++++++++++++-------------
 2 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8273414b430..c39af2137c2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3339,7 +3339,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
 	HeapTuple	heaptup;
 	HeapTuple	old_key_tuple = NULL;
 	bool		old_key_copied = false;
-	Page		page;
+	Page		page,
+				newpage;
 	BlockNumber block;
 	MultiXactStatus mxact_status;
 	Buffer		buffer,
@@ -4065,6 +4066,8 @@ l2:
 		heaptup = newtup;
 	}
 
+	newpage = BufferGetPage(newbuf);
+
 	/*
 	 * We're about to do the actual update -- check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -4179,17 +4182,17 @@ l2:
 	oldtup.t_data->t_ctid = heaptup->t_self;
 
 	/* clear PD_ALL_VISIBLE flags, reset all visibilitymap bits */
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
-	if (newbuf != buffer && PageIsAllVisible(BufferGetPage(newbuf)))
+	if (newbuf != buffer && PageIsAllVisible(newpage))
 	{
 		all_visible_cleared_new = true;
-		PageClearAllVisible(BufferGetPage(newbuf));
+		PageClearAllVisible(newpage);
 		visibilitymap_clear(relation, BufferGetBlockNumber(newbuf),
 							vmbuffer_new, VISIBILITYMAP_VALID_BITS);
 	}
@@ -4220,9 +4223,9 @@ l2:
 								 all_visible_cleared_new);
 		if (newbuf != buffer)
 		{
-			PageSetLSN(BufferGetPage(newbuf), recptr);
+			PageSetLSN(newpage, recptr);
 		}
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		PageSetLSN(page, recptr);
 	}
 
 	END_CRIT_SECTION();
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a83f6b03d69..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -685,7 +685,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	ItemPointerData newtid;
 	Buffer		obuffer,
 				nbuffer;
-	Page		page;
+	Page		opage,
+				npage;
 	OffsetNumber offnum;
 	ItemId		lp;
 	HeapTupleData oldtup;
@@ -749,15 +750,15 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 									  &obuffer);
 	if (oldaction == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(obuffer);
+		opage = BufferGetPage(obuffer);
 		offnum = xlrec->old_offnum;
-		if (offnum < 1 || offnum > PageGetMaxOffsetNumber(page))
+		if (offnum < 1 || offnum > PageGetMaxOffsetNumber(opage))
 			elog(PANIC, "offnum out of range");
-		lp = PageGetItemId(page, offnum);
+		lp = PageGetItemId(opage, offnum);
 		if (!ItemIdIsNormal(lp))
 			elog(PANIC, "invalid lp");
 
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
+		htup = (HeapTupleHeader) PageGetItem(opage, lp);
 
 		oldtup.t_data = htup;
 		oldtup.t_len = ItemIdGetLength(lp);
@@ -776,12 +777,12 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		htup->t_ctid = newtid;
 
 		/* Mark the page as a candidate for pruning */
-		PageSetPrunable(page, XLogRecGetXid(record));
+		PageSetPrunable(opage, XLogRecGetXid(record));
 
 		if (xlrec->flags & XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED)
-			PageClearAllVisible(page);
+			PageClearAllVisible(opage);
 
-		PageSetLSN(page, lsn);
+		PageSetLSN(opage, lsn);
 		MarkBufferDirty(obuffer);
 	}
 
@@ -796,8 +797,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	else if (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE)
 	{
 		nbuffer = XLogInitBufferForRedo(record, 0);
-		page = BufferGetPage(nbuffer);
-		PageInit(page, BufferGetPageSize(nbuffer), 0);
+		npage = BufferGetPage(nbuffer);
+		PageInit(npage, BufferGetPageSize(nbuffer), 0);
 		newaction = BLK_NEEDS_REDO;
 	}
 	else
@@ -829,10 +830,10 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		recdata = XLogRecGetBlockData(record, 0, &datalen);
 		recdata_end = recdata + datalen;
 
-		page = BufferGetPage(nbuffer);
+		npage = BufferGetPage(nbuffer);
 
 		offnum = xlrec->new_offnum;
-		if (PageGetMaxOffsetNumber(page) + 1 < offnum)
+		if (PageGetMaxOffsetNumber(npage) + 1 < offnum)
 			elog(PANIC, "invalid max offset number");
 
 		if (xlrec->flags & XLH_UPDATE_PREFIX_FROM_OLD)
@@ -909,16 +910,17 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		/* Make sure there is no forward chain link in t_ctid */
 		htup->t_ctid = newtid;
 
-		offnum = PageAddItem(page, htup, newlen, offnum, true, true);
+		offnum = PageAddItem(npage, htup, newlen, offnum, true, true);
 		if (offnum == InvalidOffsetNumber)
 			elog(PANIC, "failed to add tuple");
 
 		if (xlrec->flags & XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED)
-			PageClearAllVisible(page);
+			PageClearAllVisible(npage);
 
-		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+		/* needed to update FSM below */
+		freespace = PageGetHeapFreeSpace(npage);
 
-		PageSetLSN(page, lsn);
+		PageSetLSN(npage, lsn);
 		MarkBufferDirty(nbuffer);
 	}
 
-- 
2.43.0



  [text/x-patch] v36-0016-Set-pd_prune_xid-on-insert.patch (10.4K, 17-v36-0016-Set-pd_prune_xid-on-insert.patch)
From 75cc5440779451b5ce177d5cd884c6f1f3109075 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v36 16/16] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to run and set the VM
all-visible after a page is filled with newly inserted tuples the first
time it is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 40 ++++++++++++-------
 src/backend/access/heap/heapam_xlog.c         | 19 ++++++++-
 src/backend/access/heap/pruneheap.c           | 14 +++----
 .../modules/index/expected/killtuples.out     |  8 ++--
 4 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c39af2137c2..0b8313de2e7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 064264af1e1..0776cb6cfc2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1920,16 +1920,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-11 17:01               ` Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-11 17:01 UTC
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Mar 6, 2026 at 6:33 PM Melanie Plageman
<[email protected]> wrote:
>
> Thanks for the review! Attached is v36. I've pushed some of the early
> patches in the set and this is what is left.

I've gone ahead and pushed another of the introductory commits.
Attached v37 has the remaining patches.

The one change is that I've removed get_conflict_xid(). I determined
that, in the current code, we cannot end up in the scenario where we
didn't prune or freeze and the page was already all-visible but not
all-frozen. The closest scenario would be one where the page was
all-frozen, we cleared the all-frozen bit because of a SELECT FOR
UPDATE on one of the tuples, and then vacuum froze the page. Even
though we are just invalidating the xmax, that still counts as
freezing. However, in this case we will not advance
FreezePageConflictXid, so the snapshot conflict horizon will still
correctly be InvalidTransactionId. I believe that in all of these cases
we will correctly set the conflict horizon to InvalidTransactionId with
this much simpler conflict xid calculation:

    conflict_xid = InvalidTransactionId;
    if (do_set_vm)
        conflict_xid = prstate.newest_live_xid;
    if (do_freeze &&
        TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
                             conflict_xid))
        conflict_xid = prstate.pagefrz.FreezePageConflictXid;
    if (do_prune &&
        TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
        conflict_xid = prstate.latest_xid_removed;
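To check the claim above by hand, here is a self-contained C sketch of that calculation. It assumes the usual TransactionIdFollows() semantics from transam.c: a modulo-2^32 comparison when both XIDs are normal, and a plain unsigned comparison when either one is special (e.g. InvalidTransactionId). The stub types and compute_conflict_xid() are hypothetical. The PruneState fields are flattened into parameters. This illustrates the rule and is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-ins modeled on access/transam.h; not the real header */
typedef uint32_t TransactionId;

#define InvalidTransactionId       ((TransactionId) 0)
#define FirstNormalTransactionId   ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Is id1 logically newer than id2? Wraparound-aware for normal XIDs. */
static bool
TransactionIdFollows(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

/*
 * Hypothetical flattened form of the snippet above: the horizon is the
 * newest XID among those relevant to the actions actually taken, and
 * stays InvalidTransactionId if no action contributes a valid XID.
 */
static TransactionId
compute_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
                     TransactionId newest_live_xid,
                     TransactionId freeze_conflict_xid,
                     TransactionId latest_xid_removed)
{
    TransactionId conflict_xid = InvalidTransactionId;

    if (do_set_vm)
        conflict_xid = newest_live_xid;
    if (do_freeze && TransactionIdFollows(freeze_conflict_xid, conflict_xid))
        conflict_xid = freeze_conflict_xid;
    if (do_prune && TransactionIdFollows(latest_xid_removed, conflict_xid))
        conflict_xid = latest_xid_removed;
    return conflict_xid;
}
```

The SELECT FOR UPDATE case corresponds to do_freeze with a FreezePageConflictXid that was never advanced. Because TransactionIdFollows(InvalidTransactionId, InvalidTransactionId) is false, the horizon stays InvalidTransactionId.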

The only outstanding question I have is about pd_prune_xid on insert:

I think we only want to set pd_prune_xid on insert if the transaction
ID is normal. Bootstrap mode does call heap_insert(), so we need to
check the xid before setting it. The question, then, is whether we
want the same guard on replay. Bootstrap mode won't actually insert a
WAL record, so I don't think replay needs this check. However, I think
it is better to have it for consistency with normal mode.
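A sketch of why the guard matters, under two assumptions. First, PageSetPrunable() only lowers pd_prune_xid toward the oldest XID. Second, it asserts TransactionIdIsNormal(), which is how I read bufpage.h. If both hold, passing BootstrapTransactionId (1) would trip that assertion. PageStub, page_set_prunable() and maybe_set_prunable_on_insert() are hypothetical stand-ins. The same guard would be applied on the insert path and, for consistency, on replay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Special XIDs as defined in access/transam.h */
typedef uint32_t TransactionId;

#define InvalidTransactionId       ((TransactionId) 0)
#define BootstrapTransactionId     ((TransactionId) 1)
#define FrozenTransactionId        ((TransactionId) 2)
#define FirstNormalTransactionId   ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Hypothetical page header reduced to the prune hint */
typedef struct
{
    TransactionId pd_prune_xid;
} PageStub;

/* Model of PageSetPrunable(): keep the oldest normal XID seen so far */
static void
page_set_prunable(PageStub *page, TransactionId xid)
{
    assert(TransactionIdIsNormal(xid));
    if (page->pd_prune_xid == InvalidTransactionId ||
        (int32_t) (xid - page->pd_prune_xid) < 0)
        page->pd_prune_xid = xid;
}

/* Guard shared by heap_insert() and, for consistency, its redo routine */
static void
maybe_set_prunable_on_insert(PageStub *page, TransactionId xid,
                             bool inserting_frozen)
{
    if (TransactionIdIsNormal(xid) && !inserting_frozen)
        page_set_prunable(page, xid);
}
```

With the guard in place, bootstrap-mode and frozen inserts leave the hint untouched. Ordinary inserts make the page a pruning candidate, and the oldest inserting XID wins.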

- Melanie


Attachments:

  [text/x-patch] v37-0001-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch (6.2K, 2-v37-0001-Save-vmbuffer-in-heap-specific-scan-descriptors-.patch)
From 399b94b6cdcadd95d018f51c97bbbf6e6bd26f7d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:23:57 -0500
Subject: [PATCH v37 01/15] Save vmbuffer in heap-specific scan descriptors for
 on-access pruning

Future commits will use the visibility map in on-access pruning to avoid
pruning when a page is all-visible, fix VM corruption, and set the VM if
the page is all-visible.

Saving the vmbuffer in the scan descriptor reduces the number of times
it would need to be pinned and unpinned, making the overhead of doing so
negligible.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/heapam.c         | 12 +++++++++++-
 src/backend/access/heap/heapam_handler.c | 12 ++++++++++--
 src/backend/access/heap/pruneheap.c      |  6 +++++-
 src/include/access/heapam.h              | 19 ++++++++++++++++---
 4 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8f1c11a9350..7ff9a930844 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
@@ -1310,6 +1310,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
 														  sizeof(TBMIterateResult));
 	}
 
+	scan->rs_vmbuffer = InvalidBuffer;
 
 	return (TableScanDesc) scan;
 }
@@ -1348,6 +1349,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
 		scan->rs_cbuf = InvalidBuffer;
 	}
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+	{
+		ReleaseBuffer(scan->rs_vmbuffer);
+		scan->rs_vmbuffer = InvalidBuffer;
+	}
+
 	/*
 	 * SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
 	 * additional data vs a normal HeapScan
@@ -1380,6 +1387,9 @@ heap_endscan(TableScanDesc sscan)
 	if (BufferIsValid(scan->rs_cbuf))
 		ReleaseBuffer(scan->rs_cbuf);
 
+	if (BufferIsValid(scan->rs_vmbuffer))
+		ReleaseBuffer(scan->rs_vmbuffer);
+
 	/*
 	 * Must free the read stream before freeing the BufferAccessStrategy.
 	 */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 5137d2510ea..b6ed5938477 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
 
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
+	hscan->xs_vmbuffer = InvalidBuffer;
 
 	return &hscan->xs_base;
 }
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
 		ReleaseBuffer(hscan->xs_cbuf);
 		hscan->xs_cbuf = InvalidBuffer;
 	}
+
+	if (BufferIsValid(hscan->xs_vmbuffer))
+	{
+		ReleaseBuffer(hscan->xs_vmbuffer);
+		hscan->xs_vmbuffer = InvalidBuffer;
+	}
 }
 
 static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 * Prune page, but only if we weren't already on this page
 		 */
 		if (prev_buf != hscan->xs_cbuf)
-			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+								&hscan->xs_vmbuffer);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2533,7 +2541,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6beeb6956e3..8d9f0694206 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -207,9 +207,13 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * if there's not any use in pruning.
  *
  * Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * This function may pin *vmbuffer. It's passed by reference so the caller can
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
+ * responsible for unpinning it.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ad993c07311..2fdc50b865b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
+	/*
+	 * For sequential scans and bitmap heap scans. The current heap block's
+	 * corresponding page in the visibility map.
+	 */
+	Buffer		rs_vmbuffer;
+
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
 	uint32		rs_cindex;		/* current tuple's index in vistuples */
 	uint32		rs_ntuples;		/* number of visible tuples on page */
@@ -116,8 +122,14 @@ typedef struct IndexFetchHeapData
 {
 	IndexFetchTableData xs_base;	/* AM independent part of the descriptor */
 
-	Buffer		xs_cbuf;		/* current heap buffer in scan, if any */
-	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+	/*
+	 * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+	 * InvalidBuffer, we hold a pin on that buffer.
+	 */
+	Buffer		xs_cbuf;
+
+	/* Current heap block's corresponding page in the visibility map */
+	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
@@ -422,7 +434,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 											  TM_IndexDeleteOp *delstate);
 
 /* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+								Buffer *vmbuffer);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0



  [text/x-patch] v37-0002-Fix-visibility-map-corruption-in-more-cases.patch (18.6K, 3-v37-0002-Fix-visibility-map-corruption-in-more-cases.patch)
From 421fdc75faa283d435f4a1a3da7f322be0a8e0f4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v37 02/15] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap compared
to the cost of actually pruning. The vmbuffer is saved in the scan
descriptor, so a query should only need to pin each VM page once, and a
single VM page covers a large number of heap pages.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 176 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..2a0d54136b6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -168,6 +183,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +191,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +226,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +295,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +375,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +797,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page cannot have dead items, nor can
+ * it have tuples that are not visible to all running transactions. It clears
+ * the VM corruption and resets the vmbits used during pruning.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear. However, it's possible that the bit
+		 * got cleared after heap_vac_scan_next_block() was called, so we must
+		 * recheck now that we have the buffer lock before concluding that the
+		 * VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set"),
+				 errcontext("relation \"%s\", page %u",
+							relname, prstate->block)));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +941,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1088,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1411,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1422,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1512,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1660,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1676,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1704,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1771,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..957322648ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -425,11 +425,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1964,81 +1959,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,6 +1990,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2179,18 +2100,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..c649e5f1980 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
+	 * pruning. It is cleared if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
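
A simplified model of the detection flow in patch 0002, for illustration only. All names are hypothetical. A single check over an array of tuple states stands in for several call sites: heap_prune_record_dead_or_unused(), heap_prune_record_unchanged_lp_dead(), heap_prune_record_prunable() and the HEAPTUPLE_INSERT_IN_PROGRESS case. The model shows that the first repair clears the page-level hint and zeroes vmbits. After that, the PageIsAllVisible() guards stop later tuples on the same page from emitting further warnings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VM bit values */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02
#define SKETCH_VM_VALID_BITS  (SKETCH_VM_ALL_VISIBLE | SKETCH_VM_ALL_FROZEN)

typedef enum SketchTupleState
{
	SKETCH_LIVE,				/* committed and visible to all */
	SKETCH_DEAD,				/* dead tuple or LP_DEAD item */
	SKETCH_RECENTLY_DEAD,		/* will become prunable later */
	SKETCH_INSERT_IN_PROGRESS	/* inserter still running */
} SketchTupleState;

typedef struct SketchPage
{
	bool		pd_all_visible;	/* page-level PD_ALL_VISIBLE hint */
	uint8_t		vmbits;			/* VM bits read during setup */
	int			nwarnings;		/* WARNINGs that would be emitted */
} SketchPage;

/* Mirrors the shape of heap_fix_vm_corruption(): warn, then clear. */
static void
sketch_fix_vm_corruption(SketchPage *page)
{
	if (page->pd_all_visible)
	{
		page->nwarnings++;		/* "... on page marked all-visible" */
		page->pd_all_visible = false;
	}
	else if (page->vmbits & SKETCH_VM_VALID_BITS)
		page->nwarnings++;		/* "... visibility map bit is set" */
	page->vmbits = 0;			/* models visibilitymap_clear() */
}

void
sketch_prune(SketchPage *page, const SketchTupleState *tuples, int ntuples)
{
	/* Setup-time check: VM bits set but page-level hint clear */
	if ((page->vmbits & SKETCH_VM_VALID_BITS) && !page->pd_all_visible)
		sketch_fix_vm_corruption(page);

	/* Per-tuple checks; each is guarded by the page-level hint */
	for (int i = 0; i < ntuples; i++)
	{
		if (tuples[i] != SKETCH_LIVE && page->pd_all_visible)
			sketch_fix_vm_corruption(page);
	}
}
```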



  [text/x-patch] v37-0003-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (4.5K, 4-v37-0003-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From 1acfb16425bc9adafde80c46ffd97c95b8a79571 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v37 03/15] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work to do. To avoid that wasted work, we exit early if a page
is already all-frozen, or if it is all-visible and no freezing will be
attempted.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 75 +++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2a0d54136b6..b35ebdc134d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -184,6 +184,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -882,6 +883,68 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->vmbits = prstate->vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	presult->hastup = true;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -945,6 +1008,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can exit early. Do this after fixing any
+	 * discrepancy between the page-level visibility hint and the VM.
+	 */
+	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
-- 
2.43.0
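
The fast-path condition and the live-tuple count from patch 0003 reduce to a few lines. This sketch assumes the VM bit values VISIBILITYMAP_ALL_VISIBLE = 0x01 and VISIBILITYMAP_ALL_FROZEN = 0x02. It uses a hypothetical line-pointer enum in place of real ItemIds, so it illustrates the logic rather than reproducing the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed VM bit values (hypothetical copies, not the real headers) */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Hypothetical line pointer states standing in for ItemId flags */
typedef enum SketchLpFlag
{
	SKETCH_LP_UNUSED,
	SKETCH_LP_NORMAL,
	SKETCH_LP_REDIRECT,
	SKETCH_LP_DEAD
} SketchLpFlag;

/* Bypass if all-frozen, or all-visible when freezing is not attempted. */
bool
sketch_can_bypass(uint8_t vmbits, bool attempt_freeze)
{
	return (vmbits & SKETCH_VM_ALL_FROZEN) != 0 ||
		((vmbits & SKETCH_VM_ALL_VISIBLE) != 0 && !attempt_freeze);
}

/* On an all-visible page, every normal line pointer counts as live. */
int
sketch_count_live(const SketchLpFlag *items, int nitems)
{
	int			live = 0;

	assert(nitems >= 0);
	for (int i = 0; i < nitems; i++)
	{
		if (items[i] == SKETCH_LP_NORMAL)
			live++;
	}
	return live;
}
```

The patch comment calls out one limitation: the predicate trusts vmbits. A VM that wrongly claims all-visible for a page with non-visible tuples is therefore never re-examined on this path.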



  [text/x-patch] v37-0004-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.4K, 5-v37-0004-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 6696f6cfa18216ade8943cc27e2c46a1ccc55e2b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v37 04/15] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare cases where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page;
if it is globally visible, then the entire page is all-visible.
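
The deferred, once-per-page check can be sketched as below. This is a hypothetical model, not the patch's code. A single fixed horizon stands in for GlobalVisState, which is more nuanced and can be refreshed. sketch_xid_follows() mimics the modulo-2^32 comparison that TransactionIdFollows() applies to normal XIDs. The per-tuple loop only tracks the newest normal xmin, and the expensive visibility test runs at most once per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SKETCH_INVALID_XID		((SketchXid) 0)
#define SKETCH_FIRST_NORMAL_XID	((SketchXid) 3)

static bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is newer than b" for normal XIDs */
static bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) > 0;
}

/* Hypothetical GlobalVisState stand-in: XIDs before horizon are not running */
typedef struct SketchVisState
{
	SketchXid	horizon;
	int			ncalls;			/* number of (expensive) checks performed */
} SketchVisState;

static bool
sketch_xid_maybe_running(SketchVisState *vis, SketchXid xid)
{
	vis->ncalls++;
	return !sketch_xid_follows(vis->horizon, xid);
}

/* live_xmins: xmins of committed, live tuples on one page (hypothetical input) */
bool
sketch_page_all_visible(SketchVisState *vis, const SketchXid *live_xmins, int n)
{
	SketchXid	cutoff = SKETCH_INVALID_XID;

	/* Cheap pass: remember only the newest normal xmin (frozen XIDs skipped) */
	for (int i = 0; i < n; i++)
	{
		SketchXid	xmin = live_xmins[i];

		if (sketch_xid_is_normal(xmin) &&
			(cutoff == SKETCH_INVALID_XID || sketch_xid_follows(xmin, cutoff)))
			cutoff = xmin;
	}

	/* One expensive check per page, against the newest xmin only */
	if (sketch_xid_is_normal(cutoff) && sketch_xid_maybe_running(vis, cutoff))
		return false;
	return true;
}
```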

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..d70fab3a763 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b35ebdc134d..c5e036053d3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1028,6 +1028,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1695,29 +1706,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 957322648ca..ad85e1e1738 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -461,13 +461,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2054,13 +2054,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2816,7 +2813,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3577,14 +3574,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3605,7 +3602,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3624,7 +3621,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3705,7 +3702,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3714,16 +3711,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3752,6 +3750,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c649e5f1980..bbb223dd0d2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -479,6 +479,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



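The patch above stops comparing each live tuple's xmin against OldestXmin. Instead it remembers the newest committed xmin on the page and asks once, after the scan, whether that XID may still be running. The two approaches agree because "is older than the horizon" carries over to every older XID: if the newest xmin is old enough for every snapshot, so is every other xmin on the page (given the usual assumption that the page's XIDs fall within the 2^31 comparison window). Below is a minimal C sketch of that argument under simplifying assumptions. It is not PostgreSQL code. TransactionId is modeled as uint32_t. A single fixed horizon stands in for GlobalVisState, whose real test, GlobalVisTestIsRemovableXid(), is more elaborate. The hint-bit/commit check is omitted, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins, not the real PostgreSQL headers. */
typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is newer than b", modeled on TransactionIdFollows(). */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Wraparound-aware "a is older than b", modeled on TransactionIdPrecedes(). */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a < b;
	return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical stand-in for GlobalVisTestXidMaybeRunning(): any XID that
 * does not precede the fixed horizon is treated as possibly still running.
 */
static bool
xid_maybe_running(TransactionId horizon, TransactionId xid)
{
	return !xid_precedes(xid, horizon);
}

/* Old scheme: test every live xmin against the horizon as we go. */
static bool
all_visible_per_tuple(const TransactionId *xmins, int n, TransactionId horizon)
{
	for (int i = 0; i < n; i++)
		if (xid_is_normal(xmins[i]) && xid_maybe_running(horizon, xmins[i]))
			return false;
	return true;
}

/*
 * New scheme: track the newest normal xmin while scanning, then test only
 * that single XID once the scan is complete.
 */
static bool
all_visible_deferred(const TransactionId *xmins, int n, TransactionId horizon,
					 TransactionId *newest_live_xid)
{
	*newest_live_xid = InvalidTransactionId;

	for (int i = 0; i < n; i++)
		if (xid_follows(xmins[i], *newest_live_xid) && xid_is_normal(xmins[i]))
			*newest_live_xid = xmins[i];

	/* Guard: nothing to test if no normal xmin was seen. */
	if (xid_is_normal(*newest_live_xid) &&
		xid_maybe_running(horizon, *newest_live_xid))
		return false;
	return true;
}
```

Special XIDs such as FrozenTransactionId (2) are never recorded as the newest live XID. If a page holds no normal xmin at all, the tracked value stays InvalidTransactionId, which is why both the sketch and the patch guard the final check with a normality test.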
  [text/x-patch] v37-0005-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.8K, 6-v37-0005-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From ba68efa89610e45a153591af875b9215bca0e7c7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v37 05/15] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c5e036053d3..d9a06f3115c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,11 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -174,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -435,53 +432,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate.set_all_frozen to false,
+	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze ? true : false;
 }
 
 /*
@@ -711,7 +690,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -966,9 +944,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1034,9 +1011,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1187,7 +1164,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1647,6 +1624,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1694,32 +1672,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ad85e1e1738..23402e7e26c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,7 +463,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -471,7 +471,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2789,7 +2789,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2815,14 +2815,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2863,7 +2863,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3576,7 +3576,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3584,7 +3584,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3607,7 +3607,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3625,7 +3625,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3635,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3724,9 +3724,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3756,8 +3756,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0



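The commit message above argues that tracking the newest live XID unconditionally is cheap and keeps the conflict-horizon computation simple. The following sketch models only that bookkeeping. It assumes a hypothetical SketchPruneState that carries just the four PruneState fields involved, and it leaves out the freezing decision, LP_DEAD handling, and the final GlobalVisTestXidMaybeRunning() check. It shows two things: an uncommitted xmin clears the all-visible/all-frozen flags without stopping newest_live_xid from advancing on later tuples, and the VM conflict horizon is InvalidTransactionId whenever the page will be all-frozen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins, not the real PostgreSQL headers. */
typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical, heavily reduced subset of PruneState. */
typedef struct
{
	bool		attempt_freeze;
	bool		set_all_visible;
	bool		set_all_frozen;
	TransactionId newest_live_xid;
} SketchPruneState;

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is newer than b", modeled on TransactionIdFollows(). */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Mirrors the initialization done in prune_freeze_setup() by this patch. */
static void
sketch_setup(SketchPruneState *ps, bool attempt_freeze)
{
	ps->attempt_freeze = attempt_freeze;
	ps->set_all_visible = true;
	ps->set_all_frozen = attempt_freeze;
	ps->newest_live_xid = InvalidTransactionId;
}

/*
 * Mirrors the HEAPTUPLE_LIVE branch of heap_prune_record_unchanged_lp_normal():
 * an xmin that is not hinted committed clears both flags, but other live
 * tuples still update newest_live_xid.
 */
static void
sketch_record_live(SketchPruneState *ps, TransactionId xmin, bool xmin_committed)
{
	if (!xmin_committed)
	{
		ps->set_all_visible = false;
		ps->set_all_frozen = false;
		return;
	}
	if (xid_follows(xmin, ps->newest_live_xid) && xid_is_normal(xmin))
		ps->newest_live_xid = xmin;
}

/* Conflict horizon as chosen at the end of heap_page_prune_and_freeze(). */
static TransactionId
sketch_vm_conflict_horizon(const SketchPruneState *ps)
{
	assert(!ps->set_all_frozen || ps->set_all_visible);
	return ps->set_all_frozen ? InvalidTransactionId : ps->newest_live_xid;
}
```

Because newest_live_xid is never left stale, the caller no longer needs to know whether the page turned out all-visible before trusting the tracked value.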
  [text/x-patch] v37-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (24.7K, 7-v37-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From b8e7bcf3ad1132b58c7f045465ed61da3a027475 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v37 06/15] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 263 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 107 +----------
 src/include/access/heapam.h          |  38 ++--
 3 files changed, 208 insertions(+), 200 deletions(-)

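This patch moves the decision to set the VM into heap_page_prune_and_freeze() and folds the VM change into the prune/freeze WAL record. As a result, that record's snapshot conflict horizon must cover every change it carries. The sketch below models two pieces visible in the diff: the decision made by heap_page_will_set_vm(), and the "newest horizon wins" combination of conflict XIDs. The names, the enum, and the flat argument lists are hypothetical simplifications. The bit values match VISIBILITYMAP_ALL_VISIBLE (0x01) and VISIBILITYMAP_ALL_FROZEN (0x02). Only the part of the horizon computation shown in this excerpt is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins, not the real PostgreSQL headers. */
typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Same bit values as VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Hypothetical reduced version of PruneReason. */
typedef enum
{
	SKETCH_PRUNE_ON_ACCESS,
	SKETCH_PRUNE_VACUUM_SCAN
} SketchPruneReason;

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Wraparound-aware "a is newer than b", modeled on TransactionIdFollows(). */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/*
 * Modeled on heap_page_will_set_vm(). The sketch zeroes *new_vmbits itself;
 * the real code relies on prune_freeze_setup() having done so.
 */
static bool
sketch_will_set_vm(SketchPruneReason reason, bool set_all_visible,
				   bool set_all_frozen, uint8_t old_vmbits, uint8_t *new_vmbits)
{
	assert(!set_all_frozen || set_all_visible);
	*new_vmbits = 0;

	if (reason == SKETCH_PRUNE_ON_ACCESS || !set_all_visible)
		return false;

	*new_vmbits = SKETCH_VM_ALL_VISIBLE;
	if (set_all_frozen)
		*new_vmbits |= SKETCH_VM_ALL_FROZEN;

	if (*new_vmbits == old_vmbits)
	{
		*new_vmbits = 0;		/* VM already says this; nothing to set */
		return false;
	}
	return true;
}

/* The record's horizon is the newest one required by any included change. */
static TransactionId
sketch_record_conflict_xid(bool do_set_vm, TransactionId newest_live_xid,
						   bool do_freeze, TransactionId freeze_conflict_xid,
						   bool do_prune, TransactionId latest_xid_removed)
{
	TransactionId conflict_xid = InvalidTransactionId;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;
	return conflict_xid;
}
```

Taking the maximum is the conservative choice. A standby must cancel any query that could be affected by any one of the combined changes, so the most restrictive (newest) horizon has to win.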
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d9a06f3115c..479892b0808 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -209,7 +213,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -375,9 +379,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -840,7 +845,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -858,7 +863,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -887,15 +928,13 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->vmbits = prstate->vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -925,7 +964,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -940,12 +980,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed, and if the page
+ * is found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -973,15 +1011,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -990,8 +1030,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * is not being attempted, we can exit early. Do this after fixing any
 	 * discrepancy between the page-level visibility hint and the VM.
 	 */
-	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	if (prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
 	{
 		heap_page_bypass_prune_freeze(&prstate, presult);
 		return;
@@ -1061,6 +1101,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * If we do not intend to set the VM, new_vmbits should be 0 regardless of
+	 * whether or not the page is all-visible.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1082,14 +1146,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1103,6 +1170,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1110,29 +1198,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1142,33 +1213,67 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * The page may consist entirely of frozen tuples even though it is not
+		 * set all-frozen in the VM and the caller did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In that case, heap_page_is_all_visible() may
+		 * find the page completely frozen even though prstate.set_all_frozen
+		 * is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->new_all_visible_pages = 0;
+	presult->new_all_frozen_pages = 0;
+	presult->new_all_visible_frozen_pages = 0;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23402e7e26c..6b5210d6393 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,13 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1996,8 +1989,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2038,29 +2029,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2081,6 +2049,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2094,71 +2070,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3572,7 +3483,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bbb223dd0d2..f77a00291bb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -264,7 +264,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -280,8 +281,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -315,26 +315,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Whether the page was newly set all-visible, all-visible and all-frozen,
+	 * or all-frozen in the VM during phase I of vacuuming (each is 0 or 1).
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -471,7 +457,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0

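In patch 0006, the snapshot conflict horizon is computed before the critical section and now covers the VM update as well: the record's horizon starts at `InvalidTransactionId` and takes the newest XID required by any change the record actually performs. A minimal standalone C sketch of that step, assuming a simplified re-implementation of `TransactionIdFollows()` semantics (modulo-2^32 comparison for normal XIDs, plain integer comparison when either XID is special). `TxId`, `xid_follows()` and `combine_conflict_xid()` are hypothetical names for illustration, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's transaction ID definitions. */
typedef uint32_t TxId;
#define INVALID_TXID      ((TxId) 0)
#define FIRST_NORMAL_TXID ((TxId) 3)

static bool
txid_is_normal(TxId xid)
{
	return xid >= FIRST_NORMAL_TXID;
}

/*
 * Hypothetical equivalent of TransactionIdFollows(): does id1 logically
 * follow id2?  Normal XIDs are compared modulo 2^32 so the comparison
 * survives XID wraparound; special XIDs compare as plain integers.
 */
static bool
xid_follows(TxId id1, TxId id2)
{
	if (!txid_is_normal(id1) || !txid_is_normal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/*
 * Keep the most conservative (newest) horizon among the changes that will
 * actually be included in the record, in the same order as the patch.
 */
static TxId
combine_conflict_xid(bool do_set_vm, TxId newest_live_xid,
					 bool do_freeze, TxId freeze_conflict_xid,
					 bool do_prune, TxId latest_xid_removed)
{
	TxId		conflict_xid = INVALID_TXID;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;
	return conflict_xid;
}
```

Because `InvalidTransactionId` (0) is not a normal XID, any valid candidate compares as following it, which is why the code can start from `InvalidTransactionId` without a separate validity check.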

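In patch 0006, both the early exit in `heap_page_prune_and_freeze()` and the new `presult` counters branch on the page's old VM bits. A minimal C sketch of those two decisions in isolation, assuming only the `VISIBILITYMAP_ALL_VISIBLE` (0x01) and `VISIBILITYMAP_ALL_FROZEN` (0x02) bit values from `visibilitymapdefs.h`. `NewVmCounts`, `can_bypass_prune_freeze()`, `classify_vm_update()` and `vm_page_newly_frozen()` are hypothetical names for illustration, not functions in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in PostgreSQL's visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical per-page counters, mirroring the new PruneFreezeResult fields. */
typedef struct NewVmCounts
{
	int			new_all_visible_pages;
	int			new_all_visible_frozen_pages;
	int			new_all_frozen_pages;
} NewVmCounts;

/* True when pruning and freezing can be skipped entirely (early exit). */
static bool
can_bypass_prune_freeze(uint8_t old_vmbits, bool attempt_freeze)
{
	return (old_vmbits & VISIBILITYMAP_ALL_FROZEN) ||
		((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) && !attempt_freeze);
}

/* Classify a VM update the same way the patch fills in presult. */
static NewVmCounts
classify_vm_update(uint8_t old_vmbits, bool do_set_vm, bool set_all_frozen)
{
	NewVmCounts counts = {0, 0, 0};

	if (!do_set_vm)
		return counts;

	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
	{
		/* Page was not all-visible before: it is newly all-visible. */
		counts.new_all_visible_pages = 1;
		if (set_all_frozen)
			counts.new_all_visible_frozen_pages = 1;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && set_all_frozen)
	{
		/* Already all-visible; now also all-frozen. */
		counts.new_all_frozen_pages = 1;
	}
	return counts;
}

/* Mirrors how lazy_scan_prune() derives *vm_page_frozen from the counts. */
static bool
vm_page_newly_frozen(NewVmCounts counts)
{
	return counts.new_all_visible_frozen_pages > 0 ||
		counts.new_all_frozen_pages > 0;
}
```

For example, a page whose old bits were `VISIBILITYMAP_ALL_VISIBLE` and which is now set all-frozen counts only toward `new_all_frozen_pages`, and `vm_page_newly_frozen()` returns true for it.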

  [text/x-patch] v37-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 8-v37-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 00c89d9283d8fcbdfc8f309a3903ffcacad7b11e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v37 07/15] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6b5210d6393..1451c943644 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1903,9 +1903,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0

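Patch 0007 relies on the rule stated in the pruning path's comments: a VM bit may never be set while PD_ALL_VISIBLE is clear (the reverse is allowed), and all-frozen implies all-visible. A minimal C model of that rule, assuming a hypothetical `PageVmState` struct that holds the page-level hint and the block's VM bits (bit values match `visibilitymapdefs.h`). The first check mirrors the condition under which `heap_page_prune_and_freeze()` calls `heap_fix_vm_corruption()`; `vm_state_is_consistent()` and `mark_empty_page_all_visible()` are illustrative names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Hypothetical model of the two pieces of state that must agree. */
typedef struct PageVmState
{
	bool		pd_all_visible;	/* page-level hint in the heap page header */
	uint8_t		vmbits;			/* this heap block's bits in the VM */
} PageVmState;

/*
 * The VM may not claim more than the page header does: any VM bit requires
 * PD_ALL_VISIBLE, and all-frozen requires all-visible.  PD_ALL_VISIBLE set
 * with the VM bits clear is permitted.
 */
static bool
vm_state_is_consistent(PageVmState s)
{
	if ((s.vmbits & VISIBILITYMAP_VALID_BITS) && !s.pd_all_visible)
		return false;
	if ((s.vmbits & VISIBILITYMAP_ALL_FROZEN) &&
		!(s.vmbits & VISIBILITYMAP_ALL_VISIBLE))
		return false;
	return true;
}

/* Model of the empty-page path: set the page hint and both VM bits. */
static PageVmState
mark_empty_page_all_visible(PageVmState s)
{
	s.pd_all_visible = true;
	s.vmbits |= VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN;
	return s;
}
```

The empty-page path sets both the page hint and the VM bits, so its end state passes the check regardless of the starting state.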


  [text/x-patch] v37-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 9-v37-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 6c344aca95cf22851c300c96509a312a58b19e2d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v37 08/15] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 1a9e7bea5d2..a7005b57e61 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7ff9a930844..0d6e3bc7884 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8883,50 +8883,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 6d39a5fff7c..a83f6b03d69 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1367,9 +1230,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 479892b0808..94be0348509 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1187,8 +1187,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1451c943644..8bd178ae7e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1926,11 +1926,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2794,9 +2794,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..f1da52b2069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -222,112 +221,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..66ed51a8aa1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3da19d41413..44948d6d611 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4360,7 +4360,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
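The surviving visibilitymap_set() finds a heap block's two VM bits with HEAPBLK_TO_MAPBLOCK(), HEAPBLK_TO_MAPBYTE() and HEAPBLK_TO_OFFSET(). The standalone C sketch below reproduces that arithmetic on a plain byte array so it can be tried outside the server.

It makes several assumptions:

- BLCKSZ is the default 8 kB, and the page header occupies 24 bytes after MAXALIGN.
- The macro bodies are written from memory of visibilitymap.c, not copied from this patch.
- vm_set_bits() and vm_get_bits() are hypothetical helpers. They do no locking and no buffer or WAL handling.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for a default build; not taken from this patch series. */
#define BLCKSZ              8192
#define PAGE_HEADER_SIZE    24  /* assumed MAXALIGN(SizeOfPageHeaderData) */
#define MAPSIZE             (BLCKSZ - PAGE_HEADER_SIZE)

/* Two VM bits (all-visible, all-frozen) per heap block, four blocks per byte. */
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE 4
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x)  (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x)   (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/*
 * Hypothetical helper: OR flags into the bits for heapBlk within one VM
 * page's contents (map) and return the bits that were there before.
 */
static uint8_t
vm_set_bits(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
    uint32_t    mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    uint8_t     mapOffset = (uint8_t) HEAPBLK_TO_OFFSET(heapBlk);
    uint8_t     status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;

    assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
    /* Must never set all-frozen without also setting all-visible. */
    assert(flags != VISIBILITYMAP_ALL_FROZEN);

    /* Only touch the byte if something changes (the real code skips dirtying the buffer). */
    if (flags != status)
        map[mapByte] |= (uint8_t) (flags << mapOffset);
    return status;
}

/* Hypothetical helper: read back the two bits for heapBlk. */
static uint8_t
vm_get_bits(const uint8_t *map, uint32_t heapBlk)
{
    return (map[HEAPBLK_TO_MAPBYTE(heapBlk)] >> HEAPBLK_TO_OFFSET(heapBlk)) &
        VISIBILITYMAP_VALID_BITS;
}
```

Under these assumptions, heap block 6 maps to VM block 0, map byte 1, bit offset 4. Setting both flags on an empty map therefore turns that byte into 0x30. The `flags != status` check mirrors the one in the removed visibilitymap_set() above.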



  [text/x-patch] v37-0009-Initialize-missing-fields-in-CreateExecutorState.patch (1.0K, 10-v37-0009-Initialize-missing-fields-in-CreateExecutorState.patch)
From 52ca0331db0cdf58672562a912de9423217adab9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sun, 1 Mar 2026 16:48:19 -0500
Subject: [PATCH v37 09/15] Initialize missing fields in CreateExecutorState()

d47cbf474ecbd449a4 forgot to initialize a few fields it introduced in
the EState, so do that now.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/executor/execUtils.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index a7955e476f9..cd4d5452cfb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,9 @@ CreateExecutorState(void)
 	estate->es_rteperminfos = NIL;
 	estate->es_plannedstmt = NULL;
 	estate->es_part_prune_infos = NIL;
+	estate->es_part_prune_states = NIL;
+	estate->es_part_prune_results = NIL;
+	estate->es_unpruned_relids = NULL;
 
 	estate->es_junkFilter = NULL;
 
-- 
2.43.0



  [text/x-patch] v37-0010-Track-which-relations-are-modified-by-a-query.patch (5.5K, 11-v37-0010-Track-which-relations-are-modified-by-a-query.patch)
From 8acf2bbb878ce445a061e0ab18edcd6b66099e55 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v37 10/15] Track which relations are modified by a query

Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether or not the scan allows
setting the visibility map during on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c  | 18 ++++++++++++++++++
 src/backend/executor/execUtils.c | 31 +++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 58 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..57dcdeda056 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -920,6 +920,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -990,6 +994,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3033,6 +3041,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_output_cid = parentestate->es_output_cid;
 	rcestate->es_queryEnv = parentestate->es_queryEnv;
 
+	/*
+	 * Use a deep copy to avoid stale pointers since bms_add_member() may
+	 * reallocate the bitmap.
+	 */
+	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
+
 	/*
 	 * ResultRelInfos needed by subplans are initialized from scratch when the
 	 * subplans themselves are initialized.
@@ -3165,6 +3179,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cd4d5452cfb..0f8364b8720 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -123,6 +123,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -871,6 +873,33 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -896,6 +925,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..05f032baeaa 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -703,6 +703,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..610385df12b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -679,6 +679,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
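Patch 0010 records every RT index the query may modify in es_modified_relids. In assert-enabled builds, it also cross-checks that set against the opened result relations and rowmarks. The C sketch below models that bookkeeping.

It makes several simplifications and assumptions:

- A fixed 64-bit mask stands in for PostgreSQL's variable-length Bitmapset, so only RT indexes below 64 are representable.
- Every name in the sketch is hypothetical.
- scan_may_set_vm() only illustrates the policy the commit message anticipates for a later patch: do not set the VM on relations the query will modify. It is not code from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical fixed-width stand-in for Bitmapset: bit n set means RT
 * index n is a member.  Only RT indexes 0..63 are representable.
 */
typedef struct RelidSet
{
    uint64_t    words;
} RelidSet;

static void
relid_add(RelidSet *set, unsigned rti)
{
    assert(rti < 64);
    set->words |= UINT64_C(1) << rti;
}

static bool
relid_is_member(const RelidSet *set, unsigned rti)
{
    return rti < 64 && ((set->words >> rti) & 1) != 0;
}

/* True if every member of a is also a member of b. */
static bool
relid_is_subset(const RelidSet *a, const RelidSet *b)
{
    return (a->words & ~b->words) == 0;
}

/*
 * Hypothetical policy: a scan may set VM bits during on-access pruning
 * only if the query will not modify the relation afterwards.
 */
static bool
scan_may_set_vm(const RelidSet *modified, unsigned rti)
{
    return !relid_is_member(modified, rti);
}

/*
 * Same shape as CrossCheckModifiedRelids(): build the expected set from
 * result relations and rowmarked RT indexes (1-based, so the rowmark for
 * rti lives at has_rowmark[rti - 1]) and require it to be a subset of
 * the recorded set.
 */
static bool
cross_check_modified(const RelidSet *modified,
                     const unsigned *result_rtis, int nresult,
                     const bool *has_rowmark, unsigned range_table_size)
{
    RelidSet    expected = {0};

    for (int i = 0; i < nresult; i++)
        relid_add(&expected, result_rtis[i]);
    for (unsigned rti = 1; rti <= range_table_size; rti++)
        if (has_rowmark[rti - 1])
            relid_add(&expected, rti);
    return relid_is_subset(&expected, modified);
}
```

Suppose RT indexes 1 and 3 are recorded as modified. Then a scan of RT index 2 could set the VM, but a scan of RT index 1 could not. The cross-check fails if a rowmark exists on an RT index that was never added to the set, such as RT index 2.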



  [text/x-patch] v37-0011-Thread-flags-through-begin-scan-APIs.patch (21.5K, 12-v37-0011-Thread-flags-through-begin-scan-APIs.patch)
From b1cf736a37b4c5ba7bb390585a7002b969b8abeb Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v37 11/15] Thread flags through begin-scan APIs

Add a flags parameter to the index_fetch_begin() table AM callback and
the begin-scan helpers so the executor can pass context for building
scan descriptors. This introduces an extension point for follow-up work
to mark relations as read-only for the current query, without changing
behavior in this patch.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 ++-
 src/backend/access/gin/gininsert.c        |  3 ++-
 src/backend/access/heap/heapam_handler.c  |  6 +++---
 src/backend/access/index/genam.c          |  4 ++--
 src/backend/access/index/indexam.c        |  6 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        |  7 ++++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 ++++----
 src/backend/commands/typecmds.c           |  4 ++--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 ++++----
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  2 +-
 src/backend/executor/nodeIndexscan.c      |  4 ++--
 src/backend/executor/nodeSeqscan.c        |  6 +++---
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  2 +-
 src/include/access/tableam.h              | 17 +++++++++--------
 22 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 1909c3254b5..a221e032f5d 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c7e38dbe193..d48c85e895c 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2061,7 +2061,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index b6ed5938477..f4b169e2c04 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,7 +79,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -761,7 +761,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -770,7 +770,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 43f64a0e721..1827208396c 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 69ef1527e06..bc4eedba4ac 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1927,7 +1927,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..b3aeee36ce6 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d6ef7275a64..900199dbe29 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1159,7 +1159,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 85242dcc245..09796fa4307 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6388,7 +6388,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13765,7 +13765,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22667,7 +22667,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23131,7 +23131,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index e0b6df64767..b3b6da3d7e4 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -108,7 +108,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c2d09374517..cf4d9a4f832 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a616abff04c..a7af2f6628a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -111,7 +111,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index af3c788ce8b..d9d7ec0516a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -374,7 +374,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -407,5 +407,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..5b2165c267d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7161,7 +7161,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4c0429cc613..3934fa44793 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,7 +156,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..e881e4f82a0 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -420,7 +420,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +894,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +939,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -1139,7 +1139,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1175,7 +1176,7 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1186,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v37-0012-Pass-down-information-on-table-modification-to-s.patch (8.0K, 13-v37-0012-Pass-down-information-on-table-modification-to-s.patch)
From 1a3685e4cf28fa0668861e3e5b25cfa7cb216c85 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v37 12/15] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, and bitmap
table scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 +++++++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 +++++++-
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++--
 src/backend/executor/nodeSeqscan.c        | 26 ++++++++++++++++++++---
 src/include/access/heapam.h               |  6 ++++++
 src/include/access/tableam.h              |  3 +++
 7 files changed, 65 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index f4b169e2c04..098ca32fa84 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -86,6 +86,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index b3b6da3d7e4..9bcf9a68183 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -104,11 +104,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index cf4d9a4f832..2fe724a323f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -84,6 +84,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -94,7 +100,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a7af2f6628a..8730dab7469 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -198,6 +205,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -207,7 +220,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index d9d7ec0516a..336354922a2 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags = SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags = SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f77a00291bb..caa5e9b4206 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -130,6 +130,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e881e4f82a0..51dfd122307 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0



  [text/x-patch] v37-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.8K, 14-v37-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 56278ea42704a13fa6af34bb3dbb797170080e8b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v37 13/15] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 41 +++++++++++++++----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 ++++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0d6e3bc7884..abc6fe904fb 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 098ca32fa84..b8a2010c188 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -147,7 +147,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2542,7 +2543,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 94be0348509..9f545a1eaf2 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -213,7 +215,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -233,7 +236,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -315,6 +319,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = 0;
+			if (rel_read_only)
+				params.options = HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -371,6 +377,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -873,21 +880,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1103,7 +1126,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8bd178ae7e6..d2cae77b52a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2008,7 +2008,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index caa5e9b4206..21b640d459c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -96,7 +97,8 @@ typedef struct HeapScanDescData
 
 	/*
 	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * corresponding page in the visibility map. If the relation is not
+	 * modified by the query, on-access pruning may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -128,7 +130,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -439,7 +445,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v37-0014-Avoid-BufferGetPage-calls-in-heap_update.patch (5.6K, 15-v37-0014-Avoid-BufferGetPage-calls-in-heap_update.patch)
From cb6b3b22f2a1b56ee9bdda8fd605ab2c956555b3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 6 Mar 2026 16:46:01 -0500
Subject: [PATCH v37 14/15] Avoid BufferGetPage() calls in heap_update()

BufferGetPage() isn't cheap and heap_update() calls it multiple times
when it could just save the page from a single call. Do that.
While we are at it, use separate variables for the old and new pages in
heap_xlog_update(). It's confusing to reuse "page" for both.
---
 src/backend/access/heap/heapam.c      | 17 ++++++++------
 src/backend/access/heap/heapam_xlog.c | 34 ++++++++++++++-------------
 2 files changed, 28 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index abc6fe904fb..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3339,7 +3339,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
 	HeapTuple	heaptup;
 	HeapTuple	old_key_tuple = NULL;
 	bool		old_key_copied = false;
-	Page		page;
+	Page		page,
+				newpage;
 	BlockNumber block;
 	MultiXactStatus mxact_status;
 	Buffer		buffer,
@@ -4065,6 +4066,8 @@ l2:
 		heaptup = newtup;
 	}
 
+	newpage = BufferGetPage(newbuf);
+
 	/*
 	 * We're about to do the actual update -- check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -4179,17 +4182,17 @@ l2:
 	oldtup.t_data->t_ctid = heaptup->t_self;
 
 	/* clear PD_ALL_VISIBLE flags, reset all visibilitymap bits */
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
-	if (newbuf != buffer && PageIsAllVisible(BufferGetPage(newbuf)))
+	if (newbuf != buffer && PageIsAllVisible(newpage))
 	{
 		all_visible_cleared_new = true;
-		PageClearAllVisible(BufferGetPage(newbuf));
+		PageClearAllVisible(newpage);
 		visibilitymap_clear(relation, BufferGetBlockNumber(newbuf),
 							vmbuffer_new, VISIBILITYMAP_VALID_BITS);
 	}
@@ -4220,9 +4223,9 @@ l2:
 								 all_visible_cleared_new);
 		if (newbuf != buffer)
 		{
-			PageSetLSN(BufferGetPage(newbuf), recptr);
+			PageSetLSN(newpage, recptr);
 		}
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		PageSetLSN(page, recptr);
 	}
 
 	END_CRIT_SECTION();
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a83f6b03d69..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -685,7 +685,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	ItemPointerData newtid;
 	Buffer		obuffer,
 				nbuffer;
-	Page		page;
+	Page		opage,
+				npage;
 	OffsetNumber offnum;
 	ItemId		lp;
 	HeapTupleData oldtup;
@@ -749,15 +750,15 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 									  &obuffer);
 	if (oldaction == BLK_NEEDS_REDO)
 	{
-		page = BufferGetPage(obuffer);
+		opage = BufferGetPage(obuffer);
 		offnum = xlrec->old_offnum;
-		if (offnum < 1 || offnum > PageGetMaxOffsetNumber(page))
+		if (offnum < 1 || offnum > PageGetMaxOffsetNumber(opage))
 			elog(PANIC, "offnum out of range");
-		lp = PageGetItemId(page, offnum);
+		lp = PageGetItemId(opage, offnum);
 		if (!ItemIdIsNormal(lp))
 			elog(PANIC, "invalid lp");
 
-		htup = (HeapTupleHeader) PageGetItem(page, lp);
+		htup = (HeapTupleHeader) PageGetItem(opage, lp);
 
 		oldtup.t_data = htup;
 		oldtup.t_len = ItemIdGetLength(lp);
@@ -776,12 +777,12 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		htup->t_ctid = newtid;
 
 		/* Mark the page as a candidate for pruning */
-		PageSetPrunable(page, XLogRecGetXid(record));
+		PageSetPrunable(opage, XLogRecGetXid(record));
 
 		if (xlrec->flags & XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED)
-			PageClearAllVisible(page);
+			PageClearAllVisible(opage);
 
-		PageSetLSN(page, lsn);
+		PageSetLSN(opage, lsn);
 		MarkBufferDirty(obuffer);
 	}
 
@@ -796,8 +797,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 	else if (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE)
 	{
 		nbuffer = XLogInitBufferForRedo(record, 0);
-		page = BufferGetPage(nbuffer);
-		PageInit(page, BufferGetPageSize(nbuffer), 0);
+		npage = BufferGetPage(nbuffer);
+		PageInit(npage, BufferGetPageSize(nbuffer), 0);
 		newaction = BLK_NEEDS_REDO;
 	}
 	else
@@ -829,10 +830,10 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		recdata = XLogRecGetBlockData(record, 0, &datalen);
 		recdata_end = recdata + datalen;
 
-		page = BufferGetPage(nbuffer);
+		npage = BufferGetPage(nbuffer);
 
 		offnum = xlrec->new_offnum;
-		if (PageGetMaxOffsetNumber(page) + 1 < offnum)
+		if (PageGetMaxOffsetNumber(npage) + 1 < offnum)
 			elog(PANIC, "invalid max offset number");
 
 		if (xlrec->flags & XLH_UPDATE_PREFIX_FROM_OLD)
@@ -909,16 +910,17 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		/* Make sure there is no forward chain link in t_ctid */
 		htup->t_ctid = newtid;
 
-		offnum = PageAddItem(page, htup, newlen, offnum, true, true);
+		offnum = PageAddItem(npage, htup, newlen, offnum, true, true);
 		if (offnum == InvalidOffsetNumber)
 			elog(PANIC, "failed to add tuple");
 
 		if (xlrec->flags & XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED)
-			PageClearAllVisible(page);
+			PageClearAllVisible(npage);
 
-		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+		/* needed to update FSM below */
+		freespace = PageGetHeapFreeSpace(npage);
 
-		PageSetLSN(page, lsn);
+		PageSetLSN(npage, lsn);
 		MarkBufferDirty(nbuffer);
 	}
 
-- 
2.43.0



  [text/x-patch] v37-0015-Set-pd_prune_xid-on-insert.patch (10.4K, 16-v37-0015-Set-pd_prune_xid-on-insert.patch)
From ad64476c5afb9180ae99bf202681833d9dbbdfbe Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v37 15/15] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to run the first time a page
filled with newly inserted tuples is read, and to set that page
all-visible in the VM.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect additional buffer
hits in some steps. Because pd_prune_xid is now set by the fill/insert
step, on-access pruning can already happen during the first access step
(before the DELETE), and that is when the VM is extended. After the
DELETE, the next access therefore hits the existing VM block instead of
extending it, so one more buffer hit is counted for the table.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 40 ++++++++++++-------
 src/backend/access/heap/heapam_xlog.c         | 19 ++++++++-
 src/backend/access/heap/pruneheap.c           | 14 +++----
 .../modules/index/expected/killtuples.out     |  8 ++--
 4 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..c199646b25d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9f545a1eaf2..9e51c961c3c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1849,16 +1849,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark the page all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-15 19:10                 ` Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-15 19:10 UTC
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Mar 11, 2026 at 1:01 PM Melanie Plageman
<[email protected]> wrote:
>
> On Fri, Mar 6, 2026 at 6:33 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > Thanks for the review! Attached is v36. I've pushed some of the early
> > patches in the set and this is what is left.
>
> I've gone ahead and pushed another of the introductory commits.
> Attached v37 has the remaining patches.

I've pushed a few more of the trivial commits in the set. Attached v38
has the remaining patches.

- Melanie


Attachments:

  [text/x-patch] v38-0001-Fix-visibility-map-corruption-in-more-cases.patch (18.6K, 2-v38-0001-Fix-visibility-map-corruption-in-more-cases.patch)
From 0ca92d2ccee0e589a35a79f9046c3a7900ecacf4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v38 01/12] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples, or tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap compared
to the cost of the pruning itself. The vmbuffer is saved in the scan
descriptor, so a query should need to pin each VM page only once, and a
single VM page covers a large number of heap pages.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 176 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..52cafb23c6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -168,6 +183,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +191,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +226,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +295,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +375,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +797,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page cannot have dead items nor can
+ * it have tuples that are not visible to all running transactions. It clears
+ * the VM corruption as well as resetting the vmbits used during pruning.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear. However, for vacuum, it's possible
+		 * that the bit got cleared after heap_vac_scan_next_block() was
+		 * called, so we must recheck now that we have the buffer lock before
+		 * concluding that the VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set"),
+				 errcontext("relation \"%s\", page %u",
+							relname, prstate->block)));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +941,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1088,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1411,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1422,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1512,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1660,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1676,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1704,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1771,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..957322648ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -425,11 +425,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1964,81 +1959,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,6 +1990,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2179,18 +2100,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..c649e5f1980 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits holds this heap page's VM bits as read from vmbuffer at the
+	 * start of pruning. It is cleared if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0



  [text/x-patch] v38-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (4.5K, 3-v38-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From 4ebe52f1b060db395d8abe5255ea1a86ed4fdc4a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v38 02/12] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work to do. To avoid the cost of a full pass over such pages, we
now exit early if the page is already all-frozen, or if it is all-visible
and no freezing will be attempted.
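
A minimal sketch of the bypass decision and the live-tuple count it
performs, using a simplified page model. All names here
(SKETCH_VM_ALL_VISIBLE, SKETCH_VM_ALL_FROZEN, SketchLpFlag,
sketch_can_bypass(), sketch_count_live()) are hypothetical stand-ins, not
PostgreSQL APIs; the bit values are assumed to mirror
VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, assumed to mirror VISIBILITYMAP_ALL_VISIBLE/_ALL_FROZEN. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

/* Hypothetical line pointer states, loosely modeled on LP_UNUSED and friends. */
typedef enum
{
	SK_LP_UNUSED,
	SK_LP_NORMAL,
	SK_LP_REDIRECT,
	SK_LP_DEAD
} SketchLpFlag;

/*
 * True if there is no pruning or freezing work left: the page is already
 * all-frozen, or it is all-visible and freezing will not be attempted.
 */
static bool
sketch_can_bypass(uint8_t vmbits, bool attempt_freeze)
{
	return (vmbits & SKETCH_VM_ALL_FROZEN) != 0 ||
		((vmbits & SKETCH_VM_ALL_VISIBLE) != 0 && !attempt_freeze);
}

/* On the fast path, every normal line pointer is counted as a live tuple. */
static int
sketch_count_live(const SketchLpFlag *items, int nitems)
{
	int			live = 0;

	assert(nitems >= 0);
	for (int i = 0; i < nitems; i++)
	{
		if (items[i] == SK_LP_NORMAL)
			live++;
	}
	return live;
}
```

Note that an all-visible page without freezing is bypassed, but the same
page is still processed when freezing is attempted, because it may contain
unfrozen tuples.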

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 75 +++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52cafb23c6b..a4a0a916f61 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -184,6 +184,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -882,6 +883,68 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * full cost of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->vmbits = prstate->vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	presult->hastup = true;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -945,6 +1008,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can exit early. Do this after fixing any
+	 * discrepancy between the page-level visibility hint and the VM.
+	 */
+	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
-- 
2.43.0



  [text/x-patch] v38-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.4K, 4-v38-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 07396958d6588cd82ac420555b4d4b25194ced2d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v38 03/12] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare cases where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page, so that it is performed only once
per page. This is safe because visibility_cutoff_xid records the newest
live xmin on the page; if it is globally visible, every older live xmin
is too, so the entire page is all-visible.
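
To illustrate why one check per page is enough, here is a minimal sketch
under simplifying assumptions: XIDs are plain 32-bit counters compared
modulo 2^32 (as TransactionIdFollows() does for normal XIDs), and
GlobalVisState is reduced to a single hypothetical horizon,
oldest_running, such that only XIDs preceding it are known not to be
running. All names are hypothetical. Because "maybe running" is monotonic
in the XID under this model, finding that the newest xmin is not running
implies the same for every older xmin on the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SKETCH_INVALID_XID		0u
#define SKETCH_FIRST_NORMAL_XID	3u

static bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is newer than b" for normal XIDs. */
static bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) > 0;
}

/* Hypothetical, much simplified stand-in for GlobalVisState. */
typedef struct
{
	SketchXid	oldest_running;
} SketchVisState;

/* Any XID that does not precede oldest_running may still be running. */
static bool
sketch_xid_maybe_running(const SketchVisState *vis, SketchXid xid)
{
	return !sketch_xid_follows(vis->oldest_running, xid);
}

/* Track the newest xmin while scanning, then check the horizon once. */
static bool
sketch_page_all_visible(const SketchXid *xmins, int n,
						const SketchVisState *vis)
{
	SketchXid	newest = SKETCH_INVALID_XID;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
	{
		if (!sketch_xid_is_normal(xmins[i]))
			continue;			/* e.g. a frozen xmin: visible to everyone */
		if (newest == SKETCH_INVALID_XID ||
			sketch_xid_follows(xmins[i], newest))
			newest = xmins[i];
	}

	/* If the newest xmin is visible to everyone, every older one is too. */
	return !(sketch_xid_is_normal(newest) &&
			 sketch_xid_maybe_running(vis, newest));
}
```

The per-tuple loop only does cheap XID comparisons; the potentially more
expensive horizon test runs once, after the loop.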

Using GlobalVisState also means on-access pruning can maintain
visibility_cutoff_xid, which will enable us to set the visibility map on
access in the future. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..d70fab3a763 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used to decide whether an XID is removable.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a4a0a916f61..05fe3deeb95 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1028,6 +1028,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1695,29 +1706,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 957322648ca..ad85e1e1738 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -461,13 +461,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2054,13 +2054,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2816,7 +2813,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3577,14 +3574,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3605,7 +3602,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3624,7 +3621,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3705,7 +3702,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3714,16 +3711,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3752,6 +3750,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c649e5f1980..bbb223dd0d2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -479,6 +479,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0



  [text/x-patch] v38-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.8K, 5-v38-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From a3211750778f6a8bec42edd25f5763e2ae31d21c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v38 04/12] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page so that, if the page turns out to be all-visible (visible to all
running and future transactions), we can use it later as the snapshot
conflict horizon when setting the VM.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer, since it guarantees the value is never stale.

Since we now keep it up to date unconditionally, there's no reason not
to maintain set_all_visible during on-access pruning as well. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05fe3deeb95..01c19ca8796 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,11 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -174,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -435,53 +432,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained, but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. When freezing is not attempted, set_all_frozen must start out
+	 * false, since we will not call heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -711,7 +690,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -966,9 +944,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set.  presult->set_all_frozen is always set to false when the
+ * HEAP_PAGE_PRUNE_FREEZE option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1034,9 +1011,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1187,7 +1164,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1647,6 +1624,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1694,32 +1672,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ad85e1e1738..23402e7e26c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,7 +463,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -471,7 +471,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2789,7 +2789,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2815,14 +2815,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2863,7 +2863,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3576,7 +3576,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3584,7 +3584,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3607,7 +3607,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3625,7 +3625,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3635,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3724,9 +3724,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3756,8 +3756,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0
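For readers following the visibility_cutoff_xid to newest_live_xid rename, the loop that fills `*newest_live_xid` can be modeled outside the server. The following is a standalone sketch, not PostgreSQL code. `xid_is_normal()` and `xid_follows()` are hypothetical stand-ins written for this example. They mimic `TransactionIdIsNormal()` and `TransactionIdFollows()`: normal XIDs are compared modulo 2^32, and permanent XIDs are compared numerically. The tuple xmins are assumed to arrive as a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FrozenTransactionId      ((TransactionId) 2)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical stand-in for TransactionIdIsNormal() */
static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/*
 * Hypothetical stand-in for TransactionIdFollows(): circular comparison
 * for normal XIDs, plain numeric comparison if either XID is permanent.
 */
static bool
xid_follows(TransactionId xid1, TransactionId xid2)
{
	if (!xid_is_normal(xid1) || !xid_is_normal(xid2))
		return xid1 > xid2;
	return (int32_t) (xid1 - xid2) > 0;
}

/* Newest normal xmin among the given live tuples, or InvalidTransactionId */
static TransactionId
newest_live_xid(const TransactionId *xmins, size_t n)
{
	TransactionId newest = InvalidTransactionId;

	assert(xmins != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		TransactionId xmin = xmins[i];

		/* Same test as the patch: newer than current, and a normal XID */
		if (xid_follows(xmin, newest) && xid_is_normal(xmin))
			newest = xmin;
	}
	return newest;
}
```

Because the comparison is circular, an xmin of 5 is treated as newer than 0xFFFFFFF0, and the frozen XID (2) can never become the result because it is not a normal XID.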



  [text/x-patch] v38-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (24.7K, 6-v38-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 532fa6da3a3b691f0cafcc18a57ae2251a8a7725 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v38 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 263 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 107 +----------
 src/include/access/heapam.h          |  38 ++--
 3 files changed, 208 insertions(+), 200 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 01c19ca8796..a127e29144e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -209,7 +213,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -375,9 +379,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -840,7 +845,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -858,7 +863,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for prstate->block, using information from the PruneState and the VM.
+ *
+ * This function does not actually set the VM bits or the page-level
+ * visibility hint, PD_ALL_VISIBLE; it only computes prstate->new_vmbits.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -887,15 +928,13 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->vmbits = prstate->vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -925,7 +964,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -940,12 +980,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -973,15 +1011,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -990,8 +1030,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * is not being attempted, we can exit early. Do this after fixing any
 	 * discrepancy between the page-level visibility hint and the VM.
 	 */
-	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	if (prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
 	{
 		heap_page_bypass_prune_freeze(&prstate, presult);
 		return;
@@ -1061,6 +1101,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * If we do not intend to set the VM, new_vmbits should be 0,
+	 * regardless of whether or not the page is all-visible.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1082,14 +1146,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1103,6 +1170,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1110,29 +1198,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1142,33 +1213,67 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * If the page consists entirely of frozen tuples, is not marked
+		 * all-frozen in the VM, and HEAP_PAGE_PRUNE_FREEZE was not passed,
+		 * heap_page_is_all_visible() may find the page completely frozen
+		 * even though prstate.set_all_frozen is false. So only check that
+		 * set_all_frozen implies debug_all_frozen, not the converse.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->new_all_visible_pages = 0;
+	presult->new_all_frozen_pages = 0;
+	presult->new_all_visible_frozen_pages = 0;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23402e7e26c..6b5210d6393 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,13 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1996,8 +1989,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2038,29 +2029,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2081,6 +2049,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Record whether the page was newly set all-frozen in the VM */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2094,71 +2070,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3572,7 +3483,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bbb223dd0d2..f77a00291bb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -264,7 +264,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -280,8 +281,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -315,26 +315,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Whether the page was newly set all-visible, all-visible and all-frozen,
+	 * or all-frozen in the VM during phase I of vacuuming (each is 0 or 1).
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -471,7 +457,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *newest_live_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
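With patch 0005, the decision about whether to touch the VM, and the bookkeeping for VACUUM's counters, move into pruning. Below is a simplified, self-contained sketch of that logic. It assumes the bit values of `VISIBILITYMAP_ALL_VISIBLE` (0x01) and `VISIBILITYMAP_ALL_FROZEN` (0x02). `vm_bits_to_set()`, `count_vm_change()`, `VmCounts`, and `PruneReasonSketch` are hypothetical names for illustration and do not exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_ALL_VISIBLE 0x01		/* assumed to match VISIBILITYMAP_ALL_VISIBLE */
#define VM_ALL_FROZEN  0x02		/* assumed to match VISIBILITYMAP_ALL_FROZEN */

typedef enum
{
	REASON_ON_ACCESS,
	REASON_VACUUM_SCAN
} PruneReasonSketch;

typedef struct
{
	int			new_all_visible_pages;
	int			new_all_visible_frozen_pages;
	int			new_all_frozen_pages;
} VmCounts;

/*
 * Modeled on heap_page_will_set_vm(): return the VM bits to set, or 0 if
 * the VM should be left alone.
 */
static uint8_t
vm_bits_to_set(PruneReasonSketch reason, bool set_all_visible,
			   bool set_all_frozen, uint8_t old_vmbits)
{
	uint8_t		new_vmbits;

	assert(!set_all_frozen || set_all_visible);

	/* On-access pruning does not set the VM in this patch */
	if (reason == REASON_ON_ACCESS || !set_all_visible)
		return 0;

	new_vmbits = VM_ALL_VISIBLE;
	if (set_all_frozen)
		new_vmbits |= VM_ALL_FROZEN;

	/* The VM already holds these bits: nothing to set or WAL-log */
	if (new_vmbits == old_vmbits)
		return 0;
	return new_vmbits;
}

/* Modeled on the new_all_* bookkeeping at the end of pruning */
static void
count_vm_change(uint8_t old_vmbits, uint8_t new_vmbits, VmCounts *counts)
{
	if (new_vmbits == 0)
		return;

	if ((old_vmbits & VM_ALL_VISIBLE) == 0)
	{
		counts->new_all_visible_pages++;
		if (new_vmbits & VM_ALL_FROZEN)
			counts->new_all_visible_frozen_pages++;
	}
	else if ((old_vmbits & VM_ALL_FROZEN) == 0 &&
			 (new_vmbits & VM_ALL_FROZEN))
		counts->new_all_frozen_pages++;
}
```

Returning 0 when the bits would not change is what allows the caller to skip both the VM update and registering the VM buffer in the WAL record.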



  [text/x-patch] v38-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 7-v38-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From db4cc2361ccc446b54df3d2d5afde70f6869dde1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v38 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6b5210d6393..1451c943644 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1903,9 +1903,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0
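Patches 0005 and 0006 both rely on a single snapshot conflict horizon per XLOG_HEAP2_PRUNE_VACUUM_SCAN record, taken as the newest XID required by any change the record carries. The sketch below models that combination. `record_conflict_xid()` and `xid_follows()` are hypothetical helpers written for this example (the latter mimicking `TransactionIdFollows()`), not functions from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical stand-in for TransactionIdFollows() */
static bool
xid_follows(TransactionId xid1, TransactionId xid2)
{
	if (xid1 < FirstNormalTransactionId || xid2 < FirstNormalTransactionId)
		return xid1 > xid2;
	return (int32_t) (xid1 - xid2) > 0;
}

/*
 * Combine the per-change horizons into one conflict horizon for the
 * record: the newest XID required by any change the record includes.
 */
static TransactionId
record_conflict_xid(bool do_set_vm, TransactionId newest_live_xid,
					bool do_freeze, TransactionId freeze_conflict_xid,
					bool do_prune, TransactionId latest_xid_removed)
{
	TransactionId conflict_xid = InvalidTransactionId;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;
	return conflict_xid;
}
```

For an empty page, as in 0006, there are no live tuples and nothing is pruned or frozen, so the result is InvalidTransactionId, the same value the patch passes directly to log_heap_prune_and_freeze().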



  [text/x-patch] v38-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 8-v38-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 97a248e7711eaed31954dd7089790e3369b0c58a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v38 07/12] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

This changes the visibility map API, so any external code that emits or
consumes the VM-only WAL record will need to be updated.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d9042e1f91d 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a127e29144e..9b5a0726f2b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1187,8 +1187,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1451c943644..8bd178ae7e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1926,11 +1926,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2794,9 +2794,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..f1da52b2069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -222,112 +221,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..66ed51a8aa1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ec8513d90b5..4c7ce9bd4b5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4389,7 +4389,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
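Patch 0007 deletes the last code that set VM bits through the old visibilitymap_set() and heap_xlog_visible(). For readers less familiar with the visibility map, here is a minimal standalone sketch (not PostgreSQL code) of the bit addressing those functions relied on via HEAPBLK_TO_MAPBLOCK/MAPBYTE/OFFSET, and of why the removed replay code masked off the WAL-only VISIBILITYMAP_XLOG_CATALOG_REL bit before touching the map. It assumes the default 8 kB block size with a 24-byte page header, i.e. 8168 usable map bytes per VM page. The sketch_* names are hypothetical. Unlike the removed function, which compared the requested flags with the current status, this simplified variant reports whether the map byte actually changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in visibilitymapdefs.h before this patch. */
#define VISIBILITYMAP_ALL_VISIBLE      0x01
#define VISIBILITYMAP_ALL_FROZEN       0x02
#define VISIBILITYMAP_VALID_BITS       0x03
#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04	/* WAL-only bit, removed by 0007 */

/* Assumption: BLCKSZ = 8192 and a 24-byte (MAXALIGNed) page header. */
#define SKETCH_MAPSIZE             (8192 - 24)
#define SKETCH_BITS_PER_HEAPBLOCK  2
#define SKETCH_HEAPBLOCKS_PER_BYTE 4
#define SKETCH_HEAPBLOCKS_PER_PAGE (SKETCH_MAPSIZE * SKETCH_HEAPBLOCKS_PER_BYTE)

/* Which VM page holds the bits for heapBlk. */
static uint32_t
sketch_mapblock(uint32_t heapBlk)
{
	return heapBlk / SKETCH_HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM page's contents. */
static uint32_t
sketch_mapbyte(uint32_t heapBlk)
{
	return (heapBlk % SKETCH_HEAPBLOCKS_PER_PAGE) / SKETCH_HEAPBLOCKS_PER_BYTE;
}

/* Which bit offset within that byte (0, 2, 4 or 6). */
static uint8_t
sketch_offset(uint32_t heapBlk)
{
	return (uint8_t) ((heapBlk % SKETCH_HEAPBLOCKS_PER_BYTE) * SKETCH_BITS_PER_HEAPBLOCK);
}

/*
 * OR the VM bits for heapBlk into one VM page's contents ("map").  xlog_flags
 * may carry the WAL-only catalog bit; it is masked off first, as the removed
 * heap_xlog_visible() did, so it cannot spill into a neighbouring block's
 * bits.  Returns true if the byte changed (caller would dirty and log it).
 */
static bool
sketch_vm_set(uint8_t *map, uint32_t heapBlk, uint8_t xlog_flags)
{
	uint8_t		flags = xlog_flags & VISIBILITYMAP_VALID_BITS;
	uint32_t	mapByte = sketch_mapbyte(heapBlk);
	uint8_t		mapOffset = sketch_offset(heapBlk);
	uint8_t		old = map[mapByte];

	/* Must never set all-frozen without also setting all-visible. */
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	map[mapByte] |= (uint8_t) (flags << mapOffset);
	return map[mapByte] != old;
}

/* Read back the two VM bits for heapBlk. */
static uint8_t
sketch_vm_get(const uint8_t *map, uint32_t heapBlk)
{
	return (map[sketch_mapbyte(heapBlk)] >> sketch_offset(heapBlk)) &
		VISIBILITYMAP_VALID_BITS;
}
```

With these assumptions one VM page covers 32672 heap blocks, so heap block 5 lives in byte 1 of VM page 0 at bit offset 2.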



  [text/x-patch] v38-0008-Track-which-relations-are-modified-by-a-query.patch (5.8K, 9-v38-0008-Track-which-relations-are-modified-by-a-query.patch)
From f707d345f3ee43a9b5e914e4d496c83485ea380b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v38 08/12] Track which relations are modified by a query

Save the relids of modified relations in a bitmap in the executor state.
A later commit will pass this information down to scan nodes to control
whether or not on-access pruning is allowed to set the visibility map.
Setting the visibility map during a scan is counterproductive if the
query is going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c  | 18 ++++++++++++++++++
 src/backend/executor/execUtils.c | 31 +++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 58 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..57dcdeda056 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -920,6 +920,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -990,6 +994,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3033,6 +3041,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_output_cid = parentestate->es_output_cid;
 	rcestate->es_queryEnv = parentestate->es_queryEnv;
 
+	/*
+	 * Use a deep copy to avoid stale pointers since bms_add_member() may
+	 * reallocate the bitmap.
+	 */
+	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
+
 	/*
 	 * ResultRelInfos needed by subplans are initialized from scratch when the
 	 * subplans themselves are initialized.
@@ -3165,6 +3179,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cd4d5452cfb..0f8364b8720 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -123,6 +123,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -871,6 +873,33 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -896,6 +925,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 82c442d23f8..1411d5276ca 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -705,6 +705,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..610385df12b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -679,6 +679,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE/SHARE.
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
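In patch 0008, the EvalPlanQualStart() hunk copies es_modified_relids with bms_copy() because bms_add_member() may reallocate the set, which would leave a second holder of the old pointer dangling. The sketch below imitates that behaviour with a tiny growable bitmap and also shows the subset test that CrossCheckModifiedRelids() asserts. The sketch_bms_* names and the SketchBms layout are hypothetical and are not PostgreSQL's Bitmapset API; the real implementation uses palloc and its own word size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable bitmap; NULL represents the empty set, as with Bitmapset. */
typedef struct SketchBms
{
	int			nwords;
	uint64_t	words[];		/* flexible array member */
} SketchBms;

/* Add member x.  May realloc(), so callers must use the returned pointer. */
static SketchBms *
sketch_bms_add_member(SketchBms *a, int x)
{
	int			wordnum = x / 64;

	if (a == NULL || wordnum >= a->nwords)
	{
		int			oldwords = a ? a->nwords : 0;
		SketchBms  *b = realloc(a, sizeof(SketchBms) + (wordnum + 1) * sizeof(uint64_t));

		assert(b != NULL);
		/* Zero only the newly added words. */
		memset(&b->words[oldwords], 0, (wordnum + 1 - oldwords) * sizeof(uint64_t));
		b->nwords = wordnum + 1;
		a = b;					/* old pointer may now be invalid */
	}
	a->words[wordnum] |= UINT64_C(1) << (x % 64);
	return a;
}

static bool
sketch_bms_is_member(int x, const SketchBms *a)
{
	return a != NULL && x / 64 < a->nwords &&
		((a->words[x / 64] >> (x % 64)) & 1);
}

/* True if every member of a is also a member of b. */
static bool
sketch_bms_is_subset(const SketchBms *a, const SketchBms *b)
{
	if (a == NULL)
		return true;
	for (int i = 0; i < a->nwords; i++)
	{
		uint64_t	bw = (b != NULL && i < b->nwords) ? b->words[i] : 0;

		if (a->words[i] & ~bw)
			return false;
	}
	return true;
}

/* Deep copy, so later additions to the copy never touch the original. */
static SketchBms *
sketch_bms_copy(const SketchBms *a)
{
	size_t		size;
	SketchBms  *b;

	if (a == NULL)
		return NULL;
	size = sizeof(SketchBms) + a->nwords * sizeof(uint64_t);
	b = malloc(size);
	assert(b != NULL);
	memcpy(b, a, size);
	return b;
}
```

If the EPQ state had simply shared the parent's pointer, adding a high RT index in the child could realloc() and free the parent's storage; with the copy, each EState owns its own set.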



  [text/x-patch] v38-0009-Thread-flags-through-begin-scan-APIs.patch (28.1K, 10-v38-0009-Thread-flags-through-begin-scan-APIs.patch)
From e501ec27844ae056c9d5b0439e327ded450c9ce2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v38 09/12] Thread flags through begin-scan APIs

Add a flags parameter to the index_fetch_begin() table AM callback and
the begin-scan helpers so the executor can pass context for building
scan descriptors. This introduces an extension point for follow-up work
to mark relations as read-only for the current query, without changing
behavior in this patch.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  6 ++--
 src/backend/access/index/genam.c          |  4 +--
 src/backend/access/index/indexam.c        |  8 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 13 +++++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++---
 src/backend/commands/typecmds.c           |  4 +--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++---
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  6 ++--
 src/backend/executor/nodeIndexscan.c      |  8 +++---
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 ++--
 src/backend/executor/nodeTidrangescan.c   |  6 ++--
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  5 ++--
 src/include/access/heapam.h               |  5 ++--
 src/include/access/tableam.h              | 35 ++++++++++++++---------
 25 files changed, 81 insertions(+), 65 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 1909c3254b5..a221e032f5d 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2842,7 +2842,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 97cea5f7d4e..74243efa74f 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2065,7 +2065,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 42bf73d3138..6122603d11e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,7 +79,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -761,7 +761,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -770,7 +770,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..87219613f0b 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +593,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 69ef1527e06..bc4eedba4ac 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1927,7 +1927,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..e946cfb393a 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -190,12 +191,14 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
 
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
 	/* disable syncscan in parallel tid range scan. */
@@ -248,7 +251,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d6ef7275a64..900199dbe29 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1159,7 +1159,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index cd6d720386f..0455b36c41e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6396,7 +6396,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13965,7 +13965,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22867,7 +22867,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23331,7 +23331,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 74eac93284e..620fc7e259a 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -108,7 +108,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9e8ea8ddf22..aefb792ee6e 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -94,7 +94,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -788,7 +788,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -854,7 +854,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 4513b1f7a90..477cd4fcf99 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -111,7 +111,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -207,7 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1723,7 +1723,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1787,7 +1787,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 1b0af70fd7a..47660baf2fa 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -297,7 +297,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index af3c788ce8b..d9d7ec0516a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -374,7 +374,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -407,5 +407,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 503817da65b..461edb8893b 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -459,7 +459,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -493,5 +493,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..5b2165c267d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7161,7 +7161,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4c0429cc613..9abcc99d6c8 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,7 +156,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -182,7 +182,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f77a00291bb..c2621dc2fac 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,8 +95,9 @@ typedef struct HeapScanDescData
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
 	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map.
 	 */
 	Buffer		rs_vmbuffer;
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..f1065e30638 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -418,9 +418,12 @@ typedef struct TableAmRoutine
 	 * IndexFetchTableData, which the AM will typically embed in a larger
 	 * structure with additional information.
 	 *
+	 * 'flags' is a bitmask of SO_* flags providing hints from the executor
+	 * about the scan context.
+	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +897,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +942,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -957,9 +960,9 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	flags |= SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
 		flags |= SO_ALLOW_STRAT;
@@ -1059,10 +1062,11 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
 
@@ -1139,7 +1143,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1154,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1172,10 +1178,13 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Prepare to fetch tuples from the relation, as needed when fetching tuples
  * for an index scan.
  *
+ * 'flags' is a bitmask of SO_* flags providing hints from the executor about
+ * the scan context.
+ *
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1194,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
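
With this patch, each begin-scan helper ORs its scan-type bits into the caller-supplied `flags` instead of initializing a local variable. The sketch below models that composition to show why passing 0 at every existing call site leaves behavior unchanged. The numeric SO_* values are illustrative: the real ones live in `tableam.h` and may differ, and the value given to `SO_HINT_REL_READ_ONLY` is purely hypothetical. `seqscan_flags()`, `bitmapscan_flags()` and `seqscan_flags_before_patch()` are hypothetical helpers that mirror `table_beginscan()` and `table_beginscan_bm()`.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; see src/include/access/tableam.h for the real ones. */
enum
{
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_TYPE_BITMAPSCAN = 1 << 1,
	SO_ALLOW_STRAT = 1 << 6,
	SO_ALLOW_SYNC = 1 << 7,
	SO_ALLOW_PAGEMODE = 1 << 8,
	SO_HINT_REL_READ_ONLY = 1 << 10	/* hypothetical value */
};

/* Mirrors table_beginscan() after the patch: caller bits are preserved. */
static uint32_t
seqscan_flags(uint32_t flags)
{
	flags |= SO_TYPE_SEQSCAN | SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
	return flags;
}

/* Mirrors table_beginscan_bm() after the patch. */
static uint32_t
bitmapscan_flags(uint32_t flags)
{
	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
	return flags;
}

/* The flag set table_beginscan() computed before callers could pass flags. */
static uint32_t
seqscan_flags_before_patch(void)
{
	return SO_TYPE_SEQSCAN | SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
}
```

Because the type-specific bits are ORed in rather than assigned, a caller-supplied hint survives, and a caller passing 0 gets exactly the old flag set.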



  [text/x-patch] v38-0010-Pass-down-information-on-table-modification-to-s.patch (14.5K, 11-v38-0010-Pass-down-information-on-table-modification-to-s.patch)
From 3a6b08fc3219afd79dc81a5219e6a543d67036f6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v38 10/12] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 ++++++-
 src/backend/executor/nodeIndexonlyscan.c  | 25 +++++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 32 ++++++++++++++++++++---
 src/backend/executor/nodeSamplescan.c     |  8 +++++-
 src/backend/executor/nodeSeqscan.c        | 26 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 24 ++++++++++++++---
 src/include/access/heapam.h               |  6 +++++
 src/include/access/tableam.h              |  3 +++
 9 files changed, 119 insertions(+), 15 deletions(-)
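
The executor derives a hint from `es_modified_relids`, and the heap AM stores the hint's inverse. The sketch below models both sides under simplifying assumptions. A 64-bit mask stands in for the Bitmapset and covers RT indexes 1 to 63 only. The numeric value of `SO_HINT_REL_READ_ONLY` is assumed. `scan_flags_for()` and `modifies_base_rel()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the real definition is in tableam.h. */
#define SO_HINT_REL_READ_ONLY ((uint32_t) 1 << 10)

/*
 * Hypothetical stand-in for es_modified_relids: bit N set means RT index N
 * is modified by the query.
 */
typedef uint64_t ModifiedRelidMask;

/* Executor side: the check each scan node performs before beginning a scan. */
static uint32_t
scan_flags_for(unsigned scanrelid, ModifiedRelidMask modified)
{
	uint32_t	flags = 0;

	assert(scanrelid > 0 && scanrelid < 64);
	if (((modified >> scanrelid) & 1) == 0)
		flags |= SO_HINT_REL_READ_ONLY;
	return flags;
}

/*
 * Table AM side: mirrors heapam_index_fetch_begin(), which stores the
 * inverse of the hint. Without the hint, the relation is assumed modified.
 */
static bool
modifies_base_rel(uint32_t flags)
{
	return !(flags & SO_HINT_REL_READ_ONLY);
}
```

The hint is opt-in. A caller that passes 0, such as the `table_beginscan(..., 0)` calls in `tablecmds.c`, therefore ends up with `modifies_base_rel` set to true. Missing information is treated conservatively.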

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6122603d11e..d35b688d751 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -86,6 +86,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 620fc7e259a..a5ab5e2b37f 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -104,11 +104,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index aefb792ee6e..6d7a32c1cb8 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -84,6 +84,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -94,7 +100,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -761,6 +768,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -782,13 +790,18 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -829,6 +842,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -848,13 +862,18 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 477cd4fcf99..52b7fc46593 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -198,6 +205,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -207,7 +220,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1696,6 +1710,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1717,13 +1732,17 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1762,6 +1781,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1781,13 +1801,17 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 47660baf2fa..62eff19bc4f 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -291,13 +291,19 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) scanstate->ss.ps.plan)->scanrelid,
+						   scanstate->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index d9d7ec0516a..65349ea9c54 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 461edb8893b..7fbdf401734 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,16 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = 0;
+
+			if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+							   estate->es_modified_relids))
+				flags |= SO_HINT_REL_READ_ONLY;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -451,15 +457,21 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -489,9 +501,15 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c2621dc2fac..978ea90ffa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -131,6 +131,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index f1065e30638..57ce94a386f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0



  [text/x-patch] v38-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.5K, 12-v38-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From aa05e68336207dbb64c0468ab3f017f8f66f9e05 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v38 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 46 ++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 +++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d35b688d751..a083b69ffcd 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -147,7 +147,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2542,7 +2543,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9b5a0726f2b..3cdc1a36441 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -213,7 +215,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -233,7 +236,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -315,6 +319,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = 0;
+			if (rel_read_only)
+				params.options = HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -371,6 +377,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -440,9 +447,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -873,21 +879,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're neither pruning nor freezing,
+	 * avoid setting the visibility map if doing so would newly dirty the
+	 * heap page or, if the page is already dirty, would require emitting a
+	 * full-page image (FPI) of the heap page into the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1103,7 +1125,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8bd178ae7e6..d2cae77b52a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2008,7 +2008,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 978ea90ffa2..768d442c39c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -97,7 +98,8 @@ typedef struct HeapScanDescData
 	/*
 	 * For sequential scans, bitmap heap scans, TID range scans, and sample
 	 * scans. The current heap block's corresponding page in the visibility
-	 * map.
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -129,7 +131,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -440,7 +446,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v38-0012-Set-pd_prune_xid-on-insert.patch (10.9K, 13-v38-0012-Set-pd_prune_xid-on-insert.patch)
From fc597950684dad6328114ac0d10f791bc52b53c4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v38 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to run, and possibly set the
page all-visible in the VM, the first time a page filled with newly
inserted tuples is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect a larger number
of buffer hits for some accesses. Since pd_prune_xid is now set by the
fill/insert step, on-access pruning can happen during the first access
step (before the DELETE), which is when the VM is extended. After the
DELETE, the next access hits the existing VM block instead of extending
it, so an additional buffer hit is counted for the table.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 40 ++++++++++++-------
 src/backend/access/heap/heapam_xlog.c         | 19 ++++++++-
 src/backend/access/heap/pruneheap.c           | 17 ++++----
 .../modules/index/expected/killtuples.out     |  8 ++--
 4 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..c199646b25d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple on this page is UPDATEd/DELETEd, the aborted
+	 * tuple would otherwise not be pruned until the next vacuum.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3cdc1a36441..7cb9e1e2aac 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -255,7 +255,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts added new tuples that may now be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1848,16 +1849,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark the page all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-16 14:53                   ` Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-16 14:53 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Sun, Mar 15, 2026 at 3:10 PM Melanie Plageman
<[email protected]> wrote:
>
> I've pushed a few more of the trivial commits in the set. Attached v38
> has the remaining patches.

Looks like cfbot wasn't able to rebase v38 on its own for some reason.
v39 attached.

- Melanie


Attachments:

  [text/x-patch] v39-0001-Fix-visibility-map-corruption-in-more-cases.patch (18.6K, 2-v39-0001-Fix-visibility-map-corruption-in-more-cases.patch)
From c49f30de550eb8f7c87a7ae80435abda3021fa3a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v39 01/12] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap when
compared to the cost of actually pruning. The vmbuffer is saved in the
scan descriptor, so a query should only need to pin each VM page once
and a single VM page covers a large number of heap pages.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 176 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 102 deletions(-)
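
The commit message says that a single VM page covers a large number of heap pages. That claim can be checked with a little arithmetic, using the layout described in visibilitymap.c: two bits per heap block (all-visible and all-frozen), packed into the part of a VM page that follows the page header. The sketch below redoes that arithmetic. It assumes the default 8 kB BLCKSZ and a 24-byte page header that needs no extra alignment padding. The constants and helper names are hypothetical and are not PostgreSQL's own macros. The sketch also counts how often a scan that keeps its last VM buffer pinned, as the scan descriptor does here, would have to re-pin while reading heap blocks in order.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed build parameters (PostgreSQL defaults on common platforms). */
#define SKETCH_BLCKSZ             8192u /* bytes per page */
#define SKETCH_PAGE_HEADER_SIZE   24u   /* page header, already MAXALIGNed */
#define SKETCH_BITS_PER_HEAPBLOCK 2u    /* all-visible + all-frozen */
#define SKETCH_BITS_PER_BYTE      8u

/* Heap blocks whose VM bits fit on one VM page. */
static uint32_t heap_blocks_per_vm_page(void)
{
    uint32_t map_bytes = SKETCH_BLCKSZ - SKETCH_PAGE_HEADER_SIZE;

    return map_bytes * (SKETCH_BITS_PER_BYTE / SKETCH_BITS_PER_HEAPBLOCK);
}

/* VM block that holds the bits for a given heap block. */
static uint32_t heap_block_to_vm_block(uint32_t heap_blk)
{
    return heap_blk / heap_blocks_per_vm_page();
}

/*
 * Number of VM pins needed by a scan that visits heap blocks 0..nblocks-1
 * in order and re-pins only when the required VM block changes.
 */
static uint32_t vm_pins_for_sequential_scan(uint32_t nblocks)
{
    uint32_t pins = 0;
    int64_t  pinned = -1;       /* no VM block pinned yet */

    for (uint32_t blk = 0; blk < nblocks; blk++)
    {
        int64_t vmblk = heap_block_to_vm_block(blk);

        if (vmblk != pinned)
        {
            pins++;
            pinned = vmblk;
        }
    }
    return pins;
}
```

Under these assumptions, one VM page covers 8168 * 4 = 32,672 heap pages, or roughly 255 MiB of heap. A scan over 100,000 heap pages therefore needs only four VM pins.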

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..52cafb23c6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -168,6 +183,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +191,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +226,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +295,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +375,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +797,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page cannot have dead items nor can
+ * it have tuples that are not visible to all running transactions. It clears
+ * the VM corruption as well as resetting the vmbits used during pruning.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear. However, for vacuum, it's possible
+		 * that the bit got cleared after heap_vac_scan_next_block() was
+		 * called, so we must recheck now that we have the buffer lock before
+		 * concluding that the VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set"),
+				 errcontext("relation \"%s\", page %u",
+							relname, prstate->block)));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +941,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1088,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1411,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1422,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1512,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1660,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1676,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1704,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1771,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..957322648ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -425,11 +425,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1964,81 +1959,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,6 +1990,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2179,18 +2100,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..c649e5f1980 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
+	 * pruning. It is cleared if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0
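
The rules that heap_fix_vm_corruption() and its call sites enforce in patch 0001 reduce to a small decision procedure, which the toy model below illustrates:

- If VM bits are set while PD_ALL_VISIBLE is clear, the page is corrupt, and this is checked before any items are examined.
- While PD_ALL_VISIBLE is set, each of the following contradicts it: an LP_DEAD item, a dead tuple, an in-progress insert, or a prunable (recently dead or delete-in-progress) tuple.

The model covers that logic only. Its types, names, and item classification are hypothetical, and it touches no real buffers. A committed tuple that merely looks not yet visible to all is deliberately not treated as corruption, which matches the patch comment about GetOldestNonRemovableTransactionId() moving backwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VM_ALL_VISIBLE 0x01
#define TOY_VM_ALL_FROZEN  0x02
#define TOY_VM_VALID_BITS  (TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN)

/* Hypothetical classification of what pruning finds at one offset. */
typedef enum ToyItem
{
    TOY_LIVE_VISIBLE_TO_ALL,  /* committed, visible to every snapshot */
    TOY_LIVE_RECENT_COMMIT,   /* committed, but horizon says "maybe not yet" */
    TOY_LP_DEAD,              /* existing dead line pointer */
    TOY_DEAD_TUPLE,           /* tuple that pruning will remove */
    TOY_INSERT_IN_PROGRESS,   /* inserting transaction still running */
    TOY_PRUNABLE_LATER        /* recently dead or delete in progress */
} ToyItem;

typedef struct ToyPage
{
    bool    pd_all_visible;   /* page-level hint */
    uint8_t vmbits;           /* this page's bits in the VM */
} ToyPage;

static bool toy_contradicts_all_visible(ToyItem item)
{
    /* A recent commit may be a conservative-horizon artifact, not corruption. */
    return item != TOY_LIVE_VISIBLE_TO_ALL && item != TOY_LIVE_RECENT_COMMIT;
}

/* Repair: clear the page hint and the VM bits, counting one warning. */
static void toy_fix(ToyPage *page, int *warnings)
{
    (*warnings)++;
    page->pd_all_visible = false;
    page->vmbits = 0;
}

/* Returns how many corruption warnings the page would produce. */
static int toy_check_page(ToyPage *page, const ToyItem *items, int nitems)
{
    int warnings = 0;

    /* Up front: VM claims all-visible/all-frozen but the page hint is clear. */
    if ((page->vmbits & TOY_VM_VALID_BITS) && !page->pd_all_visible)
        toy_fix(page, &warnings);

    for (int i = 0; i < nitems; i++)
    {
        /* Once the hint is cleared, later contradictions are not re-reported. */
        if (page->pd_all_visible && toy_contradicts_all_visible(items[i]))
            toy_fix(page, &warnings);
    }
    return warnings;
}
```

Consider an all-visible page with an LP_DEAD item followed by an in-progress insert. It yields a single warning and ends with both the hint and the VM bits cleared, because the page-hint check guards every later repair.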



  [text/x-patch] v39-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (4.5K, 3-v39-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From 788860ded375fcf744201347b9dcbf496070bfb5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v39 02/12] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work to do. To avoid the wasted effort, we exit early if a page
is already all-frozen, or if it is all-visible and no freezing will be
attempted.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c | 75 +++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52cafb23c6b..a4a0a916f61 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -184,6 +184,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -882,6 +883,68 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->vmbits = prstate->vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	presult->hastup = true;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -945,6 +1008,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can exit early. Do this after fixing any
+	 * discrepancy between the page-level visibility hint and the VM.
+	 */
+	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
-- 
2.43.0
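
The bypass condition in patch 0002 is small enough to state as a predicate. The sketch below is a hypothetical, self-contained model of that predicate. It also models the live-tuple count that heap_page_bypass_prune_freeze() reports. The names are invented for illustration, and line pointers are reduced to an enum that mirrors the four LP_* states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VM_ALL_VISIBLE 0x01
#define TOY_VM_ALL_FROZEN  0x02

/* Hypothetical stand-ins for LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD. */
typedef enum ToyLp
{
    TOY_LP_UNUSED,
    TOY_LP_NORMAL,
    TOY_LP_REDIRECT,
    TOY_LP_DEAD
} ToyLp;

/* Skip the full prune/freeze pass: all-frozen, or all-visible with no freezing. */
static bool toy_can_bypass(uint8_t vmbits, bool attempt_freeze)
{
    if (vmbits & TOY_VM_ALL_FROZEN)
        return true;
    return (vmbits & TOY_VM_ALL_VISIBLE) && !attempt_freeze;
}

/* On an all-visible page, every normal item is a live tuple. */
static int toy_count_live(const ToyLp *lps, int nlps)
{
    int live = 0;

    for (int i = 0; i < nlps; i++)
    {
        if (lps[i] == TOY_LP_NORMAL)
            live++;
    }
    return live;
}
```

Counting only normal items is enough here because an all-visible page holds no dead tuples, and redirect and unused line pointers carry no tuple of their own.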



  [text/x-patch] v39-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.4K, 4-v39-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 5eac34a809eac866d0cd6bf58e305464d3f2e094 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v39 03/12] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to become considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page;
if it is globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)
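
Patch 0003 replaces a per-tuple OldestXmin comparison with a single GlobalVisState check on the newest xmin seen on the page. The toy model below illustrates why one check is enough: if the newest committed xmin is no longer considered running, no older one can be either. The model makes two simplifying assumptions. First, it reduces GlobalVisState to one hypothetical horizon XID, whereas the real structure tracks lower and upper bounds and can be refreshed. Second, it orders XIDs with the modulo-2^32 comparison that TransactionIdFollows() applies to normal XIDs. All names in the model are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;
#define TOY_INVALID_XID      ((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID ((ToyXid) 3)   /* 2 is the frozen XID */

static bool toy_xid_is_normal(ToyXid x)
{
    return x >= TOY_FIRST_NORMAL_XID;
}

/* Wraparound-aware "a is newer than b" for normal XIDs. */
static bool toy_xid_follows(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) > 0;
}

/* Hypothetical horizon model: XIDs not older than the horizon may be running. */
static bool toy_xid_maybe_running(ToyXid horizon, ToyXid xid)
{
    return !toy_xid_follows(horizon, xid);
}

/*
 * Remember only the newest normal xmin among committed live tuples, then do
 * one horizon check for the whole page instead of one per tuple.
 */
static bool toy_page_all_visible(const ToyXid *xmins, int n, ToyXid horizon)
{
    ToyXid newest = TOY_INVALID_XID;

    for (int i = 0; i < n; i++)
    {
        ToyXid xmin = xmins[i];

        /* Frozen or other non-normal XIDs are visible to everyone; skip them. */
        if (toy_xid_is_normal(xmin) &&
            (newest == TOY_INVALID_XID || toy_xid_follows(xmin, newest)))
            newest = xmin;
    }

    if (toy_xid_is_normal(newest) && toy_xid_maybe_running(horizon, newest))
        return false;
    return true;
}
```

The model trades per-tuple horizon checks for one cheap XID comparison per tuple plus a single horizon check per page. This is the same trade-off the commit message describes for the more expensive GlobalVisState test.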

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..d70fab3a763 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a4a0a916f61..05fe3deeb95 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1028,6 +1028,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1695,29 +1706,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 957322648ca..ad85e1e1738 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -461,13 +461,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2054,13 +2054,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2816,7 +2813,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3577,14 +3574,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3605,7 +3602,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3624,7 +3621,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3705,7 +3702,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3714,16 +3711,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3752,6 +3750,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c649e5f1980..bbb223dd0d2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -479,6 +479,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0


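The patch above replaces a per-tuple `TransactionIdPrecedes(xmin, OldestXmin)` test with a single GlobalVisTestXidMaybeRunning() call on the newest live xmin, made after the scan. The sketch below shows why one check can stand in for many. It is a minimal sketch that assumes a simplified model: XIDs are plain 32-bit counters without wraparound, and "may be running" means "not older than a single horizon". The real GlobalVisState is more involved, for example it can recompute its horizon. The names xid_maybe_running(), all_visible_per_tuple() and all_visible_deferred() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified XID: a plain counter with no wraparound. */
typedef uint32_t Xid;
#define INVALID_XID      ((Xid) 0)
#define FIRST_NORMAL_XID ((Xid) 3)	/* 2 is the frozen XID, as in PostgreSQL */

/*
 * Hypothetical stand-in for GlobalVisTestXidMaybeRunning(): an XID may still
 * be considered running if it is not older than the horizon.
 */
static bool
xid_maybe_running(Xid horizon, Xid xid)
{
	return xid >= horizon;
}

/* Old shape: test each live tuple's committed xmin as it is visited. */
static bool
all_visible_per_tuple(const Xid *xmins, int n, Xid horizon)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++)
	{
		if (xmins[i] >= FIRST_NORMAL_XID &&
			xid_maybe_running(horizon, xmins[i]))
			return false;
	}
	return true;
}

/* New shape: track the newest normal xmin, then test it once at the end. */
static bool
all_visible_deferred(const Xid *xmins, int n, Xid horizon,
					 Xid *newest_live_xid)
{
	assert(n >= 0);
	*newest_live_xid = INVALID_XID;
	for (int i = 0; i < n; i++)
	{
		if (xmins[i] >= FIRST_NORMAL_XID && xmins[i] > *newest_live_xid)
			*newest_live_xid = xmins[i];
	}

	if (*newest_live_xid != INVALID_XID &&
		xid_maybe_running(horizon, *newest_live_xid))
		return false;
	return true;
}
```

In this model the predicate is monotone in the XID, so the newest xmin is the only one that can fail the test, and both functions agree. For example, with xmins {100, 2, 250, 180}, a horizon of 300 yields all-visible and a horizon of 200 does not.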

  [text/x-patch] v39-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.8K, 5-v39-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From 9d36149f134e4935eda6e37f111faf164a9bd063 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v39 04/12] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive, makes the snapshot conflict horizon
calculation clearer, and guarantees that the value is never stale.

Since we now keep it up to date unconditionally, there's no reason not
to maintain set_all_visible for on-access pruning as well. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 05fe3deeb95..01c19ca8796 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,11 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -174,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -435,53 +432,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate.set_all_frozen to false,
+	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -711,7 +690,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -966,9 +944,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1034,9 +1011,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1187,7 +1164,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1647,6 +1624,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1694,32 +1672,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ad85e1e1738..23402e7e26c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,7 +463,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -471,7 +471,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2789,7 +2789,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2815,14 +2815,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2863,7 +2863,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3576,7 +3576,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3584,7 +3584,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3607,7 +3607,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3625,7 +3625,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3635,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3724,9 +3724,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3756,8 +3756,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0

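The bookkeeping described in v39-0004 can be sketched in trimmed-down form. set_all_visible starts out true for every caller, and set_all_frozen starts out equal to attempt_freeze. The newest committed xmin of live tuples is recorded whether or not the page is still a candidate for all-visible. SketchPruneState and the sketch_* functions are hypothetical, and XIDs are simplified to integers without wraparound. The sketch also omits the per-tuple freeze checks that can clear set_all_frozen in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified XID: a plain counter with no wraparound. */
typedef uint32_t Xid;
#define INVALID_XID      ((Xid) 0)
#define FIRST_NORMAL_XID ((Xid) 3)

/* Hypothetical, trimmed-down analogue of the PruneState fields involved. */
typedef struct SketchPruneState
{
	bool		attempt_freeze;
	bool		set_all_visible;
	bool		set_all_frozen;
	Xid			newest_live_xid;
} SketchPruneState;

static void
sketch_setup(SketchPruneState *st, bool attempt_freeze)
{
	st->attempt_freeze = attempt_freeze;
	st->set_all_visible = true;			/* maintained for every caller */
	st->set_all_frozen = attempt_freeze;	/* only freezing callers can reach all-frozen */
	st->newest_live_xid = INVALID_XID;
}

/*
 * Record one live tuple. xmin_committed plays the role of
 * HeapTupleHeaderXminCommitted().
 */
static void
sketch_record_live_tuple(SketchPruneState *st, Xid xmin, bool xmin_committed)
{
	if (!xmin_committed)
	{
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return;
	}

	/* Track the newest normal xmin even if the page is no longer all-visible. */
	if (xmin >= FIRST_NORMAL_XID && xmin > st->newest_live_xid)
		st->newest_live_xid = xmin;
}

/* An all-frozen page needs no conflict horizon; otherwise use the newest xmin. */
static Xid
sketch_vm_conflict_horizon(const SketchPruneState *st)
{
	assert(!st->set_all_frozen || st->set_all_visible);
	return st->set_all_frozen ? INVALID_XID : st->newest_live_xid;
}
```

sketch_vm_conflict_horizon() follows the patch's presult->vm_conflict_horizon assignment: InvalidTransactionId for an all-frozen page, and otherwise the newest live xmin.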


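The next patch, v39-0005, folds the VM update into heap_page_prune_and_freeze(), and it makes two decisions that can be sketched in C. First, heap_page_will_set_vm() returns true only when the computed bits differ from those already in the VM, and never for on-access pruning. Second, the record's snapshot conflict horizon is the newest of the horizons required by each change the record carries. The function names below are hypothetical. The bit values match PostgreSQL's VISIBILITYMAP_ALL_VISIBLE (0x01) and VISIBILITYMAP_ALL_FROZEN (0x02), and a plain `>` stands in for TransactionIdFollows() under a no-wraparound assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified XID: a plain counter with no wraparound. */
typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

/* Same values as PostgreSQL's VISIBILITYMAP_ALL_VISIBLE / _ALL_FROZEN. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/*
 * Mirrors the shape of heap_page_will_set_vm(): decide the new VM bits and
 * report whether they actually change anything.
 */
static bool
sketch_will_set_vm(bool on_access, bool set_all_visible, bool set_all_frozen,
				   uint8_t old_vmbits, uint8_t *new_vmbits)
{
	assert(!set_all_frozen || set_all_visible);
	*new_vmbits = 0;

	/* On-access pruning tracks all-visibility but does not set the VM yet. */
	if (on_access || !set_all_visible)
		return false;

	*new_vmbits = VM_ALL_VISIBLE;
	if (set_all_frozen)
		*new_vmbits |= VM_ALL_FROZEN;

	/* Nothing to do if the VM already holds exactly these bits. */
	if (*new_vmbits == old_vmbits)
	{
		*new_vmbits = 0;
		return false;
	}
	return true;
}

/* Newest (most conservative) horizon among the changes the record carries. */
static Xid
sketch_conflict_xid(bool do_set_vm, Xid newest_live_xid,
					bool do_freeze, Xid freeze_conflict_xid,
					bool do_prune, Xid latest_xid_removed)
{
	Xid			conflict_xid = INVALID_XID;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && freeze_conflict_xid > conflict_xid)
		conflict_xid = freeze_conflict_xid;
	if (do_prune && latest_xid_removed > conflict_xid)
		conflict_xid = latest_xid_removed;
	return conflict_xid;
}
```

Taking the newest value is the conservative choice. Any standby snapshot that would conflict with one of the individual changes also conflicts with the newest horizon, so one horizon can cover the combined record.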
  [text/x-patch] v39-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (24.7K, 6-v39-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From bc988115bb293945e0d09028bf235976ef90c8c2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v39 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 263 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 107 +----------
 src/include/access/heapam.h          |  38 ++--
 3 files changed, 208 insertions(+), 200 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 01c19ca8796..a127e29144e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -209,7 +213,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -375,9 +379,10 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
@@ -840,7 +845,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -858,7 +863,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -887,15 +928,13 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->vmbits = prstate->vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -925,7 +964,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -940,12 +980,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -973,15 +1011,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -990,8 +1030,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * is not being attempted, we can exit early. Do this after fixing any
 	 * discrepancy between the page-level visibility hint and the VM.
 	 */
-	if (prstate.vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		(prstate.vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
+	if (prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		(prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE && !prstate.attempt_freeze))
 	{
 		heap_page_bypass_prune_freeze(&prstate, presult);
 		return;
@@ -1061,6 +1101,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * If we do not intend to set the VM, new_vmbits must be 0, regardless of
+	 * whether the page is all-visible.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1082,14 +1146,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1103,6 +1170,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1110,29 +1198,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1142,33 +1213,67 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * The page may consist entirely of frozen tuples without being marked
+		 * all-frozen in the VM, and the caller may not have passed
+		 * HEAP_PAGE_PRUNE_FREEZE. In that case heap_page_is_all_visible() can
+		 * find the page completely frozen even though prstate.set_all_frozen
+		 * is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->new_all_visible_pages = 0;
+	presult->new_all_frozen_pages = 0;
+	presult->new_all_visible_frozen_pages = 0;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23402e7e26c..6b5210d6393 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,13 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1996,8 +1989,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2038,29 +2029,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2081,6 +2049,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Record whether the page was newly set all-frozen in the VM */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2094,71 +2070,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3572,7 +3483,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bbb223dd0d2..f77a00291bb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -264,7 +264,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -280,8 +281,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -315,26 +315,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Set to 1 if phase I of vacuuming newly set the page all-visible,
+	 * all-visible and all-frozen, or all-frozen (if already all-visible).
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -471,7 +457,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0
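For reference, the snapshot conflict horizon selection that patch 0005 moves ahead of the critical section in heap_page_prune_and_freeze() can be exercised in isolation. This is a minimal standalone sketch, not the patch's code: TransactionIdFollows() is a simplified copy of the circular XID comparison in transam.c, and combined_conflict_xid() is a hypothetical helper whose parameters stand in for prstate.newest_live_xid, prstate.pagefrz.FreezePageConflictXid and prstate.latest_xid_removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Simplified copy of transam.c: circular comparison for normal XIDs */
static bool
TransactionIdFollows(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	/* Special XIDs (including Invalid) compare as plain integers */
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 > id2;

	diff = (int32_t) (id1 - id2);
	return diff > 0;
}

/*
 * Hypothetical helper: the horizon for the combined record is the newest
 * horizon required by any change the record makes.
 */
static TransactionId
combined_conflict_xid(bool do_set_vm, TransactionId newest_live_xid,
					  bool do_freeze, TransactionId freeze_conflict_xid,
					  bool do_prune, TransactionId latest_xid_removed)
{
	TransactionId conflict_xid = InvalidTransactionId;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && TransactionIdFollows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && TransactionIdFollows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

Starting from InvalidTransactionId works because a special XID is compared with plain `>`, so any normal horizon follows it. Between normal XIDs the comparison is modulo 2^32, so XID 10 counts as newer than 4294967290 after wraparound, which a plain integer maximum would get wrong.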


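The page-level counters that heap_page_prune_and_freeze() now returns, and that lazy_scan_prune() adds to the whole-VACUUM totals, follow a small decision table over the old VM bits. A minimal sketch of that classification, assuming the VISIBILITYMAP_* values from visibilitymapdefs.h; NewVMCounts, classify_vm_change() and page_newly_frozen() are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical stand-in for the three new PruneFreezeResult counters */
typedef struct NewVMCounts
{
	uint32_t	new_all_visible_pages;
	uint32_t	new_all_visible_frozen_pages;
	uint32_t	new_all_frozen_pages;
} NewVMCounts;

static NewVMCounts
classify_vm_change(bool do_set_vm, uint8_t old_vmbits,
				   bool set_all_visible, bool set_all_frozen)
{
	NewVMCounts c = {0, 0, 0};

	/* Same invariant the patch asserts: all-frozen implies all-visible */
	assert(!set_all_frozen || set_all_visible);

	if (!do_set_vm)
		return c;

	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
	{
		/* Page was not all-visible before: count it as newly all-visible */
		c.new_all_visible_pages = 1;
		if (set_all_frozen)
			c.new_all_visible_frozen_pages = 1;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && set_all_frozen)
	{
		/* Already all-visible, only the all-frozen bit is new */
		c.new_all_frozen_pages = 1;
	}

	return c;
}

/* Mirrors how lazy_scan_prune() derives *vm_page_frozen from the counters */
static bool
page_newly_frozen(NewVMCounts c)
{
	return c.new_all_visible_frozen_pages > 0 || c.new_all_frozen_pages > 0;
}
```

A page whose VM bits were already fully set contributes to none of the counters, and a page that becomes all-visible but not all-frozen leaves *vm_page_frozen false.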

  [text/x-patch] v39-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 7-v39-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 5c17a542a95c880f6a8ffaa1dd92baf12b96a1ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v39 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen and WAL-logs the change in
an XLOG_HEAP2_PRUNE_VACUUM_SCAN record instead.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6b5210d6393..1451c943644 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1903,9 +1903,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0
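Patch 0006 has lazy_scan_new_or_empty() set both VM bits for an empty page through visibilitymap_set_vmbits() (renamed visibilitymap_set() in 0007). The bit arithmetic involved can be sketched as below. This is a simplified standalone model that mirrors the HEAPBLK_TO_* macros in visibilitymap.c, assuming the default 8 kB BLCKSZ with a 24-byte page header (so 8168 map bytes per VM page); vm_set_bits() and vm_get_bits() are hypothetical helpers, and buffer locking, dirtying and WAL logging are deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Assumption: default BLCKSZ 8192 minus a 24-byte page header */
#define MAPSIZE				8168
#define BITS_PER_HEAPBLOCK	2
#define HEAPBLOCKS_PER_BYTE	(8 / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE	(MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Which VM page, which byte on that page, and which bit offset in the byte */
#define HEAPBLK_TO_MAPBLOCK(x)	((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x)	(((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x)	(((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

/*
 * Hypothetical helper: OR 'flags' into the two bits for heapBlk on an
 * already-selected VM page. Returns true if the stored bits differed.
 */
static bool
vm_set_bits(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
	uint8_t		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);

	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);

	if (flags == ((map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS))
		return false;

	map[mapByte] |= (uint8_t) (flags << mapOffset);
	return true;
}

/* Hypothetical helper: read back the two bits for heapBlk */
static uint8_t
vm_get_bits(const uint8_t *map, uint32_t heapBlk)
{
	return (map[HEAPBLK_TO_MAPBYTE(heapBlk)] >> HEAPBLK_TO_OFFSET(heapBlk)) &
		VISIBILITYMAP_VALID_BITS;
}
```

Each heap block owns two adjacent bits, so four heap blocks share a map byte; heap block 5, for example, lands in byte 1 at bit offset 2, and setting both bits stores 0x0C there.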



  [text/x-patch] v39-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 8-v39-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 2e58fcd19b1bf57b0796f2ddcd74a6f2ee760ead Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v39 07/12] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

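This changes the visibility map API: visibilitymap_set_vmbits() is
renamed to visibilitymap_set(), replacing the old WAL-logging
visibilitymap_set(). External callers of the visibility map API and
external consumers of the VM-only WAL record will need to be updated.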

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d9042e1f91d 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a127e29144e..9b5a0726f2b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1187,8 +1187,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1451c943644..8bd178ae7e6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1926,11 +1926,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2794,9 +2794,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..f1da52b2069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -222,112 +221,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..66ed51a8aa1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 52f8603a7be..3102c61125e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4409,7 +4409,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
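Once the WAL-logging variant is removed, what remains of visibilitymap_set() is the bit arithmetic on a VM page that is already pinned and locked. The sketch below is a standalone illustration of that arithmetic, not PostgreSQL code. toy_vm_set() and toy_vm_get() are hypothetical names, `map` stands in for the page contents, and there is no buffer locking, WAL, or LSN handling. The HEAPBLK_TO_* macros follow the shape of the ones in visibilitymap.c, but the page geometry (8 kB pages with a 24-byte page header) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Assumed geometry: 8 kB pages, 24-byte page header. */
#define BITS_PER_HEAPBLOCK   2
#define HEAPBLOCKS_PER_BYTE  (8 / BITS_PER_HEAPBLOCK)
#define MAPSIZE              (8192 - 24)
#define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x)  (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x)   (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

/*
 * Hypothetical toy version of the bit update. "map" is the contents of the
 * one VM page covering heapBlk. Returns true when the requested flags differ
 * from the current status (the branch in which the real code dirties the
 * buffer). The real function returns void.
 */
static bool
toy_vm_set(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
    uint32_t    mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    uint8_t     mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
    uint8_t     status;

    assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
    /* Never set all-frozen without also setting all-visible. */
    assert(flags != VISIBILITYMAP_ALL_FROZEN);

    status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
    if (flags == status)
        return false;
    map[mapByte] |= (uint8_t) (flags << mapOffset);
    return true;
}

/* Read back the two bits for heapBlk. */
static uint8_t
toy_vm_get(const uint8_t *map, uint32_t heapBlk)
{
    return (map[HEAPBLK_TO_MAPBYTE(heapBlk)] >> HEAPBLK_TO_OFFSET(heapBlk)) &
        VISIBILITYMAP_VALID_BITS;
}
```

Each heap block occupies two bits, so a single map byte covers four heap blocks, and the bits are only ever ORed in here. Clearing them is the job of visibilitymap_clear().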



  [text/x-patch] v39-0008-Track-which-relations-are-modified-by-a-query.patch (5.8K, 9-v39-0008-Track-which-relations-are-modified-by-a-query.patch)
From cb87aa75f03e0c211cfab4f582d10eec7e0a50aa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v39 08/12] Track which relations are modified by a query

Save the relids of modified relations in a bitmap in the executor state.
A later commit will pass this information down to scan nodes to control
whether or not on-access pruning is allowed to set the visibility map.
Setting the visibility map during a scan is counterproductive if the
query is going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c  | 18 ++++++++++++++++++
 src/backend/executor/execUtils.c | 31 +++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 58 insertions(+)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..8d22b6db867 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -922,6 +922,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -992,6 +996,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3048,6 +3056,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_output_cid = parentestate->es_output_cid;
 	rcestate->es_queryEnv = parentestate->es_queryEnv;
 
+	/*
+	 * Use a deep copy to avoid stale pointers since bms_add_member() may
+	 * reallocate the bitmap.
+	 */
+	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
+
 	/*
 	 * ResultRelInfos needed by subplans are initialized from scratch when the
 	 * subplans themselves are initialized.
@@ -3180,6 +3194,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..7dfa95c2cbe 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,6 +125,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -873,6 +875,33 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -898,6 +927,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 064df01811e..080cfdac48e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -707,6 +707,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0716c5a9aed..d2f4f8ea748 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -688,6 +688,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0
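The deep copy in EvalPlanQualStart() matters because adding a member can reallocate the set. If the parent and the EPQ state shared one pointer, any other holder of the old pointer would be left dangling after a reallocation. The following is a toy stand-in for Bitmapset, not PostgreSQL's implementation (which lives in nodes/bitmapset.c). ToyBitmapset, the toy_bms_* functions, and toy_cross_check() are hypothetical names, and error handling is reduced to abort(). It shows the add, copy, and subset operations the patch relies on, along with an analogue of the CrossCheckModifiedRelids() assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy set of non-negative integers; NULL represents the empty set. */
typedef struct ToyBitmapset
{
    int         nwords;
    uint64_t    words[];        /* flexible array member */
} ToyBitmapset;

/* Add x. May realloc, so callers must always use the returned pointer. */
static ToyBitmapset *
toy_bms_add_member(ToyBitmapset *a, int x)
{
    int         wordnum = x / 64;

    assert(x >= 0);
    if (a == NULL || wordnum >= a->nwords)
    {
        int         oldn = (a != NULL) ? a->nwords : 0;
        int         newn = wordnum + 1;
        ToyBitmapset *r = realloc(a, sizeof(ToyBitmapset) +
                                  (size_t) newn * sizeof(uint64_t));

        if (r == NULL)
            abort();
        memset(&r->words[oldn], 0, (size_t) (newn - oldn) * sizeof(uint64_t));
        r->nwords = newn;
        a = r;                  /* any other copy of the old pointer is stale */
    }
    a->words[wordnum] |= UINT64_C(1) << (x % 64);
    return a;
}

static bool
toy_bms_is_member(const ToyBitmapset *a, int x)
{
    if (a == NULL || x < 0 || x / 64 >= a->nwords)
        return false;
    return (a->words[x / 64] >> (x % 64)) & 1;
}

/* Deep copy, so later additions to either set cannot affect the other. */
static ToyBitmapset *
toy_bms_copy(const ToyBitmapset *a)
{
    size_t      size;
    ToyBitmapset *r;

    if (a == NULL)
        return NULL;
    size = sizeof(ToyBitmapset) + (size_t) a->nwords * sizeof(uint64_t);
    if ((r = malloc(size)) == NULL)
        abort();
    memcpy(r, a, size);
    return r;
}

/* True if every member of a is also a member of b. */
static bool
toy_bms_is_subset(const ToyBitmapset *a, const ToyBitmapset *b)
{
    for (int i = 0; a != NULL && i < a->nwords; i++)
    {
        uint64_t    bw = (b != NULL && i < b->nwords) ? b->words[i] : 0;

        if (a->words[i] & ~bw)
            return false;
    }
    return true;
}

/* Hypothetical analogue of CrossCheckModifiedRelids(); RT indexes are 1-based. */
static bool
toy_cross_check(const ToyBitmapset *modified, const int *result_rtis,
                int nresult, const bool *has_rowmark, int range_table_size)
{
    ToyBitmapset *expected = NULL;
    bool        ok;

    for (int i = 0; i < nresult; i++)
        expected = toy_bms_add_member(expected, result_rtis[i]);
    for (int rti = 1; rti <= range_table_size; rti++)
        if (has_rowmark[rti - 1])
            expected = toy_bms_add_member(expected, rti);
    ok = toy_bms_is_subset(expected, modified);
    free(expected);
    return ok;
}
```

The cross-check only requires a subset, not equality. This matches the commit message's point that the bitmap is a hint that may include relations which are never actually modified.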



  [text/x-patch] v39-0009-Thread-flags-through-begin-scan-APIs.patch (28.1K, 10-v39-0009-Thread-flags-through-begin-scan-APIs.patch)
From 335d6419b443b0c574a4458212bde607ad70a89d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v39 09/12] Thread flags through begin-scan APIs

Add a flags parameter to the index_fetch_begin() table AM callback and
the begin-scan helpers so the executor can pass context for building
scan descriptors. This introduces an extension point for follow-up work
to mark relations as read-only for the current query, without changing
behavior in this patch.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  6 ++--
 src/backend/access/index/genam.c          |  4 +--
 src/backend/access/index/indexam.c        |  8 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 13 +++++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++---
 src/backend/commands/typecmds.c           |  4 +--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++---
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  6 ++--
 src/backend/executor/nodeIndexscan.c      |  8 +++---
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 ++--
 src/backend/executor/nodeTidrangescan.c   |  6 ++--
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  5 ++--
 src/include/access/heapam.h               |  5 ++--
 src/include/access/tableam.h              | 35 ++++++++++++++---------
 25 files changed, 81 insertions(+), 65 deletions(-)
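In table_beginscan_parallel() and table_beginscan_parallel_tidrange(), the patch keeps the caller's flags and ORs in the scan-type bits that each helper always needs. Below is a minimal sketch of that pattern. The flag values are illustrative only (the real ScanOptions values in tableam.h may differ), and TOY_SO_HINT_READ_ONLY and toy_executor_scan_flags() are hypothetical. They stand in for the read-only hint that the commit message says follow-up work will add, derived from whether a relation appears in the modified-relids set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; not the real ScanOptions enum. */
enum
{
    TOY_SO_TYPE_SEQSCAN = 1 << 0,
    TOY_SO_TYPE_TIDRANGESCAN = 1 << 4,
    TOY_SO_ALLOW_STRAT = 1 << 6,
    TOY_SO_ALLOW_SYNC = 1 << 7,
    TOY_SO_ALLOW_PAGEMODE = 1 << 8,
    /* Hypothetical caller-supplied hint: relation is read-only in this query. */
    TOY_SO_HINT_READ_ONLY = 1 << 10,
};

/* Same shape as the patched table_beginscan_parallel(): keep caller bits. */
static uint32_t
toy_parallel_seqscan_flags(uint32_t flags)
{
    flags |= TOY_SO_TYPE_SEQSCAN |
        TOY_SO_ALLOW_STRAT | TOY_SO_ALLOW_SYNC | TOY_SO_ALLOW_PAGEMODE;
    return flags;
}

/* Same shape as the patched table_beginscan_parallel_tidrange(). */
static uint32_t
toy_parallel_tidrange_flags(uint32_t flags)
{
    flags |= TOY_SO_TYPE_TIDRANGESCAN | TOY_SO_ALLOW_PAGEMODE;
    return flags;
}

/* Hypothetical executor side: pass the hint only for unmodified relations. */
static uint32_t
toy_executor_scan_flags(bool relation_is_modified)
{
    return relation_is_modified ? 0 : TOY_SO_HINT_READ_ONLY;
}
```

Because every existing caller in this patch passes 0, ORing the required bits into the caller's flags leaves the resulting scan options unchanged, which is consistent with the commit message's claim that behavior does not change yet.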

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..1e950d8e6e5 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,7 +80,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -762,7 +762,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +771,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..87219613f0b 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +593,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..e946cfb393a 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -190,12 +191,14 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
 
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
 	/* disable syncscan in parallel tid range scan. */
@@ -248,7 +251,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 499ce9ad3db..fb791c7990b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1160,7 +1160,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index dfdde986236..4b50d325612 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22888,7 +22888,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23352,7 +23352,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c8db357e69f..17bf4976cce 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -95,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -790,7 +790,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -856,7 +856,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index bd83e4712b3..88bdf0a52d1 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -113,7 +113,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -209,7 +209,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1726,7 +1726,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1790,7 +1790,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..5b2165c267d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7161,7 +7161,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..db102803eb5 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -158,7 +158,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -184,7 +184,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f77a00291bb..c2621dc2fac 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,8 +95,9 @@ typedef struct HeapScanDescData
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
 	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map.
 	 */
 	Buffer		rs_vmbuffer;
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..f1065e30638 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -418,9 +418,12 @@ typedef struct TableAmRoutine
 	 * IndexFetchTableData, which the AM will typically embed in a larger
 	 * structure with additional information.
 	 *
+	 * 'flags' is a bitmask of SO_* flags providing hints from the executor
+	 * about the scan context.
+	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +897,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +942,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -957,9 +960,9 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	flags |= SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
 		flags |= SO_ALLOW_STRAT;
@@ -1059,10 +1062,11 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
 
@@ -1139,7 +1143,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1154,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1172,10 +1178,13 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Prepare to fetch tuples from the relation, as needed when fetching tuples
  * for an index scan.
  *
+ * 'flags' is a bitmask of SO_* flags providing hints from the executor about
+ * the scan context.
+ *
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1194,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
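
The hunks above change how each table_beginscan_* wrapper builds its flags. Previously the wrapper initialized `flags` locally. Now it ORs its own scan-type and SO_ALLOW_* bits into a `flags` value supplied by the caller, which is why every existing caller passes 0.

Below is a minimal standalone model of that convention. It rests on a few assumptions:

- The demo_* names are hypothetical.
- Only SO_HINT_REL_READ_ONLY (1 << 10, added in v39-0010) comes from the diff. The other bit positions are assumptions chosen for this sketch.
- The assert that callers pass only hint bits is an extra check of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standalone model of the ScanOptions convention in the v39 patches.
 * Only SO_HINT_REL_READ_ONLY (1 << 10) is taken from the diff; the other
 * bit positions are assumptions chosen for this sketch.
 */
typedef enum DemoScanOptions
{
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_TYPE_BITMAPSCAN = 1 << 1,
	SO_ALLOW_STRAT = 1 << 6,
	SO_ALLOW_SYNC = 1 << 7,
	SO_ALLOW_PAGEMODE = 1 << 8,
	SO_HINT_REL_READ_ONLY = 1 << 10,
} DemoScanOptions;

/* Bits each wrapper sets itself; callers are expected to pass only hints. */
#define DEMO_WRAPPER_OWNED_BITS \
	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_ALLOW_STRAT | \
	 SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE)

/* Mirrors table_beginscan(): OR the seqscan bits into the caller's flags. */
static uint32_t
demo_seqscan_flags(uint32_t flags)
{
	/* Sketch-only sanity check, not present in the patch. */
	assert((flags & DEMO_WRAPPER_OWNED_BITS) == 0);
	flags |= SO_TYPE_SEQSCAN | SO_ALLOW_STRAT | SO_ALLOW_SYNC |
		SO_ALLOW_PAGEMODE;
	return flags;
}

/* Mirrors table_beginscan_bm(): OR the bitmap-scan bits into the flags. */
static uint32_t
demo_bitmapscan_flags(uint32_t flags)
{
	assert((flags & DEMO_WRAPPER_OWNED_BITS) == 0);
	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
	return flags;
}
```

With this shape a caller that has no hints passes 0 and gets exactly the bits the old wrapper produced. A caller that knows the relation is read-only passes SO_HINT_REL_READ_ONLY, and that bit survives alongside the wrapper's own bits.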



  [text/x-patch] v39-0010-Pass-down-information-on-table-modification-to-s.patch (14.5K, 11-v39-0010-Pass-down-information-on-table-modification-to-s.patch)
From bf3e55f226c1f1aacac0b2739a6f42973942c6c4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v39 10/12] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 ++++++-
 src/backend/executor/nodeIndexonlyscan.c  | 25 +++++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 32 ++++++++++++++++++++---
 src/backend/executor/nodeSamplescan.c     |  8 +++++-
 src/backend/executor/nodeSeqscan.c        | 26 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 24 ++++++++++++++---
 src/include/access/heapam.h               |  6 +++++
 src/include/access/tableam.h              |  3 +++
 9 files changed, 119 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 1e950d8e6e5..aec5199b2e6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -87,6 +87,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..0f30e6980de 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 17bf4976cce..3fab715f879 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -95,7 +101,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -763,6 +770,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -784,13 +792,18 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -831,6 +844,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -850,13 +864,18 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 88bdf0a52d1..6a235ef25ce 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,6 +104,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -113,7 +119,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -200,6 +207,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -209,7 +222,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1699,6 +1713,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1720,13 +1735,17 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1765,6 +1784,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1784,13 +1804,17 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..8d36fcda48a 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,19 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) scanstate->ss.ps.plan)->scanrelid,
+						   scanstate->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..9356973802b 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +375,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +418,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..04a75e72fe1 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,16 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = 0;
+
+			if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+							   estate->es_modified_relids))
+				flags |= SO_HINT_REL_READ_ONLY;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +458,21 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +502,15 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c2621dc2fac..978ea90ffa2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -131,6 +131,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index f1065e30638..57ce94a386f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0
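
v39-0010 repeats one pattern in every scan node. The node starts with `flags = 0` and sets SO_HINT_REL_READ_ONLY when the node's `scanrelid` is not a member of `es_modified_relids`. heapam_index_fetch_begin() then stores the inverse as `modifies_base_rel`.

The sketch below models that derivation in isolation. It assumes, as the diff's usage suggests, that `es_modified_relids` holds range-table indexes. The assumptions are:

- A plain 64-bit mask (the hypothetical DemoRelidSet) stands in for a Bitmapset.
- demo_relid_is_member() stands in for bms_is_member().
- All demo_* names are illustrative, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit the diff adds to ScanOptions (SO_HINT_REL_READ_ONLY = 1 << 10). */
#define SO_HINT_REL_READ_ONLY (1u << 10)

/*
 * Hypothetical stand-in for a Bitmapset of range-table indexes: bit i set
 * means range-table entry i is modified by the query. This model only
 * handles indexes 0..63; real code uses bms_is_member().
 */
typedef uint64_t DemoRelidSet;

static bool
demo_relid_is_member(int relid, DemoRelidSet set)
{
	assert(relid >= 0 && relid < 64);
	return ((set >> relid) & 1) != 0;
}

/* Mirrors the per-node pattern: hint read-only unless the rel is modified. */
static uint32_t
demo_scan_hint_flags(int scanrelid, DemoRelidSet modified_relids)
{
	uint32_t	flags = 0;

	if (!demo_relid_is_member(scanrelid, modified_relids))
		flags |= SO_HINT_REL_READ_ONLY;
	return flags;
}

/* Mirrors heapam_index_fetch_begin(): keep the inverse of the hint. */
static bool
demo_modifies_base_rel(uint32_t flags)
{
	return (flags & SO_HINT_REL_READ_ONLY) == 0;
}
```

For example, if range-table entry 1 is the UPDATE target and entry 2 is only read, then the modified set is `1 << 1`. A scan on entry 1 gets no hint, and a scan on entry 2 gets SO_HINT_REL_READ_ONLY.

v39-0011 passes the masked value `rs_flags & SO_HINT_REL_READ_ONLY` straight to a bool parameter of heap_page_prune_opt(). With C99 `bool`, any nonzero value converts to true, so this behaves the same as the explicit `== 0` / `!= 0` comparisons used in the sketch.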



  [text/x-patch] v39-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.5K, 12-v39-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 31c38dd70fb80b7bc6f2224529b6159a4886f11b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v39 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 46 ++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 +++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index aec5199b2e6..17d625944e8 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2543,7 +2544,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9b5a0726f2b..3cdc1a36441 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -213,7 +215,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -233,7 +236,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -315,6 +319,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = 0;
+			if (rel_read_only)
+				params.options = HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -371,6 +377,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -440,9 +447,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -873,21 +879,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1103,7 +1125,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8bd178ae7e6..d2cae77b52a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2008,7 +2008,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 978ea90ffa2..768d442c39c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -97,7 +98,8 @@ typedef struct HeapScanDescData
 	/*
 	 * For sequential scans, bitmap heap scans, TID range scans, and sample
 	 * scans. The current heap block's corresponding page in the visibility
-	 * map.
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -129,7 +131,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -440,7 +446,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0
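To make the plumbing in v39-0011 easier to follow, here is a minimal standalone C sketch of how the read-only hint travels from the scan into pruning. The three flag values are copied from the patch; the helper functions are hypothetical stand-ins for the call sites in heapam.c, heapam_handler.c, pruneheap.c and vacuumlazy.c, not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the v39 patch; everything else is a simplified model. */
#define SO_HINT_REL_READ_ONLY   (UINT32_C(1) << 10)
#define HEAP_PAGE_PRUNE_FREEZE  (UINT32_C(1) << 1)
#define HEAP_PAGE_PRUNE_SET_VM  (UINT32_C(1) << 2)

/* Sequential and bitmap heap scans: the hint arrives in rs_flags. */
static bool
scan_rel_read_only(uint32_t rs_flags)
{
	return (rs_flags & SO_HINT_REL_READ_ONLY) != 0;
}

/* Index scans: IndexFetchHeapData stores the inverse, modifies_base_rel. */
static bool
index_fetch_rel_read_only(bool modifies_base_rel)
{
	return !modifies_base_rel;
}

/* heap_page_prune_opt(): only a read-only caller asks pruning to set the VM. */
static uint32_t
on_access_prune_options(bool rel_read_only)
{
	return rel_read_only ? HEAP_PAGE_PRUNE_SET_VM : 0;
}

/* lazy_scan_prune(): VACUUM always asks to freeze and to set the VM. */
static uint32_t
vacuum_prune_options(void)
{
	return HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM;
}

/* prune_freeze_setup(): attempt_set_vm is derived from the options. */
static bool
prune_attempts_set_vm(uint32_t options)
{
	return (options & HEAP_PAGE_PRUNE_SET_VM) != 0;
}
```

Writing the test as `(flags & BIT) != 0` yields a clean 0/1 value, matching the style the patch itself uses in prune_freeze_setup(); the call sites in heapam.c instead pass the masked value directly and rely on the implicit conversion to bool.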


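The new gate in heap_page_will_set_vm() can be modeled the same way. In this sketch, BufferIsDirty() and XLogCheckBufferNeedsBackup() are replaced by two booleans, the bit values 0x01 and 0x02 are assumed to match VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN, and the line that ORs in the all-frozen bit is an assumption, since the diff context stops right after `if (prstate->set_all_frozen)`. All type and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the VISIBILITYMAP_* bits in visibilitymapdefs.h. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

typedef enum
{
	MODEL_PRUNE_ON_ACCESS,
	MODEL_PRUNE_VACUUM_SCAN
} PruneReasonModel;

typedef struct
{
	bool	attempt_set_vm;   /* HEAP_PAGE_PRUNE_SET_VM was passed */
	bool	set_all_visible;  /* every tuple is visible to everyone */
	bool	set_all_frozen;
	bool	buffer_dirty;     /* stand-in for BufferIsDirty() */
	bool	needs_fpi;        /* stand-in for XLogCheckBufferNeedsBackup() */
	uint8_t	new_vmbits;
} PruneStateModel;

static bool
model_will_set_vm(PruneStateModel *st, PruneReasonModel reason,
				  bool do_prune, bool do_freeze)
{
	if (!st->attempt_set_vm || !st->set_all_visible)
		return false;

	/*
	 * On access with no pruning or freezing to do: don't newly dirty the
	 * heap page, or force a full-page image, just to set the VM.
	 */
	if (reason == MODEL_PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!st->buffer_dirty || st->needs_fpi))
	{
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return false;
	}

	st->new_vmbits = VM_ALL_VISIBLE;
	if (st->set_all_frozen)
		st->new_vmbits |= VM_ALL_FROZEN;    /* assumed continuation */
	return true;
}
```

The net effect is that an on-access call that neither prunes nor freezes only sets the VM when the heap page is already dirty and would not need a full-page image, so setting the VM never adds a dirty page or an FPI on its own.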

  [text/x-patch] v39-0012-Set-pd_prune_xid-on-insert.patch (10.9K, 13-v39-0012-Set-pd_prune_xid-on-insert.patch)
From 8483ddbb7f3226f73262be80031630638e413f37 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v39 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to run, and to set the page
all-visible in the VM, the first time a page filled with newly inserted
tuples is read.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 40 ++++++++++++-------
 src/backend/access/heap/heapam_xlog.c         | 19 ++++++++-
 src/backend/access/heap/pruneheap.c           | 17 ++++----
 .../modules/index/expected/killtuples.out     |  8 ++--
 4 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..c199646b25d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3cdc1a36441..7cb9e1e2aac 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -255,7 +255,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts inserted new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1848,16 +1849,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0
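The insert-side rule in v39-0012 relies on PageSetPrunable() only ever moving pd_prune_xid back to the oldest relevant XID. The sketch below models that with plain 32-bit XIDs and a circular comparison in the style of TransactionIdPrecedes(). PageModel and note_insert() are hypothetical names, and the assert() stands in for the Assert(TransactionIdIsNormal(xid)) that, to my understanding, the real PageSetPrunable() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

typedef struct
{
	TransactionId pd_prune_xid;  /* oldest XID that may make pruning useful */
} PageModel;

static bool
xid_is_valid(TransactionId xid)
{
	return xid != InvalidTransactionId;
}

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Circular (modulo 2^32) comparison for normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Modeled on PageSetPrunable(): keep the oldest XID seen so far. */
static void
page_set_prunable(PageModel *page, TransactionId xid)
{
	assert(xid_is_normal(xid));
	if (!xid_is_valid(page->pd_prune_xid) ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/* The v39-0012 heap_insert() rule: skip bootstrap XIDs and frozen inserts. */
static void
note_insert(PageModel *page, TransactionId xid, bool insert_frozen)
{
	if (xid_is_normal(xid) && !insert_frozen)
		page_set_prunable(page, xid);
}
```

This also suggests why heap_insert() and heap_multi_insert() test TransactionIdIsNormal(xid) first: in bootstrap mode the inserting XID is BootstrapTransactionId (1), which is not a normal XID and would trip the assertion.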




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-17 09:05                     ` Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Kirill Reshke @ 2026-03-17 09:05 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Chao Li <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, 16 Mar 2026 at 19:53, Melanie Plageman
<[email protected]> wrote:
>
> On Sun, Mar 15, 2026 at 3:10 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > I've pushed a few more of the trivial commits in the set. Attached v38
> > has the remaining patches.
>
> Looks like cfbot wasn't able to rebase v38 on its own for some reason.
> v39 attached.
>
> - Melanie

Hi!

I took a quick look at v38-v39.

0001 & 0003 look OK.

> From 788860ded375fcf744201347b9dcbf496070bfb5 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 25 Feb 2026 16:48:19 -0500
> Subject: [PATCH v39 02/12] Add pruning fast path for all-visible and all-frozen pages

For the record, does this work with DISABLE_PAGE_SKIPPING? I think we
don't want the server to "fast-path" if this option is set by
the user...



-- 
Best regards,
Kirill Reshke






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
@ 2026-03-17 14:48                       ` Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-17 14:48 UTC (permalink / raw)
  To: Kirill Reshke <[email protected]>; +Cc: Chao Li <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Mar 17, 2026 at 5:05 AM Kirill Reshke <[email protected]> wrote:
>
> > From 788860ded375fcf744201347b9dcbf496070bfb5 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Wed, 25 Feb 2026 16:48:19 -0500
> > Subject: [PATCH v39 02/12] Add pruning fast path for all-visible and all-frozen pages
>
> For the record, does this work with DISABLE_PAGE_SKIPPING? I think we
> don't want the server to "fast-path" if this option is set by
> the user...

Hmm. This is a good point. The docs for DISABLE_PAGE_SKIPPING say it
is about fixing visibility map corruption, and the fast path does
detect and fix one type of visibility map corruption. It does not
check for dead line pointers, though. I suppose DISABLE_PAGE_SKIPPING
would also want that kind of VM corruption detection. Thanks for
thinking of that. Attached v40 adds an option to disable the fast
path.
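A hypothetical model of the trade-off discussed above: if the fast path is taken for a page the VM claims is all-visible, its line pointers are never examined, so an LP_DEAD item on that page (one form of VM corruption) is only found when the fast path is disabled. None of these names come from the v40 patches, and the model deliberately ignores whatever check the real fast path does perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: these names do not come from the v40 patches. */
typedef enum
{
	LP_MODEL_NORMAL,
	LP_MODEL_DEAD
} LinePointerModel;

typedef struct
{
	bool	vm_all_visible;             /* VM claims the page is all-visible */
	const LinePointerModel *items;      /* the page's line pointers */
	size_t	nitems;
} HeapPageModel;

/*
 * Report whether an LP_DEAD item is found on a page the VM claims is
 * all-visible.  When the fast path is allowed, line pointers are not
 * examined at all, so this kind of corruption goes unnoticed.
 */
static bool
finds_dead_lp_corruption(const HeapPageModel *page, bool disable_fast_path)
{
	assert(page->items != NULL || page->nitems == 0);

	if (!page->vm_all_visible)
		return false;           /* no all-visible claim to contradict */
	if (!disable_fast_path)
		return false;           /* fast path taken: items not examined */

	for (size_t i = 0; i < page->nitems; i++)
	{
		if (page->items[i] == LP_MODEL_DEAD)
			return true;
	}
	return false;
}
```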

- Melanie


Attachments:

  [text/x-patch] v40-0001-Fix-visibility-map-corruption-in-more-cases.patch (18.7K, 2-v40-0001-Fix-visibility-map-corruption-in-more-cases.patch)
From b02011f54bc2d79a2ac9be199aa6d0495ecaa958 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v40 01/12] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples or tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.
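A small model of the broadened check: for a page that claims to be all-visible, the old vacuum-only test looked only for LP_DEAD items, while the new one also flags dead tuples and tuples inserted or updated by in-progress transactions. The item states are simplified stand-ins (loosely after HTSV_Result) and the function is hypothetical, a model of this description rather than code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified item states; a model of the commit message, not the patch. */
typedef enum
{
	ITEM_VISIBLE_TO_ALL,      /* consistent with an all-visible page */
	ITEM_LP_DEAD,             /* dead line pointer */
	ITEM_DEAD_TUPLE,          /* tuple that is already dead */
	ITEM_INSERT_IN_PROGRESS,  /* inserted by an in-progress transaction */
	ITEM_UPDATE_IN_PROGRESS   /* updated or deleted by an in-progress transaction */
} ItemStateModel;

typedef struct
{
	bool	caught_before;    /* old check: LP_DEAD items only */
	bool	caught_after;     /* expanded check */
} CorruptionVerdict;

static CorruptionVerdict
check_all_visible_page(bool page_all_visible,
					   const ItemStateModel *items, size_t nitems)
{
	CorruptionVerdict v = {false, false};

	assert(items != NULL || nitems == 0);
	if (!page_all_visible)
		return v;             /* nothing claimed, nothing to contradict */

	for (size_t i = 0; i < nitems; i++)
	{
		if (items[i] == ITEM_LP_DEAD)
			v.caught_before = true;
		if (items[i] != ITEM_VISIBLE_TO_ALL)
			v.caught_after = true;
	}
	return v;
}
```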

Pinning the correct VM page before on-access pruning is cheap compared
to the cost of actually pruning. The vmbuffer is saved in the scan
descriptor, so a query should only need to pin each VM page once, and
a single VM page covers a large number of heap pages.
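To put a number on how many heap pages one VM page covers: the visibility map stores two bits per heap block. Assuming the default 8 kB BLCKSZ and a 24-byte page header (the MAXALIGN'd SizeOfPageHeaderData on common 64-bit builds), one VM page covers (8192 - 24) * 4 = 32,672 heap pages, about 255 MiB of heap. The sketch below mirrors, to the best of my knowledge, the HEAPBLK_TO_MAPBLOCK, HEAPBLK_TO_MAPBYTE and HEAPBLK_TO_OFFSET macros in visibilitymap.c under those assumptions, and counts how many VM pins a sequential pass needs when the pin is kept across heap pages. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Assumptions: default 8 kB BLCKSZ and a 24-byte MAXALIGN'd page header. */
#define BLCKSZ               8192u
#define PAGE_HEADER_BYTES    24u
#define BITS_PER_HEAPBLOCK   2u   /* all-visible bit + all-frozen bit */
#define HEAPBLOCKS_PER_BYTE  (8u / BITS_PER_HEAPBLOCK)
#define MAPSIZE              (BLCKSZ - PAGE_HEADER_BYTES)
#define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Which VM page, which byte within it, and which bit offset in that byte. */
static BlockNumber
heapblk_to_mapblock(BlockNumber heapblk)
{
	return heapblk / HEAPBLOCKS_PER_PAGE;
}

static uint32_t
heapblk_to_mapbyte(BlockNumber heapblk)
{
	return (heapblk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
}

static uint32_t
heapblk_to_offset(BlockNumber heapblk)
{
	return (heapblk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
}

/*
 * Count VM page pins for a pass over heap blocks [0, nblocks) when the
 * pinned VM page is kept until a heap block maps to a different one.
 */
static uint32_t
vm_pins_for_scan(BlockNumber nblocks)
{
	const BlockNumber no_page = UINT32_MAX;  /* like InvalidBlockNumber */
	BlockNumber pinned = no_page;
	uint32_t	pins = 0;

	for (BlockNumber blk = 0; blk < nblocks; blk++)
	{
		BlockNumber mapblk = heapblk_to_mapblock(blk);

		if (mapblk != pinned)
		{
			pinned = mapblk;
			pins++;
		}
	}
	return pins;
}
```

Under these assumptions a sequential pass over 100,000 heap pages (about 781 MiB) needs only four VM pins, which is why reusing the pin makes the extra VM access cheap relative to pruning.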

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 176 ++++++++++++++++++++++++---
 src/backend/access/heap/vacuumlazy.c |  89 +-------------
 src/include/access/heapam.h          |  12 ++
 3 files changed, 175 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..52cafb23c6b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -168,6 +183,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +191,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +226,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +295,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +375,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->vmbits = visibilitymap_get_status(prstate->relation,
+											   prstate->block,
+											   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +797,90 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Helper to fix visibility-related corruption on a heap page and its
+ * corresponding VM page. An all-visible page can have neither dead items
+ * nor tuples that are not visible to all running transactions. This clears
+ * PD_ALL_VISIBLE (if set) and the VM bits, and resets prstate->vmbits.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and any dead items must have been discovered under that same lock.
+ * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
+ * buffer is exclusively locked, ensuring that no other backend can update the
+ * VM bits corresponding to this heap page.
+ *
+ * This function modifies the VM and, potentially, the heap page, but it
+ * does not need to be called within a critical section.
+ */
+static void
+heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	if (PageIsAllVisible(prstate->page))
+	{
+		/*
+		 * It's possible for the value returned by
+		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+		 * wrong for us to see tuples that appear to not be visible to
+		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
+		 * xmin value never moves backwards, but
+		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
+		 * returns a value that's unnecessarily small, so if we see that
+		 * contradiction it just means that the tuples that we think are not
+		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
+		 * is correct.
+		 *
+		 * However, there should never be LP_DEAD items, dead tuple versions,
+		 * or tuples inserted by an in-progress transaction on a page with
+		 * PD_ALL_VISIBLE set.
+		 */
+		if (prstate->lpdead_items > 0)
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+		else
+		{
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+		}
+
+		/*
+		 * Mark the buffer dirty now in case we make no further changes and
+		 * therefore would not mark it dirty later.
+		 */
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	{
+		/*
+		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
+		 * the page-level bit is clear. However, for vacuum, it's possible
+		 * that the bit got cleared after heap_vac_scan_next_block() was
+		 * called, so we must recheck now that we have the buffer lock before
+		 * concluding that the VM is corrupt.
+		 */
+		ereport(WARNING,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("page is not marked all-visible but visibility map bit is set"),
+				 errcontext("relation \"%s\", page %u",
+							relname, prstate->block)));
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +941,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1088,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->vmbits = prstate.vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1411,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1422,13 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1512,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1660,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1676,10 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_fix_vm_corruption(prstate, offnum);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1704,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1771,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_fix_vm_corruption(prstate, offnum);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 82c5b28e0ad..957322648ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -425,11 +425,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1964,81 +1959,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2070,6 +1990,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2179,18 +2100,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..c649e5f1980 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * vmbits holds the page's VM bits as read at the start of pruning. It is
+	 * zeroed if VM corruption is found and corrected.
+	 */
+	uint8		vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
-- 
2.43.0



  [text/x-patch] v40-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (7.4K, 3-v40-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From a503285e012de12539df384d615675c1e48e5cfd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v40 02/12] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages that have no
pruning or freezing work to do. To avoid that wasted work, we now return
early if the page is already all-frozen, or if it is all-visible and no
freezing will be attempted. Callers must opt in with
HEAP_PAGE_PRUNE_ALLOW_FAST_PATH. VACUUM does not opt in when
DISABLE_PAGE_SKIPPING is specified, since in that case the VM cannot be
trusted.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 92 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 10 +++
 src/include/access/heapam.h          |  1 +
 3 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52cafb23c6b..bf740c37f3d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,12 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/*
+	 * True if the page can bypass full page inspection during pruning and
+	 * freezing based on its visibility map status and the caller's options.
+	 */
+	bool		fast_path;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -184,6 +190,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
 static void heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum);
+static void heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -312,7 +319,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			params.options = 0;
+			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -381,6 +388,16 @@ prune_freeze_setup(PruneFreezeParams *params,
 											   prstate->block,
 											   &prstate->vmbuffer);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can skip pruning and freezing entirely.
+	 * Callers must opt in by setting HEAP_PAGE_PRUNE_ALLOW_FAST_PATH.
+	 */
+	prstate->fast_path = ((prstate->vmbits & VISIBILITYMAP_ALL_FROZEN) ||
+						  ((prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+						   !prstate->attempt_freeze)) &&
+		(params->options & HEAP_PAGE_PRUNE_ALLOW_FAST_PATH);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -882,6 +899,68 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 	prstate->vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling heap_page_bypass_prune_freeze(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->vmbits = prstate->vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	presult->hastup = true;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -945,6 +1024,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
+	/*
+	 * If the visibility map status allows it, bypass pruning and freezing
+	 * entirely. This must be done after fixing any discrepancy between the
+	 * page-level visibility hint and the VM.
+	 */
+	if (prstate.fast_path)
+	{
+		heap_page_bypass_prune_freeze(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 957322648ca..ad7a3290821 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2019,6 +2019,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
+	/*
+	 * Allow skipping full inspection of pages that the VM indicates are
+	 * already all-frozen (which may be scanned due to SKIP_PAGES_THRESHOLD).
+	 * However, if DISABLE_PAGE_SKIPPING was specified, we can't trust the VM,
+	 * so we must examine the page to make sure it is truly all-frozen and fix
+	 * it otherwise.
+	 */
+	if (vacrel->skipwithvm)
+		params.options |= HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+
 	heap_page_prune_and_freeze(&params,
 							   &presult,
 							   &vacrel->offnum,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c649e5f1980..0b571d7089f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
-- 
2.43.0



  [text/x-patch] v40-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.5K, 4-v40-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 255fc9aeb721ba96ee3a7b7c3e675a4ee11087d6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v40 03/12] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
In the future, we could also easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing, and as a backstop to ensure we
don't freeze a dead tuple that was not yet prunable according to
GlobalVisState in the rare cases where GlobalVisState moves backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page, so it is performed only once per
page. This is safe because visibility_cutoff_xid records the newest live
xmin on the page: if no snapshot can still consider that XID running,
none can consider the older xmins running either, and (absent any other
disqualifying tuple) the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 +++++++++
 src/backend/access/heap/pruneheap.c         | 37 +++++++--------
 src/backend/access/heap/vacuumlazy.c        | 51 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 72 insertions(+), 40 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..d70fab3a763 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for "transaction still running" are identical to
+ * those used to determine whether an XID is removable.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bf740c37f3d..c85e4172ee8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1043,6 +1043,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1710,29 +1721,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ad7a3290821..7097aa7b772 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -461,13 +461,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2064,13 +2064,10 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
 	}
 #endif
 
@@ -2826,7 +2823,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3587,14 +3584,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3615,7 +3612,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3634,7 +3631,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3715,7 +3712,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3724,16 +3721,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3762,6 +3760,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0b571d7089f..9312886ad4b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -480,6 +480,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
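Once patch 0003 is applied, heap_page_would_be_all_visible() stops comparing each live tuple's xmin with OldestXmin inside the scan loop. Instead it remembers the newest committed xmin and calls GlobalVisTestXidMaybeRunning() once, after the loop. The standalone C sketch below shows that shape. It is not PostgreSQL code. LiveTuple, VisHorizon and xid_maybe_running() are hypothetical. The single-horizon test is a simplification of GlobalVisState, which keeps separate maybe_needed and definitely_needed bounds. The XID helpers mimic the wraparound-aware comparisons in TransactionIdFollows() and TransactionIdPrecedes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Mirrors PostgreSQL's reserved XIDs: 0 invalid, 1 bootstrap, 2 frozen. */
#define INVALID_XID      ((Xid) 0)
#define FIRST_NORMAL_XID ((Xid) 3)

static bool xid_is_normal(Xid xid) { return xid >= FIRST_NORMAL_XID; }

/* "a is newer than b", modulo 2^32 for normal XIDs (cf. TransactionIdFollows). */
static bool xid_follows(Xid a, Xid b)
{
    if (!xid_is_normal(a) || !xid_is_normal(b))
        return a > b;
    return (int32_t) (a - b) > 0;
}

/* "a is older than b", modulo 2^32 for normal XIDs (cf. TransactionIdPrecedes). */
static bool xid_precedes(Xid a, Xid b)
{
    if (!xid_is_normal(a) || !xid_is_normal(b))
        return a < b;
    return (int32_t) (a - b) < 0;
}

typedef struct
{
    Xid  xmin;            /* inserting transaction */
    bool xmin_committed;  /* stand-in for HeapTupleHeaderXminCommitted() */
} LiveTuple;

/* Hypothetical stand-in for GlobalVisState: any XID at or after
 * oldest_maybe_running might still be running for some snapshot. */
typedef struct
{
    Xid oldest_maybe_running;
} VisHorizon;

static bool xid_maybe_running(const VisHorizon *h, Xid xid)
{
    return !xid_precedes(xid, h->oldest_maybe_running);
}

/* Returns true if every live tuple is visible to everyone. Like the patched
 * loop, it stops at the first tuple whose inserter is not hinted committed. */
static bool page_would_be_all_visible(const LiveTuple *tuples, size_t ntuples,
                                      const VisHorizon *h, Xid *newest_live_xid)
{
    bool all_visible = true;

    *newest_live_xid = INVALID_XID;
    for (size_t i = 0; i < ntuples && all_visible; i++)
    {
        if (!tuples[i].xmin_committed)
        {
            all_visible = false;
            break;
        }
        /* No horizon test per tuple: just remember the newest normal xmin. */
        if (xid_follows(tuples[i].xmin, *newest_live_xid) &&
            xid_is_normal(tuples[i].xmin))
            *newest_live_xid = tuples[i].xmin;
    }

    /* A single horizon test for the whole page, after the scan. */
    if (all_visible &&
        xid_is_normal(*newest_live_xid) &&
        xid_maybe_running(h, *newest_live_xid))
        all_visible = false;

    return all_visible;
}
```

Deferring the test means at most one horizon check per page, and it matches the check heap_page_prune_and_freeze() already performs after its own scan of the page.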



  [text/x-patch] v40-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (14.9K, 5-v40-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From a1d768a8cea8ac13e250188ec96c01d98acda94a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v40 04/12] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 127 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++----
 2 files changed, 65 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c85e4172ee8..d276770b9b4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*
 	 * True if the page can bypass full page inspection during pruning and
 	 * freezing based on its visibility map status and the caller's options.
@@ -166,11 +169,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -180,7 +178,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /* Local functions */
@@ -451,53 +448,35 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. Other callers must initialize prstate.set_all_frozen to false,
+	 * since we will not call heap_prepare_freeze_tuple() for each tuple.
+	 *
+	 * We only consider opportunistic freezing if the page would become
+	 * all-frozen, or if it would be all-frozen except for dead tuples that
+	 * VACUUM will remove.
+	 *
+	 * Dead tuples that will be removed by the end of vacuum should not
+	 * prevent opportunistic freezing. Therefore, we do not clear
+	 * set_all_visible and set_all_frozen when we encounter LP_DEAD items.
+	 * Instead, we correct them after deciding whether to freeze, but before
+	 * updating the VM, to avoid setting the VM bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -727,7 +706,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -982,9 +960,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1049,9 +1026,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1202,7 +1179,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1662,6 +1639,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1709,32 +1687,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 7097aa7b772..4d52de1a96c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,7 +463,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -471,7 +471,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2799,7 +2799,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2825,14 +2825,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2873,7 +2873,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3586,7 +3586,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3594,7 +3594,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3617,7 +3617,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3635,7 +3635,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3645,7 +3645,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3734,9 +3734,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3766,8 +3766,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0
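Patch 0004 makes pruning update newest_live_xid for every live tuple whose inserter is hinted committed, whether or not the page is still a candidate for all-visible. It then derives the VM snapshot conflict horizon from that value. The minimal C sketch below illustrates this bookkeeping under stated assumptions. PruneSketch and its helpers are hypothetical stand-ins for the relevant PruneState fields. The freezing decision is reduced to the initial attempt_freeze flag, and the page-level GlobalVisTestXidMaybeRunning() check is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

#define INVALID_XID      ((Xid) 0)
#define FIRST_NORMAL_XID ((Xid) 3)

static bool xid_is_normal(Xid xid) { return xid >= FIRST_NORMAL_XID; }

/* Wraparound-aware "a is newer than b" (cf. TransactionIdFollows). */
static bool xid_follows(Xid a, Xid b)
{
    if (!xid_is_normal(a) || !xid_is_normal(b))
        return a > b;
    return (int32_t) (a - b) > 0;
}

/* Hypothetical, trimmed-down analogue of the PruneState fields in 0004. */
typedef struct
{
    bool set_all_visible;
    bool set_all_frozen;
    Xid  newest_live_xid;
} PruneSketch;

static void prune_sketch_init(PruneSketch *ps, bool attempt_freeze)
{
    ps->set_all_visible = true;          /* now maintained for every caller */
    ps->set_all_frozen = attempt_freeze; /* only freezing callers can set it */
    ps->newest_live_xid = INVALID_XID;
}

/* Called for each HEAPTUPLE_LIVE tuple. */
static void record_live_tuple(PruneSketch *ps, Xid xmin, bool xmin_committed)
{
    if (!xmin_committed)
    {
        ps->set_all_visible = false;
        ps->set_all_frozen = false;
        return;
    }
    /* Updated even after set_all_visible has become false. */
    if (xid_follows(xmin, ps->newest_live_xid) && xid_is_normal(xmin))
        ps->newest_live_xid = xmin;
}

static Xid vm_conflict_horizon(const PruneSketch *ps)
{
    /* An all-frozen page cannot look not-all-visible to any standby query. */
    return ps->set_all_frozen ? INVALID_XID : ps->newest_live_xid;
}
```

Because newest_live_xid no longer depends on set_all_visible, an early transition to "not all-visible" cannot leave it holding a stale value.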



  [text/x-patch] v40-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (24.7K, 6-v40-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 05dfe8841e4a90dc595775863d58bacce996d70b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v40 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
 prune/freeze

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

This change applies only to vacuum phase I, not to pruning performed
during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 263 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 107 +----------
 src/include/access/heapam.h          |  38 ++--
 3 files changed, 208 insertions(+), 200 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d276770b9b4..633d44adb03 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
-	uint8		vmbits;
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
+	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -163,21 +182,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /* Local functions */
@@ -215,7 +219,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -381,17 +385,18 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
-	prstate->vmbits = visibilitymap_get_status(prstate->relation,
-											   prstate->block,
-											   &prstate->vmbuffer);
+	prstate->new_vmbits = 0;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
 
 	/*
 	 * If the page is already all-frozen, or already all-visible when freezing
 	 * is not being attempted, we can skip pruning and freezing entirely.
 	 * Callers must opt in by setting HEAP_PAGE_PRUNE_ALLOW_FAST_PATH.
 	 */
-	prstate->fast_path = ((prstate->vmbits & VISIBILITYMAP_ALL_FROZEN) ||
-						  ((prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+	prstate->fast_path = ((prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN) ||
+						  ((prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
 						   !prstate->attempt_freeze)) &&
 		(params->options & HEAP_PAGE_PRUNE_ALLOW_FAST_PATH);
 
@@ -856,7 +861,7 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 		PageClearAllVisible(prstate->page);
 		MarkBufferDirtyHint(prstate->buffer, true);
 	}
-	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
+	else if (prstate->old_vmbits & VISIBILITYMAP_VALID_BITS)
 	{
 		/*
 		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
@@ -874,7 +879,43 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
 
 	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
 						VISIBILITYMAP_VALID_BITS);
-	prstate->vmbits = 0;
+	prstate->old_vmbits = 0;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * consider setting the VM.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
 }
 
 /*
@@ -903,15 +944,13 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
 	Page		page = prstate->page;
 
-	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
-		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
 			!prstate->attempt_freeze));
 
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->vmbits = prstate->vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -941,7 +980,8 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -956,12 +996,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -989,15 +1027,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid = InvalidTransactionId;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
-	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
 		!PageIsAllVisible(prstate.page))
 		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
 
@@ -1076,6 +1116,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1097,14 +1161,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1118,6 +1185,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1125,29 +1213,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1157,33 +1228,67 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->vmbits = prstate.vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->new_all_visible_pages = 0;
+	presult->new_all_frozen_pages = 0;
+	presult->new_all_visible_frozen_pages = 0;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->new_all_visible_pages = 1;
+			if (prstate.set_all_frozen)
+				presult->new_all_visible_frozen_pages = 1;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->new_all_frozen_pages = 1;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4d52de1a96c..5ea96087fad 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,13 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -1996,8 +1989,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2048,29 +2039,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2091,6 +2059,14 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	vacrel->new_all_visible_pages += presult.new_all_visible_pages;
+	vacrel->new_all_visible_all_frozen_pages += presult.new_all_visible_frozen_pages;
+	vacrel->new_all_frozen_pages += presult.new_all_frozen_pages;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+		presult.new_all_frozen_pages > 0;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2104,71 +2080,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3582,7 +3493,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9312886ad4b..4ce63990326 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -265,7 +265,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -281,8 +282,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -316,26 +316,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * vmbits is the value of the vmbuffer's vmbits at the beginning of
-	 * pruning. It is cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		vmbits;
+	BlockNumber new_all_visible_pages;
+	BlockNumber new_all_visible_frozen_pages;
+	BlockNumber new_all_frozen_pages;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -472,7 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
 										 Buffer buffer);
-- 
2.43.0

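The pruneheap.c hunk in patch 0005 above removes the inline computation of the combined snapshot conflict horizon from this spot. conflict_xid is still passed to log_heap_prune_and_freeze(), so it is presumably computed earlier in the function. The removed code takes whichever of the freeze horizon and the newest removed XID is newer, using wraparound-aware XID comparison. Below is a minimal standalone sketch of that selection rule. xid_follows() and combined_conflict_xid() are hypothetical helpers modeled on PostgreSQL's TransactionIdFollows(), and the typedef and constants are local stand-ins, not the real headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for PostgreSQL's TransactionId and special XIDs. */
typedef uint32_t TransactionId;
#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define XidIsNormal(xid)         ((xid) >= FirstNormalTransactionId)

/*
 * Hypothetical helper modeled on TransactionIdFollows(): normal XIDs are
 * compared modulo 2^32; special XIDs (invalid, bootstrap, frozen) are
 * compared by plain value.
 */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (!XidIsNormal(id1) || !XidIsNormal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/*
 * Choose the most conservative horizon for one combined record: the newer
 * of the freeze conflict XID and the newest XID among removed tuples.
 */
static TransactionId
combined_conflict_xid(TransactionId freeze_conflict_xid,
					  TransactionId latest_xid_removed)
{
	return xid_follows(freeze_conflict_xid, latest_xid_removed) ?
		freeze_conflict_xid : latest_xid_removed;
}
```

For example, combined_conflict_xid(5, 4294967290u) yields 5. Modulo 2^32, XID 5 lies 11 XIDs after 4294967290, so a plain unsigned comparison would pick the wrong horizon.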


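With patch 0005, heap_page_prune_and_freeze() no longer hands set_all_visible/set_all_frozen back to lazy_scan_prune(). Instead, it reports each page's VM transition as three 0/1 counters. The classification can be sketched as a pure function. VmCounts, classify_vm_update() and page_newly_frozen() are hypothetical names. The bit values assume VISIBILITYMAP_ALL_VISIBLE = 0x01 and VISIBILITYMAP_ALL_FROZEN = 0x02, as defined in visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Hypothetical stand-in for the three counters in PruneFreezeResult. */
typedef struct VmCounts
{
	int			new_all_visible;		/* page newly set all-visible */
	int			new_all_visible_frozen;	/* ... and all-frozen at the same time */
	int			new_all_frozen;			/* already all-visible, newly all-frozen */
} VmCounts;

static VmCounts
classify_vm_update(bool do_set_vm, uint8_t old_vmbits, bool set_all_frozen)
{
	VmCounts	c = {0, 0, 0};

	/* old_vmbits must contain only the two VM status bits */
	assert((old_vmbits & ~VISIBILITYMAP_VALID_BITS) == 0);

	if (!do_set_vm)
		return c;

	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
	{
		c.new_all_visible = 1;
		if (set_all_frozen)
			c.new_all_visible_frozen = 1;
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && set_all_frozen)
		c.new_all_frozen = 1;

	return c;
}

/* Mirrors how lazy_scan_prune() derives *vm_page_frozen from the counts. */
static bool
page_newly_frozen(VmCounts c)
{
	return c.new_all_visible_frozen > 0 || c.new_all_frozen > 0;
}
```

Returning counts rather than flags lets the caller sum them directly into its whole-VACUUM totals, which is what the new lazy_scan_prune() code does.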
  [text/x-patch] v40-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 7-v40-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From c47a6270a0a0045347cdb4597b957798d21db4aa Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v40 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5ea96087fad..9bfe3c545ff 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1903,9 +1903,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0



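Patch 0006 above sets VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN through visibilitymap_set_vmbits(), which patch 0007 below renames to visibilitymap_set(). Either way, the call ORs two bits per heap block into one byte of the VM page. Below is a sketch of that addressing. It assumes the default 8 kB BLCKSZ and a 24-byte page header that is already MAXALIGNed. The HEAPBLK_TO_* macros mirror the names used in visibilitymap.c. vm_set_bits() and vm_get_bits() are hypothetical helpers that work on a plain byte array, with no buffer locking or WAL.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 8 kB BLCKSZ, 24-byte page header (already MAXALIGNed). */
#define BLCKSZ               8192
#define PAGE_HEADER_SIZE     24
#define BITS_PER_HEAPBLOCK   2
#define HEAPBLOCKS_PER_BYTE  (8 / BITS_PER_HEAPBLOCK)
#define MAPSIZE              (BLCKSZ - PAGE_HEADER_SIZE)
#define HEAPBLOCKS_PER_PAGE  (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x)  (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x)   (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

/* OR flags into heapBlk's bits; returns 1 if the map byte changed. */
static int
vm_set_bits(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
	uint32_t	mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
	uint8_t		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
	uint8_t		before = map[mapByte];

	assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
	/* all-frozen must never be set without all-visible */
	assert(flags != VISIBILITYMAP_ALL_FROZEN);

	map[mapByte] |= (uint8_t) (flags << mapOffset);
	return map[mapByte] != before;
}

/* Read back the two status bits for heapBlk. */
static uint8_t
vm_get_bits(const uint8_t *map, uint32_t heapBlk)
{
	return (map[HEAPBLK_TO_MAPBYTE(heapBlk)] >> HEAPBLK_TO_OFFSET(heapBlk)) &
		VISIBILITYMAP_VALID_BITS;
}
```

Under these assumptions, one VM page covers 8168 * 4 = 32672 heap blocks. Heap block 5 maps to byte 1 at bit offset 2, so setting both bits on an empty map makes that byte 0x0C.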
  [text/x-patch] v40-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (25.0K, 8-v40-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 181c83f0652bfebe0db2f11983ad08b52c8c780b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v40 07/12] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   4 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 110 +---------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |  12 +-
 src/include/access/heapam_xlog.h         |  20 ---
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 38 insertions(+), 372 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d9042e1f91d 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
 
 	/*
 	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+	 * more details.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 633d44adb03..ba00521d834 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1202,8 +1202,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9bfe3c545ff..93a4437f29b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1926,11 +1926,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2804,9 +2804,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..f1da52b2069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -222,112 +221,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +241,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..66ed51a8aa1 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
 	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
-	 * (original execution of the deletion can reason that a recovery conflict
-	 * which is sufficient for the deletion operation must take place before
-	 * replay of the deletion record itself).
+	 * standby crash or restart, or when replaying a record that marks as
+	 * frozen a page which was already marked all-visible in the visibility
+	 * map.  It's also quite common with records generated during index
+	 * deletion (original execution of the deletion can reason that a recovery
+	 * conflict which is sufficient for the deletion operation must take place
+	 * before replay of the deletion record itself).
 	 */
 	if (!TransactionIdIsValid(snapshotConflictHorizon))
 		return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 52f8603a7be..3102c61125e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4409,7 +4409,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
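After this patch, the only remaining setter is visibilitymap_set() (formerly visibilitymap_set_vmbits()). It finds the two bits belonging to a heap block inside a VM page and ORs the requested flags into them. The sketch below shows that addressing on an in-memory page. It is a minimal sketch under stated assumptions, not the patch's code: it assumes a default build (8192-byte blocks, 24-byte MAXALIGN'd page header), the HEAPBLK_TO_* macros are modeled on those in visibilitymap.c, the VM_* constants stand in for VISIBILITYMAP_*, and vm_set_bits()/vm_get_status() are hypothetical helpers. Pins, buffer locks, critical sections, and WAL are left out; in this series the caller logs the VM change together with the heap page modification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed default build: 8192-byte blocks, 24-byte MAXALIGN'd page header. */
#define BLCKSZ              8192
#define PAGE_HEADER_SIZE    24
#define MAPSIZE             (BLCKSZ - PAGE_HEADER_SIZE) /* usable bytes per VM page */
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

/* Which VM page, which byte within it, and which bit offset within that byte. */
#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x)  (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x)   (BITS_PER_HEAPBLOCK * ((x) % HEAPBLOCKS_PER_BYTE))

/* Stand-ins for VISIBILITYMAP_ALL_VISIBLE / _ALL_FROZEN / _VALID_BITS. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02
#define VM_VALID_BITS  0x03

/* Read the two VM bits for heapBlk from an in-memory map page. */
static uint8_t
vm_get_status(const uint8_t *map, uint32_t heapBlk)
{
    return (map[HEAPBLK_TO_MAPBYTE(heapBlk)] >> HEAPBLK_TO_OFFSET(heapBlk))
        & VM_VALID_BITS;
}

/*
 * OR flags into heapBlk's bits and report whether the map byte changed.
 * No locking, dirtying, or WAL logging happens here.
 */
static bool
vm_set_bits(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
    uint32_t byte = HEAPBLK_TO_MAPBYTE(heapBlk);
    uint8_t  before = map[byte];

    assert((flags & VM_VALID_BITS) == flags);
    /* Mirrors Assert(flags != VISIBILITYMAP_ALL_FROZEN): frozen implies visible. */
    assert(flags != VM_ALL_FROZEN);

    map[byte] |= (uint8_t) (flags << HEAPBLK_TO_OFFSET(heapBlk));
    return map[byte] != before;
}
```

Because the update is a bitwise OR, asking for bits that are already set leaves the byte untouched, so a second vm_set_bits() call for the same block and flags reports no change.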



  [text/x-patch] v40-0008-Track-which-relations-are-modified-by-a-query.patch (5.8K, 9-v40-0008-Track-which-relations-are-modified-by-a-query.patch)
From 04b03c1ec3abcee75e464fef994b482df41b35f4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v40 08/12] Track which relations are modified by a query

Save the relids of modified relations in a bitmap in the executor state.
A later commit will pass this information down to scan nodes to control
whether or not on-access pruning is allowed to set the visibility map.
Setting the visibility map during a scan is counterproductive if the
query is going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c  | 18 ++++++++++++++++++
 src/backend/executor/execUtils.c | 31 +++++++++++++++++++++++++++++++
 src/include/executor/executor.h  |  3 +++
 src/include/nodes/execnodes.h    |  6 ++++++
 4 files changed, 58 insertions(+)
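This patch accumulates RT indexes in a Bitmapset and gives the EvalPlanQual EState its own copy. The sketch below illustrates the three operations involved using a hypothetical MiniBms type, which is a simplified stand-in and not PostgreSQL's Bitmapset API: adding a member (which may realloc() and therefore move the set), a subset test like the one CrossCheckModifiedRelids() asserts, and a deep copy so that growing the child's set cannot leave the parent holding a stale pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable bitmap; a stand-in, not PostgreSQL's Bitmapset. */
typedef struct MiniBms
{
    int      nwords;
    uint64_t words[];           /* flexible array member, nwords long */
} MiniBms;

/* Add x to the set; the set may move, so callers must use the result. */
static MiniBms *
mbms_add_member(MiniBms *a, int x)
{
    int wordnum = x / 64;

    assert(x >= 0);
    if (a == NULL || wordnum >= a->nwords)
    {
        int      oldn = (a != NULL) ? a->nwords : 0;
        int      newn = wordnum + 1;
        MiniBms *b = realloc(a, sizeof(MiniBms) + (size_t) newn * sizeof(uint64_t));

        if (b == NULL)
            abort();
        /* Zero only the newly added words. */
        memset(&b->words[oldn], 0, (size_t) (newn - oldn) * sizeof(uint64_t));
        b->nwords = newn;
        a = b;
    }
    a->words[wordnum] |= UINT64_C(1) << (x % 64);
    return a;
}

static bool
mbms_is_member(const MiniBms *a, int x)
{
    int wordnum = x / 64;

    return a != NULL && wordnum < a->nwords &&
        ((a->words[wordnum] >> (x % 64)) & 1);
}

/* True if every member of a is also a member of b (NULL is the empty set). */
static bool
mbms_is_subset(const MiniBms *a, const MiniBms *b)
{
    if (a == NULL)
        return true;
    for (int i = 0; i < a->nwords; i++)
    {
        uint64_t bw = (b != NULL && i < b->nwords) ? b->words[i] : 0;

        if (a->words[i] & ~bw)
            return false;
    }
    return true;
}

/* Deep copy, analogous in spirit to bms_copy(). */
static MiniBms *
mbms_copy(const MiniBms *a)
{
    if (a == NULL)
        return NULL;

    size_t   sz = sizeof(MiniBms) + (size_t) a->nwords * sizeof(uint64_t);
    MiniBms *c = malloc(sz);

    if (c == NULL)
        abort();
    memcpy(c, a, sz);
    return c;
}
```

Had the recheck state simply shared the parent's pointer, a realloc() triggered while growing the child's set could free memory the parent still points at. Copying costs one allocation and removes that hazard.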

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..8d22b6db867 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -922,6 +922,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 					break;
 			}
 
+			/* If it has a rowmark, the relation may be modified */
+			estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+														rc->rti);
+
 			/* Check that relation is a legal target for marking */
 			if (relation)
 				CheckValidRowMarkRel(relation, rc->markType);
@@ -992,6 +996,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
@@ -3048,6 +3056,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	rcestate->es_output_cid = parentestate->es_output_cid;
 	rcestate->es_queryEnv = parentestate->es_queryEnv;
 
+	/*
+	 * Use a deep copy to avoid stale pointers since bms_add_member() may
+	 * reallocate the bitmap.
+	 */
+	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
+
 	/*
 	 * ResultRelInfos needed by subplans are initialized from scratch when the
 	 * subplans themselves are initialized.
@@ -3180,6 +3194,10 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
 	 */
 	epqstate->recheckplanstate = ExecInitNode(planTree, rcestate, 0);
 
+#ifdef USE_ASSERT_CHECKING
+	CrossCheckModifiedRelids(rcestate);
+#endif
+
 	MemoryContextSwitchTo(oldcontext);
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..7dfa95c2cbe 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -125,6 +125,8 @@ CreateExecutorState(void)
 	estate->es_part_prune_results = NIL;
 	estate->es_unpruned_relids = NULL;
 
+	estate->es_modified_relids = NULL;
+
 	estate->es_junkFilter = NULL;
 
 	estate->es_output_cid = (CommandId) 0;
@@ -873,6 +875,33 @@ ExecGetRangeTableRelation(EState *estate, Index rti, bool isResultRel)
 	return rel;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Assert that es_modified_relids includes all potentially modified RT
+ * indexes.
+ */
+void
+CrossCheckModifiedRelids(EState *estate)
+{
+	Bitmapset  *expected = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = lfirst_node(ResultRelInfo, lc);
+
+		expected = bms_add_member(expected, rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (Index rti = 1; rti <= estate->es_range_table_size; rti++)
+			if (estate->es_rowmarks[rti - 1] != NULL)
+				expected = bms_add_member(expected, rti);
+	}
+	Assert(bms_is_subset(expected, estate->es_modified_relids));
+}
+#endif
+
 /*
  * ExecInitResultRelation
  *		Open relation given by the passed-in RT index and fill its
@@ -898,6 +927,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 		estate->es_result_relations = (ResultRelInfo **)
 			palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
 	estate->es_result_relations[rti - 1] = resultRelInfo;
+	estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+												rti);
 
 	/*
 	 * Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 064df01811e..080cfdac48e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -707,6 +707,9 @@ extern Relation ExecGetRangeTableRelation(EState *estate, Index rti,
 										  bool isResultRel);
 extern void ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 								   Index rti);
+#ifdef USE_ASSERT_CHECKING
+extern void CrossCheckModifiedRelids(EState *estate);
+#endif
 
 extern int	executor_errposition(EState *estate, int location);
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0716c5a9aed..d2f4f8ea748 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -688,6 +688,12 @@ typedef struct EState
 									 * ExecDoInitialPruning() */
 	const char *es_sourceText;	/* Source text from QueryDesc */
 
+	/*
+	 * RT indexes of relations modified by the query through an UPDATE, DELETE,
+	 * INSERT, or MERGE, or that have a row mark (e.g. SELECT FOR UPDATE).
+	 */
+	Bitmapset  *es_modified_relids;
+
 	JunkFilter *es_junkFilter;	/* top-level junk filter, if any */
 
 	/* If query can insert/delete tuples, the command ID to mark them with */
-- 
2.43.0



  [text/x-patch] v40-0009-Thread-flags-through-begin-scan-APIs.patch (28.1K, 10-v40-0009-Thread-flags-through-begin-scan-APIs.patch)
From 05d736fb5b0600effede5e030d5b929274dabe2c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:17 -0500
Subject: [PATCH v40 09/12] Thread flags through begin-scan APIs

Add a flags parameter to the index_fetch_begin() table AM callback and
the begin-scan helpers so the executor can pass context for building
scan descriptors. This introduces an extension point for follow-up work
to mark relations as read-only for the current query, without changing
behavior in this patch.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  6 ++--
 src/backend/access/index/genam.c          |  4 +--
 src/backend/access/index/indexam.c        |  8 +++---
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 13 +++++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +++---
 src/backend/commands/typecmds.c           |  4 +--
 src/backend/executor/execIndexing.c       |  2 +-
 src/backend/executor/execReplication.c    |  8 +++---
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  6 ++--
 src/backend/executor/nodeIndexscan.c      |  8 +++---
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 ++--
 src/backend/executor/nodeTidrangescan.c   |  6 ++--
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  2 +-
 src/include/access/genam.h                |  5 ++--
 src/include/access/heapam.h               |  5 ++--
 src/include/access/tableam.h              | 35 ++++++++++++++---------
 25 files changed, 81 insertions(+), 65 deletions(-)
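In this patch, table_beginscan_parallel() and table_beginscan_parallel_tidrange() OR their fixed scan-type bits into a caller-supplied flags word instead of initializing it locally. A minimal sketch of that pattern follows. The SK_* values, including SK_HINT_READ_ONLY, are hypothetical stand-ins and not the SO_* constants from tableam.h, and the begin_*_sketch() functions are illustrative only. The check that callers pass no internal bits is this sketch's own addition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real SO_* bits and values live in tableam.h. */
enum
{
    SK_TYPE_SEQSCAN = 1 << 0,
    SK_TYPE_TIDRANGESCAN = 1 << 1,
    SK_ALLOW_STRAT = 1 << 2,
    SK_ALLOW_SYNC = 1 << 3,
    SK_ALLOW_PAGEMODE = 1 << 4,
    SK_HINT_READ_ONLY = 1 << 10     /* hypothetical executor-supplied hint */
};

/* Bits the helpers always set themselves. */
#define SK_INTERNAL_BITS \
    (SK_TYPE_SEQSCAN | SK_TYPE_TIDRANGESCAN | SK_ALLOW_STRAT | \
     SK_ALLOW_SYNC | SK_ALLOW_PAGEMODE)

typedef struct ScanSketch
{
    uint32_t flags;             /* final option bits the table AM would see */
} ScanSketch;

/* Parallel seqscan: keep caller bits, add type and strategy bits. */
static ScanSketch
begin_parallel_seqscan_sketch(uint32_t flags)
{
    ScanSketch scan;

    /* Sketch-only check: callers should not pass the internal bits. */
    assert((flags & SK_INTERNAL_BITS) == 0);
    flags |= SK_TYPE_SEQSCAN | SK_ALLOW_STRAT | SK_ALLOW_SYNC | SK_ALLOW_PAGEMODE;
    scan.flags = flags;
    return scan;
}

/* Parallel TID range scan: no strategy or sync bits are added. */
static ScanSketch
begin_parallel_tidrange_sketch(uint32_t flags)
{
    ScanSketch scan;

    assert((flags & SK_INTERNAL_BITS) == 0);
    flags |= SK_TYPE_TIDRANGESCAN | SK_ALLOW_PAGEMODE;
    scan.flags = flags;
    return scan;
}
```

Since every existing call site passes 0, and OR-ing in 0 changes nothing, those callers end up with exactly the flags they had before. That is consistent with the commit message's statement that this patch does not change behavior.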

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..1e950d8e6e5 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,7 +80,7 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
@@ -762,7 +762,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +771,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1fe7ffb2487 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +716,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..87219613f0b 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys, uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +593,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..e946cfb393a 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -190,12 +191,14 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
 
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
 	/* disable syncscan in parallel tid range scan. */
@@ -248,7 +251,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 499ce9ad3db..fb791c7990b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1160,7 +1160,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 67e42e5df29..cc2ec9393a8 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22881,7 +22881,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23345,7 +23345,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cb3e4f67ea1 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..5b8ca1abf62 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
 
 retry:
 	found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c8db357e69f..17bf4976cce 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -95,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys, 0);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -790,7 +790,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -856,7 +856,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index bd83e4712b3..88bdf0a52d1 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -113,7 +113,7 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -209,7 +209,7 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys, 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1726,7 +1726,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1790,7 +1790,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d4da0e8dea9..5b2165c267d 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7161,7 +7161,7 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0, 0);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..db102803eb5 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -158,7 +158,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys, uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -184,7 +184,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4ce63990326..3820bbd7f9f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -96,8 +96,9 @@ typedef struct HeapScanDescData
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
 	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map.
 	 */
 	Buffer		rs_vmbuffer;
 
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..f1065e30638 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -418,9 +418,12 @@ typedef struct TableAmRoutine
 	 * IndexFetchTableData, which the AM will typically embed in a larger
 	 * structure with additional information.
 	 *
+	 * 'flags' is a bitmask of SO_* flags providing hints from the executor
+	 * about the scan context.
+	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -894,9 +897,9 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	flags |= SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
@@ -939,9 +942,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
 	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
 }
@@ -957,9 +960,9 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	flags |= SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
 		flags |= SO_ALLOW_STRAT;
@@ -1059,10 +1062,11 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+
+	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
 
@@ -1139,7 +1143,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1154,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1172,10 +1178,13 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Prepare to fetch tuples from the relation, as needed when fetching tuples
  * for an index scan.
  *
+ * 'flags' is a bitmask of SO_* flags providing hints from the executor about
+ * the scan context.
+ *
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
@@ -1185,7 +1194,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
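
A note on the API change in this patch: the table_beginscan*() wrappers now take a caller-supplied `flags` argument and OR their fixed scan-type bits into it, so any hint bits the executor sets survive into table_beginscan_common(). The following is a minimal sketch of that composition, not the actual tableam.h code. The sketch_* functions are hypothetical stand-ins that only return the combined mask. The enum is an illustrative subset of ScanOptions: SO_TEMP_SNAPSHOT (1 << 9) matches tableam.h and SO_HINT_REL_READ_ONLY (1 << 10) matches the following patch, while the other bit positions are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative subset of ScanOptions. Only SO_TEMP_SNAPSHOT and
 * SO_HINT_REL_READ_ONLY are taken from the patch series; the other
 * bit positions are assumed for this sketch.
 */
enum
{
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_TYPE_BITMAPSCAN = 1 << 1,
	SO_ALLOW_STRAT = 1 << 6,
	SO_ALLOW_SYNC = 1 << 7,
	SO_ALLOW_PAGEMODE = 1 << 8,
	SO_TEMP_SNAPSHOT = 1 << 9,
	SO_HINT_REL_READ_ONLY = 1 << 10,
};

/*
 * Hypothetical stand-in for table_beginscan(): rather than building a
 * scan descriptor, return the flags that would reach
 * table_beginscan_common().
 */
static uint32_t
sketch_beginscan_flags(uint32_t flags)
{
	/* caller-supplied hint bits are kept; scan-type bits are OR'd in */
	flags |= SO_TYPE_SEQSCAN | SO_ALLOW_STRAT | SO_ALLOW_SYNC |
		SO_ALLOW_PAGEMODE;
	return flags;
}

/* Hypothetical stand-in for table_beginscan_bm(). */
static uint32_t
sketch_beginscan_bm_flags(uint32_t flags)
{
	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
	return flags;
}
```

One consequence of the `flags |=` form is that nothing prevents a caller from passing an SO_TYPE_* bit along with its hints; an assertion restricting callers to hint bits would be one way to guard against that, and the hunks shown here do not add such a check.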



  [text/x-patch] v40-0010-Pass-down-information-on-table-modification-to-s.patch (14.5K, 11-v40-0010-Pass-down-information-on-table-modification-to-s.patch)
From 7790c8177ba3aa8a8bd1a216ea77fdfd42efc1bf Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v40 10/12] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/access/heap/heapam_handler.c  |  1 +
 src/backend/executor/nodeBitmapHeapscan.c |  9 ++++++-
 src/backend/executor/nodeIndexonlyscan.c  | 25 +++++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 32 ++++++++++++++++++++---
 src/backend/executor/nodeSamplescan.c     |  8 +++++-
 src/backend/executor/nodeSeqscan.c        | 26 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 24 ++++++++++++++---
 src/include/access/heapam.h               |  6 +++++
 src/include/access/tableam.h              |  3 +++
 9 files changed, 119 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 1e950d8e6e5..aec5199b2e6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -87,6 +87,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
 	hscan->xs_base.rel = rel;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
+	hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
 
 	return &hscan->xs_base;
 }
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..0f30e6980de 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   node->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 17bf4976cce..3fab715f879 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -95,7 +101,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys, 0);
+								   node->ioss_NumOrderByKeys,
+								   flags);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -763,6 +770,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -784,13 +792,18 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -831,6 +844,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -850,13 +864,18 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 88bdf0a52d1..6a235ef25ce 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,6 +104,12 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -113,7 +119,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -200,6 +207,12 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
@@ -209,7 +222,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys, 0);
+								   node->iss_NumOrderByKeys,
+								   flags);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1699,6 +1713,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1720,13 +1735,17 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1765,6 +1784,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 	ParallelIndexScanDesc piscan;
 	bool		instrument = node->ss.ps.instrument != NULL;
 	bool		parallel_aware = node->ss.ps.plan->parallel_aware;
+	uint32		flags = 0;
 
 	if (!instrument && !parallel_aware)
 	{
@@ -1784,13 +1804,17 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 		return;
 	}
 
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan, flags);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..8d36fcda48a 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,19 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) scanstate->ss.ps.plan)->scanrelid,
+						   scanstate->ss.ps.state->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..9356973802b 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = 0;
+
+		if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+						   estate->es_modified_relids))
+			flags |= SO_HINT_REL_READ_ONLY;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +375,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +418,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..04a75e72fe1 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,16 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = 0;
+
+			if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+							   estate->es_modified_relids))
+				flags |= SO_HINT_REL_READ_ONLY;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +458,21 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   estate->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +502,15 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
+	if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+					   node->ss.ps.state->es_modified_relids))
+		flags |= SO_HINT_REL_READ_ONLY;
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3820bbd7f9f..1a7306e2935 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -132,6 +132,12 @@ typedef struct IndexFetchHeapData
 
 	/* Current heap block's corresponding page in the visibility map */
 	Buffer		xs_vmbuffer;
+
+	/*
+	 * Some optimizations can only be performed if the query does not modify
+	 * the underlying relation. Track that here.
+	 */
+	bool		modifies_base_rel;
 } IndexFetchHeapData;
 
 /* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index f1065e30638..57ce94a386f 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
-- 
2.43.0
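
Each scan node in this patch repeats the same check: if the node's scanrelid is not a member of es_modified_relids, it sets SO_HINT_REL_READ_ONLY, and heapam_index_fetch_begin() then stores the inverse as modifies_base_rel. The sketch below shows that logic in isolation, assuming a 64-bit mask as a stand-in for Bitmapset and bms_is_member(). The helper sketch_scan_hint_flags() is hypothetical; it illustrates one way the per-node copies could be factored and is not something the patch provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SO_HINT_REL_READ_ONLY (1u << 10)	/* value from the patch */

/*
 * Stand-in for the Bitmapset of modified range-table indexes
 * (es_modified_relids). Assumes relids below 64 for this sketch.
 */
typedef uint64_t RelidMask;

static bool
relid_is_member(int relid, RelidMask set)
{
	return relid >= 0 && relid < 64 &&
		(set & ((RelidMask) 1 << relid)) != 0;
}

/*
 * Hypothetical helper computing the hint bits a scan node passes down,
 * equivalent to the bms_is_member() block repeated in each node.
 */
static uint32_t
sketch_scan_hint_flags(int scanrelid, RelidMask modified_relids)
{
	uint32_t	flags = 0;

	if (!relid_is_member(scanrelid, modified_relids))
		flags |= SO_HINT_REL_READ_ONLY;
	return flags;
}

/* Mirrors how heapam_index_fetch_begin() derives modifies_base_rel. */
static bool
sketch_modifies_base_rel(uint32_t flags)
{
	return !(flags & SO_HINT_REL_READ_ONLY);
}
```

For example, with range-table index 2 marked as modified, a scan of index 1 receives the read-only hint and a scan of index 2 does not.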



  [text/x-patch] v40-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.5K, 12-v40-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 0a16dad7a4ebe224f35629a39619d0feb03f03a3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v40 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              |  3 +-
 src/backend/access/heap/heapam_handler.c      |  6 ++-
 src/backend/access/heap/pruneheap.c           | 46 ++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c          |  2 +-
 src/include/access/heapam.h                   | 12 +++--
 .../t/035_standby_logical_decoding.pl         |  3 +-
 6 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index aec5199b2e6..17d625944e8 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								!hscan->modifies_base_rel);
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2543,7 +2544,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ba00521d834..4475457fdde 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -219,7 +221,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -239,7 +242,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -321,6 +325,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -377,6 +383,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -456,9 +463,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -889,21 +895,37 @@ heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * consider setting the VM.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1118,7 +1140,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 93a4437f29b..1ddd31c7ead 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2008,7 +2008,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 1a7306e2935..e9617b1e666 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -98,7 +99,8 @@ typedef struct HeapScanDescData
 	/*
 	 * For sequential scans, bitmap heap scans, TID range scans, and sample
 	 * scans. The current heap block's corresponding page in the visibility
-	 * map.
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
 	 */
 	Buffer		rs_vmbuffer;
 
@@ -130,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM on-access.
+	 */
 	Buffer		xs_vmbuffer;
 
 	/*
@@ -441,7 +447,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
 max_replication_slots = 4
 max_wal_senders = 4
 autovacuum = off
+hot_standby_feedback = on
 });
 $node_primary->dump_info;
 $node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
 $logstart = -s $node_standby->logfile;
 
 reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
-	'no_conflict_', 0, 1);
+	'no_conflict_', 1, 0);
 
 # This should not trigger a conflict
 wait_until_vacuum_can_remove(
-- 
2.43.0



  [text/x-patch] v40-0012-Set-pd_prune_xid-on-insert.patch (10.9K, 13-v40-0012-Set-pd_prune_xid-on-insert.patch)
From e4c7112d49e650f59dab834d3db6007c69f34f1a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v40 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to run the first time a page
filled with newly inserted tuples is read, and to set the page
all-visible in the VM.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c              | 40 ++++++++++++-------
 src/backend/access/heap/heapam_xlog.c         | 19 ++++++++-
 src/backend/access/heap/pruneheap.c           | 17 ++++----
 .../modules/index/expected/killtuples.out     |  8 ++--
 4 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..c199646b25d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4475457fdde..2c49bc72f4b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -261,7 +261,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts inserted new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1863,16 +1864,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_fix_vm_corruption(prstate, offnum);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..700144d6783 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -329,7 +329,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
 step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
 new_heap_accesses
 -----------------
-                1
+                2
 (1 row)
 
 step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-18 17:14                         ` Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Andres Freund @ 2026-03-18 17:14 UTC
  To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2026-03-17 10:48:55 -0400, Melanie Plageman wrote:
> @@ -277,6 +295,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
>  		{
>  			OffsetNumber dummy_off_loc;
>  			PruneFreezeResult presult;
> +			PruneFreezeParams params;
> +
> +			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);

We also do a BufferGetBlockNumber(buffer) in prune_freeze_setup().  It irks me
a bit to do that twice, but I don't see a non-ugly way to avoid that.
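
One option is to do the lookup once in the caller and carry the result in the params struct. The sketch below is only illustrative. It assumes a hypothetical `block` field, and `demo_get_block_number()` is a stand-in for BufferGetBlockNumber(), not a PostgreSQL API. A call counter shows that only one lookup happens.

```c
#include <assert.h>

typedef int DemoBuffer;
typedef unsigned int DemoBlockNumber;

/* Counts lookups so the example can show the value is computed once. */
static int demo_lookups = 0;

/* Hypothetical stand-in for BufferGetBlockNumber(). */
static DemoBlockNumber
demo_get_block_number(DemoBuffer buf)
{
	demo_lookups++;
	return (DemoBlockNumber) buf * 10u;
}

/* Hypothetical params struct carrying the block number next to the buffer. */
typedef struct DemoParams
{
	DemoBuffer	buffer;
	DemoBlockNumber block;
} DemoParams;

typedef struct DemoPruneState
{
	DemoBlockNumber block;
} DemoPruneState;

/* Setup reuses the caller's lookup instead of repeating it. */
static void
demo_setup(const DemoParams *params, DemoPruneState *prstate)
{
	prstate->block = params->block;
}

/* Caller: a single lookup serves both the VM pin and the prune setup. */
static DemoBlockNumber
demo_prune_opt(DemoBuffer buf)
{
	DemoParams	params;
	DemoPruneState prstate;

	params.buffer = buf;
	params.block = demo_get_block_number(buf);	/* also what the VM pin needs */
	demo_setup(&params, &prstate);
	return prstate.block;
}
```

The cost is a second copy of the block number that has to stay consistent with the buffer. That redundancy may be part of what makes such approaches feel ugly.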


> +			params.relation = relation;
> +			params.buffer = buffer;
> +			params.vmbuffer = *vmbuffer;
> +			params.reason = PRUNE_ON_ACCESS;
> +			params.vistest = vistest;
> +			params.cutoffs = NULL;
>  

>  			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
> @@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
>  			 * cannot safely determine that during on-access pruning with the
>  			 * current implementation.
>  			 */
> -			PruneFreezeParams params = {
> -				.relation = relation,
> -				.buffer = buffer,
> -				.reason = PRUNE_ON_ACCESS,
> -				.options = 0,
> -				.vistest = vistest,
> -				.cutoffs = NULL,
> -			};
> +			params.options = 0;
>  
>  			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
>  									   NULL, NULL);

Why does this change the way the PruneFreezeParams variable is defined?  I
don't really mind, it's just a bit confusing.
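
The two forms are not equivalent in C. A designated initializer zero-fills every member it does not name. Field-by-field assignment to an automatic variable leaves unnamed members indeterminate, so the assignment form in the quoted hunk has to set params.options explicitly. The sketch below illustrates the difference. DemoPruneParams is a simplified, hypothetical stand-in for PruneFreezeParams.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for PruneFreezeParams. */
typedef struct DemoPruneParams
{
	int			relation;
	int			buffer;
	int			vmbuffer;
	int			reason;
	uint32_t	options;
	const void *cutoffs;
} DemoPruneParams;

/* Designated initializer: every member not named is zero-initialized. */
static DemoPruneParams
make_params_designated(int rel, int buf, int vmbuf)
{
	DemoPruneParams params = {
		.relation = rel,
		.buffer = buf,
		.vmbuffer = vmbuf,
		.reason = 1,
	};

	return params;				/* options == 0, cutoffs == NULL */
}

/*
 * Field-by-field assignment: the variable starts indeterminate, so every
 * member the callee reads must be assigned explicitly.
 */
static DemoPruneParams
make_params_assigned(int rel, int buf, int vmbuf, uint32_t options)
{
	DemoPruneParams params;

	params.relation = rel;
	params.buffer = buf;
	params.vmbuffer = vmbuf;
	params.reason = 1;
	params.options = options;	/* omitting this would leave garbage */
	params.cutoffs = NULL;

	return params;
}
```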



> +/*
> + * Helper to fix visibility-related corruption on a heap page and its
> + * corresponding VM page. An all-visible page cannot have dead items nor can
> + * it have tuples that are not visible to all running transactions. It clears
> + * the VM corruption as well as resetting the vmbits used during pruning.

So this is now only called when we already know there's corruption?  I think
that could be clearer in the comments.


Seems a bit odd that the function then figures out what it should do from the
page & VM contents, given that the caller already needs to have known that
something is wrong?
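
One way to address this is for the caller to classify the problem it found and pass that classification in. The fix-up routine would then only act on it and report it, without re-deriving the situation from page and VM contents. The sketch below uses hypothetical demo_* names and is not PostgreSQL code. The warning texts are the ones from the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification the caller already knows when it detects a problem. */
typedef enum DemoVMCorruption
{
	DEMO_VM_OK = 0,
	DEMO_VM_BIT_SET_PAGE_NOT_ALL_VISIBLE,	/* VM bit set, PD_ALL_VISIBLE clear */
	DEMO_VM_DEAD_ITEM_ON_ALL_VISIBLE,		/* LP_DEAD item on all-visible page */
	DEMO_VM_INVISIBLE_TUPLE_ON_ALL_VISIBLE	/* e.g. in-progress inserter */
} DemoVMCorruption;

/* Classify once, in the caller, from facts it already has in hand. */
static DemoVMCorruption
demo_classify(bool page_all_visible, bool vm_bit_set,
			  int lpdead_items, bool saw_in_progress_tuple)
{
	if (!page_all_visible)
		return vm_bit_set ? DEMO_VM_BIT_SET_PAGE_NOT_ALL_VISIBLE : DEMO_VM_OK;
	if (lpdead_items > 0)
		return DEMO_VM_DEAD_ITEM_ON_ALL_VISIBLE;
	if (saw_in_progress_tuple)
		return DEMO_VM_INVISIBLE_TUPLE_ON_ALL_VISIBLE;
	return DEMO_VM_OK;
}

/* The fix-up routine maps the caller's verdict to a message; NULL means no-op. */
static const char *
demo_warning_for(DemoVMCorruption kind)
{
	switch (kind)
	{
		case DEMO_VM_BIT_SET_PAGE_NOT_ALL_VISIBLE:
			return "page is not marked all-visible but visibility map bit is set";
		case DEMO_VM_DEAD_ITEM_ON_ALL_VISIBLE:
			return "dead line pointer found on page marked all-visible";
		case DEMO_VM_INVISIBLE_TUPLE_ON_ALL_VISIBLE:
			return "tuple not visible to all transactions found on page marked all-visible";
		case DEMO_VM_OK:
			break;
	}
	return NULL;
}
```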


> + * This function must be called while holding an exclusive lock on the heap
> + * buffer, and any dead items must have been discovered under that same lock.
> + * Although we do not hold a lock on the VM buffer, it is pinned, and the heap
> + * buffer is exclusively locked, ensuring that no other backend can update the
> + * VM bits corresponding to this heap page.
> + *
> + * This function makes changes to the VM and, potentially, the heap page, but
> + * it does not need to be done in a critical section.
> + */
> +static void
> +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
> +{
> +	const char *relname = RelationGetRelationName(prstate->relation);
> +
> +	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
> +
> +	if (PageIsAllVisible(prstate->page))
> +	{
> +		/*
> +		 * It's possible for the value returned by
> +		 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
> +		 * wrong for us to see tuples that appear to not be visible to
> +		 * everyone yet, while PD_ALL_VISIBLE is already set. The real safe
> +		 * xmin value never moves backwards, but
> +		 * GetOldestNonRemovableTransactionId() is conservative and sometimes
> +		 * returns a value that's unnecessarily small, so if we see that
> +		 * contradiction it just means that the tuples that we think are not
> +		 * visible to everyone yet actually are, and the PD_ALL_VISIBLE flag
> +		 * is correct.
> +		 *
> +		 * However, there should never be LP_DEAD items, dead tuple versions,
> +		 * or tuples inserted by an in-progress transaction on a page with
> +		 * PD_ALL_VISIBLE set.
> +		 */
> +		if (prstate->lpdead_items > 0)
> +		{
> +			ereport(WARNING,
> +					(errcode(ERRCODE_DATA_CORRUPTED),
> +					 errmsg("dead line pointer found on page marked all-visible"),
> +					 errcontext("relation \"%s\", page %u, tuple %u",
> +								relname, prstate->block, offnum)));
> +		}
> +		else
> +		{
> +			ereport(WARNING,
> +					(errcode(ERRCODE_DATA_CORRUPTED),
> +					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
> +					 errcontext("relation \"%s\", page %u, tuple %u",
> +								relname, prstate->block, offnum)));
> +		}

Wait, why are we now WARNING about the PageIsAllVisible() &&
prstate->lpdead_items == 0 case? Seems to run flatly counter to the comment
above about GetOldestNonRemovableTransactionId() going backward?
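
The comment's argument can be shown with a wraparound-aware XID comparison. A horizon that is merely conservative, meaning older than the true safe xmin, can make an xmin look possibly-running even when PD_ALL_VISIBLE is correct. On its own, that observation is not evidence of corruption. This is a sketch with hypothetical names. The comparison mimics the signed-difference style of PostgreSQL's TransactionIdPrecedes() for normal XIDs only, and it is not that function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoXid;

/* "a precedes b" for normal XIDs, via the signed 32-bit difference. */
static bool
demo_xid_precedes(DemoXid a, DemoXid b)
{
	int32_t		diff = (int32_t) (a - b);

	return diff < 0;
}

/*
 * xmin counts as "maybe still running" unless it precedes the horizon.
 * An older (more conservative) horizon can only make this true more often.
 */
static bool
demo_xmin_maybe_running(DemoXid xmin, DemoXid horizon)
{
	return !demo_xid_precedes(xmin, horizon);
}
```

With xmin 900, a horizon of 1000 says "visible to all". If a later horizon computation returns 850, the same committed tuple looks possibly-running, even though nothing on the page changed.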


> +		/*
> +		 * Mark the buffer dirty now in case we make no further changes and
> +		 * therefore would not mark it dirty later.
> +		 */
> +		PageClearAllVisible(prstate->page);
> +		MarkBufferDirtyHint(prstate->buffer, true);
> +	}
> +	else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
> +	{
> +		/*
> +		 * As of PostgreSQL 9.2, the visibility map bit should never be set if
> +		 * the page-level bit is clear. However, for vacuum, it's possible
> +		 * that the bit got cleared after heap_vac_scan_next_block() was
> +		 * called, so we must recheck now that we have the buffer lock before
> +		 * concluding that the VM is corrupt.
> +		 */
> +		ereport(WARNING,
> +				(errcode(ERRCODE_DATA_CORRUPTED),
> +				 errmsg("page is not marked all-visible but visibility map bit is set"),
> +				 errcontext("relation \"%s\", page %u",
> +							relname, prstate->block)));
> +	}
> +
> +	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
> +						VISIBILITYMAP_VALID_BITS);
> +	prstate->vmbits = 0;

So we can end up clearing the VM without emitting any warning?
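
The question implies an invariant: the VM should only be cleared on a path that also reports why. The sketch below makes that invariant explicit. The names are hypothetical, and booleans stand in for the page and VM state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record of what the fix-up routine did. */
typedef struct DemoFixResult
{
	bool		warned;
	bool		cleared_vm;
	bool		cleared_page_bit;
} DemoFixResult;

/*
 * Every path that clears the VM also warns, so a silent clear cannot
 * happen. vm_bits != 0 means at least one VM bit is set.
 */
static DemoFixResult
demo_fix_vm(bool page_all_visible, unsigned vm_bits, bool page_is_corrupt)
{
	DemoFixResult r = {false, false, false};

	if (page_all_visible && page_is_corrupt)
	{
		r.warned = true;		/* e.g. dead item on an all-visible page */
		r.cleared_page_bit = true;
		r.cleared_vm = true;
	}
	else if (!page_all_visible && vm_bits != 0)
	{
		r.warned = true;		/* VM bit set, PD_ALL_VISIBLE clear */
		r.cleared_vm = true;
	}
	/* Otherwise nothing is known to be wrong: leave the VM alone. */
	return r;
}
```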


>  /*
>   * Prune and repair fragmentation and potentially freeze tuples on the
> @@ -830,6 +941,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  					   new_relfrozen_xid, new_relmin_mxid,
>  					   presult, &prstate);
>  
> +	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
> +		!PageIsAllVisible(prstate.page))
> +		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);
> +
>  	/*
>  	 * Examine all line pointers and tuple visibility information to determine
>  	 * which line pointers should change state and which tuples may be frozen.

Feels like there should be an explanation here for why we are clearing the VM?




> From a503285e012de12539df384d615675c1e48e5cfd Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 25 Feb 2026 16:48:19 -0500
> Subject: [PATCH v40 02/12] Add pruning fast path for all-visible and
>  all-frozen pages
> 
> Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
> heap_page_prune_and_freeze() can be invoked for pages with no pruning or
> freezing work. To avoid this, if a page is already all-frozen or it is
> all-visible and no freezing will be attempted, we exit early. We can't
> exit early if vacuum passed DISABLE_PAGE_SKIPPING, though.
> 



> +static void
> +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
> +{
> +	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
> +	Page		page = prstate->page;
> +
> +	Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
> +		   (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
> +			!prstate->attempt_freeze));
> +
> +	/* We'll fill in presult for the caller */
> +	memset(presult, 0, sizeof(PruneFreezeResult));
> +
> +	presult->vmbits = prstate->vmbits;
> +
> +	/* Clear any stale prune hint */
> +	if (TransactionIdIsValid(PageGetPruneXid(page)))
> +	{
> +		PageClearPrunable(page);
> +		MarkBufferDirtyHint(prstate->buffer, true);
> +	}
> +
> +	if (PageIsEmpty(page))
> +		return;
> +
> +	presult->hastup = true;

Is that actually a given? Couldn't the page consist solely of unused
items? That'd make PageIsEmpty() return false, but should still allow
truncation.
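
PageIsEmpty() only reports whether the line pointer array is empty. To find out whether any item is actually in use, a scan along the lines of the hypothetical sketch below would be needed. The enum values mirror LP_UNUSED, LP_NORMAL, LP_REDIRECT and LP_DEAD in itemid.h. Counting every non-unused item as a tuple is a simplification. It assumes that, as on an all-visible page, there are no LP_DEAD items.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified line pointer states; values mirror lp_flags in itemid.h. */
typedef enum DemoLpFlags
{
	DEMO_LP_UNUSED = 0,
	DEMO_LP_NORMAL = 1,
	DEMO_LP_REDIRECT = 2,
	DEMO_LP_DEAD = 3
} DemoLpFlags;

/*
 * True only if some line pointer is in use. A non-empty but entirely
 * LP_UNUSED array passes a "page not empty" test, yet holds no tuples
 * and should not block truncation.
 */
static bool
demo_page_has_tuples(const DemoLpFlags *items, size_t nitems)
{
	for (size_t i = 0; i < nitems; i++)
	{
		if (items[i] != DEMO_LP_UNUSED)
			return true;
	}
	return false;
}
```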





> From 255fc9aeb721ba96ee3a7b7c3e675a4ee11087d6 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 17 Dec 2025 16:51:05 -0500
> Subject: [PATCH v40 03/12] Use GlobalVisState in vacuum to determine page
>  level visibility
> 
> During vacuum's first and third phases, we examine tuples' visibility
> to determine if we can set the page all-visible in the visibility map.
> 
> Previously, this check compared tuple xmins against a single XID chosen at
> the start of vacuum (OldestXmin). We now use GlobalVisState, which also
> enables future work to set the VM during on-access pruning, since ordinary
> queries have access to GlobalVisState but not OldestXmin.
> 
> This also benefits vacuum: in some cases, GlobalVisState may advance
> during a vacuum, allowing more pages to become considered all-visible.
> And, in the future, we could easily add a heuristic to update
> GlobalVisState more frequently during vacuums of large tables.
> 
> OldestXmin is still used for freezing and as a backstop to ensure we
> don't freeze a dead tuple that wasn't yet prunable according to
> GlobalVisState in the rare occurrences where GlobalVisState moves
> backwards.

> Because comparing a transaction ID against GlobalVisState is more
> expensive than comparing against a single XID, we defer this check until
> after scanning all tuples on the page. Therefore, we perform the
> GlobalVisState check only once per page. This is safe because
> visibility_cutoff_xid records the newest live xmin on the page;
> if it is globally visible, then the entire page is all-visible.
> 
> Using GlobalVisState means on-access pruning can also maintain
> visibility_cutoff_xid. This approach will result in examining more tuple
> xmins than before; however, the additional cost should not be
> significant. And doing so will enable us to set the visibility map on
> access in the future.


I wish there were a good way to trigger errors if visibility_cutoff_xid were
ever read after prstate->set_all_frozen is set to false... But I guess that'll
be moot in a few commits.
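
A low-tech way to get such errors is to route all reads through an accessor that asserts the validity condition. This assumes every read can be funneled through one helper. The struct and function names below are hypothetical, and PostgreSQL code would use Assert() rather than assert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t DemoXid;

/* Hypothetical slice of PruneState; the real struct has many more fields. */
typedef struct DemoPruneState
{
	bool		set_all_frozen;
	DemoXid		visibility_cutoff_xid;	/* read only via the accessor below */
} DemoPruneState;

/*
 * In an assert-enabled build, a read after set_all_frozen was cleared
 * aborts instead of silently returning a stale value.
 */
static DemoXid
demo_get_cutoff_xid(const DemoPruneState *prstate)
{
	assert(prstate->set_all_frozen);
	return prstate->visibility_cutoff_xid;
}
```

An alternative is to reset the field to an invalid XID at the point where set_all_frozen is cleared, so that any later misuse is at least visible.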



> diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
> index bf740c37f3d..c85e4172ee8 100644
> --- a/src/backend/access/heap/pruneheap.c
> +++ b/src/backend/access/heap/pruneheap.c
> @@ -1043,6 +1043,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  	 */
>  	prune_freeze_plan(&prstate, off_loc);
>  
> +	/*
> +	 * After processing all the live tuples on the page, if the newest xmin
> +	 * amongst them may be considered running by any snapshot, the page cannot
> +	 * be all-visible.
> +	 */
> +	if (prstate.set_all_visible &&
> +		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
> +		GlobalVisTestXidMaybeRunning(prstate.vistest,
> +									 prstate.visibility_cutoff_xid))
> +		prstate.set_all_visible = prstate.set_all_frozen = false;
> +

So the docs for prstate.visibility_cutoff_xid say:

	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
	 * The caller can use it as the conflict horizon, when setting the VM
	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
	 * true.

But here we look at it without checking that we froze some tuples.  I guess
the comment is outdated?
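
The quoted commit message relies on this claim: if the newest live xmin is visible to everyone, then so is every older one. That means a single check per page is enough, whether or not anything was frozen. The sketch below illustrates the idea under stated assumptions. A fixed horizon stands in for GlobalVisState, the caller passes only normal xmins, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DemoXid;

#define DEMO_INVALID_XID ((DemoXid) 0)

/* Wraparound-aware "a follows b" for normal XIDs. */
static bool
demo_xid_follows(DemoXid a, DemoXid b)
{
	return (int32_t) (a - b) > 0;
}

/*
 * Track the newest xmin across live tuples, then test it once against
 * the horizon instead of testing every tuple.
 */
static bool
demo_page_all_visible(const DemoXid *live_xmins, size_t n, DemoXid horizon)
{
	DemoXid		newest = DEMO_INVALID_XID;

	for (size_t i = 0; i < n; i++)
	{
		if (newest == DEMO_INVALID_XID || demo_xid_follows(live_xmins[i], newest))
			newest = live_xmins[i];
	}

	/* No normal xmin to check, as with the TransactionIdIsNormal() test. */
	if (newest == DEMO_INVALID_XID)
		return true;

	/* Visible to all iff the newest xmin precedes the horizon. */
	return (int32_t) (newest - horizon) < 0;
}
```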



> @@ -3615,7 +3612,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
>   * Returns true if the page is all-visible other than the provided
>   * deadoffsets and false otherwise.
>   *
> - * OldestXmin is used to determine visibility.
> + * vistest is used to determine visibility.
>   *
>   * Output parameters:
>   *

Could the "going backward" thing possibly trigger a spurious assert in

        Assert(heap_page_is_all_visible(vacrel->rel, buf,
                                        vacrel->vistest, &debug_all_frozen,
                                        &debug_cutoff, &vacrel->offnum));



> From a1d768a8cea8ac13e250188ec96c01d98acda94a Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Sat, 28 Feb 2026 16:06:51 -0500
> Subject: [PATCH v40 04/12] Keep newest live XID up-to-date even if page not
>  all-visible

I guess I'd have expected 03 and 04 to be swapped... But whatever.


> +	 * Currently, only VACUUM performs freezing, but other callers may in the
> +	 * future. Other callers must initialize prstate.set_all_frozen to false,
> +	 * since we will not call heap_prepare_freeze_tuple() for each tuple.

What does it mean that other callers need to "initialize
prstate.set_all_frozen to false"? It's not like they can do that, because
prstate is defined in heap_page_prune_and_freeze().
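
Presumably the intent is that the setup routine derives the initial value from the caller's options, the same way prune_freeze_setup() already derives attempt_freeze from HEAP_PAGE_PRUNE_FREEZE. Under that assumption, the comment could describe what callers pass rather than what they initialize. The sketch below is hypothetical. The flag value is copied from the patched heapam.h in this thread, and the initial value of set_all_visible is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag value as defined in the patched heapam.h. */
#define HEAP_PAGE_PRUNE_FREEZE		(1 << 1)

/* Hypothetical subset of PruneState. */
typedef struct DemoPruneState
{
	bool		attempt_freeze;
	bool		set_all_visible;
	bool		set_all_frozen;
} DemoPruneState;

/*
 * The setup routine, not the caller, owns the initial values. A caller that
 * does not request freezing omits the flag, and set_all_frozen starts false
 * because no tuple will go through freeze preparation.
 */
static void
demo_setup(DemoPruneState *prstate, uint32_t options)
{
	prstate->attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
	prstate->set_all_visible = true;
	prstate->set_all_frozen = prstate->attempt_freeze;
}
```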


> +	 * We only consider opportunistic freezing if the page would become
> +	 * all-frozen, or if it would be all-frozen except for dead tuples that
> +	 * VACUUM will remove.

It kinda feels like "opportunistic freezing" is not defined at this point.  It
wasn't super clear before either, but there was at least this:

-     * In addition to telling the caller whether it can set the VM bit, we
-     * also use 'set_all_visible' and 'set_all_frozen' for our own
-     * decision-making. If the whole page would become frozen, we consider
-     * opportunistically freezing tuples.  We will not be able to freeze the
-     * whole page if there are tuples present that are not visible to everyone
-     * or if there are dead tuples which are not yet removable.  However, dead
-     * tuples which will be removed by the end of vacuuming should not
-     * preclude us from opportunistically freezing.  Because of that, we do

Which seems to provide a bit more explanation than "We only consider
opportunistic freezing"...


> From 05dfe8841e4a90dc595775863d58bacce996d70b Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 2 Dec 2025 15:07:42 -0500
> Subject: [PATCH v40 05/12] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
>  prune/freeze

"Eliminate" kinda makes me think this just removes WAL logging for
visibilitymap sets or such. Perhaps consider rephrasing it as something like
"WAL log setting VM as part of XLOG_HEAP2_PRUNE_*"


> This change applies only to vacuum phase I, not to pruning performed
> during normal page access.

Maybe start that with "For now, this ..."?


> @@ -215,7 +219,7 @@ static void page_verify_redirects(Page page);
>  
>  static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
>  								  PruneState *prstate);
> -
> +static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
>  
>  /*
>   * Optionally prune and repair fragmentation in the specified page.

Previously there were two newlines between the declarations and code, now only
one. Intentional?




> +/*
> + * Decide whether to set the visibility map bits (all-visible and all-frozen)
> + * for heap_blk using information from the PruneState and VM.
> + *
> + * This function does not actually set the VM bits or page-level visibility
> + * hint, PD_ALL_VISIBLE.
> + *
> + * Returns true if one or both VM bits should be set and false otherwise.
> + */
> +static bool
> +heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
> +{
> +	/*
> +	 * Though on-access pruning maintains prstate->set_all_visible, we don't
> +	 * consider setting the VM.
> +	 */
> +	if (reason == PRUNE_ON_ACCESS)
> +		return false;

Nitpick^2: We kind of are considering based on this comment :).  I'd just
s/consider setting/set/, maybe with a +for now.


> @@ -956,12 +996,10 @@ heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
>   * tuples if it's required in order to advance relfrozenxid / relminmxid, or
>   * if it's considered advantageous for overall system performance to do so
>   * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
> - * 'new_relmin_mxid' arguments are required when freezing.  When
> - * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
> - * presult->set_all_visible and presult->set_all_frozen after determining
> - * whether or not to opportunistically freeze, to indicate if the VM bits can
> - * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
> - * option is not passed.
> + * 'new_relmin_mxid' arguments are required when freezing.
> + *
> + * A vmbuffer corresponding to the heap page is also passed and if the page is
> + * found to be all-visible/all-frozen, we will set it in the VM.
>   *
>   * presult contains output parameters needed by callers, such as the number of
>   * tuples removed and the offsets of dead items on the page after pruning.
> @@ -989,15 +1027,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  	bool		do_freeze;
>  	bool		do_prune;
>  	bool		do_hint_prune;
> +	bool		do_set_vm;
>  	bool		did_tuple_hint_fpi;
>  	int64		fpi_before = pgWalUsage.wal_fpi;
> +	TransactionId conflict_xid = InvalidTransactionId;
>  
>  	/* Initialize prstate */
>  	prune_freeze_setup(params,
>  					   new_relfrozen_xid, new_relmin_mxid,
>  					   presult, &prstate);
>  
> -	if ((prstate.vmbits & VISIBILITYMAP_VALID_BITS) &&
> +	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
>  		!PageIsAllVisible(prstate.page))
>  		heap_fix_vm_corruption(&prstate, InvalidOffsetNumber);

There are so many changes related to s/vmbits/old_vmbits/. How about naming it
old_vmbits from the start? That'll make this commit a lot less noisy.



> @@ -1076,6 +1116,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  		prstate.set_all_visible = prstate.set_all_frozen = false;
>  
>  	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
> +	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));

Why didn't we have this assert earlier?


> +	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);

Most of the other heap_page_prune_and_freeze() helpers are named
heap_prune_xyz(), why not follow that here?

I guess this holds for a few other helpers added in earlier commits
too. E.g. heap_page_bypass_prune_freeze() should probably be
heap_prune_bypass_prune_freeze() or such.


> +	/*
> +	 * new_vmbits should be 0 regardless of whether or not the page is
> +	 * all-visible if we do not intend to set the VM.
> +	 */
> +	Assert(do_set_vm || prstate.new_vmbits == 0);
> +
> +	/*
> +	 * The snapshot conflict horizon for the whole record is the most
> +	 * conservative (newest) horizon required by any change in the record.
> +	 */
> +	if (do_set_vm)
> +		conflict_xid = prstate.newest_live_xid;
> +	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
> +		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
> +	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
> +		conflict_xid = prstate.latest_xid_removed;

I guess I'd personally move the initialization of conflict_xid to
InvalidTransactionId to just before the if, to make it clearer where we start
from if !do_set_vm.
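Below is a standalone sketch of the quoted horizon computation, with the initialization moved right before the chain of ifs. `TransactionId`, `xid_follows()` and `combined_conflict_xid()` are hypothetical stand-ins written for illustration. `xid_follows()` only mirrors the modulo-2^32 comparison that `TransactionIdFollows()` performs; it is not the PostgreSQL code itself.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;	/* hypothetical stand-in */
#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Modeled on TransactionIdFollows(): normal XIDs are compared modulo 2^32 so
 * wraparound is handled; permanent XIDs (< 3) compare as plain integers.
 */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/*
 * The conflict horizon of the combined record is the newest XID required by
 * any change in it.  Starting from InvalidTransactionId right here makes it
 * obvious what the result is when the VM is not being set.
 */
static TransactionId
combined_conflict_xid(bool do_set_vm, TransactionId newest_live_xid,
					  bool do_freeze, TransactionId freeze_conflict_xid,
					  bool do_prune, TransactionId latest_xid_removed)
{
	TransactionId conflict_xid = InvalidTransactionId;

	if (do_set_vm)
		conflict_xid = newest_live_xid;
	if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
		conflict_xid = freeze_conflict_xid;
	if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;
	return conflict_xid;
}
```

Any normal XID "follows" InvalidTransactionId, so the first applicable change always replaces the initial value.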


> @@ -1097,14 +1161,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  
>  		/*
>  		 * If that's all we had to do to the page, this is a non-WAL-logged
> -		 * hint.  If we are going to freeze or prune the page, we will mark
> -		 * the buffer dirty below.
> +		 * hint.  If we are going to freeze or prune the page or set
> +		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
> +		 *
> +		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
> +		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
>  		 */
> -		if (!do_freeze && !do_prune)
> +		if (!do_freeze && !do_prune && !do_set_vm)
>  			MarkBufferDirtyHint(prstate.buffer, true);
>  	}

This block is gated by if (do_hint_prune) which is computed as:

	/*
	 * Even if we don't prune anything, if we found a new value for the
	 * pd_prune_xid field or the page was marked full, we will update the hint
	 * bit.
	 */
	do_hint_prune = PageGetPruneXid(prstate.page) != prstate.new_prune_xid ||
		PageIsFull(prstate.page);

It's not really related to this change, but I'm just confused a bit by the
"|| PageIsFull(prstate.page)". What is that about? Why do we want to mark the
buffer DirtyHint if the page is full? It very well might already have been
marked as such, no?



> -	if (do_prune || do_freeze)
> +	if (do_prune || do_freeze || do_set_vm)
>  	{
>  		/* Apply the planned item changes and repair page fragmentation. */
>  		if (do_prune)
> @@ -1118,6 +1185,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
>  		if (do_freeze)
>  			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
>  
> +		/* Set the visibility map and page visibility hint */
> +		if (do_set_vm)
> +		{
> +			/*
> +			 * While it is valid for PD_ALL_VISIBLE to be set when the
> +			 * corresponding VM bit is clear, we strongly prefer to keep them
> +			 * in sync.
> +			 *
> +			 * The heap buffer must be marked dirty before adding it to the
> +			 * WAL chain when setting the VM. We don't worry about
> +			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
> +			 * already set, though. It is extremely rare to have a clean heap
> +			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
> +			 * so there is no point in optimizing it.
> +			 */
> +			PageSetAllVisible(prstate.page);
> +			PageClearPrunable(prstate.page);

Idle thought, not to be acted on now: Eventually it could make sense to not do
PageClearPrunable() if we are not marking the page frozen, but instead replace
the prune xid with something triggering on-access pruning when freezing is
reasonable.


> +	/*
> +	 * During its second pass over the heap, VACUUM calls
> +	 * heap_page_would_be_all_visible() to determine whether a page is
> +	 * all-visible and all-frozen. The logic here is similar. After completing
> +	 * pruning and freezing, use an assertion to verify that our results
> +	 * remain consistent with heap_page_would_be_all_visible().
> +	 */
> +#ifdef USE_ASSERT_CHECKING
> +	if (prstate.set_all_visible)
> +	{
> +		TransactionId debug_cutoff;
> +		bool		debug_all_frozen;
> +
> +		Assert(prstate.lpdead_items == 0);
> +
> +		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
> +										prstate.vistest,
> +										&debug_all_frozen,
> +										&debug_cutoff, off_loc));
> +
> +		/*
> +		 * It's possible the page is composed entirely of frozen tuples but is
> +		 * not set all-frozen in the VM and did not pass
> +		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
> +		 * heap_page_is_all_visible() finds the page completely frozen, even
> +		 * though prstate.set_all_frozen is false.
> +		 */
> +		Assert(!prstate.set_all_frozen || debug_all_frozen);

Seems like we could verify that debug_cutoff isn't newer than conflict_xid?
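A hedged sketch of what that check could look like, as a standalone predicate. The helper names are hypothetical, and `xid_follows()` again only models `TransactionIdFollows()`. The sketch gates on `do_set_vm` on the assumption that, in this version of the patch, `conflict_xid` only includes the newest live XID when the VM is actually set, while on-access pruning can still reach this block with `set_all_visible` true.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;	/* hypothetical stand-in */
#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Wraparound-aware "id1 is newer than id2", modeled on TransactionIdFollows(). */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/*
 * When the record sets the VM, the newest live xmin reported by
 * heap_page_is_all_visible() in debug_cutoff must not be newer than the
 * logged conflict horizon.
 */
static bool
debug_cutoff_is_covered(bool do_set_vm, TransactionId debug_cutoff,
						TransactionId conflict_xid)
{
	return !do_set_vm || !xid_follows(debug_cutoff, conflict_xid);
}
```

In the patch this would read roughly as `Assert(debug_cutoff_is_covered(do_set_vm, debug_cutoff, conflict_xid));` next to the existing `debug_all_frozen` assertion.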


> +	}
> +#endif

Hm.  I guess aborting after we did incorrect pruning/freezing/VMing is better
than not, but it'd be even better if we checked before corrupting things. But I
guess it wouldn't be trivial to add something like the debug_cutoff assertion I
suggest above if the check ran before the tuples are actually frozen (for dead
tuples, heap_page_would_be_all_visible() already has provisions).

It's probably more a theoretical concern than a real worry.

> +	presult->new_all_visible_pages = 0;
> +	presult->new_all_frozen_pages = 0;
> +	presult->new_all_visible_frozen_pages = 0;

Isn't it odd to talk about pages here, given that heap_page_prune_and_freeze()
only ever operates on exactly one page?  Is that just so you can do

> +	vacrel->new_all_visible_pages += presult.new_all_visible_pages;

etc?


> +	if (do_set_vm)
> +	{
> +		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> +		{
> +			presult->new_all_visible_pages = 1;
> +			if (prstate.set_all_frozen)
> +				presult->new_all_visible_frozen_pages = 1;
> +		}
> +		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
> +				 prstate.set_all_frozen)
> +			presult->new_all_frozen_pages = 1;
> +	}
> +
>  	if (prstate.attempt_freeze)
>  	{
>  		if (presult->nfrozen > 0)

Feels like this is kinda redoing what heap_page_will_set_vm already did.
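One way to avoid redoing that decision is to classify the VM transition once, from the bits that were set before and the bits we decided to set. The sketch below assumes heap_page_will_set_vm() leaves those bits in `prstate.new_vmbits` (0 when the VM is not being set), which the quoted `Assert(do_set_vm || prstate.new_vmbits == 0)` suggests. `VmTransition` and `classify_vm_transition()` are hypothetical names. Returning per-page booleans rather than page counts would also sidestep the naming question above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in PostgreSQL's visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical per-page result: which VM transition this call performed. */
typedef struct VmTransition
{
	bool		newly_all_visible;	/* ALL_VISIBLE was clear and is now set */
	bool		newly_all_visible_frozen;	/* ...and ALL_FROZEN was set with it */
	bool		newly_all_frozen;	/* already all-visible, now frozen too */
} VmTransition;

static VmTransition
classify_vm_transition(uint8_t old_vmbits, uint8_t new_vmbits)
{
	VmTransition t = {false, false, false};
	uint8_t		newly_set = (uint8_t) (new_vmbits & ~old_vmbits);

	if (newly_set & VISIBILITYMAP_ALL_VISIBLE)
	{
		t.newly_all_visible = true;
		t.newly_all_visible_frozen = (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;
	}
	else if (newly_set & VISIBILITYMAP_ALL_FROZEN)
		t.newly_all_frozen = true;

	return t;
}
```

The caller would then turn each flag into a counter increment, e.g. `vacrel->new_all_visible_pages += t.newly_all_visible;`.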


> @@ -472,7 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
>  /* in heap/vacuumlazy.c */
>  extern void heap_vacuum_rel(Relation rel,
>  							const VacuumParams params, BufferAccessStrategy bstrategy);
> -
> +#ifdef USE_ASSERT_CHECKING
> +extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
> +									 GlobalVisState *vistest,
> +									 bool *all_frozen,
> +									 TransactionId *visibility_cutoff_xid,
> +									 OffsetNumber *logging_offnum);
> +#endif
>  /* in heap/heapam_visibility.c */
>  extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
>  										 Buffer buffer);

I'd not remove the newline before "/* in heap/heapam_visibility.c */". Other
"sections" do have that newline before the "/* in $filename */" comment too.


> From c47a6270a0a0045347cdb4597b957798d21db4aa Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Sat, 27 Sep 2025 11:55:21 -0400
> Subject: [PATCH v40 06/12] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum

Same comment about Eliminate as in the prior commit.

Perhaps worth mentioning more explicitly that this doesn't really have an
advantage other than getting rid of the last user of XLOG_HEAP2_VISIBLE?



> @@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
>  			PageSetAllVisible(page);
>  			PageClearPrunable(page);
> -			visibilitymap_set(vacrel->rel, blkno, buf,
> -							  InvalidXLogRecPtr,
> -							  vmbuffer, InvalidTransactionId,
> -							  VISIBILITYMAP_ALL_VISIBLE |
> -							  VISIBILITYMAP_ALL_FROZEN);
> +			visibilitymap_set_vmbits(blkno,
> +									 vmbuffer,
> +									 VISIBILITYMAP_ALL_VISIBLE |
> +									 VISIBILITYMAP_ALL_FROZEN,
> +									 vacrel->rel->rd_locator);
> +
> +			/*
> +			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
> +			 * setting the VM.
> +			 */
> +			if (RelationNeedsWAL(vacrel->rel))
> +				log_heap_prune_and_freeze(vacrel->rel, buf,
> +										  vmbuffer,
> +										  VISIBILITYMAP_ALL_VISIBLE |
> +										  VISIBILITYMAP_ALL_FROZEN,
> +										  InvalidTransactionId, /* conflict xid */
> +										  false,	/* cleanup lock */
> +										  PRUNE_VACUUM_SCAN,	/* reason */
> +										  NULL, 0,
> +										  NULL, 0,
> +										  NULL, 0,
> +										  NULL, 0);
> +
>  			END_CRIT_SECTION();

It's a tad odd that we do:

			/*
			 * It's possible that another backend has extended the heap,
			 * initialized the page, and then failed to WAL-log the page due
			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
			 * might try to replay our record setting the page all-visible and
			 * find that the page isn't initialized, which will cause a PANIC.
			 * To prevent that, check whether the page has been previously
			 * WAL-logged, and if not, do that now.
			 */
			if (RelationNeedsWAL(vacrel->rel) &&
				!XLogRecPtrIsValid(PageGetLSN(page)))
				log_newpage_buffer(buf, true);

if we then immediately afterwards emit a WAL record that could just as well
have included an FPI of the heap page.



> From 181c83f0652bfebe0db2f11983ad08b52c8c780b Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Sat, 27 Sep 2025 11:55:36 -0400
> Subject: [PATCH v40 07/12] Remove XLOG_HEAP2_VISIBLE entirely
> 
> There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
> can be removed. This includes deleting the xl_heap_visible struct and
> all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
> records.

> This changes the visibility map API, so any external users/consumers of
> the VM-only WAL record will need to change.

I hope there aren't any. Not sure I can really see scenarios in which that'd
be a safe thing to do from an external user...




> diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
> index ce3566ba949..5eed567a8e5 100644
> --- a/src/include/access/heapam_xlog.h
> +++ b/src/include/access/heapam_xlog.h
> @@ -60,7 +60,6 @@
>  #define XLOG_HEAP2_PRUNE_ON_ACCESS      0x10
>  #define XLOG_HEAP2_PRUNE_VACUUM_SCAN    0x20
>  #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
> -#define XLOG_HEAP2_VISIBLE      0x40
>  #define XLOG_HEAP2_MULTI_INSERT 0x50
>  #define XLOG_HEAP2_LOCK_UPDATED 0x60
>  #define XLOG_HEAP2_NEW_CID      0x70
> @@ -443,20 +442,6 @@ typedef struct xl_heap_inplace

I think other places with a gap in the "actions" mention that some value is
now unused.


> diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
> index 8a67bfa1aff..d9042e1f91d 100644
> --- a/src/backend/access/common/bufmask.c
> +++ b/src/backend/access/common/bufmask.c
> @@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
>  
>  	/*
>  	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
> -	 * we don't mark the page all-visible. See heap_xlog_visible() for
> -	 * details.
> +	 * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
> +	 * more details.
>  	 */
>  	PageClearAllVisible(page);
>  }

Not introduced by your change, but isn't it rather terrifying that the
wal_consistency_checking infrastructure doesn't verify whether the page is
marked all-visible? Wasn't aware of this. Seems bonkers to me.

I don't even know what specifically in heap_xlog_visible() that comment is
referring to? Just that we only do PageSetAllVisible() if BLK_NEEDS_REDO? But
uh, what does that have to do with anything?


> diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
> index e21b96281a6..f1da52b2069 100644
> --- a/src/backend/access/heap/visibilitymap.c
> +++ b/src/backend/access/heap/visibilitymap.c
> @@ -14,8 +14,7 @@
>   *		visibilitymap_clear  - clear bits for one page in the visibility map
>   *		visibilitymap_pin	 - pin a map page for setting a bit
>   *		visibilitymap_pin_ok - check whether correct map page is already pinned
> - *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
> - *		visibilitymap_set_vmbits - set bit(s) in a pinned page
> + *		visibilitymap_set	 - set bit(s) in a previously pinned page
>   *		visibilitymap_get_status - get status of bits
>   *		visibilitymap_count  - count number of bits set in visibility map
>   *		visibilitymap_prepare_truncate -

There's a comment saying:

 * Clearing visibility map bits is not separately WAL-logged.  The callers
 * must make sure that whenever a bit is cleared, the bit is cleared on WAL
 * replay of the updating operation as well.

Which kinda implies that setting the VM *is* separately WAL logged. But that's
not true anymore.  Maybe rephrase that ever so slightly?


> @@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
>  	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
>  	 *
>  	 * This can happen when replaying already-applied WAL records after a
> -	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
> -	 * record that marks as frozen a page which was already all-visible.  It's
> -	 * also quite common with records generated during index deletion
> -	 * (original execution of the deletion can reason that a recovery conflict
> -	 * which is sufficient for the deletion operation must take place before
> -	 * replay of the deletion record itself).
> +	 * standby crash or restart

Again not about your patch: I don't understand how already applied WAL can
lead to InvalidTransactionId being passed here. The record doesn't change just
because we had already applied the WAL?



> From 04b03c1ec3abcee75e464fef994b482df41b35f4 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 3 Dec 2025 15:07:24 -0500
> Subject: [PATCH v40 08/12] Track which relations are modified by a query
> 
> Save the relids of modified relations in a bitmap in the executor state.
> A later commit will pass this information down to scan nodes to control
> whether or not on-access pruning is allowed to set the visibility map.
> Setting the visibility map during a scan is counterproductive if the
> query is going to modify the page immediately after.
> 
> Relations are considered modified if they are the target of INSERT,
> UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
> FOR UPDATE/SHARE). All row mark types are included, even those which
> don't actually modify tuples, because this bitmap is only used as a hint
> to avoid unnecessary work.

You're probably going to hate me for the question, but is there a reason to
not compute es_modified_relids at plan time?


> @@ -992,6 +996,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
>  	 */
>  	planstate = ExecInitNode(plan, estate, eflags);
>  
> +#ifdef USE_ASSERT_CHECKING
> +	CrossCheckModifiedRelids(estate);
> +#endif

Not sure that buys you much, given it pretty much is just a restatement of the
code building estate->es_modified_relids.

What about checking against PlannedStmt->{resultRelations, permInfos} or
asserting membership at the places that actually lock/modify?
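A standalone model of the first suggestion, checking the executor's set against what the planner recorded rather than against a restatement of the same code. It assumes range-table indexes below 64 so that a `uint64_t` can stand in for a Bitmapset. `RelidSet` and the helpers are hypothetical, not executor APIs. Since row marks also add relations, the check is only that every result relation is a member, not that the sets are equal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset of range-table indexes (1..63). */
typedef uint64_t RelidSet;

static RelidSet
relid_set_add(RelidSet set, int rti)
{
	return set | ((RelidSet) 1 << rti);
}

static bool
relid_set_is_member(RelidSet set, int rti)
{
	return (set & ((RelidSet) 1 << rti)) != 0;
}

/*
 * Every result relation recorded by the planner (cf.
 * PlannedStmt->resultRelations) must be in the executor's modified set.
 */
static bool
modified_covers_result_relations(RelidSet modified,
								 const int *result_rtis, int nresult)
{
	for (int i = 0; i < nresult; i++)
	{
		if (!relid_set_is_member(modified, result_rtis[i]))
			return false;
	}
	return true;
}
```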


> @@ -3048,6 +3056,12 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
>  	rcestate->es_output_cid = parentestate->es_output_cid;
>  	rcestate->es_queryEnv = parentestate->es_queryEnv;
>  
> +	/*
> +	 * Use a deep copy to avoid stale pointers since bms_add_member() may
> +	 * reallocate the bitmap.
> +	 */
> +	rcestate->es_modified_relids = bms_copy(parentestate->es_modified_relids);
> +
>  	/*
>  	 * ResultRelInfos needed by subplans are initialized from scratch when the
>  	 * subplans themselves are initialized.

Hm. Why copy at all from the parent? Afaict we'll just redo the computation of
es_modified_relids from scratch anyway?  Not sure about it though.



> From 05d736fb5b0600effede5e030d5b929274dabe2c Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Mon, 2 Mar 2026 16:31:17 -0500
> Subject: [PATCH v40 09/12] Thread flags through begin-scan APIs
> 
> Add a flags parameter to the index_fetch_begin() table AM callback and
> the begin-scan helpers so the executor can pass context for building
> scan descriptors. This introduces an extension point for follow-up work
> to mark relations as read-only for the current query, without changing
> behavior in this patch.



> diff --git a/src/include/access/genam.h b/src/include/access/genam.h
> index 1a27bf060b3..db102803eb5 100644
> --- a/src/include/access/genam.h
> +++ b/src/include/access/genam.h
> @@ -158,7 +158,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
>  									 Relation indexRelation,
>  									 Snapshot snapshot,
>  									 IndexScanInstrumentation *instrument,
> -									 int nkeys, int norderbys);
> +									 int nkeys, int norderbys, uint32 flags);

I'd probably put flags in a position where it's not as easily confused with
nkeys or norderbys.


> diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> index 4ce63990326..3820bbd7f9f 100644
> --- a/src/include/access/heapam.h
> +++ b/src/include/access/heapam.h
> @@ -96,8 +96,9 @@ typedef struct HeapScanDescData
>  	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
>  
>  	/*
> -	 * For sequential scans and bitmap heap scans. The current heap block's
> -	 * corresponding page in the visibility map.
> +	 * For sequential scans, bitmap heap scans, TID range scans, and sample
> +	 * scans. The current heap block's corresponding page in the visibility
> +	 * map.
>  	 */
>  	Buffer		rs_vmbuffer;

As you already can see here, exhaustively listing scan types is unlikely to be
maintained over time...


> @@ -1059,10 +1062,11 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
>  static inline TableScanDesc
>  table_beginscan_tidrange(Relation rel, Snapshot snapshot,
>  						 ItemPointer mintid,
> -						 ItemPointer maxtid)
> +						 ItemPointer maxtid, uint32 flags)
>  {
>  	TableScanDesc sscan;
> -	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
> +
> +	flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
>  
>  	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);

Hm. Would it perhaps be a good idea to have an assert as to which flags are
specified by the "user"? If e.g. another SO_TYPE_* were specified it might
result in some odd behaviour.

Perhaps this would be best done by adding an argument to
table_beginscan_common() so that it receives the "internal" flags (i.e. the
ones specified inside table_beginscan_*) separately from the user-specified
flags?  Then table_beginscan_common() could check that the set of
user-specified flags is sane.
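A minimal sketch of that split, assuming the wrappers pass their own flags and the caller's flags as separate arguments. All `SKETCH_*` flag names and values and both functions are hypothetical; the real `ScanOptions` values in tableam.h are not reproduced, and the real functions return a scan descriptor rather than the combined flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_SO_TYPE_SEQSCAN       (1u << 0)
#define SKETCH_SO_TYPE_TIDRANGESCAN  (1u << 1)
#define SKETCH_SO_ALLOW_PAGEMODE     (1u << 2)
#define SKETCH_SO_HINT_REL_READ_ONLY (1u << 3)	/* caller-supplied hint */

/* Flags only the table_beginscan_*() wrappers themselves may set. */
#define SKETCH_SO_INTERNAL_MASK \
	(SKETCH_SO_TYPE_SEQSCAN | SKETCH_SO_TYPE_TIDRANGESCAN | SKETCH_SO_ALLOW_PAGEMODE)
/* Flags a caller is allowed to pass in. */
#define SKETCH_SO_USER_MASK SKETCH_SO_HINT_REL_READ_ONLY

/* Stand-in for table_beginscan_common(): validate each set separately. */
static uint32_t
sketch_beginscan_common(uint32_t internal_flags, uint32_t user_flags)
{
	assert((internal_flags & ~SKETCH_SO_INTERNAL_MASK) == 0);
	/* A caller must not sneak in e.g. another SO_TYPE_* bit. */
	assert((user_flags & ~SKETCH_SO_USER_MASK) == 0);
	return internal_flags | user_flags;
}

/* Stand-in for table_beginscan_tidrange(): the scan type stays internal. */
static uint32_t
sketch_beginscan_tidrange(uint32_t user_flags)
{
	return sketch_beginscan_common(SKETCH_SO_TYPE_TIDRANGESCAN |
								   SKETCH_SO_ALLOW_PAGEMODE,
								   user_flags);
}
```

Passing, say, `SKETCH_SO_TYPE_SEQSCAN` as a user flag would trip the second assertion instead of silently producing a scan with two type bits.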



> From 7790c8177ba3aa8a8bd1a216ea77fdfd42efc1bf Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Mon, 2 Mar 2026 16:31:33 -0500
> Subject: [PATCH v40 10/12] Pass down information on table modification to scan
>  node

> diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> index 3820bbd7f9f..1a7306e2935 100644
> --- a/src/include/access/heapam.h
> +++ b/src/include/access/heapam.h
> @@ -132,6 +132,12 @@ typedef struct IndexFetchHeapData
>  
>  	/* Current heap block's corresponding page in the visibility map */
>  	Buffer		xs_vmbuffer;
> +
> +	/*
> +	 * Some optimizations can only be performed if the query does not modify
> +	 * the underlying relation. Track that here.
> +	 */
> +	bool		modifies_base_rel;
>  } IndexFetchHeapData;
>  

The other members are prefixed with xs_, I don't see a reason to diverge for
this one.

Wonder if this should be in the generic IndexFetchTableData?


> From 0a16dad7a4ebe224f35629a39619d0feb03f03a3 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Fri, 27 Feb 2026 16:33:40 -0500
> Subject: [PATCH v40 11/12] Allow on-access pruning to set pages all-visible
> 
> Many queries do not modify the underlying relation. For such queries, if
> on-access pruning occurs during the scan, we can check whether the page
> has become all-visible and update the visibility map accordingly.
> Previously, only vacuum and COPY FREEZE marked pages as all-visible or
> all-frozen.

> This commit implements on-access VM setting for sequential scans as well
> as for the underlying heap relation in index scans and bitmap heap
> scans.

I'd mention that this often can:
- avoid write amplification, due to vacuum later having to PageSetAllVisible()
  (often triggering another data write and another FPI)
- allow index only scans much earlier than before

I think those are pretty huge benefits, so they should be mentioned.


> diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
> index d264a698ff6..a5536ba4ff6 100644
> --- a/src/test/recovery/t/035_standby_logical_decoding.pl
> +++ b/src/test/recovery/t/035_standby_logical_decoding.pl
> @@ -296,6 +296,7 @@ wal_level = 'logical'
>  max_replication_slots = 4
>  max_wal_senders = 4
>  autovacuum = off
> +hot_standby_feedback = on
>  });
>  $node_primary->dump_info;
>  $node_primary->start;
> @@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
>  $logstart = -s $node_standby->logfile;
>  
>  reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
> -	'no_conflict_', 0, 1);
> +	'no_conflict_', 1, 0);
>  
>  # This should not trigger a conflict
>  wait_until_vacuum_can_remove(
> -- 
> 2.43.0

Why does this patch need to change anything here? Is the test buggy
independently?



> From e4c7112d49e650f59dab834d3db6007c69f34f1a Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 29 Jul 2025 16:12:56 -0400
> Subject: [PATCH v40 12/12] Set pd_prune_xid on insert
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Now that visibility map (VM) updates can occur during read-only queries,
> it makes sense to also set the page’s pd_prune_xid hint during inserts
> and on the new page during updates.
> 
> This enables heap_page_prune_and_freeze() to run and set the VM
> all-visible after a page is filled with newly inserted tuples the first
> time it is read.
> 
> This change also addresses a long-standing note in heap_insert() and
> heap_multi_insert(), which observed that setting pd_prune_xid would
> help clean up aborted insertions sooner. Without it, such tuples might
> linger until VACUUM, whereas now they can be pruned earlier.

I think this commit message should also say more about the benefits of
doing this (i.e. good potential for reduced write amplification and
increased IOS potential).


> The index killtuples test had to be updated to reflect a larger number
> of hits by some accesses. Since the prune_xid is set by the fill/insert
> step, on-access pruning can happen during the first access step (before
> the DELETE). This is when the VM is extended. After the DELETE, the next
> access hits the VM block instead of extending it. Thus, an additional
> buffer hit is counted for the table.

I think that may have been solved in the meantime by f5eb854ab6d.

> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
>  	TransactionId xid = GetCurrentTransactionId();
>  	HeapTuple	heaptup;
>  	Buffer		buffer;
> +	Page		page;
>  	Buffer		vmbuffer = InvalidBuffer;
>  	bool		all_visible_cleared = false;
>  
> @@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
>  									   &vmbuffer, NULL,
>  									   0);
>  
> +	page = BufferGetPage(buffer);
> +
>  	/*
>  	 * We're about to do the actual insert -- but check for conflict first, to
>  	 * avoid possibly having to roll back work we've just done.
> @@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
>  	RelationPutHeapTuple(relation, buffer, heaptup,
>  						 (options & HEAP_INSERT_SPECULATIVE) != 0);
>  
> -	if (PageIsAllVisible(BufferGetPage(buffer)))
> +	if (PageIsAllVisible(page))
>  	{
>  		all_visible_cleared = true;
> -		PageClearAllVisible(BufferGetPage(buffer));
> +		PageClearAllVisible(page);
>  		visibilitymap_clear(relation,
>  							ItemPointerGetBlockNumber(&(heaptup->t_self)),
>  							vmbuffer, VISIBILITYMAP_VALID_BITS);
>  	}

The repeated BufferGetPage()s have been bothering me, good :)


>  	/*
> -	 * XXX Should we set PageSetPrunable on this page ?
> +	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
> +	 * is full so that we can set the page all-visible in the VM on the next
> +	 * page access.
>  	 *
> -	 * The inserting transaction may eventually abort thus making this tuple
> -	 * DEAD and hence available for pruning. Though we don't want to optimize
> -	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
> -	 * aborted tuple will never be pruned until next vacuum is triggered.
> +	 * Setting pd_prune_xid is also handy if the inserting transaction
> +	 * eventually aborts making this tuple DEAD and hence available for
> +	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
> +	 * tuple would never otherwise be pruned until next vacuum is triggered.
>  	 *
> -	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
> +	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
> +	 * tuple.
>  	 */
> +	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
> +		PageSetPrunable(page, xid);
>  
>  	MarkBufferDirty(buffer);
>  

Perhaps add "as neither of those can be pruned anyway." or such to the last
sentence?



> @@ -1863,16 +1864,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
>  			prstate->set_all_visible = false;
>  			prstate->set_all_frozen = false;
>  
> -			/* The page should not be marked all-visible */
> -			if (PageIsAllVisible(page))
> -				heap_fix_vm_corruption(prstate, offnum);
> -

Huh?


Getting close, I think.


Greetings,

Andres Freund






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
@ 2026-03-20 02:38                           ` Melanie Plageman <[email protected]>
  2026-03-20 23:37                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-20 02:38 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Thanks for the detailed review! Unless otherwise specified, attached
v41 includes all of your straightforward review points.

On Wed, Mar 18, 2026 at 1:14 PM Andres Freund <[email protected]> wrote:
>
> > +                     params.relation = relation;
> > +                     params.buffer = buffer;
> > +                     params.vmbuffer = *vmbuffer;
> > +                     params.reason = PRUNE_ON_ACCESS;
> > +                     params.vistest = vistest;
> > +                     params.cutoffs = NULL;
> >
>
> >                        * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
> > @@ -284,14 +312,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
> >                        * cannot safely determine that during on-access pruning with the
> >                        * current implementation.
> >                        */
> > -                     PruneFreezeParams params = {
> > -                             .relation = relation,
> > -                             .buffer = buffer,
> > -                             .reason = PRUNE_ON_ACCESS,
> > -                             .options = 0,
> > -                             .vistest = vistest,
> > -                             .cutoffs = NULL,
> > -                     };
> > +                     params.options = 0;
> >
> >                       heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
> >                                                                          NULL, NULL);
>
> Why does this change the way the PruneFreezeParams variable is defined?  I
> don't really mind, it's just a bit confusing.

I couldn't use the designated initializer after visibilitymap_pin(),
and I thought it was worse to have the designated initializer
initialize vmbuffer to InvalidBuffer and then have to set vmbuffer to
the real vmbuffer after visibilitymap_pin().

> > + * Helper to fix visibility-related corruption on a heap page and its
> > + * corresponding VM page. An all-visible page cannot have dead items nor can
> > + * it have tuples that are not visible to all running transactions. It clears
> > + * the VM corruption as well as resetting the vmbits used during pruning.
>
> So this is now only called when we already know there's corruption?  I think
> that could be clearer in the comments.
>
> Seems a bit odd that the function then figures out what it should do from the
> page & VM contents, given that the caller already needs to have known that
> something is wrong?

Yea, it was all a bit off. I agree. I've tried something new and made
a VMCorruptionType enum for the caller to pass in which tells this
function what to do (clear PD_ALL_VISIBLE and/or clear VM) and what
warning to emit.

> > +static void
> > +heap_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum)
> > +{
> > +             {
> > +                     ereport(WARNING,
> > +                                     (errcode(ERRCODE_DATA_CORRUPTED),
> > +                                      errmsg("tuple not visible to all transactions found on page marked all-visible"),
> > +                                      errcontext("relation \"%s\", page %u, tuple %u",
> > +                                                             relname, prstate->block, offnum)));
> > +             }
>
> Wait, why are we now WARNING about the PageIsAllVisible() &&
> prstate->lpdead_items == 0 case? Seems to run flatly counter to the comment
> above about GetOldestNonRemovableTransactionId() going backward?

Only if the page has tuples with HTSV_Result
HEAPTUPLE_RECENTLY_DEAD/DELETE_IN_PROGRESS/INSERT_IN_PROGRESS. Even if
GetOldestNonRemovableTransactionId() goes backwards, that should only
make xids we previously thought were visible now show as
not visible to all. But those have to be HEAPTUPLE_LIVE tuples. We
should never have thought the page was all-visible if there were
in-progress deletes/inserts. So, I think it is okay. Now (in v41), the
caller has to pass VM_CORRUPT_TUPLE_VISIBILITY, explicitly intending to
emit the warning.
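A sketch of that reasoning, using a simplified copy of PostgreSQL's `HTSV_Result` values; `status_contradicts_all_visible()` and `page_visibility_corrupt()` are hypothetical helpers. A `HEAPTUPLE_LIVE` tuple never counts as corruption, since a backward-moving horizon can only make a live tuple's xmin look possibly-running again, whereas in-progress and recently-dead tuples cannot legitimately sit on a page that was set all-visible. Treating `HEAPTUPLE_DEAD` as corruption is an assumption based on the v41 commit message's mention of dead tuples.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified copy of PostgreSQL's HTSV_Result (see heapam.h). */
typedef enum
{
    HEAPTUPLE_DEAD,
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,
    HEAPTUPLE_INSERT_IN_PROGRESS,
    HEAPTUPLE_DELETE_IN_PROGRESS,
} HTSV_Result;

/* Would a tuple with this status contradict PD_ALL_VISIBLE being set? */
bool
status_contradicts_all_visible(HTSV_Result res)
{
    switch (res)
    {
        case HEAPTUPLE_LIVE:
            /*
             * A backward-moving horizon can make a live tuple's xmin look
             * possibly-running again; that alone is not corruption.
             */
            return false;
        case HEAPTUPLE_RECENTLY_DEAD:
        case HEAPTUPLE_INSERT_IN_PROGRESS:
        case HEAPTUPLE_DELETE_IN_PROGRESS:
            /* Such tuples never belong on a page that was set all-visible. */
            return true;
        case HEAPTUPLE_DEAD:
            /* Assumption: follows the commit message's "dead tuples" case. */
            return true;
    }
    return false;               /* not reached */
}

/* True if any tuple on an all-visible page has a contradicting status. */
bool
page_visibility_corrupt(const HTSV_Result *statuses, int ntuples)
{
    assert(ntuples >= 0);
    for (int i = 0; i < ntuples; i++)
    {
        if (status_contradicts_all_visible(statuses[i]))
            return true;
    }
    return false;
}
```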

> > +     else if (prstate->vmbits & VISIBILITYMAP_VALID_BITS)
> > +     {
> > +             /*
> > +              * As of PostgreSQL 9.2, the visibility map bit should never be set if
> > +              * the page-level bit is clear. However, for vacuum, it's possible
> > +              * that the bit got cleared after heap_vac_scan_next_block() was
> > +              * called, so we must recheck now that we have the buffer lock before
> > +              * concluding that the VM is corrupt.
> > +              */
> > +             ereport(WARNING,
> > +                             (errcode(ERRCODE_DATA_CORRUPTED),
> > +                              errmsg("page is not marked all-visible but visibility map bit is set"),
> > +                              errcontext("relation \"%s\", page %u",
> > +                                                     relname, prstate->block)));
> > +     }
> > +
> > +     visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
> > +                                             VISIBILITYMAP_VALID_BITS);
> > +     prstate->vmbits = 0;
>
> So we can end up clearing the VM without emitting any warning?

This was me trying to avoid duplicating code in the branches. In v41,
I error out if the caller doesn't specify a valid corruption type, so
anything that clears the VM will have emitted a warning.
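A minimal sketch of the enum-driven dispatch described in this exchange, assuming simplified page state: `MockPageState` and `mock_fix_vm_corruption()` are hypothetical, and of the enum members only `VM_CORRUPT_TUPLE_VISIBILITY` is named in the thread; the others and the LP_DEAD message wording are illustrative. Each valid type emits a warning before the VM is cleared, and an unrecognized type aborts, matching the v41 behavior described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Only VM_CORRUPT_TUPLE_VISIBILITY is named in the thread; the rest are illustrative. */
typedef enum VMCorruptionType
{
    VM_CORRUPT_NONE = 0,            /* not a valid argument */
    VM_CORRUPT_LPDEAD,              /* all-visible page has LP_DEAD items */
    VM_CORRUPT_TUPLE_VISIBILITY,    /* tuple not visible to all on such a page */
    VM_CORRUPT_VM_ONLY,             /* VM bit set but PD_ALL_VISIBLE clear */
} VMCorruptionType;

typedef struct MockPageState
{
    bool        pd_all_visible;     /* page-level PD_ALL_VISIBLE */
    unsigned    vmbits;             /* VM bits for this heap page */
    int         nwarnings;          /* WARNINGs emitted so far */
} MockPageState;

void
mock_fix_vm_corruption(MockPageState *st, VMCorruptionType type)
{
    const char *msg = NULL;

    assert(st != NULL);
    switch (type)
    {
        case VM_CORRUPT_LPDEAD:
            msg = "page containing LP_DEAD items is marked as all-visible";
            st->pd_all_visible = false;
            break;
        case VM_CORRUPT_TUPLE_VISIBILITY:
            msg = "tuple not visible to all transactions found on page marked all-visible";
            st->pd_all_visible = false;
            break;
        case VM_CORRUPT_VM_ONLY:
            msg = "page is not marked all-visible but visibility map bit is set";
            break;
        default:
            /* No valid type: error out rather than silently clearing the VM. */
            fprintf(stderr, "ERROR: unrecognized VM corruption type %d\n", (int) type);
            abort();
    }

    /* Every path that reaches here warns before clearing the VM. */
    fprintf(stderr, "WARNING: %s\n", msg);
    st->nwarnings++;
    st->vmbits = 0;
}
```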

> > +static void
> > +heap_page_bypass_prune_freeze(PruneState *prstate, PruneFreezeResult *presult)
> > +{
> > +     OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
> > +     Page            page = prstate->page;
> > +
> > +     Assert(prstate->vmbits & VISIBILITYMAP_ALL_FROZEN ||
> > +                (prstate->vmbits & VISIBILITYMAP_ALL_VISIBLE &&
> > +                     !prstate->attempt_freeze));
> > +
> > +     /* We'll fill in presult for the caller */
> > +     memset(presult, 0, sizeof(PruneFreezeResult));
> > +
> > +     presult->vmbits = prstate->vmbits;
> > +
> > +     /* Clear any stale prune hint */
> > +     if (TransactionIdIsValid(PageGetPruneXid(page)))
> > +     {
> > +             PageClearPrunable(page);
> > +             MarkBufferDirtyHint(prstate->buffer, true);
> > +     }
> > +
> > +     if (PageIsEmpty(page))
> > +             return;
> > +
> > +     presult->hastup = true;
>
> Is that actually a given? Couldn't the page consist solely out of unused
> items? That'd make PageIsEmpty() return false, but should still allow
> truncation.

Good point. I've changed it to set hastup when counting live tuples.

But should I set hastup if I see an LP_REDIRECT pointer? I know I
should always see an LP_NORMAL pointer if I see an LP_REDIRECT pointer,
but I wondered if I should explicitly set hastup when I see
LP_REDIRECT, since heap_prune_record_redirect() sets hastup = true.
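A minimal sketch of deriving `hastup` from the line pointers rather than from `PageIsEmpty()`, assuming a flat array of `lp_flags` values (the `LP_*` constants match PostgreSQL's `itemid.h`). `mock_page_hastup()` is hypothetical, and its `count_redirects` parameter models the open LP_REDIRECT question. A page holding only `LP_UNUSED` items reports no tuples, so it would remain a truncation candidate.

```c
#include <assert.h>
#include <stdbool.h>

/* Line pointer states; values match PostgreSQL's itemid.h. */
#define LP_UNUSED   0
#define LP_NORMAL   1
#define LP_REDIRECT 2
#define LP_DEAD     3

/*
 * Hypothetical: does the page hold any tuples (so it must not be truncated)?
 * On an all-visible page every LP_NORMAL item is a live tuple.
 */
bool
mock_page_hastup(const unsigned char *lp_flags, int nitems, bool count_redirects)
{
    assert(nitems >= 0);
    for (int i = 0; i < nitems; i++)
    {
        if (lp_flags[i] == LP_NORMAL)
            return true;
        /* Open question: should a redirect count on its own? */
        if (count_redirects && lp_flags[i] == LP_REDIRECT)
            return true;
        /* LP_UNUSED items never set hastup. */
    }
    return false;
}
```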

> > diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
>
> > +     /*
> > +      * After processing all the live tuples on the page, if the newest xmin
> > +      * amongst them may be considered running by any snapshot, the page cannot
> > +      * be all-visible.
> > +      */
> > +     if (prstate.set_all_visible &&
> > +             TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
> > +             GlobalVisTestXidMaybeRunning(prstate.vistest,
> > +                                                                      prstate.visibility_cutoff_xid))
> > +             prstate.set_all_visible = prstate.set_all_frozen = false;
> > +
>
> So the docs for prstate.visibility_cutoff_xid say:
>
>          * visibility_cutoff_xid is the newest xmin of live tuples on the page.
>          * The caller can use it as the conflict horizon, when setting the VM
>          * bits.  It is only valid if we froze some tuples, and set_all_frozen is
>          * true.
>
> But here we look at it without checking that we froze some tuples.  I guess
> the comment is outdated?

That comment was never correct -- or I have chopped it into
unrecognizable bits over the last two years.

> Could the "going backward" thing possibly trigger a spurious assert in
>
>         Assert(heap_page_is_all_visible(vacrel->rel, buf,
>                                         vacrel->vistest, &debug_all_frozen,
>                                         &debug_cutoff, &vacrel->offnum));

I don't think anything (today) updates GlobalVisState between
GlobalVisTestXidMaybeRunning() and the heap_page_is_all_visible()
assert.

I had removed the visibility_cutoff_xid part of the assertion on the
intuition that comparing an exact horizon would no longer work when
using GlobalVisState. I can't remember if I actually saw failing
tests, but I don't see them anymore (so I've put it back).

The heap_page_is_all_visible() assertion moves into
heap_page_prune_and_freeze() in a later patch in this set, and while
it is also in a place where I don't think GlobalVisState can have
moved between making the page changes and calling
heap_page_is_all_visible(), I suspect it won't be a totally reliable
assertion now that it uses a moving target for comparison. What do you
think?

> > From a1d768a8cea8ac13e250188ec96c01d98acda94a Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Sat, 28 Feb 2026 16:06:51 -0500
> > Subject: [PATCH v40 04/12] Keep newest live XID up-to-date even if page not
> >  all-visible
>
> I guess I'd have expected 03 and 04 to be swapped... But whatever.

They couldn't be swapped, because I used GlobalVisState to always keep
it up-to-date (even for on-access pruning).

> > @@ -1076,6 +1116,30 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
> >               prstate.set_all_visible = prstate.set_all_frozen = false;
> >
> >       Assert(!prstate.set_all_frozen || prstate.set_all_visible);
> > +     Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
>
> Why didn't we have this assert earlier?

It was in lazy_scan_prune() as:
    Assert(!presult.set_all_visible || !(*has_lpdead_items));

> > +     do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
>
> Most of the other heap_page_prune_and_freeze() helpers are named
> heap_prune_xyz(), why not follow that here?
>
> I guess this holds for a few other helpers added in earlier commits
> too. E.g. heap_page_bypass_prune_freeze() should probably be
> heap_prune_bypass_prune_freeze() or such.

Most of the helpers prefixed with "heap_prune" now directly do
something related to pruning like recording line pointers and
traversing hot chains. heap_page_will_set_vm() and
heap_page_will_freeze() have nothing to do with pruning, so I think it
makes sense they are named differently.

And I don't think we are in any danger of folks using functions not
prefixed with heap_prune for other purposes, given that most of them
take a PruneState as an argument.

I'll do a big rename if you feel strongly about it, though.

> > @@ -1097,14 +1161,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
> >
> >               /*
> >                * If that's all we had to do to the page, this is a non-WAL-logged
> > -              * hint.  If we are going to freeze or prune the page, we will mark
> > -              * the buffer dirty below.
> > +              * hint.  If we are going to freeze or prune the page or set
> > +              * PD_ALL_VISIBLE, we will mark the buffer dirty below.
> > +              *
> > +              * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
> > +              * for the VM to be set and PD_ALL_VISIBLE to be clear.
> >                */
> > -             if (!do_freeze && !do_prune)
> > +             if (!do_freeze && !do_prune && !do_set_vm)
> >                       MarkBufferDirtyHint(prstate.buffer, true);
> >       }
>
> This block is gated by if (do_hint_prune) which is computed as:
>
>         /*
>          * Even if we don't prune anything, if we found a new value for the
>          * pd_prune_xid field or the page was marked full, we will update the hint
>          * bit.
>          */
>         do_hint_prune = PageGetPruneXid(prstate.page) != prstate.new_prune_xid ||
>                 PageIsFull(prstate.page);
>
> It's not really related to this change, but I'm just confused a bit by the
> "|| PageIsFull(prstate.page)". What is that about? Why do we want to mark the
> buffer DirtyHint if the page is full? It very well might already have been
> marked as such, no?

Because if the page is marked full, we clear that hint, and, if that's
the only change we make to the page, we need to do
MarkBufferDirtyHint().

> > +#ifdef USE_ASSERT_CHECKING
> > +     if (prstate.set_all_visible)
> > +     {
> > +             TransactionId debug_cutoff;
> > +             bool            debug_all_frozen;
> > +
> > +             Assert(prstate.lpdead_items == 0);
> > +
> > +             Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
> > +                                                                             prstate.vistest,
> > +                                                                             &debug_all_frozen,
> > +                                                                             &debug_cutoff, off_loc));
> > +
> > +             /*
> > +              * It's possible the page is composed entirely of frozen tuples but is
> > +              * not set all-frozen in the VM and did not pass
> > +              * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
> > +              * heap_page_is_all_visible() finds the page completely frozen, even
> > +              * though prstate.set_all_frozen is false.
> > +              */
> > +             Assert(!prstate.set_all_frozen || debug_all_frozen);
>
> Seems like we could verify that debug_cutoff isn't newer than conflict_xid?

Well, not conflict_xid, but newest_xid, yes.
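A sketch of the suggested extra assertion, assuming the check is "debug_cutoff must not be newer than newest_xid". `xid_precedes_or_equals()` reproduces the logic of PostgreSQL's `TransactionIdPrecedesOrEquals()`, which compares normal XIDs modulo 2^32 so the check stays correct across wraparound; `mock_debug_cutoff_ok()` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Same logic as PostgreSQL's TransactionIdPrecedesOrEquals(). */
bool
xid_precedes_or_equals(TransactionId id1, TransactionId id2)
{
    /* Special XIDs compare by plain numeric order. */
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 <= id2;
    /* Normal XIDs compare modulo 2^32. */
    return (int32_t) (id1 - id2) <= 0;
}

/*
 * Hypothetical form of the suggested check: the newest live xmin found by
 * heap_page_is_all_visible() must not be newer than newest_xid.
 */
bool
mock_debug_cutoff_ok(TransactionId debug_cutoff, TransactionId newest_xid)
{
    return xid_precedes_or_equals(debug_cutoff, newest_xid);
}
```

Inside the `USE_ASSERT_CHECKING` block this would amount to something like `Assert(mock_debug_cutoff_ok(debug_cutoff, newest_xid))`.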

> Hm.  I guess aborting after we did incorrect pruning/freezing/VMing is better
> than not, but it'd be even better if we did it before corrupting things. But I
> guess it'd be not trivial to add something like the debug_cutoff assertion I
> suggest above, when freezing of tuples is only executed after
> heap_page_is_all_visible() (for dead tuples heap_page_would_be_all_visible()
> already has provisions).
>
> It's probably more a theoretical concern than a real worry.

Yea, I think the work it would take to make
heap_page_would_be_all_visible() work for frozen tuples wouldn't be
worth it just to get it to assert out before executing the page
changes.

> > +     presult->new_all_visible_pages = 0;
> > +     presult->new_all_frozen_pages = 0;
> > +     presult->new_all_visible_frozen_pages = 0;
>
> Isn't it odd to talk about pages here? Given that heap_page_prune_and_freeze()
> only ever operates on exactly one page.  Is that just so you can do
>
> > +     vacrel->new_all_visible_pages += presult.new_all_visible_pages;

I made this change because you didn't like it when I passed old_vmbits
and new_vmbits back out to lazy_scan_prune() to derive these counters.
FWIW I think it's better not to have lazy_scan_prune() compare new and
old vmbits to increment counters, because lazy_scan_prune() shouldn't
have to know about the VM anymore once it is not setting it.

> > +     if (do_set_vm)
> > +     {
> > +             if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> > +             {
> > +                     presult->new_all_visible_pages = 1;
> > +                     if (prstate.set_all_frozen)
> > +                             presult->new_all_visible_frozen_pages = 1;
> > +             }
> > +             else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
> > +                              prstate.set_all_frozen)
> > +                     presult->new_all_frozen_pages = 1;
> > +     }
> > +
> >       if (prstate.attempt_freeze)
> >       {
> >               if (presult->nfrozen > 0)
>
> Feels like this is kinda redoing what heap_page_will_set_vm already did.

The logic is different than what is in heap_page_will_set_vm() because
there we don't care about what old_vmbits is. We are simply concerned
with whether we should set new_vmbits to something.

So we need to have logic somewhere that is figuring out if the vmbits
were set before and whether we newly set them. That can either go in
heap_page_prune_and_freeze() and we can use that to set the counters
in the LVRelState or it can go in lazy_scan_prune().

I think it makes more sense in heap_page_prune_and_freeze() so that
lazy_scan_prune() doesn't have to know about the VM's new/old state,
which it otherwise no longer deals with.
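A standalone sketch of the counter derivation quoted above, kept next to the VM-setting decision rather than in `lazy_scan_prune()`. The `VISIBILITYMAP_*` values match `visibilitymapdefs.h`; `MockVMCounters` and `mock_count_vm_changes()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

typedef struct MockVMCounters
{
    int         new_all_visible_pages;
    int         new_all_visible_frozen_pages;
    int         new_all_frozen_pages;
} MockVMCounters;

/* Compare the VM bits before the change with what pruning decided to set. */
MockVMCounters
mock_count_vm_changes(bool do_set_vm, uint8_t old_vmbits, bool set_all_frozen)
{
    MockVMCounters c = {0, 0, 0};

    if (!do_set_vm)
        return c;

    if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
    {
        /* Newly all-visible, and possibly all-frozen at the same time. */
        c.new_all_visible_pages = 1;
        if (set_all_frozen)
            c.new_all_visible_frozen_pages = 1;
    }
    else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && set_all_frozen)
    {
        /* Was already all-visible; only the frozen bit is new. */
        c.new_all_frozen_pages = 1;
    }
    return c;
}
```

Each call reports at most one page, which is what lets the caller simply add the results into its relation-wide totals.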


> > @@ -1923,13 +1926,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
> >                       PageSetAllVisible(page);
> >                       PageClearPrunable(page);
> > -                     visibilitymap_set(vacrel->rel, blkno, buf,
> > -                                                       InvalidXLogRecPtr,
> > -                                                       vmbuffer, InvalidTransactionId,
> > -                                                       VISIBILITYMAP_ALL_VISIBLE |
> > -                                                       VISIBILITYMAP_ALL_FROZEN);
> > +                     visibilitymap_set_vmbits(blkno,
> > +                                                                      vmbuffer,
> > +                                                                      VISIBILITYMAP_ALL_VISIBLE |
> > +                                                                      VISIBILITYMAP_ALL_FROZEN,
> > +                                                                      vacrel->rel->rd_locator);
> > +
> > +                     /*
> > +                      * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
> > +                      * setting the VM.
> > +                      */
> > +                     if (RelationNeedsWAL(vacrel->rel))
> > +                             log_heap_prune_and_freeze(vacrel->rel, buf,
> > +                                                                               vmbuffer,
> > +                                                                               VISIBILITYMAP_ALL_VISIBLE |
> > +                                                                               VISIBILITYMAP_ALL_FROZEN,
> > +                                                                               InvalidTransactionId, /* conflict xid */
> > +                                                                               false,        /* cleanup lock */
> > +                                                                               PRUNE_VACUUM_SCAN,    /* reason */
> > +                                                                               NULL, 0,
> > +                                                                               NULL, 0,
> > +                                                                               NULL, 0,
> > +                                                                               NULL, 0);
> > +
> >                       END_CRIT_SECTION();
>
> It's a tad odd that we do:
>
>                         /*
>                          * It's possible that another backend has extended the heap,
>                          * initialized the page, and then failed to WAL-log the page due
>                          * to an ERROR.  Since heap extension is not WAL-logged, recovery
>                          * might try to replay our record setting the page all-visible and
>                          * find that the page isn't initialized, which will cause a PANIC.
>                          * To prevent that, check whether the page has been previously
>                          * WAL-logged, and if not, do that now.
>                          */
>                         if (RelationNeedsWAL(vacrel->rel) &&
>                                 !XLogRecPtrIsValid(PageGetLSN(page)))
>                                 log_newpage_buffer(buf, true);
>
> if we then immediately afterwards emit a WAL record that could just as well
> have included in FPI of the heap page.

I originally added a flag to log_heap_prune_and_freeze() that could
force an FPI but Robert disliked it, saying he found it more
confusing. He said:

> 0004. It is not clear to me why you need to get
> log_heap_prune_and_freeze to do the work here. Why can't
> log_newpage_buffer get the job done already?

I can put it back that way, I don't have strong feelings either way.
Though I imagine if I add another argument to
log_heap_prune_and_freeze(), you'll bring up creating a struct for its
arguments again...


> > diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
> > index 8a67bfa1aff..d9042e1f91d 100644
> > --- a/src/backend/access/common/bufmask.c
> > +++ b/src/backend/access/common/bufmask.c
> > @@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
> >
> >       /*
> >        * During replay, if the page LSN has advanced past our XLOG record's LSN,
> > -      * we don't mark the page all-visible. See heap_xlog_visible() for
> > -      * details.
> > +      * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
> > +      * more details.
> >        */
> >       PageClearAllVisible(page);
> >  }
>
> Not introduced by your change, but isn't it rather terrifying that the
> wal_consistency_checking infrastructure doesn't verify whether the page is
> marked all-visible? Wasn't aware of this. Seems bonkers to me.

Agreed. I wonder what it would take to start.

> I don't even know what specifically in heap_xlog_visible() that comment is
> referring to? Just that we only do PageSetAllVisible() if BLK_NEEDS_REDO? But
> uh, what does that have to do with anything?

Yea, this comment doesn't make sense. I think we should remove it.

But regarding why we mask PD_ALL_VISIBLE in WAL consistency checking,
I wonder if this is the scenario:

1. Record 1 (R1) sets the VM and PD_ALL_VISIBLE.
2. Record 2 (R2) inserts a tuple and clears PD_ALL_VISIBLE.
3. The heap page is flushed to disk, but the VM page is not.
4. Crash.
5. Replay R1: skip setting PD_ALL_VISIBLE because the heap page already
   has R2's LSN, but set the bits on the VM page.

Even though we'll clear the VM when we replay R2, if we cross-check
the page and VM after replaying only R1, the VM will be set and
PD_ALL_VISIBLE will be clear. I think this is okay because no one
should see them at this time. But it might not work with WAL
consistency checking.

> > @@ -477,12 +477,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
> >        * If we get passed InvalidTransactionId then we do nothing (no conflict).
> >        *
> >        * This can happen when replaying already-applied WAL records after a
> > -      * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
> > -      * record that marks as frozen a page which was already all-visible.  It's
> > -      * also quite common with records generated during index deletion
> > -      * (original execution of the deletion can reason that a recovery conflict
> > -      * which is sufficient for the deletion operation must take place before
> > -      * replay of the deletion record itself).
> > +      * standby crash or restart
>
> Again not about your patch: I don't understand how already applied WAL can
> lead to InvalidTransactionId being passed here. The record doesn't change just
> because we had already applied the WAL?

Yea, I think the comment is just wrong. I realized the comment still
needed to reference my code, so I've updated it.

> > From 04b03c1ec3abcee75e464fef994b482df41b35f4 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Wed, 3 Dec 2025 15:07:24 -0500
> > Subject: [PATCH v40 08/12] Track which relations are modified by a query
> >
> > Save the relids of modified relations in a bitmap in the executor state.
> > A later commit will pass this information down to scan nodes to control
> > whether or not on-access pruning is allowed to set the visibility map.
> > Setting the visibility map during a scan is counterproductive if the
> > query is going to modify the page immediately after.
> >
> > Relations are considered modified if they are the target of INSERT,
> > UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
> > FOR UPDATE/SHARE). All row mark types are included, even those which
> > don't actually modify tuples, because this bitmap is only used as a hint
> > to avoid unnecessary work.
>
> You're probably going to hate me for the question, but is there a reason to
> not compute es_modified_relids at plan time?

Yea, it probably does make more sense there. The only thing is that by
doing it in planner, it could include relids of leaf partitions that
get run-time pruned. But we won't scan those, so it is no issue for
this feature. I'm just wondering if it dilutes the meaning of
"modified relids", though.

In v41, I've implemented it in planner (which also made me realize
parallel workers previously didn't have es_modified_relids, oops).
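A minimal sketch of the plan-time computation and the scan-side check, assuming a 64-bit mask in place of a `Bitmapset` and range-table indexes below 64; every name here is hypothetical. Result relations and row-marked relations of any strength are added, as the patch's commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: bit i set means range-table index i (1..63) is modified. */
typedef uint64_t MockRelidSet;

MockRelidSet
mock_relid_add(MockRelidSet set, int rti)
{
    assert(rti > 0 && rti < 64);
    return set | ((MockRelidSet) 1 << rti);
}

bool
mock_relid_member(MockRelidSet set, int rti)
{
    assert(rti > 0 && rti < 64);
    return (set & ((MockRelidSet) 1 << rti)) != 0;
}

/* Result relations plus every row-marked relation, whatever the lock strength. */
MockRelidSet
mock_compute_modified_relids(const int *result_rtis, int nresults,
                             const int *rowmark_rtis, int nrowmarks)
{
    MockRelidSet set = 0;

    for (int i = 0; i < nresults; i++)
        set = mock_relid_add(set, result_rtis[i]);
    for (int i = 0; i < nrowmarks; i++)
        set = mock_relid_add(set, rowmark_rtis[i]);
    return set;
}

/* On-access pruning may set the VM only for relations the query won't modify. */
bool
mock_scan_may_set_vm(MockRelidSet modified, int scan_rti)
{
    return !mock_relid_member(modified, scan_rti);
}
```

Because the set is computed from the plan, a leaf partition that is later pruned at run time can still appear in it; as noted above, that is harmless here since such a partition is never scanned.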

> > @@ -992,6 +996,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
> >        */
> >       planstate = ExecInitNode(plan, estate, eflags);
> >
> > +#ifdef USE_ASSERT_CHECKING
> > +     CrossCheckModifiedRelids(estate);
> > +#endif
>
> Not sure that buys you much, given it pretty much is just a restatement of the
> code building estate->es_modified_relids.

Yea, now that I've done it in planner, I cross-check in the executor.

> What about checking against PlannedStmt->{resultRelations, permInfos} or

I don't think it makes sense to use permInfos because according to
expand_single_inheritance_child() there is no permission checking for
child RTEs, so I think permInfos won't include everything we need.

> asserting membership at the places that actually lock/modify?

Are you thinking I should also add some in ExecInsert, ExecDelete,
ExecUpdate, and ExecLockRows? I think this might be redundant with the
executor cross-check I have now after InitPlan(). (I've done it anyway
so we can discuss.)

> > diff --git a/src/include/access/genam.h b/src/include/access/genam.h
> > index 1a27bf060b3..db102803eb5 100644
> > --- a/src/include/access/genam.h
> > +++ b/src/include/access/genam.h
> > @@ -158,7 +158,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
> >                                                                        Relation indexRelation,
> >                                                                        Snapshot snapshot,
> >                                                                        IndexScanInstrumentation *instrument,
> > -                                                                      int nkeys, int norderbys);
> > +                                                                      int nkeys, int norderbys, uint32 flags);
>
> I'd probably put flags in a position where it's not as easily confused with
> nkeys or norderbys.

Do you mean move it before nkeys and norderbys, or move it even
earlier? I did the latter, but I'm not sure if it's weird to have flags
before snapshot (especially since the other table AM routines pass it
last). I think it looks kind of weird when all of the other ones have
flags as the last argument.

> > @@ -1059,10 +1062,11 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
> >  static inline TableScanDesc
> >  table_beginscan_tidrange(Relation rel, Snapshot snapshot,
> >                                                ItemPointer mintid,
> > -                                              ItemPointer maxtid)
> > +                                              ItemPointer maxtid, uint32 flags)
> >  {
> >       TableScanDesc sscan;
> > -     uint32          flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
> > +
> > +     flags |= SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
> >
> >       sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
>
> Hm. Would it perhaps be a good idea to have an assert as to which flags are
> specified by the "user"? If e.g. another SO_TYPE_* were specified it might
> result in some odd behaviour.
>
> Perhaps this would be best done by adding an argument to
> table_beginscan_common() specifying the "internal" flags (i.e. the ones that
> specified inside table_beginscan_*) and user specified flags?  Then
> table_beginscan_common could check the set of user specified flags being sane.

Yes, good idea. Done in attached v41.

It's unclear to me which flags should be considered internal though. I
think it makes sense that the SO_TYPE* flags are considered internal
because you can only specify one.  But all of the other current
ScanOptions are specified inside table_beginscan_* so do you mean that
we should consider all of those internal flags?
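A sketch of the suggested internal/user split, assuming a simplified subset of `ScanOptions` with illustrative bit values and a hypothetical user-settable `SO_MOCK_USER_HINT`; `mock_beginscan_common()` and `mock_beginscan_tidrange()` are hypothetical stand-ins for `table_beginscan_common()` and `table_beginscan_tidrange()`. In this sketch callers may pass only the hypothetical hint, and everything else, including `SO_ALLOW_PAGEMODE`, is reserved for the wrapper; which options really belong on each side is the open question above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values; PostgreSQL's real ScanOptions enum has more members. */
#define SO_TYPE_SEQSCAN       ((uint32_t) 1 << 0)
#define SO_TYPE_BITMAPSCAN    ((uint32_t) 1 << 1)
#define SO_TYPE_TIDRANGESCAN  ((uint32_t) 1 << 4)
#define SO_ALLOW_PAGEMODE     ((uint32_t) 1 << 8)
#define SO_MOCK_USER_HINT     ((uint32_t) 1 << 10)  /* hypothetical */

#define SO_TYPE_MASK (SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_TIDRANGESCAN)

/* Flags a caller may pass through the public wrappers in this sketch. */
#define SO_USER_ALLOWED SO_MOCK_USER_HINT

uint32_t
mock_beginscan_common(uint32_t internal_flags, uint32_t user_flags)
{
    uint32_t    type = internal_flags & SO_TYPE_MASK;

    /* Exactly one scan type, supplied by the wrapper itself. */
    assert(type != 0 && (type & (type - 1)) == 0);
    /* Callers cannot smuggle in a second SO_TYPE_* or other internal bits. */
    assert((user_flags & ~SO_USER_ALLOWED) == 0);

    return internal_flags | user_flags;
}

uint32_t
mock_beginscan_tidrange(uint32_t user_flags)
{
    return mock_beginscan_common(SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE,
                                 user_flags);
}
```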


> > +      * Some optimizations can only be performed if the query does not modify
> > +      * the underlying relation. Track that here.
> > +      */
> > +     bool            modifies_base_rel;
> >  } IndexFetchHeapData;
>
> Wonder if this should be in the generic IndexFetchTableData?

I added flags to the IndexFetchTableData in much the same way as the
regular table scan descriptor has them.

> > diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
> > index d264a698ff6..a5536ba4ff6 100644
> > --- a/src/test/recovery/t/035_standby_logical_decoding.pl
> > +++ b/src/test/recovery/t/035_standby_logical_decoding.pl
> > @@ -296,6 +296,7 @@ wal_level = 'logical'
> >  max_replication_slots = 4
> >  max_wal_senders = 4
> >  autovacuum = off
> > +hot_standby_feedback = on
> >  });
> >  $node_primary->dump_info;
> >  $node_primary->start;
> > @@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
> >  $logstart = -s $node_standby->logfile;
> >
> >  reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
> > -     'no_conflict_', 0, 1);
> > +     'no_conflict_', 1, 0);
> >
> >  # This should not trigger a conflict
> >  wait_until_vacuum_can_remove(
> > --
> > 2.43.0
>
> Why does this patch need to change anything here? Is the test buggy
> independently?

Nope. I guess that was a mistake during development. No change needed.

> > @@ -1863,16 +1864,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
> >                       prstate->set_all_visible = false;
> >                       prstate->set_all_frozen = false;
> >
> > -                     /* The page should not be marked all-visible */
> > -                     if (PageIsAllVisible(page))
> > -                             heap_fix_vm_corruption(prstate, offnum);
> > -
>
> Huh?

heap_prune_record_prunable() already does the corruption check, so I
don't need to do it separately for INSERT_IN_PROGRESS tuples once we
call heap_prune_record_prunable() for them.

- Melanie


Attachments:

  [text/x-patch] v41-0001-Fix-visibility-map-corruption-in-more-cases.patch (20.5K, 2-v41-0001-Fix-visibility-map-corruption-in-more-cases.patch)
From c87dd03ec309e12247c8ccdc3adf289a1e451255 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v41 01/12] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap compared
to the cost of actually pruning. The vmbuffer is saved in the scan
descriptor, so a query should only need to pin each VM page once, and a
single VM page covers a large number of heap pages.
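
Here is a concrete version of "a single VM page covers a large number of heap pages". The VM stores two bits (all-visible, all-frozen) per heap block. Assume the default 8 kB block size and a 24-byte page header. One VM page then has 8168 usable bytes, so it covers 8168 x 4 = 32,672 heap blocks, or about 255 MiB of heap.

The sketch below re-derives that mapping in the style of the HEAPBLK_TO_MAPBLOCK() macro in visibilitymap.c. It also counts how many VM pages a sequential scan would pin if it only switched pins when the VM block changed, as a scan descriptor holding one vmbuffer would. The constants are redefined locally. heapblk_to_mapblock() and count_vm_pins() are illustrative names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed build defaults: BLCKSZ = 8192 and a 24-byte page header
 * (SizeOfPageHeaderData, already MAXALIGNed on common platforms).
 */
#define BLCKSZ                8192
#define PAGE_HEADER_SIZE      24
#define BITS_PER_BYTE         8
#define BITS_PER_HEAPBLOCK    2     /* all-visible + all-frozen */

#define MAPSIZE               (BLCKSZ - PAGE_HEADER_SIZE)
#define HEAPBLOCKS_PER_BYTE   (BITS_PER_BYTE / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE   (MAPSIZE * HEAPBLOCKS_PER_BYTE)

typedef uint32_t BlockNumber;
#define INVALID_BLOCK         UINT32_MAX    /* like InvalidBlockNumber */

/* VM page that holds the bits for a heap block */
static BlockNumber
heapblk_to_mapblock(BlockNumber heapblk)
{
    return heapblk / HEAPBLOCKS_PER_PAGE;
}

/*
 * Number of VM pins taken by a sequential scan over heap blocks
 * [0, nblocks) that keeps one VM page pinned and only re-pins when
 * the next heap block maps to a different VM page.
 */
static unsigned
count_vm_pins(BlockNumber nblocks)
{
    unsigned    pins = 0;
    BlockNumber pinned = INVALID_BLOCK;

    for (BlockNumber blk = 0; blk < nblocks; blk++)
    {
        BlockNumber mapblk = heapblk_to_mapblock(blk);

        if (mapblk != pinned)
        {
            pinned = mapblk;    /* release the old pin, take a new one */
            pins++;
        }
    }

    /* Each VM page covering the scanned range is pinned exactly once */
    assert(pins == (nblocks == 0 ? 0 : heapblk_to_mapblock(nblocks - 1) + 1));
    return pins;
}
```

With these assumptions, a scan over 100,000 heap blocks (about 780 MiB) takes only four VM pins.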

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 215 +++++++++++++++++++++++++--
 src/backend/access/heap/vacuumlazy.c |  89 +----------
 src/include/access/heapam.h          |  12 ++
 src/tools/pgindent/typedefs.list     |   1 +
 4 files changed, 215 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..e452d25cae6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		old_vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -162,12 +177,30 @@ typedef struct
 	TransactionId visibility_cutoff_xid;
 } PruneState;
 
+
+/*
+ * Type of visibility map corruption detected on a heap page.  Passed to
+ * heap_page_fix_vm_corruption() so the caller can specify what it found rather
+ * than having the function re-derive the corruption from page state.
+ */
+typedef enum VMCorruptionType
+{
+	/* VM bits are set but the page-level PD_ALL_VISIBLE flag is not */
+	VM_CORRUPT_MISSING_PAGE_HINT,
+	/* LP_DEAD line pointers found on a page marked all-visible */
+	VM_CORRUPT_LPDEAD,
+	/* Tuple not visible to all transactions on a page marked all-visible */
+	VM_CORRUPT_TUPLE_VISIBILITY,
+} VMCorruptionType;
+
 /* Local functions */
 static void prune_freeze_setup(PruneFreezeParams *params,
 							   TransactionId *new_relfrozen_xid,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
+										VMCorruptionType ctype);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +208,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +243,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +312,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +329,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +392,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +814,104 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Emit a warning about and fix visibility map corruption on the given page.
+ *
+ * The caller specifies the type of corruption it has already detected via
+ * corruption_type, so that we can emit the appropriate warning. All cases
+ * result in the VM bits being cleared; page-level corruption types also clear
+ * PD_ALL_VISIBLE.
+ *
+ * Must be called while holding an exclusive lock on the heap buffer. Dead
+ * items must have been discovered under that same lock. Although we do not
+ * hold a lock on the VM buffer, it is pinned, and the heap buffer is
+ * exclusively locked, ensuring that no other backend can update the VM bits
+ * corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
+							VMCorruptionType corruption_type)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	switch (corruption_type)
+	{
+		case VM_CORRUPT_LPDEAD:
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+			break;
+
+		case VM_CORRUPT_TUPLE_VISIBILITY:
+
+			/*
+			 * A HEAPTUPLE_LIVE tuple on an all-visible page can appear to not
+			 * be visible to everyone when
+			 * GetOldestNonRemovableTransactionId() returns a conservative
+			 * value that's older than the real safe xmin. That is not
+			 * corruption -- the PD_ALL_VISIBLE flag is still correct.
+			 *
+			 * However, dead tuple versions, in-progress inserts, and
+			 * in-progress deletes should never appear on a page marked
+			 * all-visible. That indicates real corruption. PD_ALL_VISIBLE
+			 * should have been cleared by the DML operation that deleted or
+			 * inserted the tuple.
+			 */
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+			break;
+
+		case VM_CORRUPT_MISSING_PAGE_HINT:
+
+			/*
+			 * As of PostgreSQL 9.2, the visibility map bit should never be
+			 * set if the page-level bit is clear. However, for vacuum, it's
+			 * possible that the bit got cleared after
+			 * heap_vac_scan_next_block() was called, so we must recheck now
+			 * that we have the buffer lock before concluding that the VM is
+			 * corrupt.
+			 */
+			Assert(!PageIsAllVisible(prstate->page));
+			Assert(prstate->old_vmbits & VISIBILITYMAP_VALID_BITS);
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("page is not marked all-visible but visibility map bit is set"),
+					 errcontext("relation \"%s\", page %u",
+								relname, prstate->block)));
+			break;
+
+		default:
+			elog(ERROR, "unrecognized VM corruption type: %d",
+				 (int) corruption_type);
+			break;
+	}
+
+	/*
+	 * Clear PD_ALL_VISIBLE on the heap page if it is set.
+	 * VM_CORRUPT_MISSING_PAGE_HINT is already clear by definition, so avoid
+	 * marking the buffer dirty.
+	 */
+	if (corruption_type != VM_CORRUPT_MISSING_PAGE_HINT)
+	{
+		Assert(PageIsAllVisible(prstate->page));
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+						VISIBILITYMAP_VALID_BITS);
+	prstate->old_vmbits = 0;
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +972,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	/*
+	 * If the VM is set but PD_ALL_VISIBLE is clear, fix that corruption
+	 * before pruning and freezing so that the page and VM start out in a
+	 * consistent state.
+	 */
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_page_fix_vm_corruption(&prstate, InvalidOffsetNumber,
+									VM_CORRUPT_MISSING_PAGE_HINT);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1125,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->old_vmbits = prstate.old_vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1448,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1459,14 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum,
+									VM_CORRUPT_TUPLE_VISIBILITY);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1550,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum, VM_CORRUPT_LPDEAD);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1698,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1714,11 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_page_fix_vm_corruption(prstate, offnum,
+											VM_CORRUPT_TUPLE_VISIBILITY);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1743,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1810,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum, VM_CORRUPT_LPDEAD);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c57432670e7..56722556417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -432,11 +432,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1989,81 +1984,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2095,6 +2015,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2204,18 +2125,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.old_vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..00134012137 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
+	 * cleared if VM corruption is found and corrected.
+	 */
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a4a2ed07816..480614d483b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3273,6 +3273,7 @@ UserAuth
 UserContext
 UserMapping
 UserOpts
+VMCorruptionType
 VacAttrStats
 VacAttrStatsP
 VacDeadItemsInfo
-- 
2.43.0



  [text/x-patch] v41-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (7.7K, 3-v41-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From 0c9f91eec0127bc914c7fbe79256c6e5b689cde8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v41 02/12] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked on pages that have no
pruning or freezing work to do. To avoid that wasted effort, we now exit
early if a page is already all-frozen, or if it is all-visible and no
freezing will be attempted. We can't exit early if vacuum was run with
DISABLE_PAGE_SKIPPING, though, because in that case the VM cannot be
trusted and the page must be examined in full.
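
The early-exit condition can be written as a single predicate. This is a minimal sketch that mirrors the fast_path computation this patch adds to prune_freeze_setup(). It assumes, as in the existing pruneheap.c, that attempt_freeze is true exactly when HEAP_PAGE_PRUNE_FREEZE is passed. can_use_fast_path() itself is a hypothetical name.

Vacuum always passes HEAP_PAGE_PRUNE_FREEZE, so it only takes the fast path for all-frozen pages. On-access pruning never freezes, so it can also take the fast path for pages that are merely all-visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from visibilitymapdefs.h and heapam.h (with this patch applied) */
#define VISIBILITYMAP_ALL_VISIBLE        0x01
#define VISIBILITYMAP_ALL_FROZEN         0x02
#define HEAP_PAGE_PRUNE_FREEZE           (1 << 1)
#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH  (1 << 2)

static bool
can_use_fast_path(uint8_t old_vmbits, int options)
{
    /* Assumed: freezing is attempted iff the caller asked for it */
    bool    attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;

    /* The VM never marks a page all-frozen without also marking it all-visible */
    assert(!(old_vmbits & VISIBILITYMAP_ALL_FROZEN) ||
           (old_vmbits & VISIBILITYMAP_ALL_VISIBLE));

    /* Callers such as VACUUM with DISABLE_PAGE_SKIPPING do not opt in */
    if (!(options & HEAP_PAGE_PRUNE_ALLOW_FAST_PATH))
        return false;

    /* Nothing to do: already frozen, or all-visible and not freezing */
    return (old_vmbits & VISIBILITYMAP_ALL_FROZEN) ||
        ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) && !attempt_freeze);
}
```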

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 97 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 10 +++
 src/include/access/heapam.h          |  1 +
 3 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e452d25cae6..19a72ac6b27 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,12 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/*
+	 * True if the page can bypass full page inspection during pruning and
+	 * freezing based on its visibility map status and the caller's options.
+	 */
+	bool		fast_path;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -201,6 +207,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneState *prstate);
 static void heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 										VMCorruptionType ctype);
+static void prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -329,7 +336,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			params.options = 0;
+			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -398,6 +405,16 @@ prune_freeze_setup(PruneFreezeParams *params,
 												   prstate->block,
 												   &prstate->vmbuffer);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can skip pruning and freezing entirely.
+	 * Callers must opt in by setting HEAP_PAGE_PRUNE_ALLOW_FAST_PATH.
+	 */
+	prstate->fast_path = ((prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN) ||
+						  ((prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+						   !prstate->attempt_freeze)) &&
+		(params->options & HEAP_PAGE_PRUNE_ALLOW_FAST_PATH);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -913,6 +930,73 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	prstate->old_vmbits = 0;
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->old_vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling prune_freeze_bypass(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->old_vmbits = prstate->old_vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		if (ItemIdIsNormal(PageGetItemId(page, off)))
+		{
+			/*
+			 * Now that we've found an actual tuple, set hastup. If the page
+			 * is entirely LP_UNUSED, we want vacuum to still truncate it.
+			 */
+			presult->hastup = true;
+			prstate->live_tuples++;
+		}
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -982,6 +1066,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		heap_page_fix_vm_corruption(&prstate, InvalidOffsetNumber,
 									VM_CORRUPT_MISSING_PAGE_HINT);
 
+	/*
+	 * If the visibility map status allows it, bypass pruning and freezing
+	 * entirely. This must be done after fixing any discrepancy between the
+	 * page-level visibility hint and the VM.
+	 */
+	if (prstate.fast_path)
+	{
+		prune_freeze_bypass(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 56722556417..1a446050d85 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2044,6 +2044,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
+	/*
+	 * Allow skipping full inspection of pages that the VM indicates are
+	 * already all-frozen (which may be scanned due to SKIP_PAGES_THRESHOLD).
+	 * However, if DISABLE_PAGE_SKIPPING was specified, we can't trust the VM,
+	 * so we must examine the page to make sure it is truly all-frozen and fix
+	 * it otherwise.
+	 */
+	if (vacrel->skipwithvm)
+		params.options |= HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+
 	heap_page_prune_and_freeze(&params,
 							   &presult,
 							   &vacrel->offnum,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 00134012137..305ecc31a9e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
-- 
2.43.0



  [text/x-patch] v41-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (12.3K, 4-v41-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 933ccfc1fa4f652c9a9f0be7cac5abebf4ddf7c1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v41 03/12] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare cases where GlobalVisState moves backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page, so it is performed only once per
page. This is safe because visibility_cutoff_xid records the newest xmin
of the live tuples on the page: if that XID is no longer considered
running by any snapshot, then no older xmin on the page is either, and
the entire page is all-visible.
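
The once-per-page check can be sketched as follows. This is a simplified model under stated assumptions, not the patch's code:

- SketchVisState replaces GlobalVisState with a single horizon XID. The real structure keeps more state and can be refreshed lazily.
- Only committed live tuples are passed in, and their xmins are assumed to be normal XIDs (no frozen xmins).
- xid_maybe_running() and page_is_all_visible() are hypothetical names. xid_maybe_running() plays the role of the GlobalVisTestXidMaybeRunning() wrapper this patch adds.
- XIDs are compared modulo 2^32, as PostgreSQL does for normal transaction IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId     ((TransactionId) 0)
#define TransactionIdIsValid(x)  ((x) != InvalidTransactionId)

/* Modulo-2^32 comparison for normal XIDs: does a logically precede b? */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical stand-in for GlobalVisState: one "no longer running" horizon */
typedef struct SketchVisState
{
    TransactionId horizon;      /* XIDs preceding this are not running */
} SketchVisState;

static bool
xid_maybe_running(const SketchVisState *state, TransactionId xid)
{
    return !xid_precedes(xid, state->horizon);
}

/*
 * Track the newest xmin of live tuples while scanning, then do one
 * visibility check per page instead of one per tuple.
 */
static bool
page_is_all_visible(const SketchVisState *state,
                    const TransactionId *live_xmins, int nlive)
{
    TransactionId visibility_cutoff_xid = InvalidTransactionId;

    assert(nlive >= 0);
    for (int i = 0; i < nlive; i++)
    {
        if (!TransactionIdIsValid(visibility_cutoff_xid) ||
            xid_precedes(visibility_cutoff_xid, live_xmins[i]))
            visibility_cutoff_xid = live_xmins[i];
    }

    /* No live xmins to check: nothing on the page can still be running */
    if (!TransactionIdIsValid(visibility_cutoff_xid))
        return true;

    /* If the newest xmin is not running, no older one is either */
    return !xid_maybe_running(state, visibility_cutoff_xid);
}
```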

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid. This approach will result in examining more tuple
xmins than before; however, the additional cost should not be
significant. And doing so will enable us to set the visibility map on
access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 22 ++++++++++
 src/backend/access/heap/pruneheap.c         | 48 ++++++++++-----------
 src/backend/access/heap/vacuumlazy.c        | 48 +++++++++++++--------
 src/include/access/heapam.h                 |  2 +
 4 files changed, 79 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..d70fab3a763 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 19a72ac6b27..f437579076e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -166,10 +166,13 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * is used after processing all tuples to determine if the page can be
+	 * considered all-visible (if the newest xmin is still considered running
+	 * by some snapshot, it cannot be). It is also used by the caller as the
+	 * conflict horizon when setting the VM bits, unless we froze all tuples
+	 * on the page (in which case the conflict xid was already included in the
+	 * WAL record).
 	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
@@ -1085,6 +1088,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(prstate.vistest,
+									 prstate.visibility_cutoff_xid))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1753,29 +1767,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1a446050d85..2a94ba3a387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -468,13 +468,13 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2089,7 +2089,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
@@ -2852,7 +2852,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3614,14 +3614,14 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
 
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3642,7 +3642,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
  *
  * Output parameters:
  *
@@ -3661,7 +3661,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3742,7 +3742,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3751,16 +3751,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3789,6 +3790,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 305ecc31a9e..f9dbd70c1c4 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -480,6 +480,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
-- 
2.43.0
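The semantics of the GlobalVisTestXidMaybeRunning() wrapper added in this patch can be illustrated with a simplified, hypothetical model of GlobalVisState's two boundaries, `maybe_needed` and `definitely_needed`. The sketch uses 64-bit XIDs, in the spirit of FullTransactionId, so plain comparisons suffice. XIDs between the two boundaries require a recomputation, which the sketch fakes with a preset `current_oldest_xmin` field; the real update conditions in procarray.c are more involved. It also contrasts a cutoff fixed when vacuum starts (the old OldestXmin test) with a horizon that may advance mid-vacuum, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 64-bit XIDs, like FullTransactionId, so wraparound is not a concern. */
typedef uint64_t FullXid;

/*
 * Hypothetical model: XIDs below maybe_needed are removable (not running for
 * any snapshot); XIDs at or above definitely_needed are considered running.
 * In between, the horizon must be recomputed to decide.
 */
typedef struct ToyGlobalVis
{
	FullXid		maybe_needed;
	FullXid		definitely_needed;
	FullXid		current_oldest_xmin;	/* what a recomputation would find */
	int			nrecomputes;
} ToyGlobalVis;

static bool
toy_is_removable(ToyGlobalVis *s, FullXid xid)
{
	if (xid < s->maybe_needed)
		return true;
	if (xid >= s->definitely_needed)
		return false;
	/* Fuzzy zone: refresh the lower boundary, then decide. */
	s->nrecomputes++;
	s->maybe_needed = s->current_oldest_xmin;
	return xid < s->maybe_needed;
}

/* Same test, read as "might the inserting transaction still be running?" */
static bool
toy_xid_maybe_running(ToyGlobalVis *s, FullXid xid)
{
	return !toy_is_removable(s, xid);
}

/* The previous approach: a single cutoff captured at the start of vacuum. */
static bool
fixed_cutoff_maybe_running(FullXid oldest_xmin, FullXid xid)
{
	return !(xid < oldest_xmin);
}
```

With a fixed cutoff of 100, xid 120 always blocks all-visibility; with the moving horizon, the same xid is treated as no longer running once a recomputation finds that the oldest xmin has advanced past it.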



  [text/x-patch] v41-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (15.4K, 5-v41-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From de07645b084c2e01050ac5fa8a6c80240842673e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v41 04/12] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. It also guarantees the value is never stale.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.
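A sketch of the bookkeeping this patch makes unconditional, using a hypothetical subset of PruneState (`ToyPruneState`). It assumes tuples are presented one at a time with a flag standing in for HeapTupleHeaderXminCommitted(), and it compares XIDs with plain `>`, ignoring wraparound, unlike TransactionIdFollows(). The conflict-horizon choice mirrors the `presult->vm_conflict_horizon` assignment in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Hypothetical subset of PruneState */
typedef struct ToyPruneState
{
	bool		attempt_freeze;
	bool		set_all_visible;
	bool		set_all_frozen;
	TransactionId newest_live_xid;
} ToyPruneState;

static void
toy_setup(ToyPruneState *ps, bool attempt_freeze)
{
	ps->attempt_freeze = attempt_freeze;
	ps->set_all_visible = true;				/* always maintained now */
	ps->set_all_frozen = attempt_freeze;	/* never cleared unless freezing */
	ps->newest_live_xid = InvalidTransactionId;
}

/* Record one live tuple; xmin_committed stands in for the hint-bit check. */
static void
toy_record_live(ToyPruneState *ps, TransactionId xmin, bool xmin_committed)
{
	if (!xmin_committed)
	{
		ps->set_all_visible = false;
		ps->set_all_frozen = false;
		return;
	}
	/* Track the newest xmin even if the page is already not all-visible. */
	if (xmin >= FirstNormalTransactionId && xmin > ps->newest_live_xid)
		ps->newest_live_xid = xmin;
}

/* An all-frozen page needs no conflict horizon for the VM update. */
static TransactionId
toy_vm_conflict_horizon(const ToyPruneState *ps)
{
	return ps->set_all_frozen ? InvalidTransactionId : ps->newest_live_xid;
}
```

In the sketch, an in-progress inserter clears set_all_visible, yet later committed tuples still advance newest_live_xid, so the value is never left behind at whatever it was when the page stopped qualifying.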

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 137 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++---
 2 files changed, 72 insertions(+), 95 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f437579076e..9451e9417f7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*
 	 * True if the page can bypass full page inspection during pruning and
 	 * freezing based on its visibility map status and the caller's options.
@@ -166,14 +169,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
-	 * is used after processing all tuples to determine if the page can be
-	 * considered all-visible (if the newest xmin is still considered running
-	 * by some snapshot, it cannot be). It is also used by the caller as the
-	 * conflict horizon when setting the VM bits, unless we froze all tuples
-	 * on the page (in which case the conflict xid was already included in the
-	 * WAL record).
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -183,7 +178,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 
@@ -471,53 +465,42 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. We must initialize set_all_frozen based on whether or not the
+	 * caller passed HEAP_PAGE_PRUNE_FREEZE, because if they did not, we won't
+	 * call heap_prepare_freeze_tuple() for each tuple, and set_all_frozen
+	 * will never be cleared for tuples that need freezing.
+	 *
+	 * When freezing is not required (no XIDs/MXIDs older than the freeze
+	 * cutoff), we may still choose to "opportunistically" freeze if doing so
+	 * would make the page all-frozen.
+	 *
+	 * We will not be able to freeze the whole page at the end of vacuum if
+	 * there are tuples present that are not visible to everyone or if there
+	 * are dead tuples which will not be removable. However, dead tuples that
+	 * will be removed by the end of vacuum should not prevent this
+	 * opportunistic freezing.
+	 *
+	 * Therefore, we do not clear set_all_visible and set_all_frozen when we
+	 * encounter LP_DEAD items. Instead, we correct them after deciding
+	 * whether to freeze, but before updating the VM, to avoid setting the VM
+	 * bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -747,7 +730,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -1021,9 +1003,8 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1094,9 +1075,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidMaybeRunning(prstate.vistest,
-									 prstate.visibility_cutoff_xid))
+									 prstate.newest_live_xid))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	/*
@@ -1247,7 +1228,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1708,6 +1689,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1755,32 +1737,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2a94ba3a387..8599dd7fcfa 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -470,7 +470,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -478,7 +478,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2828,7 +2828,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2854,14 +2854,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2902,7 +2902,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3616,7 +3616,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 
@@ -3624,7 +3624,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3647,7 +3647,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3665,7 +3665,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3675,7 +3675,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3764,9 +3764,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3796,8 +3796,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidMaybeRunning(vistest, *newest_live_xid))
 	{
 		all_visible = false;
 		*all_frozen = false;
-- 
2.43.0



  [text/x-patch] v41-0005-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch (23.1K, 6-v41-0005-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch)
From b6975991b391b979bbffac9cc0bd8896ba181ba8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v41 05/12] WAL log VM setting during vacuum phase I in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

For now, this change applies only to vacuum phase I, not to pruning
performed during normal page access.
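The decision made by the new heap_page_will_set_vm() in the diff below can be sketched in isolation as follows. The two bit values match PostgreSQL's VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN; `ToyPruneReason` and the flattened parameter list are hypothetical simplifications of PruneReason and PruneState.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as in PostgreSQL's visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN 0x02

/* Hypothetical reduced version of PruneReason */
typedef enum ToyPruneReason
{
	TOY_PRUNE_ON_ACCESS,
	TOY_PRUNE_VACUUM_SCAN,
} ToyPruneReason;

/*
 * Returns true if the VM bits should change; *new_vmbits receives the bits to
 * set, or 0 when nothing will be set.
 */
static bool
toy_will_set_vm(bool set_all_visible, bool set_all_frozen, uint8_t old_vmbits,
				ToyPruneReason reason, uint8_t *new_vmbits)
{
	*new_vmbits = 0;

	/* On-access pruning tracks all-visibility but does not set the VM yet. */
	if (reason == TOY_PRUNE_ON_ACCESS || !set_all_visible)
		return false;

	*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
	if (set_all_frozen)
		*new_vmbits |= VISIBILITYMAP_ALL_FROZEN;

	/* Nothing to WAL-log if the VM already holds exactly these bits. */
	if (*new_vmbits == old_vmbits)
	{
		*new_vmbits = 0;
		return false;
	}
	return true;
}
```

Computing the target bits before comparing them with the old ones keeps the caller's logic simple: a true result means the VM change must be included in the prune/freeze WAL record.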

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 244 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 113 ++-----------
 src/include/access/heapam.h          |  37 ++--
 3 files changed, 204 insertions(+), 190 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9451e9417f7..dd8ac173ca1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
 	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -163,21 +182,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 
@@ -232,6 +236,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 
 /*
@@ -398,6 +403,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
+	prstate->new_vmbits = 0;
 	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
 												   prstate->block,
 												   &prstate->vmbuffer);
@@ -915,6 +921,42 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	prstate->old_vmbits = 0;
 }
 
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * set the VM for now.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * If the page is already all-frozen, or already all-visible and freezing
  * is not being attempted, there is no remaining work and we can bypass the
@@ -948,8 +990,6 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->old_vmbits = prstate->old_vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -984,7 +1024,8 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -999,12 +1040,10 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1032,8 +1071,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1125,6 +1166,31 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	conflict_xid = InvalidTransactionId;
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1146,14 +1212,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1167,6 +1236,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1174,29 +1264,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1206,33 +1279,70 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible().
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.newest_live_xid);
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->old_vmbits = prstate.old_vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->newly_all_visible = false;
+	presult->newly_all_frozen = false;
+	presult->newly_all_visible_frozen = false;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->newly_all_visible = true;
+			if (prstate.set_all_frozen)
+				presult->newly_all_visible_frozen = true;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->newly_all_frozen = true;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8599dd7fcfa..d144e0f642b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -466,13 +466,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   OffsetNumber *deadoffsets,
@@ -2021,8 +2014,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2073,32 +2064,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2119,6 +2084,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	if (presult.newly_all_visible)
+		vacrel->new_all_visible_pages++;
+	if (presult.newly_all_visible_frozen)
+		vacrel->new_all_visible_all_frozen_pages++;
+	if (presult.newly_all_frozen)
+		vacrel->new_all_frozen_pages++;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.newly_all_visible_frozen ||
+		presult.newly_all_frozen;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2132,71 +2108,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.old_vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3612,7 +3523,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f9dbd70c1c4..b9577c24844 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -265,7 +265,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -281,8 +282,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -316,26 +316,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
-	 * cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		old_vmbits;
+	bool		newly_all_visible;
+	bool		newly_all_visible_frozen;
+	bool		newly_all_frozen;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -472,6 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0


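For readers following the 0005 hunks above, here is a rough, self-contained C sketch of the decision that moves into heap_page_will_set_vm(), plus the newly_all_visible / newly_all_visible_frozen / newly_all_frozen classification that lazy_scan_prune() now reads from presult. It is a simplification under stated assumptions, not the patch itself: VmDecision, NewlySet, sketch_will_set_vm() and sketch_classify() are hypothetical names, and the bit values are assumed to mirror visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the values in visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical stand-in for the PruneState fields the decision reads. */
typedef struct VmDecision
{
    bool    on_access;       /* reason == PRUNE_ON_ACCESS */
    bool    set_all_visible; /* page found all-visible after pruning */
    bool    set_all_frozen;  /* page found all-frozen after freezing */
    uint8_t old_vmbits;      /* VM bits read before pruning */
} VmDecision;

/* Hypothetical stand-in for the presult->newly_* flags. */
typedef struct NewlySet
{
    bool all_visible;
    bool all_visible_frozen;
    bool all_frozen;
} NewlySet;

/* Returns the VM bits to write, or 0 if the VM should be left alone. */
static uint8_t
sketch_will_set_vm(const VmDecision *d)
{
    uint8_t new_vmbits;

    /* all-frozen is only meaningful together with all-visible */
    assert(!d->set_all_frozen || d->set_all_visible);

    /* On-access pruning does not set the VM (yet). */
    if (d->on_access || !d->set_all_visible)
        return 0;

    new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
    if (d->set_all_frozen)
        new_vmbits |= VISIBILITYMAP_ALL_FROZEN;

    /* Nothing to do if the VM already holds exactly these bits. */
    if (new_vmbits == d->old_vmbits)
        return 0;

    return new_vmbits;
}

/* Classify what changed, for VACUUM's per-relation page counters. */
static NewlySet
sketch_classify(const VmDecision *d, uint8_t new_vmbits)
{
    NewlySet r = {false, false, false};

    if (new_vmbits == 0)
        return r;

    if ((d->old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
    {
        r.all_visible = true;
        r.all_visible_frozen = d->set_all_frozen;
    }
    else if ((d->old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
             d->set_all_frozen)
        r.all_frozen = true;

    return r;
}
```

Note that a page already all-visible in the VM and now found all-frozen gets both bits written, but in this sketch (as in the patch) it is counted only under newly_all_frozen.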

  [text/x-patch] v41-0006-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch (2.7K, 7-v41-0006-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch)
From 561637633c1417af7dea0509bbbf55dca3c2fead Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v41 06/12] WAL log VM setting for empty pages in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

This has no independent benefit, but empty pages were the last user of
XLOG_HEAP2_VISIBLE, so making this change lets us remove all of the
XLOG_HEAP2_VISIBLE code.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d144e0f642b..de93bff4a8e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1928,9 +1928,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
-			/* mark buffer dirty before writing a WAL record */
+			/* Mark buffer dirty before writing any WAL records */
 			MarkBufferDirty(buf);
 
 			/*
@@ -1948,13 +1951,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
+			/*
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
+			 */
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
+
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0


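The snapshot conflict horizon that 0005 hoists out of the RelationNeedsWAL() block is the newest XID required by any change in the combined record. A minimal sketch of that combination, assuming the modulo-2^32 comparison used by PostgreSQL's TransactionIdFollows(); xid_follows() and sketch_conflict_xid() are hypothetical names, not PostgreSQL functions. When no change contributes an XID, the result stays InvalidTransactionId, which appears consistent with 0006 passing InvalidTransactionId for empty pages that have no tuples to remove, freeze, or keep live.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(x) ((x) >= FirstNormalTransactionId)

/* "id1 is newer than id2", modeled on TransactionIdFollows(). */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    /* Special (non-normal) XIDs compare as plain integers. */
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;

    /* Normal XIDs compare modulo 2^32 to survive wraparound. */
    diff = (int32_t) (id1 - id2);
    return diff > 0;
}

/* Most conservative (newest) horizon across all changes in the record. */
static TransactionId
sketch_conflict_xid(bool do_set_vm, TransactionId newest_live_xid,
                    bool do_freeze, TransactionId freeze_conflict_xid,
                    bool do_prune, TransactionId latest_xid_removed)
{
    TransactionId conflict_xid = InvalidTransactionId;

    if (do_set_vm)
        conflict_xid = newest_live_xid;
    if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
        conflict_xid = freeze_conflict_xid;
    if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
        conflict_xid = latest_xid_removed;

    return conflict_xid;
}
```

Starting from InvalidTransactionId works because any normal XID "follows" it under the non-normal branch, so the first contributing change always wins the initial comparison.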

  [text/x-patch] v41-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (27.5K, 8-v41-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 8c63ade694ce8ed5bcb2d67c15203f0bd41d6b3f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v41 07/12] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   5 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 150 +++++-----------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |   9 +-
 src/include/access/heapam_xlog.h         |  21 +--
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 63 insertions(+), 386 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d64c403f2f0 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -55,9 +55,8 @@ mask_page_hint_bits(Page page)
 	PageClearHasFreeLinePointers(page);
 
 	/*
-	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * XXX: We should consider not masking PD_ALL_VISIBLE during WAL
+	 * consistency checking.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index dd8ac173ca1..fac7194dcba 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1253,8 +1253,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index de93bff4a8e..461fdf4ed83 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1951,11 +1951,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2833,9 +2833,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..21e89c38f0a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -35,21 +34,31 @@
  * is set, we know the condition is true, but if a bit is not set, it might or
  * might not be true.
  *
- * Clearing visibility map bits is not separately WAL-logged.  The callers
- * must make sure that whenever a bit is cleared, the bit is cleared on WAL
- * replay of the updating operation as well.
- *
- * When we *set* a visibility map during VACUUM, we must write WAL.  This may
- * seem counterintuitive, since the bit is basically a hint: if it is clear,
- * it may still be the case that every tuple on the page is visible to all
- * transactions; we just don't know that for certain.  The difficulty is that
- * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
- * on the page itself, and the visibility map bit.  If a crash occurs after the
- * visibility map page makes it to disk and before the updated heap page makes
- * it to disk, redo must set the bit on the heap page.  Otherwise, the next
- * insert, update, or delete on the heap page will fail to realize that the
- * visibility map bit must be cleared, possibly causing index-only scans to
- * return wrong answers.
+ * Changes to the visibility map bits are not separately WAL-logged. Callers
+ * must make sure that whenever a visibility map bit is cleared, the bit is
+ * cleared on WAL replay of the updating operation. And whenever a visibility
+ * map bit is set, the bit is set on WAL replay of the operation that rendered
+ * the page all-visible/all-frozen.
+ *
+ * The visibility map bits operate as a hint in one direction only: if they
+ * are clear, every tuple on the page may still be visible to all transactions
+ * (we just don't know that for certain). Set bits are trusted: a wrongly set
+ * bit can make VACUUM skip pages and incorrectly advance relfrozenxid, or let
+ * an index-only scan skip the heap page and return wrong results.
+ *
+ * Additionally, it is critical that the heap-page level PD_ALL_VISIBLE bit be
+ * correctly set and cleared along with the VM bits.
+ *
+ * When clearing the VM, if a crash occurs after the heap page makes it to
+ * disk but before the VM page makes it to disk, replay must clear the VM or
+ * the next index-only scan can return wrong results or vacuum may incorrectly
+ * advance relfrozenxid.
+ *
+ * When setting the VM, if a crash occurs after the visibility map page makes
+ * it to disk and before the updated heap page makes it to disk, redo must set
+ * the bit on the heap page. Otherwise, the next insert, update, or delete on
+ * the heap page will fail to realize that the visibility map bit must be
+ * cleared, possibly causing index-only scans to return wrong answers.
  *
  * VACUUM will normally skip pages for which the visibility map bit is set;
  * such pages can't contain any dead tuples and therefore don't need vacuuming.
@@ -222,112 +231,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +251,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..de9092fdf5b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -476,10 +476,11 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	/*
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
-	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
+	 * This can happen whenever the changes in the WAL record do not affect
+	 * visibility on a standby. For example: a record that only freezes an
+	 * xmax from a locker.
+	 *
+	 * It's also quite common with records generated during index deletion
 	 * (original execution of the deletion can reason that a recovery conflict
 	 * which is sufficient for the deletion operation must take place before
 	 * replay of the deletion record itself).
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..516806fcca2 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,7 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
+/* 0x40 was XLOG_HEAP2_VISIBLE */
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +443,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +486,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 480614d483b..f12f2deec43 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4416,7 +4416,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
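The old visibilitymap_set() removed above spelled out how a heap block maps to its two VM bits (HEAPBLK_TO_MAPBLOCK, HEAPBLK_TO_MAPBYTE, HEAPBLK_TO_OFFSET, then `map[mapByte] |= flags << mapOffset`). The renamed visibilitymap_set() still does that bit manipulation, only without emitting WAL. What follows is a standalone sketch of the addressing and the bit-setting step. It assumes 8 kB pages and a 24-byte, already MAXALIGNed page header. The `sketch_` names are hypothetical. It leaves out buffer pinning and locking, MarkBufferDirty(), and WAL entirely, and its "already set?" test is a simplification of the removed code's `flags != status` check.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: default 8 kB block, 24-byte MAXALIGNed page header. */
#define SKETCH_BLCKSZ           8192
#define SKETCH_PAGE_HEADER_SIZE 24
#define BITS_PER_HEAPBLOCK      2
#define HEAPBLOCKS_PER_BYTE     (8 / BITS_PER_HEAPBLOCK)
#define MAPSIZE                 (SKETCH_BLCKSZ - SKETCH_PAGE_HEADER_SIZE)
#define HEAPBLOCKS_PER_PAGE     (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02
#define VM_VALID_BITS  0x03

/* Which VM page holds the bits for this heap block. */
static uint32_t heapblk_to_mapblock(uint32_t blk)
{
    return blk / HEAPBLOCKS_PER_PAGE;
}

/* Which byte within that VM page's contents. */
static uint32_t heapblk_to_mapbyte(uint32_t blk)
{
    return (blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
}

/* Bit offset of the two-bit pair within that byte. */
static uint8_t heapblk_to_offset(uint32_t blk)
{
    return (uint8_t) ((blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
}

/* Read the two status bits for one heap block. */
static uint8_t sketch_vm_get_status(const uint8_t *map, uint32_t blk)
{
    return (map[heapblk_to_mapbyte(blk)] >> heapblk_to_offset(blk)) & VM_VALID_BITS;
}

/*
 * OR the requested bits into the map. Returns 1 if the byte changed (the
 * case in which a real caller would have to dirty the VM buffer).
 */
static int sketch_vm_set(uint8_t *map, uint32_t blk, uint8_t flags)
{
    uint32_t byte = heapblk_to_mapbyte(blk);
    uint8_t  off = heapblk_to_offset(blk);
    uint8_t  status = (map[byte] >> off) & VM_VALID_BITS;

    assert((flags & VM_VALID_BITS) == flags);
    /* all-frozen must never be set without all-visible */
    assert(flags != VM_ALL_FROZEN);

    /* nothing to do if every requested bit is already set */
    if ((status | flags) == status)
        return 0;

    map[byte] |= (uint8_t) (flags << off);
    return 1;
}
```

With these assumptions one VM page covers 32672 heap blocks. Heap block 5 lands in byte 1 at bit offset 2, so setting both bits turns that byte into 0x0C.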



  [text/x-patch] v41-0008-Track-which-relations-are-modified-by-a-query.patch (8.7K, 9-v41-0008-Track-which-relations-are-modified-by-a-query.patch)
From f5de33c173ca5216ea042475ec8ee2d8f0ddd3a9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v41 08/12] Track which relations are modified by a query

Save the relids of modified relations in a bitmap in the PlannedStmt.
A later commit will pass this information down to scan nodes to control
whether or not on-access pruning is allowed to set the visibility map.
Setting the visibility map during a scan is counterproductive if the
query is going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c        | 47 ++++++++++++++++++++++++++
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  4 +++
 src/backend/executor/nodeModifyTable.c | 18 ++++++++++
 src/backend/optimizer/plan/planner.c   | 21 +++++++++++-
 src/include/nodes/plannodes.h          | 10 ++++++
 6 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..3f134f9a34d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -90,6 +90,9 @@ static bool ExecCheckPermissionsModified(Oid relOid, Oid userid,
 										 Bitmapset *modifiedCols,
 										 AclMode requiredPerms);
 static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
+#ifdef USE_ASSERT_CHECKING
+static void ExecCheckModifiedRelIds(EState *estate);
+#endif
 static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 static void ReportNotNullViolationError(ResultRelInfo *resultRelInfo,
 										TupleTableSlot *slot,
@@ -827,6 +830,46 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
 }
 
 
+/*
+ * ExecCheckModifiedRelIds
+ *		Verify that every relation the executor actually opened for modification
+ *		or row locking is present in the planner's modifiedRelids.
+ *
+ * The planner's set may be a superset of what the executor touches, because it
+ * includes partitions that were pruned at runtime and parent row marks that the
+ * executor skips.
+ */
+#ifdef USE_ASSERT_CHECKING
+static void
+ExecCheckModifiedRelIds(EState *estate)
+{
+	PlannedStmt *plannedstmt = estate->es_plannedstmt;
+	Bitmapset  *executor_relids = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = (ResultRelInfo *) lfirst(lc);
+
+		if (rri->ri_RangeTableIndex != 0)
+			executor_relids = bms_add_member(executor_relids,
+											 rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (int i = 0; i < estate->es_range_table_size; i++)
+		{
+			if (estate->es_rowmarks[i] != NULL)
+				executor_relids = bms_add_member(executor_relids,
+												 estate->es_rowmarks[i]->rti);
+		}
+	}
+	Assert(bms_is_subset(executor_relids, plannedstmt->modifiedRelids));
+	bms_free(executor_relids);
+}
+#endif
+
+
 /* ----------------------------------------------------------------
  *		InitPlan
  *
@@ -992,6 +1035,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	ExecCheckModifiedRelIds(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..4f39767d033 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelids = estate->es_plannedstmt->modifiedRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..d67f24fca8c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,10 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		/* verify this relation is in the planner's modifiedRelids */
+		Assert(bms_is_member(erm->rti,
+							 estate->es_plannedstmt->modifiedRelids));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..6b4ee4f9378 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,16 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	/*
+	 * Verify this relation is in the planner's set of modified relations.
+	 * Partitions opened by tuple routing have ri_RangeTableIndex == 0 because
+	 * they have no range table entry, so we can only check relations that are
+	 * in the range table.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1533,10 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2219,10 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..847af979e31 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *modifiedRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,23 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelids from result relations and row marks.  This is a
+	 * superset of what the executor will actually modify/lock at runtime,
+	 * because runtime partition pruning may eliminate some result relations,
+	 * and parent row marks are included here but skipped by the executor.
+	 */
+	foreach(lc, glob->resultRelations)
+		modifiedRelids = bms_add_member(modifiedRelids, lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+	{
+		PlanRowMark *rc = (PlanRowMark *) lfirst(lc);
+
+		modifiedRelids = bms_add_member(modifiedRelids, rc->rti);
+	}
+	result->modifiedRelids = modifiedRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..841c7707c59 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,16 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * RT indexes of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 *
+	 * Computed by the planner, this is a superset of what the executor will
+	 * actually touch at runtime, because it includes partitions that may be
+	 * pruned and parent row marks that the executor skips.
+	 */
+	Bitmapset  *modifiedRelids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0
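The invariant added in 0008 can be illustrated outside the executor. The planner builds modifiedRelids as the union of result-relation and row-mark RT indexes, and the assertion-only ExecCheckModifiedRelIds() checks that the set the executor actually opened is a subset of it. This sketch replaces PostgreSQL's Bitmapset with a hypothetical 64-bit mask (`RelidSet`), so it handles only RT indexes 1 through 63. Every helper name here is invented for illustration. As in the ExecInsert() assertion, RT index 0 (a tuple-routed partition with no range table entry) is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: supports RT indexes 1..63 only. */
typedef uint64_t RelidSet;

static RelidSet relids_add(RelidSet s, int rti)
{
    assert(rti > 0 && rti < 64);
    return s | ((uint64_t) 1 << rti);
}

static int relids_is_member(RelidSet s, int rti)
{
    return rti > 0 && rti < 64 && (s & ((uint64_t) 1 << rti)) != 0;
}

/* a is a subset of b when no bit of a is missing from b */
static int relids_is_subset(RelidSet a, RelidSet b)
{
    return (a & ~b) == 0;
}

/* Planner side: union of result relations and row-mark targets. */
static RelidSet compute_modified_relids(const int *result_rtis, size_t nresult,
                                        const int *rowmark_rtis, size_t nrowmarks)
{
    RelidSet s = 0;

    for (size_t i = 0; i < nresult; i++)
        s = relids_add(s, result_rtis[i]);
    for (size_t i = 0; i < nrowmarks; i++)
        s = relids_add(s, rowmark_rtis[i]);
    return s;
}

/*
 * Executor side: every relation actually opened for modification must be in
 * the planner's set. Entries with RT index 0 have no RTE and are skipped.
 */
static int executor_set_ok(RelidSet planned, const int *opened_rtis, size_t nopened)
{
    RelidSet executed = 0;

    for (size_t i = 0; i < nopened; i++)
        if (opened_rtis[i] != 0)
            executed = relids_add(executed, opened_rtis[i]);
    return relids_is_subset(executed, planned);
}
```

Because the planner's set may legitimately be larger (runtime-pruned partitions, parent row marks the executor skips), the check is a subset test rather than an equality test.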



  [text/x-patch] v41-0009-Thread-flags-through-begin-scan-APIs.patch (32.9K, 10-v41-0009-Thread-flags-through-begin-scan-APIs.patch)
From a2aca8dcd8e944058ded737184343a6960f7cee6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v41 09/12] Thread flags through begin-scan APIs

Add a user-settable flags parameter to the table_beginscan_* wrappers,
index_beginscan(), table_index_fetch_begin(), and the table
AM callback index_fetch_begin(). This allows users to pass additional
context to be used when building the scan descriptors.

For index scans, a new uint32 flags field is added to
IndexFetchTableData, and the heap AM stores the caller-provided flags
there in heapam_index_fetch_begin().

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  2 +
 src/backend/access/index/indexam.c        |  7 ++-
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 21 +++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +--
 src/backend/commands/typecmds.c           |  4 +-
 src/backend/executor/execIndexing.c       |  3 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  5 +-
 src/backend/executor/nodeIndexscan.c      |  6 +-
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 +-
 src/backend/executor/nodeTidrangescan.c   |  6 +-
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  1 +
 src/include/access/genam.h                |  4 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 72 +++++++++++++++--------
 26 files changed, 117 insertions(+), 75 deletions(-)
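The 0009 commit message describes pure plumbing. The caller hands a uint32 flags value to the begin-scan call, and the heap AM copies it unchanged into IndexFetchTableData for later use. The sketch below mimics that flow with stand-in types. The flag bit `SKETCH_SO_REL_READ_ONLY` and the consumer `sketch_may_set_vm()` are hypothetical. The patch defines no specific bits, and the read-only use case comes only from the commit message's description of follow-up work.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag bit; the patch adds only the uint32 plumbing. */
#define SKETCH_SO_REL_READ_ONLY 0x0001u

/* Stand-in for IndexFetchTableData with the new flags field. */
typedef struct SketchIndexFetch
{
    const char *relname;    /* stand-in for the Relation pointer */
    uint32_t    flags;      /* caller-provided, stored verbatim */
} SketchIndexFetch;

/* Modeled on heapam_index_fetch_begin(): allocate, then record the flags. */
static SketchIndexFetch *
sketch_index_fetch_begin(const char *relname, uint32_t flags)
{
    SketchIndexFetch *f = calloc(1, sizeof(*f));

    if (f == NULL)
        return NULL;
    f->relname = relname;
    f->flags = flags;
    return f;
}

/*
 * Hypothetical consumer: a later patch could consult the flags, e.g. to let
 * on-access pruning set the VM only when the query won't modify the relation.
 */
static int
sketch_may_set_vm(const SketchIndexFetch *f)
{
    assert(f != NULL);
    return (f->flags & SKETCH_SO_REL_READ_ONLY) != 0;
}
```

Passing 0, as most call sites in this patch do, keeps the existing behavior. Only callers that know something extra about the scan need to set bits.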

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..66726b22de6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,11 +80,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -762,7 +763,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									0,	/* flags */
+									SnapshotAny, NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +774,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..b099d956e41 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,6 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
+										 0, /* flags */
 										 snapshot, NULL, nkeys, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
@@ -716,6 +717,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
+									 0, /* flags */
 									 snapshot, NULL, nkeys, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..63d5daadca6 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -255,6 +255,7 @@ index_insert_cleanup(Relation indexRelation,
 IndexScanDesc
 index_beginscan(Relation heapRelation,
 				Relation indexRelation,
+				uint32 flags,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
 				int nkeys, int norderbys)
@@ -284,7 +285,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +594,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +616,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..7a12e808b07 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,10 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +176,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +185,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +207,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +216,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 499ce9ad3db..fb791c7990b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1160,7 +1160,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 67e42e5df29..cc2ec9393a8 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22881,7 +22881,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23345,7 +23345,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..c46beedeb71 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, 0,	/* flags */
+								 &DirtySnapshot, NULL, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..23509771557 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   &snap, NULL, skey_attoff, 0);
 
 retry:
 	found = false;
@@ -383,7 +385,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +668,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   SnapshotAny, NULL, skey_attoff, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c8db357e69f..decfd792809 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -92,6 +92,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -790,7 +791,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -856,7 +857,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index bd83e4712b3..a37fa9abece 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -110,6 +110,7 @@ IndexNext(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -206,6 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1726,7 +1728,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1790,7 +1792,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 86b55c9bb8b..1d64d286881 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7177,6 +7177,7 @@ get_actual_variable_endpoint(Relation heapRel,
 							  GlobalVisTestFor(heapRel));
 
 	index_scan = index_beginscan(heapRel, indexRel,
+								 0, /* flags */
 								 &SnapshotNonVacuumable, NULL,
 								 1, 0);
 	/* Set it up for index-only scan */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b98c20a0edc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,6 +156,7 @@ extern void index_insert_cleanup(Relation indexRelation,
 
 extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
+									 uint32 flags,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b9577c24844..e32f28d7acb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..8357d05d83b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,18 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +910,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +945,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -939,11 +956,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +975,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1000,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1013,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1059,12 +1078,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1159,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1170,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1197,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1209,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
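The flag split in the patch above has a simple contract. Each table_beginscan_* wrapper owns the scan-type and SO_ALLOW_* bits. Callers can only add extra hint bits. table_beginscan_common() asserts that the caller's bits do not overlap SO_INTERNAL_FLAGS, then ORs the two sets together.

The standalone model below illustrates that contract. It does not use PostgreSQL headers, and the following assumptions apply:

- The model_* function names are hypothetical.
- The patch context shows only SO_TEMP_SNAPSHOT = 1 << 9. The other bit values are assumed to follow the same ScanOptions ordering.
- The value of SO_HINT_REL_READ_ONLY, which the next patch starts passing, is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout for illustration; only SO_TEMP_SNAPSHOT = 1 << 9 is
 * visible in the patch context. SO_HINT_REL_READ_ONLY's value is assumed. */
enum
{
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_TYPE_BITMAPSCAN = 1 << 1,
	SO_TYPE_SAMPLESCAN = 1 << 2,
	SO_TYPE_TIDSCAN = 1 << 3,
	SO_TYPE_TIDRANGESCAN = 1 << 4,
	SO_TYPE_ANALYZE = 1 << 5,
	SO_ALLOW_STRAT = 1 << 6,
	SO_ALLOW_SYNC = 1 << 7,
	SO_ALLOW_PAGEMODE = 1 << 8,
	SO_TEMP_SNAPSHOT = 1 << 9,
	SO_HINT_REL_READ_ONLY = 1 << 10	/* assumed value */
};

/* Same membership as the SO_INTERNAL_FLAGS mask added in tableam.h. */
#define SO_INTERNAL_FLAGS \
	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
	 SO_TEMP_SNAPSHOT)

/* Hypothetical model of table_beginscan_common(): reject caller bits that
 * collide with wrapper-owned bits, then merge the two sets. */
static uint32_t
model_beginscan_common(uint32_t internal_flags, uint32_t user_flags)
{
	assert((user_flags & SO_INTERNAL_FLAGS) == 0);
	return internal_flags | user_flags;
}

/* Hypothetical model of table_beginscan(): the wrapper fixes the scan type
 * and SO_ALLOW_* bits; the caller contributes only hint bits. */
static uint32_t
model_beginscan(uint32_t user_flags)
{
	uint32_t	internal_flags = SO_TYPE_SEQSCAN |
		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;

	return model_beginscan_common(internal_flags, user_flags);
}
```

Under this model, a caller passing SO_TYPE_SEQSCAN as a user flag would trip the assertion in an assert-enabled build. Keeping internal_flags and the caller's flags as separate parameters is what makes that check possible: once the two sets are merged into a single uint32, the function can no longer tell who set which bit.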



  [text/x-patch] v41-0010-Pass-down-information-on-table-modification-to-s.patch (11.3K, 11-v41-0010-Pass-down-information-on-table-modification-to-s.patch)
From 512c2a19651aecab3a15712992586dce288b83fd Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v41 10/12] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/nodeBitmapHeapscan.c |  6 +++++-
 src/backend/executor/nodeIndexonlyscan.c  | 15 ++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++----
 src/backend/executor/nodeSamplescan.c     |  5 ++++-
 src/backend/executor/nodeSeqscan.c        | 18 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 15 ++++++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           | 10 ++++++++++
 8 files changed, 75 insertions(+), 15 deletions(-)
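Every scan node in the diff below repeats the same pattern: compute `ScanRelIsReadOnly(&node->ss) ? SO_HINT_REL_READ_ONLY : 0` and pass the result as the caller flags.

The definition of ScanRelIsReadOnly() (added to executor.h) is not shown in this excerpt. The sketch below therefore assumes that "read-only" means the scan's range-table index is not among the query's result relations. The following are hypothetical stand-ins, not PostgreSQL's API:

- ModelEState
- model_scan_rel_is_read_only()
- model_scan_flags()
- the bit value of SO_HINT_REL_READ_ONLY

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SO_HINT_REL_READ_ONLY (1u << 10)	/* assumed value */

/* Hypothetical, heavily simplified stand-in for executor state. */
typedef struct ModelEState
{
	const int  *result_relids;	/* range-table indexes the query modifies */
	size_t		nresult_relids;
} ModelEState;

/* Hypothetical stand-in for ScanRelIsReadOnly(). Assumes a relation is
 * read-only for this query if its range-table index is not a result rel. */
static bool
model_scan_rel_is_read_only(const ModelEState *estate, int scanrelid)
{
	assert(scanrelid > 0);		/* range-table indexes are 1-based */

	for (size_t i = 0; i < estate->nresult_relids; i++)
	{
		if (estate->result_relids[i] == scanrelid)
			return false;
	}
	return true;
}

/* Mirrors the ternary each scan node evaluates before beginning its scan. */
static uint32_t
model_scan_flags(const ModelEState *estate, int scanrelid)
{
	return model_scan_rel_is_read_only(estate, scanrelid) ?
		SO_HINT_REL_READ_ONLY : 0;
}
```

For an `UPDATE t1 ... FROM t2` whose only result relation is range-table entry 2, this model would hand SO_HINT_REL_READ_ONLY to the scan of entry 1 and 0 to the scan of entry 2. Per the commit message, that bit is what a later commit consults before setting the VM during on-access pruning.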

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..aec92c868ac 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,15 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index decfd792809..b977719c295 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,9 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -92,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -791,7 +794,10 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -857,7 +863,10 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a37fa9abece..ad460c11679 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,13 +104,16 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -201,13 +204,16 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1728,7 +1734,9 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1792,7 +1800,9 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..71c70e5e5c7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,16 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&scanstate->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..79470e6b9b5 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,17 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +372,18 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +413,12 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..4257afd96ed 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,13 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+				SO_HINT_REL_READ_ONLY : 0;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +455,18 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +496,12 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8357d05d83b..487e38292fa 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..31c4192b67e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,16 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+/*
+ * Return true if the scan node's relation is not modified by the query.
+ */
+static inline bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !bms_is_member(((Scan *) ss->ps.plan)->scanrelid,
+						  ss->ps.state->es_plannedstmt->modifiedRelids);
+}
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0
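The executor changes above follow one rule. A scan gets SO_HINT_REL_READ_ONLY when its scanrelid is not in the planned statement's modifiedRelids, and heap_page_prune_opt() (in the next patch) turns that hint into HEAP_PAGE_PRUNE_SET_VM. The following is a minimal standalone sketch of that plumbing, not PostgreSQL code. The flag values are copied from the patches. RelidSet, relid_set_is_member(), scan_flags_for() and prune_options_for() are hypothetical stand-ins, with a 64-bit mask replacing Bitmapset and bms_is_member().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patches (tableam.h and heapam.h). */
#define SO_HINT_REL_READ_ONLY           (1u << 10)
#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH (1u << 2)
#define HEAP_PAGE_PRUNE_SET_VM          (1u << 3)

/*
 * Hypothetical stand-in for the Bitmapset in es_plannedstmt->modifiedRelids:
 * bit i is set if range-table index i is modified by the query.
 */
typedef uint64_t RelidSet;

static bool
relid_set_is_member(unsigned int relid, RelidSet set)
{
	assert(relid < 64);			/* the toy set only holds 64 members */
	return (set & ((RelidSet) 1 << relid)) != 0;
}

/* Mirrors ScanRelIsReadOnly() plus the executor's flag computation. */
static uint32_t
scan_flags_for(unsigned int scanrelid, RelidSet modified_relids)
{
	return relid_set_is_member(scanrelid, modified_relids) ?
		0 : SO_HINT_REL_READ_ONLY;
}

/* Mirrors how heap_page_prune_opt() turns the hint into prune options. */
static uint32_t
prune_options_for(uint32_t scan_flags)
{
	bool		rel_read_only = (scan_flags & SO_HINT_REL_READ_ONLY) != 0;
	uint32_t	options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;

	if (rel_read_only)
		options |= HEAP_PAGE_PRUNE_SET_VM;
	return options;
}
```

Assuming bool is C99's _Bool (as in current PostgreSQL), passing the masked value (rs_flags & SO_HINT_REL_READ_ONLY) straight into heap_page_prune_opt()'s bool parameter is safe, since any nonzero value converts to true. The sketch spells out the != 0 comparison to make that explicit.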



  [text/x-patch] v41-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.1K, 12-v41-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From f02a7e880af91c4fb14edc75076d448f00462270 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v41 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid the write amplification
that occurs when vacuum later has to set the page all-visible, which
dirties the page, triggers a write, and potentially emits an FPI. It
also lets index-only scans avoid heap fetches more often, since they
can skip the heap only for pages marked all-visible in the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              | 16 +++++++--
 5 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 66726b22de6..651efa0127a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2545,7 +2546,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fac7194dcba..deb7b948c1e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -236,7 +238,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -257,7 +260,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -339,6 +343,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -395,6 +401,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -474,9 +481,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -928,21 +934,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1168,7 +1190,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 461fdf4ed83..37dba4cb3ec 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2033,7 +2033,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e32f28d7acb..78c85536d39 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -95,7 +96,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * For sequential scans, bitmap heap scans, TID range scans, and sample
+	 * scans. The current heap block's corresponding page in the visibility
+	 * map. If the relation is not modified by the query, on-access pruning
+	 * may set the VM.
+	 */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
@@ -126,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM on-access.
+	 */
 	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
@@ -431,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
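The decision heap_page_will_set_vm() makes after this patch can be modeled in isolation. Below is a minimal sketch under stated assumptions: ToyPruneState, ToyPruneReason and will_set_vm() are hypothetical; the buffer_dirty and needs_fpi fields stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup(); and the step that adds VISIBILITYMAP_ALL_FROZEN is assumed, because the hunk is cut off after `if (prstate->set_all_frozen)`. The VISIBILITYMAP_* values match visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Subset of PruneReason relevant here. */
typedef enum { PRUNE_ON_ACCESS, PRUNE_VACUUM_SCAN } ToyPruneReason;

typedef struct
{
	bool		attempt_set_vm;		/* HEAP_PAGE_PRUNE_SET_VM was passed */
	bool		set_all_visible;	/* all tuples visible to everyone */
	bool		set_all_frozen;		/* all tuples also frozen */
	bool		buffer_dirty;		/* stand-in for BufferIsDirty() */
	bool		needs_fpi;			/* stand-in for XLogCheckBufferNeedsBackup() */
	uint8_t		new_vmbits;			/* output: VM bits to set */
} ToyPruneState;

static bool
will_set_vm(ToyPruneState *st, ToyPruneReason reason,
			bool do_prune, bool do_freeze)
{
	assert(!st->set_all_frozen || st->set_all_visible);
	st->new_vmbits = 0;

	if (!st->attempt_set_vm || !st->set_all_visible)
		return false;

	/*
	 * On-access and nothing else to change: don't newly dirty the heap
	 * page, and don't pay for a full-page image just to set the VM.
	 */
	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!st->buffer_dirty || st->needs_fpi))
	{
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return false;
	}

	st->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
	if (st->set_all_frozen)
		st->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;	/* assumed */
	return true;
}
```

The heuristic only bites when on-access pruning would otherwise leave the page untouched. Once the page is already being pruned or frozen (and therefore dirtied and WAL-logged), setting the VM in the same record costs little extra.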



  [text/x-patch] v41-0012-Set-pd_prune_xid-on-insert.patch (8.8K, 13-v41-0012-Set-pd_prune_xid-on-insert.patch)
From 99c5b17c45e413680919f2623fc528bb35873dc9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v41 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set the VM all-visible the
first time a page filled with newly inserted tuples is read. The page
can thus be set all-visible while it is still in shared buffers,
avoiding potential I/O amplification when vacuum later has to scan the
page and set it all-visible. It also lets index-only scans of newly
inserted data avoid heap fetches much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 40 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++------
 3 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..ba11bbc03a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index deb7b948c1e..9f8c83aa7d3 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -279,7 +279,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts inserted new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1918,17 +1919,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0
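The rule this patch adds to heap_insert() (set pd_prune_xid only for a normal, non-frozen insert) composes with PageSetPrunable(), which keeps the oldest candidate XID on the page. The minimal sketch below models both. ToyPage, toy_heap_insert() and the xid_* helpers are hypothetical. The special XID values match PostgreSQL's transam.h, and xid_precedes() deliberately uses a plain < comparison instead of the wraparound-aware TransactionIdPrecedes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define BootstrapTransactionId   ((TransactionId) 1)
#define FrozenTransactionId      ((TransactionId) 2)
#define FirstNormalTransactionId ((TransactionId) 3)

typedef struct
{
	TransactionId pd_prune_xid;	/* oldest XID that may make the page prunable */
} ToyPage;

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Simplification: ignores XID wraparound. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return a < b;
}

/* Like PageSetPrunable(): only ever moves pd_prune_xid backwards. */
static void
page_set_prunable(ToyPage *page, TransactionId xid)
{
	assert(xid_is_normal(xid));
	if (page->pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/* The heap_insert() rule from the patch. */
static void
toy_heap_insert(ToyPage *page, TransactionId xid, bool insert_frozen)
{
	if (xid_is_normal(xid) && !insert_frozen)
		page_set_prunable(page, xid);
}
```

As the assert in the sketch suggests, PostgreSQL's PageSetPrunable() asserts that the XID is normal, which is one reason the patch checks TransactionIdIsNormal() before calling it during bootstrap.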




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-20 23:37                             ` Melanie Plageman <[email protected]>
  2026-03-22 19:58                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-20 23:37 UTC
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, Mar 19, 2026 at 10:38 PM Melanie Plageman
<[email protected]> wrote:
>
> Thanks for the detailed review! Unless otherwise specified, attached
> v41 includes all of your straightforward review points.

I've made several minor updates and two notable ones in the attached v42:

- No separate log_newpage_buffer() call for empty-page vacuum;
log_heap_prune_and_freeze() now handles pages without a valid LSN on
its own.
- The heap_page_is_all_visible() assertion should remain stable even
once it uses GlobalVisState, because I've updated the GlobalVisState
functions to avoid updating the GlobalVisState boundaries in this case.

- Melanie


Attachments:

  [text/x-patch] v42-0001-Fix-visibility-map-corruption-in-more-cases.patch (20.5K, 2-v42-0001-Fix-visibility-map-corruption-in-more-cases.patch)
From 866d7257a7024a018d1c39b09c0026bea374f8f6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:23:09 -0500
Subject: [PATCH v42 01/12] Fix visibility map corruption in more cases

Move VM corruption detection and repair into pruning. This allows VM
repair during on-access pruning, not only during vacuum.

Also, expand corruption detection to cover pages marked all-visible that
contain dead tuples and tuples inserted or updated by in-progress
transactions, rather than only all-visible pages with LP_DEAD items.

Pinning the correct VM page before on-access pruning is cheap compared
to the cost of actually pruning. The vmbuffer is saved in the scan
descriptor, so a query should only need to pin each VM page once, and a
single VM page covers a large number of heap pages.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 217 +++++++++++++++++++++++++--
 src/backend/access/heap/vacuumlazy.c |  89 +----------
 src/include/access/heapam.h          |  12 ++
 src/tools/pgindent/typedefs.list     |   1 +
 4 files changed, 217 insertions(+), 102 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8d9f0694206..be3ae21f94c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
 #include "access/htup_details.h"
 #include "access/multixact.h"
 #include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "commands/vacuum.h"
@@ -114,6 +114,21 @@ typedef struct
 	 */
 	HeapPageFreeze pagefrz;
 
+	/*-------------------------------------------------------
+	 * Working state for visibility map processing
+	 *-------------------------------------------------------
+	 */
+
+	/*
+	 * Caller must provide a pinned vmbuffer corresponding to the heap block
+	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
+	 * found in the VM.
+	 */
+	Buffer		vmbuffer;
+
+	/* Bits in the vmbuffer for this heap page */
+	uint8		old_vmbits;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -162,12 +177,30 @@ typedef struct
 	TransactionId visibility_cutoff_xid;
 } PruneState;
 
+
+/*
+ * Type of visibility map corruption detected on a heap page. Passed to
+ * heap_page_fix_vm_corruption() so the caller can specify what it found
+ * rather than having the function rederive the corruption from page state.
+ */
+typedef enum VMCorruptionType
+{
+	/* VM bits are set but the page-level PD_ALL_VISIBLE flag is not */
+	VM_CORRUPT_MISSING_PAGE_HINT,
+	/* LP_DEAD line pointers found on a page marked all-visible */
+	VM_CORRUPT_LPDEAD,
+	/* Tuple not visible to all transactions on a page marked all-visible */
+	VM_CORRUPT_TUPLE_VISIBILITY,
+} VMCorruptionType;
+
 /* Local functions */
 static void prune_freeze_setup(PruneFreezeParams *params,
 							   TransactionId *new_relfrozen_xid,
 							   MultiXactId *new_relmin_mxid,
 							   PruneFreezeResult *presult,
 							   PruneState *prstate);
+static void heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
+										VMCorruptionType ctype);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -175,7 +208,8 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 static inline HTSV_Result htsv_get_valid_status(int status);
 static void heap_prune_chain(OffsetNumber maxoff,
 							 OffsetNumber rootoffnum, PruneState *prstate);
-static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
+static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+									   OffsetNumber offnum);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
 									   bool was_normal);
@@ -209,8 +243,9 @@ static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool d
  * Caller must have pin on the buffer, and must *not* have a lock on it.
  *
  * This function may pin *vmbuffer. It's passed by reference so the caller can
- * reuse the pin across calls, avoiding repeated pin/unpin cycles. Caller is
- * responsible for unpinning it.
+ * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
+ * VM corruption during pruning, we will fix it. Caller is responsible for
+ * unpinning *vmbuffer.
  */
 void
 heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
@@ -277,6 +312,16 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 		{
 			OffsetNumber dummy_off_loc;
 			PruneFreezeResult presult;
+			PruneFreezeParams params;
+
+			visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
+			params.relation = relation;
+			params.buffer = buffer;
+			params.vmbuffer = *vmbuffer;
+			params.reason = PRUNE_ON_ACCESS;
+			params.vistest = vistest;
+			params.cutoffs = NULL;
 
 			/*
 			 * We don't pass the HEAP_PAGE_PRUNE_MARK_UNUSED_NOW option
@@ -284,14 +329,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			PruneFreezeParams params = {
-				.relation = relation,
-				.buffer = buffer,
-				.reason = PRUNE_ON_ACCESS,
-				.options = 0,
-				.vistest = vistest,
-				.cutoffs = NULL,
-			};
+			params.options = 0;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -354,6 +392,12 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->buffer = params->buffer;
 	prstate->page = BufferGetPage(params->buffer);
 
+	Assert(BufferIsValid(params->vmbuffer));
+	prstate->vmbuffer = params->vmbuffer;
+	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
+												   prstate->block,
+												   &prstate->vmbuffer);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -770,6 +814,106 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	return do_freeze;
 }
 
+/*
+ * Emit a warning about and fix visibility map corruption on the given page.
+ *
+ * The caller specifies the type of corruption it has already detected via
+ * corruption_type, so that we can emit the appropriate warning. All cases
+ * result in the VM bits being cleared; page-level corruption types also clear
+ * PD_ALL_VISIBLE.
+ *
+ * Must be called while holding an exclusive lock on the heap buffer. Dead
+ * items must have been discovered under that same lock. Although we do not
+ * hold a lock on the VM buffer, it is pinned, and the heap buffer is
+ * exclusively locked, ensuring that no other backend can update the VM bits
+ * corresponding to this heap page.
+ *
+ * This function makes changes to the VM and, potentially, the heap page, but
+ * it does not need to be done in a critical section.
+ */
+static void
+heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
+							VMCorruptionType corruption_type)
+{
+	const char *relname = RelationGetRelationName(prstate->relation);
+	bool		clear_vm = false;
+	bool		clear_heap = false;
+
+	Assert(BufferIsLockedByMeInMode(prstate->buffer, BUFFER_LOCK_EXCLUSIVE));
+
+	switch (corruption_type)
+	{
+		case VM_CORRUPT_LPDEAD:
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("dead line pointer found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+			clear_heap = true;
+			clear_vm = true;
+			break;
+
+		case VM_CORRUPT_TUPLE_VISIBILITY:
+
+			/*
+			 * A HEAPTUPLE_LIVE tuple on an all-visible page can appear not to
+			 * be visible to everyone if GetOldestNonRemovableTransactionId()
+			 * returns a conservative value older than the real safe xmin. That
+			 * is not corruption -- the PD_ALL_VISIBLE flag is still correct.
+			 *
+			 * However, dead tuple versions, in-progress inserts, and
+			 * in-progress deletes should never appear on a page marked
+			 * all-visible. That indicates real corruption. PD_ALL_VISIBLE
+			 * should have been cleared by the DML operation that deleted or
+			 * inserted the tuple.
+			 */
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("tuple not visible to all transactions found on page marked all-visible"),
+					 errcontext("relation \"%s\", page %u, tuple %u",
+								relname, prstate->block, offnum)));
+			clear_heap = true;
+			clear_vm = true;
+			break;
+
+		case VM_CORRUPT_MISSING_PAGE_HINT:
+
+			/*
+			 * As of PostgreSQL 9.2, the visibility map bit should never be
+			 * set if the page-level bit is clear. However, for vacuum, it's
+			 * possible that the bit got cleared after
+			 * heap_vac_scan_next_block() was called, so we must recheck now
+			 * that we have the buffer lock before concluding that the VM is
+			 * corrupt.
+			 */
+			Assert(!PageIsAllVisible(prstate->page));
+			Assert(prstate->old_vmbits & VISIBILITYMAP_VALID_BITS);
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("page is not marked all-visible but visibility map bit is set"),
+					 errcontext("relation \"%s\", page %u",
+								relname, prstate->block)));
+			clear_vm = true;
+			break;
+	}
+
+	Assert(clear_heap || clear_vm);
+
+	/* Avoid marking the buffer dirty if PD_ALL_VISIBLE is already clear */
+	if (clear_heap)
+	{
+		Assert(PageIsAllVisible(prstate->page));
+		PageClearAllVisible(prstate->page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (clear_vm)
+	{
+		visibilitymap_clear(prstate->relation, prstate->block, prstate->vmbuffer,
+							VISIBILITYMAP_VALID_BITS);
+		prstate->old_vmbits = 0;
+	}
+}
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
@@ -830,6 +974,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 					   new_relfrozen_xid, new_relmin_mxid,
 					   presult, &prstate);
 
+	/*
+	 * If the VM is set but PD_ALL_VISIBLE is clear, fix that corruption
+	 * before pruning and freezing so that the page and VM start out in a
+	 * consistent state.
+	 */
+	if ((prstate.old_vmbits & VISIBILITYMAP_VALID_BITS) &&
+		!PageIsAllVisible(prstate.page))
+		heap_page_fix_vm_corruption(&prstate, InvalidOffsetNumber,
+									VM_CORRUPT_MISSING_PAGE_HINT);
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
@@ -973,6 +1127,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	presult->set_all_visible = prstate.set_all_visible;
 	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
+	presult->old_vmbits = prstate.old_vmbits;
 
 	/*
 	 * For callers planning to update the visibility map, the conflict horizon
@@ -1295,7 +1450,8 @@ process_chain:
 
 /* Record lowest soon-prunable XID */
 static void
-heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
+heap_prune_record_prunable(PruneState *prstate, TransactionId xid,
+						   OffsetNumber offnum)
 {
 	/*
 	 * This should exactly match the PageSetPrunable macro.  We can't store
@@ -1305,6 +1461,14 @@ heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
 	if (!TransactionIdIsValid(prstate->new_prune_xid) ||
 		TransactionIdPrecedes(xid, prstate->new_prune_xid))
 		prstate->new_prune_xid = xid;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains
+	 * prunable items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum,
+									VM_CORRUPT_TUPLE_VISIBILITY);
 }
 
 /* Record line pointer to be redirected */
@@ -1388,6 +1552,15 @@ heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber offnum,
 		heap_prune_record_unused(prstate, offnum, was_normal);
 	else
 		heap_prune_record_dead(prstate, offnum, was_normal);
+
+	/*
+	 * It's incorrect for the page to be set all-visible if it contains dead
+	 * items. Fix that on the heap page and check the VM for corruption as
+	 * well. Do that here rather than in heap_prune_record_dead() so we also
+	 * cover tuples that are directly marked LP_UNUSED via mark_unused_now.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum, VM_CORRUPT_LPDEAD);
 }
 
 /* Record line pointer to be marked unused */
@@ -1527,7 +1700,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * that the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_INSERT_IN_PROGRESS:
@@ -1542,6 +1716,11 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
+			/* The page should not be marked all-visible */
+			if (PageIsAllVisible(page))
+				heap_page_fix_vm_corruption(prstate, offnum,
+											VM_CORRUPT_TUPLE_VISIBILITY);
+
 			/*
 			 * If we wanted to optimize for aborts, we might consider marking
 			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
@@ -1566,7 +1745,8 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * the page is reconsidered for pruning in future.
 			 */
 			heap_prune_record_prunable(prstate,
-									   HeapTupleHeaderGetUpdateXid(htup));
+									   HeapTupleHeaderGetUpdateXid(htup),
+									   offnum);
 			break;
 
 		default:
@@ -1632,6 +1812,13 @@ heap_prune_record_unchanged_lp_dead(PruneState *prstate, OffsetNumber offnum)
 
 	/* Record the dead offset for vacuum */
 	prstate->deadoffsets[prstate->lpdead_items++] = offnum;
+
+	/*
+	 * It's incorrect for a page to be marked all-visible if it contains dead
+	 * items.
+	 */
+	if (PageIsAllVisible(prstate->page))
+		heap_page_fix_vm_corruption(prstate, offnum, VM_CORRUPT_LPDEAD);
 }
 
 /*
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c57432670e7..56722556417 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -432,11 +432,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
 static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
 								   BlockNumber blkno, Page page,
 								   bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-										   BlockNumber heap_blk, Page heap_page,
-										   int nlpdead_items,
-										   Buffer vmbuffer,
-										   uint8 *vmbits);
 static int	lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 							BlockNumber blkno, Page page,
 							Buffer vmbuffer,
@@ -1989,81 +1984,6 @@ cmpOffsetNumbers(const void *a, const void *b)
 	return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
 }
 
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
-							   BlockNumber heap_blk, Page heap_page,
-							   int nlpdead_items,
-							   Buffer vmbuffer,
-							   uint8 *vmbits)
-{
-	Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
-	Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
-	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
-	 */
-	if (!PageIsAllVisible(heap_page) &&
-		((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-
-	/*
-	 * It's possible for the value returned by
-	 * GetOldestNonRemovableTransactionId() to move backwards, so it's not
-	 * wrong for us to see tuples that appear to not be visible to everyone
-	 * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
-	 * never moves backwards, but GetOldestNonRemovableTransactionId() is
-	 * conservative and sometimes returns a value that's unnecessarily small,
-	 * so if we see that contradiction it just means that the tuples that we
-	 * think are not visible to everyone yet actually are, and the
-	 * PD_ALL_VISIBLE flag is correct.
-	 *
-	 * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
-	 * however.
-	 */
-	else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
-	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
-						RelationGetRelationName(rel), heap_blk)));
-
-		PageClearAllVisible(heap_page);
-		MarkBufferDirty(heap_buffer);
-		visibilitymap_clear(rel, heap_blk, vmbuffer,
-							VISIBILITYMAP_VALID_BITS);
-		*vmbits = 0;
-	}
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2095,6 +2015,7 @@ lazy_scan_prune(LVRelState *vacrel,
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
 		.options = HEAP_PAGE_PRUNE_FREEZE,
 		.vistest = vacrel->vistest,
@@ -2204,18 +2125,12 @@ lazy_scan_prune(LVRelState *vacrel,
 	Assert(!presult.set_all_visible || !(*has_lpdead_items));
 	Assert(!presult.set_all_frozen || presult.set_all_visible);
 
-	old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
-	identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
-								   presult.lpdead_items, vmbuffer,
-								   &old_vmbits);
-
 	if (!presult.set_all_visible)
 		return presult.ndeleted;
 
 	/* Set the visibility map and page visibility hint */
+	old_vmbits = presult.old_vmbits;
 	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
 	if (presult.set_all_frozen)
 		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..00134012137 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -262,6 +262,12 @@ typedef struct PruneFreezeParams
 	Relation	relation;		/* relation containing buffer to be pruned */
 	Buffer		buffer;			/* buffer to be pruned */
 
+	/*
+	 * Callers should provide a pinned vmbuffer corresponding to the heap
+	 * block in buffer. We will check for and repair any corruption in the VM.
+	 */
+	Buffer		vmbuffer;
+
 	/*
 	 * The reason pruning was performed.  It is used to set the WAL record
 	 * opcode which is used for debugging and analysis purposes.
@@ -324,6 +330,12 @@ typedef struct PruneFreezeResult
 	bool		set_all_frozen;
 	TransactionId vm_conflict_horizon;
 
+	/*
+	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
+	 * cleared if VM corruption is found and corrected.
+	 */
+	uint8		old_vmbits;
+
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
 	 * 'true', even if the page contains LP_DEAD items.  VACUUM will remove
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0042c33fa66..0c07c945f05 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3277,6 +3277,7 @@ UserAuth
 UserContext
 UserMapping
 UserOpts
+VMCorruptionType
 VacAttrStats
 VacAttrStatsP
 VacDeadItemsInfo
-- 
2.43.0
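A toy model of the detect-and-repair flow that 0001 moves into pruning may help when reading heap_page_fix_vm_corruption(). It follows that function's header comment (every corruption type clears the VM bits; PD_ALL_VISIBLE is cleared and the buffer dirtied only when the hint is actually set) and the call sites (the hint/VM mismatch is fixed before scanning items; later items trigger a fix only while the page still looks all-visible). All names here (ToyPage, toy_fix_vm_corruption(), toy_scan_page()) are hypothetical, and the item states are heavily simplified.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values chosen to mirror VISIBILITYMAP_ALL_VISIBLE/ALL_FROZEN. */
#define TOY_VM_ALL_VISIBLE 0x01
#define TOY_VM_ALL_FROZEN  0x02
#define TOY_VM_VALID_BITS  (TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN)

typedef enum ToyCorruption
{
	TOY_MISSING_PAGE_HINT,		/* VM bits set, page hint clear */
	TOY_LPDEAD,					/* LP_DEAD item on an all-visible page */
	TOY_TUPLE_VISIBILITY,		/* dead/in-progress tuple on all-visible page */
} ToyCorruption;

/* Simplified item states seen while scanning a page. */
typedef enum ToyItem
{
	TOY_ITEM_LIVE,
	TOY_ITEM_RECENTLY_DEAD,
	TOY_ITEM_LP_DEAD,
} ToyItem;

typedef struct ToyPage
{
	bool		all_visible_hint;	/* stands in for PD_ALL_VISIBLE */
	uint8_t		vmbits;				/* this page's bits in the VM */
	int			warnings;			/* WARNINGs that would be emitted */
	int			dirty_marks;		/* times the heap buffer gets dirtied */
} ToyPage;

static void
toy_fix_vm_corruption(ToyPage *page, ToyCorruption kind)
{
	/* For a missing page hint, PD_ALL_VISIBLE is already clear. */
	bool		clear_heap = (kind != TOY_MISSING_PAGE_HINT);

	page->warnings++;
	if (clear_heap)
	{
		page->all_visible_hint = false;
		page->dirty_marks++;
	}
	page->vmbits = 0;			/* every kind clears all VM bits */
}

static void
toy_scan_page(ToyPage *page, const ToyItem *items, int nitems)
{
	/* Reconcile the VM with the page hint before examining items. */
	if ((page->vmbits & TOY_VM_VALID_BITS) && !page->all_visible_hint)
		toy_fix_vm_corruption(page, TOY_MISSING_PAGE_HINT);

	for (int i = 0; i < nitems; i++)
	{
		/* Once the hint is cleared, later bad items don't re-trigger. */
		if (items[i] == TOY_ITEM_LP_DEAD && page->all_visible_hint)
			toy_fix_vm_corruption(page, TOY_LPDEAD);
		else if (items[i] == TOY_ITEM_RECENTLY_DEAD && page->all_visible_hint)
			toy_fix_vm_corruption(page, TOY_TUPLE_VISIBILITY);
	}
}
```

Because the first repair clears the page hint, a page with several LP_DEAD items produces one warning and one dirtying of the buffer rather than one per item.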


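The 0001 commit message argues that pinning the VM page before on-access pruning is cheap because a single VM page covers many heap pages. A rough calculation, assuming the default 8 KB block size and a 24-byte page header: each VM page stores 2 bits per heap block in 8192 - 24 = 8168 bytes, so it covers 8168 * 4 = 32,672 heap blocks, roughly 255 MB of heap. The sketch below (hypothetical TOY_ names, modeled loosely on visibilitymap.c's block-mapping arithmetic) counts how many pins a sequential scan needs if it keeps the current VM page pinned and re-pins only when the map block changes.

```c
#include <stdint.h>

#define TOY_BLCKSZ             8192u	/* default PostgreSQL block size */
#define TOY_PAGE_HEADER_SIZE   24u		/* assumed MAXALIGN'd page header size */
#define TOY_BITS_PER_HEAPBLOCK 2u		/* all-visible + all-frozen */

#define TOY_MAPSIZE (TOY_BLCKSZ - TOY_PAGE_HEADER_SIZE)
#define TOY_HEAPBLOCKS_PER_PAGE (TOY_MAPSIZE * (8u / TOY_BITS_PER_HEAPBLOCK))

/* Which VM page holds the bits for a given heap block. */
static uint32_t
toy_heapblk_to_mapblock(uint32_t heapblk)
{
	return heapblk / TOY_HEAPBLOCKS_PER_PAGE;
}

/* Number of VM pins for a scan of heap blocks [0, nblocks) that keeps one
 * VM page pinned and re-pins only when the map block changes. */
static uint32_t
toy_count_vm_pins(uint32_t nblocks)
{
	uint32_t	pins = 0;
	uint32_t	pinned = UINT32_MAX;	/* nothing pinned yet */

	for (uint32_t blk = 0; blk < nblocks; blk++)
	{
		uint32_t	mapblk = toy_heapblk_to_mapblock(blk);

		if (mapblk != pinned)
		{
			pinned = mapblk;
			pins++;
		}
	}
	return pins;
}
```

Under these assumptions, scanning 100,000 heap blocks (about 780 MB) needs only 4 VM pins.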

  [text/x-patch] v42-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch (7.5K, 3-v42-0002-Add-pruning-fast-path-for-all-visible-and-all-fr.patch)
From 4bd5502d07562a0e6ce5cbf315833c5baa676028 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 25 Feb 2026 16:48:19 -0500
Subject: [PATCH v42 02/12] Add pruning fast path for all-visible and
 all-frozen pages

Because of the SKIP_PAGES_THRESHOLD optimization or a stale prune XID,
heap_page_prune_and_freeze() can be invoked for pages with no pruning or
freezing work. To avoid this, if a page is already all-frozen or it is
all-visible and no freezing will be attempted, exit early. We can't exit
early if vacuum passed DISABLE_PAGE_SKIPPING, though.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 97 +++++++++++++++++++++++++++-
 src/backend/access/heap/vacuumlazy.c | 10 +++
 src/include/access/heapam.h          |  1 +
 3 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index be3ae21f94c..22f2d9d9798 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,12 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/*
+	 * True if the page can bypass full page inspection during pruning and
+	 * freezing based on its visibility map status and the caller's options.
+	 */
+	bool		fast_path;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -201,6 +207,7 @@ static void prune_freeze_setup(PruneFreezeParams *params,
 							   PruneState *prstate);
 static void heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 										VMCorruptionType ctype);
+static void prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult);
 static void prune_freeze_plan(PruneState *prstate,
 							  OffsetNumber *off_loc);
 static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
@@ -329,7 +336,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * cannot safely determine that during on-access pruning with the
 			 * current implementation.
 			 */
-			params.options = 0;
+			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -398,6 +405,16 @@ prune_freeze_setup(PruneFreezeParams *params,
 												   prstate->block,
 												   &prstate->vmbuffer);
 
+	/*
+	 * If the page is already all-frozen, or already all-visible when freezing
+	 * is not being attempted, we can skip pruning and freezing entirely.
+	 * Callers must opt in by setting HEAP_PAGE_PRUNE_ALLOW_FAST_PATH.
+	 */
+	prstate->fast_path = ((prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN) ||
+						  ((prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE) &&
+						   !prstate->attempt_freeze)) &&
+		(params->options & HEAP_PAGE_PRUNE_ALLOW_FAST_PATH);
+
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -915,6 +932,73 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	}
 }
 
+/*
+ * If the page is already all-frozen, or already all-visible and freezing
+ * is not being attempted, there is no remaining work and we can bypass the
+ * expensive overhead of heap_page_prune_and_freeze().
+ *
+ * This can happen when the page has a stale prune hint, or if VACUUM is
+ * scanning an already all-frozen page due to SKIP_PAGES_THRESHOLD.
+ *
+ * The caller must already have examined the visibility map and saved the
+ * status for the page's VM bits in prstate->old_vmbits. Caller must hold a
+ * content lock on the heap page since it will examine line pointers.
+ *
+ * Before calling prune_freeze_bypass(), the caller should first
+ * check for and fix any discrepancy between the page-level visibility hint
+ * and the visibility map. Otherwise, the fast path will always prevent us
+ * from getting them in sync. Note that if there are tuples on the page that
+ * are not visible to all but the VM is incorrectly marked
+ * all-visible/all-frozen, we will not get the chance to fix that corruption
+ * when using the fast path.
+ */
+static void
+prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
+{
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(prstate->page);
+	Page		page = prstate->page;
+
+	Assert(prstate->old_vmbits & VISIBILITYMAP_ALL_FROZEN ||
+		   (prstate->old_vmbits & VISIBILITYMAP_ALL_VISIBLE &&
+			!prstate->attempt_freeze));
+
+	/* We'll fill in presult for the caller */
+	memset(presult, 0, sizeof(PruneFreezeResult));
+
+	presult->old_vmbits = prstate->old_vmbits;
+
+	/* Clear any stale prune hint */
+	if (TransactionIdIsValid(PageGetPruneXid(page)))
+	{
+		PageClearPrunable(page);
+		MarkBufferDirtyHint(prstate->buffer, true);
+	}
+
+	if (PageIsEmpty(page))
+		return;
+
+	/*
+	 * Since the page is all-visible, a count of the normal ItemIds on the
+	 * page should be sufficient for vacuum's live tuple count.
+	 */
+	for (OffsetNumber off = FirstOffsetNumber;
+		 off <= maxoff;
+		 off = OffsetNumberNext(off))
+	{
+		ItemId		lp = PageGetItemId(page, off);
+
+		if (!ItemIdIsUsed(lp))
+			continue;
+
+		presult->hastup = true;
+
+		if (ItemIdIsNormal(lp))
+			prstate->live_tuples++;
+	}
+
+	presult->live_tuples = prstate->live_tuples;
+}
+
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
  * specified page.
@@ -984,6 +1068,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		heap_page_fix_vm_corruption(&prstate, InvalidOffsetNumber,
 									VM_CORRUPT_MISSING_PAGE_HINT);
 
+	/*
+	 * If the visibility map status allows it, bypass pruning and freezing
+	 * entirely. This must be done after fixing any discrepancy between the
+	 * page-level visibility hint and the VM.
+	 */
+	if (prstate.fast_path)
+	{
+		prune_freeze_bypass(&prstate, presult);
+		return;
+	}
+
 	/*
 	 * Examine all line pointers and tuple visibility information to determine
 	 * which line pointers should change state and which tuples may be frozen.
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 56722556417..1a446050d85 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2044,6 +2044,16 @@ lazy_scan_prune(LVRelState *vacrel,
 	if (vacrel->nindexes == 0)
 		params.options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
 
+	/*
+	 * Allow skipping full inspection of pages that the VM indicates are
+	 * already all-frozen (which may be scanned due to SKIP_PAGES_THRESHOLD).
+	 * However, if DISABLE_PAGE_SKIPPING was specified, we can't trust the VM,
+	 * so we must examine the page to make sure it is truly all-frozen and fix
+	 * it otherwise.
+	 */
+	if (vacrel->skipwithvm)
+		params.options |= HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+
 	heap_page_prune_and_freeze(&params,
 							   &presult,
 							   &vacrel->offnum,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 00134012137..305ecc31a9e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
 /* "options" flag bits for heap_page_prune_and_freeze */
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
+#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
-- 
2.43.0
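The fast-path condition in 0002 combines the VM bits, whether freezing will be attempted, and the caller's opt-in flag. Below is a standalone sketch of that decision and of the line-pointer counting done by prune_freeze_bypass(). The TOY_ names are hypothetical; the flag value mirrors HEAP_PAGE_PRUNE_ALLOW_FAST_PATH only for readability.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_VM_ALL_VISIBLE  0x01
#define TOY_VM_ALL_FROZEN   0x02
#define TOY_ALLOW_FAST_PATH (1 << 2)	/* caller opt-in flag */

typedef enum ToyLp
{
	TOY_LP_UNUSED,
	TOY_LP_NORMAL,
	TOY_LP_REDIRECT,
	TOY_LP_DEAD,
} ToyLp;

/* Bypass if all-frozen, or all-visible with no freezing attempted, and only
 * when the caller opted in. */
static bool
toy_can_bypass(uint8_t old_vmbits, bool attempt_freeze, int options)
{
	bool		vm_allows = (old_vmbits & TOY_VM_ALL_FROZEN) ||
		((old_vmbits & TOY_VM_ALL_VISIBLE) && !attempt_freeze);

	return vm_allows && (options & TOY_ALLOW_FAST_PATH) != 0;
}

typedef struct ToyBypassResult
{
	bool		hastup;			/* any used line pointer */
	int			live_tuples;	/* normal line pointers */
} ToyBypassResult;

/* On a page trusted to be all-visible, every normal item is a live tuple. */
static ToyBypassResult
toy_bypass_count(const ToyLp *lps, int nlps)
{
	ToyBypassResult r = {false, 0};

	for (int i = 0; i < nlps; i++)
	{
		if (lps[i] == TOY_LP_UNUSED)
			continue;
		r.hastup = true;
		if (lps[i] == TOY_LP_NORMAL)
			r.live_tuples++;
	}
	return r;
}
```

Since lazy_scan_prune() passes HEAP_PAGE_PRUNE_FREEZE, under this logic only all-frozen pages take the fast path during vacuum, whereas on-access pruning (which does not freeze) can also bypass all-visible pages.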



  [text/x-patch] v42-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (18.2K, 4-v42-0003-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From e0d1a4724fcef6826bdd86c7fc2d068641624b5f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v42 03/12] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility to
determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen
at the start of vacuum (OldestXmin). We now use GlobalVisState, which
also enables future work to set the VM during on-access pruning, since
ordinary queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare cases where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page; if it is
globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid, which is required to set the visibility map
on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 30 ++++++++++-
 src/backend/access/heap/pruneheap.c         | 53 +++++++++---------
 src/backend/access/heap/vacuumlazy.c        | 60 ++++++++++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 19 ++++---
 src/include/access/heapam.h                 |  2 +
 src/include/utils/snapmgr.h                 |  4 +-
 7 files changed, 115 insertions(+), 55 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..c678f5a3c8f 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,31 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * If allow_update is false, the GlobalVisState boundaries will not be updated.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidConsideredRunning(GlobalVisState *state, TransactionId xid,
+								  bool allow_update)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid, allow_update);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
@@ -1354,7 +1379,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after, true))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1420,7 +1445,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
 	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+									   HeapTupleHeaderGetRawXmax(tuple),
+									   true);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 22f2d9d9798..f9db97a6edf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -166,10 +166,13 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * is used after processing all tuples to determine if the page can be
+	 * considered all-visible (if the newest xmin is still considered running
+	 * by some snapshot, it cannot be). It is also used by the caller as the
+	 * conflict horizon when setting the VM bits, unless we froze all tuples
+	 * on the page (in which case the conflict xid was already included in the
+	 * WAL record).
 	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
@@ -285,7 +288,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid, true))
 		return;
 
 	/*
@@ -1087,6 +1090,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(prstate.vistest,
+										  prstate.visibility_cutoff_xid,
+										  true))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1289,7 +1304,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup)
 	 * if the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after, true))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1755,29 +1770,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1a446050d85..797973d7bd0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -468,13 +468,14 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
+										   bool allow_update_vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2089,7 +2090,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
@@ -2852,7 +2853,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3614,14 +3615,19 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
-
+	/*
+	 * Pass allow_update_vistest as false so that the GlobalVisState
+	 * boundaries used here match those used by the pruning code we are
+	 * cross-checking. Allowing an update could move the boundaries between
+	 * the two calls, causing a spurious assertion failure.
+	 */
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3642,7 +3648,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility. If allow_update_vistest is true,
+ * the boundaries of the GlobalVisState may be updated when checking the
+ * visibility of the newest live XID on the page.
  *
  * Output parameters:
  *
@@ -3661,7 +3669,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
+							   bool allow_update_vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3742,7 +3751,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3751,16 +3760,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3789,6 +3799,20 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+										  allow_update_vistest))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 6b7117b56b2..c461f8dc02d 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisTestIsRemovableXid(vistest, dt->xid, true)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0f913897acc..27e5adeebfb 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4223,11 +4223,17 @@ GlobalVisUpdate(void)
  * The state passed needs to have been initialized for the relation fxid is
  * from (NULL is also OK), otherwise the result may not be correct.
  *
+ * If allow_update is false, the GlobalVisState boundaries will not be updated
+ * even if it would otherwise be beneficial. This is useful for callers that
+ * do not want GlobalVisState to advance at all, for example because they need
+ * a conservative answer based on the current boundaries.
+ *
  * See comment for GlobalVisState for details.
  */
 bool
 GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+								FullTransactionId fxid,
+								bool allow_update)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4248,7 +4254,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 	 * might not exist a snapshot considering fxid running. If it makes sense,
 	 * update boundaries and recheck.
 	 */
-	if (GlobalVisTestShouldUpdate(state))
+	if (allow_update && GlobalVisTestShouldUpdate(state))
 	{
 		GlobalVisUpdate();
 
@@ -4268,7 +4274,8 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid,
+							bool allow_update)
 {
 	FullTransactionId fxid;
 
@@ -4282,7 +4289,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, allow_update);
 }
 
 /*
@@ -4296,7 +4303,7 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, true);
 }
 
 /*
@@ -4310,7 +4317,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisTestIsRemovableXid(state, xid, true);
 }
 
 /*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 305ecc31a9e..8815acccafb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -480,6 +480,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidConsideredRunning(GlobalVisState *state, TransactionId xid, bool allow_update);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 8c919d2640e..db903709c49 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -115,8 +115,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid, bool allow_update);
+extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid, bool allow_update);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0
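A minimal sketch of the allow_update semantics that 0003 adds to GlobalVisTestIsRemovableFullXid(), assuming 64-bit XIDs with no wraparound. SketchVisState, sketch_update() and the fixed sketch_current_horizon are hypothetical stand-ins for GlobalVisState, GlobalVisUpdate() and the horizon recomputation. The GlobalVisTestShouldUpdate() heuristic is not modeled: every permitted update is taken. XIDs below maybe_needed are removable, XIDs at or above definitely_needed are not, and anything in between gets the conservative answer ("still running") unless an update is allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for FullTransactionId: 64-bit, no wraparound. */
typedef uint64_t SketchFullXid;

/* Hypothetical stand-in for the two GlobalVisState boundaries. */
typedef struct SketchVisState
{
	SketchFullXid maybe_needed;		/* XIDs below this are removable */
	SketchFullXid definitely_needed;	/* XIDs at or above this are not */
} SketchVisState;

/* Horizon a fresh recomputation would yield; fixed for the sketch. */
static SketchFullXid sketch_current_horizon;
static int	sketch_update_calls;

/* Simplified stand-in for GlobalVisUpdate(): move the boundaries. */
static void
sketch_update(SketchVisState *state)
{
	sketch_update_calls++;
	state->maybe_needed = sketch_current_horizon;
	if (state->maybe_needed > state->definitely_needed)
		state->definitely_needed = state->maybe_needed;
}

static bool
sketch_is_removable(SketchVisState *state, SketchFullXid xid,
					bool allow_update)
{
	if (xid < state->maybe_needed)
		return true;			/* definitely removable */
	if (xid >= state->definitely_needed)
		return false;			/* definitely still needed */

	/* In between, the boundaries may be stale. */
	if (allow_update)
	{
		sketch_update(state);
		assert(xid < state->definitely_needed);
		return xid < state->maybe_needed;
	}
	return false;				/* conservative: treat as still needed */
}

/* Same shape as GlobalVisTestXidConsideredRunning(): a negated wrapper. */
static bool
sketch_considered_running(SketchVisState *state, SketchFullXid xid,
						  bool allow_update)
{
	return !sketch_is_removable(state, xid, allow_update);
}
```

With allow_update false, the boundaries, and therefore the answer, cannot move between two calls. That is why the assertion-only heap_page_is_all_visible() cross-check passes false.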


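Patch 0003 replaces the per-tuple test of each committed xmin against OldestXmin with a single test of the newest live xmin, run after the scan. The sketch below compares the two approaches. It assumes a fixed horizon in place of GlobalVisState and uses made-up tuple data. Under a fixed horizon, "still running" is monotonic in XID order: any XID newer than a running one is also running. Checking only the newest xmin therefore gives the same answer. SketchTuple, old_all_visible() and new_all_visible() are hypothetical. xid_precedes() and xid_follows() follow the modulo-2^32 rules of TransactionIdPrecedes() and TransactionIdFollows().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;				/* stand-in for TransactionId */
#define INVALID_XID			((Xid) 0)
#define FIRST_NORMAL_XID	((Xid) 3)

/* Hypothetical tuple: only what the visibility test looks at. */
typedef struct SketchTuple
{
	Xid			xmin;
	bool		xmin_committed;	/* stands in for the committed hint bit */
} SketchTuple;

static bool
xid_is_normal(Xid x)
{
	return x >= FIRST_NORMAL_XID;
}

/* Modulo-2^32 comparisons in the style of TransactionIdPrecedes/Follows. */
static bool
xid_precedes(Xid a, Xid b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a < b;
	return (int32_t) (a - b) < 0;
}

static bool
xid_follows(Xid a, Xid b)
{
	if (!xid_is_normal(a) || !xid_is_normal(b))
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Old approach: every committed xmin must precede the horizon. */
static bool
old_all_visible(const SketchTuple *tuples, int n, Xid horizon)
{
	for (int i = 0; i < n; i++)
	{
		if (!tuples[i].xmin_committed ||
			!xid_precedes(tuples[i].xmin, horizon))
			return false;
	}
	return true;
}

/* New approach: track the newest xmin, test it once after the scan. */
static bool
new_all_visible(const SketchTuple *tuples, int n, Xid horizon,
				Xid *newest_live_xid)
{
	*newest_live_xid = INVALID_XID;
	for (int i = 0; i < n; i++)
	{
		Xid			xmin = tuples[i].xmin;

		if (!tuples[i].xmin_committed)
			return false;
		if (xid_follows(xmin, *newest_live_xid) && xid_is_normal(xmin))
			*newest_live_xid = xmin;
	}
	/* "Considered running" here means "does not precede the horizon". */
	return !(xid_is_normal(*newest_live_xid) &&
			 !xid_precedes(*newest_live_xid, horizon));
}
```

Non-normal xmins such as FrozenTransactionId are never tracked. This matches the TransactionIdIsNormal() guard in the patch.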

  [text/x-patch] v42-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (15.5K, 5-v42-0004-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From df7b68460f9b6166e7179640161c3452c712d2a5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v42 04/12] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive, makes the snapshot conflict horizon
calculation clearer, and guarantees that it never holds a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 137 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++---
 2 files changed, 72 insertions(+), 95 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f9db97a6edf..fd5ff4e4e0a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*
 	 * True if the page can bypass full page inspection during pruning and
 	 * freezing based on its visibility map status and the caller's options.
@@ -166,14 +169,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
-	 * is used after processing all tuples to determine if the page can be
-	 * considered all-visible (if the newest xmin is still considered running
-	 * by some snapshot, it cannot be). It is also used by the caller as the
-	 * conflict horizon when setting the VM bits, unless we froze all tuples
-	 * on the page (in which case the conflict xid was already included in the
-	 * WAL record).
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -183,7 +178,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 
@@ -471,53 +465,42 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. We must initialize set_all_frozen based on whether or not the
+	 * caller passed HEAP_PAGE_PRUNE_FREEZE, because if they did not, we won't
+	 * call heap_prepare_freeze_tuple() for each tuple, and set_all_frozen
+	 * will never be cleared for tuples that need freezing.
+	 *
+	 * When freezing is not required (no XIDs/MXIDs older than the freeze
+	 * cutoff), we may still choose to "opportunistically" freeze if doing so
+	 * would make the page all-frozen.
+	 *
+	 * We will not be able to freeze the whole page at the end of vacuum if
+	 * there are tuples present that are not visible to everyone or if there
+	 * are dead tuples which will not be removable. However, dead tuples that
+	 * will be removed by the end of vacuum should not prevent this
+	 * opportunistic freezing.
+	 *
+	 * Therefore, we do not clear set_all_visible and set_all_frozen when we
+	 * encounter LP_DEAD items. Instead, we correct them after deciding
+	 * whether to freeze, but before updating the VM, to avoid setting the VM
+	 * bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -747,7 +730,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -1023,9 +1005,8 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. presult->set_all_frozen is always set to false when the
+ * HEAP_PAGE_PRUNE_FREEZE option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1096,9 +1077,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidConsideredRunning(prstate.vistest,
-										  prstate.visibility_cutoff_xid,
+										  prstate.newest_live_xid,
 										  true))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
@@ -1250,7 +1231,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1711,6 +1692,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1758,32 +1740,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 797973d7bd0..696919e35dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -470,7 +470,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -479,7 +479,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2829,7 +2829,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2855,14 +2855,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2903,7 +2903,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3617,7 +3617,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 	/*
@@ -3630,7 +3630,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3655,7 +3655,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3674,7 +3674,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3684,7 +3684,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3773,9 +3773,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3805,8 +3805,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *newest_live_xid,
 										  allow_update_vistest))
 	{
 		all_visible = false;
-- 
2.43.0



  [text/x-patch] v42-0005-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch (23.1K, 6-v42-0005-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch)
From 33e0c4824f7405ac5711bacda7c21af28a3ac0be Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v42 05/12] WAL log VM setting during vacuum phase I in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

For now, this change applies only to vacuum phase I, not to pruning
performed during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 245 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 113 ++----------
 src/include/access/heapam.h          |  37 ++--
 3 files changed, 205 insertions(+), 190 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fd5ff4e4e0a..04a0580e313 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
 	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -163,21 +182,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 
@@ -232,6 +236,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 
 /*
@@ -398,6 +403,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
+	prstate->new_vmbits = 0;
 	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
 												   prstate->block,
 												   &prstate->vmbuffer);
@@ -917,6 +923,42 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	}
 }
 
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * set the VM for now.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * If the page is already all-frozen, or already all-visible and freezing
  * is not being attempted, there is no remaining work and we can bypass the
@@ -950,8 +992,6 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->old_vmbits = prstate->old_vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -986,7 +1026,8 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -1001,12 +1042,10 @@ prune_freeze_bypass(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1034,8 +1073,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1128,6 +1169,31 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	conflict_xid = InvalidTransactionId;
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1149,14 +1215,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1170,6 +1239,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1177,29 +1267,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1209,33 +1282,71 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible(). It's also a
+	 * valuable cross-check of the page state after pruning and freezing.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.newest_live_xid);
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->old_vmbits = prstate.old_vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->newly_all_visible = false;
+	presult->newly_all_frozen = false;
+	presult->newly_all_visible_frozen = false;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->newly_all_visible = true;
+			if (prstate.set_all_frozen)
+				presult->newly_all_visible_frozen = true;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->newly_all_frozen = true;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 696919e35dd..23deabd8c01 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -466,13 +466,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   bool allow_update_vistest,
@@ -2022,8 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2074,32 +2065,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2120,6 +2085,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	if (presult.newly_all_visible)
+		vacrel->new_all_visible_pages++;
+	if (presult.newly_all_visible_frozen)
+		vacrel->new_all_visible_all_frozen_pages++;
+	if (presult.newly_all_frozen)
+		vacrel->new_all_frozen_pages++;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.newly_all_visible_frozen ||
+		presult.newly_all_frozen;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2133,71 +2109,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.old_vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3613,7 +3524,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8815acccafb..e123dda090f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -265,7 +265,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -281,8 +282,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -316,26 +316,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
-	 * cleared if VM corruption is found and corrected.
+	 * Whether the page was newly set all-visible, all-visible and all-frozen,
+	 * or (if already all-visible) all-frozen in the VM during vacuum phase I.
 	 */
-	uint8		old_vmbits;
+	bool		newly_all_visible;
+	bool		newly_all_visible_frozen;
+	bool		newly_all_frozen;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -472,6 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *newest_live_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0



  [text/x-patch] v42-0006-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch (5.6K, 7-v42-0006-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch)
From 45ceda895d959ec957b8dd99155abbd221c9d52d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v42 06/12] WAL log VM setting for empty pages in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

This has no independent benefit, but empty pages were the last user of
XLOG_HEAP2_VISIBLE, so this change allows all of the XLOG_HEAP2_VISIBLE
code to be removed.
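
Routing empty pages through log_heap_prune_and_freeze() means that
function must also handle a heap page that has never been WAL-logged.
Below is a hedged sketch of the resulting heap-buffer image decision, modeled
on this patch's logic. The flag constants are hypothetical substitutes for
REGBUF_FORCE_IMAGE and REGBUF_NO_IMAGE, and an LSN of 0 stands in for
InvalidXLogRecPtr:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real REGBUF_* constants differ. */
#define SKETCH_FORCE_IMAGE 0x01
#define SKETCH_NO_IMAGE    0x02

/*
 * Decide how the heap buffer is registered.  A page whose LSN is invalid
 * (never WAL-logged) always gets a full-page image.  Otherwise the image can
 * be skipped only when the sole heap change is PD_ALL_VISIBLE and hint-bit
 * changes need no WAL (checksums and wal_log_hints both off).
 */
static uint8_t
sketch_heap_image_flags(uint64_t page_lsn, bool do_prune, int nfrozen,
						bool do_set_vm, bool hint_bits_need_wal)
{
	if (page_lsn == 0)
		return SKETCH_FORCE_IMAGE;
	if (!do_prune && nfrozen == 0 && (!do_set_vm || !hint_bits_need_wal))
		return SKETCH_NO_IMAGE;
	return 0;
}

/* The page LSN may be advanced only if the FPI was not explicitly skipped. */
static bool
sketch_may_stamp_page_lsn(uint8_t flags)
{
	return (flags & SKETCH_NO_IMAGE) == 0;
}
```

Tying the LSN stamp to the same flag keeps the two decisions from drifting
apart. A page whose image was skipped is never stamped with the record's LSN.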

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 29 +++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 44 +++++++++++++++++-----------
 2 files changed, 45 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 04a0580e313..48d5d9fb906 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -2545,6 +2545,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	uint8		info;
 	uint8		regbuf_flags_heap;
 
+	Page		heap_page = BufferGetPage(buffer);
+
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
 	xlhp_freeze_plans freeze_plans;
@@ -2563,14 +2565,18 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	/*
 	 * We can avoid an FPI of the heap page if the only modification we are
 	 * making to it is to set PD_ALL_VISIBLE and checksums/wal_log_hints are
-	 * disabled. Note that if we explicitly skip an FPI, we must not stamp the
-	 * heap page with this record's LSN. Recovery skips records <= the stamped
-	 * LSN, so this could lead to skipping an earlier FPI needed to repair a
-	 * torn page.
+	 * disabled.
+	 *
+	 * However, if the page has never been WAL-logged (LSN is invalid), we
+	 * must force an FPI regardless.  This can happen when another backend
+	 * extends the heap, initializes the page, and then fails before WAL-
+	 * logging it.  Since heap extension is not WAL-logged, recovery might try
+	 * to replay our record and find that the page isn't initialized, which
+	 * would cause a PANIC.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+	if (!XLogRecPtrIsValid(PageGetLSN(heap_page)))
+		regbuf_flags_heap |= REGBUF_FORCE_IMAGE;
+	else if (!do_prune && nfrozen == 0 && (!do_set_vm || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2685,12 +2691,13 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 
 	/*
-	 * See comment at the top of the function about regbuf_flags_heap for
-	 * details on when we can advance the page LSN.
+	 * If we explicitly skip an FPI, we must not stamp the heap page with this
+	 * record's LSN. Recovery skips records <= the stamped LSN, so this could
+	 * lead to skipping an earlier FPI needed to repair a torn page.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (!(regbuf_flags_heap & REGBUF_NO_IMAGE))
 	{
 		Assert(BufferIsDirty(buffer));
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		PageSetLSN(heap_page, recptr);
 	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23deabd8c01..63e6199241c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1929,33 +1929,43 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
 			/* mark buffer dirty before writing a WAL record */
 			MarkBufferDirty(buf);
 
+			PageSetAllVisible(page);
+			PageClearPrunable(page);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
 			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
 			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				!XLogRecPtrIsValid(PageGetLSN(page)))
-				log_newpage_buffer(buf, true);
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
 
-			PageSetAllVisible(page);
-			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0



  [text/x-patch] v42-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (27.5K, 8-v42-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 7061229f052018ecda5ccc31445509ccd5bbef2f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v42 07/12] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   5 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 150 +++++-----------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |   9 +-
 src/include/access/heapam_xlog.h         |  21 +--
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 63 insertions(+), 386 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d64c403f2f0 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -55,9 +55,8 @@ mask_page_hint_bits(Page page)
 	PageClearHasFreeLinePointers(page);
 
 	/*
-	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * XXX: We should consider not masking PD_ALL_VISIBLE during WAL
+	 * consistency checking.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 48d5d9fb906..ea1afa5c58a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1256,8 +1256,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 63e6199241c..f698c2d899b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1939,11 +1939,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2821,9 +2821,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..21e89c38f0a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -35,21 +34,31 @@
  * is set, we know the condition is true, but if a bit is not set, it might or
  * might not be true.
  *
- * Clearing visibility map bits is not separately WAL-logged.  The callers
- * must make sure that whenever a bit is cleared, the bit is cleared on WAL
- * replay of the updating operation as well.
- *
- * When we *set* a visibility map during VACUUM, we must write WAL.  This may
- * seem counterintuitive, since the bit is basically a hint: if it is clear,
- * it may still be the case that every tuple on the page is visible to all
- * transactions; we just don't know that for certain.  The difficulty is that
- * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
- * on the page itself, and the visibility map bit.  If a crash occurs after the
- * visibility map page makes it to disk and before the updated heap page makes
- * it to disk, redo must set the bit on the heap page.  Otherwise, the next
- * insert, update, or delete on the heap page will fail to realize that the
- * visibility map bit must be cleared, possibly causing index-only scans to
- * return wrong answers.
+ * Changes to the visibility map bits are not separately WAL-logged. Callers
+ * must make sure that whenever a visibility map bit is cleared, the bit is
+ * cleared on WAL replay of the updating operation. And whenever a visibility
+ * map bit is set, the bit is set on WAL replay of the operation that rendered
+ * the page all-visible/all-frozen.
+ *
+ * The visibility map bits operate as a hint in one direction: if they are
+ * clear, it may still be the case that every tuple on the page is visible to
+ * all transactions (we just don't know that for certain). However, if they
+ * are set, we may skip vacuuming pages and incorrectly advance relfrozenxid
+ * or skip reading heap pages for an index-only scan and return wrong results.
+ *
+ * Additionally, it is critical that the heap-page level PD_ALL_VISIBLE bit be
+ * correctly set and cleared along with the VM bits.
+ *
+ * When clearing the VM, if a crash occurs after the heap page makes it to
+ * disk but before the VM page makes it to disk, replay must clear the VM or
+ * the next index-only scan can return wrong results or vacuum may incorrectly
+ * advance relfrozenxid.
+ *
+ * When setting the VM, if a crash occurs after the visibility map page makes
+ * it to disk and before the updated heap page makes it to disk, redo must set
+ * the bit on the heap page. Otherwise, the next insert, update, or delete on
+ * the heap page will fail to realize that the visibility map bit must be
+ * cleared, possibly causing index-only scans to return wrong answers.
  *
  * VACUUM will normally skip pages for which the visibility map bit is set;
  * such pages can't contain any dead tuples and therefore don't need vacuuming.
@@ -222,112 +231,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +251,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..de9092fdf5b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -476,10 +476,11 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	/*
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
-	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
+	 * This can happen whenever the changes in the WAL record do not affect
+	 * visibility on a standby. For example: a record that only freezes an
+	 * xmax from a locker.
+	 *
+	 * It's also quite common with records generated during index deletion
 	 * (original execution of the deletion can reason that a recovery conflict
 	 * which is sufficient for the deletion operation must take place before
 	 * replay of the deletion record itself).
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..516806fcca2 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,7 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
+/* 0x40 was XLOG_HEAP2_VISIBLE */
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +443,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +486,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c07c945f05..adc858c2a97 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4421,7 +4421,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
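The rewritten visibilitymap.c header comment depends on the VM bit and PD_ALL_VISIBLE being set by replay of the same WAL record, now that XLOG_HEAP2_VISIBLE is gone. The toy model below is not PostgreSQL code: ToyPage and the toy_* functions are hypothetical. It assumes that the record itself is durable before either page is written (WAL-before-data), and that redo skips any page whose LSN already covers the record. Under those assumptions it shows why "VM bit set implies PD_ALL_VISIBLE set" holds for every combination of pages that reached disk before a crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model, not PostgreSQL code: ToyPage and the toy_* functions are
 * hypothetical.  Each page carries the LSN of the last change applied to it.
 */
typedef struct ToyPage
{
	uint64_t	lsn;
	bool		flag;		/* PD_ALL_VISIBLE (heap page) or VM bit (VM page) */
} ToyPage;

/* Redo interlock: apply the change only if the page predates the record. */
static void
toy_redo_page(ToyPage *page, uint64_t record_lsn)
{
	if (page->lsn < record_lsn)
	{
		page->flag = true;
		page->lsn = record_lsn;
	}
}

/* One combined record covers both pages, so replay visits both of them. */
static void
toy_replay_combined(ToyPage *heap, ToyPage *vm, uint64_t record_lsn)
{
	toy_redo_page(heap, record_lsn);
	toy_redo_page(vm, record_lsn);
}

/* The invariant: VM bit set implies PD_ALL_VISIBLE set. */
static bool
toy_invariant_holds(ToyPage heap, ToyPage vm)
{
	return !vm.flag || heap.flag;
}

/*
 * Crash after the record is durable in WAL, with only the pages selected by
 * heap_flushed / vm_flushed having reached disk, then replay.  Returns true
 * if both bits end up set (the invariant is asserted along the way).
 */
static bool
toy_crash_and_replay(bool heap_flushed, bool vm_flushed)
{
	const uint64_t record_lsn = 100;
	ToyPage		heap = {.lsn = 50, .flag = false};
	ToyPage		vm = {.lsn = 50, .flag = false};

	if (heap_flushed)
		heap = (ToyPage) {.lsn = record_lsn, .flag = true};
	if (vm_flushed)
		vm = (ToyPage) {.lsn = record_lsn, .flag = true};

	toy_replay_combined(&heap, &vm, record_lsn);
	assert(toy_invariant_holds(heap, vm));
	return heap.flag && vm.flag;
}
```

The case the header comment singles out is toy_crash_and_replay(false, true): the VM page is already current, but the heap page's LSN predates the record, so replay still sets PD_ALL_VISIBLE.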


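After the rename, visibilitymap_set() (formerly visibilitymap_set_vmbits()) only ORs flag bits into an already pinned and locked VM page. The standalone sketch below restates the addressing that operation relies on. It assumes the default BLCKSZ of 8192 and a MAXALIGNed page header of 24 bytes. The sketch-local names are hypothetical, but the arithmetic is intended to mirror the HEAPBLK_TO_* macros in visibilitymap.c: two bits per heap block, so four heap blocks share one map byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values: default BLCKSZ and MAXALIGN(SizeOfPageHeaderData) == 24. */
#define SKETCH_BLCKSZ			8192
#define SKETCH_PAGE_HEADER		24
#define BITS_PER_HEAPBLOCK		2	/* all-visible bit + all-frozen bit */
#define HEAPBLOCKS_PER_BYTE		(8 / BITS_PER_HEAPBLOCK)
#define MAPSIZE					(SKETCH_BLCKSZ - SKETCH_PAGE_HEADER)
#define HEAPBLOCKS_PER_PAGE		(MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define VM_ALL_VISIBLE	0x01
#define VM_ALL_FROZEN	0x02
#define VM_VALID_BITS	0x03

static uint32_t
heapblk_to_mapblock(uint32_t blk)
{
	return blk / HEAPBLOCKS_PER_PAGE;
}

static uint32_t
heapblk_to_mapbyte(uint32_t blk)
{
	return (blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
}

static uint32_t
heapblk_to_offset(uint32_t blk)
{
	return (blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK;
}

/* OR flags for one heap block into the contents of a single VM page. */
static void
vm_set_bits(uint8_t *map, uint32_t blk, uint8_t flags)
{
	assert((flags & VM_VALID_BITS) == flags);
	assert(flags != VM_ALL_FROZEN);	/* all-frozen requires all-visible */
	map[heapblk_to_mapbyte(blk)] |= (uint8_t) (flags << heapblk_to_offset(blk));
}

static uint8_t
vm_get_bits(const uint8_t *map, uint32_t blk)
{
	return (map[heapblk_to_mapbyte(blk)] >> heapblk_to_offset(blk)) & VM_VALID_BITS;
}

/* Set flags for set_blk on a zeroed VM page, then read back get_blk. */
static uint8_t
vm_roundtrip(uint32_t set_blk, uint8_t flags, uint32_t get_blk)
{
	static uint8_t map[MAPSIZE];

	memset(map, 0, sizeof(map));
	vm_set_bits(map, set_blk, flags);
	return vm_get_bits(map, get_blk);
}
```

With these assumed sizes, one VM page covers 8168 * 4 = 32672 heap blocks. Heap block 5 lands in map byte 1 at bit offset 2, so setting it leaves heap block 4, which shares that byte, untouched.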

  [text/x-patch] v42-0008-Track-which-relations-are-modified-by-a-query.patch (8.7K, 9-v42-0008-Track-which-relations-are-modified-by-a-query.patch)
From 40415fe2723303786248a1a5d53389c48216d6da Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v42 08/12] Track which relations are modified by a query

Save the relids of modified relations in a bitmap in the PlannedStmt.
A later commit will pass this information down to scan nodes to control
whether or not on-access pruning is allowed to set the visibility map.
Setting the visibility map during a scan is counterproductive if the
query is going to modify the page immediately after.
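This patch does not show how the later commit consumes the bitmap. As a rough, hypothetical sketch of the intended check, a scan could decline to set the VM whenever its range-table index is a member of modifiedRelids. RelidSet, relid_set_add(), relid_set_is_member() and scan_may_set_vm() are invented for illustration, and a 64-bit word stands in for Bitmapset, so RT indexes are limited to 1..63.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset of RT indexes (1..63 only). */
typedef uint64_t RelidSet;

static RelidSet
relid_set_add(RelidSet set, int rti)
{
	assert(rti > 0 && rti < 64);
	return set | ((RelidSet) 1 << rti);
}

static bool
relid_set_is_member(RelidSet set, int rti)
{
	assert(rti > 0 && rti < 64);
	return (set & ((RelidSet) 1 << rti)) != 0;
}

/*
 * Hypothetical policy: on-access pruning may set the VM for a scanned
 * relation only if the query will not modify it, since an immediate
 * modification would clear the bits again.
 */
static bool
scan_may_set_vm(RelidSet modified_relids, int scan_rti)
{
	return !relid_set_is_member(modified_relids, scan_rti);
}
```

For example, in an UPDATE of RT index 2 that also reads RT index 1, only the scan of relation 1 would be allowed to set the VM.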

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.
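The planner hunk applies this rule by unioning glob->resultRelations with the rti of every entry in glob->finalrowmarks. The sketch below shows that union. It uses a 64-bit word in place of Bitmapset and a simplified, hypothetical RowMark type whose kinds are illustrative only. The row-mark kind is deliberately ignored, because the set is only a hint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t RelidSet;		/* stand-in for Bitmapset; RT indexes 1..63 */

/* Hypothetical, simplified row mark; the real PlanRowMark has more fields. */
typedef enum
{
	MARK_FOR_UPDATE,
	MARK_FOR_SHARE,
	MARK_REFERENCE				/* a mark that does not lock or modify tuples */
} MarkKind;

typedef struct RowMark
{
	int			rti;
	MarkKind	kind;
} RowMark;

static RelidSet
compute_modified_relids(const int *result_rels, size_t nresult,
						const RowMark *marks, size_t nmarks)
{
	RelidSet	set = 0;

	/* every INSERT/UPDATE/DELETE/MERGE target */
	for (size_t i = 0; i < nresult; i++)
	{
		assert(result_rels[i] > 0 && result_rels[i] < 64);
		set |= (RelidSet) 1 << result_rels[i];
	}
	/* every row mark, whatever its kind */
	for (size_t i = 0; i < nmarks; i++)
	{
		assert(marks[i].rti > 0 && marks[i].rti < 64);
		set |= (RelidSet) 1 << marks[i].rti;
	}
	return set;
}
```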

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c        | 47 ++++++++++++++++++++++++++
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  4 +++
 src/backend/executor/nodeModifyTable.c | 18 ++++++++++
 src/backend/optimizer/plan/planner.c   | 21 +++++++++++-
 src/include/nodes/plannodes.h          | 10 ++++++
 6 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..3f134f9a34d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -90,6 +90,9 @@ static bool ExecCheckPermissionsModified(Oid relOid, Oid userid,
 										 Bitmapset *modifiedCols,
 										 AclMode requiredPerms);
 static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
+#ifdef USE_ASSERT_CHECKING
+static void ExecCheckModifiedRelIds(EState *estate);
+#endif
 static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 static void ReportNotNullViolationError(ResultRelInfo *resultRelInfo,
 										TupleTableSlot *slot,
@@ -827,6 +830,46 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
 }
 
 
+/*
+ * ExecCheckModifiedRelIds
+ *		Verify that every relation the executor actually opened for modification
+ *		or row locking is present in the planner's modifiedRelids.
+ *
+ * The planner's set may be a superset of what the executor touches, because it
+ * includes partitions that were pruned at runtime and parent row marks that the
+ * executor skips.
+ */
+#ifdef USE_ASSERT_CHECKING
+static void
+ExecCheckModifiedRelIds(EState *estate)
+{
+	PlannedStmt *plannedstmt = estate->es_plannedstmt;
+	Bitmapset  *executor_relids = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = (ResultRelInfo *) lfirst(lc);
+
+		if (rri->ri_RangeTableIndex != 0)
+			executor_relids = bms_add_member(executor_relids,
+											 rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (int i = 0; i < estate->es_range_table_size; i++)
+		{
+			if (estate->es_rowmarks[i] != NULL)
+				executor_relids = bms_add_member(executor_relids,
+												 estate->es_rowmarks[i]->rti);
+		}
+	}
+	Assert(bms_is_subset(executor_relids, plannedstmt->modifiedRelids));
+	bms_free(executor_relids);
+}
+#endif
+
+
 /* ----------------------------------------------------------------
  *		InitPlan
  *
@@ -992,6 +1035,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	ExecCheckModifiedRelIds(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..4f39767d033 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelids = estate->es_plannedstmt->modifiedRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..d67f24fca8c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,10 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		/* verify this relation is in the planner's modifiedRelids */
+		Assert(bms_is_member(erm->rti,
+							 estate->es_plannedstmt->modifiedRelids));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..6b4ee4f9378 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,16 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	/*
+	 * Verify this relation is in the planner's set of modified relations.
+	 * Partitions opened by tuple routing have ri_RangeTableIndex == 0 because
+	 * they have no range table entry, so we can only check relations that are
+	 * in the range table.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1533,10 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2219,10 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..847af979e31 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *modifiedRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,23 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelids from result relations and row marks.  This is a
+	 * superset of what the executor will actually modify/lock at runtime,
+	 * because runtime partition pruning may eliminate some result relations,
+	 * and parent row marks are included here but skipped by the executor.
+	 */
+	foreach(lc, glob->resultRelations)
+		modifiedRelids = bms_add_member(modifiedRelids, lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+	{
+		PlanRowMark *rc = (PlanRowMark *) lfirst(lc);
+
+		modifiedRelids = bms_add_member(modifiedRelids, rc->rti);
+	}
+	result->modifiedRelids = modifiedRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..841c7707c59 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,16 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * RT indexes of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 *
+	 * Computed by the planner, this is a superset of what the executor will
+	 * actually touch at runtime, because it includes partitions that may be
+	 * pruned and parent row marks that the executor skips.
+	 */
+	Bitmapset  *modifiedRelids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0

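A minimal, self-contained sketch of the idea behind the planner.c hunk and the ExecDeleteAct()/ExecUpdateAct() assertions above. The planner takes the union of the RT indexes of the result relations and the row-mark targets, and the executor asserts that any relation it modifies is either RT index 0 or a member of that set. Assumptions: a 64-bit mask stands in for PostgreSQL's Bitmapset, so RT indexes are limited to 1..63. SketchRowMark, relidset_add(), compute_modified_relids() and modification_allowed() are hypothetical names, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit N set means RT index N is present. */
typedef uint64_t RelidSet;

/* Hypothetical stand-in for PlanRowMark; only the RT index matters here. */
typedef struct SketchRowMark
{
	int			rti;
} SketchRowMark;

static RelidSet
relidset_add(RelidSet set, int rti)
{
	assert(rti > 0 && rti < 64);	/* limit of this sketch, not of PostgreSQL */
	return set | ((RelidSet) 1 << rti);
}

static bool
relidset_is_member(RelidSet set, int rti)
{
	return rti > 0 && rti < 64 && (set & ((RelidSet) 1 << rti)) != 0;
}

/* Union of result relations and row-mark targets, as the planner hunk builds it. */
static RelidSet
compute_modified_relids(const int *result_rels, int nresult,
						const SketchRowMark *marks, int nmarks)
{
	RelidSet	set = 0;

	for (int i = 0; i < nresult; i++)
		set = relidset_add(set, result_rels[i]);
	for (int i = 0; i < nmarks; i++)
		set = relidset_add(set, marks[i].rti);
	return set;
}

/* Mirrors the executor-side Assert: RT index 0 (no RTE) is always accepted. */
static bool
modification_allowed(RelidSet modified, int rti)
{
	return rti == 0 || relidset_is_member(modified, rti);
}
```

For example, with result relations {1, 3} and a row mark on RT index 5, modification_allowed() accepts 0, 3 and 5 and rejects 2. As the planner comment says, the real set is a superset of what the executor touches at runtime, so the check can only catch modifications of relations the planner never listed.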


  [text/x-patch] v42-0009-Thread-flags-through-begin-scan-APIs.patch (32.9K, 10-v42-0009-Thread-flags-through-begin-scan-APIs.patch)
From f1c3a40ff3fa8b5f63073b13306082c880ef1c06 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v42 09/12] Thread flags through begin-scan APIs

Add a caller-supplied flags parameter to most of the table_beginscan_*
wrappers, index_beginscan(), index_beginscan_parallel(),
table_index_fetch_begin(), and the table AM callback
index_fetch_begin(). This allows callers to pass additional context to
be used when building the scan descriptors.

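For table scans, table_beginscan_common() ORs the caller's flags into the
scan options computed by each wrapper and asserts that they do not overlap
the new SO_INTERNAL_FLAGS mask. For index scans, a new uint32 flags field
is added to IndexFetchTableData, and the heap AM stores the
caller-provided flags there in heapam_index_fetch_begin().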

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer. All callers in this
patch pass 0, so there is no behavior change yet.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  2 +
 src/backend/access/index/indexam.c        |  7 ++-
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 21 +++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +--
 src/backend/commands/typecmds.c           |  4 +-
 src/backend/executor/execIndexing.c       |  3 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  5 +-
 src/backend/executor/nodeIndexscan.c      |  6 +-
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 +-
 src/backend/executor/nodeTidrangescan.c   |  6 +-
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  1 +
 src/include/access/genam.h                |  4 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 72 +++++++++++++++--------
 26 files changed, 117 insertions(+), 75 deletions(-)
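A minimal sketch of the flag-threading pattern in this patch. Each wrapper computes its own internal scan options, the caller's flags are ORed in, and an assertion rejects caller flags that overlap SO_INTERNAL_FLAGS. On the index-fetch side, the flags are simply stored in the fetch descriptor. Assumptions: only SO_TEMP_SNAPSHOT = 1 << 9 is visible in the tableam.h hunk, so the other ScanOptions bit values are assumed. SKETCH_SO_HINT_READONLY is a hypothetical caller flag, since the patch defines none and passes 0 everywhere. The sketch_* functions and SketchIndexFetch are illustrative stand-ins, not the real table AM API.

```c
#include <assert.h>
#include <stdint.h>

/* Internal scan options; values assumed to match ScanOptions in tableam.h. */
enum
{
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_TYPE_BITMAPSCAN = 1 << 1,
	SO_TYPE_SAMPLESCAN = 1 << 2,
	SO_TYPE_TIDSCAN = 1 << 3,
	SO_TYPE_TIDRANGESCAN = 1 << 4,
	SO_TYPE_ANALYZE = 1 << 5,
	SO_ALLOW_STRAT = 1 << 6,
	SO_ALLOW_SYNC = 1 << 7,
	SO_ALLOW_PAGEMODE = 1 << 8,
	SO_TEMP_SNAPSHOT = 1 << 9
};

/* Same mask as the patch: bits that only the begin-scan wrappers may set. */
#define SO_INTERNAL_FLAGS \
	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
	 SO_TEMP_SNAPSHOT)

/* Hypothetical caller-settable flag, chosen outside the internal mask. */
#define SKETCH_SO_HINT_READONLY ((uint32_t) 1 << 10)

/* Like table_beginscan_common(): validate caller flags, then merge them. */
static uint32_t
sketch_beginscan_common(uint32_t internal_flags, uint32_t user_flags)
{
	assert((user_flags & SO_INTERNAL_FLAGS) == 0);
	return internal_flags | user_flags;
}

/* Like table_beginscan(): the wrapper owns the scan-type and SO_ALLOW_* bits. */
static uint32_t
sketch_beginscan_seq(uint32_t user_flags)
{
	uint32_t	internal_flags = SO_TYPE_SEQSCAN |
		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;

	return sketch_beginscan_common(internal_flags, user_flags);
}

/* Like IndexFetchTableData; an int relid stands in for the Relation. */
typedef struct SketchIndexFetch
{
	int			relid;
	uint32_t	flags;
} SketchIndexFetch;

/* Like heapam_index_fetch_begin(): store the caller's flags unchanged. */
static SketchIndexFetch
sketch_index_fetch_begin(int relid, uint32_t flags)
{
	SketchIndexFetch fetch = {relid, flags};

	return fetch;
}
```

Keeping the scan-type and SO_ALLOW_* bits in a separate internal_flags variable means a caller cannot silently change the kind of scan it gets. Like PostgreSQL's Assert(), the check in this sketch vanishes when assertions are compiled out (here, under NDEBUG).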

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..66726b22de6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,11 +80,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -762,7 +763,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									0,	/* flags */
+									SnapshotAny, NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +774,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..b099d956e41 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,6 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
+										 0, /* flags */
 										 snapshot, NULL, nkeys, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
@@ -716,6 +717,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
+									 0, /* flags */
 									 snapshot, NULL, nkeys, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..63d5daadca6 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -255,6 +255,7 @@ index_insert_cleanup(Relation indexRelation,
 IndexScanDesc
 index_beginscan(Relation heapRelation,
 				Relation indexRelation,
+				uint32 flags,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
 				int nkeys, int norderbys)
@@ -284,7 +285,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +594,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +616,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..7a12e808b07 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,10 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +176,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +185,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +207,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +216,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..29d7c3514b6 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 219f604df7b..ec9bbfe554a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22882,7 +22882,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23346,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..c46beedeb71 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, 0,	/* flags */
+								 &DirtySnapshot, NULL, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..23509771557 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   &snap, NULL, skey_attoff, 0);
 
 retry:
 	found = false;
@@ -383,7 +385,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +668,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   SnapshotAny, NULL, skey_attoff, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c8db357e69f..decfd792809 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -92,6 +92,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -790,7 +791,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -856,7 +857,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index bd83e4712b3..a37fa9abece 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -110,6 +110,7 @@ IndexNext(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -206,6 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1726,7 +1728,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1790,7 +1792,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..0528f8166d8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7177,6 +7177,7 @@ get_actual_variable_endpoint(Relation heapRel,
 							  GlobalVisTestFor(heapRel));
 
 	index_scan = index_beginscan(heapRel, indexRel,
+								 0, /* flags */
 								 &SnapshotNonVacuumable, NULL,
 								 1, 0);
 	/* Set it up for index-only scan */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b98c20a0edc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,6 +156,7 @@ extern void index_insert_cleanup(Relation indexRelation,
 
 extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
+									 uint32 flags,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e123dda090f..c6aec63a505 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..8357d05d83b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,18 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +910,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +945,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -939,11 +956,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +975,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1000,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1013,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1059,12 +1078,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1159,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1170,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1197,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1209,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0
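The tableam.h hunks above set up a flag contract. Each table_beginscan_*() wrapper keeps its fixed scan-type and behavior bits in internal_flags. Callers pass only hints such as SO_HINT_REL_READ_ONLY in flags. table_index_fetch_begin() asserts that the caller's flags contain no SO_INTERNAL_FLAGS bits.

Below is a minimal standalone model of that contract. The bit positions are assumed to follow tableam.h's ScanOptions: bits 0 through 9 are existing options, and bit 10 is the new hint. SKETCH_INTERNAL_MASK and the sketch_* functions are hypothetical. The real SO_INTERNAL_FLAGS definition and the body of table_beginscan_common() are not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values modeled on ScanOptions in tableam.h (assumed, not copied). */
enum
{
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_TYPE_BITMAPSCAN = 1 << 1,
	SO_ALLOW_STRAT = 1 << 6,
	SO_ALLOW_SYNC = 1 << 7,
	SO_ALLOW_PAGEMODE = 1 << 8,
	SO_TEMP_SNAPSHOT = 1 << 9,
	SO_HINT_REL_READ_ONLY = 1 << 10,
};

/* Hypothetical stand-in for SO_INTERNAL_FLAGS: every bit below the hints. */
#define SKETCH_INTERNAL_MASK ((uint32_t) ((1u << 10) - 1))

/* Merge a wrapper's fixed options with the caller-supplied hint flags. */
static uint32_t
sketch_merge_scan_flags(uint32_t internal_flags, uint32_t flags)
{
	/* Callers may pass hints only, never scan-type or behavior bits. */
	assert((flags & SKETCH_INTERNAL_MASK) == 0);
	/* Wrappers only ever set bits from the internal range. */
	assert((internal_flags & ~SKETCH_INTERNAL_MASK) == 0);
	return internal_flags | flags;
}

/* Hypothetical counterpart of the flag handling in table_beginscan(). */
static uint32_t
sketch_seqscan_flags(uint32_t flags)
{
	uint32_t	internal_flags = SO_TYPE_SEQSCAN |
		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;

	return sketch_merge_scan_flags(internal_flags, flags);
}
```

A plain OR of the two sets is safe only because the ranges are disjoint, and that is exactly what the two assertions check.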



  [text/x-patch] v42-0010-Pass-down-information-on-table-modification-to-s.patch (11.3K, 11-v42-0010-Pass-down-information-on-table-modification-to-s.patch)
From 6b34d8f1380b7ba224c6e240289ca93705005a66 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v42 10/12] Pass down information on table modification to scan
 node

Tell sequential scan, index [only] scan, bitmap table scan, sample
scan, and TID range scan nodes whether the query modifies the relation
being scanned. A later commit will use this information to update the VM
during on-access pruning only when the query does not modify the
relation.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/nodeBitmapHeapscan.c |  6 +++++-
 src/backend/executor/nodeIndexonlyscan.c  | 15 ++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++----
 src/backend/executor/nodeSamplescan.c     |  5 ++++-
 src/backend/executor/nodeSeqscan.c        | 18 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 15 ++++++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           | 10 ++++++++++
 8 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..aec92c868ac 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,15 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index decfd792809..b977719c295 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,9 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -92,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -791,7 +794,10 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -857,7 +863,10 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 &node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a37fa9abece..ad460c11679 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,13 +104,16 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -201,13 +204,16 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   &node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1728,7 +1734,9 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1792,7 +1800,9 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 &node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..71c70e5e5c7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,16 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&scanstate->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..79470e6b9b5 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,17 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +372,18 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +413,12 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..4257afd96ed 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,13 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+				SO_HINT_REL_READ_ONLY : 0;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +455,18 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +496,12 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8357d05d83b..487e38292fa 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..31c4192b67e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,16 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+/*
+ * Return true if the scan node's relation is not modified by the query.
+ */
+static inline bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !bms_is_member(((Scan *) ss->ps.plan)->scanrelid,
+						  ss->ps.state->es_plannedstmt->modifiedRelids);
+}
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0
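Each scan node in 0010 makes the same decision: a set-membership test on the scan's range-table index, turned into either SO_HINT_REL_READ_ONLY or 0. The sketch below models that decision. A 64-bit mask stands in for a Bitmapset, and a shift test stands in for bms_is_member(). SketchRelidSet and the sketch_* helpers are hypothetical. The real check reads modifiedRelids from the PlannedStmt, as ScanRelIsReadOnly() does above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the SO_HINT_REL_READ_ONLY enum member (bit 10). */
#define SO_HINT_REL_READ_ONLY (1u << 10)

/*
 * Hypothetical stand-in for a Bitmapset of range-table indexes: bit i is set
 * when range-table entry i is modified by the query.
 */
typedef uint64_t SketchRelidSet;

static bool
sketch_relid_is_member(unsigned int relid, SketchRelidSet set)
{
	assert(relid < 64);			/* limitation of this fixed-width sketch */
	return ((set >> relid) & 1) != 0;
}

/* Mirrors ScanRelIsReadOnly(): read-only unless the scan's RTI is modified. */
static bool
sketch_scan_rel_is_read_only(unsigned int scanrelid, SketchRelidSet modified)
{
	return !sketch_relid_is_member(scanrelid, modified);
}

/* The flag computation each scan node performs before starting its scan. */
static uint32_t
sketch_scan_hint_flags(unsigned int scanrelid, SketchRelidSet modified)
{
	return sketch_scan_rel_is_read_only(scanrelid, modified) ?
		SO_HINT_REL_READ_ONLY : 0;
}
```

For example, suppose only range-table index 1 is marked as modified, as it would be for the target of an UPDATE. A scan of index 2 then gets the hint, and a scan of index 1 does not.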



  [text/x-patch] v42-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.1K, 12-v42-0011-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From a6d391c12f03706e8d9feb07c7cd647d91594cf2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v42 11/12] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid the write amplification
caused by vacuum later having to set the page all-visible, which can
trigger a write and potentially an FPI. It also makes index-only scans
more effective, since they can skip heap fetches only for pages marked
all-visible in the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              | 16 +++++++--
 5 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 66726b22de6..651efa0127a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2545,7 +2546,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ea1afa5c58a..c5647b1494b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -236,7 +238,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -257,7 +260,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -339,6 +343,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -395,6 +401,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -474,9 +481,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -930,21 +936,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1171,7 +1193,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c6aec63a505..90ca5a2cfa8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -95,7 +96,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * The current heap block's corresponding page in the visibility map,
+	 * used by sequential scans, bitmap heap scans, TID range scans, and
+	 * sample scans. If the relation is not modified by the query, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
@@ -126,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. If the
+	 * query does not modify the underlying heap table, on-access pruning
+	 * may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
@@ -431,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
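In 0011, heap_page_will_set_vm() now requires the new HEAP_PAGE_PRUNE_SET_VM option before it sets the VM. For on-access calls that neither prune nor freeze, it also requires that the heap buffer is already dirty and does not need an FPI.

Below is a self-contained model of that decision order. Booleans stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup(), and the Sketch* types are hypothetical. The VM bit values are modeled on VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN. The hunk in this excerpt ends at the set_all_frozen check, so adding the all-frozen bit there is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN 0x02

typedef enum
{
	SKETCH_PRUNE_ON_ACCESS,
	SKETCH_PRUNE_VACUUM_SCAN,
} SketchPruneReason;

typedef struct SketchPruneState
{
	bool		attempt_set_vm; /* HEAP_PAGE_PRUNE_SET_VM was passed */
	bool		set_all_visible;
	bool		set_all_frozen;
	bool		buffer_is_dirty;	/* stands in for BufferIsDirty() */
	bool		buffer_needs_fpi;	/* stands in for XLogCheckBufferNeedsBackup() */
	uint8_t		new_vmbits;
} SketchPruneState;

/* Returns true if one or both VM bits should be set. */
static bool
sketch_will_set_vm(SketchPruneState *ps, SketchPruneReason reason,
				   bool do_prune, bool do_freeze)
{
	/* all-frozen implies all-visible, as asserted by the caller in pruneheap.c */
	assert(!ps->set_all_frozen || ps->set_all_visible);

	if (!ps->attempt_set_vm)
		return false;
	if (!ps->set_all_visible)
		return false;

	/*
	 * On access with no pruning or freezing: skip setting the VM if it would
	 * newly dirty the page or force a full-page image.
	 */
	if (reason == SKETCH_PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!ps->buffer_is_dirty || ps->buffer_needs_fpi))
	{
		ps->set_all_visible = false;
		ps->set_all_frozen = false;
		return false;
	}

	ps->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
	if (ps->set_all_frozen)
		ps->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;	/* assumed continuation */
	return true;
}
```

The ordering matters: do_prune and do_freeze must already be known when this runs, which matches the new header comment in the patch.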



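Patch 0012, attached next, sets pd_prune_xid on insert in most cases. It skips the hint when the XID is not a normal one (bootstrap mode), when the tuple is inserted frozen, and, for multi-insert, when the page is being set all-frozen.

The sketch below models those conditions together with how PageSetPrunable() keeps the oldest candidate XID. TransactionId, the XID constants, and the modulo-2^32 comparison are re-declared here for a standalone build, modeled on PostgreSQL's transam.h and bufpage.h. SKETCH_INSERT_FROZEN and the sketch_* functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define SKETCH_INSERT_FROZEN 0x0004 /* hypothetical stand-in for HEAP_INSERT_FROZEN */

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Modulo-2^32 ordering, as TransactionIdPrecedes() uses for normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Model of PageSetPrunable(): remember the oldest XID worth pruning for. */
static void
sketch_page_set_prunable(TransactionId *pd_prune_xid, TransactionId xid)
{
	assert(xid_is_normal(xid));
	if (*pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, *pd_prune_xid))
		*pd_prune_xid = xid;
}

/* The heap_insert() condition added by 0012. */
static void
sketch_after_insert(TransactionId *pd_prune_xid, TransactionId xid, int options)
{
	if (xid_is_normal(xid) && !(options & SKETCH_INSERT_FROZEN))
		sketch_page_set_prunable(pd_prune_xid, xid);
}

/* The heap_multi_insert() condition added by 0012. */
static void
sketch_after_multi_insert(TransactionId *pd_prune_xid, TransactionId xid,
						  bool all_frozen_set)
{
	if (!all_frozen_set && xid_is_normal(xid))
		sketch_page_set_prunable(pd_prune_xid, xid);
}
```

Because the hint keeps the oldest XID, a later insert by a newer transaction does not push back the point at which on-access pruning may first find the page worth examining.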
  [text/x-patch] v42-0012-Set-pd_prune_xid-on-insert.patch (8.8K, 13-v42-0012-Set-pd_prune_xid-on-insert.patch)
From cf680166b60099ca720fe70820034d3bf3837df9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v42 12/12] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set a page all-visible in
the VM the first time the page is read after being filled with newly
inserted tuples. The page is then set all-visible while it is still in
shared buffers, avoiding potential I/O amplification when vacuum later
has to scan the page and set it all-visible. It also lets index-only
scans of newly inserted data avoid heap fetches much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 40 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++------
 3 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..ba11bbc03a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c5647b1494b..07e47d8927b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -279,7 +279,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts added new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1922,17 +1923,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0



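The hunks above make inserts (and the new tuple of a cross-page update) set pd_prune_xid, so that heap_page_prune_opt() gets a chance to revisit the page later. Below is a minimal sketch of the two pieces of logic involved, assuming simplified 32-bit XIDs. The page keeps only the oldest hinted XID, and on-access pruning bails out early unless a hint exists and that XID is no longer considered running. PageHeaderSketch, page_set_prunable(), page_worth_pruning() and the single 'horizon' argument are hypothetical simplifications, not PostgreSQL's API. The real check goes through GlobalVisTestIsRemovableXid(), and the real function performs further checks after this point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's TransactionId helpers. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

static bool
xid_is_valid(TransactionId xid)
{
	return xid != InvalidTransactionId;
}

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Modulo-2^32 "a is older than b", valid for normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Hypothetical page header: only the prune hint is modeled. */
typedef struct
{
	TransactionId pd_prune_xid;
} PageHeaderSketch;

/* In the spirit of PageSetPrunable(): keep the oldest hinted XID. */
static void
page_set_prunable(PageHeaderSketch *page, TransactionId xid)
{
	assert(xid_is_normal(xid));
	if (!xid_is_valid(page->pd_prune_xid) ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/*
 * Mirrors the early exits of heap_page_prune_opt(): no hint means nothing
 * to do; otherwise continue only once the hint XID precedes the
 * (hypothetical, fixed) visibility horizon.
 */
static bool
page_worth_pruning(const PageHeaderSketch *page, TransactionId horizon)
{
	if (!xid_is_valid(page->pd_prune_xid))
		return false;
	return xid_precedes(page->pd_prune_xid, horizon);
}
```

Keeping the oldest XID is the conservative choice: pruning is attempted as soon as the oldest hinted transaction might be visible to everyone.
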

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-20 23:37                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-22 19:58                               ` Melanie Plageman <[email protected]>
  2026-03-23 21:54                                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-22 19:58 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Mar 20, 2026 at 7:37 PM Melanie Plageman
<[email protected]> wrote:
>
> I've made several minor updates and two notable updates in attached v42:
>
> - no separate log_newpage_buffer() for empty page vacuum.
> log_heap_prune_and_freeze() now handles pages without a valid LSN on
> its own
> - the heap_page_is_all_visible() assertion should be stable even once
> it uses GlobalVisState because I've updated the GlobalVisState
> functions to avoid updating the GlobalVisState boundaries in this case

I've pushed the first two patches. Attached are the remaining 10. No
changes were made to those from the previous version.

- Melanie
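
The v42 change quoted above keeps the heap_page_is_all_visible() assertion stable by letting callers pass allow_update = false, so the GlobalVisState boundaries cannot move between the pruning check and the cross-check. Below is a simplified sketch of that control flow, modeled on GlobalVisTestIsRemovableFullXid() as changed in v43-0001. XIDs older than maybe_needed are removable, XIDs at or after definitely_needed are still running, and only the range in between may trigger a boundary refresh. VisStateSketch, vis_update() and sketch_current_horizon are hypothetical. The real code also consults GlobalVisTestShouldUpdate() before refreshing and works on 64-bit FullTransactionIds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Modulo-2^32 "a is older than b", valid for normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Hypothetical, simplified GlobalVisState with its two boundaries. */
typedef struct
{
	TransactionId maybe_needed;			/* older XIDs are removable */
	TransactionId definitely_needed;	/* XIDs at or after this are running */
} VisStateSketch;

/* Hypothetical source of fresher boundaries, standing in for GlobalVisUpdate(). */
static TransactionId sketch_current_horizon;

static void
vis_update(VisStateSketch *state)
{
	state->maybe_needed = sketch_current_horizon;
	state->definitely_needed = sketch_current_horizon;
}

static bool
vis_is_removable(VisStateSketch *state, TransactionId xid, bool allow_update)
{
	assert(xid_is_normal(xid));

	if (xid_precedes(xid, state->maybe_needed))
		return true;			/* definitely visible to everyone */
	if (!xid_precedes(xid, state->definitely_needed))
		return false;			/* definitely still needed */

	/* Uncertain range: refresh and recheck only if the caller allows it. */
	if (allow_update)
	{
		vis_update(state);
		return xid_precedes(xid, state->maybe_needed);
	}
	return false;				/* conservative answer from current bounds */
}

/* Counterpart of GlobalVisTestXidConsideredRunning(). */
static bool
vis_xid_considered_running(VisStateSketch *state, TransactionId xid,
						   bool allow_update)
{
	return !vis_is_removable(state, xid, allow_update);
}
```

With allow_update = false the answer depends only on the boundaries already in the state, so two back-to-back checks of the same XID cannot disagree.
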


Attachments:

  [text/x-patch] v43-0001-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (18.2K, 2-v43-0001-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 35251d668c2efdc82f6a40198272fa7ee5afe82a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v43 01/10] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility to
determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen
at the start of vacuum (OldestXmin). We now use GlobalVisState, which
also enables future work to set the VM during on-access pruning, since
ordinary queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page; if it is
globally visible, then the entire page is all-visible.
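
The once-per-page check described above can be sketched as follows, assuming every entry in xmins[] belongs to a live tuple whose inserter is known to have committed. Each tuple only updates the newest normal xmin, and a single visibility test on that xmin then decides the whole page. This is sound because if the newest xmin is no longer considered running, no older one is either. page_all_visible_sketch() and xid_considered_running() are hypothetical names, and the fixed 'horizon' argument stands in for GlobalVisTestXidConsideredRunning().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define FrozenTransactionId ((TransactionId) 2)
#define FirstNormalTransactionId ((TransactionId) 3)

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Modulo-2^32 comparisons, valid for normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

static bool
xid_follows(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) > 0;
}

/* Hypothetical stand-in for the (expensive) GlobalVisState test. */
static bool
xid_considered_running(TransactionId xid, TransactionId horizon)
{
	return !xid_precedes(xid, horizon);
}

/* Returns whether the page is all-visible; reports the newest live xmin. */
static bool
page_all_visible_sketch(const TransactionId *xmins, size_t n,
						TransactionId horizon, TransactionId *newest_xmin)
{
	assert(newest_xmin != NULL);
	*newest_xmin = InvalidTransactionId;

	/* Cheap per-tuple work: remember only the newest normal xmin. */
	for (size_t i = 0; i < n; i++)
	{
		if (xid_is_normal(xmins[i]) &&
			(*newest_xmin == InvalidTransactionId ||
			 xid_follows(xmins[i], *newest_xmin)))
			*newest_xmin = xmins[i];
	}

	/* One visibility test per page, on the newest xmin only. */
	if (xid_is_normal(*newest_xmin) &&
		xid_considered_running(*newest_xmin, horizon))
		return false;
	return true;
}
```

A page whose live tuples are all frozen never has a normal newest xmin, so it is all-visible without consulting the horizon at all.
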

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid, which is required to set the visibility map
on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 30 ++++++++++-
 src/backend/access/heap/pruneheap.c         | 53 +++++++++---------
 src/backend/access/heap/vacuumlazy.c        | 60 ++++++++++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 19 ++++---
 src/include/access/heapam.h                 |  2 +
 src/include/utils/snapmgr.h                 |  4 +-
 7 files changed, 115 insertions(+), 55 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..c678f5a3c8f 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,31 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * If allow_update is false, the GlobalVisState boundaries will not be updated.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidConsideredRunning(GlobalVisState *state, TransactionId xid,
+								  bool allow_update)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid, allow_update);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
@@ -1354,7 +1379,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after, true))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1420,7 +1445,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
 	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+									   HeapTupleHeaderGetRawXmax(tuple),
+									   true);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b383b0fca8b..718f3a78c46 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -160,10 +160,13 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * is used after processing all tuples to determine if the page can be
+	 * considered all-visible (if the newest xmin is still considered running
+	 * by some snapshot, it cannot be). It is also used by the caller as the
+	 * conflict horizon when setting the VM bits, unless we froze all tuples
+	 * on the page (in which case the conflict xid was already included in the
+	 * WAL record).
 	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
@@ -281,7 +284,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid, true))
 		return;
 
 	/*
@@ -1081,6 +1084,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(prstate.vistest,
+										  prstate.visibility_cutoff_xid,
+										  true))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1283,7 +1298,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup)
 	 * if the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after, true))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1749,29 +1764,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1a446050d85..797973d7bd0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -468,13 +468,14 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
+										   bool allow_update_vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2089,7 +2090,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
@@ -2852,7 +2853,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3614,14 +3615,19 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
-
+	/*
+	 * Pass allow_update_vistest as false so that the GlobalVisState
+	 * boundaries used here match those used by the pruning code we are
+	 * cross-checking. Allowing an update could move the boundaries between
+	 * the two calls, causing a spurious assertion failure.
+	 */
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3642,7 +3648,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility. If allow_update_vistest is true,
+ * the boundaries of the GlobalVisState may be updated when checking the
+ * visibility of the newest live XID on the page.
  *
  * Output parameters:
  *
@@ -3661,7 +3669,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
+							   bool allow_update_vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3742,7 +3751,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3751,16 +3760,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3789,6 +3799,20 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+										  allow_update_vistest))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 6b7117b56b2..c461f8dc02d 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisTestIsRemovableXid(vistest, dt->xid, true)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0f913897acc..27e5adeebfb 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4223,11 +4223,17 @@ GlobalVisUpdate(void)
  * The state passed needs to have been initialized for the relation fxid is
  * from (NULL is also OK), otherwise the result may not be correct.
  *
+ * If allow_update is false, the GlobalVisState boundaries will not be updated
+ * even if it would otherwise be beneficial. This is useful for callers that
+ * do not want GlobalVisState to advance at all, for example because they need
+ * a conservative answer based on the current boundaries.
+ *
  * See comment for GlobalVisState for details.
  */
 bool
 GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+								FullTransactionId fxid,
+								bool allow_update)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4248,7 +4254,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 	 * might not exist a snapshot considering fxid running. If it makes sense,
 	 * update boundaries and recheck.
 	 */
-	if (GlobalVisTestShouldUpdate(state))
+	if (allow_update && GlobalVisTestShouldUpdate(state))
 	{
 		GlobalVisUpdate();
 
@@ -4268,7 +4274,8 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid,
+							bool allow_update)
 {
 	FullTransactionId fxid;
 
@@ -4282,7 +4289,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, allow_update);
 }
 
 /*
@@ -4296,7 +4303,7 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, true);
 }
 
 /*
@@ -4310,7 +4317,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisTestIsRemovableXid(state, xid, true);
 }
 
 /*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 305ecc31a9e..8815acccafb 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -480,6 +480,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidConsideredRunning(GlobalVisState *state, TransactionId xid, bool allow_update);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 8c919d2640e..db903709c49 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -115,8 +115,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid, bool allow_update);
+extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid, bool allow_update);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0



  [text/x-patch] v43-0002-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (15.4K, 3-v43-0002-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From 204a645f106b3e212cac17734b313675a6236bed Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v43 02/10] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.
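
A minimal sketch of the bookkeeping described above, using hypothetical names (PruneStateSketch, prune_state_init(), record_committed_live_tuple(), mark_not_all_visible()). It follows the hunks in this patch: set_all_visible starts true, set_all_frozen starts from whether freezing was attempted, newest_live_xid is updated for every committed live tuple even after the page has been ruled out as all-visible, and the VM conflict horizon is InvalidTransactionId when the page ends up all-frozen. How the real code decides that a tuple blocks all-visibility is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

static bool
xid_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}

/* Modulo-2^32 "a is newer than b", valid for normal XIDs. */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) > 0;
}

/* Hypothetical subset of PruneState. */
typedef struct
{
	bool		set_all_visible;
	bool		set_all_frozen;
	TransactionId newest_live_xid;
} PruneStateSketch;

static void
prune_state_init(PruneStateSketch *ps, bool attempt_freeze)
{
	ps->set_all_visible = true;
	/* Without freezing nothing would clear set_all_frozen, so start from it. */
	ps->set_all_frozen = attempt_freeze;
	ps->newest_live_xid = InvalidTransactionId;
}

/* Some tuple is not visible to everyone: the page can be neither. */
static void
mark_not_all_visible(PruneStateSketch *ps)
{
	ps->set_all_visible = false;
	ps->set_all_frozen = false;
}

/* Live tuple whose inserter committed: track its xmin unconditionally. */
static void
record_committed_live_tuple(PruneStateSketch *ps, TransactionId xmin)
{
	if (xid_is_normal(xmin) &&
		(ps->newest_live_xid == InvalidTransactionId ||
		 xid_follows(xmin, ps->newest_live_xid)))
		ps->newest_live_xid = xmin;
}

/* Snapshot conflict horizon for a WAL record setting the VM. */
static TransactionId
vm_conflict_horizon(const PruneStateSketch *ps)
{
	/* In this sketch, all-frozen always implies all-visible. */
	assert(!ps->set_all_frozen || ps->set_all_visible);
	return ps->set_all_frozen ? InvalidTransactionId : ps->newest_live_xid;
}
```

Because newest_live_xid no longer stops updating when set_all_visible is cleared, its value never depends on the order in which tuples were examined.
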

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 137 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++---
 2 files changed, 72 insertions(+), 95 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 718f3a78c46..cebd78603cb 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,14 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
-	 * is used after processing all tuples to determine if the page can be
-	 * considered all-visible (if the newest xmin is still considered running
-	 * by some snapshot, it cannot be). It is also used by the caller as the
-	 * conflict horizon when setting the VM bits, unless we froze all tuples
-	 * on the page (in which case the conflict xid was already included in the
-	 * WAL record).
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -177,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /*
@@ -458,53 +452,42 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. We must initialize set_all_frozen based on whether or not the
+	 * caller passed HEAP_PAGE_PRUNE_FREEZE, because if they did not, we won't
+	 * call heap_prepare_freeze_tuple() for each tuple, and set_all_frozen
+	 * will never be cleared for tuples that need freezing.
+	 *
+	 * When freezing is not required (no XIDs/MXIDs older than the freeze
+	 * cutoff), we may still choose to "opportunistically" freeze if doing so
+	 * would make the page all-frozen.
+	 *
+	 * We will not be able to freeze the whole page at the end of vacuum if
+	 * there are tuples present that are not visible to everyone or if there
+	 * are dead tuples which will not be removable. However, dead tuples that
+	 * will be removed by the end of vacuum should not prevent this
+	 * opportunistic freezing.
+	 *
+	 * Therefore, we do not clear set_all_visible and set_all_frozen when we
+	 * encounter LP_DEAD items. Instead, we correct them after deciding
+	 * whether to freeze, but before updating the VM, to avoid setting the VM
+	 * bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -734,7 +717,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -1012,9 +994,8 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
+ * option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1090,9 +1071,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * be all-visible.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidConsideredRunning(prstate.vistest,
-										  prstate.visibility_cutoff_xid,
+										  prstate.newest_live_xid,
 										  true))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
@@ -1244,7 +1225,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1705,6 +1686,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1752,32 +1734,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 797973d7bd0..696919e35dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -470,7 +470,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -479,7 +479,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2829,7 +2829,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2855,14 +2855,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2903,7 +2903,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3617,7 +3617,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 	/*
@@ -3630,7 +3630,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3655,7 +3655,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3674,7 +3674,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3684,7 +3684,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3773,9 +3773,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3805,8 +3805,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *newest_live_xid,
 										  allow_update_vistest))
 	{
 		all_visible = false;
-- 
2.43.0



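The rename from visibility_cutoff_xid to newest_live_xid describes what the field holds: the newest xmin among the live tuples on the page, where xids are compared modulo 2^32. Below is a minimal standalone sketch of that tracking loop. It assumes 32-bit xids and reimplements the wraparound-aware comparison locally instead of calling PostgreSQL's TransactionIdFollows(). The names xid_follows() and newest_live_xid_of() are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-bit xids, using PostgreSQL's special values. */
typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(x) ((x) >= FirstNormalTransactionId)

/*
 * Local reimplementation of a wraparound-aware "id1 is newer than id2"
 * test. Special (non-normal) xids compare as plain integers; normal xids
 * are compared modulo 2^32.
 */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;
    diff = (int32_t) (id1 - id2);
    return diff > 0;
}

/*
 * Hypothetical helper: fold the xmins of a page's live tuples into one
 * newest_live_xid, testing in the same order the patch does. Non-normal
 * xmins (e.g. FrozenTransactionId, 2) are skipped.
 */
static TransactionId
newest_live_xid_of(const TransactionId *xmins, size_t n)
{
    TransactionId newest_live_xid = InvalidTransactionId;

    for (size_t i = 0; i < n; i++)
    {
        TransactionId xmin = xmins[i];

        if (xid_follows(xmin, newest_live_xid) && TransactionIdIsNormal(xmin))
            newest_live_xid = xmin;
    }

    /* Either no live tuple had a normal xmin, or the result is normal. */
    assert(newest_live_xid == InvalidTransactionId ||
           TransactionIdIsNormal(newest_live_xid));
    return newest_live_xid;
}
```

For example, xmins {1000, 2, 1500, 1200} yield 1500. Near wraparound, {4294967290, 5} yields 5, because 5 is logically newer. Starting from InvalidTransactionId means the first normal xmin always wins the comparison.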
  [text/x-patch] v43-0003-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch (23.1K, 4-v43-0003-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch)
From 58b3187b1bb54585c5b81e261678fb448ad9cea0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v43 03/10] WAL log VM setting during vacuum phase I in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.

For now, this change applies only to vacuum phase I, not to pruning
performed during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 245 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 113 ++----------
 src/include/access/heapam.h          |  37 ++--
 3 files changed, 205 insertions(+), 190 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cebd78603cb..c43b192b163 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
 	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /*
@@ -228,6 +232,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 
 /*
@@ -395,6 +400,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
+	prstate->new_vmbits = 0;
 	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
 												   prstate->block,
 												   &prstate->vmbuffer);
@@ -906,6 +912,42 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	}
 }
 
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * set the VM for now.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * If the page is already all-frozen, or already all-visible and freezing
  * won't be attempted, there is no remaining work and we can use the fast path
@@ -939,8 +981,6 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->old_vmbits = prstate->old_vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -975,7 +1015,8 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -990,12 +1031,10 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'all-frozen' is always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1023,8 +1062,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1122,6 +1163,31 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	conflict_xid = InvalidTransactionId;
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1143,14 +1209,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint.  If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1164,6 +1233,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1171,29 +1261,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1203,33 +1276,71 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible(). It's also a
+	 * valuable cross-check of the page state after pruning and freezing.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.newest_live_xid);
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->old_vmbits = prstate.old_vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->newly_all_visible = false;
+	presult->newly_all_frozen = false;
+	presult->newly_all_visible_frozen = false;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->newly_all_visible = true;
+			if (prstate.set_all_frozen)
+				presult->newly_all_visible_frozen = true;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->newly_all_frozen = true;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 696919e35dd..23deabd8c01 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -466,13 +466,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   bool allow_update_vistest,
@@ -2022,8 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2074,32 +2065,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2120,6 +2085,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	if (presult.newly_all_visible)
+		vacrel->new_all_visible_pages++;
+	if (presult.newly_all_visible_frozen)
+		vacrel->new_all_visible_all_frozen_pages++;
+	if (presult.newly_all_frozen)
+		vacrel->new_all_frozen_pages++;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.newly_all_visible_frozen ||
+		presult.newly_all_frozen;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2133,71 +2109,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.old_vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3613,7 +3524,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 8815acccafb..e123dda090f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -265,7 +265,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -281,8 +282,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -316,26 +316,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
-	 * cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		old_vmbits;
+	bool		newly_all_visible;
+	bool		newly_all_visible_frozen;
+	bool		newly_all_frozen;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -472,6 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *visibility_cutoff_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0
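In patch 0003, three booleans replace set_all_visible, set_all_frozen and old_vmbits in PruneFreezeResult. One way to read them is as a classification of the page's VM state before and after pruning. The sketch below is one hedged interpretation, based on the counter logic quoted earlier in the thread:

- A page that was not previously all-visible counts as newly all-visible.
- Such a page also counts as newly all-visible-and-frozen if it is frozen too.
- A page that was already all-visible but not all-frozen counts as newly all-frozen.

The struct, function and SKETCH_* names are hypothetical. Only the flag values mirror visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror VISIBILITYMAP_ALL_VISIBLE / VISIBILITYMAP_ALL_FROZEN. */
#define SKETCH_VM_ALL_VISIBLE	0x01
#define SKETCH_VM_ALL_FROZEN	0x02

/* Hypothetical mirror of the three new PruneFreezeResult fields. */
typedef struct SketchNewlySet
{
	bool		newly_all_visible;
	bool		newly_all_visible_frozen;
	bool		newly_all_frozen;
} SketchNewlySet;

/*
 * Classify a page from its VM bits before pruning (old_vmbits) and the bits
 * pruning decided to set (new_vmbits).  Like the VM code, this assumes
 * ALL_FROZEN is never set without ALL_VISIBLE.
 */
SketchNewlySet
sketch_classify(uint8_t old_vmbits, uint8_t new_vmbits)
{
	SketchNewlySet r = {false, false, false};
	bool		now_frozen = (new_vmbits & SKETCH_VM_ALL_FROZEN) != 0;

	assert(new_vmbits != SKETCH_VM_ALL_FROZEN);

	if ((new_vmbits & SKETCH_VM_ALL_VISIBLE) == 0)
		return r;				/* nothing set, so no counter moves */

	if ((old_vmbits & SKETCH_VM_ALL_VISIBLE) == 0)
	{
		r.newly_all_visible = true;
		r.newly_all_visible_frozen = now_frozen;
	}
	else if ((old_vmbits & SKETCH_VM_ALL_FROZEN) == 0 && now_frozen)
		r.newly_all_frozen = true;

	return r;
}
```

Lazy vacuum could then bump new_all_visible_pages, new_all_visible_all_frozen_pages and (presumably) new_all_frozen_pages directly from these flags. It would not need to re-derive them from old_vmbits.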



  [text/x-patch] v43-0004-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch (5.6K, 5-v43-0004-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch)
From 7dc943a7141988e2568a73136cab96829ea0b625 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v43 04/10] WAL log VM setting for empty pages in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

This has no independent benefit, but empty pages were the last user of
XLOG_HEAP2_VISIBLE, so with this change we can remove all of the
XLOG_HEAP2_VISIBLE code.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 29 +++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 44 +++++++++++++++++-----------
 2 files changed, 45 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c43b192b163..4f7220d17af 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -2539,6 +2539,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	uint8		info;
 	uint8		regbuf_flags_heap;
 
+	Page		heap_page = BufferGetPage(buffer);
+
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
 	xlhp_freeze_plans freeze_plans;
@@ -2557,14 +2559,18 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	/*
 	 * We can avoid an FPI of the heap page if the only modification we are
 	 * making to it is to set PD_ALL_VISIBLE and checksums/wal_log_hints are
-	 * disabled. Note that if we explicitly skip an FPI, we must not stamp the
-	 * heap page with this record's LSN. Recovery skips records <= the stamped
-	 * LSN, so this could lead to skipping an earlier FPI needed to repair a
-	 * torn page.
+	 * disabled.
+	 *
+	 * However, if the page has never been WAL-logged (LSN is invalid), we
+	 * must force an FPI regardless.  This can happen when another backend
+	 * extends the heap, initializes the page, and then fails before WAL-
+	 * logging it.  Since heap extension is not WAL-logged, recovery might try
+	 * to replay our record and find that the page isn't initialized, which
+	 * would cause a PANIC.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+	if (!XLogRecPtrIsValid(PageGetLSN(heap_page)))
+		regbuf_flags_heap |= REGBUF_FORCE_IMAGE;
+	else if (!do_prune && nfrozen == 0 && (!do_set_vm || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2679,12 +2685,13 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 
 	/*
-	 * See comment at the top of the function about regbuf_flags_heap for
-	 * details on when we can advance the page LSN.
+	 * If we explicitly skip an FPI, we must not stamp the heap page with this
+	 * record's LSN. Recovery skips records <= the stamped LSN, so this could
+	 * lead to skipping an earlier FPI needed to repair a torn page.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (!(regbuf_flags_heap & REGBUF_NO_IMAGE))
 	{
 		Assert(BufferIsDirty(buffer));
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		PageSetLSN(heap_page, recptr);
 	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23deabd8c01..63e6199241c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1929,33 +1929,43 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
 			/* mark buffer dirty before writing a WAL record */
 			MarkBufferDirty(buf);
 
+			PageSetAllVisible(page);
+			PageClearPrunable(page);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
 			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
 			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				!XLogRecPtrIsValid(PageGetLSN(page)))
-				log_newpage_buffer(buf, true);
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
 
-			PageSetAllVisible(page);
-			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0
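Patch 0004 puts the whole full-page-image decision for the heap page in one place in log_heap_prune_and_freeze(). The rules are:

- Force an image when the page has never been WAL-logged.
- Skip the image when the record changes at most PD_ALL_VISIBLE and hint bits are not WAL-logged.
- Stamp the page LSN only when the image was not explicitly skipped.

The standalone sketch below models just that decision. The SKETCH_* constants are hypothetical stand-ins for REGBUF_FORCE_IMAGE and REGBUF_NO_IMAGE and do not use their real values. The LSN is modeled as a plain integer where 0 means invalid, and hint_bits_logged stands in for XLogHintBitIsNeeded().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for REGBUF_FORCE_IMAGE / REGBUF_NO_IMAGE. */
#define SKETCH_FORCE_IMAGE	0x01
#define SKETCH_NO_IMAGE		0x02

/* 0 plays the role of InvalidXLogRecPtr. */
typedef uint64_t SketchLSN;

/* Mirrors the if/else-if added to log_heap_prune_and_freeze(). */
uint8_t
sketch_heap_regbuf_flags(SketchLSN page_lsn, bool do_prune, int nfrozen,
						 bool do_set_vm, bool hint_bits_logged)
{
	uint8_t		flags = 0;

	assert(nfrozen >= 0);

	if (page_lsn == 0)
		flags |= SKETCH_FORCE_IMAGE;	/* page was never WAL-logged */
	else if (!do_prune && nfrozen == 0 && (!do_set_vm || !hint_bits_logged))
		flags |= SKETCH_NO_IMAGE;	/* at most PD_ALL_VISIBLE changes */

	return flags;
}

/* The heap page LSN may be advanced unless the FPI was explicitly skipped. */
bool
sketch_may_set_page_lsn(uint8_t flags)
{
	return (flags & SKETCH_NO_IMAGE) == 0;
}
```

The empty-page path in lazy_scan_new_or_empty() relies on the first branch. A freshly extended, never-logged page has an invalid LSN, so the prune record carries a full image of it. This replaces the separate log_newpage_buffer() call that the old code made.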



  [text/x-patch] v43-0005-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (27.5K, 6-v43-0005-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 2257e5018bdb68eda19ab05d0ea3689f7d94a6f9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v43 05/10] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   5 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 150 +++++-----------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |   9 +-
 src/include/access/heapam_xlog.h         |  21 +--
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 63 insertions(+), 386 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..d64c403f2f0 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -55,9 +55,8 @@ mask_page_hint_bits(Page page)
 	PageClearHasFreeLinePointers(page);
 
 	/*
-	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * XXX: We should consider not masking PD_ALL_VISIBLE during WAL
+	 * consistency checking.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4f7220d17af..41bfb6711c1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1250,8 +1250,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 63e6199241c..f698c2d899b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1939,11 +1939,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2821,9 +2821,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..21e89c38f0a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -35,21 +34,31 @@
  * is set, we know the condition is true, but if a bit is not set, it might or
  * might not be true.
  *
- * Clearing visibility map bits is not separately WAL-logged.  The callers
- * must make sure that whenever a bit is cleared, the bit is cleared on WAL
- * replay of the updating operation as well.
- *
- * When we *set* a visibility map during VACUUM, we must write WAL.  This may
- * seem counterintuitive, since the bit is basically a hint: if it is clear,
- * it may still be the case that every tuple on the page is visible to all
- * transactions; we just don't know that for certain.  The difficulty is that
- * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
- * on the page itself, and the visibility map bit.  If a crash occurs after the
- * visibility map page makes it to disk and before the updated heap page makes
- * it to disk, redo must set the bit on the heap page.  Otherwise, the next
- * insert, update, or delete on the heap page will fail to realize that the
- * visibility map bit must be cleared, possibly causing index-only scans to
- * return wrong answers.
+ * Changes to the visibility map bits are not separately WAL-logged. Callers
+ * must make sure that whenever a visibility map bit is cleared, the bit is
+ * cleared on WAL replay of the updating operation. And whenever a visibility
+ * map bit is set, the bit is set on WAL replay of the operation that rendered
+ * the page all-visible/all-frozen.
+ *
+ * The visibility map bits operate as a hint in one direction: if they are
+ * clear, it may still be the case that every tuple on the page is visible to
+ * all transactions (we just don't know that for certain). However, if they
+ * are set, we may skip vacuuming pages and incorrectly advance relfrozenxid
+ * or skip reading heap pages for an index-only scan and return wrong results.
+ *
+ * Additionally, it is critical that the heap-page level PD_ALL_VISIBLE bit be
+ * correctly set and cleared along with the VM bits.
+ *
+ * When clearing the VM, if a crash occurs after the heap page makes it to
+ * disk but before the VM page makes it to disk, replay must clear the VM or
+ * the next index-only scan can return wrong results or vacuum may incorrectly
+ * advance relfrozenxid.
+ *
+ * When setting the VM, if a crash occurs after the visibility map page makes
+ * it to disk and before the updated heap page makes it to disk, redo must set
+ * the bit on the heap page. Otherwise, the next insert, update, or delete on
+ * the heap page will fail to realize that the visibility map bit must be
+ * cleared, possibly causing index-only scans to return wrong answers.
  *
  * VACUUM will normally skip pages for which the visibility map bit is set;
  * such pages can't contain any dead tuples and therefore don't need vacuuming.
@@ -222,112 +231,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +251,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..de9092fdf5b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -476,10 +476,11 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	/*
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
-	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
+	 * This can happen whenever the changes in the WAL record do not affect
+	 * visibility on a standby. For example: a record that only freezes an
+	 * xmax from a locker.
+	 *
+	 * It's also quite common with records generated during index deletion
 	 * (original execution of the deletion can reason that a recovery conflict
 	 * which is sufficient for the deletion operation must take place before
 	 * replay of the deletion record itself).
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..516806fcca2 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,7 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
+/* 0x40 was XLOG_HEAP2_VISIBLE */
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +443,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +486,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c07c945f05..adc858c2a97 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4421,7 +4421,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0



  [text/x-patch] v43-0006-Track-which-relations-are-modified-by-a-query.patch (8.7K, 7-v43-0006-Track-which-relations-are-modified-by-a-query.patch)
From 636349f40265854b318ccf46700ec57731db8793 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v43 06/10] Track which relations are modified by a query

Save the range table indexes of modified relations in a bitmap,
PlannedStmt->modifiedRelids. A later commit will pass this information
down to scan nodes to control whether or not on-access pruning is
allowed to set the visibility map. Setting the visibility map during a
scan is counterproductive if the query is going to modify the page
immediately afterward.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this bitmap is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execMain.c        | 47 ++++++++++++++++++++++++++
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  4 +++
 src/backend/executor/nodeModifyTable.c | 18 ++++++++++
 src/backend/optimizer/plan/planner.c   | 21 +++++++++++-
 src/include/nodes/plannodes.h          | 10 ++++++
 6 files changed, 100 insertions(+), 1 deletion(-)
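The rule in the commit message above reduces to a set union computed once by the planner. The executor checks in the diff below only assert a subset relation, because runtime partition pruning and skipped parent row marks can make the planner's set larger than what the executor actually touches. The following is a minimal sketch of that logic in plain C. It assumes a hypothetical 64-bit mask (`RelidSet`) in place of PostgreSQL's `Bitmapset`, so it only handles RT indexes 1 through 63. `compute_modified_relids()` and the `relid_set_*` helpers are illustrative names, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for Bitmapset: one bit per range table index.
 * Only RT indexes 1..63 fit; index 0 means "no RTE" (e.g. tuple routing).
 */
typedef uint64_t RelidSet;

static RelidSet
relid_set_add(RelidSet set, int rti)
{
	assert(rti > 0 && rti < 64);
	return set | ((RelidSet) 1 << rti);
}

static bool
relid_set_is_member(RelidSet set, int rti)
{
	return rti > 0 && rti < 64 && (set & ((RelidSet) 1 << rti)) != 0;
}

/* True if every member of a is also in b, analogous to bms_is_subset(a, b) */
static bool
relid_set_is_subset(RelidSet a, RelidSet b)
{
	return (a & ~b) == 0;
}

/*
 * Planner side (sketch): union of result-relation RT indexes and
 * row-mark RT indexes, regardless of row mark strength.
 */
static RelidSet
compute_modified_relids(const int *result_rtis, int nresult,
						const int *rowmark_rtis, int nrowmarks)
{
	RelidSet	modified = 0;

	for (int i = 0; i < nresult; i++)
		modified = relid_set_add(modified, result_rtis[i]);
	for (int i = 0; i < nrowmarks; i++)
		modified = relid_set_add(modified, rowmark_rtis[i]);
	return modified;
}
```

For example, if the planner sees result relations 2, 3 and 4 plus a row mark on 5, but runtime pruning leaves only relation 2 opened, the executor's set {2, 5} is a subset of the planner's {2, 3, 4, 5}, so a subset assertion holds while an equality check would fail.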

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..3f134f9a34d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -90,6 +90,9 @@ static bool ExecCheckPermissionsModified(Oid relOid, Oid userid,
 										 Bitmapset *modifiedCols,
 										 AclMode requiredPerms);
 static void ExecCheckXactReadOnly(PlannedStmt *plannedstmt);
+#ifdef USE_ASSERT_CHECKING
+static void ExecCheckModifiedRelIds(EState *estate);
+#endif
 static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
 static void ReportNotNullViolationError(ResultRelInfo *resultRelInfo,
 										TupleTableSlot *slot,
@@ -827,6 +830,46 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
 }
 
 
+/*
+ * ExecCheckModifiedRelIds
+ *		Verify that every relation the executor actually opened for modification
+ *		or row locking is present in the planner's modifiedRelids.
+ *
+ * The planner's set may be a superset of what the executor touches, because it
+ * includes partitions that were pruned at runtime and parent row marks that the
+ * executor skips.
+ */
+#ifdef USE_ASSERT_CHECKING
+static void
+ExecCheckModifiedRelIds(EState *estate)
+{
+	PlannedStmt *plannedstmt = estate->es_plannedstmt;
+	Bitmapset  *executor_relids = NULL;
+	ListCell   *lc;
+
+	foreach(lc, estate->es_opened_result_relations)
+	{
+		ResultRelInfo *rri = (ResultRelInfo *) lfirst(lc);
+
+		if (rri->ri_RangeTableIndex != 0)
+			executor_relids = bms_add_member(executor_relids,
+											 rri->ri_RangeTableIndex);
+	}
+	if (estate->es_rowmarks)
+	{
+		for (int i = 0; i < estate->es_range_table_size; i++)
+		{
+			if (estate->es_rowmarks[i] != NULL)
+				executor_relids = bms_add_member(executor_relids,
+												 estate->es_rowmarks[i]->rti);
+		}
+	}
+	Assert(bms_is_subset(executor_relids, plannedstmt->modifiedRelids));
+	bms_free(executor_relids);
+}
+#endif
+
+
 /* ----------------------------------------------------------------
  *		InitPlan
  *
@@ -992,6 +1035,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	 */
 	planstate = ExecInitNode(plan, estate, eflags);
 
+#ifdef USE_ASSERT_CHECKING
+	ExecCheckModifiedRelIds(estate);
+#endif
+
 	/*
 	 * Get the tuple descriptor describing the type of tuples to return.
 	 */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..4f39767d033 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelids = estate->es_plannedstmt->modifiedRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..d67f24fca8c 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,10 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		/* verify this relation is in the planner's modifiedRelids */
+		Assert(bms_is_member(erm->rti,
+							 estate->es_plannedstmt->modifiedRelids));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..6b4ee4f9378 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,16 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	/*
+	 * Verify this relation is in the planner's set of modified relations.
+	 * Partitions opened by tuple routing have ri_RangeTableIndex == 0 because
+	 * they have no range table entry, so we can only check relations that are
+	 * in the range table.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1533,10 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2219,10 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..847af979e31 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *modifiedRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,23 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelids from result relations and row marks.  This is a
+	 * superset of what the executor will actually modify/lock at runtime,
+	 * because runtime partition pruning may eliminate some result relations,
+	 * and parent row marks are included here but skipped by the executor.
+	 */
+	foreach(lc, glob->resultRelations)
+		modifiedRelids = bms_add_member(modifiedRelids, lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+	{
+		PlanRowMark *rc = (PlanRowMark *) lfirst(lc);
+
+		modifiedRelids = bms_add_member(modifiedRelids, rc->rti);
+	}
+	result->modifiedRelids = modifiedRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..841c7707c59 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,16 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * RT indexes of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 *
+	 * Computed by the planner, this is a superset of what the executor will
+	 * actually touch at runtime, because it includes partitions that may be
+	 * pruned and parent row marks that the executor skips.
+	 */
+	Bitmapset  *modifiedRelids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0



  [text/x-patch] v43-0007-Thread-flags-through-begin-scan-APIs.patch (32.9K, 8-v43-0007-Thread-flags-through-begin-scan-APIs.patch)
From 9433b29071a93383251c37773b6d1b4a512f9565 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v43 07/10] Thread flags through begin-scan APIs

Add a caller-supplied flags parameter to the table_beginscan_* wrappers,
index_beginscan(), index_beginscan_parallel(), table_index_fetch_begin(),
and the table AM callback index_fetch_begin(). This lets callers pass
additional context to be used when building the scan descriptors.

For index scans, a new uint32 flags field is added to
IndexFetchTableData, and the heap AM stores the caller-provided flags
there in heapam_index_fetch_begin().

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  2 +
 src/backend/access/index/indexam.c        |  7 ++-
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 21 +++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +--
 src/backend/commands/typecmds.c           |  4 +-
 src/backend/executor/execIndexing.c       |  3 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  5 +-
 src/backend/executor/nodeIndexscan.c      |  6 +-
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 +-
 src/backend/executor/nodeTidrangescan.c   |  6 +-
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  1 +
 src/include/access/genam.h                |  4 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 72 +++++++++++++++--------
 26 files changed, 117 insertions(+), 75 deletions(-)
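In the table_beginscan_parallel() and table_beginscan_parallel_tidrange() hunks below, the local variable previously named `flags` becomes `internal_flags`, so the SO_* bits chosen by the wrapper no longer clash with the new caller-supplied parameter, and both words are passed on to table_beginscan_common(). The sketch below models that split in self-contained C. All names (`SKETCH_SO_*`, `SKETCH_SCAN_READ_ONLY`, `SketchScanDesc`, `sketch_beginscan_*`) and bit values are hypothetical. Storing the two words in separate fields is also an assumption of the sketch, since this excerpt does not show how table_beginscan_common() combines them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical internal scan-option bits, chosen by the wrapper itself */
#define SKETCH_SO_TYPE_SEQSCAN		(1u << 0)
#define SKETCH_SO_ALLOW_PAGEMODE	(1u << 1)
#define SKETCH_SO_TEMP_SNAPSHOT		(1u << 2)
#define SKETCH_SO_VALID_BITS \
	(SKETCH_SO_TYPE_SEQSCAN | SKETCH_SO_ALLOW_PAGEMODE | SKETCH_SO_TEMP_SNAPSHOT)

/* Hypothetical caller-supplied flag, e.g. "this query will not modify rel" */
#define SKETCH_SCAN_READ_ONLY		(1u << 0)

typedef struct SketchScanDesc
{
	uint32_t	internal_flags; /* decided by the begin-scan wrapper */
	uint32_t	flags;			/* passed through verbatim from the caller */
} SketchScanDesc;

/* Common constructor: keeps the two flag words in separate fields */
static SketchScanDesc
sketch_beginscan_common(uint32_t internal_flags, uint32_t flags)
{
	SketchScanDesc scan = {.internal_flags = internal_flags, .flags = flags};

	/* Only known internal bits may be set by the wrapper */
	assert((internal_flags & ~SKETCH_SO_VALID_BITS) == 0);
	return scan;
}

/* Parallel-style wrapper: derives its own bits, forwards the caller's */
static SketchScanDesc
sketch_beginscan_parallel(bool snapshot_restored, uint32_t flags)
{
	uint32_t	internal_flags = SKETCH_SO_TYPE_SEQSCAN | SKETCH_SO_ALLOW_PAGEMODE;

	/* A snapshot restored from shared memory must be released at scan end */
	if (snapshot_restored)
		internal_flags |= SKETCH_SO_TEMP_SNAPSHOT;

	return sketch_beginscan_common(internal_flags, flags);
}
```

The two namespaces in this sketch both use bit 0. Keeping them in separate fields therefore prevents a caller's hint from being misread as a scan-type bit. That is one plausible reason to keep the words apart, though not necessarily the patch's.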

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..66726b22de6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,11 +80,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -762,7 +763,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									0,	/* flags */
+									SnapshotAny, NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +774,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..b099d956e41 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,6 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
+										 0, /* flags */
 										 snapshot, NULL, nkeys, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
@@ -716,6 +717,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
+									 0, /* flags */
 									 snapshot, NULL, nkeys, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..63d5daadca6 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -255,6 +255,7 @@ index_insert_cleanup(Relation indexRelation,
 IndexScanDesc
 index_beginscan(Relation heapRelation,
 				Relation indexRelation,
+				uint32 flags,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
 				int nkeys, int norderbys)
@@ -284,7 +285,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +594,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +616,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..7a12e808b07 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,10 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +176,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +185,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +207,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +216,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..29d7c3514b6 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 219f604df7b..ec9bbfe554a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22882,7 +22882,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23346,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..c46beedeb71 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, 0,	/* flags */
+								 &DirtySnapshot, NULL, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..23509771557 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   &snap, NULL, skey_attoff, 0);
 
 retry:
 	found = false;
@@ -383,7 +385,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +668,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   SnapshotAny, NULL, skey_attoff, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..f733be0220c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -92,6 +92,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -794,7 +795,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -860,7 +861,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..1a101df492b 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -110,6 +110,7 @@ IndexNext(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -206,6 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1730,7 +1732,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1794,7 +1796,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..0528f8166d8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7177,6 +7177,7 @@ get_actual_variable_endpoint(Relation heapRel,
 							  GlobalVisTestFor(heapRel));
 
 	index_scan = index_beginscan(heapRel, indexRel,
+								 0, /* flags */
 								 &SnapshotNonVacuumable, NULL,
 								 1, 0);
 	/* Set it up for index-only scan */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b98c20a0edc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,6 +156,7 @@ extern void index_insert_cleanup(Relation indexRelation,
 
 extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
+									 uint32 flags,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e123dda090f..c6aec63a505 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..8357d05d83b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,18 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +910,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +945,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -939,11 +956,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +975,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1000,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1013,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1059,12 +1078,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1159,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1170,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1197,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1209,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



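The 0007 plumbing above splits scan flags into two groups. Internal flags are set by the table_beginscan_* wrappers; caller-supplied hints are asserted to lie outside SO_INTERNAL_FLAGS. The sketch below models that split in isolation. The SK_* names are hypothetical stand-ins, and most bit positions are assumed. Only 1 << 9 (SO_TEMP_SNAPSHOT) and 1 << 10 (SO_HINT_REL_READ_ONLY, added in 0008) come from the patch series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for a few ScanOptions members.  Only the
 * 1 << 9 and 1 << 10 positions are taken from the patches; the others
 * are assumed for illustration.
 */
#define SK_TYPE_SEQSCAN			(1u << 0)
#define SK_ALLOW_PAGEMODE		(1u << 8)
#define SK_TEMP_SNAPSHOT		(1u << 9)
#define SK_HINT_REL_READ_ONLY	(1u << 10)

/* Bits only the begin-scan wrappers may set, like SO_INTERNAL_FLAGS */
#define SK_INTERNAL_FLAGS \
	(SK_TYPE_SEQSCAN | SK_ALLOW_PAGEMODE | SK_TEMP_SNAPSHOT)

/*
 * Mirrors the table_beginscan_common() pattern: the wrapper supplies
 * internal_flags, the executor supplies user_flags, and an assertion
 * rejects caller flags that overlap the internal mask.
 */
static uint32_t
sketch_merge_scan_flags(uint32_t internal_flags, uint32_t user_flags)
{
	assert((user_flags & SK_INTERNAL_FLAGS) == 0);
	return internal_flags | user_flags;
}

/* A seqscan-style wrapper: fixed internal bits plus caller hints */
static uint32_t
sketch_seqscan_flags(uint32_t user_flags)
{
	return sketch_merge_scan_flags(SK_TYPE_SEQSCAN | SK_ALLOW_PAGEMODE,
								   user_flags);
}
```

In this model, calling sketch_seqscan_flags(SK_TEMP_SNAPSHOT) would trip the assertion in an assert-enabled build. That is the kind of caller mistake the SO_INTERNAL_FLAGS check is meant to catch.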
  [text/x-patch] v43-0008-Pass-down-information-on-table-modification-to-s.patch (11.3K, 9-v43-0008-Pass-down-information-on-table-modification-to-s.patch)
From 1648a5f09de8bacd005d383469436f21a73ced7d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v43 08/10] Pass down information on table modification to scan
 node

Pass down to the sequential scan, index [only] scan, bitmap table scan,
sample scan, and TID range scan nodes whether or not the query modifies
the relation being scanned. A later commit will use this information to
update the VM during on-access pruning only when the relation is not
modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/nodeBitmapHeapscan.c |  6 +++++-
 src/backend/executor/nodeIndexonlyscan.c  | 15 ++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++----
 src/backend/executor/nodeSamplescan.c     |  5 ++++-
 src/backend/executor/nodeSeqscan.c        | 18 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 15 ++++++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           | 10 ++++++++++
 8 files changed, 75 insertions(+), 15 deletions(-)

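Below is a minimal sketch of how each scan node in this patch derives the hint. It uses a plain 64-bit mask as a stand-in for the Bitmapset in PlannedStmt->modifiedRelids, so range-table indexes are limited to 1..63 here. The helpers are hypothetical. They only mirror the shape of ScanRelIsReadOnly() and the "ScanRelIsReadOnly(...) ? SO_HINT_REL_READ_ONLY : 0" expression repeated in every node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_HINT_REL_READ_ONLY	(1u << 10)

/* Hypothetical stand-in for a Bitmapset of range-table indexes (1..63) */
typedef uint64_t SketchRelidSet;

static bool
sketch_set_is_member(int relid, SketchRelidSet set)
{
	assert(relid > 0 && relid < 64);
	return ((set >> relid) & 1) != 0;
}

/* Same shape as ScanRelIsReadOnly(): read-only unless scanrelid is modified */
static bool
sketch_scan_rel_is_read_only(int scanrelid, SketchRelidSet modified_relids)
{
	return !sketch_set_is_member(scanrelid, modified_relids);
}

/* How a scan node turns that answer into the flag passed to table_beginscan* */
static uint32_t
sketch_scan_hint_flags(int scanrelid, SketchRelidSet modified_relids)
{
	return sketch_scan_rel_is_read_only(scanrelid, modified_relids) ?
		SK_HINT_REL_READ_ONLY : 0;
}
```

Consider, for example, a query whose modified relation is range-table entry 2. Presumably only a scan of entry 2 loses the hint, and scans of other relations joined into the query keep it.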
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..aec92c868ac 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,15 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f733be0220c..de9db45322c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,9 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -92,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -795,7 +798,10 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -861,7 +867,10 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 1a101df492b..9df4a699504 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,13 +104,16 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -201,13 +204,16 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1732,7 +1738,9 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1796,7 +1804,9 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..71c70e5e5c7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,16 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&scanstate->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..79470e6b9b5 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,17 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +372,18 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +413,12 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..4257afd96ed 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,13 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+				SO_HINT_REL_READ_ONLY : 0;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +455,18 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +496,12 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8357d05d83b..487e38292fa 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..31c4192b67e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,16 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+/*
+ * Return true if the scan node's relation is not modified by the query.
+ */
+static inline bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !bms_is_member(((Scan *) ss->ps.plan)->scanrelid,
+						  ss->ps.state->es_plannedstmt->modifiedRelids);
+}
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0



  [text/x-patch] v43-0009-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.1K, 10-v43-0009-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From a8ec9732ff892dff8146a1d0e637dd30de2dcf53 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v43 09/10] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on access can avoid the write amplification
that occurs when vacuum later has to set the page all-visible, which
dirties the page, triggers a write, and potentially emits an FPI. It
also benefits index-only scans, which can skip heap fetches only for
pages marked all-visible in the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              | 16 +++++++--
 5 files changed, 54 insertions(+), 19 deletions(-)

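After this patch, the decision in heap_page_will_set_vm() reduces to a pure function over a few booleans. The sketch below is simplified. The struct and enum are hypothetical, and buffer_dirty and needs_fpi stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup(). The real function also clears set_all_visible/set_all_frozen on the early exit and computes new_vmbits, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the two PruneReason values that matter here */
typedef enum
{
	SK_PRUNE_ON_ACCESS,
	SK_PRUNE_VACUUM_SCAN
} SketchPruneReason;

typedef struct
{
	bool		attempt_set_vm;		/* HEAP_PAGE_PRUNE_SET_VM was passed */
	bool		set_all_visible;	/* all tuples visible to everyone */
	bool		buffer_dirty;		/* stand-in for BufferIsDirty() */
	bool		needs_fpi;			/* stand-in for XLogCheckBufferNeedsBackup() */
} SketchPruneState;

static bool
sketch_will_set_vm(const SketchPruneState *st, SketchPruneReason reason,
				   bool do_prune, bool do_freeze)
{
	/* Caller did not ask for it, or the page is not all-visible */
	if (!st->attempt_set_vm || !st->set_all_visible)
		return false;

	/*
	 * On access with nothing else to do: skip if setting the VM would
	 * newly dirty the heap page or force a full-page image.
	 */
	if (reason == SK_PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!st->buffer_dirty || st->needs_fpi))
		return false;

	return true;
}
```

Vacuum always passes HEAP_PAGE_PRUNE_SET_VM and never takes the on-access early exit. On-access pruning gets the option only when the scan carried SO_HINT_REL_READ_ONLY.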
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 66726b22de6..651efa0127a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2545,7 +2546,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 41bfb6711c1..235d21c1a41 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -253,7 +256,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +340,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +398,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +468,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -919,21 +925,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning or
+	 * freezing, avoid setting the visibility map if doing so would newly
+	 * dirty the heap page or, if the page is already dirty, would require
+	 * including a full-page image (FPI) of the heap page in the WAL. This
+	 * situation should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1165,7 +1187,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c6aec63a505..90ca5a2cfa8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -95,7 +96,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. Used by
+	 * sequential scans, bitmap heap scans, TID range scans, and sample scans.
+	 * If the relation is not modified by the query, on-access pruning may set
+	 * the VM.
+	 */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
@@ -126,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
@@ -431,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
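
The on-access heuristic that the patch above adds to heap_page_will_set_vm() can be modeled in isolation. This is a minimal sketch under stated assumptions, not the patch's code: SketchPruneState, SketchPruneReason and sketch_will_set_vm() are hypothetical names, and two plain booleans stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-ins for PruneState and PruneReason; these
 * are not PostgreSQL's real types.
 */
typedef enum
{
	SKETCH_PRUNE_ON_ACCESS,
	SKETCH_PRUNE_VACUUM_SCAN
} SketchPruneReason;

#define SKETCH_VM_ALL_VISIBLE	0x01
#define SKETCH_VM_ALL_FROZEN	0x02

typedef struct
{
	bool		attempt_set_vm;		/* caller passed a SET_VM-style option */
	bool		set_all_visible;	/* tuple scan found the page all-visible */
	bool		set_all_frozen;		/* tuple scan found the page all-frozen */
	bool		buffer_dirty;		/* assumption: stands in for BufferIsDirty() */
	bool		needs_fpi;			/* assumption: stands in for XLogCheckBufferNeedsBackup() */
	unsigned	new_vmbits;
} SketchPruneState;

/* Returns true if one or both VM bits should be set. */
static bool
sketch_will_set_vm(SketchPruneState *st, SketchPruneReason reason,
				   bool do_prune, bool do_freeze)
{
	/* all-frozen implies all-visible */
	assert(!st->set_all_frozen || st->set_all_visible);

	st->new_vmbits = 0;
	if (!st->attempt_set_vm || !st->set_all_visible)
		return false;

	/*
	 * On-access, with nothing else to change on the heap page: skip setting
	 * the VM if it would newly dirty the page or force a full-page image.
	 */
	if (reason == SKETCH_PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!st->buffer_dirty || st->needs_fpi))
	{
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return false;
	}

	st->new_vmbits = SKETCH_VM_ALL_VISIBLE;
	if (st->set_all_frozen)
		st->new_vmbits |= SKETCH_VM_ALL_FROZEN;
	return true;
}
```

In this model, a clean page visited on-access with nothing to prune or freeze is left alone, while the same page reached through a vacuum scan (or an on-access call that prunes anyway) still gets its VM bits.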



  [text/x-patch] v43-0010-Set-pd_prune_xid-on-insert.patch (8.8K, 11-v43-0010-Set-pd_prune_xid-on-insert.patch)
From 0d4edeee146d7b7f24efa1de4d7ded1e0f5c5111 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v43 10/10] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set a page all-visible in
the VM the first time the page is read after being filled with newly
inserted tuples. The page can therefore be set all-visible while it is
still in shared buffers, avoiding the potential I/O amplification of
vacuum later having to read the page just to set it all-visible. It also
lets index-only scans benefit from newly inserted data much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 40 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++------
 3 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..ba11bbc03a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple on this page is UPDATEd/DELETEd, the aborted
+	 * tuple would otherwise not be pruned until the next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 235d21c1a41..aa9221f5eb6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -275,7 +275,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts added new tuples that may now be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1916,17 +1917,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0
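
The conditions under which the patch above sets pd_prune_xid on insert can be sketched as follows. This is a simplified model, not PostgreSQL code: all sketch_* names are hypothetical, XIDs 0, 1 and 2 play the roles of the invalid, bootstrap and frozen XIDs, and sketch_page_set_prunable() keeps the oldest hint XID, which is our understanding of how PageSetPrunable() behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

/* 0 = invalid, 1 = bootstrap, 2 = frozen; normal XIDs start at 3. */
#define SKETCH_INVALID_XID		0
#define SKETCH_FIRST_NORMAL_XID	3

typedef struct
{
	SketchXid	pd_prune_xid;	/* stand-in for the page header field */
} SketchPage;

static bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* Modulo-2^32 "a is older than b", valid for normal XIDs only. */
static bool
sketch_xid_precedes(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) < 0;
}

/* Keep the oldest XID that might make the page worth pruning. */
static void
sketch_page_set_prunable(SketchPage *page, SketchXid xid)
{
	assert(sketch_xid_is_normal(xid));
	if (page->pd_prune_xid == SKETCH_INVALID_XID ||
		sketch_xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/* heap_insert()-style rule: no hint in bootstrap mode or for frozen inserts. */
static void
sketch_after_insert(SketchPage *page, SketchXid xid, bool insert_frozen)
{
	if (sketch_xid_is_normal(xid) && !insert_frozen)
		sketch_page_set_prunable(page, xid);
}
```

Keeping the oldest XID means the hint becomes actionable as soon as the earliest inserter is old enough; later inserts on the same page do not push the hint forward.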




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-20 23:37                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-22 19:58                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-23 21:54                                 ` Melanie Plageman <[email protected]>
  2026-03-24 06:53                                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-24 17:53                                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  0 siblings, 2 replies; 34+ messages in thread

From: Melanie Plageman @ 2026-03-23 21:54 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Sun, Mar 22, 2026 at 3:58 PM Melanie Plageman
<[email protected]> wrote:
>
> I've pushed the first two patches. Attached are the remaining 10. No
> changes were made to those from the previous version.

I'm planning on pushing 0001-0005 in the morning.

I've made some significant changes to 0006 and realized I need some
help. 0006 tracks what relations are modified by a query. This new
version (v44) uses relation oids instead of rt indexes to handle cases
where the same relation appears more than once in the range table
(e.g. INSERT INTO foo SELECT * FROM foo; foo appears twice). It
computes modifiedRelOids (a list of relation OIDs modified by the
query) in the planner and stores them in the PlannedStmt. There is one
big issue I'm not sure how to solve:

For queries like INSERT INTO ptable SELECT * FROM ptable, where ptable
is a partitioned table, we scan ptable's leaf partitions, but when
executing that scan we don't know that the insert will then modify
ptable.

In my patch, I've added find_all_inheritors() when populating
modifiedRelOids, but I realize this probably isn't acceptable to add
to planner from a performance perspective.

I'm looking for other ways to solve the problem. Now, for my use case
(setting the VM), we don't mind setting the VM during the table scan
part of the query. Whatever page gets the inserted tuple will clear
all-visible -- but that is just one page out of many. However, future
users of modifiedRelOids will likely expect it to contain all modified
relation oids.

I could also check when setting up the scan descriptor if the leaf
partition's parents (would have to check full ancestry) are in
modifiedRelOids. This also doesn't address the problem of future users
thinking modifiedRelOids is complete.

Note that it also means partitions that aren't modified will be
included in modifiedRelOids if one of the partitions is being
modified.

I could also just change the name of modifiedRelOids to something
that doesn't make future users think it's exhaustive.
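
To make the ancestry option concrete, here is a toy model of checking at scan setup whether a relation or any of its ancestors appears in the modified set. It is only a sketch under assumptions: relations are small integer OIDs, a flat parent_of array stands in for the partitioning catalogs, and SketchCatalog and sketch_rel_or_ancestor_modified() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int SketchOid;

#define SKETCH_INVALID_OID 0

/* Hypothetical catalog: parent_of[i] is the parent OID of relation i (0 = none). */
typedef struct
{
	const SketchOid *parent_of;
	size_t		nrels;
} SketchCatalog;

static bool
sketch_oid_in_list(SketchOid oid, const SketchOid *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i] == oid)
			return true;
	return false;
}

/*
 * True if relid or any ancestor is in modified_oids. Walks the full
 * ancestry, since the target may sit several levels above the leaf.
 */
static bool
sketch_rel_or_ancestor_modified(const SketchCatalog *cat, SketchOid relid,
								const SketchOid *modified_oids,
								size_t nmodified)
{
	while (relid != SKETCH_INVALID_OID)
	{
		assert(relid < cat->nrels);
		if (sketch_oid_in_list(relid, modified_oids, nmodified))
			return true;
		relid = cat->parent_of[relid];
	}
	return false;
}
```

With a root partitioned table 1, an intermediate partitioned table 2 under it, and leaves 3 (under 2) and 4 (under 1), listing only OID 1 as modified makes both leaves report true, including whichever leaf the insert never touches. The toy model shows the same over-inclusion of unmodified partitions that the message describes.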

- Melanie


Attachments:

  [text/x-patch] v44-0001-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (18.4K, 2-v44-0001-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From cb89cc1c911f74f66d7febe69cbef95cef5c614e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v44 01/10] Use GlobalVisState in vacuum to determine page
 level visibility

During vacuum's first and third phases, we examine tuples' visibility to
determine if we can set the page all-visible in the visibility map.

Previously, this check compared tuple xmins against a single XID chosen
at the start of vacuum (OldestXmin). We now use GlobalVisState, which
also enables future work to set the VM during on-access pruning, since
ordinary queries have access to GlobalVisState but not OldestXmin.

This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.

OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare cases where GlobalVisState moves
backwards.

Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. Therefore, we perform the
GlobalVisState check only once per page. This is safe because
visibility_cutoff_xid records the newest live xmin on the page; if it is
globally visible, then the entire page is all-visible.

Using GlobalVisState means on-access pruning can also maintain
visibility_cutoff_xid, which is required to set the visibility map
on-access in the future.
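
The once-per-page check described above can be sketched as: track the newest committed xmin while scanning, then ask the visibility test a single question at the end. This is a toy model, not the patch's code: SketchVisState reduces GlobalVisState to one horizon plus a fresher value that may be adopted when updates are allowed, all sketch_* names are hypothetical, and the other all-visible checks (commit status, LP_DEAD items) are assumed to have passed already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t SketchXid;

#define SKETCH_INVALID_XID		0
#define SKETCH_FIRST_NORMAL_XID	3

/* Hypothetical: XIDs older than 'horizon' are no longer running anywhere. */
typedef struct
{
	SketchXid	horizon;
	SketchXid	latest_horizon;	/* fresher value, used only if allowed */
} SketchVisState;

static bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* Modulo-2^32 "a is newer than b", valid for normal XIDs only. */
static bool
sketch_xid_follows(SketchXid a, SketchXid b)
{
	return (int32_t) (a - b) > 0;
}

static bool
sketch_xid_considered_running(SketchVisState *vis, SketchXid xid,
							  bool allow_update)
{
	if (sketch_xid_follows(vis->horizon, xid))
		return false;
	if (!allow_update)
		return true;
	vis->horizon = vis->latest_horizon;	/* refresh once, then retest */
	return !sketch_xid_follows(vis->horizon, xid);
}

/* xmins: committed inserters of the live tuples on one page. */
static bool
sketch_page_all_visible(SketchVisState *vis, const SketchXid *xmins,
						size_t n, bool allow_update,
						SketchXid *visibility_cutoff_xid)
{
	*visibility_cutoff_xid = SKETCH_INVALID_XID;
	for (size_t i = 0; i < n; i++)
	{
		/* Track the newest normal xmin; frozen/bootstrap xmins are skipped. */
		if (sketch_xid_is_normal(xmins[i]) &&
			(*visibility_cutoff_xid == SKETCH_INVALID_XID ||
			 sketch_xid_follows(xmins[i], *visibility_cutoff_xid)))
			*visibility_cutoff_xid = xmins[i];
	}

	/* If the newest xmin is visible to everyone, the older ones are too. */
	return !(sketch_xid_is_normal(*visibility_cutoff_xid) &&
			 sketch_xid_considered_running(vis, *visibility_cutoff_xid,
										   allow_update));
}
```

Passing allow_update as false keeps the horizon fixed, which is why a cross-checking caller such as the assertion-only heap_page_is_all_visible() in this patch wants it: the boundaries cannot move between the two checks it compares.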

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk#c755ef151507aba58471ffaca607e493
---
 src/backend/access/heap/heapam_visibility.c | 31 ++++++++++-
 src/backend/access/heap/pruneheap.c         | 54 ++++++++++---------
 src/backend/access/heap/vacuumlazy.c        | 60 ++++++++++++++-------
 src/backend/access/spgist/spgvacuum.c       |  2 +-
 src/backend/storage/ipc/procarray.c         | 19 ++++---
 src/include/access/heapam.h                 |  4 ++
 src/include/utils/snapmgr.h                 |  8 ++-
 7 files changed, 123 insertions(+), 55 deletions(-)

diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index fc64f4343ce..9a7bf331df7 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1131,6 +1131,32 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 	return res;
 }
 
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removal XIDs.
+ *
+ * If allow_update is true, the GlobalVisState boundaries may be updated. If
+ * it is false, they definitely will not be updated.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on
+ * the required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidConsideredRunning(GlobalVisState *state, TransactionId xid,
+								  bool allow_update)
+{
+	return !GlobalVisTestIsRemovableXid(state, xid, allow_update);
+}
+
 /*
  * Work horse for HeapTupleSatisfiesVacuum and similar routines.
  *
@@ -1354,7 +1380,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
 	{
 		Assert(TransactionIdIsValid(dead_after));
 
-		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+		if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after, true))
 			res = HEAPTUPLE_DEAD;
 	}
 	else
@@ -1420,7 +1446,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
 
 	/* Deleter committed, so tuple is dead if the XID is old enough. */
 	return GlobalVisTestIsRemovableXid(vistest,
-									   HeapTupleHeaderGetRawXmax(tuple));
+									   HeapTupleHeaderGetRawXmax(tuple),
+									   true);
 }
 
 /*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b383b0fca8b..8eb3afda4bf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -160,10 +160,13 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page.
-	 * The caller can use it as the conflict horizon, when setting the VM
-	 * bits.  It is only valid if we froze some tuples, and set_all_frozen is
-	 * true.
+	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+	 * is used after processing all tuples to determine if the page can be
+	 * considered all-visible (if the newest xmin is still considered running
+	 * by some snapshot, it cannot be). It is also used by the caller as the
+	 * conflict horizon when setting the VM bits, unless we froze all tuples
+	 * on the page (in which case the conflict xid was already included in the
+	 * WAL record).
 	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
@@ -281,7 +284,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 	 */
 	vistest = GlobalVisTestFor(relation);
 
-	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+	if (!GlobalVisTestIsRemovableXid(vistest, prune_xid, true))
 		return;
 
 	/*
@@ -1081,6 +1084,19 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 */
 	prune_freeze_plan(&prstate, off_loc);
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * amongst them may be considered running by any snapshot, the page cannot
+	 * be all-visible. This should be done before determining whether or not
+	 * to opportunistically freeze.
+	 */
+	if (prstate.set_all_visible &&
+		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(prstate.vistest,
+										  prstate.visibility_cutoff_xid,
+										  true))
+		prstate.set_all_visible = prstate.set_all_frozen = false;
+
 	/*
 	 * If checksums are enabled, calling heap_prune_satisfies_vacuum() while
 	 * checking tuple visibility information in prune_freeze_plan() may have
@@ -1283,7 +1299,7 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup)
 	 * if the GlobalVisState has been updated since the beginning of vacuuming
 	 * the relation.
 	 */
-	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+	if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after, true))
 		return HEAPTUPLE_DEAD;
 
 	return res;
@@ -1749,29 +1765,15 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 				}
 
 				/*
-				 * The inserter definitely committed.  But is it old enough
-				 * that everyone sees it as committed?  A FrozenTransactionId
-				 * is seen as committed to everyone.  Otherwise, we check if
-				 * there is a snapshot that considers this xid to still be
-				 * running, and if so, we don't consider the page all-visible.
+				 * The inserter definitely committed. But we don't know if it
+				 * is old enough that everyone sees it as committed. Later,
+				 * after processing all the tuples on the page, we'll check if
+				 * there is any snapshot that still considers the newest xid
+				 * on the page to be running. If so, we don't consider the
+				 * page all-visible.
 				 */
 				xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * For now always use prstate->cutoffs for this test, because
-				 * we only update 'set_all_visible' and 'set_all_frozen' when
-				 * freezing is requested. We could use
-				 * GlobalVisTestIsRemovableXid instead, if a non-freezing
-				 * caller wanted to set the VM bit.
-				 */
-				Assert(prstate->cutoffs);
-				if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
-
 				/* Track newest xmin on page. */
 				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
 					TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1a446050d85..797973d7bd0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -468,13 +468,14 @@ static void dead_items_cleanup(LVRelState *vacrel);
 
 #ifdef USE_ASSERT_CHECKING
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 TransactionId OldestXmin,
+									 GlobalVisState *vistest,
 									 bool *all_frozen,
 									 TransactionId *visibility_cutoff_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
-										   TransactionId OldestXmin,
+										   GlobalVisState *vistest,
+										   bool allow_update_vistest,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
@@ -2089,7 +2090,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		Assert(presult.lpdead_items == 0);
 
 		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+										vacrel->vistest, &debug_all_frozen,
 										&debug_cutoff, &vacrel->offnum));
 
 		Assert(presult.set_all_frozen == debug_all_frozen);
@@ -2852,7 +2853,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	 * done outside the critical section.
 	 */
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
-									   vacrel->cutoffs.OldestXmin,
+									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
 									   &all_frozen, &visibility_cutoff_xid,
 									   &vacrel->offnum))
@@ -3614,14 +3615,19 @@ dead_items_cleanup(LVRelState *vacrel)
  */
 static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
-						 TransactionId OldestXmin,
+						 GlobalVisState *vistest,
 						 bool *all_frozen,
 						 TransactionId *visibility_cutoff_xid,
 						 OffsetNumber *logging_offnum)
 {
-
+	/*
+	 * Pass allow_update_vistest as false so that the GlobalVisState
+	 * boundaries used here match those used by the pruning code we are
+	 * cross-checking. Allowing an update could move the boundaries between
+	 * the two calls, causing a spurious assertion failure.
+	 */
 	return heap_page_would_be_all_visible(rel, buf,
-										  OldestXmin,
+										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
 										  visibility_cutoff_xid,
@@ -3642,7 +3648,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Returns true if the page is all-visible other than the provided
  * deadoffsets and false otherwise.
  *
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility. If allow_update_vistest is true,
+ * the boundaries of the GlobalVisState may be updated when checking the
+ * visibility of the newest live XID on the page.
  *
  * Output parameters:
  *
@@ -3661,7 +3669,8 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  */
 static bool
 heap_page_would_be_all_visible(Relation rel, Buffer buf,
-							   TransactionId OldestXmin,
+							   GlobalVisState *vistest,
+							   bool allow_update_vistest,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
@@ -3742,7 +3751,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 				{
 					TransactionId xmin;
 
-					/* Check comments in lazy_scan_prune. */
+					/* Check heap_prune_record_unchanged_lp_normal comments */
 					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
 					{
 						all_visible = false;
@@ -3751,16 +3760,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					}
 
 					/*
-					 * The inserter definitely committed. But is it old enough
-					 * that everyone sees it as committed?
+					 * The inserter definitely committed. But we don't know if
+					 * it is old enough that everyone sees it as committed.
+					 * Don't check that now.
+					 *
+					 * If we scan all tuples without finding one that prevents
+					 * the page from being all-visible, we then check whether
+					 * any snapshot still considers the newest XID on the page
+					 * to be running. In that case, the page is not considered
+					 * all-visible.
 					 */
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-					if (!TransactionIdPrecedes(xmin, OldestXmin))
-					{
-						all_visible = false;
-						*all_frozen = false;
-						break;
-					}
 
 					/* Track newest xmin on page. */
 					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3789,6 +3799,20 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 		}
 	}							/* scan along page */
 
+	/*
+	 * After processing all the live tuples on the page, if the newest xmin
+	 * among them may still be considered running by any snapshot, the page
+	 * cannot be all-visible.
+	 */
+	if (all_visible &&
+		TransactionIdIsNormal(*visibility_cutoff_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+										  allow_update_vistest))
+	{
+		all_visible = false;
+		*all_frozen = false;
+	}
+
 	/* Clear the offset information once we have processed the given page. */
 	*logging_offnum = InvalidOffsetNumber;
 
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 6b7117b56b2..c461f8dc02d 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
 		 */
 		if (dt->tupstate == SPGIST_REDIRECT &&
 			(!TransactionIdIsValid(dt->xid) ||
-			 GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+			 GlobalVisTestIsRemovableXid(vistest, dt->xid, true)))
 		{
 			dt->tupstate = SPGIST_PLACEHOLDER;
 			Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 0f913897acc..27e5adeebfb 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4223,11 +4223,17 @@ GlobalVisUpdate(void)
  * The state passed needs to have been initialized for the relation fxid is
  * from (NULL is also OK), otherwise the result may not be correct.
  *
+ * If allow_update is false, the GlobalVisState boundaries will not be updated
+ * even if it would otherwise be beneficial. This is useful for callers that
+ * do not want GlobalVisState to advance at all, for example because they need
+ * a conservative answer based on the current boundaries.
+ *
  * See comment for GlobalVisState for details.
  */
 bool
 GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
-								FullTransactionId fxid)
+								FullTransactionId fxid,
+								bool allow_update)
 {
 	/*
 	 * If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4248,7 +4254,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
 	 * might not exist a snapshot considering fxid running. If it makes sense,
 	 * update boundaries and recheck.
 	 */
-	if (GlobalVisTestShouldUpdate(state))
+	if (allow_update && GlobalVisTestShouldUpdate(state))
 	{
 		GlobalVisUpdate();
 
@@ -4268,7 +4274,8 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
  * relfrozenxid).
  */
 bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid,
+							bool allow_update)
 {
 	FullTransactionId fxid;
 
@@ -4282,7 +4289,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
 	 */
 	fxid = FullXidRelativeTo(state->definitely_needed, xid);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, allow_update);
 }
 
 /*
@@ -4296,7 +4303,7 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableFullXid(state, fxid);
+	return GlobalVisTestIsRemovableFullXid(state, fxid, true);
 }
 
 /*
@@ -4310,7 +4317,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
 
 	state = GlobalVisTestFor(rel);
 
-	return GlobalVisTestIsRemovableXid(state, xid);
+	return GlobalVisTestIsRemovableXid(state, xid, true);
 }
 
 /*
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 305ecc31a9e..ca5e8d1794f 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -480,6 +480,10 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
 										  Buffer buffer);
 extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 											Buffer buffer);
+
+extern bool GlobalVisTestXidConsideredRunning(GlobalVisState *state,
+											  TransactionId xid,
+											  bool allow_update);
 extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
 												   TransactionId *dead_after);
 extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 8c919d2640e..c7a869bc2b2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -115,8 +115,12 @@ extern char *ExportSnapshot(Snapshot snapshot);
  */
 typedef struct GlobalVisState GlobalVisState;
 extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state,
+										TransactionId xid,
+										bool allow_update);
+extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
+											FullTransactionId fxid,
+											bool allow_update);
 extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
 extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
 
-- 
2.43.0
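
The 0001 hunks above change how the page is judged. Instead of testing each live tuple's xmin against OldestXmin, they track the newest xmin and ask GlobalVisTestXidConsideredRunning() once, after the scan. The new allow_update argument lets a caller stop GlobalVisUpdate() from refreshing the boundaries.

What follows is a minimal, hypothetical model of that control flow, not PostgreSQL code:
- ToyVisState, toy_xid_considered_running() and toy_page_all_visible() are invented for illustration.
- 64-bit xids stand in for FullTransactionIds.
- The "refresh" simply adopts a precomputed horizon rather than recomputing it from the proc array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for GlobalVisState; 64-bit xids, so no wraparound. */
typedef struct ToyVisState
{
    uint64_t    maybe_needed;       /* xids below this are visible to all */
    uint64_t    definitely_needed;  /* xids at or above this may be running */
    uint64_t    fresh_horizon;      /* what a recomputation would return */
    int         nupdates;           /* number of boundary refreshes done */
} ToyVisState;

/* Stand-in for GlobalVisUpdate(): adopt the freshly computed horizon. */
static void
toy_vis_update(ToyVisState *state)
{
    state->maybe_needed = state->fresh_horizon;
    state->definitely_needed = state->fresh_horizon;
    state->nupdates++;
}

/* true if some snapshot may still consider xid running */
static bool
toy_xid_considered_running(ToyVisState *state, uint64_t xid, bool allow_update)
{
    if (xid < state->maybe_needed)
        return false;           /* definitely visible to everyone */
    if (xid >= state->definitely_needed)
        return true;            /* definitely still needed */

    /* In between: only refreshed boundaries can settle it. */
    if (allow_update)
    {
        toy_vis_update(state);
        return xid >= state->maybe_needed;
    }
    return true;                /* conservative answer, boundaries untouched */
}

/*
 * Track the newest xmin of the (already committed) live tuples and test
 * it once at the end, instead of testing every xmin against a cutoff.
 */
static bool
toy_page_all_visible(ToyVisState *state, const uint64_t *xmins, int n,
                     bool allow_update, uint64_t *newest_live_xid)
{
    *newest_live_xid = 0;       /* 0 plays the role of InvalidTransactionId */

    for (int i = 0; i < n; i++)
    {
        assert(xmins[i] != 0);
        if (xmins[i] > *newest_live_xid)
            *newest_live_xid = xmins[i];
    }

    if (*newest_live_xid != 0 &&
        toy_xid_considered_running(state, *newest_live_xid, allow_update))
        return false;
    return true;
}
```

With allow_update false, an xid that falls between the two boundaries is conservatively treated as running. This mirrors the calls in the patch: heap_page_is_all_visible() passes false to heap_page_would_be_all_visible(), while lazy_vacuum_heap_page() passes true.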



  [text/x-patch] v44-0002-Keep-newest-live-XID-up-to-date-even-if-page-not.patch (15.6K, 3-v44-0002-Keep-newest-live-XID-up-to-date-even-if-page-not.patch)
From 0968ef2bc8aabd448a3ce97365f74859d83cb68d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 28 Feb 2026 16:06:51 -0500
Subject: [PATCH v44 02/10] Keep newest live XID up-to-date even if page not
 all-visible

During pruning, we keep track of the newest xmin of live tuples on the
page visible to all running and future transactions so that we can use
it later as the snapshot conflict horizon when setting the VM if the
page turns out to be all-visible.

Previously, we stopped updating this value once we determined the page
was not all-visible. However, maintaining it even when the page is not
all-visible is inexpensive and makes the snapshot conflict horizon
calculation clearer. This guarantees it won't contain a stale value.

Since we'll keep it up to date all the time now anyway, there's no
reason not to maintain set_all_visible for on-access pruning. This will
allow us to set the VM on-access in the future.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/bqc4kh5midfn44gnjiqez3bjqv4zogydguvdn446riw45jcf3y%404ez66il7ebvk
---
 src/backend/access/heap/pruneheap.c  | 138 +++++++++++----------------
 src/backend/access/heap/vacuumlazy.c |  30 +++---
 2 files changed, 73 insertions(+), 95 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 8eb3afda4bf..301fcfe7024 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -129,6 +129,9 @@ typedef struct
 	/* Bits in the vmbuffer for this heap page */
 	uint8		old_vmbits;
 
+	/* The newest xmin of live tuples on the page */
+	TransactionId newest_live_xid;
+
 	/*-------------------------------------------------------
 	 * Information about what was done
 	 *
@@ -160,14 +163,6 @@ typedef struct
 	 * all-frozen bits in the visibility map can be set for this page after
 	 * pruning.
 	 *
-	 * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
-	 * is used after processing all tuples to determine if the page can be
-	 * considered all-visible (if the newest xmin is still considered running
-	 * by some snapshot, it cannot be). It is also used by the caller as the
-	 * conflict horizon when setting the VM bits, unless we froze all tuples
-	 * on the page (in which case the conflict xid was already included in the
-	 * WAL record).
-	 *
 	 * NOTE: set_all_visible and set_all_frozen initially don't include
 	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
 	 * use them to decide whether to freeze the page or not.  The
@@ -177,7 +172,6 @@ typedef struct
 	 */
 	bool		set_all_visible;
 	bool		set_all_frozen;
-	TransactionId visibility_cutoff_xid;
 } PruneState;
 
 /*
@@ -458,53 +452,43 @@ prune_freeze_setup(PruneFreezeParams *params,
 	prstate->deadoffsets = presult->deadoffsets;
 
 	/*
-	 * Vacuum may update the VM after we're done.  We can keep track of
-	 * whether the page will be all-visible and all-frozen after pruning and
-	 * freezing to help the caller to do that.
-	 *
-	 * Currently, only VACUUM sets the VM bits.  To save the effort, only do
-	 * the bookkeeping if the caller needs it.  Currently, that's tied to
-	 * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
-	 * to update the VM bits without also freezing or freeze without also
-	 * setting the VM bits.
+	 * We track whether the page will be all-visible/all-frozen at the end of
+	 * pruning and freezing. While examining tuple visibility, we'll set
+	 * set_all_visible to false if there are tuples on the page not visible to
+	 * all running and future transactions. set_all_visible is always
+	 * maintained but only VACUUM will set the VM if the page ends up being
+	 * all-visible.
 	 *
-	 * In addition to telling the caller whether it can set the VM bit, we
-	 * also use 'set_all_visible' and 'set_all_frozen' for our own
-	 * decision-making. If the whole page would become frozen, we consider
-	 * opportunistically freezing tuples.  We will not be able to freeze the
-	 * whole page if there are tuples present that are not visible to everyone
-	 * or if there are dead tuples which are not yet removable.  However, dead
-	 * tuples which will be removed by the end of vacuuming should not
-	 * preclude us from opportunistically freezing.  Because of that, we do
-	 * not immediately clear set_all_visible and set_all_frozen when we see
-	 * LP_DEAD items.  We fix that after scanning the line pointers. We must
-	 * correct set_all_visible and set_all_frozen before we return them to the
-	 * caller, so that the caller doesn't set the VM bits incorrectly.
+	 * We also keep track of the newest live XID, which is used to calculate
+	 * the snapshot conflict horizon for a WAL record setting the VM.
 	 */
-	if (prstate->attempt_freeze)
-	{
-		prstate->set_all_visible = true;
-		prstate->set_all_frozen = true;
-	}
-	else
-	{
-		/*
-		 * Initializing to false allows skipping the work to update them in
-		 * heap_prune_record_unchanged_lp_normal().
-		 */
-		prstate->set_all_visible = false;
-		prstate->set_all_frozen = false;
-	}
+	prstate->set_all_visible = true;
+	prstate->newest_live_xid = InvalidTransactionId;
 
 	/*
-	 * The visibility cutoff xid is the newest xmin of live tuples on the
-	 * page.  In the common case, this will be set as the conflict horizon the
-	 * caller can use for updating the VM.  If, at the end of freezing and
-	 * pruning, the page is all-frozen, there is no possibility that any
-	 * running transaction on the standby does not see tuples on the page as
-	 * all-visible, so the conflict horizon remains InvalidTransactionId.
+	 * Currently, only VACUUM performs freezing, but other callers may in the
+	 * future. We must initialize set_all_frozen based on whether or not the
+	 * caller passed HEAP_PAGE_PRUNE_FREEZE, because if they did not, we won't
+	 * call heap_prepare_freeze_tuple() for each tuple, and set_all_frozen
+	 * will never be cleared for tuples that need freezing. This would lead to
+	 * incorrectly setting the visibility map all-frozen for this page.
+	 *
+	 * When freezing is not required (no XIDs/MXIDs older than the freeze
+	 * cutoff), we may still choose to "opportunistically" freeze if doing so
+	 * would make the page all-frozen.
+	 *
+	 * We will not be able to freeze the whole page at the end of vacuum if
+	 * there are tuples present that are not visible to everyone or if there
+	 * are dead tuples which will not be removable. However, dead tuples that
+	 * will be removed by the end of vacuum should not prevent this
+	 * opportunistic freezing.
+	 *
+	 * Therefore, we do not clear set_all_visible and set_all_frozen when we
+	 * encounter LP_DEAD items. Instead, we correct them after deciding
+	 * whether to freeze, but before updating the VM, to avoid setting the VM
+	 * bits incorrectly.
 	 */
-	prstate->visibility_cutoff_xid = InvalidTransactionId;
+	prstate->set_all_frozen = prstate->attempt_freeze;
 }
 
 /*
@@ -734,7 +718,6 @@ heap_page_will_freeze(bool did_tuple_hint_fpi,
 	if (!prstate->attempt_freeze)
 	{
 		Assert(!prstate->set_all_frozen && prstate->nfrozen == 0);
-		Assert(prstate->lpdead_items == 0 || !prstate->set_all_visible);
 		return false;
 	}
 
@@ -1012,9 +995,8 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
  * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
  * presult->set_all_visible and presult->set_all_frozen after determining
  * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set.  They are always set to false when the HEAP_PAGE_PRUNE_FREEZE
- * option is not passed, because at the moment only callers that also freeze
- * need that information.
+ * be set. 'set_all_frozen' is always set to false when the
+ * HEAP_PAGE_PRUNE_FREEZE option is not passed.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1091,9 +1073,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	 * to opportunistically freeze.
 	 */
 	if (prstate.set_all_visible &&
-		TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+		TransactionIdIsNormal(prstate.newest_live_xid) &&
 		GlobalVisTestXidConsideredRunning(prstate.vistest,
-										  prstate.visibility_cutoff_xid,
+										  prstate.newest_live_xid,
 										  true))
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
@@ -1245,7 +1227,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	if (presult->set_all_frozen)
 		presult->vm_conflict_horizon = InvalidTransactionId;
 	else
-		presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
@@ -1706,6 +1688,7 @@ static void
 heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 {
 	HeapTupleHeader htup;
+	TransactionId xmin;
 	Page		page = prstate->page;
 
 	Assert(!prstate->processed[offnum]);
@@ -1753,32 +1736,27 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			 * See SetHintBits for more info.  Check that the tuple is hinted
 			 * xmin-committed because of that.
 			 */
-			if (prstate->set_all_visible)
+			if (!HeapTupleHeaderXminCommitted(htup))
 			{
-				TransactionId xmin;
+				prstate->set_all_visible = false;
+				prstate->set_all_frozen = false;
+				break;
+			}
 
-				if (!HeapTupleHeaderXminCommitted(htup))
-				{
-					prstate->set_all_visible = false;
-					prstate->set_all_frozen = false;
-					break;
-				}
+			/*
+			 * The inserter definitely committed. But we don't know if it is
+			 * old enough that everyone sees it as committed. Later, after
+			 * processing all the tuples on the page, we'll check if there is
+			 * any snapshot that still considers the newest xid on the page to
+			 * be running. If so, we don't consider the page all-visible.
+			 */
+			xmin = HeapTupleHeaderGetXmin(htup);
 
-				/*
-				 * The inserter definitely committed. But we don't know if it
-				 * is old enough that everyone sees it as committed. Later,
-				 * after processing all the tuples on the page, we'll check if
-				 * there is any snapshot that still considers the newest xid
-				 * on the page to be running. If so, we don't consider the
-				 * page all-visible.
-				 */
-				xmin = HeapTupleHeaderGetXmin(htup);
+			/* Track newest xmin on page. */
+			if (TransactionIdFollows(xmin, prstate->newest_live_xid) &&
+				TransactionIdIsNormal(xmin))
+				prstate->newest_live_xid = xmin;
 
-				/* Track newest xmin on page. */
-				if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
-					TransactionIdIsNormal(xmin))
-					prstate->visibility_cutoff_xid = xmin;
-			}
 			break;
 
 		case HEAPTUPLE_RECENTLY_DEAD:
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 797973d7bd0..696919e35dd 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -470,7 +470,7 @@ static void dead_items_cleanup(LVRelState *vacrel);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 									 GlobalVisState *vistest,
 									 bool *all_frozen,
-									 TransactionId *visibility_cutoff_xid,
+									 TransactionId *newest_live_xid,
 									 OffsetNumber *logging_offnum);
 #endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
@@ -479,7 +479,7 @@ static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   OffsetNumber *deadoffsets,
 										   int ndeadoffsets,
 										   bool *all_frozen,
-										   TransactionId *visibility_cutoff_xid,
+										   TransactionId *newest_live_xid,
 										   OffsetNumber *logging_offnum);
 static void update_relstats_all_indexes(LVRelState *vacrel);
 static void vacuum_error_callback(void *arg);
@@ -2829,7 +2829,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxHeapTuplesPerPage];
 	int			nunused = 0;
-	TransactionId visibility_cutoff_xid;
+	TransactionId newest_live_xid;
 	TransactionId conflict_xid = InvalidTransactionId;
 	bool		all_frozen;
 	LVSavedErrInfo saved_err_info;
@@ -2855,14 +2855,14 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	if (heap_page_would_be_all_visible(vacrel->rel, buffer,
 									   vacrel->vistest, true,
 									   deadoffsets, num_offsets,
-									   &all_frozen, &visibility_cutoff_xid,
+									   &all_frozen, &newest_live_xid,
 									   &vacrel->offnum))
 	{
 		vmflags |= VISIBILITYMAP_ALL_VISIBLE;
 		if (all_frozen)
 		{
 			vmflags |= VISIBILITYMAP_ALL_FROZEN;
-			Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+			Assert(!TransactionIdIsValid(newest_live_xid));
 		}
 
 		/*
@@ -2903,7 +2903,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		visibilitymap_set_vmbits(blkno,
 								 vmbuffer, vmflags,
 								 vacrel->rel->rd_locator);
-		conflict_xid = visibility_cutoff_xid;
+		conflict_xid = newest_live_xid;
 	}
 
 	/*
@@ -3617,7 +3617,7 @@ static bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
-						 TransactionId *visibility_cutoff_xid,
+						 TransactionId *newest_live_xid,
 						 OffsetNumber *logging_offnum)
 {
 	/*
@@ -3630,7 +3630,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 										  vistest, false,
 										  NULL, 0,
 										  all_frozen,
-										  visibility_cutoff_xid,
+										  newest_live_xid,
 										  logging_offnum);
 }
 #endif
@@ -3655,7 +3655,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
  * Output parameters:
  *
  *  - *all_frozen: true if every tuple on the page is frozen
- *  - *visibility_cutoff_xid: newest xmin; valid only if page is all-visible
+ *  - *newest_live_xid: newest xmin of live tuples on the page
  *  - *logging_offnum: OffsetNumber of current tuple being processed;
  *     used by vacuum's error callback system.
  *
@@ -3674,7 +3674,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 							   OffsetNumber *deadoffsets,
 							   int ndeadoffsets,
 							   bool *all_frozen,
-							   TransactionId *visibility_cutoff_xid,
+							   TransactionId *newest_live_xid,
 							   OffsetNumber *logging_offnum)
 {
 	Page		page = BufferGetPage(buf);
@@ -3684,7 +3684,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	bool		all_visible = true;
 	int			matched_dead_count = 0;
 
-	*visibility_cutoff_xid = InvalidTransactionId;
+	*newest_live_xid = InvalidTransactionId;
 	*all_frozen = true;
 
 	Assert(ndeadoffsets == 0 || deadoffsets);
@@ -3773,9 +3773,9 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
 
 					/* Track newest xmin on page. */
-					if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
+					if (TransactionIdFollows(xmin, *newest_live_xid) &&
 						TransactionIdIsNormal(xmin))
-						*visibility_cutoff_xid = xmin;
+						*newest_live_xid = xmin;
 
 					/* Check whether this tuple is already frozen or not */
 					if (all_visible && *all_frozen &&
@@ -3805,8 +3805,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
 	 * cannot be all-visible.
 	 */
 	if (all_visible &&
-		TransactionIdIsNormal(*visibility_cutoff_xid) &&
-		GlobalVisTestXidConsideredRunning(vistest, *visibility_cutoff_xid,
+		TransactionIdIsNormal(*newest_live_xid) &&
+		GlobalVisTestXidConsideredRunning(vistest, *newest_live_xid,
 										  allow_update_vistest))
 	{
 		all_visible = false;
-- 
2.43.0
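
Patch 0002 updates newest_live_xid for every live tuple, whether or not set_all_visible is still true. The update uses TransactionIdFollows() together with TransactionIdIsNormal().

The standalone sketch below shows why both tests are needed. It assumes the usual 32-bit modular xid comparison from transam.c and the constants InvalidTransactionId = 0, FrozenTransactionId = 2 and FirstNormalTransactionId = 3. The two xid helpers are re-created here rather than linked. track_newest_live_xid() is a hypothetical helper, not a function in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId        ((TransactionId) 0)
#define FrozenTransactionId         ((TransactionId) 2)
#define FirstNormalTransactionId    ((TransactionId) 3)
#define TransactionIdIsNormal(xid)  ((xid) >= FirstNormalTransactionId)

/* Re-creation of the modular "id1 is newer than id2" test. */
static bool
TransactionIdFollows(TransactionId id1, TransactionId id2)
{
    int32_t     diff;

    /* Special xids (0, 1, 2) are compared as plain unsigned values. */
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;

    /* Normal xids live on a circle of 2^32; the signed distance decides. */
    diff = (int32_t) (id1 - id2);
    return diff > 0;
}

/* Hypothetical helper: same update rule as newest_live_xid in the patch. */
static TransactionId
track_newest_live_xid(const TransactionId *xmins, int n)
{
    TransactionId newest = InvalidTransactionId;

    for (int i = 0; i < n; i++)
    {
        TransactionId xmin = xmins[i];

        if (TransactionIdFollows(xmin, newest) && TransactionIdIsNormal(xmin))
            newest = xmin;
    }

    assert(newest == InvalidTransactionId || TransactionIdIsNormal(newest));
    return newest;
}
```

Each condition covers a different case:
- Comparing against InvalidTransactionId falls back to a plain unsigned comparison, so the first normal xmin always wins.
- The TransactionIdIsNormal() test keeps an xmin reported as FrozenTransactionId from being recorded.
- Normal xids compare modulo 2^32, so after wraparound 100 counts as newer than 4000000000.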



  [text/x-patch] v44-0003-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch (23.2K, 4-v44-0003-WAL-log-VM-setting-during-vacuum-phase-I-in-XLOG.patch)
From c440d93887ed97ee8ab42004da76417e29fa2a92 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v44 03/10] WAL log VM setting during vacuum phase I in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.

Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications. This reduces WAL volume produced by vacuum.

For now, this change applies only to vacuum phase I, not to pruning
performed during normal page access.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Earlier version Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 245 +++++++++++++++++++--------
 src/backend/access/heap/vacuumlazy.c | 113 ++----------
 src/include/access/heapam.h          |  37 ++--
 3 files changed, 205 insertions(+), 190 deletions(-)
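
Patch 0003 uses heap_page_will_set_vm() to decide whether the VM bits change. It then folds the VM's horizon into the single conflict_xid carried by the XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

The sketch below restates that decision and the newest-horizon selection outside PostgreSQL, under these assumptions:
- ToyPruneState, ToyPruneReason, toy_will_set_vm() and toy_conflict_xid() are hypothetical.
- The VISIBILITYMAP_* values assume the 0x01/0x02 bit layout.
- TransactionIdFollows() is re-created rather than linked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId        ((TransactionId) 0)
#define FirstNormalTransactionId    ((TransactionId) 3)
#define TransactionIdIsNormal(xid)  ((xid) >= FirstNormalTransactionId)
#define VISIBILITYMAP_ALL_VISIBLE   0x01
#define VISIBILITYMAP_ALL_FROZEN    0x02

/* Re-created modular xid comparison (newer than). */
static bool
TransactionIdFollows(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

typedef enum { TOY_PRUNE_ON_ACCESS, TOY_PRUNE_VACUUM_SCAN } ToyPruneReason;

typedef struct ToyPruneState
{
    bool        set_all_visible;
    bool        set_all_frozen;
    uint8_t     old_vmbits;     /* VM bits before pruning */
    uint8_t     new_vmbits;     /* bits to set, 0 if the VM is left alone */
} ToyPruneState;

/* Same decision shape as heap_page_will_set_vm() in the patch. */
static bool
toy_will_set_vm(ToyPruneState *ps, ToyPruneReason reason)
{
    assert(!ps->set_all_frozen || ps->set_all_visible);
    ps->new_vmbits = 0;

    /* On-access pruning tracks visibility but does not set the VM yet. */
    if (reason == TOY_PRUNE_ON_ACCESS || !ps->set_all_visible)
        return false;

    ps->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
    if (ps->set_all_frozen)
        ps->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;

    /* Nothing to do if the VM already holds exactly these bits. */
    if (ps->new_vmbits == ps->old_vmbits)
    {
        ps->new_vmbits = 0;
        return false;
    }
    return true;
}

/* One horizon for the whole record: the newest one any change needs. */
static TransactionId
toy_conflict_xid(bool do_set_vm, TransactionId newest_live_xid,
                 bool do_freeze, TransactionId freeze_conflict_xid,
                 bool do_prune, TransactionId latest_xid_removed)
{
    TransactionId conflict_xid = InvalidTransactionId;

    if (do_set_vm)
        conflict_xid = newest_live_xid;
    if (do_freeze && TransactionIdFollows(freeze_conflict_xid, conflict_xid))
        conflict_xid = freeze_conflict_xid;
    if (do_prune && TransactionIdFollows(latest_xid_removed, conflict_xid))
        conflict_xid = latest_xid_removed;
    return conflict_xid;
}
```

Because the function returns false when new_vmbits equals old_vmbits, an unchanged VM does not by itself force the record to be emitted. Only pruning or freezing can still do that.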

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 301fcfe7024..4d6d5e92773 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -72,6 +72,21 @@ typedef struct
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
 
+	/*
+	 * set_all_visible and set_all_frozen indicate if the all-visible and
+	 * all-frozen bits in the visibility map can be set for this page after
+	 * pruning.
+	 *
+	 * NOTE: set_all_visible and set_all_frozen initially don't include
+	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
+	 * use them to decide whether to opportunistically freeze the page or not.
+	 * The set_all_visible and set_all_frozen values ultimately used to set
+	 * the VM are adjusted to include LP_DEAD items after we determine whether
+	 * or not to opportunistically freeze.
+	 */
+	bool		set_all_visible;
+	bool		set_all_frozen;
+
 	/*-------------------------------------------------------
 	 * Working state for HOT chain processing
 	 *-------------------------------------------------------
@@ -122,12 +137,16 @@ typedef struct
 	/*
 	 * Caller must provide a pinned vmbuffer corresponding to the heap block
 	 * passed to heap_page_prune_and_freeze(). We will fix any corruption
-	 * found in the VM.
+	 * found in the VM and set the VM if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
-	/* Bits in the vmbuffer for this heap page */
+	/*
+	 * The state of the VM bits at the beginning of pruning and the state they
+	 * will be in at the end.
+	 */
 	uint8		old_vmbits;
+	uint8		new_vmbits;
 
 	/* The newest xmin of live tuples on the page */
 	TransactionId newest_live_xid;
@@ -157,21 +176,6 @@ typedef struct
 	 */
 	int			lpdead_items;	/* number of items in the array */
 	OffsetNumber *deadoffsets;	/* points directly to presult->deadoffsets */
-
-	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map can be set for this page after
-	 * pruning.
-	 *
-	 * NOTE: set_all_visible and set_all_frozen initially don't include
-	 * LP_DEAD items. That's convenient for heap_page_prune_and_freeze() to
-	 * use them to decide whether to freeze the page or not.  The
-	 * set_all_visible and set_all_frozen values returned to the caller are
-	 * adjusted to include LP_DEAD items after we determine whether to
-	 * opportunistically freeze.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
 } PruneState;
 
 /*
@@ -228,6 +232,7 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
 
 
 /*
@@ -395,6 +400,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 
 	Assert(BufferIsValid(params->vmbuffer));
 	prstate->vmbuffer = params->vmbuffer;
+	prstate->new_vmbits = 0;
 	prstate->old_vmbits = visibilitymap_get_status(prstate->relation,
 												   prstate->block,
 												   &prstate->vmbuffer);
@@ -907,6 +913,42 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
 	}
 }
 
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * Returns true if one or both VM bits should be set and false otherwise.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+{
+	/*
+	 * Though on-access pruning maintains prstate->set_all_visible, we don't
+	 * set the VM on-access for now.
+	 */
+	if (reason == PRUNE_ON_ACCESS)
+		return false;
+
+	if (!prstate->set_all_visible)
+		return false;
+
+	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+	if (prstate->set_all_frozen)
+		prstate->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+	if (prstate->new_vmbits == prstate->old_vmbits)
+	{
+		prstate->new_vmbits = 0;
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * If the page is already all-frozen, or already all-visible and freezing
  * won't be attempted, there is no remaining work and we can use the fast path
@@ -940,8 +982,6 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
 	/* We'll fill in presult for the caller */
 	memset(presult, 0, sizeof(PruneFreezeResult));
 
-	presult->old_vmbits = prstate->old_vmbits;
-
 	/* Clear any stale prune hint */
 	if (TransactionIdIsValid(PageGetPruneXid(page)))
 	{
@@ -976,7 +1016,8 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
 
 /*
  * Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
  *
  * Caller must have pin and buffer cleanup lock on the page.  Note that we
  * don't update the FSM information for page on caller's behalf.  Caller might
@@ -991,12 +1032,10 @@ prune_freeze_fast_path(PruneState *prstate, PruneFreezeResult *presult)
  * tuples if it's required in order to advance relfrozenxid / relminmxid, or
  * if it's considered advantageous for overall system performance to do so
  * now.  The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing.  When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set
- * presult->set_all_visible and presult->set_all_frozen after determining
- * whether or not to opportunistically freeze, to indicate if the VM bits can
- * be set. 'set_all_frozen' is always set to false when the
- * HEAP_PAGE_PRUNE_FREEZE option is not passed.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * A vmbuffer corresponding to the heap page is also passed and if the page is
+ * found to be all-visible/all-frozen, we will set it in the VM.
  *
  * presult contains output parameters needed by callers, such as the number of
  * tuples removed and the offsets of dead items on the page after pruning.
@@ -1024,8 +1063,10 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	bool		do_freeze;
 	bool		do_prune;
 	bool		do_hint_prune;
+	bool		do_set_vm;
 	bool		did_tuple_hint_fpi;
 	int64		fpi_before = pgWalUsage.wal_fpi;
+	TransactionId conflict_xid;
 
 	/* Initialize prstate */
 	prune_freeze_setup(params,
@@ -1124,6 +1165,31 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		prstate.set_all_visible = prstate.set_all_frozen = false;
 
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
+	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
+
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+
+	/*
+	 * new_vmbits should be 0 regardless of whether or not the page is
+	 * all-visible if we do not intend to set the VM.
+	 */
+	Assert(do_set_vm || prstate.new_vmbits == 0);
+
+	/*
+	 * The snapshot conflict horizon for the whole record is the most
+	 * conservative (newest) horizon required by any change in the record.
+	 */
+	conflict_xid = InvalidTransactionId;
+	if (do_set_vm)
+		conflict_xid = prstate.newest_live_xid;
+	if (do_freeze && TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid, conflict_xid))
+		conflict_xid = prstate.pagefrz.FreezePageConflictXid;
+	if (do_prune && TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
+		conflict_xid = prstate.latest_xid_removed;
+
+	/* Lock vmbuffer before entering a critical section */
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/* Any error while applying the changes is critical */
 	START_CRIT_SECTION();
@@ -1145,14 +1211,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 		/*
 		 * If that's all we had to do to the page, this is a non-WAL-logged
-		 * hint.  If we are going to freeze or prune the page, we will mark
-		 * the buffer dirty below.
+		 * hint. If we are going to freeze or prune the page or set
+		 * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+		 *
+		 * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+		 * for the VM to be set and PD_ALL_VISIBLE to be clear.
 		 */
-		if (!do_freeze && !do_prune)
+		if (!do_freeze && !do_prune && !do_set_vm)
 			MarkBufferDirtyHint(prstate.buffer, true);
 	}
 
-	if (do_prune || do_freeze)
+	if (do_prune || do_freeze || do_set_vm)
 	{
 		/* Apply the planned item changes and repair page fragmentation. */
 		if (do_prune)
@@ -1166,6 +1235,27 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		if (do_freeze)
 			heap_freeze_prepared_tuples(prstate.buffer, prstate.frozen, prstate.nfrozen);
 
+		/* Set the visibility map and page visibility hint */
+		if (do_set_vm)
+		{
+			/*
+			 * While it is valid for PD_ALL_VISIBLE to be set when the
+			 * corresponding VM bit is clear, we strongly prefer to keep them
+			 * in sync.
+			 *
+			 * The heap buffer must be marked dirty before adding it to the
+			 * WAL chain when setting the VM. We don't worry about
+			 * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+			 * already set, though. It is extremely rare to have a clean heap
+			 * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+			 * so there is no point in optimizing it.
+			 */
+			PageSetAllVisible(prstate.page);
+			PageClearPrunable(prstate.page);
+			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+									 prstate.relation->rd_locator);
+		}
+
 		MarkBufferDirty(prstate.buffer);
 
 		/*
@@ -1173,29 +1263,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 		 */
 		if (RelationNeedsWAL(prstate.relation))
 		{
-			/*
-			 * The snapshotConflictHorizon for the whole record should be the
-			 * most conservative of all the horizons calculated for any of the
-			 * possible modifications. If this record will prune tuples, any
-			 * queries on the standby older than the newest xid of the most
-			 * recently removed tuple this record will prune will conflict. If
-			 * this record will freeze tuples, any queries on the standby with
-			 * xids older than the newest tuple this record will freeze will
-			 * conflict.
-			 */
-			TransactionId conflict_xid;
-
-			if (TransactionIdFollows(prstate.pagefrz.FreezePageConflictXid,
-									 prstate.latest_xid_removed))
-				conflict_xid = prstate.pagefrz.FreezePageConflictXid;
-			else
-				conflict_xid = prstate.latest_xid_removed;
-
 			log_heap_prune_and_freeze(prstate.relation, prstate.buffer,
-									  InvalidBuffer,	/* vmbuffer */
-									  0,	/* vmflags */
+									  do_set_vm ? prstate.vmbuffer : InvalidBuffer,
+									  do_set_vm ? prstate.new_vmbits : 0,
 									  conflict_xid,
-									  true, params->reason,
+									  true, /* cleanup lock */
+									  params->reason,
 									  prstate.frozen, prstate.nfrozen,
 									  prstate.redirected, prstate.nredirected,
 									  prstate.nowdead, prstate.ndead,
@@ -1205,33 +1278,71 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 
 	END_CRIT_SECTION();
 
+	if (do_set_vm)
+		LockBuffer(prstate.vmbuffer, BUFFER_LOCK_UNLOCK);
+
+	/*
+	 * During its second pass over the heap, VACUUM calls
+	 * heap_page_would_be_all_visible() to determine whether a page is
+	 * all-visible and all-frozen. The logic here is similar. After completing
+	 * pruning and freezing, use an assertion to verify that our results
+	 * remain consistent with heap_page_would_be_all_visible(). It's also a
+	 * valuable cross-check of the page state after pruning and freezing.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	if (prstate.set_all_visible)
+	{
+		TransactionId debug_cutoff;
+		bool		debug_all_frozen;
+
+		Assert(prstate.lpdead_items == 0);
+
+		Assert(heap_page_is_all_visible(prstate.relation, prstate.buffer,
+										prstate.vistest,
+										&debug_all_frozen,
+										&debug_cutoff, off_loc));
+
+		Assert(!TransactionIdIsValid(debug_cutoff) ||
+			   debug_cutoff == prstate.newest_live_xid);
+
+		/*
+		 * It's possible the page is composed entirely of frozen tuples but is
+		 * not set all-frozen in the VM and did not pass
+		 * HEAP_PAGE_PRUNE_FREEZE. In this case, it's possible
+		 * heap_page_is_all_visible() finds the page completely frozen, even
+		 * though prstate.set_all_frozen is false.
+		 */
+		Assert(!prstate.set_all_frozen || debug_all_frozen);
+	}
+#endif
+
 	/* Copy information back for caller */
 	presult->ndeleted = prstate.ndeleted;
 	presult->nnewlpdead = prstate.ndead;
 	presult->nfrozen = prstate.nfrozen;
 	presult->live_tuples = prstate.live_tuples;
 	presult->recently_dead_tuples = prstate.recently_dead_tuples;
-	presult->set_all_visible = prstate.set_all_visible;
-	presult->set_all_frozen = prstate.set_all_frozen;
 	presult->hastup = prstate.hastup;
-	presult->old_vmbits = prstate.old_vmbits;
-
-	/*
-	 * For callers planning to update the visibility map, the conflict horizon
-	 * for that record must be the newest xmin on the page.  However, if the
-	 * page is completely frozen, there can be no conflict and the
-	 * vm_conflict_horizon should remain InvalidTransactionId.  This includes
-	 * the case that we just froze all the tuples; the prune-freeze record
-	 * included the conflict XID already so the caller doesn't need it.
-	 */
-	if (presult->set_all_frozen)
-		presult->vm_conflict_horizon = InvalidTransactionId;
-	else
-		presult->vm_conflict_horizon = prstate.newest_live_xid;
 
 	presult->lpdead_items = prstate.lpdead_items;
 	/* the presult->deadoffsets array was already filled in */
 
+	presult->newly_all_visible = false;
+	presult->newly_all_frozen = false;
+	presult->newly_all_visible_frozen = false;
+	if (do_set_vm)
+	{
+		if ((prstate.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+		{
+			presult->newly_all_visible = true;
+			if (prstate.set_all_frozen)
+				presult->newly_all_visible_frozen = true;
+		}
+		else if ((prstate.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+				 prstate.set_all_frozen)
+			presult->newly_all_frozen = true;
+	}
+
 	if (prstate.attempt_freeze)
 	{
 		if (presult->nfrozen > 0)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 696919e35dd..23deabd8c01 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -466,13 +466,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
 static void dead_items_reset(LVRelState *vacrel);
 static void dead_items_cleanup(LVRelState *vacrel);
 
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
-									 GlobalVisState *vistest,
-									 bool *all_frozen,
-									 TransactionId *newest_live_xid,
-									 OffsetNumber *logging_offnum);
-#endif
 static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
 										   GlobalVisState *vistest,
 										   bool allow_update_vistest,
@@ -2022,8 +2015,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
-	uint8		old_vmbits = 0;
-	uint8		new_vmbits = 0;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2074,32 +2065,6 @@ lazy_scan_prune(LVRelState *vacrel,
 		vacrel->new_frozen_tuple_pages++;
 	}
 
-	/*
-	 * VACUUM will call heap_page_is_all_visible() during the second pass over
-	 * the heap to determine all_visible and all_frozen for the page -- this
-	 * is a specialized version of the logic from this function.  Now that
-	 * we've finished pruning and freezing, make sure that we're in total
-	 * agreement with heap_page_is_all_visible() using an assertion.
-	 */
-#ifdef USE_ASSERT_CHECKING
-	if (presult.set_all_visible)
-	{
-		TransactionId debug_cutoff;
-		bool		debug_all_frozen;
-
-		Assert(presult.lpdead_items == 0);
-
-		Assert(heap_page_is_all_visible(vacrel->rel, buf,
-										vacrel->vistest, &debug_all_frozen,
-										&debug_cutoff, &vacrel->offnum));
-
-		Assert(presult.set_all_frozen == debug_all_frozen);
-
-		Assert(!TransactionIdIsValid(debug_cutoff) ||
-			   debug_cutoff == presult.vm_conflict_horizon);
-	}
-#endif
-
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
@@ -2120,6 +2085,17 @@ lazy_scan_prune(LVRelState *vacrel,
 	}
 
 	/* Finally, add page-local counts to whole-VACUUM counts */
+	if (presult.newly_all_visible)
+		vacrel->new_all_visible_pages++;
+	if (presult.newly_all_visible_frozen)
+		vacrel->new_all_visible_all_frozen_pages++;
+	if (presult.newly_all_frozen)
+		vacrel->new_all_frozen_pages++;
+
+	/* Capture if the page was newly set frozen */
+	*vm_page_frozen = presult.newly_all_visible_frozen ||
+		presult.newly_all_frozen;
+
 	vacrel->tuples_deleted += presult.ndeleted;
 	vacrel->tuples_frozen += presult.nfrozen;
 	vacrel->lpdead_items += presult.lpdead_items;
@@ -2133,71 +2109,6 @@ lazy_scan_prune(LVRelState *vacrel,
 	/* Did we find LP_DEAD items? */
 	*has_lpdead_items = (presult.lpdead_items > 0);
 
-	Assert(!presult.set_all_visible || !(*has_lpdead_items));
-	Assert(!presult.set_all_frozen || presult.set_all_visible);
-
-	if (!presult.set_all_visible)
-		return presult.ndeleted;
-
-	/* Set the visibility map and page visibility hint */
-	old_vmbits = presult.old_vmbits;
-	new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-	if (presult.set_all_frozen)
-		new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
-	/* Nothing to do */
-	if (old_vmbits == new_vmbits)
-		return presult.ndeleted;
-
-	/*
-	 * It should never be the case that the visibility map page is set while
-	 * the page-level bit is clear (and if so, we cleared it above), but the
-	 * reverse is allowed (if checksums are not enabled). Regardless, set both
-	 * bits so that we get back in sync.
-	 *
-	 * The heap buffer must be marked dirty before adding it to the WAL chain
-	 * when setting the VM. We don't worry about unnecessarily dirtying the
-	 * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
-	 * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
-	 * the VM bits clear, so there is no point in optimizing it.
-	 */
-	PageSetAllVisible(page);
-	PageClearPrunable(page);
-	MarkBufferDirty(buf);
-
-	/*
-	 * If the page is being set all-frozen, we pass InvalidTransactionId as
-	 * the cutoff_xid, since a snapshot conflict horizon sufficient to make
-	 * everything safe for REDO was logged when the page's tuples were frozen.
-	 */
-	Assert(!presult.set_all_frozen ||
-		   !TransactionIdIsValid(presult.vm_conflict_horizon));
-
-	visibilitymap_set(vacrel->rel, blkno, buf,
-					  InvalidXLogRecPtr,
-					  vmbuffer, presult.vm_conflict_horizon,
-					  new_vmbits);
-
-	/*
-	 * If the page wasn't already set all-visible and/or all-frozen in the VM,
-	 * count it as newly set for logging.
-	 */
-	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
-	{
-		vacrel->new_all_visible_pages++;
-		if (presult.set_all_frozen)
-		{
-			vacrel->new_all_visible_all_frozen_pages++;
-			*vm_page_frozen = true;
-		}
-	}
-	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
-			 presult.set_all_frozen)
-	{
-		vacrel->new_all_frozen_pages++;
-		*vm_page_frozen = true;
-	}
-
 	return presult.ndeleted;
 }
 
@@ -3613,7 +3524,7 @@ dead_items_cleanup(LVRelState *vacrel)
  * that expect no LP_DEAD on the page. Currently assert-only, but there is no
  * reason not to use it outside of asserts.
  */
-static bool
+bool
 heap_page_is_all_visible(Relation rel, Buffer buf,
 						 GlobalVisState *vistest,
 						 bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ca5e8d1794f..0ab322bf58b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -265,7 +265,8 @@ typedef struct PruneFreezeParams
 
 	/*
 	 * Callers should provide a pinned vmbuffer corresponding to the heap
-	 * block in buffer. We will check for and repair any corruption in the VM.
+	 * block in buffer. We will check for and repair any corruption in the VM
+	 * and set the VM after pruning if the page is all-visible/all-frozen.
 	 */
 	Buffer		vmbuffer;
 
@@ -281,8 +282,7 @@ typedef struct PruneFreezeParams
 	 * HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
 	 * LP_UNUSED during pruning.
 	 *
-	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
-	 * will return 'all_visible', 'all_frozen' flags to the caller.
+	 * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
 	 */
 	int			options;
 
@@ -316,26 +316,12 @@ typedef struct PruneFreezeResult
 	int			recently_dead_tuples;
 
 	/*
-	 * set_all_visible and set_all_frozen indicate if the all-visible and
-	 * all-frozen bits in the visibility map should be set for this page after
-	 * pruning.
-	 *
-	 * vm_conflict_horizon is the newest xmin of live tuples on the page.  The
-	 * caller can use it as the conflict horizon when setting the VM bits.  It
-	 * is only valid if we froze some tuples (nfrozen > 0), and set_all_frozen
-	 * is true.
-	 *
-	 * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
-	 */
-	bool		set_all_visible;
-	bool		set_all_frozen;
-	TransactionId vm_conflict_horizon;
-
-	/*
-	 * The value of the vmbuffer's vmbits at the beginning of pruning. It is
-	 * cleared if VM corruption is found and corrected.
+	 * Whether or not the page was newly set all-visible and all-frozen during
+	 * phase I of vacuuming.
 	 */
-	uint8		old_vmbits;
+	bool		newly_all_visible;
+	bool		newly_all_visible_frozen;
+	bool		newly_all_frozen;
 
 	/*
 	 * Whether or not the page makes rel truncation unsafe.  This is set to
@@ -472,6 +458,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 /* in heap/vacuumlazy.c */
 extern void heap_vacuum_rel(Relation rel,
 							const VacuumParams params, BufferAccessStrategy bstrategy);
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+									 GlobalVisState *vistest,
+									 bool *all_frozen,
+									 TransactionId *newest_live_xid,
+									 OffsetNumber *logging_offnum);
+#endif
 
 /* in heap/heapam_visibility.c */
 extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
-- 
2.43.0


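The patch above merges the snapshot conflict horizons in `heap_page_prune_and_freeze()`, where the record takes the newest of the VM, freeze and prune horizons. It also derives the new `newly_all_visible`, `newly_all_visible_frozen` and `newly_all_frozen` result flags from the old VM bits. Both rules can be checked in isolation.

The following is a minimal standalone sketch, not PostgreSQL code. It makes these assumptions:

- `TransactionId`, `TransactionIdIsNormal`, `FirstNormalTransactionId` and the `VISIBILITYMAP_*` values are simplified stand-ins for the PostgreSQL definitions.
- `xid_follows()`, `combine_conflict_xid()`, `classify_vm_change()` and `VmChange` are hypothetical helpers that mirror the patch's logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL definitions (assumptions, not the real headers). */
typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN 0x02

/*
 * Hypothetical helper in the spirit of TransactionIdFollows(): normal xids
 * are compared modulo 2^32, special xids as plain unsigned integers.
 */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical helper: the record's horizon is the newest one any change needs. */
static TransactionId
combine_conflict_xid(bool do_set_vm, TransactionId newest_live_xid,
                     bool do_freeze, TransactionId freeze_conflict_xid,
                     bool do_prune, TransactionId latest_xid_removed)
{
    TransactionId conflict_xid = InvalidTransactionId;

    if (do_set_vm)
        conflict_xid = newest_live_xid;
    if (do_freeze && xid_follows(freeze_conflict_xid, conflict_xid))
        conflict_xid = freeze_conflict_xid;
    if (do_prune && xid_follows(latest_xid_removed, conflict_xid))
        conflict_xid = latest_xid_removed;
    return conflict_xid;
}

/* Hypothetical result struct mirroring the new PruneFreezeResult flags. */
typedef struct VmChange
{
    bool newly_all_visible;
    bool newly_all_visible_frozen;
    bool newly_all_frozen;
} VmChange;

/* Classify what setting the VM changed, given the bits found before pruning. */
static VmChange
classify_vm_change(bool do_set_vm, uint8_t old_vmbits, bool set_all_frozen)
{
    VmChange c = {false, false, false};

    if (!do_set_vm)
        return c;
    if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
    {
        c.newly_all_visible = true;
        if (set_all_frozen)
            c.newly_all_visible_frozen = true;
    }
    else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && set_all_frozen)
        c.newly_all_frozen = true;
    return c;
}
```

Consider `newest_live_xid = 100`, a freeze horizon of 150 and a pruning horizon of 120. Under these assumptions the combined horizon is 150, the most conservative of the three. A page that was already all-visible and is now set all-frozen counts only as `newly_all_frozen`.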

  [text/x-patch] v44-0004-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch (5.7K, 5-v44-0004-WAL-log-VM-setting-for-empty-pages-in-XLOG_HEAP2.patch)
From 134e19504883dc3b07c506332dd23533e381b699 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v44 04/10] WAL log VM setting for empty pages in
 XLOG_HEAP2_PRUNE_VACUUM_SCAN

As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.

This has no independent benefit, but empty pages were the last user of
XLOG_HEAP2_VISIBLE, so by making this change we can next remove all of
the XLOG_HEAP2_VISIBLE code.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Earlier version Reviewed-by: Robert Haas <[email protected]>
---
 src/backend/access/heap/pruneheap.c  | 29 +++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 44 +++++++++++++++++-----------
 2 files changed, 45 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4d6d5e92773..fe9564b26c7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -2541,6 +2541,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	uint8		info;
 	uint8		regbuf_flags_heap;
 
+	Page		heap_page = BufferGetPage(buffer);
+
 	/* The following local variables hold data registered in the WAL record: */
 	xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
 	xlhp_freeze_plans freeze_plans;
@@ -2559,14 +2561,18 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	/*
 	 * We can avoid an FPI of the heap page if the only modification we are
 	 * making to it is to set PD_ALL_VISIBLE and checksums/wal_log_hints are
-	 * disabled. Note that if we explicitly skip an FPI, we must not stamp the
-	 * heap page with this record's LSN. Recovery skips records <= the stamped
-	 * LSN, so this could lead to skipping an earlier FPI needed to repair a
-	 * torn page.
+	 * disabled.
+	 *
+	 * However, if the page has never been WAL-logged (LSN is invalid), we
+	 * must force an FPI regardless.  This can happen when another backend
+	 * extends the heap, initializes the page, and then fails before WAL-
+	 * logging it.  Since heap extension is not WAL-logged, recovery might try
+	 * to replay our record and find that the page isn't initialized, which
+	 * would cause a PANIC.
 	 */
-	if (!do_prune &&
-		nfrozen == 0 &&
-		(!do_set_vm || !XLogHintBitIsNeeded()))
+	if (!XLogRecPtrIsValid(PageGetLSN(heap_page)))
+		regbuf_flags_heap |= REGBUF_FORCE_IMAGE;
+	else if (!do_prune && nfrozen == 0 && (!do_set_vm || !XLogHintBitIsNeeded()))
 		regbuf_flags_heap |= REGBUF_NO_IMAGE;
 
 	/*
@@ -2681,12 +2687,13 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
 	}
 
 	/*
-	 * See comment at the top of the function about regbuf_flags_heap for
-	 * details on when we can advance the page LSN.
+	 * If we explicitly skip an FPI, we must not stamp the heap page with this
+	 * record's LSN. Recovery skips records <= the stamped LSN, so this could
+	 * lead to skipping an earlier FPI needed to repair a torn page.
 	 */
-	if (do_prune || nfrozen > 0 || (do_set_vm && XLogHintBitIsNeeded()))
+	if (!(regbuf_flags_heap & REGBUF_NO_IMAGE))
 	{
 		Assert(BufferIsDirty(buffer));
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		PageSetLSN(heap_page, recptr);
 	}
 }
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 23deabd8c01..63e6199241c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1929,33 +1929,43 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 		 */
 		if (!PageIsAllVisible(page))
 		{
+			/* Lock vmbuffer before entering critical section */
+			LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
 			START_CRIT_SECTION();
 
 			/* mark buffer dirty before writing a WAL record */
 			MarkBufferDirty(buf);
 
+			PageSetAllVisible(page);
+			PageClearPrunable(page);
+			visibilitymap_set_vmbits(blkno,
+									 vmbuffer,
+									 VISIBILITYMAP_ALL_VISIBLE |
+									 VISIBILITYMAP_ALL_FROZEN,
+									 vacrel->rel->rd_locator);
+
 			/*
-			 * It's possible that another backend has extended the heap,
-			 * initialized the page, and then failed to WAL-log the page due
-			 * to an ERROR.  Since heap extension is not WAL-logged, recovery
-			 * might try to replay our record setting the page all-visible and
-			 * find that the page isn't initialized, which will cause a PANIC.
-			 * To prevent that, check whether the page has been previously
-			 * WAL-logged, and if not, do that now.
+			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+			 * setting the VM.
 			 */
-			if (RelationNeedsWAL(vacrel->rel) &&
-				!XLogRecPtrIsValid(PageGetLSN(page)))
-				log_newpage_buffer(buf, true);
+			if (RelationNeedsWAL(vacrel->rel))
+				log_heap_prune_and_freeze(vacrel->rel, buf,
+										  vmbuffer,
+										  VISIBILITYMAP_ALL_VISIBLE |
+										  VISIBILITYMAP_ALL_FROZEN,
+										  InvalidTransactionId, /* conflict xid */
+										  false,	/* cleanup lock */
+										  PRUNE_VACUUM_SCAN,	/* reason */
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0,
+										  NULL, 0);
 
-			PageSetAllVisible(page);
-			PageClearPrunable(page);
-			visibilitymap_set(vacrel->rel, blkno, buf,
-							  InvalidXLogRecPtr,
-							  vmbuffer, InvalidTransactionId,
-							  VISIBILITYMAP_ALL_VISIBLE |
-							  VISIBILITYMAP_ALL_FROZEN);
 			END_CRIT_SECTION();
 
+			LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
 			/* Count the newly all-frozen pages for logging */
 			vacrel->new_all_visible_pages++;
 			vacrel->new_all_visible_all_frozen_pages++;
-- 
2.43.0


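Patch 0004 changes how `log_heap_prune_and_freeze()` decides between a full-page image and no image for the heap buffer. It also changes when the heap page LSN may be stamped: only when the FPI was not explicitly skipped. That decision can be modeled as a small pure function.

This is a hedged sketch, not the real code, under these assumptions:

- `SKETCH_REGBUF_FORCE_IMAGE` and `SKETCH_REGBUF_NO_IMAGE` are hypothetical flag values standing in for `REGBUF_FORCE_IMAGE` and `REGBUF_NO_IMAGE`.
- `XLogRecPtr` and `InvalidXLogRecPtr` are simplified stand-ins.
- `heap_regbuf_flags()` and `may_stamp_page_lsn()` are hypothetical helpers.
- The sketch starts from an empty flag set and ignores any other flags the real function registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the REGBUF_* flags and the LSN type. */
#define SKETCH_REGBUF_FORCE_IMAGE 0x01
#define SKETCH_REGBUF_NO_IMAGE    0x02

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Choose the image-related flags for the heap buffer.
 * hint_bits_logged models XLogHintBitIsNeeded() (checksums or wal_log_hints).
 */
static uint8_t
heap_regbuf_flags(XLogRecPtr page_lsn, bool do_prune, int nfrozen,
                  bool do_set_vm, bool hint_bits_logged)
{
    uint8_t flags = 0;

    /* Never WAL-logged (e.g. extended by a backend that then failed): force an FPI. */
    if (page_lsn == InvalidXLogRecPtr)
        flags |= SKETCH_REGBUF_FORCE_IMAGE;
    /* Only PD_ALL_VISIBLE changes and hint bits are not logged: skip the FPI. */
    else if (!do_prune && nfrozen == 0 && (!do_set_vm || !hint_bits_logged))
        flags |= SKETCH_REGBUF_NO_IMAGE;
    return flags;
}

/* The page LSN may be advanced only if the FPI was not explicitly skipped. */
static bool
may_stamp_page_lsn(uint8_t flags)
{
    return (flags & SKETCH_REGBUF_NO_IMAGE) == 0;
}
```

Under these assumptions, the empty-page path in `lazy_scan_new_or_empty()` does no pruning or freezing and only sets the VM. It therefore skips the FPI when hint bits are not WAL-logged. The exception is a page whose LSN is invalid, which gets a forced image instead. That forced image is what allows the patch to drop the separate `log_newpage_buffer()` call.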

  [text/x-patch] v44-0005-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (27.6K, 6-v44-0005-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From 42b15a654c87c36aadd1768f3f2fc915019ee44c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v44 05/10] Remove XLOG_HEAP2_VISIBLE entirely

There are no remaining users that emit XLOG_HEAP2_VISIBLE records, so it
can be removed. This includes deleting the xl_heap_visible struct and
all functions responsible for emitting or replaying XLOG_HEAP2_VISIBLE
records.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/access/common/bufmask.c      |   5 +-
 src/backend/access/heap/heapam.c         |  54 +-------
 src/backend/access/heap/heapam_xlog.c    | 156 ++---------------------
 src/backend/access/heap/pruneheap.c      |   4 +-
 src/backend/access/heap/vacuumlazy.c     |  16 +--
 src/backend/access/heap/visibilitymap.c  | 151 +++++-----------------
 src/backend/access/rmgrdesc/heapdesc.c   |  10 --
 src/backend/replication/logical/decode.c |   1 -
 src/backend/storage/ipc/standby.c        |   9 +-
 src/include/access/heapam_xlog.h         |  21 +--
 src/include/access/visibilitymap.h       |  13 +-
 src/include/access/visibilitymapdefs.h   |   9 --
 src/tools/pgindent/typedefs.list         |   1 -
 13 files changed, 64 insertions(+), 386 deletions(-)

diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 8a67bfa1aff..f32e3911a57 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -55,9 +55,8 @@ mask_page_hint_bits(Page page)
 	PageClearHasFreeLinePointers(page);
 
 	/*
-	 * During replay, if the page LSN has advanced past our XLOG record's LSN,
-	 * we don't mark the page all-visible. See heap_xlog_visible() for
-	 * details.
+	 * PD_ALL_VISIBLE is masked during WAL consistency checking. It is worth
+	 * investigating if we could stop doing this.
 	 */
 	PageClearAllVisible(page);
 }
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..044f385e477 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2589,11 +2589,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 relation->rd_locator);
+			visibilitymap_set(BufferGetBlockNumber(buffer),
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  relation->rd_locator);
 		}
 
 		/*
@@ -8886,50 +8886,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
 	return nblocksfavorable;
 }
 
-/*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible.  REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
-				 TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
-	xl_heap_visible xlrec;
-	XLogRecPtr	recptr;
-	uint8		flags;
-
-	Assert(BufferIsValid(heap_buffer));
-	Assert(BufferIsValid(vm_buffer));
-
-	xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
-	xlrec.flags = vmflags;
-	if (RelationIsAccessibleInLogicalDecoding(rel))
-		xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
-	XLogBeginInsert();
-	XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
-	XLogRegisterBuffer(0, vm_buffer, 0);
-
-	flags = REGBUF_STANDARD;
-	if (!XLogHintBitIsNeeded())
-		flags |= REGBUF_NO_IMAGE;
-	XLogRegisterBuffer(1, heap_buffer, flags);
-
-	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
-	return recptr;
-}
-
 /*
  * Perform XLogInsert for a heap-update operation.  Caller must already
  * have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1da774c1536..1302bb13e18 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -239,7 +239,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+		visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -252,143 +252,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
 		XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
 }
 
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear.  If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
-	XLogRecPtr	lsn = record->EndRecPtr;
-	xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
-	Buffer		vmbuffer = InvalidBuffer;
-	Buffer		buffer;
-	Page		page;
-	RelFileLocator rlocator;
-	BlockNumber blkno;
-	XLogRedoAction action;
-
-	Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
-	XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
-	/*
-	 * If there are any Hot Standby transactions running that have an xmin
-	 * horizon old enough that this page isn't all-visible for them, they
-	 * might incorrectly decide that an index-only scan can skip a heap fetch.
-	 *
-	 * NB: It might be better to throw some kind of "soft" conflict here that
-	 * forces any index-only scan that is in flight to perform heap fetches,
-	 * rather than killing the transaction outright.
-	 */
-	if (InHotStandby)
-		ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
-											xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
-											rlocator);
-
-	/*
-	 * Read the heap page, if it still exists. If the heap file has dropped or
-	 * truncated later in recovery, we don't need to update the page, but we'd
-	 * better still update the visibility map.
-	 */
-	action = XLogReadBufferForRedo(record, 1, &buffer);
-	if (action == BLK_NEEDS_REDO)
-	{
-		/*
-		 * We don't bump the LSN of the heap page when setting the visibility
-		 * map bit (unless checksums or wal_hint_bits is enabled, in which
-		 * case we must). This exposes us to torn page hazards, but since
-		 * we're not inspecting the existing page contents in any way, we
-		 * don't care.
-		 */
-		page = BufferGetPage(buffer);
-
-		PageSetAllVisible(page);
-		PageClearPrunable(page);
-
-		if (XLogHintBitIsNeeded())
-			PageSetLSN(page, lsn);
-
-		MarkBufferDirty(buffer);
-	}
-	else if (action == BLK_RESTORED)
-	{
-		/*
-		 * If heap block was backed up, we already restored it and there's
-		 * nothing more to do. (This can only happen with checksums or
-		 * wal_log_hints enabled.)
-		 */
-	}
-
-	if (BufferIsValid(buffer))
-	{
-		Size		space = PageGetFreeSpace(BufferGetPage(buffer));
-
-		UnlockReleaseBuffer(buffer);
-
-		/*
-		 * Since FSM is not WAL-logged and only updated heuristically, it
-		 * easily becomes stale in standbys.  If the standby is later promoted
-		 * and runs VACUUM, it will skip updating individual free space
-		 * figures for pages that became all-visible (or all-frozen, depending
-		 * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
-		 * propagates too optimistic free space values to upper FSM layers;
-		 * later inserters try to use such pages only to find out that they
-		 * are unusable.  This can cause long stalls when there are many such
-		 * pages.
-		 *
-		 * Forestall those problems by updating FSM's idea about a page that
-		 * is becoming all-visible or all-frozen.
-		 *
-		 * Do this regardless of a full-page image being applied, since the
-		 * FSM data is not in the page anyway.
-		 */
-		if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
-			XLogRecordPageWithFreeSpace(rlocator, blkno, space);
-	}
-
-	/*
-	 * Even if we skipped the heap page update due to the LSN interlock, it's
-	 * still safe to update the visibility map.  Any WAL record that clears
-	 * the visibility map bit does so before checking the page LSN, so any
-	 * bits that need to be cleared will still be cleared.
-	 */
-	if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
-									  &vmbuffer) == BLK_NEEDS_REDO)
-	{
-		Page		vmpage = BufferGetPage(vmbuffer);
-		Relation	reln;
-		uint8		vmbits;
-
-		/* initialize the page if it was read as zeros */
-		if (PageIsNew(vmpage))
-			PageInit(vmpage, BLCKSZ, 0);
-
-		/* remove VISIBILITYMAP_XLOG_* */
-		vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
-		/*
-		 * XLogReadBufferForRedoExtended locked the buffer. But
-		 * visibilitymap_set will handle locking itself.
-		 */
-		LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
-		reln = CreateFakeRelcacheEntry(rlocator);
-
-		visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
-						  xlrec->snapshotConflictHorizon, vmbits);
-
-		ReleaseBuffer(vmbuffer);
-		FreeFakeRelcacheEntry(reln);
-	}
-	else if (BufferIsValid(vmbuffer))
-		UnlockReleaseBuffer(vmbuffer);
-}
-
 /*
  * Given an "infobits" field from an XLog record, set the correct bits in the
  * given infomask and infomask2 for the tuple touched by the record.
@@ -769,8 +632,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
 	 *
 	 * During recovery, however, no concurrent writers exist. Therefore,
 	 * updating the VM without holding the heap page lock is safe enough. This
-	 * same approach is taken when replaying xl_heap_visible records (see
-	 * heap_xlog_visible()).
+	 * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+	 * heap_xlog_prune_freeze()).
 	 */
 	if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
 		XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -782,11 +645,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (PageIsNew(vmpage))
 			PageInit(vmpage, BLCKSZ, 0);
 
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer,
-								 VISIBILITYMAP_ALL_VISIBLE |
-								 VISIBILITYMAP_ALL_FROZEN,
-								 rlocator);
+		visibilitymap_set(blkno,
+						  vmbuffer,
+						  VISIBILITYMAP_ALL_VISIBLE |
+						  VISIBILITYMAP_ALL_FROZEN,
+						  rlocator);
 
 		Assert(BufferIsDirty(vmbuffer));
 		PageSetLSN(vmpage, lsn);
@@ -1369,9 +1232,6 @@ heap2_redo(XLogReaderState *record)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			heap_xlog_prune_freeze(record);
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			heap_xlog_visible(record);
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			heap_xlog_multi_insert(record);
 			break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fe9564b26c7..fc5345e1dff 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1252,8 +1252,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 			 */
 			PageSetAllVisible(prstate.page);
 			PageClearPrunable(prstate.page);
-			visibilitymap_set_vmbits(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
-									 prstate.relation->rd_locator);
+			visibilitymap_set(prstate.block, prstate.vmbuffer, prstate.new_vmbits,
+							  prstate.relation->rd_locator);
 		}
 
 		MarkBufferDirty(prstate.buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 63e6199241c..f698c2d899b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1939,11 +1939,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
-			visibilitymap_set_vmbits(blkno,
-									 vmbuffer,
-									 VISIBILITYMAP_ALL_VISIBLE |
-									 VISIBILITYMAP_ALL_FROZEN,
-									 vacrel->rel->rd_locator);
+			visibilitymap_set(blkno,
+							  vmbuffer,
+							  VISIBILITYMAP_ALL_VISIBLE |
+							  VISIBILITYMAP_ALL_FROZEN,
+							  vacrel->rel->rd_locator);
 
 			/*
 			 * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2821,9 +2821,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 		 */
 		PageSetAllVisible(page);
 		PageClearPrunable(page);
-		visibilitymap_set_vmbits(blkno,
-								 vmbuffer, vmflags,
-								 vacrel->rel->rd_locator);
+		visibilitymap_set(blkno,
+						  vmbuffer, vmflags,
+						  vacrel->rel->rd_locator);
 		conflict_xid = newest_live_xid;
 	}
 
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index e21b96281a6..4fd470702aa 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
  *		visibilitymap_clear  - clear bits for one page in the visibility map
  *		visibilitymap_pin	 - pin a map page for setting a bit
  *		visibilitymap_pin_ok - check whether correct map page is already pinned
- *		visibilitymap_set	 - set bit(s) in a previously pinned page and log
- *		visibilitymap_set_vmbits - set bit(s) in a pinned page
+ *		visibilitymap_set	 - set bit(s) in a previously pinned page
  *		visibilitymap_get_status - get status of bits
  *		visibilitymap_count  - count number of bits set in visibility map
  *		visibilitymap_prepare_truncate -
@@ -35,21 +34,32 @@
  * is set, we know the condition is true, but if a bit is not set, it might or
  * might not be true.
  *
- * Clearing visibility map bits is not separately WAL-logged.  The callers
- * must make sure that whenever a bit is cleared, the bit is cleared on WAL
- * replay of the updating operation as well.
- *
- * When we *set* a visibility map during VACUUM, we must write WAL.  This may
- * seem counterintuitive, since the bit is basically a hint: if it is clear,
- * it may still be the case that every tuple on the page is visible to all
- * transactions; we just don't know that for certain.  The difficulty is that
- * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
- * on the page itself, and the visibility map bit.  If a crash occurs after the
- * visibility map page makes it to disk and before the updated heap page makes
- * it to disk, redo must set the bit on the heap page.  Otherwise, the next
- * insert, update, or delete on the heap page will fail to realize that the
- * visibility map bit must be cleared, possibly causing index-only scans to
- * return wrong answers.
+ * Changes to the visibility map bits are not separately WAL-logged. Callers
+ * must make sure that whenever a visibility map bit is cleared, the bit is
+ * cleared on WAL replay of the updating operation. And whenever a visibility
+ * map bit is set, the bit is set on WAL replay of the operation that rendered
+ * the page all-visible/all-frozen.
+ *
+ * The visibility map bits operate as a hint in one direction: if they are
+ * clear, it may still be the case that every tuple on the page is visible to
+ * all transactions (we just don't know that for certain). However, if they
+ * are set, we may skip vacuuming pages and advance relfrozenxid or skip
+ * reading heap pages for an index-only scan. If they are incorrectly set,
+ * this can lead to data corruption and wrong results.
+ *
+ * Additionally, it is critical that the heap-page level PD_ALL_VISIBLE bit be
+ * correctly set and cleared along with the VM bits.
+ *
+ * When clearing the VM, if a crash occurs after the heap page makes it to
+ * disk but before the VM page makes it to disk, replay must clear the VM or
+ * the next index-only scan can return wrong results or vacuum may incorrectly
+ * advance relfrozenxid.
+ *
+ * When setting the VM, if a crash occurs after the visibility map page makes
+ * it to disk and before the updated heap page makes it to disk, redo must set
+ * the bit on the heap page. Otherwise, the next insert, update, or delete on
+ * the heap page will fail to realize that the visibility map bit must be
+ * cleared, possibly causing index-only scans to return wrong answers.
  *
  * VACUUM will normally skip pages for which the visibility map bit is set;
  * such pages can't contain any dead tuples and therefore don't need vacuuming.
@@ -222,112 +232,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
 	return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
 }
 
-/*
- *	visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running.  The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below).  cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples.  It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
-				  XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
-				  uint8 flags)
-{
-	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
-	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
-	uint8		mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
-	Page		page;
-	uint8	   *map;
-	uint8		status;
-
-#ifdef TRACE_VISIBILITYMAP
-	elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
-		 flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
-	Assert(InRecovery || !XLogRecPtrIsValid(recptr));
-	Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
-	Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
-	/* Must never set all_frozen bit without also setting all_visible bit */
-	Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
-	/* Check that we have the right heap page pinned, if present */
-	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
-		elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
-	Assert(!BufferIsValid(heapBuf) ||
-		   BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
-	/* Check that we have the right VM page pinned */
-	if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
-		elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
-	page = BufferGetPage(vmBuf);
-	map = (uint8 *) PageGetContents(page);
-	LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
-	status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
-	if (flags != status)
-	{
-		START_CRIT_SECTION();
-
-		map[mapByte] |= (flags << mapOffset);
-		MarkBufferDirty(vmBuf);
-
-		if (RelationNeedsWAL(rel))
-		{
-			if (!XLogRecPtrIsValid(recptr))
-			{
-				Assert(!InRecovery);
-				recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
-				/*
-				 * If data checksums are enabled (or wal_log_hints=on), we
-				 * need to protect the heap page from being torn.
-				 *
-				 * If not, then we must *not* update the heap page's LSN. In
-				 * this case, the FPI for the heap page was omitted from the
-				 * WAL record inserted above, so it would be incorrect to
-				 * update the heap page's LSN.
-				 */
-				if (XLogHintBitIsNeeded())
-				{
-					Page		heapPage = BufferGetPage(heapBuf);
-
-					PageSetLSN(heapPage, recptr);
-				}
-			}
-			PageSetLSN(page, recptr);
-		}
-
-		END_CRIT_SECTION();
-	}
-
-	LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
 /*
  * Set VM (visibility map) flags in the VM block in vmBuf.
  *
  * This function is intended for callers that log VM changes together
  * with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
  *
  * vmBuf must be pinned and exclusively locked, and it must cover the VM bits
  * corresponding to heapBlk.
@@ -343,9 +252,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
  * rlocator is used only for debugging messages.
  */
 void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
-						 Buffer vmBuf, uint8 flags,
-						 const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+				  Buffer vmBuf, uint8 flags,
+				  const RelFileLocator rlocator)
 {
 	BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
 	uint32		mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 			}
 		}
 	}
-	else if (info == XLOG_HEAP2_VISIBLE)
-	{
-		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
-		appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
-						 xlrec->snapshotConflictHorizon, xlrec->flags);
-	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
 			id = "PRUNE_VACUUM_CLEANUP";
 			break;
-		case XLOG_HEAP2_VISIBLE:
-			id = "VISIBLE";
-			break;
 		case XLOG_HEAP2_MULTI_INSERT:
 			id = "MULTI_INSERT";
 			break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index 21f03864a66..3c027bcb2f7 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -448,7 +448,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 		case XLOG_HEAP2_PRUNE_ON_ACCESS:
 		case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
 		case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
-		case XLOG_HEAP2_VISIBLE:
 		case XLOG_HEAP2_LOCK_UPDATED:
 			break;
 		default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index f3ad90c7c7a..de9092fdf5b 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -476,10 +476,11 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
 	/*
 	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
 	 *
-	 * This can happen when replaying already-applied WAL records after a
-	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
-	 * record that marks as frozen a page which was already all-visible.  It's
-	 * also quite common with records generated during index deletion
+	 * This can happen whenever the changes in the WAL record do not affect
+	 * visibility on a standby. For example: a record that only freezes an
+	 * xmax from a locker.
+	 *
+	 * It's also quite common with records generated during index deletion
 	 * (original execution of the deletion can reason that a recovery conflict
 	 * which is sufficient for the deletion operation must take place before
 	 * replay of the deletion record itself).
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..516806fcca2 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,7 @@
 #define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
 #define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
 #define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30
-#define XLOG_HEAP2_VISIBLE		0x40
+/* 0x40 was XLOG_HEAP2_VISIBLE */
 #define XLOG_HEAP2_MULTI_INSERT 0x50
 #define XLOG_HEAP2_LOCK_UPDATED 0x60
 #define XLOG_HEAP2_NEW_CID		0x70
@@ -443,20 +443,6 @@ typedef struct xl_heap_inplace
 
 #define MinSizeOfHeapInplace	(offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
 
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
-	TransactionId snapshotConflictHorizon;
-	uint8		flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
 typedef struct xl_heap_new_cid
 {
 	/*
@@ -500,11 +486,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
 extern const char *heap2_identify(uint8 info);
 extern void heap_xlog_logical_rewrite(XLogReaderState *r);
 
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
-								   Buffer vm_buffer,
-								   TransactionId snapshotConflictHorizon,
-								   uint8 vmflags);
-
 /* in heapdesc.c, so it can be shared between frontend/backend code */
 extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
 												   int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 52cde56be86..e4e0cfa989e 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
 #define VISIBILITYMAP_H
 
 #include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
 #include "storage/block.h"
 #include "storage/buf.h"
 #include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
 extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
 							  Buffer *vmbuf);
 extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
-							  BlockNumber heapBlk, Buffer heapBuf,
-							  XLogRecPtr recptr,
-							  Buffer vmBuf,
-							  TransactionId cutoff_xid,
-							  uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
-									 Buffer vmBuf, uint8 flags,
-									 const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+							  Buffer vmBuf, uint8 flags,
+							  const RelFileLocator rlocator);
 extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
 extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
 extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
 #define VISIBILITYMAP_ALL_FROZEN	0x02
 #define VISIBILITYMAP_VALID_BITS	0x03	/* OR of all valid visibilitymap
 											 * flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL	0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS	(VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
 
 #endif							/* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 0c07c945f05..adc858c2a97 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4421,7 +4421,6 @@ xl_heap_prune
 xl_heap_rewrite_mapping
 xl_heap_truncate
 xl_heap_update
-xl_heap_visible
 xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
-- 
2.43.0
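The visibility map comment in the patch above depends on each heap block owning a pair of bits (all-visible, all-frozen). The removed `visibilitymap_set()` body reads and sets those bits with shift-and-mask operations. Below is a minimal standalone sketch of that packing. It makes several simplifications: a single in-memory byte array stands in for the map, with no VM pages, buffers, locking, or WAL, and `SKETCH_MAPBYTE`/`SKETCH_OFFSET` are simplified stand-ins for `HEAPBLK_TO_MAPBYTE`/`HEAPBLK_TO_OFFSET`. The helpers `vm_get_status`, `vm_set_bits`, and `vm_clear_bits` are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same flag values as visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Two bits per heap block, so four heap blocks share one map byte. */
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)

/* Simplified: ignores how the real map is split across multiple VM pages. */
#define SKETCH_MAPBYTE(blk) ((blk) / HEAPBLOCKS_PER_BYTE)
#define SKETCH_OFFSET(blk)  (((blk) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

/* Return the two VM bits for heap block blk. */
uint8_t vm_get_status(const uint8_t *map, uint32_t blk)
{
    return (map[SKETCH_MAPBYTE(blk)] >> SKETCH_OFFSET(blk)) & VISIBILITYMAP_VALID_BITS;
}

/*
 * OR flags into the bits for blk. Returns 1 if a new bit was set, 0 if the
 * bits were already set, and -1 if the caller asked for all-frozen without
 * all-visible (the real code Asserts against this).
 */
int vm_set_bits(uint8_t *map, uint32_t blk, uint8_t flags)
{
    uint8_t status;

    if ((flags & VISIBILITYMAP_ALL_FROZEN) && !(flags & VISIBILITYMAP_ALL_VISIBLE))
        return -1;

    status = vm_get_status(map, blk);
    if ((status | flags) == status)
        return 0;               /* nothing new to set */

    map[SKETCH_MAPBYTE(blk)] |= (uint8_t) (flags << SKETCH_OFFSET(blk));
    return 1;
}

/* Clear the given bits for blk; returns 1 if any of them had been set. */
int vm_clear_bits(uint8_t *map, uint32_t blk, uint8_t flags)
{
    uint8_t mask = (uint8_t) (flags << SKETCH_OFFSET(blk));

    if ((map[SKETCH_MAPBYTE(blk)] & mask) == 0)
        return 0;
    map[SKETCH_MAPBYTE(blk)] &= (uint8_t) ~mask;
    return 1;
}
```

This sketch makes one deliberate departure from the removed code. That code compared `flags != status`. The sketch instead reports whether any *new* bit would be set, which is roughly the point at which a real caller would need to dirty the VM page.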



  [text/x-patch] v44-0006-Track-which-relations-are-modified-by-a-query.patch (6.4K, 7-v44-0006-Track-which-relations-are-modified-by-a-query.patch)
From 8f637aeb39efe65e629f616fbf4362ce9476ea1a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v44 06/10] Track which relations are modified by a query

Save the OIDs of modified relations in a list in the PlannedStmt. A
later commit will use this information during scans to control whether
or not on-access pruning is allowed to set the visibility map. Setting
the visibility map during a scan is counterproductive if the query is
going to modify the page immediately after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this list is only used as a hint
to avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  3 ++
 src/backend/executor/nodeModifyTable.c |  9 ++++++
 src/backend/optimizer/plan/planner.c   | 44 +++++++++++++++++++++++++-
 src/include/nodes/plannodes.h          |  6 ++++
 5 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..5c1cf51d71c 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelOids = estate->es_plannedstmt->modifiedRelOids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..49b55d15e3e 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,9 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		Assert(list_member_oid(estate->es_plannedstmt->modifiedRelOids,
+							   RelationGetRelid(erm->relation)));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..12ecdd383cc 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,9 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	Assert(list_member_oid(estate->es_plannedstmt->modifiedRelOids,
+						   RelationGetRelid(resultRelationDesc)));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1526,9 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(list_member_oid(estate->es_plannedstmt->modifiedRelOids,
+						   RelationGetRelid(resultRelInfo->ri_RelationDesc)));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2211,9 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	Assert(list_member_oid(estate->es_plannedstmt->modifiedRelOids,
+						   RelationGetRelid(resultRelInfo->ri_RelationDesc)));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..039796773a9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	List	   *modifiedRelOids = NIL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,46 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelOids from result relations and row marks.
+	 *
+	 * This is a superset of what the executor will actually modify/lock at
+	 * runtime, because runtime partition pruning may eliminate some result
+	 * relations, and parent row marks are included here but skipped by the
+	 * executor.
+	 *
+	 * For partitioned tables, modifiedRelOids is expanded to include all
+	 * descendant partition OIDs. This is necessary because tuple routing
+	 * lazily expands leaf partitions at execution time.
+	 */
+	foreach(lc, glob->resultRelations)
+	{
+		Index		rti = lfirst_int(lc);
+		RangeTblEntry *rte = rt_fetch(rti, glob->finalrtable);
+
+		modifiedRelOids = list_append_unique_oid(modifiedRelOids, rte->relid);
+
+		if (rte->relkind == RELKIND_PARTITIONED_TABLE)
+		{
+			List	   *children = find_all_inheritors(rte->relid,
+													   NoLock, NULL);
+			ListCell   *lc2;
+
+			foreach(lc2, children)
+				modifiedRelOids = list_append_unique_oid(modifiedRelOids,
+														 lfirst_oid(lc2));
+		}
+	}
+	foreach(lc, glob->finalrowmarks)
+	{
+		PlanRowMark *rc = (PlanRowMark *) lfirst(lc);
+		RangeTblEntry *rte = rt_fetch(rc->rti, glob->finalrtable);
+
+		modifiedRelOids = list_append_unique_oid(modifiedRelOids, rte->relid);
+	}
+	result->modifiedRelOids = modifiedRelOids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..6a7008cd50a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,12 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * OIDs of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 */
+	List	   *modifiedRelOids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0
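The planner hunk above builds `modifiedRelOids` as a deduplicated union of three sources: result-relation OIDs, every partition of a partitioned result relation, and row-mark OIDs. The commit message says a later patch will consult this list as a hint during scans. Below is a minimal standalone sketch of that logic. It assumes fixed-size `uint32_t` arrays in place of PostgreSQL `List`s, and a precomputed partition array in place of `find_all_inheritors()`. `OidSet`, `ResultRel`, `compute_modified_rel_oids`, and `scan_may_set_vm` are illustrative names only, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid */

#define MAX_OIDS 64             /* fixed capacity, for the sketch only */

typedef struct OidSet
{
    Oid     oids[MAX_OIDS];
    size_t  n;
} OidSet;

bool oidset_contains(const OidSet *s, Oid oid)
{
    for (size_t i = 0; i < s->n; i++)
        if (s->oids[i] == oid)
            return true;
    return false;
}

/* Like list_append_unique_oid(): append only if not already present. */
void oidset_append_unique(OidSet *s, Oid oid)
{
    if (!oidset_contains(s, oid) && s->n < MAX_OIDS)
        s->oids[s->n++] = oid;
}

/* Hypothetical description of one result relation. */
typedef struct ResultRel
{
    Oid         relid;
    const Oid  *partitions;     /* descendant partitions; NULL if not partitioned */
    size_t      npartitions;
} ResultRel;

/* Union of result relations (expanded to partitions) and row-mark relations. */
void compute_modified_rel_oids(OidSet *out,
                               const ResultRel *results, size_t nresults,
                               const Oid *rowmark_relids, size_t nrowmarks)
{
    out->n = 0;
    for (size_t i = 0; i < nresults; i++)
    {
        oidset_append_unique(out, results[i].relid);
        for (size_t j = 0; j < results[i].npartitions; j++)
            oidset_append_unique(out, results[i].partitions[j]);
    }
    for (size_t i = 0; i < nrowmarks; i++)
        oidset_append_unique(out, rowmark_relids[i]);
}

/* Hint only: skip setting the VM for relations this query will modify. */
bool scan_may_set_vm(const OidSet *modified, Oid scanned_relid)
{
    return !oidset_contains(modified, scanned_relid);
}
```

The list is only a hint, so including relations that are never actually modified (pruned partitions, parent row marks) only costs a missed opportunity to set the VM, not correctness.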



  [text/x-patch] v44-0007-Thread-flags-through-begin-scan-APIs.patch (32.9K, 8-v44-0007-Thread-flags-through-begin-scan-APIs.patch)
From 134bdfa0fa25bb74c954c179c88d7cc38ca14c56 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v44 07/10] Thread flags through begin-scan APIs

Add a user-settable flags parameter to the table_beginscan_* wrappers,
index_beginscan(), table_index_fetch_begin(), and the table
AM callback index_fetch_begin(). This allows users to pass additional
context to be used when building the scan descriptors.

For index scans, a new uint32 flags field is added to
IndexFetchTableData, and the heap AM stores the caller-provided flags
there in heapam_index_fetch_begin().

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.
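The parallel-scan hunks in this patch rename the wrapper's own `SO_*` bits to `internal_flags` and pass them to `table_beginscan_common()` alongside the caller-provided `flags`. Below is a rough standalone sketch of why keeping the two words separate is convenient. All bit values (`SKETCH_SO_*`, `SKETCH_SCAN_READ_ONLY`) are made up, and `ScanDescSketch`, `beginscan_sketch`, and `am_may_set_vm` are hypothetical. They are not the tableam.h definitions, and how the real patch stores the flags beyond what is shown here is not assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical internal bits chosen by the begin-scan wrapper. */
#define SKETCH_SO_TYPE_SEQSCAN    (1u << 0)
#define SKETCH_SO_ALLOW_PAGEMODE  (1u << 1)
#define SKETCH_SO_TEMP_SNAPSHOT   (1u << 2)

/* Hypothetical caller-settable bit: relation is read-only for this query. */
#define SKETCH_SCAN_READ_ONLY     (1u << 0)

typedef struct ScanDescSketch
{
    uint32_t internal_flags;    /* set by the wrapper */
    uint32_t user_flags;        /* passed through unchanged from the caller */
} ScanDescSketch;

ScanDescSketch beginscan_sketch(int snapshot_was_serialized, uint32_t flags)
{
    ScanDescSketch scan;
    uint32_t internal_flags = SKETCH_SO_TYPE_SEQSCAN | SKETCH_SO_ALLOW_PAGEMODE;

    /* Mirrors the SO_TEMP_SNAPSHOT case when a snapshot is restored. */
    if (snapshot_was_serialized)
        internal_flags |= SKETCH_SO_TEMP_SNAPSHOT;

    scan.internal_flags = internal_flags;
    scan.user_flags = flags;
    return scan;
}

/* An AM could later consult the caller's flags, e.g. before setting the VM. */
int am_may_set_vm(const ScanDescSketch *scan)
{
    return (scan->user_flags & SKETCH_SCAN_READ_ONLY) != 0;
}
```

In this sketch, `SKETCH_SCAN_READ_ONLY` and `SKETCH_SO_TYPE_SEQSCAN` deliberately share bit 0. Because the two words never mix, caller bits and internal bits can be allocated independently.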

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  2 +
 src/backend/access/index/indexam.c        |  7 ++-
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 21 +++----
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  8 +--
 src/backend/commands/typecmds.c           |  4 +-
 src/backend/executor/execIndexing.c       |  3 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  2 +-
 src/backend/executor/nodeIndexonlyscan.c  |  5 +-
 src/backend/executor/nodeIndexscan.c      |  6 +-
 src/backend/executor/nodeSamplescan.c     |  2 +-
 src/backend/executor/nodeSeqscan.c        |  6 +-
 src/backend/executor/nodeTidrangescan.c   |  6 +-
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  1 +
 src/include/access/genam.h                |  4 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 72 +++++++++++++++--------
 26 files changed, 117 insertions(+), 75 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..0556e9f7b88 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..b25e814a996 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..555b16771e9 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..66726b22de6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -80,11 +80,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -762,7 +763,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									0,	/* flags */
+									SnapshotAny, NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -771,7 +774,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..b099d956e41 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,6 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
+										 0, /* flags */
 										 snapshot, NULL, nkeys, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
@@ -716,6 +717,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
+									 0, /* flags */
 									 snapshot, NULL, nkeys, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 5eb7e99ad3e..63d5daadca6 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -255,6 +255,7 @@ index_insert_cleanup(Relation indexRelation,
 IndexScanDesc
 index_beginscan(Relation heapRelation,
 				Relation indexRelation,
+				uint32 flags,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
 				int nkeys, int norderbys)
@@ -284,7 +285,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -593,7 +594,7 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -615,7 +616,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..016a5e546dd 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared), 0);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..7a12e808b07 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,10 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +176,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +185,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +207,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +216,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..29d7c3514b6 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 219f604df7b..ec9bbfe554a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,7 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
+	scan = table_beginscan(rel, snapshot, 0, NULL, 0);
 
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
@@ -22882,7 +22882,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23346,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..5316cea7cec 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..c46beedeb71 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, 0,	/* flags */
+								 &DirtySnapshot, NULL, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..23509771557 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   &snap, NULL, skey_attoff, 0);
 
 retry:
 	found = false;
@@ -383,7 +385,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +668,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   SnapshotAny, NULL, skey_attoff, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..324e2bed22c 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL, 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..f733be0220c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -92,6 +92,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -794,7 +795,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -860,7 +861,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..1a101df492b 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -110,6 +110,7 @@ IndexNext(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -206,6 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1730,7 +1732,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1794,7 +1796,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan, 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cc6b23abee0 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,7 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode, 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..c2d9b7293de 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,7 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL, 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..994f70989bc 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,7 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid, 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +460,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +494,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..0528f8166d8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7177,6 +7177,7 @@ get_actual_variable_endpoint(Relation heapRel,
 							  GlobalVisTestFor(heapRel));
 
 	index_scan = index_beginscan(heapRel, indexRel,
+								 0, /* flags */
 								 &SnapshotNonVacuumable, NULL,
 								 1, 0);
 	/* Set it up for index-only scan */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b98c20a0edc 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,6 +156,7 @@ extern void index_insert_cleanup(Relation indexRelation,
 
 extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
+									 uint32 flags,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 0ab322bf58b..47cbf2a20cf 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..8357d05d83b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,18 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +910,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +945,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -939,11 +956,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +975,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1000,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1013,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1059,12 +1078,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1159,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1170,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1197,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1209,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v44-0008-Pass-down-information-on-table-modification-to-s.patch (11.8K, 9-v44-0008-Pass-down-information-on-table-modification-to-s.patch)
From 8bdd2f0a153abc23cc7c87eab00255b95a05446c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v44 08/10] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/execUtils.c          |  8 ++++++++
 src/backend/executor/nodeBitmapHeapscan.c |  6 +++++-
 src/backend/executor/nodeIndexonlyscan.c  | 15 ++++++++++++---
 src/backend/executor/nodeIndexscan.c      | 18 ++++++++++++++----
 src/backend/executor/nodeSamplescan.c     |  5 ++++-
 src/backend/executor/nodeSeqscan.c        | 18 +++++++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 15 ++++++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           |  2 ++
 9 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..ff05eca3a61 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -736,6 +736,14 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 	return list_member_int(estate->es_plannedstmt->resultRelations, scanrelid);
 }
 
+/* Return true if the scan node's relation is not modified by the query */
+bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !list_member_oid(ss->ps.state->es_plannedstmt->modifiedRelOids,
+							RelationGetRelid(ss->ss_currentRelation));
+}
+
 /* ----------------------------------------------------------------
  *		ExecOpenScanRelation
  *
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 324e2bed22c..aec92c868ac 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,11 +144,15 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL, 0);
+							   NULL,
+							   flags);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f733be0220c..de9db45322c 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,9 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -92,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -795,7 +798,10 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -861,7 +867,10 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
+
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 1a101df492b..9df4a699504 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,13 +104,16 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -201,13 +204,16 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1732,7 +1738,9 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1796,7 +1804,9 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan, 0);
+								 piscan,
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cc6b23abee0..71c70e5e5c7 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,13 +292,16 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&scanstate->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode, 0);
+									 scanstate->use_pagemode, flags);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index c2d9b7293de..79470e6b9b5 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,17 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL, 0);
+								   0, NULL, flags);
+
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -368,14 +372,18 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 flags);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +413,12 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan, 0);
+		table_beginscan_parallel(node->ss.ss_currentRelation,
+								 pscan,
+								 flags);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 994f70989bc..4257afd96ed 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,10 +242,13 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+				SO_HINT_REL_READ_ONLY : 0;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid, 0);
+												&node->trss_maxtid, flags);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -452,15 +455,18 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
 
 /* ----------------------------------------------------------------
@@ -490,9 +496,12 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0);
+										  pscan, flags);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8357d05d83b..487e38292fa 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..7979a17e4ec 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,8 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+extern bool ScanRelIsReadOnly(ScanState *ss);
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0



  [text/x-patch] v44-0009-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.1K, 10-v44-0009-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 273a8d32ec46ac37286b1952f164b6290cee6c66 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v44 09/10] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid the write amplification
caused when vacuum later has to set the page all-visible, which could
trigger a write and potentially an FPI. It also makes index-only scans
more effective, since they can skip heap fetches only for pages marked
all-visible in the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              | 16 +++++++--
 5 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 66726b22de6..651efa0127a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -148,7 +148,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2545,7 +2546,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index fc5345e1dff..36260897503 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -253,7 +256,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +340,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +398,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +468,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -920,21 +926,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM on-access for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1167,7 +1189,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 47cbf2a20cf..bfc7d482827 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -95,7 +96,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Used by sequential scans, bitmap heap scans, TID range scans, and
+	 * sample scans: the current heap block's corresponding page in the
+	 * visibility map. If the query does not modify the relation, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
@@ -126,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
@@ -431,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
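The decision added to heap_page_will_set_vm() can be modelled as a small function over a few booleans. This is a simplified sketch, not the pruneheap.c code. PruneStateSketch folds BufferIsDirty() and XLogCheckBufferNeedsBackup() into plain fields, and the VM bit values are local copies. The hunk is also cut off after "if (prstate->set_all_frozen)", so adding the all-frozen bit at that point is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the VM bits (assumed to match VISIBILITYMAP_ALL_*). */
#define VM_ALL_VISIBLE_SKETCH 0x01
#define VM_ALL_FROZEN_SKETCH  0x02

typedef enum PruneReasonSketch
{
	PRUNE_ON_ACCESS_SKETCH,
	PRUNE_VACUUM_SCAN_SKETCH,
} PruneReasonSketch;

/* Hypothetical, simplified subset of PruneState plus buffer facts. */
typedef struct PruneStateSketch
{
	bool		attempt_set_vm;		/* HEAP_PAGE_PRUNE_SET_VM was passed */
	bool		set_all_visible;
	bool		set_all_frozen;
	bool		buffer_is_dirty;	/* stands in for BufferIsDirty() */
	bool		buffer_needs_fpi;	/* stands in for XLogCheckBufferNeedsBackup() */
	uint8_t		new_vmbits;			/* caller initializes to 0 */
} PruneStateSketch;

static bool
will_set_vm_sketch(PruneStateSketch *ps, PruneReasonSketch reason,
				   bool do_prune, bool do_freeze)
{
	/* Same invariant the caller asserts before deciding. */
	assert(!ps->set_all_frozen || ps->set_all_visible);

	if (!ps->attempt_set_vm || !ps->set_all_visible)
		return false;

	/*
	 * On-access with nothing else changing the page: refuse if setting the
	 * VM would newly dirty the heap page, or would force an FPI of it.
	 */
	if (reason == PRUNE_ON_ACCESS_SKETCH && !do_prune && !do_freeze &&
		(!ps->buffer_is_dirty || ps->buffer_needs_fpi))
	{
		ps->set_all_visible = false;
		ps->set_all_frozen = false;
		return false;
	}

	ps->new_vmbits = VM_ALL_VISIBLE_SKETCH;
	if (ps->set_all_frozen)
		ps->new_vmbits |= VM_ALL_FROZEN_SKETCH;	/* assumed continuation */
	return true;
}
```

The point of the heuristic is that an on-access caller which is already pruning or freezing has paid for dirtying the page and logging it. Setting the VM in the same operation is then cheap. Otherwise it backs off rather than generate a write or FPI purely for the VM update.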



  [text/x-patch] v44-0010-Set-pd_prune_xid-on-insert.patch (8.8K, 11-v44-0010-Set-pd_prune_xid-on-insert.patch)
From 05279abac3bf623887c1e4883d360116ad5538b0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v44 10/10] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set a page all-visible in
the VM the first time the page is read after it has filled up with
newly inserted tuples. The page is then likely to be set all-visible
while it is still in shared buffers, avoiding potential I/O
amplification when vacuum later has to scan the page and set it
all-visible. It also makes index-only scans of newly inserted data
effective much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 40 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++------
 3 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..ba11bbc03a5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM. We also don't set it in
+		 * bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4153,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 36260897503..4e21c6f94ea 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -275,7 +275,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around, and no
+	 * inserts added new tuples that may now be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1918,17 +1919,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0
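Here is a sketch of the pd_prune_xid lifecycle. It shows why setting the hint on insert lets on-access pruning reach freshly filled pages. It makes two simplifying assumptions. First, TransactionIdSketch is a plain uint32_t compared with "<", whereas real XID comparison is wraparound-aware via TransactionIdPrecedes(). Second, page_set_prunable_sketch() models PageSetPrunable() as keeping the oldest normal XID. All *_sketch names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionIdSketch;

#define INVALID_XID_SKETCH      ((TransactionIdSketch) 0)
#define FIRST_NORMAL_XID_SKETCH ((TransactionIdSketch) 3)	/* 1 = bootstrap */

typedef struct PageHeaderSketch
{
	TransactionIdSketch pd_prune_xid;
} PageHeaderSketch;

static bool
xid_is_normal_sketch(TransactionIdSketch xid)
{
	return xid >= FIRST_NORMAL_XID_SKETCH;
}

/* Keep the oldest prunable XID; plain '<' ignores wraparound (simplification). */
static void
page_set_prunable_sketch(PageHeaderSketch *page, TransactionIdSketch xid)
{
	assert(xid_is_normal_sketch(xid));
	if (page->pd_prune_xid == INVALID_XID_SKETCH || xid < page->pd_prune_xid)
		page->pd_prune_xid = xid;
}

/* Mirrors the condition the patch adds to heap_insert(). */
static void
after_insert_sketch(PageHeaderSketch *page, TransactionIdSketch xid,
					bool insert_frozen)
{
	if (xid_is_normal_sketch(xid) && !insert_frozen)
		page_set_prunable_sketch(page, xid);
}

/* heap_page_prune_opt() gives up early when pd_prune_xid is invalid. */
static bool
prune_opt_may_proceed_sketch(const PageHeaderSketch *page)
{
	return page->pd_prune_xid != INVALID_XID_SKETCH;
}
```

Before this patch an insert-only page never got past that first gate, so on-access pruning, and with it on-access VM setting, never looked at it. Frozen inserts and bootstrap-mode inserts are skipped because nothing further would be pruned or frozen.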




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-20 23:37                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-22 19:58                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-23 21:54                                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-24 06:53                                   ` Kirill Reshke <[email protected]>
  1 sibling, 0 replies; 34+ messages in thread

From: Kirill Reshke @ 2026-03-24 06:53 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, 24 Mar 2026 at 02:54, Melanie Plageman
<[email protected]> wrote:
>
> On Sun, Mar 22, 2026 at 3:58 PM Melanie Plageman
> <[email protected]> wrote:
> >
> > I've pushed the first two patches. Attached are the remaining 10. No
> > changes were made to those from the previous version.
>
> I'm planning on pushing 0001-0005 in the morning.
>

Thanks for taking care of this. It would be good to get the WAL volume
reduction from 0004 & 0005 into v19. LGTM.


-- 
Best regards,
Kirill Reshke






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-24 17:53                                   ` Andres Freund <[email protected]>
  2026-03-24 23:44                                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 34+ messages in thread

From: Andres Freund @ 2026-03-24 17:53 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Hi,

On 2026-03-23 17:54:13 -0400, Melanie Plageman wrote:
> I've made some significant changes to 0006 and realized I need some
> help. 0006 tracks what relations are modified by a query. This new
> version (v44) uses relation oids instead of rt indexes to handle cases
> where the same relation appears more than once in the range table
> (e.g. INSERT INTO foo SELECT * FROM foo; foo appears twice). It
> computes modifiedRelOids (a list of relation OIDs modified by the
> query) in the planner and stores them in the PlannedStmt. There is one
> big issue I'm not sure how to solve:

I'm not entirely sure this is something we need to catch and therefore not
sure that modifiedRelOids is worth the trouble over just having the RT
indexes.


> For queries like INSERT INTO ptable SELECT * FROM ptable, where ptable
> is a partitioned table, though we scan ptable, we don't know when
> executing that scan that we will then modify ptable with the insert.

But does that matter? If such a query inserts a meaningful number of rows, it's
going to insert into different pages than the ones you selected from?


> In my patch, I've added find_all_inheritors() when populating
> modifiedRelOids, but I realize this probably isn't acceptable to add
> to planner from a performance perspective.

Agreed.


Greetings,

Andres Freund






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-20 23:37                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-22 19:58                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-23 21:54                                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-24 17:53                                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
@ 2026-03-24 23:44                                     ` Melanie Plageman <[email protected]>
  2026-03-25 18:54                                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-24 23:44 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Attached v45 is what remains in the patchset (I committed the rest).

On Tue, Mar 24, 2026 at 1:53 PM Andres Freund <[email protected]> wrote:
>
> On 2026-03-23 17:54:13 -0400, Melanie Plageman wrote:
> > I've made some significant changes to 0006 and realized I need some
> > help. 0006 tracks what relations are modified by a query. This new
> > version (v44) uses relation oids instead of rt indexes to handle cases
> > where the same relation appears more than once in the range table
> > (e.g. INSERT INTO foo SELECT * FROM foo; foo appears twice). It
> > computes modifiedRelOids (a list of relation OIDs modified by the
> > query) in the planner and stores them in the PlannedStmt. There is one
> > big issue I'm not sure how to solve:
>
> I'm not entirely sure this is something we need to catch and therefore not
> sure that modifiedRelOids is worth the trouble over just having the RT
> indexes.

Do you see the space as the disadvantage of saving the OIDs? I guess
using OIDs is also worse from a semantic perspective if the set is
incomplete, for example in the case of inserting into a leaf
partition. With RT indexes, it is accurate to say that the set
contains the RT indexes of all modified relations.
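A self-contained sketch of the difference for INSERT INTO foo SELECT * FROM foo, where foo appears twice in the range table (RT index 1 as the INSERT target, RT index 2 as the scan). Everything here is hypothetical (the OID value 16384, the array-based range table, the helper names); it only illustrates that a set keyed by RT index marks the target entry alone, while a set keyed by OID also marks the scan of the same table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/*
 * Hypothetical range table for INSERT INTO foo SELECT * FROM foo.
 * Slot 0 is unused, mirroring PostgreSQL's 1-based RT indexes.
 * RT index 1 is foo as the INSERT target; RT index 2 is foo as the scan.
 */
static const Oid range_table[] = {0, 16384, 16384};
#define RANGE_TABLE_SIZE (sizeof(range_table) / sizeof(range_table[0]))

/* Modified set keyed by RT index: bit i set means RT index i is modified. */
static const uint32_t modified_rtis = UINT32_C(1) << 1;

/* Modified set keyed by relation OID. */
static const Oid modified_oids[] = {16384};

/* Is range-table entry rti itself in the RT-index-keyed set? */
bool
rti_is_modified(size_t rti)
{
	assert(rti > 0 && rti < RANGE_TABLE_SIZE);
	return (modified_rtis & (UINT32_C(1) << rti)) != 0;
}

/* Is the relation behind range-table entry rti in the OID-keyed set? */
bool
rti_oid_is_modified(size_t rti)
{
	assert(rti > 0 && rti < RANGE_TABLE_SIZE);
	for (size_t i = 0; i < sizeof(modified_oids) / sizeof(modified_oids[0]); i++)
	{
		if (modified_oids[i] == range_table[rti])
			return true;
	}
	return false;
}
```

Here rti_is_modified(2) is false while rti_oid_is_modified(2) is true: only the OID-keyed set treats the SELECT side of the query as reading a modified relation.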

For INSERT INTO foo SELECT * FROM foo, if the pages are mostly full,
setting them all-visible during the scan won't hurt, because the insert
will go to the end of the table. And if there is free space throughout
the table, we won't do on-access pruning. So I don't think we could
actually end up setting and clearing the VM for every page.

In v45, I've gone back to RT indexes.

- Melanie


Attachments:

  [text/x-patch] v45-0001-Track-which-relations-are-modified-by-a-query.patch (5.9K, 2-v45-0001-Track-which-relations-are-modified-by-a-query.patch)
From 67acc2e3ea3a3227e68a85c500ec8104a8c5b812 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v45 1/5] Track which relations are modified by a query

Save the range table indexes of modified relations in a Bitmapset in the
PlannedStmt. A later commit will use this information during scans to
control whether or not on-access pruning is allowed to set the
visibility map. Setting the visibility map during a scan is
counterproductive if the query is going to modify the page immediately
after.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE). All row mark types are included, even those which
don't actually modify tuples, because this set is only used as a hint to
avoid unnecessary work.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  3 +++
 src/backend/executor/nodeModifyTable.c | 14 ++++++++++++++
 src/backend/optimizer/plan/planner.c   | 21 ++++++++++++++++++++-
 src/include/nodes/plannodes.h          |  6 ++++++
 5 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..4f39767d033 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelids = estate->es_plannedstmt->modifiedRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..38a43315f11 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,9 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		Assert(bms_is_member(erm->rti,
+							 estate->es_plannedstmt->modifiedRelids));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..4c64589b421 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,14 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	/*
+	 * If this is a leaf partition we just found, it won't have a valid range
+	 * table index.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1531,9 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2216,9 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	Assert(bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..86dea1c9cb8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *modifiedRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,23 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelids from result relations and row marks.
+	 *
+	 * This isn't exactly what the executor will actually modify/lock at
+	 * runtime. Runtime partition pruning may eliminate some result relations
+	 * and parent row marks included here may be skipped by the executor.
+	 * Conversely, leaf partitions whose result relations are created at the
+	 * time of insert are not included here.
+	 */
+	foreach(lc, glob->resultRelations)
+		modifiedRelids = bms_add_member(modifiedRelids, lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+		modifiedRelids = bms_add_member(modifiedRelids,
+										((PlanRowMark *) lfirst(lc))->rti);
+	result->modifiedRelids = modifiedRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..a9cf9dd0f29 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,12 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * RT indexes of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 */
+	Bitmapset  *modifiedRelids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0
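As a rough illustration of what v45-0001 computes, here is a self-contained sketch. It is not PostgreSQL code: a 64-bit mask (ToyRelidSet) stands in for Bitmapset, and ToyRowMark, compute_modified_relids() and insert_target_is_expected() are hypothetical names. The set is the union of the result relations' RT indexes and the row marks' RT indexes, and the insert-side check exempts RT index 0, which is how the patch's Assert in ExecInsert() treats a leaf partition found during tuple routing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy stand-in for PostgreSQL's Bitmapset: a 64-bit mask, so it only
 * handles RT indexes 1..63. Hypothetical; not the real bms_* API.
 */
typedef uint64_t ToyRelidSet;

static ToyRelidSet
toy_add_member(ToyRelidSet set, int rti)
{
	assert(rti > 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

bool
toy_is_member(int rti, ToyRelidSet set)
{
	if (rti <= 0 || rti >= 64)
		return false;
	return (set & (UINT64_C(1) << rti)) != 0;
}

/* Hypothetical, trimmed-down PlanRowMark: only the RT index matters here. */
typedef struct ToyRowMark
{
	int			rti;
} ToyRowMark;

/*
 * Union of result-relation RT indexes and row-mark RT indexes, analogous
 * to the loops the patch adds to standard_planner().
 */
ToyRelidSet
compute_modified_relids(const int *result_rels, size_t n_result_rels,
						const ToyRowMark *rowmarks, size_t n_rowmarks)
{
	ToyRelidSet modified = 0;

	for (size_t i = 0; i < n_result_rels; i++)
		modified = toy_add_member(modified, result_rels[i]);
	for (size_t i = 0; i < n_rowmarks; i++)
		modified = toy_add_member(modified, rowmarks[i].rti);
	return modified;
}

/*
 * Executor-side check, mirroring the Assert added to ExecInsert(): a leaf
 * partition routed to at insert time has RT index 0 and is exempt.
 */
bool
insert_target_is_expected(int rti, ToyRelidSet modified)
{
	return rti == 0 || toy_is_member(rti, modified);
}
```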


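The v45-0002 patch below keeps the bits a begin-scan wrapper chooses itself (renamed internal_flags in table_beginscan_parallel()) separate from the new caller-supplied flags, and passes both to table_beginscan_common(). A minimal sketch of that separation, assuming made-up names throughout (ToyScanDesc, the TOY_SO_* bits, and a hypothetical TOY_SCAN_REL_READ_ONLY caller flag standing in for "the relation is read-only for the current query"); how the real table AM stores and consumes the caller's flags may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical internal scan-option bits, loosely modeled on the SO_* flags. */
#define TOY_SO_TYPE_SEQSCAN   (UINT32_C(1) << 0)
#define TOY_SO_ALLOW_PAGEMODE (UINT32_C(1) << 1)
#define TOY_SO_TEMP_SNAPSHOT  (UINT32_C(1) << 2)

/* Hypothetical caller-settable bit; it lives in a separate word, so reusing bit 0 is fine. */
#define TOY_SCAN_REL_READ_ONLY (UINT32_C(1) << 0)

typedef struct ToyScanDesc
{
	uint32_t	internal_flags; /* chosen by the begin-scan wrapper */
	uint32_t	flags;			/* passed through from the caller unchanged */
} ToyScanDesc;

static ToyScanDesc *
toy_beginscan_common(uint32_t internal_flags, uint32_t flags)
{
	ToyScanDesc *scan = malloc(sizeof(ToyScanDesc));

	if (scan == NULL)
		abort();				/* toy code: treat out-of-memory as fatal */
	scan->internal_flags = internal_flags;
	scan->flags = flags;
	return scan;
}

/* Mirrors the shape of table_beginscan_parallel() in v45-0002. */
ToyScanDesc *
toy_beginscan_parallel(uint32_t flags, bool snapshot_was_serialized)
{
	uint32_t	internal_flags = TOY_SO_TYPE_SEQSCAN | TOY_SO_ALLOW_PAGEMODE;

	if (snapshot_was_serialized)
		internal_flags |= TOY_SO_TEMP_SNAPSHOT;
	return toy_beginscan_common(internal_flags, flags);
}

bool
toy_scan_rel_read_only(const ToyScanDesc *scan)
{
	assert(scan != NULL);
	return (scan->flags & TOY_SCAN_REL_READ_ONLY) != 0;
}

void
toy_endscan(ToyScanDesc *scan)
{
	free(scan);
}
```

In this sketch, TOY_SCAN_REL_READ_ONLY and TOY_SO_TYPE_SEQSCAN share the same bit value without colliding, because the caller's flags never get OR'ed into the wrapper's own option word.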

  [text/x-patch] v45-0002-Thread-flags-through-begin-scan-APIs.patch (32.8K, 3-v45-0002-Thread-flags-through-begin-scan-APIs.patch)
From 84cfddea0ad7b2362f6c47ac68575f2d004edc55 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v45 2/5] Thread flags through begin-scan APIs

Add a caller-settable flags parameter to table_beginscan() and its
table_beginscan_* variants, index_beginscan(), index_beginscan_parallel(),
table_index_fetch_begin(), and the table AM callback index_fetch_begin().
This allows callers to pass additional context to be used when building
the scan descriptors.

For index scans, a new uint32 flags field is added to
IndexFetchTableData, and the heap AM stores the caller-provided flags
there in heapam_index_fetch_begin().

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  1 +
 src/backend/access/gin/gininsert.c        |  1 +
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  2 +
 src/backend/access/index/indexam.c        |  6 +-
 src/backend/access/nbtree/nbtsort.c       |  2 +-
 src/backend/access/table/tableam.c        | 19 ++++---
 src/backend/commands/constraint.c         |  2 +-
 src/backend/commands/copyto.c             |  2 +-
 src/backend/commands/tablecmds.c          |  9 ++-
 src/backend/commands/typecmds.c           |  4 +-
 src/backend/executor/execIndexing.c       |  3 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  1 +
 src/backend/executor/nodeIndexonlyscan.c  |  3 +
 src/backend/executor/nodeIndexscan.c      |  4 ++
 src/backend/executor/nodeSamplescan.c     |  1 +
 src/backend/executor/nodeSeqscan.c        |  6 +-
 src/backend/executor/nodeTidrangescan.c   |  5 +-
 src/backend/partitioning/partbounds.c     |  2 +-
 src/backend/utils/adt/selfuncs.c          |  1 +
 src/include/access/genam.h                |  2 +
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 68 +++++++++++++++--------
 26 files changed, 111 insertions(+), 62 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..95fad61fa9e 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, 0, GetActiveSnapshot(), 0, NULL);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..79a79bea1c6 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,6 +2844,7 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
+									0,
 									ParallelTableScanFromBrinShared(brinshared));
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..32167d03137 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,6 +2068,7 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
+									0,
 									ParallelTableScanFromGinBuildShared(ginshared));
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d40878928e1..951273a4d7f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -81,11 +81,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -763,7 +764,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex,
+									0,	/* flags */
+									SnapshotAny, NULL, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -772,7 +775,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, 0, SnapshotAny, 0, (ScanKey) NULL);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..b099d956e41 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,6 +455,7 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
+										 0, /* flags */
 										 snapshot, NULL, nkeys, 0);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
@@ -716,6 +717,7 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
+									 0, /* flags */
 									 snapshot, NULL, nkeys, 0);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index fbfc33159eb..ae754503007 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -256,6 +256,7 @@ index_insert_cleanup(Relation indexRelation,
 IndexScanDesc
 index_beginscan(Relation heapRelation,
 				Relation indexRelation,
+				uint32 flags,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
 				int nkeys, int norderbys)
@@ -285,7 +286,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -592,6 +593,7 @@ index_parallelrescan(IndexScanDesc scan)
  */
 IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
+						 uint32 flags,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
 						 ParallelIndexScanDesc pscan)
@@ -616,7 +618,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..98e9410c579 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									0, ParallelTableScanFromBTShared(btshared));
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..32bd3fdb7a5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,10 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, uint32 flags, ParallelTableScanDesc pscan)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +176,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +185,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
+								  uint32 flags,
 								  ParallelTableScanDesc pscan)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +207,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +216,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..390b4260ada 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, 0, GetActiveSnapshot(), 0, NULL);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c69c12dc014..14d808671c5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, 0, snapshot, 0, NULL);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,8 +13980,7 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
-
+	scan = table_beginscan(rel, 0, snapshot, 0, NULL);
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
 									  ALLOCSET_SMALL_SIZES);
@@ -22882,7 +22881,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, 0, snapshot, 0, NULL);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23345,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, 0, snapshot, 0, NULL);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..8c5d5e708a1 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,7 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, 0, snapshot, 0, NULL);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3266,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, 0, snapshot, 0, NULL);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..c46beedeb71 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,8 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index, 0,	/* flags */
+								 &DirtySnapshot, NULL, indnkeyatts, 0);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..3ef4d5d8bb2 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   &snap, NULL, skey_attoff, 0);
 
 retry:
 	found = false;
@@ -383,7 +385,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, 0, &snap, 0, NULL);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, 0, SnapshotAny, 0, NULL);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +668,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   0,	/* flags */
+						   SnapshotAny, NULL, skey_attoff, 0);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..7e2c1b7467b 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -146,6 +146,7 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	{
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
+							   0,
 							   node->ss.ps.state->es_snapshot,
 							   0,
 							   NULL);
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..5cacb4b215a 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -92,6 +92,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -791,6 +792,7 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
+								 0, /* flags */
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
@@ -857,6 +859,7 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
+								 0, /* flags */
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..aaef31dbbad 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -110,6 +110,7 @@ IndexNext(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -206,6 +207,7 @@ IndexNextWithReorder(IndexScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
+								   0,	/* flags */
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1727,6 +1729,7 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
+								 0, /* flags */
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
@@ -1791,6 +1794,7 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
+								 0, /* flags */
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cf4dd6a16b4 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -294,6 +294,7 @@ tablesample_init(SampleScanState *scanstate)
 	{
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
+									 0,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..376e877e87c 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -70,7 +70,7 @@ SeqNext(SeqScanState *node)
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   estate->es_snapshot,
+								   0, estate->es_snapshot,
 								   0, NULL);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
@@ -375,7 +375,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, 0, pscan);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +408,5 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, 0, pscan);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..bacd7aa5bc4 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -243,6 +243,7 @@ TidRangeNext(TidRangeScanState *node)
 		if (scandesc == NULL)
 		{
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
+												0,
 												estate->es_snapshot,
 												&node->trss_mintid,
 												&node->trss_maxtid);
@@ -460,7 +461,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  0, pscan);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +495,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  0, pscan);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..919df5eef0a 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, 0, snapshot, 0, NULL);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..0528f8166d8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7177,6 +7177,7 @@ get_actual_variable_endpoint(Relation heapRel,
 							  GlobalVisTestFor(heapRel));
 
 	index_scan = index_beginscan(heapRel, indexRel,
+								 0, /* flags */
 								 &SnapshotNonVacuumable, NULL,
 								 1, 0);
 	/* Set it up for index-only scan */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..24b2fda51df 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,6 +156,7 @@ extern void index_insert_cleanup(Relation indexRelation,
 
 extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
+									 uint32 flags,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
 									 int nkeys, int norderbys);
@@ -182,6 +183,7 @@ extern void index_parallelscan_initialize(Relation heapRelation,
 extern void index_parallelrescan(IndexScanDesc scan);
 extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
+											  uint32 flags,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
 											  ParallelIndexScanDesc pscan);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9b403203006..e2e07348f37 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..e1f90f2b6a7 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,18 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -893,13 +909,14 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  * `snapshot`, and if nkeys != 0, the results are filtered by those scan keys.
  */
 static inline TableScanDesc
-table_beginscan(Relation rel, Snapshot snapshot,
+table_beginscan(Relation rel, uint32 flags, Snapshot snapshot,
 				int nkeys, ScanKeyData *key)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +945,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -938,12 +955,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  * make it worth using the same data structure.
  */
 static inline TableScanDesc
-table_beginscan_bm(Relation rel, Snapshot snapshot,
+table_beginscan_bm(Relation rel, uint32 flags, Snapshot snapshot,
 				   int nkeys, ScanKeyData *key)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -954,21 +972,22 @@ table_beginscan_bm(Relation rel, Snapshot snapshot,
  * also allows control of whether page-mode visibility checking is used.
  */
 static inline TableScanDesc
-table_beginscan_sampling(Relation rel, Snapshot snapshot,
+table_beginscan_sampling(Relation rel, uint32 flags, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
 						 bool allow_pagemode)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1000,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1013,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1057,14 +1076,15 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
  * for a TID range scan.
  */
 static inline TableScanDesc
-table_beginscan_tidrange(Relation rel, Snapshot snapshot,
+table_beginscan_tidrange(Relation rel, uint32 flags, Snapshot snapshot,
 						 ItemPointer mintid,
 						 ItemPointer maxtid)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,6 +1159,7 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
+											  uint32 flags,
 											  ParallelTableScanDesc pscan);
 
 /*
@@ -1149,6 +1170,7 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
+													   uint32 flags,
 													   ParallelTableScanDesc pscan);
 
 /*
@@ -1175,8 +1197,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1209,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



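Patch 0002 gives each table_beginscan_* variant a uint32 flags argument. table_beginscan_common() asserts that these caller-supplied flags carry none of the SO_INTERNAL_FLAGS bits, then ORs them into the scan-type and SO_ALLOW_* bits that the wrapper sets itself. The standalone sketch below models only that merging step. The enum mirrors the ScanOptions bits as I understand them from tableam.h; only SO_TEMP_SNAPSHOT = 1 << 9 is visible in this diff, so treat the other values as assumptions. merge_scan_flags() and seqscan_flags() are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mirror of ScanOptions; only SO_TEMP_SNAPSHOT's value is shown in the diff */
enum
{
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_TYPE_BITMAPSCAN = 1 << 1,
	SO_TYPE_SAMPLESCAN = 1 << 2,
	SO_TYPE_TIDSCAN = 1 << 3,
	SO_TYPE_TIDRANGESCAN = 1 << 4,
	SO_TYPE_ANALYZE = 1 << 5,
	SO_ALLOW_STRAT = 1 << 6,
	SO_ALLOW_SYNC = 1 << 7,
	SO_ALLOW_PAGEMODE = 1 << 8,
	SO_TEMP_SNAPSHOT = 1 << 9,
};

/* Same mask as the patch: bits only the table_beginscan_* wrappers may set */
#define SO_INTERNAL_FLAGS \
	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
	 SO_TEMP_SNAPSHOT)

/* Hypothetical stand-in for the merging done in table_beginscan_common() */
static uint32_t
merge_scan_flags(uint32_t internal_flags, uint32_t user_flags)
{
	/* callers must not smuggle in scan-type or other internal bits */
	assert((user_flags & SO_INTERNAL_FLAGS) == 0);
	return internal_flags | user_flags;
}

/* Hypothetical analogue of table_beginscan(): seqscan defaults plus caller bits */
static uint32_t
seqscan_flags(uint32_t user_flags)
{
	uint32_t	internal_flags = SO_TYPE_SEQSCAN |
		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;

	return merge_scan_flags(internal_flags, user_flags);
}
```

Passing 0 reproduces the pre-patch behavior. Any bit outside the mask, such as a later hint bit, flows through unchanged, while passing something like SO_TYPE_BITMAPSCAN would trip the assertion in assert-enabled builds.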
  [text/x-patch] v45-0003-Pass-down-information-on-table-modification-to-s.patch (11.6K, 4-v45-0003-Pass-down-information-on-table-modification-to-s.patch)
From 7b400e9ccd7d8d83358bd503b7209c8ed1ec7ea3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v45 3/5] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.
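The scan nodes make this decision through ScanRelIsReadOnly(), which tests whether the node's scanrelid is a member of es_plannedstmt->modifiedRelids and, if it is not, passes SO_HINT_REL_READ_ONLY (1 << 10 in the patch) to the scan. The sketch below models that check in isolation. As a simplifying assumption it replaces the Bitmapset with a 64-bit mask of range-table indexes, and scan_rel_is_read_only() and scan_hint_flags() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the patch's ScanOptions addition */
#define SO_HINT_REL_READ_ONLY (1u << 10)

/* Hypothetical simplification: range-table indexes < 64 stored as a bitmask */
typedef uint64_t RelidMask;

/* Models ScanRelIsReadOnly(): true if the scanned RTE is not modified */
static bool
scan_rel_is_read_only(unsigned scanrelid, RelidMask modified_relids)
{
	assert(scanrelid > 0 && scanrelid < 64);	/* RT indexes start at 1 */
	return (modified_relids & ((RelidMask) 1 << scanrelid)) == 0;
}

/* Models the per-node "ScanRelIsReadOnly(...) ? SO_HINT_REL_READ_ONLY : 0" */
static uint32_t
scan_hint_flags(unsigned scanrelid, RelidMask modified_relids)
{
	return scan_rel_is_read_only(scanrelid, modified_relids) ?
		SO_HINT_REL_READ_ONLY : 0;
}
```

For example, in an UPDATE of range-table entry 1 that joins against entry 2, only the scan of entry 2 would receive the hint. In a plain SELECT, every scanned relation would.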

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/execUtils.c          |  8 ++++++++
 src/backend/executor/nodeBitmapHeapscan.c |  5 ++++-
 src/backend/executor/nodeIndexonlyscan.c  | 11 ++++++++---
 src/backend/executor/nodeIndexscan.c      | 16 ++++++++++++----
 src/backend/executor/nodeSamplescan.c     |  5 ++++-
 src/backend/executor/nodeSeqscan.c        | 14 +++++++++++---
 src/backend/executor/nodeTidrangescan.c   | 15 ++++++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           |  2 ++
 9 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..d2ffe28e010 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -736,6 +736,14 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 	return list_member_int(estate->es_plannedstmt->resultRelations, scanrelid);
 }
 
+/* Return true if the scan node's relation is not modified by the query */
+bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !bms_is_member(((Scan *) ss->ps.plan)->scanrelid,
+						  ss->ps.state->es_plannedstmt->modifiedRelids);
+}
+
 /* ----------------------------------------------------------------
  *		ExecOpenScanRelation
  *
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7e2c1b7467b..dba6c31d188 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -144,9 +144,12 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 	 */
 	if (!node->ss.ss_currentScanDesc)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		node->ss.ss_currentScanDesc =
 			table_beginscan_bm(node->ss.ss_currentRelation,
-							   0,
+							   flags,
 							   node->ss.ps.state->es_snapshot,
 							   0,
 							   NULL);
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 5cacb4b215a..88491249a9a 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -85,6 +85,9 @@ IndexOnlyNext(IndexOnlyScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index only scan is not parallel, or if we're
 		 * serially executing an index only scan that was planned to be
@@ -92,7 +95,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->ioss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
@@ -792,7 +795,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
-								 0, /* flags */
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
@@ -859,7 +863,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 	node->ioss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->ioss_RelationDesc,
-								 0, /* flags */
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index aaef31dbbad..16ec455a964 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -104,13 +104,16 @@ IndexNext(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -201,13 +204,16 @@ IndexNextWithReorder(IndexScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the index scan is not parallel, or if we're
 		 * serially executing an index scan that was planned to be parallel.
 		 */
 		scandesc = index_beginscan(node->ss.ss_currentRelation,
 								   node->iss_RelationDesc,
-								   0,	/* flags */
+								   flags,
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
@@ -1729,7 +1735,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
-								 0, /* flags */
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
@@ -1794,7 +1801,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 	node->iss_ScanDesc =
 		index_beginscan_parallel(node->ss.ss_currentRelation,
 								 node->iss_RelationDesc,
-								 0, /* flags */
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cf4dd6a16b4..b6a02072da5 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -292,9 +292,12 @@ tablesample_init(SampleScanState *scanstate)
 	/* Now we can create or reset the HeapScanDesc */
 	if (scanstate->ss.ss_currentScanDesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&scanstate->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		scanstate->ss.ss_currentScanDesc =
 			table_beginscan_sampling(scanstate->ss.ss_currentRelation,
-									 0,
+									 flags,
 									 scanstate->ss.ps.state->es_snapshot,
 									 0, NULL,
 									 scanstate->use_bulkread,
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 376e877e87c..2d0993a83f4 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,12 +65,15 @@ SeqNext(SeqScanState *node)
 
 	if (scandesc == NULL)
 	{
+		uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+			SO_HINT_REL_READ_ONLY : 0;
+
 		/*
 		 * We reach here if the scan is not parallel, or if we're serially
 		 * executing a scan that was planned to be parallel.
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
-								   0, estate->es_snapshot,
+								   flags, estate->es_snapshot,
 								   0, NULL);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
@@ -368,14 +371,17 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, 0, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, flags, pscan);
 }
 
 /* ----------------------------------------------------------------
@@ -405,8 +411,10 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 							ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, 0, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, flags, pscan);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index bacd7aa5bc4..05ed5364238 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -242,8 +242,11 @@ TidRangeNext(TidRangeScanState *node)
 
 		if (scandesc == NULL)
 		{
+			uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+				SO_HINT_REL_READ_ONLY : 0;
+
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
-												0,
+												flags,
 												estate->es_snapshot,
 												&node->trss_mintid,
 												&node->trss_maxtid);
@@ -453,15 +456,18 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 {
 	EState	   *estate = node->ss.ps.state;
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_allocate(pcxt->toc, node->trss_pscanlen);
 	table_parallelscan_initialize(node->ss.ss_currentRelation,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  0, pscan);
+										  flags, pscan);
 }
 
 /* ----------------------------------------------------------------
@@ -491,9 +497,12 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 								 ParallelWorkerContext *pwcxt)
 {
 	ParallelTableScanDesc pscan;
+	uint32		flags = ScanRelIsReadOnly(&node->ss) ?
+		SO_HINT_REL_READ_ONLY : 0;
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  0, pscan);
+										  flags, pscan);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e1f90f2b6a7..a8fd8f0d45c 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..7979a17e4ec 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,8 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+extern bool ScanRelIsReadOnly(ScanState *ss);
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0



  [text/x-patch] v45-0004-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.1K, 5-v45-0004-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 902a8522f213ffd0b5aae486740da3d6141c98b3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v45 4/5] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
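In this patch the read-only hint reaches pruning in three steps. The scan tests rs_flags (or, for index fetches, xs_base.flags) against SO_HINT_REL_READ_ONLY. heap_page_prune_opt() then adds HEAP_PAGE_PRUNE_SET_VM to params.options. Finally, prune_freeze_setup() turns that option into prstate->attempt_set_vm. The sketch below traces that path. The two HEAP_PAGE_PRUNE_* bit values are hypothetical placeholders because their definitions are not shown here, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SO_HINT_REL_READ_ONLY (1u << 10)	/* from the patch */

/* Hypothetical bit values; the real definitions live in heapam.h */
#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH (1u << 0)
#define HEAP_PAGE_PRUNE_SET_VM (1u << 1)

/* Models the options built in heap_page_prune_opt() for on-access pruning */
static uint32_t
on_access_prune_options(uint32_t scan_flags)
{
	bool		rel_read_only = (scan_flags & SO_HINT_REL_READ_ONLY) != 0;
	uint32_t	options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;

	if (rel_read_only)
		options |= HEAP_PAGE_PRUNE_SET_VM;
	return options;
}

/* Models prstate->attempt_set_vm as derived in prune_freeze_setup() */
static bool
attempt_set_vm(uint32_t options)
{
	return (options & HEAP_PAGE_PRUNE_SET_VM) != 0;
}
```

When the query modifies the relation, the option is never set, so heap_page_will_set_vm() returns false immediately, just as it did for PRUNE_ON_ACCESS before this patch.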

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid the write amplification
that occurs when vacuum later has to set the page all-visible, which
dirties the page, triggers a write, and potentially emits a full-page
image (FPI). It also benefits index-only scans, which can skip heap
fetches only for pages marked all-visible in the VM.
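To avoid trading one write for another, heap_page_will_set_vm() in this patch declines to set the VM during on-access pruning when nothing is being pruned or frozen and the heap page is either clean or would need an FPI. The sketch below models that decision. BufferIsDirty() and XLogCheckBufferNeedsBackup() become plain booleans. PRUNE_VACUUM_SCAN and the ALL_FROZEN branch are assumptions, since the diff stops at "if (prstate->set_all_frozen)". PruneStateSketch and will_set_vm() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* PRUNE_ON_ACCESS is from the patch; PRUNE_VACUUM_SCAN is an assumed peer */
typedef enum { PRUNE_ON_ACCESS, PRUNE_VACUUM_SCAN } PruneReason;

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN 0x02

/* Hypothetical subset of PruneState plus stand-ins for buffer queries */
typedef struct
{
	bool		attempt_set_vm;
	bool		set_all_visible;
	bool		set_all_frozen;
	bool		buffer_dirty;	/* stand-in for BufferIsDirty() */
	bool		needs_fpi;		/* stand-in for XLogCheckBufferNeedsBackup() */
	uint8_t		new_vmbits;
} PruneStateSketch;

static bool
will_set_vm(PruneStateSketch *ps, PruneReason reason,
			bool do_prune, bool do_freeze)
{
	if (!ps->attempt_set_vm || !ps->set_all_visible)
		return false;

	/* on-access with no other changes: don't newly dirty the page or add an FPI */
	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!ps->buffer_dirty || ps->needs_fpi))
	{
		ps->set_all_visible = false;
		ps->set_all_frozen = false;
		return false;
	}

	ps->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
	if (ps->set_all_frozen)		/* assumed continuation of the truncated hunk */
		ps->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
	return true;
}
```

Because the check runs only after do_prune and do_freeze are known, the VM update can piggyback on a page that pruning or freezing is already dirtying and logging.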

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              | 16 +++++++--
 5 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 951273a4d7f..5c2faaf2340 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -149,7 +149,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2546,7 +2547,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6693af8da7f..d83fd26b274 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -253,7 +256,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +340,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +398,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +468,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -920,21 +926,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM on-access for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1167,7 +1189,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2e07348f37..2fc4462050a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -95,7 +96,12 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Used by sequential scans, bitmap heap scans, TID range scans, and
+	 * sample scans: the current heap block's corresponding page in the
+	 * visibility map. If the relation is not modified by the query,
+	 * on-access pruning may set the VM.
+	 */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
@@ -126,7 +132,11 @@ typedef struct IndexFetchHeapData
 	 */
 	Buffer		xs_cbuf;
 
-	/* Current heap block's corresponding page in the visibility map */
+	/*
+	 * Current heap block's corresponding page in the visibility map. For
+	 * index scans that do not modify the underlying heap table, on-access
+	 * pruning may set the VM.
+	 */
 	Buffer		xs_vmbuffer;
 } IndexFetchHeapData;
 
@@ -431,7 +441,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0



  [text/x-patch] v45-0005-Set-pd_prune_xid-on-insert.patch (8.8K, 6-v45-0005-Set-pd_prune_xid-on-insert.patch)
From 7633073a6866d4cb94c8722a547ba49e68950bb0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v45 5/5] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set a page all-visible in
the VM the first time the page is read after being filled with newly
inserted tuples. The page can thus be set all-visible while it is still
in shared buffers, avoiding potential I/O amplification when vacuum
later has to scan the page just to set it all-visible. It also lets
index-only scans skip heap fetches for newly inserted data much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 39 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++-------
 3 files changed, 51 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..cdaf57e3f12 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM and we're not in bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4152,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d83fd26b274..bb364f53a44 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -275,7 +275,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts added new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1918,17 +1919,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-20 23:37                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-22 19:58                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-23 21:54                                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-24 17:53                                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-24 23:44                                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-25 18:54                                       ` Melanie Plageman <[email protected]>
  2026-03-25 23:14                                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-25 23:29                                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Tomas Vondra <[email protected]>
  0 siblings, 2 replies; 34+ messages in thread

From: Melanie Plageman @ 2026-03-25 18:54 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Mar 25, 2026 at 2:02 PM Tomas Vondra <[email protected]> wrote:
>
> 0002
>
> - Don't we usually keep "flags" as the last parameter? It seems a bit
> weird that it's added in between relation and snapshot.

In an earlier review, Andres said he disliked using flags as the last
parameter for index_beginscan() because its current last two
parameters are integers (nkeys and norderbys), which could be
confusing. Personally, I think you have to look at the function
signature before just randomly passing stuff, and so it shouldn't
matter -- but I didn't care enough to argue. If you agree with me that
they should be last, then it's two against one and I'll change it back
:) I can keep the callsite comments naming the flags parameter.

> - Do we really want to pass two sets of flags to table_beginscan_common?
>  I realize it's done to ensure "users" don't use internal flags, but
> then maybe it'd be better to do that check in the places calling the
> _common? Someone adding a new caller can break this in various ways
> anyway, e.g. by setting bits in the internal flags, no?

Yes, callers of table_beginscan_common() could pass flags they
shouldn't in internal_flags. But I was mostly trying to prevent the
case where a user picks a flag that overlaps with an internal flag,
conditionally passes it as a user flag, and then, when they test for it
in their AM-specific code, can't tell whether their own flag or the
internal one is set.

Anyway, it's not hard to move:
    Assert((flags & SO_INTERNAL_FLAGS) == 0);
into the table_beginscan_common() callers and then have each caller
pass the internal flags it wants, ORed with the user-specified flags,
to table_beginscan_common(). I think that fixes what you are talking
about?

> If we want to have these checks, should we be more thorough? Should we
> check the internal flags only set internal flags?

That's easy enough too. I think
    Assert((internal_flags & ~SO_INTERNAL_FLAGS) == 0);
does the trick.

I think this would largely be the same as having
table_beginscan_common() callers validate that the user-passed flags
are not internal and then OR them together with the internal flags
they want to pass to table_beginscan_common().

I'm trying to think of cases where the two approaches would differ so
I can decide which to do.

> 0003
>
> - Half the "beginscan" calls use a ternary operator directly, half sets
> a variable first (and then uses that). Often mixed in the same file.
> Shouldn't it be a bit consistent?

Indeed.

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-20 23:37                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-22 19:58                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-23 21:54                                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-24 17:53                                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-24 23:44                                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-25 18:54                                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-25 23:14                                         ` Melanie Plageman <[email protected]>
  2026-03-26 23:10                                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) David Rowley <[email protected]>
  1 sibling, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-25 23:14 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Mar 25, 2026 at 2:54 PM Melanie Plageman
<[email protected]> wrote:
>
> I'm trying to think of cases where the two approaches would differ so
> I can decide which to do.
>
> > 0003
> >
> > - Half the "beginscan" calls use a ternary operator directly, half sets
> > a variable first (and then uses that). Often mixed in the same file.
> > Shouldn't it be a bit consistent?
>
> Indeed.

Attached v46 addresses your feedback and has a bit of assorted cleanup in it.

I started wondering if table_beginscan_strat() is a bit weird now
because it has two boolean arguments that are basically just
SO_ALLOW_STRAT and SO_ALLOW_SYNC -- so those are kind of letting the
user set "internal" flags. Anyway, I'm not sure we should do anything
about it, but it got me thinking.

- Melanie


Attachments:

  [text/x-patch] v46-0001-Track-which-relations-are-modified-by-a-query.patch (6.3K, 2-v46-0001-Track-which-relations-are-modified-by-a-query.patch)
From 4216d588438aacd4023801869edc464dc2cb0921 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v46 1/5] Track which relations are modified by a query

Save the range table indexes of relations modified by a query in a
bitmap in the PlannedStmt. This is derived from existing PlannedStmt
members listing row marks and result relations, but precomputing it
allows cheap membership checks during execution.

A later commit will use this information during scans to control whether
or not on-access pruning is allowed to set the visibility map -- which
would be counterproductive if the query will modify the page.

Relations are considered modified if they are the target of INSERT,
UPDATE, DELETE, or MERGE, or if they have any row mark (including SELECT
FOR UPDATE/SHARE and non-locking marks like ROW_MARK_REFERENCE). Since
this bitmap is used to avoid unnecessary work, it is okay for it to be
conservative.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execParallel.c    |  1 +
 src/backend/executor/nodeLockRows.c    |  3 +++
 src/backend/executor/nodeModifyTable.c | 20 ++++++++++++++++++++
 src/backend/optimizer/plan/planner.c   | 21 ++++++++++++++++++++-
 src/include/nodes/plannodes.h          |  6 ++++++
 5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..4f39767d033 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->modifiedRelids = estate->es_plannedstmt->modifiedRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 8d865470780..38a43315f11 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -113,6 +113,9 @@ lnext:
 		}
 		erm->ermActive = true;
 
+		Assert(bms_is_member(erm->rti,
+							 estate->es_plannedstmt->modifiedRelids));
+
 		/* fetch the tuple's ctid */
 		datum = ExecGetJunkAttribute(slot,
 									 aerm->ctidAttNo,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..b22264c343b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -896,6 +896,14 @@ ExecInsert(ModifyTableContext *context,
 
 	resultRelationDesc = resultRelInfo->ri_RelationDesc;
 
+	/*
+	 * If this is a leaf partition we just found, it won't have a valid range
+	 * table index.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	/*
 	 * Open the table's indexes, if we have not done so already, so that we
 	 * can add new index entries for the inserted tuple.
@@ -1523,6 +1531,9 @@ ExecDeleteAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 {
 	EState	   *estate = context->estate;
 
+	Assert(bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	return table_tuple_delete(resultRelInfo->ri_RelationDesc, tupleid,
 							  estate->es_output_cid,
 							  estate->es_snapshot,
@@ -2205,6 +2216,15 @@ ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 	bool		partition_constraint_failed;
 	TM_Result	result;
 
+	/*
+	 * Tuple routing for cross-partition updates or ON CONFLICT ... DO UPDATE
+	 * may open leaf partitions not in the range table, in which case
+	 * ri_RangeTableIndex is 0.
+	 */
+	Assert(resultRelInfo->ri_RangeTableIndex == 0 ||
+		   bms_is_member(resultRelInfo->ri_RangeTableIndex,
+						 estate->es_plannedstmt->modifiedRelids));
+
 	updateCxt->crossPartUpdate = false;
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..de10b6fb413 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *modifiedRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +663,23 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute modifiedRelids from result relations and row marks.
+	 *
+	 * This isn't exactly what the executor will actually modify/lock at
+	 * runtime. Runtime partition pruning may eliminate some result relations
+	 * and some rowmarks are included that may not result in table
+	 * modification. Conversely, leaf partitions whose result relations are
+	 * created at the time of insert are not included here.
+	 */
+	foreach(lc, glob->resultRelations)
+		modifiedRelids = bms_add_member(modifiedRelids, lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+		modifiedRelids = bms_add_member(modifiedRelids,
+										((PlanRowMark *) lfirst(lc))->rti);
+	result->modifiedRelids = modifiedRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..a9cf9dd0f29 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,12 @@ typedef struct PlannedStmt
 	 */
 	Bitmapset  *unprunableRelids;
 
+	/*
+	 * RT indexes of relations modified by the query through
+	 * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+	 */
+	Bitmapset  *modifiedRelids;
+
 	/*
 	 * list of RTEPermissionInfo nodes for rtable entries needing one
 	 */
-- 
2.43.0



  [text/x-patch] v46-0002-Thread-flags-through-begin-scan-APIs.patch (34.0K, 3-v46-0002-Thread-flags-through-begin-scan-APIs.patch)
From 384753f272cbe60e77299b97153801d3a448d33b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v46 2/5] Thread flags through begin-scan APIs

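Add a caller-settable flags parameter to table_beginscan(),
table_beginscan_bm(), table_beginscan_sampling(),
table_beginscan_tidrange(), table_beginscan_parallel(),
table_beginscan_parallel_tidrange(), index_beginscan(),
index_beginscan_parallel(), table_index_fetch_begin(), and the table AM
callback index_fetch_begin(). This lets callers pass additional context
to be used when building the scan descriptors.

The wrappers still compute their own scan-type and SO_ALLOW_* flags;
table_beginscan_common() now receives those separately from the
caller's flags and asserts that the caller's flags do not overlap the
new SO_INTERNAL_FLAGS mask.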

For index scans, a new flags field is added to IndexFetchTableData, and
the heap AM stores the caller-provided flags there in
heapam_index_fetch_begin().

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  3 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  6 +-
 src/backend/access/index/indexam.c        | 10 ++--
 src/backend/access/nbtree/nbtsort.c       |  3 +-
 src/backend/access/table/tableam.c        | 22 +++----
 src/backend/commands/constraint.c         |  3 +-
 src/backend/commands/copyto.c             |  3 +-
 src/backend/commands/tablecmds.c          | 13 ++--
 src/backend/commands/typecmds.c           |  6 +-
 src/backend/executor/execIndexing.c       |  4 +-
 src/backend/executor/execReplication.c    | 14 +++--
 src/backend/executor/nodeBitmapHeapscan.c |  3 +-
 src/backend/executor/nodeIndexonlyscan.c  |  9 ++-
 src/backend/executor/nodeIndexscan.c      | 12 ++--
 src/backend/executor/nodeSamplescan.c     |  3 +-
 src/backend/executor/nodeSeqscan.c        |  9 ++-
 src/backend/executor/nodeTidrangescan.c   |  7 ++-
 src/backend/partitioning/partbounds.c     |  3 +-
 src/backend/utils/adt/selfuncs.c          |  3 +-
 src/include/access/genam.h                |  6 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 73 +++++++++++++++--------
 26 files changed, 152 insertions(+), 84 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..75ad379190f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,8 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL,
+						   0 /* flags */ );
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..536493fa38a 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									0 /* flags */ );
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..d4e9c9ed950 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									0 /* flags */ );
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d40878928e1..8c7695ebfb9 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -81,11 +81,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -763,7 +764,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0,
+									0 /* flags */ );
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -772,7 +774,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL,
+									0 /* flags */ );
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..03a243345bc 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,8 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0,
+										 0 /* flags */ );
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +717,8 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0,
+									 0 /* flags */ );
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index fbfc33159eb..13cdbb86cd7 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -258,7 +258,8 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys,
+				uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -285,7 +286,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -594,7 +595,8 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -616,7 +618,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..3c444ece216 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,8 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared),
+									0 /* flags */ );
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..3ac4027ce11 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, 0);
 }
 
 
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +177,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +186,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +208,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +217,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +250,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, 0 /* flags */ );
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..1c4f5a25ba4 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,8 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation,
+															0 /* flags */ );
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..e6c237d6d0f 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,8 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL,
+							   0 /* flags */ );
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c69c12dc014..6dd3aed6b98 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL,
+							   0 /* flags */ );
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,8 +13981,8 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
-
+	scan = table_beginscan(rel, snapshot, 0, NULL,
+						   0 /* flags */ );
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
 									  ALLOCSET_SMALL_SIZES);
@@ -22882,7 +22883,8 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL,
+							   0 /* flags */ );
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23348,8 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL,
+						   0 /* flags */ );
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..115bd77af27 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,8 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   0 /* flags */ );
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3267,8 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   0 /* flags */ );
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..72671013c52 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,9 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index,
+								 &DirtySnapshot, NULL, indnkeyatts, 0,
+								 0 /* flags */ );
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..ca0d1cc6b95 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,9 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   &snap, NULL, skey_attoff, 0,
+						   0 /* flags */ );
 
 retry:
 	found = false;
@@ -383,7 +385,8 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL,
+						   0 /* flags */ );
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +605,8 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL,
+						   0 /* flags */ );
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +670,9 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   SnapshotAny, NULL, skey_attoff, 0,
+						   0 /* flags */ );
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..e58bb02db43 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   0 /* flags */ );
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..f8a6671793f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -95,7 +95,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys,
+								   0 /* flags */ );
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -794,7 +795,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 0 /* flags */ );
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -860,7 +862,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 0 /* flags */ );
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..3df091ac000 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -113,7 +113,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   0 /* flags */ );
 
 		node->iss_ScanDesc = scandesc;
 
@@ -209,7 +210,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   0 /* flags */ );
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1730,7 +1732,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 0 /* flags */ );
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1794,7 +1797,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 0 /* flags */ );
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..f0e14e53fab 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,8 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode,
+									 0 /* flags */ );
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..eaa8cfb6a1a 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,8 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL,
+								   0 /* flags */ );
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +376,8 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 0 /* flags */ );
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +410,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 0 /* flags */ );
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..6f63e9f80d0 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,8 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid,
+												0 /* flags */ );
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +461,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0 /* flags */ );
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +495,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, 0 /* flags */ );
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..c0f847b43be 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,8 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL,
+							   0 /* flags */ );
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..9fbbb6a8ddc 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7178,7 +7178,8 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0,
+								 0 /* flags */ );
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b69320a7fc8 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -158,7 +158,8 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys,
+									 uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9b403203006..e2e07348f37 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..ce5176bdf69 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,16 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table_beginscan_* functions
+ * and must not be passed by callers.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +430,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +881,19 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	Assert((flags & ~SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +911,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +946,7 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags, 0);
 }
 
 /*
@@ -939,11 +957,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +976,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1001,7 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -994,7 +1014,7 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags, 0);
 }
 
 /*
@@ -1059,12 +1079,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1160,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1171,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1198,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1210,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0


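The hunks above split each table_beginscan_*() wrapper's fixed scan-type bits (internal_flags) from the new caller-supplied flags argument, and table_index_fetch_begin() asserts that callers never pass bits covered by SO_INTERNAL_FLAGS. The standalone C sketch below illustrates that contract. The SKETCH_* names, their bit values, and the assumption that table_beginscan_common() simply ORs the two arguments together are illustrative only; none of them is taken from tableam.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch; not the real ScanOptions. */
#define SKETCH_SO_TYPE_SEQSCAN        (UINT32_C(1) << 0)
#define SKETCH_SO_ALLOW_PAGEMODE      (UINT32_C(1) << 1)
#define SKETCH_SO_HINT_REL_READ_ONLY  (UINT32_C(1) << 10)

/* Hypothetical mask of bits that only the table_beginscan_*() wrappers set. */
#define SKETCH_SO_INTERNAL_FLAGS \
	(SKETCH_SO_TYPE_SEQSCAN | SKETCH_SO_ALLOW_PAGEMODE)

/*
 * Combine the wrapper's fixed bits with the caller's hint bits.
 * Assumption: the real table_beginscan_common() merges them similarly.
 */
static uint32_t
sketch_combine_scan_flags(uint32_t internal_flags, uint32_t flags)
{
	/* Callers must not pass internal bits through the public argument. */
	assert((flags & SKETCH_SO_INTERNAL_FLAGS) == 0);
	return internal_flags | flags;
}
```

Keeping the two arguments separate means a caller can only add hints such as the read-only flag; it cannot change the scan type.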

  [text/x-patch] v46-0003-Pass-down-information-on-table-modification-to-s.patch (9.5K, 4-v46-0003-Pass-down-information-on-table-modification-to-s.patch)
From 105c2c2c0057ee9945cf6ec1c32061f617f627a2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v46 3/5] Pass down information on table modification to scan
 node

Tell sequential scan, index [only] scan, bitmap table scan, sample scan,
and TID range scan nodes whether the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only when the query does not modify the
relation.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/execUtils.c          |  8 ++++++++
 src/backend/executor/nodeBitmapHeapscan.c |  3 ++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 ++++++---
 src/backend/executor/nodeIndexscan.c      | 12 ++++++++----
 src/backend/executor/nodeSamplescan.c     |  3 ++-
 src/backend/executor/nodeSeqscan.c        | 10 +++++++---
 src/backend/executor/nodeTidrangescan.c   | 11 ++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           |  2 ++
 9 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..d2ffe28e010 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -736,6 +736,14 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 	return list_member_int(estate->es_plannedstmt->resultRelations, scanrelid);
 }
 
+/* Return true if the scan node's relation is not modified by the query */
+bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	return !bms_is_member(((Scan *) ss->ps.plan)->scanrelid,
+						  ss->ps.state->es_plannedstmt->modifiedRelids);
+}
+
 /* ----------------------------------------------------------------
  *		ExecOpenScanRelation
  *
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index e58bb02db43..7096e6f8645 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -149,7 +149,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 							   node->ss.ps.state->es_snapshot,
 							   0,
 							   NULL,
-							   0 /* flags */ );
+							   ScanRelIsReadOnly(&node->ss) ?
+							   SO_HINT_REL_READ_ONLY : 0);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index f8a6671793f..3971e54d7da 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -96,7 +96,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
 								   node->ioss_NumOrderByKeys,
-								   0 /* flags */ );
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : 0);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -796,7 +797,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -863,7 +865,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 3df091ac000..09df10dd78a 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -114,7 +114,8 @@ IndexNext(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   0 /* flags */ );
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -211,7 +212,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   0 /* flags */ );
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : 0);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1733,7 +1735,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1798,7 +1801,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index f0e14e53fab..98fab36fbdc 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -299,7 +299,8 @@ tablesample_init(SampleScanState *scanstate)
 									 scanstate->use_bulkread,
 									 allow_sync,
 									 scanstate->use_pagemode,
-									 0 /* flags */ );
+									 ScanRelIsReadOnly(&scanstate->ss) ?
+									 SO_HINT_REL_READ_ONLY : 0);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index eaa8cfb6a1a..2f4c18051cd 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -72,7 +72,8 @@ SeqNext(SeqScanState *node)
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
 								   0, NULL,
-								   0 /* flags */ );
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : 0);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,9 +376,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 }
 
 /* ----------------------------------------------------------------
@@ -411,5 +414,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 0 /* flags */ );
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : 0);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 6f63e9f80d0..f83a72e3635 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -246,7 +246,8 @@ TidRangeNext(TidRangeScanState *node)
 												estate->es_snapshot,
 												&node->trss_mintid,
 												&node->trss_maxtid,
-												0 /* flags */ );
+												ScanRelIsReadOnly(&node->ss) ?
+												SO_HINT_REL_READ_ONLY : 0);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -461,7 +462,9 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0 /* flags */ );
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : 0);
 }
 
 /* ----------------------------------------------------------------
@@ -495,5 +498,7 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, 0 /* flags */ );
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : 0);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index ce5176bdf69..014c686a5de 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..7979a17e4ec 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,8 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+extern bool ScanRelIsReadOnly(ScanState *ss);
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0


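Patch 0003 computes the hint once per scan node: ScanRelIsReadOnly() checks whether the node's scanrelid is a member of es_plannedstmt->modifiedRelids, and the caller passes SO_HINT_REL_READ_ONLY only when it is not. A minimal sketch of that decision, assuming a 64-bit mask as a hypothetical stand-in for the Bitmapset and bms_is_member() (the sketch_* names and the bit value are illustrative, not PostgreSQL's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value; mirrors the role of SO_HINT_REL_READ_ONLY. */
#define SKETCH_SO_HINT_REL_READ_ONLY (UINT32_C(1) << 10)

/*
 * Hypothetical stand-in for bms_is_member(): range-table indexes 0..63
 * are stored as bits of a uint64_t.
 */
static bool
sketch_relid_is_member(unsigned scanrelid, uint64_t modified_relids)
{
	assert(scanrelid < 64);
	return (modified_relids >> scanrelid) & 1u;
}

/* Mirrors the "ScanRelIsReadOnly(...) ? SO_HINT_REL_READ_ONLY : 0" idiom. */
static uint32_t
sketch_scan_hint_flags(unsigned scanrelid, uint64_t modified_relids)
{
	return sketch_relid_is_member(scanrelid, modified_relids) ?
		0 : SKETCH_SO_HINT_REL_READ_ONLY;
}
```

For example, if only range-table index 2 is modified by the query, a scan of index 1 receives the hint and a scan of index 2 does not.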

  [text/x-patch] v46-0004-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.2K, 5-v46-0004-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From e8f9da0d1ca12ab03cb58e4283dfd4111aa9fc2c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v46 4/5] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid the write amplification
caused by vacuum later needing to set the page all-visible, which can
trigger a write and potentially an FPI. It also makes index-only scans
more effective, since they can skip heap fetches only for pages marked
all-visible in the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++--
 src/backend/access/heap/pruneheap.c      | 46 +++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              |  3 +-
 5 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 8c7695ebfb9..d59b423c8ad 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -149,7 +149,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2546,7 +2547,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6693af8da7f..d83fd26b274 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -253,7 +256,8 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * unpinning *vmbuffer.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +340,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +398,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +468,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -920,21 +926,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM on-access for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1167,7 +1189,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2e07348f37..f2a009141be 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -431,7 +432,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0


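Patch 0004's heap_page_will_set_vm() now gates VM setting on HEAP_PAGE_PRUNE_SET_VM (attempt_set_vm) instead of on the prune reason, and adds a guard for on-access calls that would otherwise write nothing. The simplified C sketch below restates that decision. The struct, the booleans standing in for BufferIsDirty() and XLogCheckBufferNeedsBackup(), and the tail that adds the all-frozen bit (the hunk above is cut off at that point) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VM bit values used only by this sketch. */
#define SKETCH_VM_ALL_VISIBLE 0x01
#define SKETCH_VM_ALL_FROZEN  0x02

typedef enum
{
	SKETCH_PRUNE_ON_ACCESS,
	SKETCH_PRUNE_VACUUM_SCAN
} SketchPruneReason;

/* Hypothetical, simplified pruning state. */
typedef struct
{
	bool		attempt_set_vm;	/* caller passed the SET_VM option */
	bool		set_all_visible;	/* page all-visible after pruning */
	bool		set_all_frozen;	/* page also all-frozen */
	bool		buffer_dirty;	/* stand-in for BufferIsDirty() */
	bool		needs_fpi;		/* stand-in for XLogCheckBufferNeedsBackup() */
	uint8_t		new_vmbits;
} SketchPruneState;

static bool
sketch_will_set_vm(SketchPruneState *st, SketchPruneReason reason,
				   bool do_prune, bool do_freeze)
{
	if (!st->attempt_set_vm || !st->set_all_visible)
		return false;

	/* On access with nothing else to write: avoid a new dirty or an FPI. */
	if (reason == SKETCH_PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!st->buffer_dirty || st->needs_fpi))
	{
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return false;
	}

	st->new_vmbits = SKETCH_VM_ALL_VISIBLE;
	if (st->set_all_frozen)
		st->new_vmbits |= SKETCH_VM_ALL_FROZEN;
	return true;
}
```

With this shape, VACUUM (which always passes HEAP_PAGE_PRUNE_SET_VM) is unaffected by the dirty/FPI guard, while an on-access caller that is neither pruning nor freezing sets the VM only when the page is already dirty and needs no full-page image.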

  [text/x-patch] v46-0005-Set-pd_prune_xid-on-insert.patch (8.8K, 6-v46-0005-Set-pd_prune_xid-on-insert.patch)
From 9d6d6c2529700e4fe381dbc55ef172ba13882fab Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v46 5/5] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set the page all-visible in
the VM the first time the page is read after being filled with newly
inserted tuples. The page is thus set all-visible while it is still in
shared buffers, avoiding potential I/O amplification when vacuum later
has to scan the page and set it all-visible. It also lets index-only
scans skip heap fetches for newly inserted data much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 39 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++-------
 3 files changed, 51 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..cdaf57e3f12 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM and we're not in bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4152,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d83fd26b274..bb364f53a44 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -275,7 +275,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts added new tuples that may now be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1918,17 +1919,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0


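Patch 0005 sets pd_prune_xid on insert under slightly different conditions in heap_insert() (normal XID, not HEAP_INSERT_FROZEN) and heap_multi_insert() (normal XID, page not being set all-frozen). The sketch below restates those conditions together with the keep-the-oldest-XID behavior of PageSetPrunable(). The sketch_* helpers and the SketchXid type are hypothetical; the wraparound-aware comparison mirrors TransactionIdPrecedes() for normal XIDs only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchXid;

/* XIDs 0-2 are reserved in PostgreSQL; normal XIDs start at 3. */
#define SKETCH_FIRST_NORMAL_XID ((SketchXid) 3)
#define SKETCH_INVALID_XID      ((SketchXid) 0)

static bool
sketch_xid_is_normal(SketchXid xid)
{
	return xid >= SKETCH_FIRST_NORMAL_XID;
}

/* heap_insert(): skip in bootstrap mode or for frozen inserts. */
static bool
sketch_insert_sets_prunable(SketchXid xid, bool insert_frozen)
{
	return sketch_xid_is_normal(xid) && !insert_frozen;
}

/* heap_multi_insert(): skip if the page is being set all-frozen. */
static bool
sketch_multi_insert_sets_prunable(SketchXid xid, bool all_frozen_set)
{
	return !all_frozen_set && sketch_xid_is_normal(xid);
}

/* Like PageSetPrunable(): keep the oldest XID seen so far. */
static SketchXid
sketch_page_set_prunable(SketchXid pd_prune_xid, SketchXid xid)
{
	/* Modulo-2^32 comparison, as TransactionIdPrecedes() does. */
	if (pd_prune_xid == SKETCH_INVALID_XID ||
		(int32_t) (xid - pd_prune_xid) < 0)
		return xid;
	return pd_prune_xid;
}
```

Because only the oldest XID is retained, a page filled by several inserting transactions becomes a pruning candidate once the earliest of them may be visible to all.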


* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-20 23:37                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-22 19:58                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-23 21:54                                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-24 17:53                                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-24 23:44                                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-25 18:54                                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-25 23:14                                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-26 23:10                                           ` David Rowley <[email protected]>
  2026-03-27 19:17                                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: David Rowley @ 2026-03-26 23:10 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, 26 Mar 2026 at 12:14, Melanie Plageman
<[email protected]> wrote:
> Attached v46 addresses your feedback and has a bit of assorted cleanup in it.

(I've not had a chance to process this thread, so apologies if I
missed discussion on certain things I'm going to say)

I was looking at v46-0001. With:

+++ b/src/include/nodes/plannodes.h
@@ -112,6 +112,12 @@ typedef struct PlannedStmt
  */
  Bitmapset  *unprunableRelids;

+ /*
+ * RT indexes of relations modified by the query through
+ * UPDATE/DELETE/INSERT/MERGE or targeted by SELECT FOR UPDATE/SHARE.
+ */
+ Bitmapset  *modifiedRelids;
+

This comment doesn't mention that leaf partitions are left out for
INSERT queries. You did mention it in standard_planner()
here:

+ * modification. Conversely, leaf partitions whose result relations are
+ * created at the time of insert are not included here.

I think if someone is going to use this field, they're going to look
at where the field is defined to find out what it is, not where it
gets populated.

I'm also wondering about having this combined field. If you were to
have a Bitmapset field that mirrors "List *resultRelations;", then
have another:

/* a list of PlanRowMark's */
List   *rowMarks;

+ /* Relids which have rowMarks */
+ Bitmapset *rowMarkRelids;

I think they're more likely to be useful for other purposes, and I
think the only pain that it causes you is that you have to call
bms_is_member() twice in ScanRelIsReadOnly().
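For illustration, here is a minimal sketch of what the two-set check might look like. It assumes a toy one-word bitmapset (RT indexes below 64) in place of PostgreSQL's real Bitmapset. It also uses a hypothetical toy_scan_rel_is_read_only() to stand in for ScanRelIsReadOnly(), whose actual body isn't shown here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy stand-in for Bitmapset: a single 64-bit word, so it only handles
 * RT indexes 0..63. The real Bitmapset grows as needed.
 */
typedef uint64_t ToyBitmapset;

static ToyBitmapset
toy_bms_add_member(ToyBitmapset set, int x)
{
	assert(x >= 0 && x < 64);
	return set | ((uint64_t) 1 << x);
}

static bool
toy_bms_is_member(int x, ToyBitmapset set)
{
	if (x < 0 || x >= 64)
		return false;
	return (set & ((uint64_t) 1 << x)) != 0;
}

/* Build a set once, at "plan time", from an array of RT indexes. */
static ToyBitmapset
toy_build_relids(const int *rtis, size_t n)
{
	ToyBitmapset set = 0;

	for (size_t i = 0; i < n; i++)
		set = toy_bms_add_member(set, rtis[i]);
	return set;
}

/*
 * Hypothetical read-only check with two separate sets: a scanned relation
 * is read-only only if it is neither a result relation nor row-marked.
 * This costs two membership tests instead of one against a combined set.
 */
static bool
toy_scan_rel_is_read_only(int scanrelid,
						  ToyBitmapset resultRelationRelids,
						  ToyBitmapset rowMarkRelids)
{
	return !toy_bms_is_member(scanrelid, resultRelationRelids) &&
		!toy_bms_is_member(scanrelid, rowMarkRelids);
}
```

Suppose RT index 1 is the result relation and RT indexes 2 and 3 carry row marks. A scan of RT index 4 is then read-only, while scans of 1 and 3 are not. The only extra cost over a combined set is one more bit test per call.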

Then, as a follow-up, maybe we could consider removing
PlannedStmt.resultRelations.  (The deprecated)
ExecRelationIsTargetRelation() could use the new Bitmapset, which
would be more efficient. OverExplain does do:

if (es->format != EXPLAIN_FORMAT_TEXT ||
	plannedstmt->resultRelations != NIL)
	overexplain_intlist("Result RTIs", plannedstmt->resultRelations, es);

but maybe Robert is ok with those coming out in ascending numerical
order rather than list order. overexplain_bitmapset() would do that.
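The ordering difference works like this. A List keeps RT indexes in the order they were appended, but walking a Bitmapset always yields ascending order. The toy sketch below assumes a one-word set. Its toy_bms_next_member() is modeled on bms_next_member() but not copied from it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyBitmapset;	/* toy: RT indexes 0..63 only */

/* Return the smallest member greater than prevbit, or -2 if there is none. */
static int
toy_bms_next_member(ToyBitmapset set, int prevbit)
{
	for (int x = prevbit + 1; x < 64; x++)
	{
		if (set & ((uint64_t) 1 << x))
			return x;
	}
	return -2;
}

/*
 * Copy the members of set into out[] in iteration order (always ascending)
 * and return how many were written, at most max.
 */
static size_t
toy_bms_members(ToyBitmapset set, int *out, size_t max)
{
	size_t		n = 0;
	int			x = -1;

	while (n < max && (x = toy_bms_next_member(set, x)) >= 0)
		out[n++] = x;
	return n;
}
```

Adding 3, 1, 2 in that order and then walking the set gives 1, 2, 3. A list built the same way would print 3, 1, 2.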

In [1], I didn't see any code actually using the field, just a couple
of projects that have duplicated the copyObject() code.

I did quickly look over the remaining patches. I wondered if you might
want to add a new ScanOption SO_NONE = 0, or SO_EMPTY_FLAGS. It might
make the places where you're passing zero directly easier to read?
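Here is a minimal sketch of the suggestion. It uses a hypothetical ToyScanOptions enum whose bit values are illustrative, not copied from tableam.h. The point is only that a named zero value makes call sites self-describing. The toy descriptor keeps AM-internal and caller-supplied flags in separate words, as the v47 table_beginscan_parallel() hunk does with internal_flags and flags:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of a ScanOptions-style bitmask (values are made up). */
typedef enum ToyScanOptions
{
	SO_NONE = 0,				/* explicit "no options" value */
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_ALLOW_STRAT = 1 << 1,
	SO_ALLOW_SYNC = 1 << 2,
	SO_ALLOW_PAGEMODE = 1 << 3,
} ToyScanOptions;

/* Hypothetical scan descriptor keeping AM-internal and caller flags apart. */
typedef struct ToyScanDesc
{
	uint32_t	internal_flags;
	uint32_t	user_flags;
} ToyScanDesc;

static ToyScanDesc
toy_beginscan(uint32_t internal_flags, uint32_t user_flags)
{
	ToyScanDesc scan = {internal_flags, user_flags};

	return scan;
}

/* True if the caller passed any option beyond SO_NONE. */
static bool
toy_has_user_flags(const ToyScanDesc *scan)
{
	return scan->user_flags != SO_NONE;
}
```

A call like toy_beginscan(SO_TYPE_SEQSCAN | SO_ALLOW_PAGEMODE, SO_NONE) says outright that the caller has no extra options. A bare 0 leaves the reader to check the parameter list.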

David

[1] https://codesearch.debian.net/search?q=resultRelations&literal=1






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-20 23:37                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-22 19:58                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-23 21:54                                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-24 17:53                                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-24 23:44                                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-25 18:54                                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-25 23:14                                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-26 23:10                                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) David Rowley <[email protected]>
@ 2026-03-27 19:17                                             ` Melanie Plageman <[email protected]>
  2026-03-29 17:16                                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-27 19:17 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, Mar 26, 2026 at 7:10 PM David Rowley <[email protected]> wrote:
>
> I was looking at v46-0001. With:

Thanks for taking a look!

> +++ b/src/include/nodes/plannodes.h
> + Bitmapset  *modifiedRelids;
> +
>
> This doesn't really mention anything about leaf partitions not being
> mentioned for INSERT queries. You did mention it in standard_planner()
> here:
>
> I'm also wondering about having this combined field. If you were to
> have a Bitmapset field that mirrors "List *resultRelations;", then
> have another:
>
> /* a list of PlanRowMark's */
> List   *rowMarks;
>
> + /* Relids which have rowMarks */
> + Bitmapset *rowMarkRelids;
>
> I think they're more likely to be useful for other purposes, and I
> think the only pain that it causes you is that you have to call
> bms_is_member() twice in ScanRelIsReadOnly().

Yea, outside of the insert into leaf partitions case, I thought of
another, perhaps even more compelling reason the combined field might
be a bit confusing:
Take two tables, t1 and t2. If you do
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id FOR UPDATE OF t1;
t1 will get a ROW_MARK_EXCLUSIVE and t2 will get a ROW_MARK_REFERENCE
(that's just how preprocess_rowmarks() works).
That means modifiedRelids would contain t2, even though t2 is not
being locked for update. For the purposes of setting the VM, it's
totally fine that we are more conservative than we need to be and
don't consider setting it when scanning t2. But for the purposes of
modifiedRelids, it's a bit confusing that t2 is in there.

But we can't just exclude ROW_MARK_REFERENCE from modifiedRelids
because we rely on ROW_MARK_REFERENCE to avoid setting the VM for a
table we are updating or deleting from when it is mentioned more than
once in the query (e.g. UPDATE foo SET x = 1 FROM foo f2 WHERE foo.id
= f2.id).
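The toy model below covers both cases. It assumes made-up RT index numbering and a reduced enum that borrows only the ROW_MARK_EXCLUSIVE and ROW_MARK_REFERENCE names from PostgreSQL's RowMarkType. The helper toy_rowmark_relids() is hypothetical. Keeping every row mark puts t2 in the set. Skipping ROW_MARK_REFERENCE would drop f2, the self-joined alias of the table being updated:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced toy enum; only these two names echo PostgreSQL's RowMarkType. */
typedef enum ToyRowMarkType
{
	ROW_MARK_EXCLUSIVE,			/* locked, e.g. FOR UPDATE */
	ROW_MARK_REFERENCE,			/* not locked, only referenced */
} ToyRowMarkType;

typedef struct ToyPlanRowMark
{
	int			rti;			/* range table index, must be < 64 here */
	ToyRowMarkType markType;
} ToyPlanRowMark;

/*
 * Build a one-word relid set from row marks, optionally skipping
 * ROW_MARK_REFERENCE entries.
 */
static uint64_t
toy_rowmark_relids(const ToyPlanRowMark *marks, size_t n, bool skip_reference)
{
	uint64_t	set = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (skip_reference && marks[i].markType == ROW_MARK_REFERENCE)
			continue;
		assert(marks[i].rti >= 0 && marks[i].rti < 64);
		set |= (uint64_t) 1 << marks[i].rti;
	}
	return set;
}
```

In the SELECT example, t1 is RT index 1 and t2 is RT index 2. In the UPDATE example, foo is the result relation and f2 (RT index 2) is the only row-marked entry.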

So, for that reason and because of the missing leaf partitions for
inserts, I think separate quick-lookup bitmapsets mirroring
resultRelations and rowMarks would be better. I've done this in attached v47.

I've also removed the asserts in ExecInsert/Update/Delete because they
are a bit tautological now.

My one remaining question is whether the two new bitmapsets
(rowMarkRelids and resultRelationRelids) should move from the
PlannedStmt to the EState. They are determined at plan time and never
modified during execution. That said, I notice there are other EState
members that seem to be just copies of PlannedStmt information that
isn't modified during execution (e.g. es_rteperminfos/permInfos).
However, putting them in the EState increases the work required to get
them to parallel workers and to the child EState for EPQ. I would
prefer to keep them in the PlannedStmt but am worried that breaks
convention.

> Then, as a follow-up, maybe we could consider removing
> PlannedStmt.resultRelations.  (The deprecated)
> ExecRelationIsTargetRelation() could use the new Bitmapset, which
> would be more efficient.

Yea, I like this and think it makes sense. Done in v47.

> I did quickly look over the remaining patches. I wondered if you might
> want to add a new ScanOption SO_NONE = 0, or SO_EMPTY_FLAGS. It might
> make the places where you're passing zero directly easier to read?

That makes sense to me. Done in v47.

- Melanie


Attachments:

  [text/x-patch] v47-0001-Make-it-cheap-to-check-with-relations-are-modifi.patch (4.4K, 2-v47-0001-Make-it-cheap-to-check-with-relations-are-modifi.patch)
From 499fe3dbdddb6321b5f09d9d94e37a5c97303bda Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Mar 2026 09:21:22 -0400
Subject: [PATCH v47 1/6] Make it cheap to check which relations are modified by
 a query

Save the range table indexes of result relations and row mark relations in
separate bitmaps in the PlannedStmt. Precomputing them allows cheap membership
checks during execution. With a few exceptions, these two groups comprise all
relations that will be modified by a query. This includes relations targeted by
INSERT, UPDATE, DELETE, and MERGE as well as relations with any row mark (like
SELECT for UPDATE).

A later commit will use this information during scans to control whether
or not on-access pruning is allowed to set the visibility map -- which
would be counterproductive if the query will modify the page.

PlannedStmt->resultRelations is only used in a membership check, so it may make
sense to replace its usage with the new resultRelationRelids.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execParallel.c  |  2 ++
 src/backend/optimizer/plan/planner.c | 19 ++++++++++++++++++-
 src/include/nodes/plannodes.h        |  9 +++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..791fcb88de9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,8 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->resultRelationRelids = estate->es_plannedstmt->resultRelationRelids;
+	pstmt->rowMarkRelids = estate->es_plannedstmt->rowMarkRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d19800ad6a5..df4c99fc3ff 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,11 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *resultRelationRelids = NULL;
+	Bitmapset  *rowMarkRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +664,20 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute resultRelationRelids and rowMarkRelids from resultRelations and
+	 * rowMarks for quick access.
+	 */
+	foreach(lc, glob->resultRelations)
+		resultRelationRelids = bms_add_member(resultRelationRelids,
+											  lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+		rowMarkRelids = bms_add_member(rowMarkRelids,
+									   ((PlanRowMark *) lfirst(lc))->rti);
+	result->resultRelationRelids = resultRelationRelids;
+	result->rowMarkRelids = rowMarkRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..88be65d7bde 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -121,6 +121,9 @@ typedef struct PlannedStmt
 	/* integer list of RT indexes, or NIL */
 	List	   *resultRelations;
 
+	/* RT indexes of result relations targeted by INSERT/UPDATE/DELETE/MERGE */
+	Bitmapset  *resultRelationRelids;
+
 	/* list of AppendRelInfo nodes */
 	List	   *appendRelations;
 
@@ -138,6 +141,12 @@ typedef struct PlannedStmt
 	/* a list of PlanRowMark's */
 	List	   *rowMarks;
 
+	/*
+	 * RT indexes of relations with row marks. Useful for quick membership
+	 * checks instead of iterating through rowMarks.
+	 */
+	Bitmapset  *rowMarkRelids;
+
 	/* OIDs of relations the plan depends on */
 	List	   *relationOids;
 
-- 
2.43.0



  [text/x-patch] v47-0002-Remove-PlannedStmt-resultRelations-in-favor-of-r.patch (3.8K, 3-v47-0002-Remove-PlannedStmt-resultRelations-in-favor-of-r.patch)
From bf13aaf3f9a9610e7e7be381dfdb7242f22761c7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Mar 2026 08:35:00 -0400
Subject: [PATCH v47 2/6] Remove PlannedStmt->resultRelations in favor of
 resultRelationRelids

PlannedStmt->resultRelations was an integer list of range table indexes.
Now that we have a bitmapset, which offers cheap membership checks,
remove the list and update all consumers to use the bitmapset.
---
 contrib/pg_overexplain/pg_overexplain.c | 5 +++--
 src/backend/executor/execParallel.c     | 1 -
 src/backend/executor/execUtils.c        | 2 +-
 src/backend/optimizer/plan/planner.c    | 1 -
 src/include/nodes/plannodes.h           | 4 ----
 5 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/contrib/pg_overexplain/pg_overexplain.c b/contrib/pg_overexplain/pg_overexplain.c
index c2b90493cc6..b4e90909289 100644
--- a/contrib/pg_overexplain/pg_overexplain.c
+++ b/contrib/pg_overexplain/pg_overexplain.c
@@ -780,8 +780,9 @@ overexplain_range_table(PlannedStmt *plannedstmt, ExplainState *es)
 		overexplain_bitmapset("Unprunable RTIs", plannedstmt->unprunableRelids,
 							  es);
 	if (es->format != EXPLAIN_FORMAT_TEXT ||
-		plannedstmt->resultRelations != NIL)
-		overexplain_intlist("Result RTIs", plannedstmt->resultRelations, es);
+		!bms_is_empty(plannedstmt->resultRelationRelids))
+		overexplain_bitmapset("Result RTIs", plannedstmt->resultRelationRelids,
+							  es);
 
 	/* Close group, we're all done */
 	ExplainCloseGroup("Range Table", "Range Table", false, es);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 791fcb88de9..1bab6160036 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -191,7 +191,6 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->resultRelationRelids = estate->es_plannedstmt->resultRelationRelids;
 	pstmt->rowMarkRelids = estate->es_plannedstmt->rowMarkRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
-	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
 	pstmt->planOrigin = PLAN_STMT_INTERNAL;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..36c5285d252 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -733,7 +733,7 @@ ExecCreateScanSlotFromOuterPlan(EState *estate,
 bool
 ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 {
-	return list_member_int(estate->es_plannedstmt->resultRelations, scanrelid);
+	return bms_is_member(scanrelid, estate->es_plannedstmt->resultRelationRelids);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index df4c99fc3ff..9853443209d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -659,7 +659,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 											  glob->prunableRelids);
 	result->permInfos = glob->finalrteperminfos;
 	result->subrtinfos = glob->subrtinfos;
-	result->resultRelations = glob->resultRelations;
 	result->appendRelations = glob->appendRelations;
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 88be65d7bde..19e5d814c59 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -117,10 +117,6 @@ typedef struct PlannedStmt
 	 */
 	List	   *permInfos;
 
-	/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
-	/* integer list of RT indexes, or NIL */
-	List	   *resultRelations;
-
 	/* RT indexes of result relations targeted by INSERT/UPDATE/DELETE/MERGE */
 	Bitmapset  *resultRelationRelids;
 
-- 
2.43.0



  [text/x-patch] v47-0003-Thread-flags-through-begin-scan-APIs.patch (34.2K, 4-v47-0003-Thread-flags-through-begin-scan-APIs.patch)
From 58f307d8f03fbcfbc4933e3f2cecf752294d804c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v47 3/6] Thread flags through begin-scan APIs

Add an AM user-settable flags parameter to several of the table
scan functions, one table AM callback, and index_beginscan(). This
allows users to pass additional context to be used when building the
scan descriptors.

For index scans, a new flags field is added to IndexFetchTableData, and
the heap AM saves the caller-provided flags there.

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |  2 +-
 src/backend/access/brin/brin.c            |  3 +-
 src/backend/access/gin/gininsert.c        |  3 +-
 src/backend/access/heap/heapam_handler.c  |  9 ++-
 src/backend/access/index/genam.c          |  6 +-
 src/backend/access/index/indexam.c        | 10 +--
 src/backend/access/nbtree/nbtsort.c       |  3 +-
 src/backend/access/table/tableam.c        | 22 +++---
 src/backend/commands/constraint.c         |  3 +-
 src/backend/commands/copyto.c             |  3 +-
 src/backend/commands/tablecmds.c          | 13 ++--
 src/backend/commands/typecmds.c           |  6 +-
 src/backend/executor/execIndexing.c       |  4 +-
 src/backend/executor/execReplication.c    | 12 ++--
 src/backend/executor/nodeBitmapHeapscan.c |  3 +-
 src/backend/executor/nodeIndexonlyscan.c  |  9 ++-
 src/backend/executor/nodeIndexscan.c      | 12 ++--
 src/backend/executor/nodeSamplescan.c     |  3 +-
 src/backend/executor/nodeSeqscan.c        |  9 ++-
 src/backend/executor/nodeTidrangescan.c   |  7 +-
 src/backend/partitioning/partbounds.c     |  3 +-
 src/backend/utils/adt/selfuncs.c          |  3 +-
 src/include/access/genam.h                |  6 +-
 src/include/access/heapam.h               |  5 +-
 src/include/access/relscan.h              |  1 +
 src/include/access/tableam.h              | 81 ++++++++++++++++-------
 26 files changed, 157 insertions(+), 84 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..d164c4c03ad 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, SO_NONE);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..bdb30752e09 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									SO_NONE);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..9d83a495775 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									SO_NONE);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d40878928e1..99280cd8159 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -81,11 +81,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -763,7 +764,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0,
+									SO_NONE);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -772,7 +774,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL,
+									SO_NONE);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1408989c568 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,8 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0,
+										 SO_NONE);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +717,8 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0,
+									 SO_NONE);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index fbfc33159eb..13cdbb86cd7 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -258,7 +258,8 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys,
+				uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -285,7 +286,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -594,7 +595,8 @@ IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -616,7 +618,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..756dfa3dcf4 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,8 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared),
+									SO_NONE);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..86481d7c029 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, SO_NONE);
 }
 
 
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +177,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +186,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +208,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +217,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +250,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, SO_NONE);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..421d8c359f0 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,8 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation,
+															SO_NONE);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..f0e0147c665 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,8 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL,
+							   SO_NONE);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c69c12dc014..ec0063287d0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,8 +13981,8 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
-
+	scan = table_beginscan(rel, snapshot, 0, NULL,
+						   SO_NONE);
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
 									  ALLOCSET_SMALL_SIZES);
@@ -22882,7 +22883,8 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23348,8 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL,
+						   SO_NONE);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..cd38e9cddf4 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,8 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   SO_NONE);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3267,8 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   SO_NONE);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cc6eb3a6ee9 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,9 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index,
+								 &DirtySnapshot, NULL, indnkeyatts, 0,
+								 SO_NONE);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..fea8991cb04 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,8 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   &snap, NULL, skey_attoff, 0, SO_NONE);
 
 retry:
 	found = false;
@@ -383,7 +384,8 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL,
+						   SO_NONE);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,8 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL,
+						   SO_NONE);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +669,8 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   SnapshotAny, NULL, skey_attoff, 0, SO_NONE);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..69683d81527 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   SO_NONE);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..02df40f32c5 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -95,7 +95,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -794,7 +795,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -860,7 +862,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..3c0b8daf664 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -113,7 +113,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -209,7 +210,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1730,7 +1732,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1794,7 +1797,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cf32df33d82 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,8 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode,
+									 SO_NONE);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..09ccc65de1c 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,8 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL,
+								   SO_NONE);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +376,8 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +410,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 SO_NONE);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..084e4c6ec90 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,8 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid,
+												SO_NONE);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +461,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +495,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, SO_NONE);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..f867d1b75a5 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,8 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..4160d2d6e24 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7178,7 +7178,8 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0,
+								 SO_NONE);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b69320a7fc8 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -158,7 +158,8 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys,
+									 uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9b403203006..e2e07348f37 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..80ea0b437d1 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,7 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..60ceee9decd 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -45,6 +45,8 @@ typedef struct ValidateIndexState ValidateIndexState;
  */
 typedef enum ScanOptions
 {
+	SO_NONE = 0,
+
 	/* one of SO_TYPE_* may be specified */
 	SO_TYPE_SEQSCAN = 1 << 0,
 	SO_TYPE_BITMAPSCAN = 1 << 1,
@@ -65,6 +67,19 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table scan functions and
+ * shouldn't be passed by callers. Some of these are effectively set by callers
+ * through parameters to table scan functions (e.g. SO_ALLOW_STRAT/allow_strat);
+ * however, for now, retain tight control over them and don't allow callers to
+ * pass these directly to table scan functions.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -420,7 +435,7 @@ typedef struct TableAmRoutine
 	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +886,19 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	Assert((flags & ~SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -894,12 +916,13 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +951,8 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -939,11 +963,12 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -957,18 +982,19 @@ static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1007,8 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -994,7 +1021,8 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -1059,12 +1087,13 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1139,7 +1168,8 @@ extern void table_parallelscan_initialize(Relation rel,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1149,7 +1179,8 @@ extern TableScanDesc table_beginscan_parallel(Relation relation,
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1175,8 +1206,10 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1218,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0



  [text/x-patch] v47-0004-Pass-down-information-on-table-modification-to-s.patch (10.0K, 5-v47-0004-Pass-down-information-on-table-modification-to-s.patch)
From 975d619cbf3d9158a8134c9c62f8cf936290574b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v47 4/6] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/execUtils.c          | 21 +++++++++++++++++++++
 src/backend/executor/nodeBitmapHeapscan.c |  3 ++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 ++++++---
 src/backend/executor/nodeIndexscan.c      | 12 ++++++++----
 src/backend/executor/nodeSamplescan.c     |  3 ++-
 src/backend/executor/nodeSeqscan.c        | 10 +++++++---
 src/backend/executor/nodeTidrangescan.c   | 11 ++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           |  2 ++
 9 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 36c5285d252..f090de49921 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -736,6 +736,27 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 	return bms_is_member(scanrelid, estate->es_plannedstmt->resultRelationRelids);
 }
 
+/*
+ * Return true if the scan node's relation is not modified by the query.
+ *
+ * This is not perfectly accurate. INSERT ... SELECT from the same table does
+ * not add the scan relation to resultRelationRelids, so it will be reported
+ * as read-only even though the query modifies it.
+ *
+ * Conversely, when any relation in the query has a modifying row mark, all
+ * other relations get a ROW_MARK_REFERENCE, causing them to be reported as
+ * not read-only even though they may only be read.
+ */
+bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	Index		scanrelid = ((Scan *) ss->ps.plan)->scanrelid;
+	PlannedStmt *pstmt = ss->ps.state->es_plannedstmt;
+
+	return !bms_is_member(scanrelid, pstmt->resultRelationRelids) &&
+		!bms_is_member(scanrelid, pstmt->rowMarkRelids);
+}
+
 /* ----------------------------------------------------------------
  *		ExecOpenScanRelation
  *
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 69683d81527..73831aed451 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -149,7 +149,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 							   node->ss.ps.state->es_snapshot,
 							   0,
 							   NULL,
-							   SO_NONE);
+							   ScanRelIsReadOnly(&node->ss) ?
+							   SO_HINT_REL_READ_ONLY : SO_NONE);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 02df40f32c5..de6154fd541 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -96,7 +96,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
 								   node->ioss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -796,7 +797,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -863,7 +865,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 3c0b8daf664..1620d146071 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -114,7 +114,8 @@ IndexNext(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -211,7 +212,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1733,7 +1735,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1798,7 +1801,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cf32df33d82..f3d273e1c5e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -299,7 +299,8 @@ tablesample_init(SampleScanState *scanstate)
 									 scanstate->use_bulkread,
 									 allow_sync,
 									 scanstate->use_pagemode,
-									 SO_NONE);
+									 ScanRelIsReadOnly(&scanstate->ss) ?
+									 SO_HINT_REL_READ_ONLY : SO_NONE);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 09ccc65de1c..04803b0e37d 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -72,7 +72,8 @@ SeqNext(SeqScanState *node)
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
 								   0, NULL,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,9 +376,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -411,5 +414,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 084e4c6ec90..4a8fe91b2b3 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -246,7 +246,8 @@ TidRangeNext(TidRangeScanState *node)
 												estate->es_snapshot,
 												&node->trss_mintid,
 												&node->trss_maxtid,
-												SO_NONE);
+												ScanRelIsReadOnly(&node->ss) ?
+												SO_HINT_REL_READ_ONLY : SO_NONE);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -461,7 +462,9 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, SO_NONE);
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -495,5 +498,7 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, SO_NONE);
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : SO_NONE);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 60ceee9decd..5f1c1079cb5 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..7979a17e4ec 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,8 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+extern bool ScanRelIsReadOnly(ScanState *ss);
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0
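
Every executor hunk above builds the scan options with the same `ScanRelIsReadOnly(...) ? SO_HINT_REL_READ_ONLY : SO_NONE` pattern. The body of `ScanRelIsReadOnly()` is not shown in this excerpt. The sketch below assumes it checks the scan's range table index against the result-relation and row-mark bitmaps added in v48-0001 later in this thread. The `Toy*`/`toy_*` names are hypothetical, and the single 64-bit mask assumes RT indexes below 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in the tableam.h hunk above. */
#define SO_NONE               0
#define SO_HINT_REL_READ_ONLY (1 << 10)

/* Hypothetical stand-in for the two PlannedStmt bitmaps (RT indexes < 64). */
typedef struct ToyPlannedStmt
{
	uint64_t	resultRelationRelids;
	uint64_t	rowMarkRelids;
} ToyPlannedStmt;

/* True if bit 'rti' is set in 'mask'. */
static bool
toy_mask_has(uint64_t mask, int rti)
{
	return rti >= 0 && rti < 64 && ((mask >> rti) & 1) != 0;
}

/* Hypothetical analogue of ScanRelIsReadOnly(): not a result rel, not row-marked. */
static bool
toy_scan_rel_is_read_only(const ToyPlannedStmt *stmt, int scanrelid)
{
	return !toy_mask_has(stmt->resultRelationRelids, scanrelid) &&
		!toy_mask_has(stmt->rowMarkRelids, scanrelid);
}

/* The ternary each scan node passes to table_beginscan*(). */
static uint32_t
toy_scan_options(const ToyPlannedStmt *stmt, int scanrelid)
{
	return toy_scan_rel_is_read_only(stmt, scanrelid) ?
		SO_HINT_REL_READ_ONLY : SO_NONE;
}
```

Under this model, an INSERT ... SELECT from the same table reports the scanned range table entry as read-only, because it is a different entry from the result relation; the pruneheap.c comment in v47-0005 notes the same behavior. With C99 `bool`, passing `rs_flags & SO_HINT_REL_READ_ONLY` to a `bool` parameter converts any non-zero value to `true`.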



  [text/x-patch] v47-0005-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.9K, 6-v47-0005-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 3c4c589c84fb5444fe40b8a8eec506845d1130e0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v47 5/6] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid the write amplification
caused by vacuum later needing to set the page all-visible, which may
trigger a write and potentially an FPI. It also makes index-only scans
more effective, since they need pages marked all-visible in the VM to
avoid heap fetches.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++-
 src/backend/access/heap/pruneheap.c      | 56 +++++++++++++++++++-----
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              |  3 +-
 5 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 044f385e477..dbdf6521c42 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 99280cd8159..3433ea93c11 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -149,7 +149,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2546,7 +2547,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						scan->rs_flags & SO_HINT_REL_READ_ONLY);
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6693af8da7f..48f7cf77bc8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -251,9 +254,20 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
  * VM corruption during pruning, we will fix it. Caller is responsible for
  * unpinning *vmbuffer.
+ *
+ * rel_read_only is true if we determined at plan time that the query does not
+ * modify the relation. It is counterproductive to set the VM if the query
+ * will immediately clear it.
+ *
+ * As noted in ScanRelIsReadOnly(), INSERT ... SELECT on the same table will
+ * report the scan relation as read-only. This is usually harmless in
+ * practice. It is useful to set scanned pages all-visible that won't be
+ * inserted into. Pages we do insert to rarely meet the criteria for pruning,
+ * and those that do will contain in-progress inserts after the first tuple.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +350,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +408,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +478,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -920,21 +936,37 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM on-access for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we're not actually pruning, avoid
+	 * setting the visibility map if it would newly dirty the heap page or, if
+	 * the page is already dirty, if doing so would require including a
+	 * full-page image (FPI) of the heap page in the WAL. This situation
+	 * should be rare, as on-access pruning is only attempted when
+	 * pd_prune_xid is valid.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1167,7 +1199,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2e07348f37..f2a009141be 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -431,7 +432,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0
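
The v47-0005 hunks combine as follows. On-access pruning requests `HEAP_PAGE_PRUNE_SET_VM` only for read-only scans. `heap_page_will_set_vm()` then declines to set the VM on-access when nothing is being pruned or frozen and setting it would either newly dirty the page or require an FPI. The sketch below copies the flag values from the heapam.h hunk. The `Toy*` types and the two booleans standing in for `BufferIsDirty()` and `XLogCheckBufferNeedsBackup()` are hypothetical simplifications, and the sketch omits computing `new_vmbits`.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as in the heapam.h hunk of v47-0005. */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE          (1 << 1)
#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH (1 << 2)
#define HEAP_PAGE_PRUNE_SET_VM          (1 << 3)

/* Hypothetical simplification of PruneReason. */
typedef enum ToyPruneReason
{
	TOY_PRUNE_ON_ACCESS,
	TOY_PRUNE_VACUUM_SCAN
} ToyPruneReason;

/* Hypothetical subset of PruneState plus two buffer-state stand-ins. */
typedef struct ToyPruneState
{
	bool		attempt_set_vm;
	bool		set_all_visible;
	bool		set_all_frozen;
	bool		buffer_is_dirty;	/* stand-in for BufferIsDirty() */
	bool		buffer_needs_fpi;	/* stand-in for XLogCheckBufferNeedsBackup() */
} ToyPruneState;

/* Options heap_page_prune_opt() passes, per the pruneheap.c hunk. */
static int
toy_on_access_options(bool rel_read_only)
{
	int			options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;

	if (rel_read_only)
		options |= HEAP_PAGE_PRUNE_SET_VM;
	return options;
}

/* Mirrors the early-exit checks in heap_page_will_set_vm(). */
static bool
toy_will_set_vm(ToyPruneState *ps, ToyPruneReason reason,
				bool do_prune, bool do_freeze)
{
	if (!ps->attempt_set_vm)
		return false;
	if (!ps->set_all_visible)
		return false;

	/* On-access with no other page changes: don't newly dirty or emit an FPI. */
	if (reason == TOY_PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!ps->buffer_is_dirty || ps->buffer_needs_fpi))
	{
		ps->set_all_visible = false;
		ps->set_all_frozen = false;
		return false;
	}
	return true;
}
```

VACUUM passes `HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM` (see the vacuumlazy.c hunk), so the on-access heuristic never blocks it.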



  [text/x-patch] v47-0006-Set-pd_prune_xid-on-insert.patch (8.8K, 7-v47-0006-Set-pd_prune_xid-on-insert.patch)
From 7f874b2f759a48a4553b85a2e7655075a311f32e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v47 6/6] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts
and on the new page during updates.

This enables heap_page_prune_and_freeze() to set a page all-visible in
the VM the first time the page is read after being filled with newly
inserted tuples. The page is thus set all-visible while it is still in
shared buffers, avoiding potential I/O amplification when vacuum later
has to scan the page and set it all-visible. It also enables index-only
scans of newly inserted data much sooner.

This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 39 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++-------
 3 files changed, 51 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dbdf6521c42..cdaf57e3f12 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2156,6 +2156,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2182,6 +2183,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2205,25 +2208,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would never otherwise be pruned until next vacuum is triggered.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2233,7 +2241,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2598,8 +2605,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM and we're not in bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4141,12 +4152,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 48f7cf77bc8..5bb9e929acf 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -285,7 +285,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts inserted new tuples that may be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1928,17 +1929,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0
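
The insert-side rule in v47-0006 reduces to a small predicate: set `pd_prune_xid` only for a normal (non-bootstrap) XID, and only when the tuple is not inserted frozen. The sketch below assumes `PageSetPrunable()` keeps the oldest XID already recorded on the page. The `Toy*` names are hypothetical, `TOY_FIRST_NORMAL_XID` mirrors PostgreSQL's `FirstNormalTransactionId` (3), and a plain `<` replaces the wraparound-aware `TransactionIdPrecedes()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyTransactionId;

#define TOY_INVALID_XID       ((ToyTransactionId) 0)
#define TOY_BOOTSTRAP_XID     ((ToyTransactionId) 1)
#define TOY_FIRST_NORMAL_XID  ((ToyTransactionId) 3)

/* Hypothetical page header holding only the prune hint. */
typedef struct ToyPage
{
	ToyTransactionId pd_prune_xid;
} ToyPage;

static bool
toy_xid_is_normal(ToyTransactionId xid)
{
	return xid >= TOY_FIRST_NORMAL_XID;
}

/* Like PageSetPrunable(): keep the oldest XID (no wraparound handling here). */
static void
toy_page_set_prunable(ToyPage *page, ToyTransactionId xid)
{
	assert(toy_xid_is_normal(xid));
	if (page->pd_prune_xid == TOY_INVALID_XID || xid < page->pd_prune_xid)
		page->pd_prune_xid = xid;
}

/* The new condition in heap_insert(): skip bootstrap and frozen inserts. */
static void
toy_insert_set_hint(ToyPage *page, ToyTransactionId xid, bool insert_frozen)
{
	if (toy_xid_is_normal(xid) && !insert_frozen)
		toy_page_set_prunable(page, xid);
}
```

The heapam_xlog.c hunks apply the same hint during WAL replay, so redo sets `pd_prune_xid` in heap_xlog_insert(), heap_xlog_multi_insert(), and heap_xlog_update() as well.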




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-20 23:37                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-22 19:58                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-23 21:54                                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-24 17:53                                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-24 23:44                                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-25 18:54                                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-25 23:14                                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-26 23:10                                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) David Rowley <[email protected]>
  2026-03-27 19:17                                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-29 17:16                                               ` Melanie Plageman <[email protected]>
  2026-03-31 02:16                                                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) David Rowley <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-29 17:16 UTC
  To: David Rowley <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Fri, Mar 27, 2026 at 3:17 PM Melanie Plageman
<[email protected]> wrote:
>
>  Done in v47.

Attached v48 does a bit more cleanup, with no functional changes. I'm
planning to push this soon. I think my remaining question is whether I
should move the row marks and result relation bitmaps into the estate.
I'm leaning toward not doing that and leaving them in the PlannedStmt.
Anyway, if I want to replace the list of result relation RTIs in the
PlannedStmt, I have to leave the bitmapset version there.

- Melanie


Attachments:

  [text/x-patch] v48-0001-Make-it-cheap-to-check-if-a-relation-is-modified.patch (4.4K, 2-v48-0001-Make-it-cheap-to-check-if-a-relation-is-modified.patch)
From 04d24039ec7c14672955aaaba37e3aa512858a0d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Mar 2026 09:21:22 -0400
Subject: [PATCH v48 1/6] Make it cheap to check if a relation is modified by a
 query

Save the range table indexes of result relations and row mark relations
in separate bitmaps in the PlannedStmt. Precomputing them allows cheap
membership checks during execution. With a few exceptions, these two
groups comprise all relations that will be modified by a query. This
includes relations targeted by INSERT, UPDATE, DELETE, and MERGE as well
as relations with any row mark (like SELECT FOR UPDATE).

A later commit will use this information during scans to control whether
or not on-access pruning is allowed to set the visibility map -- which
would be counterproductive if the query will modify the page.

PlannedStmt->resultRelations is only used in a membership check, so it
may make sense to replace its usage with the new resultRelationRelids.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 src/backend/executor/execParallel.c  |  2 ++
 src/backend/optimizer/plan/planner.c | 19 ++++++++++++++++++-
 src/include/nodes/plannodes.h        |  9 +++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..791fcb88de9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -188,6 +188,8 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->partPruneInfos = estate->es_part_prune_infos;
 	pstmt->rtable = estate->es_range_table;
 	pstmt->unprunableRelids = estate->es_unpruned_relids;
+	pstmt->resultRelationRelids = estate->es_plannedstmt->resultRelationRelids;
+	pstmt->rowMarkRelids = estate->es_plannedstmt->rowMarkRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
 	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d19800ad6a5..df4c99fc3ff 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -340,8 +340,11 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	RelOptInfo *final_rel;
 	Path	   *best_path;
 	Plan	   *top_plan;
+	Bitmapset  *resultRelationRelids = NULL;
+	Bitmapset  *rowMarkRelids = NULL;
 	ListCell   *lp,
-			   *lr;
+			   *lr,
+			   *lc;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -661,6 +664,20 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
 	result->rowMarks = glob->finalrowmarks;
+
+	/*
+	 * Compute resultRelationRelids and rowMarkRelids from resultRelations and
+	 * rowMarks for quick access.
+	 */
+	foreach(lc, glob->resultRelations)
+		resultRelationRelids = bms_add_member(resultRelationRelids,
+											  lfirst_int(lc));
+	foreach(lc, glob->finalrowmarks)
+		rowMarkRelids = bms_add_member(rowMarkRelids,
+									   ((PlanRowMark *) lfirst(lc))->rti);
+	result->resultRelationRelids = resultRelationRelids;
+	result->rowMarkRelids = rowMarkRelids;
+
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
 	result->paramExecTypes = glob->paramExecTypes;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..88be65d7bde 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -121,6 +121,9 @@ typedef struct PlannedStmt
 	/* integer list of RT indexes, or NIL */
 	List	   *resultRelations;
 
+	/* RT indexes of result relations targeted by INSERT/UPDATE/DELETE/MERGE */
+	Bitmapset  *resultRelationRelids;
+
 	/* list of AppendRelInfo nodes */
 	List	   *appendRelations;
 
@@ -138,6 +141,12 @@ typedef struct PlannedStmt
 	/* a list of PlanRowMark's */
 	List	   *rowMarks;
 
+	/*
+	 * RT indexes of relations with row marks. Useful for quick membership
+	 * checks instead of iterating through rowMarks.
+	 */
+	Bitmapset  *rowMarkRelids;
+
 	/* OIDs of relations the plan depends on */
 	List	   *relationOids;
 
-- 
2.43.0
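Some background on the resultRelations change in 0001 and the following 0002. An integer List answers "is RT index N a result relation?" with a linear scan (list_member_int()). A Bitmapset answers the same question with a single word lookup (bms_is_member()). The C sketch below shows that trade-off in isolation. It does not use PostgreSQL's Bitmapset. SketchRelids and the sketch_* functions are hypothetical stand-ins, and the fixed 256-entry capacity is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-capacity stand-in for a Bitmapset of RT indexes. */
#define SKETCH_MAX_RTI 256
#define SKETCH_NWORDS (SKETCH_MAX_RTI / 64)

typedef struct SketchRelids
{
	uint64_t	words[SKETCH_NWORDS];
} SketchRelids;

/*
 * Build the set in one pass over an array of RT indexes, much as the
 * foreach loop in 0001 feeds bms_add_member().
 */
static void
sketch_relids_build(SketchRelids *set, const int *rtis, int n)
{
	for (int i = 0; i < SKETCH_NWORDS; i++)
		set->words[i] = 0;

	for (int i = 0; i < n; i++)
	{
		assert(rtis[i] >= 0 && rtis[i] < SKETCH_MAX_RTI);
		set->words[rtis[i] / 64] |= UINT64_C(1) << (rtis[i] % 64);
	}
}

/* Constant-time membership test, analogous to bms_is_member(). */
static bool
sketch_relids_is_member(const SketchRelids *set, int rti)
{
	if (rti < 0 || rti >= SKETCH_MAX_RTI)
		return false;
	return (set->words[rti / 64] >> (rti % 64)) & 1;
}

/* Linear-time membership test, analogous to list_member_int(). */
static bool
sketch_list_member(const int *rtis, int n, int rti)
{
	for (int i = 0; i < n; i++)
	{
		if (rtis[i] == rti)
			return true;
	}
	return false;
}
```

Building the set costs one pass over the list, and the planner pays that cost once per statement. After that, each membership check (for example in ExecRelationIsTargetRelation()) takes constant time instead of growing with the number of result relations.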



  [text/x-patch] v48-0002-Remove-PlannedStmt-resultRelations-in-favor-of-r.patch (3.8K, 3-v48-0002-Remove-PlannedStmt-resultRelations-in-favor-of-r.patch)
From 7c331c575a377b40a1dd1142b23fa3a8692de38f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Mar 2026 08:35:00 -0400
Subject: [PATCH v48 2/6] Remove PlannedStmt->resultRelations in favor of
 resultRelationRelids

PlannedStmt->resultRelations was an integer list of range table indexes.
Now that we have a bitmapset, which offers cheap membership checks,
remove the list and update all consumers to use the bitmapset.
---
 contrib/pg_overexplain/pg_overexplain.c | 5 +++--
 src/backend/executor/execParallel.c     | 1 -
 src/backend/executor/execUtils.c        | 2 +-
 src/backend/optimizer/plan/planner.c    | 1 -
 src/include/nodes/plannodes.h           | 4 ----
 5 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/contrib/pg_overexplain/pg_overexplain.c b/contrib/pg_overexplain/pg_overexplain.c
index c2b90493cc6..b4e90909289 100644
--- a/contrib/pg_overexplain/pg_overexplain.c
+++ b/contrib/pg_overexplain/pg_overexplain.c
@@ -780,8 +780,9 @@ overexplain_range_table(PlannedStmt *plannedstmt, ExplainState *es)
 		overexplain_bitmapset("Unprunable RTIs", plannedstmt->unprunableRelids,
 							  es);
 	if (es->format != EXPLAIN_FORMAT_TEXT ||
-		plannedstmt->resultRelations != NIL)
-		overexplain_intlist("Result RTIs", plannedstmt->resultRelations, es);
+		!bms_is_empty(plannedstmt->resultRelationRelids))
+		overexplain_bitmapset("Result RTIs", plannedstmt->resultRelationRelids,
+							  es);
 
 	/* Close group, we're all done */
 	ExplainCloseGroup("Range Table", "Range Table", false, es);
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 791fcb88de9..1bab6160036 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -191,7 +191,6 @@ ExecSerializePlan(Plan *plan, EState *estate)
 	pstmt->resultRelationRelids = estate->es_plannedstmt->resultRelationRelids;
 	pstmt->rowMarkRelids = estate->es_plannedstmt->rowMarkRelids;
 	pstmt->permInfos = estate->es_rteperminfos;
-	pstmt->resultRelations = NIL;
 	pstmt->appendRelations = NIL;
 	pstmt->planOrigin = PLAN_STMT_INTERNAL;
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9886ab06b69..36c5285d252 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -733,7 +733,7 @@ ExecCreateScanSlotFromOuterPlan(EState *estate,
 bool
 ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 {
-	return list_member_int(estate->es_plannedstmt->resultRelations, scanrelid);
+	return bms_is_member(scanrelid, estate->es_plannedstmt->resultRelationRelids);
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index df4c99fc3ff..9853443209d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -659,7 +659,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 											  glob->prunableRelids);
 	result->permInfos = glob->finalrteperminfos;
 	result->subrtinfos = glob->subrtinfos;
-	result->resultRelations = glob->resultRelations;
 	result->appendRelations = glob->appendRelations;
 	result->subplans = glob->subplans;
 	result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 88be65d7bde..19e5d814c59 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -117,10 +117,6 @@ typedef struct PlannedStmt
 	 */
 	List	   *permInfos;
 
-	/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
-	/* integer list of RT indexes, or NIL */
-	List	   *resultRelations;
-
 	/* RT indexes of result relations targeted by INSERT/UPDATE/DELETE/MERGE */
 	Bitmapset  *resultRelationRelids;
 
-- 
2.43.0
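Patch 0003 below separates two kinds of scan flags. The bits that table_beginscan_parallel() and friends compute themselves are kept in internal_flags. The new caller-supplied flags argument is kept apart from them, and SO_INTERNAL_FLAGS names the bits that callers may not pass. The sketch below shows that split. It rests on two assumptions, because the body of table_beginscan_common() is not shown in this excerpt. First, it assumes the two masks are eventually OR'd together. Second, it treats a caller passing an internal bit as a programming error, checked here with assert().

All SK_* names are hypothetical. SK_SO_HINT_READONLY stands in for a future "relation is read-only" option. Only a subset of the internal bits is reproduced, with values mirroring tableam.h's ScanOptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of ScanOptions; bit values mirror tableam.h. */
enum SketchScanOptions
{
	SK_SO_NONE = 0,
	SK_SO_TYPE_SEQSCAN = 1 << 0,
	SK_SO_ALLOW_STRAT = 1 << 6,
	SK_SO_ALLOW_SYNC = 1 << 7,
	SK_SO_ALLOW_PAGEMODE = 1 << 8,
	SK_SO_TEMP_SNAPSHOT = 1 << 9,
	/* hypothetical caller-settable option, not part of the patch */
	SK_SO_HINT_READONLY = 1 << 10,
};

/* Subset of SO_INTERNAL_FLAGS covering only the bits defined above. */
#define SK_SO_INTERNAL_FLAGS \
	(SK_SO_TYPE_SEQSCAN | SK_SO_ALLOW_STRAT | SK_SO_ALLOW_SYNC | \
	 SK_SO_ALLOW_PAGEMODE | SK_SO_TEMP_SNAPSHOT)

/*
 * Combine internally computed flags with caller flags, following the
 * pattern of table_beginscan_parallel() in 0003.
 */
static uint32_t
sketch_parallel_scan_flags(bool snapshot_was_serialized, uint32_t flags)
{
	uint32_t	internal_flags = SK_SO_TYPE_SEQSCAN |
		SK_SO_ALLOW_STRAT | SK_SO_ALLOW_SYNC | SK_SO_ALLOW_PAGEMODE;

	/* Callers must not pass internally managed bits. */
	assert((flags & SK_SO_INTERNAL_FLAGS) == 0);

	/* A restored snapshot is owned by the scan, as in the patch. */
	if (snapshot_was_serialized)
		internal_flags |= SK_SO_TEMP_SNAPSHOT;

	return internal_flags | flags;
}
```

In the patch as posted, every existing caller passes SO_NONE, so behaviour is unchanged. Callers only start passing something else once a follow-up patch defines a caller-settable option.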



  [text/x-patch] v48-0003-Thread-flags-through-begin-scan-APIs.patch (37.1K, 4-v48-0003-Thread-flags-through-begin-scan-APIs.patch)
From 05cc37abae70327fda4bee4a392dfebcc08ec3c5 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 19 Mar 2026 17:05:55 -0400
Subject: [PATCH v48 3/6] Thread flags through begin-scan APIs

Add an AM user-settable flags parameter to several of the table
scan functions, one table AM callback, index_beginscan(), and
index_beginscan_parallel(). This allows users to pass additional
context to be used when building the scan descriptors.

For index scans, a new flags field is added to IndexFetchTableData, and
the heap AM saves the caller-provided flags there.

This introduces an extension point for follow-up work to pass
per-scan information (such as whether the relation is read-only for the
current query) from the executor to the AM layer.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/F5CDD1B5-628C-44A1-9F85-3958C626F6A9%40gmail.com
---
 contrib/pgrowlocks/pgrowlocks.c           |   2 +-
 src/backend/access/brin/brin.c            |   3 +-
 src/backend/access/gin/gininsert.c        |   3 +-
 src/backend/access/heap/heapam_handler.c  |   9 +-
 src/backend/access/index/genam.c          |   6 +-
 src/backend/access/index/indexam.c        |  13 ++-
 src/backend/access/nbtree/nbtsort.c       |   3 +-
 src/backend/access/table/tableam.c        |  22 ++---
 src/backend/commands/constraint.c         |   3 +-
 src/backend/commands/copyto.c             |   3 +-
 src/backend/commands/tablecmds.c          |  13 +--
 src/backend/commands/typecmds.c           |   6 +-
 src/backend/executor/execIndexing.c       |   4 +-
 src/backend/executor/execReplication.c    |  12 ++-
 src/backend/executor/nodeBitmapHeapscan.c |   3 +-
 src/backend/executor/nodeIndexonlyscan.c  |   9 +-
 src/backend/executor/nodeIndexscan.c      |  12 ++-
 src/backend/executor/nodeSamplescan.c     |   3 +-
 src/backend/executor/nodeSeqscan.c        |   9 +-
 src/backend/executor/nodeTidrangescan.c   |   7 +-
 src/backend/partitioning/partbounds.c     |   3 +-
 src/backend/utils/adt/selfuncs.c          |   3 +-
 src/include/access/genam.h                |   6 +-
 src/include/access/heapam.h               |   5 +-
 src/include/access/relscan.h              |   6 ++
 src/include/access/tableam.h              | 103 ++++++++++++++++------
 26 files changed, 185 insertions(+), 86 deletions(-)

diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index ff3692c87c4..d164c4c03ad 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -115,7 +115,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
 					   RelationGetRelationName(rel));
 
 	/* Scan the relation */
-	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, SO_NONE);
 	hscan = (HeapScanDesc) scan;
 
 	attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 2a0f8c8e3b8..bdb30752e09 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2844,7 +2844,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
 	indexInfo->ii_Concurrent = brinshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromBrinShared(brinshared));
+									ParallelTableScanFromBrinShared(brinshared),
+									SO_NONE);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
 									   brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index e54782d9dd8..9d83a495775 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2068,7 +2068,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
 	indexInfo->ii_Concurrent = ginshared->isconcurrent;
 
 	scan = table_beginscan_parallel(heap,
-									ParallelTableScanFromGinBuildShared(ginshared));
+									ParallelTableScanFromGinBuildShared(ginshared),
+									SO_NONE);
 
 	reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
 									   ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d40878928e1..99280cd8159 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -81,11 +81,12 @@ heapam_slot_callbacks(Relation relation)
  */
 
 static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
 {
 	IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
 
 	hscan->xs_base.rel = rel;
+	hscan->xs_base.flags = flags;
 	hscan->xs_cbuf = InvalidBuffer;
 	hscan->xs_vmbuffer = InvalidBuffer;
 
@@ -763,7 +764,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 
 		tableScan = NULL;
 		heapScan = NULL;
-		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0,
+									SO_NONE);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
@@ -772,7 +774,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
 		pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
 									 PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
 
-		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+		tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL,
+									SO_NONE);
 		heapScan = (HeapScanDesc) tableScan;
 		indexScan = NULL;
 
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index 5e89b86a62c..1408989c568 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -455,7 +455,8 @@ systable_beginscan(Relation heapRelation,
 		}
 
 		sysscan->iscan = index_beginscan(heapRelation, irel,
-										 snapshot, NULL, nkeys, 0);
+										 snapshot, NULL, nkeys, 0,
+										 SO_NONE);
 		index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 		sysscan->scan = NULL;
 
@@ -716,7 +717,8 @@ systable_beginscan_ordered(Relation heapRelation,
 		bsysscan = true;
 
 	sysscan->iscan = index_beginscan(heapRelation, indexRelation,
-									 snapshot, NULL, nkeys, 0);
+									 snapshot, NULL, nkeys, 0,
+									 SO_NONE);
 	index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
 	sysscan->scan = NULL;
 
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index fbfc33159eb..44496ae0963 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -258,7 +258,8 @@ index_beginscan(Relation heapRelation,
 				Relation indexRelation,
 				Snapshot snapshot,
 				IndexScanInstrumentation *instrument,
-				int nkeys, int norderbys)
+				int nkeys, int norderbys,
+				uint32 flags)
 {
 	IndexScanDesc scan;
 
@@ -285,7 +286,7 @@ index_beginscan(Relation heapRelation,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+	scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
 
 	return scan;
 }
@@ -588,13 +589,17 @@ index_parallelrescan(IndexScanDesc scan)
 /*
  * index_beginscan_parallel - join parallel index scan
  *
+ * flags is a bitmask of ScanOptions affecting the underlying table scan. No
+ * SO_INTERNAL_FLAGS are permitted.
+ *
  * Caller must be holding suitable locks on the heap and the index.
  */
 IndexScanDesc
 index_beginscan_parallel(Relation heaprel, Relation indexrel,
 						 IndexScanInstrumentation *instrument,
 						 int nkeys, int norderbys,
-						 ParallelIndexScanDesc pscan)
+						 ParallelIndexScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
 	IndexScanDesc scan;
@@ -616,7 +621,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
 	scan->instrument = instrument;
 
 	/* prepare to fetch index matches from table */
-	scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+	scan->xs_heapfetch = table_index_fetch_begin(heaprel, flags);
 
 	return scan;
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 47a9bda30c9..756dfa3dcf4 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1928,7 +1928,8 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
 	indexInfo = BuildIndexInfo(btspool->index);
 	indexInfo->ii_Concurrent = btshared->isconcurrent;
 	scan = table_beginscan_parallel(btspool->heap,
-									ParallelTableScanFromBTShared(btshared));
+									ParallelTableScanFromBTShared(btshared),
+									SO_NONE);
 	reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
 									   true, progress, _bt_build_callback,
 									   &buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..86481d7c029 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -118,7 +118,7 @@ table_beginscan_catalog(Relation relation, int nkeys, ScanKeyData *key)
 	Snapshot	snapshot = RegisterSnapshot(GetCatalogSnapshot(relid));
 
 	return table_beginscan_common(relation, snapshot, nkeys, key,
-								  NULL, flags);
+								  NULL, flags, SO_NONE);
 }
 
 
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
 }
 
 TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan,
+						 uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -176,7 +177,7 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -185,16 +186,17 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
 	}
 
 	return table_beginscan_common(relation, snapshot, 0, NULL,
-								  pscan, flags);
+								  pscan, internal_flags, flags);
 }
 
 TableScanDesc
 table_beginscan_parallel_tidrange(Relation relation,
-								  ParallelTableScanDesc pscan)
+								  ParallelTableScanDesc pscan,
+								  uint32 flags)
 {
 	Snapshot	snapshot;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 	TableScanDesc sscan;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
 	Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
 
@@ -206,7 +208,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 		/* Snapshot was serialized -- restore it */
 		snapshot = RestoreSnapshot((char *) pscan + pscan->phs_snapshot_off);
 		RegisterSnapshot(snapshot);
-		flags |= SO_TEMP_SNAPSHOT;
+		internal_flags |= SO_TEMP_SNAPSHOT;
 	}
 	else
 	{
@@ -215,7 +217,7 @@ table_beginscan_parallel_tidrange(Relation relation,
 	}
 
 	sscan = table_beginscan_common(relation, snapshot, 0, NULL,
-								   pscan, flags);
+								   pscan, internal_flags, flags);
 	return sscan;
 }
 
@@ -248,7 +250,7 @@ table_index_fetch_tuple_check(Relation rel,
 	bool		found;
 
 	slot = table_slot_create(rel, NULL);
-	scan = table_index_fetch_begin(rel);
+	scan = table_index_fetch_begin(rel, SO_NONE);
 	found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
 									all_dead);
 	table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..421d8c359f0 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,8 @@ unique_key_recheck(PG_FUNCTION_ARGS)
 	 */
 	tmptid = checktid;
 	{
-		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+		IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation,
+															SO_NONE);
 		bool		call_again = false;
 
 		if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..f0e0147c665 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1336,7 +1336,8 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
 	AttrMap    *map = NULL;
 	TupleTableSlot *root_slot = NULL;
 
-	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+	scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL,
+							   SO_NONE);
 	slot = table_slot_create(rel, NULL);
 
 	/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c69c12dc014..ec0063287d0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6411,7 +6411,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		 * checking all the constraints.
 		 */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(oldrel, snapshot, 0, NULL);
+		scan = table_beginscan(oldrel, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -13980,8 +13981,8 @@ validateForeignKeyConstraint(char *conname,
 	 */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
 	slot = table_slot_create(rel, NULL);
-	scan = table_beginscan(rel, snapshot, 0, NULL);
-
+	scan = table_beginscan(rel, snapshot, 0, NULL,
+						   SO_NONE);
 	perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
 									  "validateForeignKeyConstraint",
 									  ALLOCSET_SMALL_SIZES);
@@ -22882,7 +22883,8 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
 
 		/* Scan through the rows. */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+		scan = table_beginscan(mergingPartition, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
@@ -23346,7 +23348,8 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
 
 	/* Scan through the rows. */
 	snapshot = RegisterSnapshot(GetLatestSnapshot());
-	scan = table_beginscan(splitRel, snapshot, 0, NULL);
+	scan = table_beginscan(splitRel, snapshot, 0, NULL,
+						   SO_NONE);
 
 	/*
 	 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 3dab6bb5a79..cd38e9cddf4 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3185,7 +3185,8 @@ validateDomainNotNullConstraint(Oid domainoid)
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   SO_NONE);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
@@ -3266,7 +3267,8 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
 
 		/* Scan all tuples in this relation */
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
-		scan = table_beginscan(testrel, snapshot, 0, NULL);
+		scan = table_beginscan(testrel, snapshot, 0, NULL,
+							   SO_NONE);
 		slot = table_slot_create(testrel, NULL);
 		while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
 		{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 9d071e495c6..cc6eb3a6ee9 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -815,7 +815,9 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 retry:
 	conflict = false;
 	found_self = false;
-	index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+	index_scan = index_beginscan(heap, index,
+								 &DirtySnapshot, NULL, indnkeyatts, 0,
+								 SO_NONE);
 	index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
 
 	while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..fea8991cb04 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,8 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
 	skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
 	/* Start an index scan. */
-	scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   &snap, NULL, skey_attoff, 0, SO_NONE);
 
 retry:
 	found = false;
@@ -383,7 +384,8 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 	/* Start a heap scan. */
 	InitDirtySnapshot(snap);
-	scan = table_beginscan(rel, &snap, 0, NULL);
+	scan = table_beginscan(rel, &snap, 0, NULL,
+						   SO_NONE);
 	scanslot = table_slot_create(rel, NULL);
 
 retry:
@@ -602,7 +604,8 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+	scan = table_beginscan(rel, SnapshotAny, 0, NULL,
+						   SO_NONE);
 	scanslot = table_slot_create(rel, NULL);
 
 	table_rescan(scan, NULL);
@@ -666,7 +669,8 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
 	 * not yet committed or those just committed prior to the scan are
 	 * excluded in update_most_recent_deletion_info().
 	 */
-	scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+	scan = index_beginscan(rel, idxrel,
+						   SnapshotAny, NULL, skey_attoff, 0, SO_NONE);
 
 	index_rescan(scan, skey, skey_attoff, NULL, 0);
 
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 7cf8d23c742..69683d81527 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -148,7 +148,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 			table_beginscan_bm(node->ss.ss_currentRelation,
 							   node->ss.ps.state->es_snapshot,
 							   0,
-							   NULL);
+							   NULL,
+							   SO_NONE);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 9eab81fd1c8..02df40f32c5 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -95,7 +95,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   estate->es_snapshot,
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
-								   node->ioss_NumOrderByKeys);
+								   node->ioss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -794,7 +795,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -860,7 +862,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_Instrument,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 06143e94c5a..3c0b8daf664 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -113,7 +113,8 @@ IndexNext(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -209,7 +210,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   estate->es_snapshot,
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
-								   node->iss_NumOrderByKeys);
+								   node->iss_NumOrderByKeys,
+								   SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1730,7 +1732,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1794,7 +1797,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_Instrument,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
-								 piscan);
+								 piscan,
+								 SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 6b0d65f752f..cf32df33d82 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -298,7 +298,8 @@ tablesample_init(SampleScanState *scanstate)
 									 0, NULL,
 									 scanstate->use_bulkread,
 									 allow_sync,
-									 scanstate->use_pagemode);
+									 scanstate->use_pagemode,
+									 SO_NONE);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 8f219f60a93..09ccc65de1c 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -71,7 +71,8 @@ SeqNext(SeqScanState *node)
 		 */
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
-								   0, NULL);
+								   0, NULL,
+								   SO_NONE);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,7 +376,8 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -408,5 +410,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
-		table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+								 SO_NONE);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 617713bde04..084e4c6ec90 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -245,7 +245,8 @@ TidRangeNext(TidRangeScanState *node)
 			scandesc = table_beginscan_tidrange(node->ss.ss_currentRelation,
 												estate->es_snapshot,
 												&node->trss_mintid,
-												&node->trss_maxtid);
+												&node->trss_maxtid,
+												SO_NONE);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -460,7 +461,7 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -494,5 +495,5 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan);
+										  pscan, SO_NONE);
 }
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..f867d1b75a5 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,8 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 		econtext = GetPerTupleExprContext(estate);
 		snapshot = RegisterSnapshot(GetLatestSnapshot());
 		tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
-		scan = table_beginscan(part_rel, snapshot, 0, NULL);
+		scan = table_beginscan(part_rel, snapshot, 0, NULL,
+							   SO_NONE);
 
 		/*
 		 * Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 53f85ccde01..4160d2d6e24 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7178,7 +7178,8 @@ get_actual_variable_endpoint(Relation heapRel,
 
 	index_scan = index_beginscan(heapRel, indexRel,
 								 &SnapshotNonVacuumable, NULL,
-								 1, 0);
+								 1, 0,
+								 SO_NONE);
 	/* Set it up for index-only scan */
 	index_scan->xs_want_itup = true;
 	index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 1a27bf060b3..b69320a7fc8 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -158,7 +158,8 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
 									 Relation indexRelation,
 									 Snapshot snapshot,
 									 IndexScanInstrumentation *instrument,
-									 int nkeys, int norderbys);
+									 int nkeys, int norderbys,
+									 uint32 flags);
 extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
 											Snapshot snapshot,
 											IndexScanInstrumentation *instrument,
@@ -184,7 +185,8 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
 											  Relation indexrel,
 											  IndexScanInstrumentation *instrument,
 											  int nkeys, int norderbys,
-											  ParallelIndexScanDesc pscan);
+											  ParallelIndexScanDesc pscan,
+											  uint32 flags);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 									 ScanDirection direction);
 extern bool index_fetch_heap(IndexScanDesc scan, TupleTableSlot *slot);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9b403203006..e2e07348f37 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,10 +95,7 @@ typedef struct HeapScanDescData
 	 */
 	ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
 
-	/*
-	 * For sequential scans and bitmap heap scans. The current heap block's
-	 * corresponding page in the visibility map.
-	 */
+	/* Current heap block's corresponding page in the visibility map */
 	Buffer		rs_vmbuffer;
 
 	/* these fields only used in page-at-a-time mode and for bitmap scans */
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index ce340c076f8..960abf6c214 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -122,6 +122,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
 typedef struct IndexFetchTableData
 {
 	Relation	rel;
+
+	/*
+	 * Bitmask of ScanOptions affecting the relation. No SO_INTERNAL_FLAGS are
+	 * permitted.
+	 */
+	uint32		flags;
 } IndexFetchTableData;
 
 struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..f8d1423b2d0 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -45,6 +45,8 @@ typedef struct ValidateIndexState ValidateIndexState;
  */
 typedef enum ScanOptions
 {
+	SO_NONE = 0,
+
 	/* one of SO_TYPE_* may be specified */
 	SO_TYPE_SEQSCAN = 1 << 0,
 	SO_TYPE_BITMAPSCAN = 1 << 1,
@@ -65,6 +67,19 @@ typedef enum ScanOptions
 	SO_TEMP_SNAPSHOT = 1 << 9,
 }			ScanOptions;
 
+/*
+ * Mask of flags that are set internally by the table scan functions and
+ * shouldn't be passed by callers. Some of these are effectively set by callers
+ * through parameters to table scan functions (e.g. SO_ALLOW_STRAT/allow_strat);
+ * however, for now, retain tight control over them and don't allow callers to
+ * pass them directly to table scan functions.
+ */
+#define SO_INTERNAL_FLAGS \
+	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
+	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
+	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
+	 SO_TEMP_SNAPSHOT)
+
 /*
  * Result codes for table_{update,delete,lock_tuple}, and for visibility
  * routines inside table AMs.
@@ -321,8 +336,9 @@ typedef struct TableAmRoutine
 	 * `flags` is a bitmask indicating the type of scan (ScanOptions's
 	 * SO_TYPE_*, currently only one may be specified), options controlling
 	 * the scan's behaviour (ScanOptions's SO_ALLOW_*, several may be
-	 * specified, an AM may ignore unsupported ones) and whether the snapshot
-	 * needs to be deallocated at scan_end (ScanOptions's SO_TEMP_SNAPSHOT).
+	 * specified, an AM may ignore unsupported ones), whether the snapshot
+	 * needs to be deallocated at scan_end (ScanOptions's SO_TEMP_SNAPSHOT),
+	 * and any number of the other ScanOptions values.
 	 */
 	TableScanDesc (*scan_begin) (Relation rel,
 								 Snapshot snapshot,
@@ -418,9 +434,12 @@ typedef struct TableAmRoutine
 	 * IndexFetchTableData, which the AM will typically embed in a larger
 	 * structure with additional information.
 	 *
+	 * flags is a bitmask of ScanOptions affecting underlying table scan
+	 * behavior. See scan_begin() for more information on passing these.
+	 *
 	 * Tuples for an index scan can then be fetched via index_fetch_tuple.
 	 */
-	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+	struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
 
 	/*
 	 * Reset index fetch. Typically this will release cross index fetch
@@ -871,12 +890,19 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
  * A wrapper around the Table Access Method scan_begin callback, to centralize
  * error checking. All calls to ->scan_begin() should go through this
  * function.
+ *
+ * The caller-provided user_flags are validated against SO_INTERNAL_FLAGS to
+ * catch callers that accidentally pass scan-type or other internal flags.
  */
 static TableScanDesc
 table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 					   ScanKeyData *key, ParallelTableScanDesc pscan,
-					   uint32 flags)
+					   uint32 flags, uint32 user_flags)
 {
+	Assert((user_flags & SO_INTERNAL_FLAGS) == 0);
+	Assert((flags & ~SO_INTERNAL_FLAGS) == 0);
+	flags |= user_flags;
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -891,15 +917,18 @@ table_beginscan_common(Relation rel, Snapshot snapshot, int nkeys,
 /*
  * Start a scan of `rel`. Returned tuples pass a visibility test of
  * `snapshot`, and if nkeys != 0, the results are filtered by those scan keys.
+ *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
  */
 static inline TableScanDesc
 table_beginscan(Relation rel, Snapshot snapshot,
-				int nkeys, ScanKeyData *key)
+				int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SEQSCAN |
+	uint32		internal_flags = SO_TYPE_SEQSCAN |
 		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -928,7 +957,8 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
 	if (allow_sync)
 		flags |= SO_ALLOW_SYNC;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -936,14 +966,17 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
  * TableScanDesc for a bitmap heap scan.  Although that scan technology is
  * really quite unlike a standard seqscan, there is just enough commonality to
  * make it worth using the same data structure.
+ *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
  */
 static inline TableScanDesc
 table_beginscan_bm(Relation rel, Snapshot snapshot,
-				   int nkeys, ScanKeyData *key)
+				   int nkeys, ScanKeyData *key, uint32 flags)
 {
-	uint32		flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -952,23 +985,26 @@ table_beginscan_bm(Relation rel, Snapshot snapshot,
  * using the same data structure although the behavior is rather different.
  * In addition to the options offered by table_beginscan_strat, this call
  * also allows control of whether page-mode visibility checking is used.
+ *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
  */
 static inline TableScanDesc
 table_beginscan_sampling(Relation rel, Snapshot snapshot,
 						 int nkeys, ScanKeyData *key,
 						 bool allow_strat, bool allow_sync,
-						 bool allow_pagemode)
+						 bool allow_pagemode, uint32 flags)
 {
-	uint32		flags = SO_TYPE_SAMPLESCAN;
+	uint32		internal_flags = SO_TYPE_SAMPLESCAN;
 
 	if (allow_strat)
-		flags |= SO_ALLOW_STRAT;
+		internal_flags |= SO_ALLOW_STRAT;
 	if (allow_sync)
-		flags |= SO_ALLOW_SYNC;
+		internal_flags |= SO_ALLOW_SYNC;
 	if (allow_pagemode)
-		flags |= SO_ALLOW_PAGEMODE;
+		internal_flags |= SO_ALLOW_PAGEMODE;
 
-	return table_beginscan_common(rel, snapshot, nkeys, key, NULL, flags);
+	return table_beginscan_common(rel, snapshot, nkeys, key, NULL,
+								  internal_flags, flags);
 }
 
 /*
@@ -981,7 +1017,8 @@ table_beginscan_tid(Relation rel, Snapshot snapshot)
 {
 	uint32		flags = SO_TYPE_TIDSCAN;
 
-	return table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -994,7 +1031,8 @@ table_beginscan_analyze(Relation rel)
 {
 	uint32		flags = SO_TYPE_ANALYZE;
 
-	return table_beginscan_common(rel, NULL, 0, NULL, NULL, flags);
+	return table_beginscan_common(rel, NULL, 0, NULL, NULL,
+								  flags, SO_NONE);
 }
 
 /*
@@ -1055,16 +1093,19 @@ table_scan_getnextslot(TableScanDesc sscan, ScanDirection direction, TupleTableS
 /*
  * table_beginscan_tidrange is the entry point for setting up a TableScanDesc
  * for a TID range scan.
+ *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
  */
 static inline TableScanDesc
 table_beginscan_tidrange(Relation rel, Snapshot snapshot,
 						 ItemPointer mintid,
-						 ItemPointer maxtid)
+						 ItemPointer maxtid, uint32 flags)
 {
 	TableScanDesc sscan;
-	uint32		flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
+	uint32		internal_flags = SO_TYPE_TIDRANGESCAN | SO_ALLOW_PAGEMODE;
 
-	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL, flags);
+	sscan = table_beginscan_common(rel, snapshot, 0, NULL, NULL,
+								   internal_flags, flags);
 
 	/* Set the range of TIDs to scan */
 	sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
@@ -1136,20 +1177,26 @@ extern void table_parallelscan_initialize(Relation rel,
  * table_parallelscan_initialize(), for the same relation. The initialization
  * does not need to have happened in this backend.
  *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
+ *
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel(Relation relation,
-											  ParallelTableScanDesc pscan);
+											  ParallelTableScanDesc pscan,
+											  uint32 flags);
 
 /*
  * Begin a parallel tid range scan. `pscan` needs to have been initialized
  * with table_parallelscan_initialize(), for the same relation. The
  * initialization does not need to have happened in this backend.
  *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
+ *
  * Caller must hold a suitable lock on the relation.
  */
 extern TableScanDesc table_beginscan_parallel_tidrange(Relation relation,
-													   ParallelTableScanDesc pscan);
+													   ParallelTableScanDesc pscan,
+													   uint32 flags);
 
 /*
  * Restart a parallel scan.  Call this in the leader process.  Caller is
@@ -1172,11 +1219,15 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
  * Prepare to fetch tuples from the relation, as needed when fetching tuples
  * for an index scan.
  *
+ * flags is a bitmask of ScanOptions. No SO_INTERNAL_FLAGS are permitted.
+ *
  * Tuples for an index scan can then be fetched via table_index_fetch_tuple().
  */
 static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
 {
+	Assert((flags & SO_INTERNAL_FLAGS) == 0);
+
 	/*
 	 * We don't allow scans to be started while CheckXidAlive is set, except
 	 * via systable_beginscan() et al.  See detailed comments in xact.c where
@@ -1185,7 +1236,7 @@ table_index_fetch_begin(Relation rel)
 	if (unlikely(TransactionIdIsValid(CheckXidAlive) && !bsysscan))
 		elog(ERROR, "scan started during logical decoding");
 
-	return rel->rd_tableam->index_fetch_begin(rel);
+	return rel->rd_tableam->index_fetch_begin(rel, flags);
 }
 
 /*
-- 
2.43.0

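To make the flag split in 0003 concrete, here is a minimal standalone sketch (plain C, not PostgreSQL source) of how the wrappers separate internal flags from caller-supplied ones. The bit values mirror the ScanOptions enum in tableam.h. The helper names user_flags_valid(), merge_scan_flags() and seqscan_flags() are hypothetical, and SO_HINT_REL_READ_ONLY is borrowed from 0004 only so there is a legal user flag to pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirror tableam.h's ScanOptions; this is a model, not PG code. */
enum
{
	SO_NONE = 0,
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_TYPE_BITMAPSCAN = 1 << 1,
	SO_TYPE_SAMPLESCAN = 1 << 2,
	SO_TYPE_TIDSCAN = 1 << 3,
	SO_TYPE_TIDRANGESCAN = 1 << 4,
	SO_TYPE_ANALYZE = 1 << 5,
	SO_ALLOW_STRAT = 1 << 6,
	SO_ALLOW_SYNC = 1 << 7,
	SO_ALLOW_PAGEMODE = 1 << 8,
	SO_TEMP_SNAPSHOT = 1 << 9,
	SO_HINT_REL_READ_ONLY = 1 << 10,	/* introduced by 0004 */
};

#define SO_INTERNAL_FLAGS \
	(SO_TYPE_SEQSCAN | SO_TYPE_BITMAPSCAN | SO_TYPE_SAMPLESCAN | \
	 SO_TYPE_TIDSCAN | SO_TYPE_TIDRANGESCAN | SO_TYPE_ANALYZE | \
	 SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE | \
	 SO_TEMP_SNAPSHOT)

/* Hypothetical: true if no internal-only bit is present in user_flags. */
static bool
user_flags_valid(uint32_t user_flags)
{
	return (user_flags & SO_INTERNAL_FLAGS) == 0;
}

/*
 * Hypothetical stand-in for the merging step in table_beginscan_common():
 * the wrapper supplies only internal flags, the caller only user flags.
 */
static uint32_t
merge_scan_flags(uint32_t internal_flags, uint32_t user_flags)
{
	assert(user_flags_valid(user_flags));
	assert((internal_flags & ~(uint32_t) SO_INTERNAL_FLAGS) == 0);
	return internal_flags | user_flags;
}

/* Models the flags table_beginscan() ends up passing after the patch. */
static uint32_t
seqscan_flags(uint32_t user_flags)
{
	return merge_scan_flags(SO_TYPE_SEQSCAN | SO_ALLOW_STRAT |
							SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE,
							user_flags);
}
```

For example, seqscan_flags(SO_HINT_REL_READ_ONLY) yields the usual seqscan flags plus the hint, while seqscan_flags(SO_ALLOW_SYNC) fails the first assertion. That is the kind of caller mistake the new Assert()s in table_beginscan_common() are meant to catch.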


  [text/x-patch] v48-0004-Pass-down-information-on-table-modification-to-s.patch (10.0K, 5-v48-0004-Pass-down-information-on-table-modification-to-s.patch)
From 239ec276e5bee0f59ae0a91d0bd9eff8842c8a63 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 2 Mar 2026 16:31:33 -0500
Subject: [PATCH v48 4/6] Pass down information on table modification to scan
 node

Pass down information to sequential scan, index [only] scan, bitmap
table scan, sample scan, and TID range scan nodes on whether or not the
query modifies the relation being scanned. A later commit will use this
information to update the VM during on-access pruning only if the
relation is not modified by the query.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
 src/backend/executor/execUtils.c          | 21 +++++++++++++++++++++
 src/backend/executor/nodeBitmapHeapscan.c |  3 ++-
 src/backend/executor/nodeIndexonlyscan.c  |  9 ++++++---
 src/backend/executor/nodeIndexscan.c      | 12 ++++++++----
 src/backend/executor/nodeSamplescan.c     |  3 ++-
 src/backend/executor/nodeSeqscan.c        | 10 +++++++---
 src/backend/executor/nodeTidrangescan.c   | 11 ++++++++---
 src/include/access/tableam.h              |  3 +++
 src/include/executor/executor.h           |  2 ++
 9 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 36c5285d252..f090de49921 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -736,6 +736,27 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
 	return bms_is_member(scanrelid, estate->es_plannedstmt->resultRelationRelids);
 }
 
+/*
+ * Return true if the scan node's relation is not modified by the query.
+ *
+ * This is not perfectly accurate. INSERT ... SELECT from the same table does
+ * not add the scan relation to resultRelationRelids, so it will be reported
+ * as read-only even though the query modifies it.
+ *
+ * Conversely, when any relation in the query has a modifying row mark, all
+ * other relations get a ROW_MARK_REFERENCE, causing them to be reported as
+ * not read-only even though they may only be read.
+ */
+bool
+ScanRelIsReadOnly(ScanState *ss)
+{
+	Index		scanrelid = ((Scan *) ss->ps.plan)->scanrelid;
+	PlannedStmt *pstmt = ss->ps.state->es_plannedstmt;
+
+	return !bms_is_member(scanrelid, pstmt->resultRelationRelids) &&
+		!bms_is_member(scanrelid, pstmt->rowMarkRelids);
+}
+
 /* ----------------------------------------------------------------
  *		ExecOpenScanRelation
  *
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 69683d81527..73831aed451 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -149,7 +149,8 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
 							   node->ss.ps.state->es_snapshot,
 							   0,
 							   NULL,
-							   SO_NONE);
+							   ScanRelIsReadOnly(&node->ss) ?
+							   SO_HINT_REL_READ_ONLY : SO_NONE);
 	}
 
 	node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 02df40f32c5..de6154fd541 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -96,7 +96,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
 								   node->ioss_Instrument,
 								   node->ioss_NumScanKeys,
 								   node->ioss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->ioss_ScanDesc = scandesc;
 
@@ -796,7 +797,8 @@ ExecIndexOnlyScanInitializeDSM(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 	node->ioss_VMBuffer = InvalidBuffer;
 
@@ -863,7 +865,8 @@ ExecIndexOnlyScanInitializeWorker(IndexOnlyScanState *node,
 								 node->ioss_NumScanKeys,
 								 node->ioss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 	node->ioss_ScanDesc->xs_want_itup = true;
 
 	/*
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 3c0b8daf664..1620d146071 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -114,7 +114,8 @@ IndexNext(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -211,7 +212,8 @@ IndexNextWithReorder(IndexScanState *node)
 								   node->iss_Instrument,
 								   node->iss_NumScanKeys,
 								   node->iss_NumOrderByKeys,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 
 		node->iss_ScanDesc = scandesc;
 
@@ -1733,7 +1735,8 @@ ExecIndexScanInitializeDSM(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
@@ -1798,7 +1801,8 @@ ExecIndexScanInitializeWorker(IndexScanState *node,
 								 node->iss_NumScanKeys,
 								 node->iss_NumOrderByKeys,
 								 piscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 
 	/*
 	 * If no run-time keys to calculate or they are ready, go ahead and pass
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index cf32df33d82..f3d273e1c5e 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -299,7 +299,8 @@ tablesample_init(SampleScanState *scanstate)
 									 scanstate->use_bulkread,
 									 allow_sync,
 									 scanstate->use_pagemode,
-									 SO_NONE);
+									 ScanRelIsReadOnly(&scanstate->ss) ?
+									 SO_HINT_REL_READ_ONLY : SO_NONE);
 	}
 	else
 	{
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 09ccc65de1c..04803b0e37d 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -72,7 +72,8 @@ SeqNext(SeqScanState *node)
 		scandesc = table_beginscan(node->ss.ss_currentRelation,
 								   estate->es_snapshot,
 								   0, NULL,
-								   SO_NONE);
+								   ScanRelIsReadOnly(&node->ss) ?
+								   SO_HINT_REL_READ_ONLY : SO_NONE);
 		node->ss.ss_currentScanDesc = scandesc;
 	}
 
@@ -375,9 +376,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
 								  pscan,
 								  estate->es_snapshot);
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -411,5 +414,6 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
-								 SO_NONE);
+								 ScanRelIsReadOnly(&node->ss) ?
+								 SO_HINT_REL_READ_ONLY : SO_NONE);
 }
diff --git a/src/backend/executor/nodeTidrangescan.c b/src/backend/executor/nodeTidrangescan.c
index 084e4c6ec90..4a8fe91b2b3 100644
--- a/src/backend/executor/nodeTidrangescan.c
+++ b/src/backend/executor/nodeTidrangescan.c
@@ -246,7 +246,8 @@ TidRangeNext(TidRangeScanState *node)
 												estate->es_snapshot,
 												&node->trss_mintid,
 												&node->trss_maxtid,
-												SO_NONE);
+												ScanRelIsReadOnly(&node->ss) ?
+												SO_HINT_REL_READ_ONLY : SO_NONE);
 			node->ss.ss_currentScanDesc = scandesc;
 		}
 		else
@@ -461,7 +462,9 @@ ExecTidRangeScanInitializeDSM(TidRangeScanState *node, ParallelContext *pcxt)
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, SO_NONE);
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : SO_NONE);
 }
 
 /* ----------------------------------------------------------------
@@ -495,5 +498,7 @@ ExecTidRangeScanInitializeWorker(TidRangeScanState *node,
 	pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
 	node->ss.ss_currentScanDesc =
 		table_beginscan_parallel_tidrange(node->ss.ss_currentRelation,
-										  pscan, SO_NONE);
+										  pscan,
+										  ScanRelIsReadOnly(&node->ss) ?
+										  SO_HINT_REL_READ_ONLY : SO_NONE);
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index f8d1423b2d0..68ddabc171a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -65,6 +65,9 @@ typedef enum ScanOptions
 
 	/* unregister snapshot at scan end? */
 	SO_TEMP_SNAPSHOT = 1 << 9,
+
+	/* set if the query doesn't modify the relation */
+	SO_HINT_REL_READ_ONLY = 1 << 10,
 }			ScanOptions;
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..7979a17e4ec 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -690,6 +690,8 @@ extern void ExecCreateScanSlotFromOuterPlan(EState *estate,
 
 extern bool ExecRelationIsTargetRelation(EState *estate, Index scanrelid);
 
+extern bool ScanRelIsReadOnly(ScanState *ss);
+
 extern Relation ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags);
 
 extern void ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos,
-- 
2.43.0

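A rough model of the decision 0004 adds to every scan node, assuming a toy 64-bit relid set in place of Bitmapset. RelidSet, relid_is_member(), relid_set_of(), scan_rel_is_read_only() and scan_hint_flags() are hypothetical names. The sketch covers only the two membership tests in ScanRelIsReadOnly() and the ternary passed as user flags; it does not reproduce the caveats in that function's comment, such as INSERT ... SELECT from the same table being reported as read-only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SO_NONE					0u
#define SO_HINT_REL_READ_ONLY	(1u << 10)	/* value as in 0004 */

/*
 * Toy relid set: bit i set means range-table index i is a member. Stands in
 * for PostgreSQL's Bitmapset and only handles relids 1..63.
 */
typedef uint64_t RelidSet;

static RelidSet
relid_set_of(unsigned relid)
{
	assert(relid > 0 && relid < 64);
	return (RelidSet) 1 << relid;
}

static bool
relid_is_member(unsigned relid, RelidSet set)
{
	assert(relid > 0 && relid < 64);
	return ((set >> relid) & 1u) != 0;
}

/*
 * Models ScanRelIsReadOnly(): read-only unless the scan's relid is a result
 * relation or carries a row mark.
 */
static bool
scan_rel_is_read_only(unsigned scanrelid, RelidSet result_relids,
					  RelidSet rowmark_relids)
{
	return !relid_is_member(scanrelid, result_relids) &&
		!relid_is_member(scanrelid, rowmark_relids);
}

/* The value each scan node passes as its user flags. */
static uint32_t
scan_hint_flags(unsigned scanrelid, RelidSet result_relids,
				RelidSet rowmark_relids)
{
	return scan_rel_is_read_only(scanrelid, result_relids, rowmark_relids) ?
		SO_HINT_REL_READ_ONLY : SO_NONE;
}
```

With scanrelid 2 and result relation 1, the scan gets the hint unless relid 2 also appears in the row-mark set, which corresponds to the ROW_MARK_REFERENCE case described in the ScanRelIsReadOnly() comment.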


  [text/x-patch] v48-0005-Allow-on-access-pruning-to-set-pages-all-visible.patch (9.9K, 6-v48-0005-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From e914b4834e613c59935df55a400a9290cc145b33 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 27 Feb 2026 16:33:40 -0500
Subject: [PATCH v48 5/6] Allow on-access pruning to set pages all-visible

Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.

This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.

Setting the visibility map on-access can avoid the write amplification
caused by vacuum later having to set the page all-visible, which may
trigger a write and potentially an FPI. It also lets index-only scans
avoid heap fetches more often, since they can skip the heap only for
pages marked all-visible in the VM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c         |  3 +-
 src/backend/access/heap/heapam_handler.c |  6 ++-
 src/backend/access/heap/pruneheap.c      | 55 ++++++++++++++++++------
 src/backend/access/heap/vacuumlazy.c     |  2 +-
 src/include/access/heapam.h              |  3 +-
 5 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index eb1f67f31cd..7012ee2c306 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -633,7 +633,8 @@ heap_prepare_pagescan(TableScanDesc sscan)
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_base.rs_rd, buffer, &scan->rs_vmbuffer,
+						(sscan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 99280cd8159..3433ea93c11 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -149,7 +149,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
 		 */
 		if (prev_buf != hscan->xs_cbuf)
 			heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
-								&hscan->xs_vmbuffer);
+								&hscan->xs_vmbuffer,
+								(hscan->xs_base.flags & SO_HINT_REL_READ_ONLY));
 	}
 
 	/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2546,7 +2547,8 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
 	/*
 	 * Prune and repair fragmentation for the whole page, if possible.
 	 */
-	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer);
+	heap_page_prune_opt(scan->rs_rd, buffer, &hscan->rs_vmbuffer,
+						(scan->rs_flags & SO_HINT_REL_READ_ONLY));
 
 	/*
 	 * We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 6693af8da7f..7fcfc844d20 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -44,6 +44,8 @@ typedef struct
 	bool		mark_unused_now;
 	/* whether to attempt freezing tuples */
 	bool		attempt_freeze;
+	/* whether to attempt setting the VM */
+	bool		attempt_set_vm;
 	struct VacuumCutoffs *cutoffs;
 	Relation	relation;
 
@@ -232,7 +234,8 @@ static void page_verify_redirects(Page page);
 
 static bool heap_page_will_freeze(bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
 								  PruneState *prstate);
-static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
+static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+								  bool do_prune, bool do_freeze);
 
 
 /*
@@ -251,9 +254,21 @@ static bool heap_page_will_set_vm(PruneState *prstate, PruneReason reason);
  * reuse the pin across calls, avoiding repeated pin/unpin cycles. If we find
  * VM corruption during pruning, we will fix it. Caller is responsible for
  * unpinning *vmbuffer.
+ *
+ * rel_read_only is true if we determined at plan time that the query does not
+ * modify the relation. It is counterproductive to set the VM if the query
+ * will immediately clear it.
+ *
+ * As noted in ScanRelIsReadOnly(), INSERT ... SELECT on the same table will
+ * report the scan relation as read-only. This is usually harmless in
+ * practice: it is still useful to set all-visible the scanned pages that
+ * won't be inserted into. Pages we do insert into rarely meet the criteria
+ * for pruning, and those that do are likely to contain in-progress inserts,
+ * which keep the page from being all-visible.
  */
 void
-heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
+					bool rel_read_only)
 {
 	Page		page = BufferGetPage(buffer);
 	TransactionId prune_xid;
@@ -336,6 +351,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
 			 * current implementation.
 			 */
 			params.options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;
+			if (rel_read_only)
+				params.options |= HEAP_PAGE_PRUNE_SET_VM;
 
 			heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
 									   NULL, NULL);
@@ -392,6 +409,7 @@ prune_freeze_setup(PruneFreezeParams *params,
 	/* cutoffs must be provided if we will attempt freezing */
 	Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
 	prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+	prstate->attempt_set_vm = (params->options & HEAP_PAGE_PRUNE_SET_VM) != 0;
 	prstate->cutoffs = params->cutoffs;
 	prstate->relation = params->relation;
 	prstate->block = BufferGetBlockNumber(params->buffer);
@@ -461,9 +479,8 @@ prune_freeze_setup(PruneFreezeParams *params,
 	 * We track whether the page will be all-visible/all-frozen at the end of
 	 * pruning and freezing. While examining tuple visibility, we'll set
 	 * set_all_visible to false if there are tuples on the page not visible to
-	 * all running and future transactions. set_all_visible is always
-	 * maintained but only VACUUM will set the VM if the page ends up being
-	 * all-visible.
+	 * all running and future transactions. If enabled for this scan, we will
+	 * set the VM if the page ends up being all-visible.
 	 *
 	 * We also keep track of the newest live XID, which is used to calculate
 	 * the snapshot conflict horizon for a WAL record setting the VM.
@@ -920,21 +937,35 @@ heap_page_fix_vm_corruption(PruneState *prstate, OffsetNumber offnum,
  * This function does not actually set the VM bits or page-level visibility
  * hint, PD_ALL_VISIBLE.
  *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
  * Returns true if one or both VM bits should be set and false otherwise.
  */
 static bool
-heap_page_will_set_vm(PruneState *prstate, PruneReason reason)
+heap_page_will_set_vm(PruneState *prstate, PruneReason reason,
+					  bool do_prune, bool do_freeze)
 {
-	/*
-	 * Though on-access pruning maintains prstate->set_all_visible, we don't
-	 * set the VM on-access for now.
-	 */
-	if (reason == PRUNE_ON_ACCESS)
+	if (!prstate->attempt_set_vm)
 		return false;
 
 	if (!prstate->set_all_visible)
 		return false;
 
+	/*
+	 * If this is an on-access call and we are neither pruning nor freezing,
+	 * avoid setting the visibility map if doing so would newly dirty the heap
+	 * page or, if the page is already dirty, would require including a
+	 * full-page image (FPI) of the heap page in the WAL.
+	 */
+	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+		(!BufferIsDirty(prstate->buffer) || XLogCheckBufferNeedsBackup(prstate->buffer)))
+	{
+		prstate->set_all_visible = false;
+		prstate->set_all_frozen = false;
+		return false;
+	}
+
 	prstate->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
 
 	if (prstate->set_all_frozen)
@@ -1167,7 +1198,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
 	Assert(!prstate.set_all_frozen || prstate.set_all_visible);
 	Assert(!prstate.set_all_visible || (prstate.lpdead_items == 0));
 
-	do_set_vm = heap_page_will_set_vm(&prstate, params->reason);
+	do_set_vm = heap_page_will_set_vm(&prstate, params->reason, do_prune, do_freeze);
 
 	/*
 	 * new_vmbits should be 0 regardless of whether or not the page is
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f698c2d899b..24001b27387 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2021,7 +2021,7 @@ lazy_scan_prune(LVRelState *vacrel,
 		.buffer = buf,
 		.vmbuffer = vmbuffer,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_SET_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e2e07348f37..f2a009141be 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -43,6 +43,7 @@
 #define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW		(1 << 0)
 #define HEAP_PAGE_PRUNE_FREEZE				(1 << 1)
 #define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
+#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)
 
 typedef struct BulkInsertStateData *BulkInsertState;
 typedef struct GlobalVisState GlobalVisState;
@@ -431,7 +432,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
 
 /* in heap/pruneheap.c */
 extern void heap_page_prune_opt(Relation relation, Buffer buffer,
-								Buffer *vmbuffer);
+								Buffer *vmbuffer, bool rel_read_only);
 extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
 									   PruneFreezeResult *presult,
 									   OffsetNumber *off_loc,
-- 
2.43.0

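A sketch of how the pieces of 0005 fit together: heap_page_prune_opt() turning rel_read_only into HEAP_PAGE_PRUNE_SET_VM, and the gating in heap_page_will_set_vm(). This is a self-contained model, not the patch's code. ToyPruneState, toy_will_set_vm() and on_access_prune_options() are hypothetical; two booleans stand in for BufferIsDirty() and XLogCheckBufferNeedsBackup(); only two PruneReason values are modeled; and the tail of the function (OR-ing in VISIBILITYMAP_ALL_FROZEN and returning true) is assumed, since the hunk is cut off there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in PostgreSQL's visibilitymap.h and 0005's heapam.h. */
#define VISIBILITYMAP_ALL_VISIBLE			0x01
#define VISIBILITYMAP_ALL_FROZEN			0x02
#define HEAP_PAGE_PRUNE_ALLOW_FAST_PATH		(1 << 2)
#define HEAP_PAGE_PRUNE_SET_VM				(1 << 3)

/* Subset of PruneReason relevant here. */
typedef enum
{
	PRUNE_ON_ACCESS,
	PRUNE_VACUUM_SCAN,
} PruneReason;

/* Hypothetical subset of PruneState: only what the VM decision reads. */
typedef struct
{
	bool		attempt_set_vm;		/* HEAP_PAGE_PRUNE_SET_VM was passed */
	bool		set_all_visible;	/* page found all-visible */
	bool		set_all_frozen;		/* page found all-frozen */
	bool		buffer_dirty;		/* stands in for BufferIsDirty() */
	bool		needs_fpi;			/* stands in for XLogCheckBufferNeedsBackup() */
	uint8_t		new_vmbits;
} ToyPruneState;

/* Models the options heap_page_prune_opt() builds after 0005. */
static int
on_access_prune_options(bool rel_read_only)
{
	int			options = HEAP_PAGE_PRUNE_ALLOW_FAST_PATH;

	if (rel_read_only)
		options |= HEAP_PAGE_PRUNE_SET_VM;
	return options;
}

/* Models heap_page_will_set_vm(); the part after set_all_frozen is assumed. */
static bool
toy_will_set_vm(ToyPruneState *st, PruneReason reason,
				bool do_prune, bool do_freeze)
{
	if (!st->attempt_set_vm || !st->set_all_visible)
		return false;

	/* On access, don't newly dirty the page or emit an FPI just for the VM. */
	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!st->buffer_dirty || st->needs_fpi))
	{
		st->set_all_visible = false;
		st->set_all_frozen = false;
		return false;
	}

	st->new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
	if (st->set_all_frozen)
		st->new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
	return true;
}
```

In this model, a clean all-visible page with nothing to prune is left alone on access (and set_all_visible is cleared), whereas the same page reached with do_prune = true, or by vacuum, gets its VM bits set. The apparent intent is that on-access VM setting should not by itself cost a newly dirtied page or an FPI.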


  [text/x-patch] v48-0006-Set-pd_prune_xid-on-insert.patch (8.6K, 7-v48-0006-Set-pd_prune_xid-on-insert.patch)
From 13f3c314d760bce33ca48ea6d1cde606b62cad4c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v48 6/6] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that on-access pruning can update the visibility map (VM) during
read-only queries, set the page’s pd_prune_xid hint during INSERT and on
the new page during UPDATE.

This allows heap_page_prune_and_freeze() to set the VM the first time a
page is read after being filled with tuples. This may avoid I/O
amplification by setting the page all-visible when it is still in shared
buffers and allowing later vacuums to skip scanning the page. It also
enables index-only scans of newly inserted data much sooner.

As a side benefit, this addresses a long-standing note in heap_insert()
and heap_multi_insert(): aborted inserts can now be pruned on-access
rather than lingering until the next VACUUM.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
 src/backend/access/heap/heapam.c      | 39 +++++++++++++++++----------
 src/backend/access/heap/heapam_xlog.c | 19 ++++++++++++-
 src/backend/access/heap/pruneheap.c   | 18 ++++++-------
 3 files changed, 51 insertions(+), 25 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7012ee2c306..3b020d910d7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2154,6 +2154,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	TransactionId xid = GetCurrentTransactionId();
 	HeapTuple	heaptup;
 	Buffer		buffer;
+	Page		page;
 	Buffer		vmbuffer = InvalidBuffer;
 	bool		all_visible_cleared = false;
 
@@ -2180,6 +2181,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 									   &vmbuffer, NULL,
 									   0);
 
+	page = BufferGetPage(buffer);
+
 	/*
 	 * We're about to do the actual insert -- but check for conflict first, to
 	 * avoid possibly having to roll back work we've just done.
@@ -2203,25 +2206,30 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	RelationPutHeapTuple(relation, buffer, heaptup,
 						 (options & HEAP_INSERT_SPECULATIVE) != 0);
 
-	if (PageIsAllVisible(BufferGetPage(buffer)))
+	if (PageIsAllVisible(page))
 	{
 		all_visible_cleared = true;
-		PageClearAllVisible(BufferGetPage(buffer));
+		PageClearAllVisible(page);
 		visibilitymap_clear(relation,
 							ItemPointerGetBlockNumber(&(heaptup->t_self)),
 							vmbuffer, VISIBILITYMAP_VALID_BITS);
 	}
 
 	/*
-	 * XXX Should we set PageSetPrunable on this page ?
+	 * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+	 * is full so that we can set the page all-visible in the VM on the next
+	 * page access.
 	 *
-	 * The inserting transaction may eventually abort thus making this tuple
-	 * DEAD and hence available for pruning. Though we don't want to optimize
-	 * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
-	 * aborted tuple will never be pruned until next vacuum is triggered.
+	 * Setting pd_prune_xid is also handy if the inserting transaction
+	 * eventually aborts, making this tuple DEAD and hence available for
+	 * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+	 * tuple would otherwise not be pruned until the next vacuum.
 	 *
-	 * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+	 * Don't set it if we are in bootstrap mode or we are inserting a frozen
+	 * tuple, as there is no further pruning/freezing needed in those cases.
 	 */
+	if (TransactionIdIsNormal(xid) && !(options & HEAP_INSERT_FROZEN))
+		PageSetPrunable(page, xid);
 
 	MarkBufferDirty(buffer);
 
@@ -2231,7 +2239,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 		xl_heap_insert xlrec;
 		xl_heap_header xlhdr;
 		XLogRecPtr	recptr;
-		Page		page = BufferGetPage(buffer);
 		uint8		info = XLOG_HEAP_INSERT;
 		int			bufflags = 0;
 
@@ -2596,8 +2603,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 		}
 
 		/*
-		 * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+		 * Set pd_prune_xid. See heap_insert() for more on why we do this when
+		 * inserting tuples. This only makes sense if we aren't already
+		 * setting the page frozen in the VM and we're not in bootstrap mode.
 		 */
+		if (!all_frozen_set && TransactionIdIsNormal(xid))
+			PageSetPrunable(page, xid);
 
 		MarkBufferDirty(buffer);
 
@@ -4139,12 +4150,12 @@ l2:
 	 * the subsequent page pruning will be a no-op and the hint will be
 	 * cleared.
 	 *
-	 * XXX Should we set hint on newbuf as well?  If the transaction aborts,
-	 * there would be a prunable tuple in the newbuf; but for now we choose
-	 * not to optimize for aborts.  Note that heap_xlog_update must be kept in
-	 * sync if this decision changes.
+	 * We set the new page prunable as well. See heap_insert() for more on why
+	 * we do this when inserting tuples.
 	 */
 	PageSetPrunable(page, xid);
+	if (newbuf != buffer)
+		PageSetPrunable(newpage, xid);
 
 	if (use_hot_update)
 	{
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 1302bb13e18..f3f419d3dc1 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -450,6 +450,14 @@ heap_xlog_insert(XLogReaderState *record)
 
 		freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
 
+		/*
+		 * Set the page prunable to trigger on-access pruning later, which may
+		 * set the page all-visible in the VM. See comments in heap_insert().
+		 */
+		if (TransactionIdIsNormal(XLogRecGetXid(record)) &&
+			!HeapTupleHeaderXminFrozen(htup))
+			PageSetPrunable(page, XLogRecGetXid(record));
+
 		PageSetLSN(page, lsn);
 
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -599,12 +607,19 @@ heap_xlog_multi_insert(XLogReaderState *record)
 		if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
 			PageClearAllVisible(page);
 
-		/* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+		/*
+		 * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+		 * we are not setting the page frozen, then set the page's prunable
+		 * hint so that we trigger on-access pruning later which may set the
+		 * page all-visible in the VM.
+		 */
 		if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
 		{
 			PageSetAllVisible(page);
 			PageClearPrunable(page);
 		}
+		else
+			PageSetPrunable(page, XLogRecGetXid(record));
 
 		MarkBufferDirty(buffer);
 	}
@@ -921,6 +936,8 @@ heap_xlog_update(XLogReaderState *record, bool hot_update)
 		freespace = PageGetHeapFreeSpace(npage);
 
 		PageSetLSN(npage, lsn);
+		/* See heap_insert() for why we set pd_prune_xid on insert */
+		PageSetPrunable(npage, XLogRecGetXid(record));
 		MarkBufferDirty(nbuffer);
 	}
 
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7fcfc844d20..fe9b1f16db4 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -286,7 +286,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer,
 	/*
 	 * First check whether there's any chance there's something to prune,
 	 * determining the appropriate horizon is a waste if there's no prune_xid
-	 * (i.e. no updates/deletes left potentially dead tuples around).
+	 * (i.e. no updates/deletes left potentially dead tuples around and no
+	 * inserts added new tuples that may now be visible to all).
 	 */
 	prune_xid = PageGetPruneXid(page);
 	if (!TransactionIdIsValid(prune_xid))
@@ -1927,17 +1928,14 @@ heap_prune_record_unchanged_lp_normal(PruneState *prstate, OffsetNumber offnum)
 			prstate->set_all_visible = false;
 			prstate->set_all_frozen = false;
 
-			/* The page should not be marked all-visible */
-			if (PageIsAllVisible(page))
-				heap_page_fix_vm_corruption(prstate, offnum,
-											VM_CORRUPT_TUPLE_VISIBILITY);
-
 			/*
-			 * If we wanted to optimize for aborts, we might consider marking
-			 * the page prunable when we see INSERT_IN_PROGRESS.  But we
-			 * don't.  See related decisions about when to mark the page
-			 * prunable in heapam.c.
+			 * Though there is nothing "prunable" on the page, we maintain
+			 * pd_prune_xid for inserts so that we have the opportunity to
+			 * mark them all-visible during the next round of pruning.
 			 */
+			heap_prune_record_prunable(prstate,
+									   HeapTupleHeaderGetXmin(htup),
+									   offnum);
 			break;
 
 		case HEAPTUPLE_DELETE_IN_PROGRESS:
-- 
2.43.0
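The effect of the 0006 patch can be sketched in isolation. The sketch models three pieces of behaviour:

- PageSetPrunable() keeps the oldest hint. As we understand the macro in bufpage.h, it only moves pd_prune_xid backwards and asserts that the XID is normal. The assertion is presumably why heap_insert() checks TransactionIdIsNormal() to skip bootstrap mode.
- The new insert-time condition.
- The early exit in heap_page_prune_opt() when no hint is set.

PageSketch, SKETCH_INSERT_FROZEN, xid_precedes() and the other *_sketch names are hypothetical, simplified stand-ins, not PostgreSQL definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's TransactionId and helpers. */
typedef uint32_t TransactionId;

#define InvalidTransactionId      ((TransactionId) 0)
#define BootstrapTransactionId    ((TransactionId) 1)
#define FirstNormalTransactionId  ((TransactionId) 3)

#define TransactionIdIsValid(xid)  ((xid) != InvalidTransactionId)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Modulo-2^32 "a is older than b" comparison for normal XIDs. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Hypothetical page: only the prune hint matters here. */
typedef struct PageSketch
{
	TransactionId pd_prune_xid;	/* oldest XID that may make pruning useful */
} PageSketch;

/* Shaped like PageSetPrunable(): record xid only if it is the oldest so far. */
static void
page_set_prunable_sketch(PageSketch *page, TransactionId xid)
{
	assert(TransactionIdIsNormal(xid));
	if (!TransactionIdIsValid(page->pd_prune_xid) ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/* Hypothetical stand-in for HEAP_INSERT_FROZEN (value not the real one). */
#define SKETCH_INSERT_FROZEN 0x1

/* The condition the patch adds to heap_insert(). */
static void
heap_insert_hint_sketch(PageSketch *page, TransactionId xid, int options)
{
	if (TransactionIdIsNormal(xid) && !(options & SKETCH_INSERT_FROZEN))
		page_set_prunable_sketch(page, xid);
}

/* The first check in heap_page_prune_opt(): no hint means nothing to do. */
static bool
prune_worth_considering_sketch(const PageSketch *page)
{
	return TransactionIdIsValid(page->pd_prune_xid);
}
```

A page that receives many inserts keeps the first (oldest) inserter's XID as its hint. Whether on-access pruning then actually runs still depends on the horizon and free-space checks in heap_page_prune_opt(), which this sketch omits.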



^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-02 23:38 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-03 07:32   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-03 15:52     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-04 08:59       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-05 08:52         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 02:40           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
  2026-03-06 23:33             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-11 17:01               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-15 19:10                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-16 14:53                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-17 09:05                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
  2026-03-17 14:48                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-18 17:14                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-20 02:38                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-20 23:37                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-22 19:58                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-23 21:54                                 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-24 17:53                                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
  2026-03-24 23:44                                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-25 18:54                                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-25 23:14                                         ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-26 23:10                                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) David Rowley <[email protected]>
  2026-03-27 19:17                                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  2026-03-29 17:16                                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-03-31 02:16                                                 ` David Rowley <[email protected]>
  2026-03-31 16:19                                                   ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: David Rowley @ 2026-03-31 02:16 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Mon, 30 Mar 2026 at 06:16, Melanie Plageman
<[email protected]> wrote:
> Attached v48 does a bit more cleanup. No functional changes. I'm
> planning to push this soon. I think my remaining question is whether I
> should move the row marks and result relation bitmaps into the estate.
> I'm leaning toward not doing that and leaving them in the PlannedStmt.
> Anyway, if I want to replace the list of result relation RTIs in the
> PlannedStmt, I have to leave the bitmapset version there.

I looked at v48-0001 and it looks fine to me. I've only minor quibbles
about you using foreach() instead of foreach_int() and foreach_node()
for populating the new Bitmapsets in standard_planner().

I don't see any advantage to adding the fields to EState. There might
be if there was some performance reason, but it looks like you're only
accessing the fields when scans are initialised. It's hard to imagine
an extra pointer dereference would matter there. I didn't find any
guidance in any comments to understand if there's a best practise
here, so I assume what's there today is down to people's taste. For
me, I'd say if it's not performance critical and the executor does not
modify the field for any purpose, then keeping it in PlannedStmt is
fine. If someone thinks I'm wrong on that, then a comment at the top
of EState would be helpful.
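To make the "extra pointer dereference" concrete, compare the two layouts:

- With the bitmaps in PlannedStmt, executor code reaches them through estate->es_plannedstmt.
- Copying them into EState would let the executor read them directly.

The structs and functions below are hypothetical simplifications. A 64-bit mask stands in for a Bitmapset, so RTIs must be below 64. Only the field name es_plannedstmt is taken from the real EState.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for PlannedStmt. */
typedef struct PlannedStmtSketch
{
	uint64_t	resultRelationMask;	/* bit n set => RTI n is a result rel */
} PlannedStmtSketch;

/* Hypothetical, heavily simplified stand-in for EState. */
typedef struct EStateSketch
{
	const PlannedStmtSketch *es_plannedstmt;	/* option A: keep it here */
	uint64_t	es_resultRelationMask;	/* option B: copy into EState */
} EStateSketch;

/* Option A: reach through the PlannedStmt (one extra pointer load). */
static bool
is_result_rel_via_plannedstmt(const EStateSketch *estate, unsigned rti)
{
	return (estate->es_plannedstmt->resultRelationMask >> rti) & 1;
}

/* Option B: read a copy cached directly in the EState. */
static bool
is_result_rel_via_estate(const EStateSketch *estate, unsigned rti)
{
	return (estate->es_resultRelationMask >> rti) & 1;
}
```

Both paths answer the same membership question. Option B saves a single load per check, which matters little if the check only runs when scans are initialised.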

David






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-31 16:19                                                   ` Melanie Plageman <[email protected]>
  2026-03-31 22:14                                                     ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) David Rowley <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-31 16:19 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

Thanks for the reply! I have committed the patches in this thread and
marked the CF entry accordingly.

On Mon, Mar 30, 2026 at 10:17 PM David Rowley <[email protected]> wrote:
>
> I looked at v48-0001 and it looks fine to me. I've only minor quibbles
> about you using foreach() instead of foreach_int() and foreach_node()
> for populating the new Bitmapsets in standard_planner().

Good point. I forgot about those. Attached patch fixes that (since the
code was already committed).

- Melanie


Attachments:

  [text/x-patch] v1-0001-Use-foreach_int-foreach_node-for-resultRelationRe.patch (2.1K, 2-v1-0001-Use-foreach_int-foreach_node-for-resultRelationRe.patch)
From cd9ba7cce756ad870a00ce82faae41b5564980b7 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 31 Mar 2026 12:06:39 -0400
Subject: [PATCH v1] Use foreach_int/foreach_node for resultRelationRelids and
 rowMarkRelids

0f4c170cf3b85e iterated through PlannerGlobal->resultRelations and
PlannerGlobal->finalrowmarks adding their RTIs to bitmapsets in the
PlannedStmt. It used the generic foreach() instead of the more recently
introduced, preferred, type-safe specialized variants: foreach_int() and
foreach_node(). Do that now.

Reported-by: David Rowley <[email protected]>
Discussion: https://postgr.es/m/CAApHDvq_R-gNXu%2B06GQW6w_HaEMh1pezsyiCh7GNhgh%2Bh0UqMw%40mail.gmail.com
---
 src/backend/optimizer/plan/planner.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 07944612668..2b8243635a9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -341,8 +341,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	Path	   *best_path;
 	Plan	   *top_plan;
 	ListCell   *lp,
-			   *lr,
-			   *lc;
+			   *lr;
 
 	/*
 	 * Set up global state for this planner invocation.  This data is needed
@@ -666,12 +665,12 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
 	 * Compute resultRelationRelids and rowMarkRelids from resultRelations and
 	 * rowMarks. These can be used for cheap membership checks.
 	 */
-	foreach(lc, glob->resultRelations)
+	foreach_int(rti, glob->resultRelations)
 		result->resultRelationRelids = bms_add_member(result->resultRelationRelids,
-													  lfirst_int(lc));
-	foreach(lc, glob->finalrowmarks)
+													  rti);
+	foreach_node(PlanRowMark, rowmark, glob->finalrowmarks)
 		result->rowMarkRelids = bms_add_member(result->rowMarkRelids,
-											   ((PlanRowMark *) lfirst(lc))->rti);
+											   rowmark->rti);
 
 	result->relationOids = glob->relationOids;
 	result->invalItems = glob->invalItems;
-- 
2.43.0
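The commit message calls foreach_int() and foreach_node() type-safe. The sketch below shows the idea for the integer case. The macro declares the loop variable itself, with a fixed type, scoped to the loop, so the body needs neither a ListCell nor an lfirst_int() call. The sketch is loosely modelled on the nested one-shot-loop pattern in pg_list.h, but it is not PostgreSQL's definition. IntListSketch, FOREACH_INT_SKETCH and rti_mask_from_list() are hypothetical. A uint64_t mask stands in for a Bitmapset, so RTIs must be below 64. As we understand it, foreach_node() additionally casts each element with a node-tag check in assert-enabled builds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal integer list; PostgreSQL's List is richer. */
typedef struct IntListSketch
{
	size_t		length;
	const int  *elements;
} IntListSketch;

/*
 * foreach_int()-style loop: the outer one-shot loop only exists so the
 * typed loop variable can be declared in a for-init clause; the inner
 * loop walks the elements and assigns each one to it.
 */
#define FOREACH_INT_SKETCH(var, lst) \
	for (int var = 0, var##__once = 1; var##__once; var##__once = 0) \
		for (size_t var##__i = 0; \
			 (lst) != NULL && var##__i < (lst)->length && \
			 (var = (lst)->elements[var##__i], 1); \
			 var##__i++)

/* Build a membership mask of RTIs, analogous to the Bitmapset loop above. */
static uint64_t
rti_mask_from_list(const IntListSketch *rtis)
{
	uint64_t	mask = 0;

	FOREACH_INT_SKETCH(rti, rtis)
		mask |= UINT64_C(1) << rti;
	return mask;
}
```

A NULL list (PostgreSQL's NIL) simply yields no iterations, which is why the inner condition tests the list pointer first.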




* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-31 22:14                                                     ` David Rowley <[email protected]>
  2026-03-31 22:55                                                       ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: David Rowley @ 2026-03-31 22:14 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, 1 Apr 2026 at 05:19, Melanie Plageman <[email protected]> wrote:
>
> Thanks for the reply! I have committed the patches in this thread and
> marked the CF entry accordingly.

Yeah, realised that after sending the email.

> On Mon, Mar 30, 2026 at 10:17 PM David Rowley <[email protected]> wrote:
> >
> > I looked at v48-0001 and it looks fine to me. I've only minor quibbles
> > about you using foreach() instead of foreach_int() and foreach_node()
> > for populating the new Bitmapsets in standard_planner().
>
> Good point. I forgot about those. Attached patch fixes that (since the
> code was already committed).

Since it's in already, maybe it'd be worth doing something more
widespread after the freeze is over, changing just the ones new to
v19.

git diff 2652835d3efa003439ecc23d5fc3cf089c5952a6.. -- *.c | grep -E "^\+\s+foreach\("

or with a bit more context:

git diff 2652835d3efa003439ecc23d5fc3cf089c5952a6.. -- *.c | grep -E "(^\+\s+foreach\(|^---)"

Loops over lists with mixed node types don't qualify for foreach_node(),
but it shouldn't be too hard to filter those out manually.

David






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-31 22:55                                                       ` Melanie Plageman <[email protected]>
  0 siblings, 0 replies; 34+ messages in thread

From: Melanie Plageman @ 2026-03-31 22:55 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Tue, Mar 31, 2026 at 6:14 PM David Rowley <[email protected]> wrote:
>
> Since it's in already, maybe it'd be worth doing something more
> widespread after the freeze is over, changing just the ones new to
> v19.

That makes sense. I've put it on my todo list for post-freeze.

- Melanie





^ permalink  raw  reply  [nested|flat] 34+ messages in thread

* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-25 23:29                                         ` Tomas Vondra <[email protected]>
  2026-03-26 14:51                                           ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  1 sibling, 1 reply; 34+ messages in thread

From: Tomas Vondra @ 2026-03-25 23:29 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On 3/25/26 19:54, Melanie Plageman wrote:
> On Wed, Mar 25, 2026 at 2:02 PM Tomas Vondra <[email protected]> wrote:
>>
>> 0002
>>
>> - Don't we usually keep "flags" as the last parameter? It seems a bit
>> weird that it's added in between relation and snapshot.
> 
> In an earlier review, Andres said he disliked using flags as the last
> parameter for index_beginscan() because its current last two
> parameters are integers (nkeys and norderbys), which could be
> confusing. Personally, I think you have to look at the function
> signature before just randomly passing stuff, and so it shouldn't
> matter -- but I didn't care enough to argue. If you agree with me that
> they should be last, then it's two against one and I'll change it back
> :) I can keep the callsite comments naming the flags parameter.
> 

Who am I to argue with Andres? ;-) I'm kinda used to flags being the
last argument, but it's not something I'm particularly attached to.

>> - Do we really want to pass two sets of flags to table_beginscan_common?
>>  I realize it's done to ensure "users" don't use internal flags, but
>> then maybe it'd be better to do that check in the places calling the
>> _common? Someone adding a new caller can break this in various ways
>> anyway, e.g. by setting bits in the internal flags, no?
> 
> Yes, callers of table_beginscan_common() could pass flags they
> shouldn't in internal_flags. But I was mostly trying to prevent the
> case where a user picks a flag that overlaps with an internal flag,
> conditionally passes it as a user flag, and then when they test for it
> in their AM-specific code, they aren't actually checking if their own
> flag is set.
> 

Ah, so we expect people to invent their "own" flags, outside what's in
ScanOptions? Or do I misunderstand how it works? (I admit not reading
the whole massive thread, as I was only interested in using the flags in
my own patch.)

> Anyway, it's not hard to move:
>     Assert((flags & SO_INTERNAL_FLAGS) == 0);
> into the table_beginscan_common() callers and then pass the internal
> flags the caller wants to pass + the user specified flags to
> table_beginscan_common(). And I think that fixes what you are talking
> about?
> 

Right. I wouldn't say it "fixes" it, because it wasn't a bug. But it
does ensure the two sets do not "overlap", which I assume should never
happen.
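A minimal sketch of the caller-side alternative discussed above. Each public wrapper asserts that the user-supplied flags stay clear of the internal bits, then ORs in its own internal flags before calling a single-argument common routine. The bit values, the contents of SO_INTERNAL_FLAGS, and the names beginscan_sketch() and beginscan_common_sketch() are illustrative assumptions, not the definitions in tableam.h or the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-ins only: these bit values and the contents of
 * SO_INTERNAL_FLAGS are assumptions, not copied from tableam.h or the patch.
 */
#define SO_TYPE_SEQSCAN    (UINT32_C(1) << 0)
#define SO_ALLOW_STRAT     (UINT32_C(1) << 6)
#define SO_ALLOW_SYNC      (UINT32_C(1) << 7)
#define SO_ALLOW_PAGEMODE  (UINT32_C(1) << 8)
#define SO_INTERNAL_FLAGS  (SO_TYPE_SEQSCAN | SO_ALLOW_STRAT | \
                            SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE)

/* Hypothetical single-argument common routine: it just returns the flags. */
static uint32_t
beginscan_common_sketch(uint32_t flags)
{
    return flags;
}

/*
 * Hypothetical public wrapper: validate the caller's flags here, then OR in
 * the internal flags this kind of scan needs.
 */
static uint32_t
beginscan_sketch(uint32_t user_flags)
{
    /* Callers must not pass (or collide with) internal flags. */
    assert((user_flags & SO_INTERNAL_FLAGS) == 0);

    return beginscan_common_sketch(user_flags | SO_TYPE_SEQSCAN |
                                   SO_ALLOW_STRAT | SO_ALLOW_SYNC |
                                   SO_ALLOW_PAGEMODE);
}
```

Here the overlap check lives in every wrapper, so the common routine can trust whatever single flags word it receives.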

>> If we want to have these checks, should we be more thorough? Should we
>> check the internal flags only set internal flags?
> 
> That's easy enough too.
> Assert((internal_flags & ~SO_INTERNAL_FLAGS) == 0); I think does the trick.
> 
> I think this would largely be the same as having
> table_beginscan_common() callers validate that the user-passed flags
> are not internal and then OR them together with the internal flags
> they want to pass to table_beginscan_common().
> 
> I'm trying to think of cases where the two approaches would differ so
> I can decide which to do.
> 

OK
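For comparison, here is a sketch of the two-argument variant with both checks in the common routine. User flags must not touch internal bits, and internal_flags must contain only internal bits (the Assert((internal_flags & ~SO_INTERNAL_FLAGS) == 0) check quoted above). Names and bit values are hypothetical stand-ins. scan_flags_valid() is a non-aborting copy of the same two checks, added only so they can be exercised.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in values; not the real tableam.h definitions. */
#define SO_TYPE_SEQSCAN    (UINT32_C(1) << 0)
#define SO_ALLOW_STRAT     (UINT32_C(1) << 6)
#define SO_ALLOW_SYNC      (UINT32_C(1) << 7)
#define SO_ALLOW_PAGEMODE  (UINT32_C(1) << 8)
#define SO_INTERNAL_FLAGS  (SO_TYPE_SEQSCAN | SO_ALLOW_STRAT | \
                            SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE)

/*
 * Hypothetical two-argument common routine: it receives user flags and
 * internal flags separately, validates both, then combines them.
 */
static uint32_t
beginscan_common2_sketch(uint32_t flags, uint32_t internal_flags)
{
    /* user-supplied flags must not touch internal bits */
    assert((flags & SO_INTERNAL_FLAGS) == 0);
    /* internal_flags must contain only internal bits */
    assert((internal_flags & ~SO_INTERNAL_FLAGS) == 0);

    return flags | internal_flags;
}

/* Non-aborting form of the same two checks, for testing. */
static int
scan_flags_valid(uint32_t flags, uint32_t internal_flags)
{
    return (flags & SO_INTERNAL_FLAGS) == 0 &&
           (internal_flags & ~SO_INTERNAL_FLAGS) == 0;
}
```

Under these assumptions, this variant and caller-side validation produce the same combined flags word. The difference is mainly whether the check lives once in the common routine or is repeated in every wrapper.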


-- 
Tomas Vondra







* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-26 14:51                                           ` Melanie Plageman <[email protected]>
  2026-03-26 16:07                                             ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Tomas Vondra <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Melanie Plageman @ 2026-03-26 14:51 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Wed, Mar 25, 2026 at 7:29 PM Tomas Vondra <[email protected]> wrote:
>
> >> - Do we really want to pass two sets of flags to table_beginscan_common?
> >>  I realize it's done to ensure "users" don't use internal flags, but
> >> then maybe it'd be better to do that check in the places calling the
> >> _common? Someone adding a new caller can break this in various ways
> >> anyway, e.g. by setting bits in the internal flags, no?
> >
> > Yes, callers of table_beginscan_common() could pass flags they
> > shouldn't in internal_flags. But I was mostly trying to prevent the
> > case where a user picks a flag that overlaps with an internal flag,
> > conditionally passes it as a user flag, and then when they test for it
> > in their AM-specific code, they aren't actually checking if their own
> > flag is set.
>
> Ah, so we expect people to invent their "own" flags, outside what's in
> ScanOptions? Or do I misunderstand how it works? (I admit not reading
> the whole massive thread, as I was only interested in using the flags in
> my own patch.)

Yes, this isn't really explored in the rest of the thread. I thought
that since the flags are threaded all the way through, and table AMs
can set/check them in their AM-specific layer, it would make sense for
AM authors to choose flags for their own purposes. That way, they
wouldn't have to wait for consensus on adding a new ScanOptions value.
I don't know if this is a bad idea. However, changing the table AM
wrappers seems more justifiable if we are making them extensible in
this way.

> >> If we want to have these checks, should we be more thorough? Should we
> >> check the internal flags only set internal flags?
> >
> > That's easy enough too.
> > Assert((internal_flags & ~SO_INTERNAL_FLAGS) == 0); I think does the trick.

I did this in the previously posted v46.

- Melanie






* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-26 16:07                                             ` Tomas Vondra <[email protected]>
  2026-03-27 19:31                                               ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 34+ messages in thread

From: Tomas Vondra @ 2026-03-26 16:07 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On 3/26/26 15:51, Melanie Plageman wrote:
> On Wed, Mar 25, 2026 at 7:29 PM Tomas Vondra <[email protected]> wrote:
>>
>>>> - Do we really want to pass two sets of flags to table_beginscan_common?
>>>>  I realize it's done to ensure "users" don't use internal flags, but
>>>> then maybe it'd be better to do that check in the places calling the
>>>> _common? Someone adding a new caller can break this in various ways
>>>> anyway, e.g. by setting bits in the internal flags, no?
>>>
>>> Yes, callers of table_beginscan_common() could pass flags they
>>> shouldn't in internal_flags. But I was mostly trying to prevent the
>>> case where a user picks a flag that overlaps with an internal flag,
>>> conditionally passes it as a user flag, and then when they test for it
>>> in their AM-specific code, they aren't actually checking if their own
>>> flag is set.
>>
>> Ah, so we expect people to invent their "own" flags, outside what's in
>> ScanOptions? Or do I misunderstand how it works? (I admit not reading
>> the whole massive thread, as I was only interested in using the flags in
>> my own patch.)
> 
> Yes, this isn't really explored in the rest of the thread. I thought
> since the flags are threaded all the way through and they can
> set/check the flags in the table AM-specific layer, it would make
> sense that they could choose flags for their own purposes. They don't
> have to wait for consensus on getting a new SO type added. I don't
> know if this is a bad idea. However, changing the table AM wrappers
> seems more justifiable if we are making them extensible in this way.
> 

No idea. Do we have an example of a TAM actually needing this? If not,
I'd probably advise removing it and keeping the patch simpler. My past
attempts to future-proof patches like this rarely worked.

If we want to give TAMs the opportunity to define custom flags, do we
already do something like that elsewhere? Is there a precedent for how
to do that? If we allow the TAM to pick arbitrary flag values, it's easy to
end up with collisions later (if we add a new internal flag). Maybe
there is a way to prevent that? E.g. we could restrict internal flags to
0x0000FFFF, and custom flags to 0xFFFF0000?

regards

-- 
Tomas Vondra







* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-03-27 19:31                                               ` Melanie Plageman <[email protected]>
  0 siblings, 0 replies; 34+ messages in thread

From: Melanie Plageman @ 2026-03-27 19:31 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Andrey Borodin <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>

On Thu, Mar 26, 2026 at 12:07 PM Tomas Vondra <[email protected]> wrote:
>
> >> Ah, so we expect people to invent their "own" flags, outside what's in
> >> ScanOptions? Or do I misunderstand how it works? (I admit not reading
> >> the whole massive thread, as I was only interested in using the flags in
> >> my own patch.)
> >
> > Yes, this isn't really explored in the rest of the thread. I thought
> > since the flags are threaded all the way through and they can
> > set/check the flags in the table AM-specific layer, it would make
> > sense that they could choose flags for their own purposes. They don't
> > have to wait for consensus on getting a new SO type added. I don't
> > know if this is a bad idea. However, changing the table AM wrappers
> > seems more justifiable if we are making them extensible in this way.
> >
>
> No idea. Do we have an example of a TAM actually needing this? If not,
> I'd probably advise to remove that and keep the patch simpler. My past
> attempts to future-proof a patch like this rarely worked.

Yeah, disallowing custom flags doesn't really simplify the patch.
But when I talked to Andres off-list yesterday, he reminded me that
table AM authors can simply add a new member to their AM-specific scan
descriptor (e.g. HeapScanDescData could get a new member). The value
of flags lies in letting table-AM-agnostic executor code pass flags
through the table AM down to the scan code. Besides my read-only hint
scan option, he gave other examples, such as a hint to the scan that
the query has a LIMIT. I think that is compelling.
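Here is a sketch of Andres's suggestion using simplified stand-in structs. In PostgreSQL, HeapScanDescData embeds the AM-independent TableScanDescData as its first member. That means AM-private state can simply be another field, while the flags word carries hints from AM-agnostic code. TableScanDescSketch, MyAmScanDescSketch, use_fast_path, and SO_HINT_LIMIT_SKETCH (a stand-in for the LIMIT hint mentioned above) are hypothetical and much smaller than the real structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the AM-agnostic part of a scan descriptor. */
typedef struct TableScanDescSketch
{
    uint32_t    rs_flags;       /* flags passed down by AM-agnostic callers */
} TableScanDescSketch;

/*
 * Hypothetical AM-specific descriptor: the generic part comes first, so a
 * pointer to it can be handed to AM-agnostic code. AM-private state is just
 * an extra member rather than a new flag bit.
 */
typedef struct MyAmScanDescSketch
{
    TableScanDescSketch base;   /* must be first */
    bool        use_fast_path;  /* AM-private option, hypothetical */
} MyAmScanDescSketch;

/* Hypothetical core flag executor code might pass, e.g. a LIMIT hint. */
#define SO_HINT_LIMIT_SKETCH (UINT32_C(1) << 12)

static void
myam_beginscan_sketch(MyAmScanDescSketch *scan, uint32_t flags)
{
    scan->base.rs_flags = flags;
    /* The AM decides privately what to do with the generic hint. */
    scan->use_fast_path = (flags & SO_HINT_LIMIT_SKETCH) != 0;
}

/* AM-agnostic code only ever looks at the generic part. */
static bool
scan_has_limit_hint(const TableScanDescSketch *scan)
{
    return (scan->rs_flags & SO_HINT_LIMIT_SKETCH) != 0;
}
```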

While exploring this, I realized that for a few internal flags, such
as SO_ALLOW_STRAT and SO_ALLOW_SYNC, we have specialized scan
functions like table_beginscan_strat() that accept parameters for
setting them. These are basically table_beginscan() with extra control
over those flags. I think we could use the flags parameter to
deprecate some of these specialized functions and simplify the
scan_rescan() callback as well. I don't think that makes sense this
late in the v19 release cycle, though. All of those changes first
require a flags parameter in the top-level scan wrappers, so I think
it is reasonable to add just that in this release.
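To illustrate the deprecation idea, here is a sketch of how a boolean-parameter wrapper such as table_beginscan_strat() roughly turns its arguments into scan option bits, next to a hypothetical flags-based equivalent. It assumes SO_ALLOW_STRAT and SO_ALLOW_SYNC could be passed by callers. The message leaves that open, since they are currently internal. Bit values and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in values; not the real tableam.h definitions. */
#define SO_TYPE_SEQSCAN    (UINT32_C(1) << 0)
#define SO_ALLOW_STRAT     (UINT32_C(1) << 6)
#define SO_ALLOW_SYNC      (UINT32_C(1) << 7)
#define SO_ALLOW_PAGEMODE  (UINT32_C(1) << 8)

/*
 * Roughly what a boolean-parameter wrapper computes before calling the
 * common routine: a fixed base plus one bit per boolean argument.
 */
static uint32_t
strat_wrapper_flags(bool allow_strat, bool allow_sync)
{
    uint32_t    flags = SO_TYPE_SEQSCAN | SO_ALLOW_PAGEMODE;

    if (allow_strat)
        flags |= SO_ALLOW_STRAT;
    if (allow_sync)
        flags |= SO_ALLOW_SYNC;
    return flags;
}

/*
 * Hypothetical flags-based replacement: the caller states the same choices
 * directly as bits. For this sketch, any bits other than the two allowed
 * ones are masked away.
 */
static uint32_t
flags_based_seqscan_flags(uint32_t caller_flags)
{
    return SO_TYPE_SEQSCAN | SO_ALLOW_PAGEMODE |
        (caller_flags & (SO_ALLOW_STRAT | SO_ALLOW_SYNC));
}
```

If something like this were adopted, a call such as table_beginscan_strat(..., true, false) would become an ordinary scan call passing SO_ALLOW_STRAT in its flags.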

- Melanie







end of thread, other threads:[~2026-03-31 22:55 UTC | newest]

Thread overview: 34+ messages
2026-02-20 17:59 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
2026-03-02 23:38 ` Melanie Plageman <[email protected]>
2026-03-03 07:32   ` Chao Li <[email protected]>
2026-03-03 15:52     ` Melanie Plageman <[email protected]>
2026-03-04 08:59       ` Chao Li <[email protected]>
2026-03-05 08:52         ` Chao Li <[email protected]>
2026-03-06 02:40           ` Chao Li <[email protected]>
2026-03-06 23:33             ` Melanie Plageman <[email protected]>
2026-03-11 17:01               ` Melanie Plageman <[email protected]>
2026-03-15 19:10                 ` Melanie Plageman <[email protected]>
2026-03-16 14:53                   ` Melanie Plageman <[email protected]>
2026-03-17 09:05                     ` Kirill Reshke <[email protected]>
2026-03-17 14:48                       ` Melanie Plageman <[email protected]>
2026-03-18 17:14                         ` Andres Freund <[email protected]>
2026-03-20 02:38                           ` Melanie Plageman <[email protected]>
2026-03-20 23:37                             ` Melanie Plageman <[email protected]>
2026-03-22 19:58                               ` Melanie Plageman <[email protected]>
2026-03-23 21:54                                 ` Melanie Plageman <[email protected]>
2026-03-24 06:53                                   ` Kirill Reshke <[email protected]>
2026-03-24 17:53                                   ` Andres Freund <[email protected]>
2026-03-24 23:44                                     ` Melanie Plageman <[email protected]>
2026-03-25 18:54                                       ` Melanie Plageman <[email protected]>
2026-03-25 23:14                                         ` Melanie Plageman <[email protected]>
2026-03-26 23:10                                           ` David Rowley <[email protected]>
2026-03-27 19:17                                             ` Melanie Plageman <[email protected]>
2026-03-29 17:16                                               ` Melanie Plageman <[email protected]>
2026-03-31 02:16                                                 ` David Rowley <[email protected]>
2026-03-31 16:19                                                   ` Melanie Plageman <[email protected]>
2026-03-31 22:14                                                     ` David Rowley <[email protected]>
2026-03-31 22:55                                                       ` Melanie Plageman <[email protected]>
2026-03-25 23:29                                         ` Tomas Vondra <[email protected]>
2026-03-26 14:51                                           ` Melanie Plageman <[email protected]>
2026-03-26 16:07                                             ` Tomas Vondra <[email protected]>
2026-03-27 19:31                                               ` Melanie Plageman <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox