From: Melanie Plageman <[email protected]>
To: Robert Haas <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Kirill Reshke <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date: Wed, 17 Sep 2025 20:10:07 -0400
Message-ID: <CAAKRu_YOJ3VTKo4Z9vB2hGeTnwVWsL39gXH09vyBUQ7bGtDnKA@mail.gmail.com>
In-Reply-To: <CA+TgmobYY2URHKBMh1NHo1zF3Z28TiS_+0aSyRYyBfvauCPZzA@mail.gmail.com>
References: <[email protected]>
<CAAKRu_a-aVGxNEdkJt+96HGryQXuQNuXe+PhR0KcnUhXSOPBJw@mail.gmail.com>
<[email protected]>
<CAAKRu_ZH8kL0Zm0j7m7DC9fzk7ru7yf9rm2pEQRvx1iXX25aPQ@mail.gmail.com>
<CAAKRu_bGCgUuhmy1Mzkw3yCmbGcjNQAjV=OmjuW6hz90PuXKWA@mail.gmail.com>
<CALdSSPhAU56g1gGVT0+wG8RrSWE6qW8TOfNJS1HNAWX6wPgbFA@mail.gmail.com>
<CAAKRu_YD0ecXeAh+DmJpzQOJwcRzmMyGdcc5W_0pEF78rYSJkQ@mail.gmail.com>
<CALdSSPhu7WZd+EfQDha1nz=DC93OtY1=UFEdWwSZsASka_2eRQ@mail.gmail.com>
<CAAKRu_a2zU7672weJCGzAE2K44cCwnvsb-BwPh8ET3n1bsKfPQ@mail.gmail.com>
<CAAKRu_Yc1VKM+iuKuJzncPXCYNqQz_jUFBYXuDiPC5k9sUiiQQ@mail.gmail.com>
<tvvtfoxz5ykpsctxjbzxg3nldnzfc7geplrt2z2s54pmgto27y@hbijsndifu45>
<CAAKRu_Yz9x0sejBa5ov_LJ5sMOSKM3AeKOFUg+fQpNqyMmxwRA@mail.gmail.com>
<CAAKRu_Y=QZ5iD7zt1AHcG3_G_iMR0w6ApGPgr8FKcDn-YLFvuQ@mail.gmail.com>
<CA+TgmoasgmY7mzZutGisD2=3y7BwwPUS=oNsQoORKRg1r69fEA@mail.gmail.com>
<CAAKRu_Y7X=0UAQa5b_2Z20z5+UPBtDbjazYD9228jmj-d9NpQA@mail.gmail.com>
<CA+Tgmob05A07mtzeUGwxQKU9KZSf4BhJU9CXgcy4Pe3ZHxZrcw@mail.gmail.com>
<CAAKRu_YX0NP_yhXvPnvDRjVxxprsRBM-_MZzAJskfMydMQ=ETA@mail.gmail.com>
<CA+TgmoZef8XqRujP1NN=wJdV4SxOtu7rxRozsyAtaEvuVMZhEw@mail.gmail.com>
<CAAKRu_YxD3UC3BXxS55jPjBC_yj_vn3FVoLvBMwQuHXGDXacGg@mail.gmail.com>
<CA+TgmobYY2URHKBMh1NHo1zF3Z28TiS_+0aSyRYyBfvauCPZzA@mail.gmail.com>
On Wed, Sep 10, 2025 at 4:01 PM Robert Haas <[email protected]> wrote:
>
> On Tue, Sep 9, 2025 at 7:08 PM Melanie Plageman
> <[email protected]> wrote:
> > Fair. I've introduced new XLHP flags in attached v13. Hopefully it
> > puts an end to the horror.
>
> I suggest not renumbering all of the existing flags and just adding
> these new ones at the end. Less code churn and more likely to break in
> an obvious way if you mix up the two sets of flags.
Makes sense. In my attached v14, I have not renumbered them.
> More on 0002:
After an off-list discussion we had about how to make the patches in
the set progressively improve the code instead of just mechanically
refactoring it, I have made some big changes in the intermediate
patches in the set.
Before actually including the VM changes in the vacuum/prune WAL
records, I first set PD_ALL_VISIBLE in the same WAL record as the other
changes to the heap page, so that the heap page can be removed from the
WAL chain for setting the VM. This happens to fix the bug we discussed:
if you set an all-visible page all-frozen while checksums or
wal_log_hints are enabled, you may end up setting an LSN on a page that
was not marked dirty.
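To make that bug concrete, here is a toy model in C. ToyBuffer and the toy_* functions are hypothetical stand-ins, not PostgreSQL's buffer manager API. The model only tracks whether a page was dirtied before an LSN was stamped on it, and it simplifies both schemes to the all-visible to all-frozen case in which the heap page has no other changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared buffer; not PostgreSQL's bufmgr API. */
typedef struct ToyBuffer
{
    bool        dirty;          /* has the page been marked dirty? */
    uint64_t    lsn;            /* page LSN (0 = never set) */
} ToyBuffer;

static void
toy_mark_dirty(ToyBuffer *buf)
{
    buf->dirty = true;
}

/*
 * Stamp an LSN on the page.  Returns false, leaving the LSN alone, if the
 * page was never dirtied: that is the invariant violation being discussed.
 */
static bool
toy_set_lsn(ToyBuffer *buf, uint64_t lsn)
{
    if (!buf->dirty)
        return false;
    buf->lsn = lsn;
    return true;
}

/*
 * Old scheme, simplified: setting all-frozen dirties only the VM page, but
 * when hint-bit changes are WAL-logged (checksums or wal_log_hints) the VM
 * record's LSN is also stamped on the heap page.
 */
static bool
toy_old_set_all_frozen(ToyBuffer *heap, ToyBuffer *vm, uint64_t recptr,
                       bool hints_logged)
{
    toy_mark_dirty(vm);
    if (!toy_set_lsn(vm, recptr))
        return false;
    if (hints_logged)
        return toy_set_lsn(heap, recptr);   /* heap page was never dirtied */
    return true;
}

/*
 * New scheme, simplified: the heap page is no longer part of the VM-setting
 * WAL chain, so only the VM page receives the record's LSN.
 */
static bool
toy_new_set_all_frozen(ToyBuffer *heap, ToyBuffer *vm, uint64_t recptr)
{
    (void) heap;                /* heap page untouched by this record */
    toy_mark_dirty(vm);
    return toy_set_lsn(vm, recptr);
}
```

In this model the old path fails exactly when hints_logged is true, while the new path never touches the heap page's LSN.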
0001 is RFC (ready for committer) but waiting on one other reviewer
0002 - 0007 are a bit of cleanup I had later in the patch set but moved
up because I think it made the intermediate patches better
0008 - 0012 remove the heap page from the XLOG_HEAP2_VISIBLE WAL
chain (they make all callers of visibilitymap_set() set PD_ALL_VISIBLE
in the same WAL record as the changes to the heap page)
0013 - 0018 finish the job of eliminating XLOG_HEAP2_VISIBLE and set VM
bits in the same WAL record as the heap changes
0019 - 0024 set the VM on-access
> /*
> + * Note that the heap relation may have been dropped or truncated, leading
> + * us to skip updating the heap block due to the LSN interlock. However,
> + * even in that case, it's still safe to update the visibility map. Any
> + * WAL record that clears the visibility map bit does so before checking
> + * the page LSN, so any bits that need to be cleared will still be
> + * cleared.
> + *
> + * Note that the lock on the heap page was dropped above. In normal
> + * operation this would never be safe because a concurrent query could
> + * modify the heap page and clear PD_ALL_VISIBLE -- violating the
> + * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
> + * the VM is set.
> + *
> + * In recovery, we expect no other writers, so writing to the VM page
> + * without holding a lock on the heap page is considered safe enough. It
> + * is done this way when replaying xl_heap_visible records (see
> */
>
> How many copies of this comment do you plan to end up with?
By the end, two: one for COPY FREEZE replay and one for
prune/freeze/vacuum replay. Two didn't seem too bad, and it was easier
than writing a comment that meta-explains what the other comment explains.
> > > 0004. It is not clear to me why you need to get
> > > log_heap_prune_and_freeze to do the work here. Why can't
> > > log_newpage_buffer get the job done already?
> >
> > Well, I need something to emit the changes to the VM. I'm eliminating
> > all users of xl_heap_visible. Empty pages are the ones that benefit
> > the least from switching from xl_heap_visible -> xl_heap_prune. But,
> > if I don't transition them, we have to maintain all the
> > xl_heap_visible code (including visibilitymap_set() in its long form).
> >
> > As for log_newpage_buffer(), I could keep it if you think it is too
> > confusing to change log_heap_prune_and_freeze()'s API (by passing
> > force_heap_fpi) to handle this case, I can leave log_newpage_buffer()
> > there and then call log_heap_prune_and_freeze().
> >
> > I just thought it seemed simple to avoid emitting the new page record
> > and the VM update record, so why not -- but I don't have strong
> > feelings.
>
> Yeah, I'm not sure what the right thing to do here is. I think I was
> again experiencing brain fade by forgetting that there is a heap page
> and a VM page and, of course, log_heap_newpage() probably isn't going
> to touch the latter. So that makes sense. On the other hand, we could
> only have one type of WAL record for every single operation in the
> system if we gave it enough flags, and force_heap_fpi seems
> suspiciously like a flag that turns this into a whole different kind
> of WAL record.
I've kept the log_newpage_buffer() call and use log_heap_prune_and_freeze()
for setting PD_ALL_VISIBLE and the VM.
> > > 0005. It looks a little curious that you delete the
> > > identify-corruption logic from the end of the if-nest and add it to
> > > the beginning. Ceteris paribus, you'd expect that to be worse, since
> > > corruption is a rare case.
> >
> > On master, the two corruption cases are sandwiched between the normal
> > VM set cases. And I actually think doing it this way is brittle. If
> > you put the cases which set the VM first, you have to have completely
> > bulletproof the if statements guarding them to foreclose any possible
> > corruption case from entering because otherwise you will overwrite the
> > corruption you then try to detect.
>
> Hmm. In the current code, we first test (!all_visible_according_to_vm
> && presult.all_visible), then (all_visible_according_to_vm &&
> !PageIsAllVisible(page) && visibilitymap_get_status(vacrel->rel,
> blkno, &vmbuffer) != 0), and then (presult.lpdead_items > 0 &&
> PageIsAllVisible(page)). The first and second can never coexist,
> because they require opposite values of all_visible_according_to_vm.
> The second and third cannot coexist because they require opposite
> values of PageIsAllVisible(page). It is not entirely obvious that the
> first and third tests couldn't both pass, but you'd have to have
> presult.all_visible and presult.lpdead_items > 0, and it's a bit hard
> to see how heap_page_prune_and_freeze() could ever allow that.
> Consider:
>
> if (prstate.all_visible && prstate.lpdead_items == 0)
> {
>     presult->all_visible = prstate.all_visible;
>     presult->all_frozen = prstate.all_frozen;
> }
> else
> {
>     presult->all_visible = false;
>     presult->all_frozen = false;
> }
> ...
> presult->lpdead_items = prstate.lpdead_items;
>
> So I don't really think I'm persuaded that the current way is brittle.
I meant brittle in the sense that it has to be coded very carefully for
it to work out this way. If you ever wanted to change or extend it, it
would be quite hard to be sure that all of the conditions remain
mutually exclusive.
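The mutual-exclusivity argument above can be checked mechanically. The sketch below is a hypothetical helper, not code from vacuumlazy.c: it enumerates every combination of the four booleans the analysis uses, treats visibilitymap_get_status() != 0 as always true (the worst case for overlap), and counts combinations in which more than one of the three branches would fire. Under the invariant that presult.all_visible implies no LP_DEAD items, the count should be zero; without it, exactly one combination (first and third branches together) overlaps, which shows that exclusivity depends on that invariant continuing to hold.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count input combinations for which more than one of the three VM-related
 * branches in lazy_scan_prune() would be eligible.  When enforce_invariant
 * is true, skip combinations heap_page_prune_and_freeze() cannot produce:
 * presult.all_visible implies lpdead_items == 0.
 */
static int
count_overlaps(bool enforce_invariant)
{
    int         overlaps = 0;

    for (int bits = 0; bits < 16; bits++)
    {
        bool        vm_all_visible = bits & 1;      /* all_visible_according_to_vm */
        bool        prune_all_visible = bits & 2;   /* presult.all_visible */
        bool        page_all_visible = bits & 4;    /* PageIsAllVisible(page) */
        bool        has_lpdead = bits & 8;          /* presult.lpdead_items > 0 */

        if (enforce_invariant && prune_all_visible && has_lpdead)
            continue;

        bool        c1 = !vm_all_visible && prune_all_visible;
        /* visibilitymap_get_status() != 0 assumed true (worst case) */
        bool        c2 = vm_all_visible && !page_all_visible;
        bool        c3 = has_lpdead && page_all_visible;

        if ((int) c1 + (int) c2 + (int) c3 > 1)
            overlaps++;
    }
    return overlaps;
}
```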
> But that having been said, I agree with you that the order of the
> checks is kind of random, and I don't think it really matters that
> much for performance. What does matter is clarity. I feel like what
> I'd ideally like this logic to do is say: do we want the VM bit for
> the page to be set to all-frozen, just all-visible, or neither? Then
> push the VM bit to the correct state, dragging the page-level bit
> along behind. And the current logic sort of does that. It's roughly:
>
> 1. Should we go from not-all-visible to either all-visible or
> all-frozen? If yes, do so.
> 2. Should we go from either all-visible or all-frozen to
> not-all-visible? If yes, do so.
> 3. Should we go from either all-visible or all-frozen to
> not-all-visible for a different reason? If yes, do so.
> 4. Should we go from all-visible to all-frozen? If yes, do so.
I don't necessarily agree that fixing corruption and setting the VM
belong together -- they feel like separate things to me. But I don't
feel strongly enough about it to push the point.
> But what's weird is that all the tests are written differently, and we
> have two different reasons for going to not-all-visible, namely
> PD_ALL_VISIBLE-not-set and dead-items-on-page, whereas there's only
> one test for each of the other state-transitions, because the
> decision-making for those cases is fully completed at an earlier
> stage. I would kind of like to see this expressed in a way that first
> decides which state transition to make (forward-to-all-frozen,
> forward-to-all-visible, backward-to-all-visible,
> backward-to-not-all-visible, nothing) and then does the corresponding
> work. What you're doing instead is splitting half of those functions
> off into a helper function while keeping the other half where they are
> without cleaning up any of the logic. Now, maybe that's OK: I'm far
> from having grokked the whole patch set. But it is not any more clear
> than what we have now, IMHO, and perhaps even a bit less so.
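The "decide which transition to make, then do the work" structure described above could look roughly like the following sketch. VMTransition, PageVisInfo, and decide_vm_transition() are hypothetical names, not PostgreSQL APIs. The decision mirrors the four tests quoted earlier, simplified by dropping the visibilitymap_get_status() recheck, and it omits backward-to-all-visible because the numbered list above only ever moves a page back to not-all-visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enumeration of VM state transitions. */
typedef enum VMTransition
{
    VM_TRANSITION_NONE,
    VM_TRANSITION_SET_ALL_VISIBLE,  /* not-all-visible -> all-visible */
    VM_TRANSITION_SET_ALL_FROZEN,   /* not-all-visible/all-visible -> all-frozen */
    VM_TRANSITION_CLEAR             /* all-visible/all-frozen -> not-all-visible */
} VMTransition;

/* Hypothetical bundle of the inputs lazy_scan_prune() consults. */
typedef struct PageVisInfo
{
    bool        vm_all_visible;     /* all_visible_according_to_vm */
    bool        vm_all_frozen;      /* VM_ALL_FROZEN() */
    bool        page_all_visible;   /* PageIsAllVisible(page) */
    bool        prune_all_visible;  /* presult.all_visible */
    bool        prune_all_frozen;   /* presult.all_frozen */
    int         lpdead_items;       /* presult.lpdead_items */
} PageVisInfo;

static VMTransition
decide_vm_transition(const PageVisInfo *p)
{
    /* Backward: VM bit set but PD_ALL_VISIBLE is not. */
    if (p->vm_all_visible && !p->page_all_visible)
        return VM_TRANSITION_CLEAR;

    /* Backward: page claims all-visible but has LP_DEAD items. */
    if (p->lpdead_items > 0 && p->page_all_visible)
        return VM_TRANSITION_CLEAR;

    /* Forward from not-all-visible. */
    if (!p->vm_all_visible && p->prune_all_visible)
        return p->prune_all_frozen ? VM_TRANSITION_SET_ALL_FROZEN
            : VM_TRANSITION_SET_ALL_VISIBLE;

    /* Forward from all-visible to all-frozen. */
    if (p->vm_all_visible && p->prune_all_frozen && !p->vm_all_frozen)
        return VM_TRANSITION_SET_ALL_FROZEN;

    return VM_TRANSITION_NONE;
}
```

A caller would then switch on the returned value, so each transition's work lives in exactly one place.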
In terms of my patch set, I do have to change something about this
mixture of fixing corruption and setting the VM because I need to set
the VM bits in the same critical section as making the other changes
to the heap page (pruning, etc) and include the VM set changes in the
same WAL record (note that clearing the VM to fix corruption is not
WAL-logged).
What I've gone with is determining what to set the VM bits to and
fixing any corruption in the same step. Then later, when making the
changes to the heap page, I actually set the VM. This is kind of the
opposite of what you suggested above, which was to determine what to set
the bits to in one place, covering the corruption and non-corruption
cases together. I don't think we can do that, though, because fixing the
corruption involves non-WAL-logged changes to the page and the VM,
whereas setting the VM bits is a WAL-logged change. And you can't clear
bits with visibilitymap_set() (there's an assertion about that), so you
have to call different functions (not to mention emit distinct error
messages). I don't know that I've come up with the ideal solution, though.
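The split described above, with non-WAL-logged corruption repair happening while the plan is made and the WAL-logged VM set deferred to the critical section, can be sketched as a toy model. Every name here (ToyPageState, toy_vm_set(), toy_clear_corruption(), toy_plan_vm(), toy_apply(), TOY_ALL_VISIBLE, TOY_ALL_FROZEN) is hypothetical. toy_vm_set() only ORs bits in, mirroring the point that visibilitymap_set() cannot clear bits, so clearing needs its own function. As assumptions, a page whose corruption was repaired is left for a later pass rather than re-set in the same one, and wal_records counts only the single record that would carry both the heap and VM changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for VISIBILITYMAP_ALL_VISIBLE / _ALL_FROZEN. */
#define TOY_ALL_VISIBLE 0x01
#define TOY_ALL_FROZEN  0x02

typedef struct ToyPageState
{
    uint8_t     vmbits;         /* this heap block's bits in the toy VM */
    bool        pd_all_visible; /* page-level PD_ALL_VISIBLE */
    int         wal_records;    /* records that carried a VM change */
} ToyPageState;

/* Like visibilitymap_set(), this can only add bits, never remove them. */
static void
toy_vm_set(ToyPageState *s, uint8_t flags)
{
    assert(flags != 0 && (flags & ~(TOY_ALL_VISIBLE | TOY_ALL_FROZEN)) == 0);
    s->vmbits |= flags;
}

/* Corruption repair: clears the VM and page bits; not WAL-logged. */
static void
toy_clear_corruption(ToyPageState *s)
{
    s->vmbits = 0;
    s->pd_all_visible = false;
}

/*
 * Phase 1, before the critical section: repair corruption immediately and
 * return the VM bits to set later (0 means nothing to set).
 */
static uint8_t
toy_plan_vm(ToyPageState *s, bool prune_all_visible, bool prune_all_frozen,
            int lpdead_items)
{
    bool        vm_all_visible = (s->vmbits & TOY_ALL_VISIBLE) != 0;

    if ((vm_all_visible && !s->pd_all_visible) ||
        (lpdead_items > 0 && s->pd_all_visible))
    {
        /* The real code reports each of these cases with its own message. */
        toy_clear_corruption(s);
        return 0;               /* assumption: re-set on a later pass */
    }

    if (!prune_all_visible)
        return 0;

    uint8_t     target = TOY_ALL_VISIBLE |
        (prune_all_frozen ? TOY_ALL_FROZEN : 0);

    /* Nothing to do if the VM already has every target bit. */
    return (s->vmbits & target) == target ? 0 : target;
}

/* Phase 2, inside the critical section: one record covers heap + VM. */
static void
toy_apply(ToyPageState *s, uint8_t target)
{
    if (target == 0)
        return;
    s->pd_all_visible = true;
    toy_vm_set(s, target);
    s->wal_records++;
}
```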
- Melanie
Attachments:
[text/x-patch] v14-0003-Reorder-heap_page_prune_and_freeze-parameters.patch (6.2K, 2-v14-0003-Reorder-heap_page_prune_and_freeze-parameters.patch)
From da4f0d141c8fa673a4651c42efd8bc48cd88c485 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 12:06:19 -0400
Subject: [PATCH v14 03/24] Reorder heap_page_prune_and_freeze parameters
Move read-only parameters to the beginning of the parameter list, making
it clearer which parameters are inputs and which are input/outputs or
outputs. Also const-qualify VacuumCutoffs, which is not modified in
heap_page_prune_and_freeze().
---
src/backend/access/heap/pruneheap.c | 40 ++++++++++++++--------------
src/backend/access/heap/vacuumlazy.c | 6 +++--
src/include/access/heapam.h | 6 ++---
3 files changed, 27 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d8ea0c78f77..28bd6a56749 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -43,7 +43,7 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool freeze;
- struct VacuumCutoffs *cutoffs;
+ const struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
* Fields describing what to do to the page
@@ -260,8 +260,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, vistest, 0,
- NULL, &presult, PRUNE_ON_ACCESS, &dummy_off_loc, NULL, NULL);
+ heap_page_prune_and_freeze(relation, buffer, PRUNE_ON_ACCESS, 0, NULL,
+ vistest, &presult, &dummy_off_loc, NULL, NULL);
/*
* Report the number of tuples reclaimed to pgstats. This is
@@ -303,7 +303,17 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
+ * reason indicates why the pruning is performed. It is included in the WAL
+ * record for debugging and analysis purposes, but otherwise has no effect.
+ *
+ * options:
+ * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
+ * pruning.
+ *
+ * FREEZE indicates that we will also freeze tuples, and will return
+ * 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set, we will freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
* 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
@@ -313,29 +323,19 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
* that also freeze need that information.
*
- * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
- * (see heap_prune_satisfies_vacuum).
- *
- * options:
- * MARK_UNUSED_NOW indicates that dead items can be set LP_UNUSED during
- * pruning.
- *
- * FREEZE indicates that we will also freeze tuples, and will return
- * 'all_visible', 'all_frozen' flags to the caller.
- *
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
* cutoffs->OldestXmin is also used to determine if dead tuples are
* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD.
*
+ * vistest is used to distinguish whether tuples are DEAD or RECENTLY_DEAD
+ * (see heap_prune_satisfies_vacuum).
+ *
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
* heap_page_prune_and_freeze() is responsible for initializing it. Required
* by all callers.
*
- * reason indicates why the pruning is performed. It is included in the WAL
- * record for debugging and analysis purposes, but otherwise has no effect.
- *
* off_loc is the offset location required by the caller to use in error
* callback.
*
@@ -348,11 +348,11 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- GlobalVisState *vistest,
+ PruneReason reason,
int options,
- struct VacuumCutoffs *cutoffs,
+ const struct VacuumCutoffs *cutoffs,
+ GlobalVisState *vistest,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 981d9380a92..ddc9677694c 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1974,8 +1974,10 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, vacrel->vistest, prune_options,
- &vacrel->cutoffs, &presult, PRUNE_VACUUM_SCAN,
+ heap_page_prune_and_freeze(rel, buf, PRUNE_VACUUM_SCAN, prune_options,
+ &vacrel->cutoffs,
+ vacrel->vistest,
+ &presult,
&vacrel->offnum,
&vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..34206a6a7d5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -374,11 +374,11 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
- struct GlobalVisState *vistest,
+ PruneReason reason,
int options,
- struct VacuumCutoffs *cutoffs,
+ const struct VacuumCutoffs *cutoffs,
+ struct GlobalVisState *vistest,
PruneFreezeResult *presult,
- PruneReason reason,
OffsetNumber *off_loc,
TransactionId *new_relfrozen_xid,
MultiXactId *new_relmin_mxid);
--
2.43.0
[text/x-patch] v14-0005-Rename-PruneState.freeze-to-attempt_freeze.patch (4.9K, 3-v14-0005-Rename-PruneState.freeze-to-attempt_freeze.patch)
From 94b4e946cd498470e9a0fac0b15299feaccfeefc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 31 Jul 2025 14:07:51 -0400
Subject: [PATCH v14 05/24] Rename PruneState.freeze to attempt_freeze
This makes it clearer that the field indicates that the caller would like
heap_page_prune_and_freeze() to consider freezing tuples -- not that we
will ultimately end up freezing them.
Also rename local variable hint_bit_fpi to did_tuple_hint_fpi. This
makes it clear it is about tuple hints and not page hints and that it
indicates something that happened and not something that could happen.
And rename local variable do_hint to do_hint_prune. This distinguishes
the prunable and page full hints used to decide whether or not to
on-access prune a page from other page-level and tuple hint bits.
---
src/backend/access/heap/pruneheap.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index ea8216e0632..740aa07cd83 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -42,7 +42,7 @@ typedef struct
/* whether or not dead items can be set LP_UNUSED during pruning */
bool mark_unused_now;
/* whether to attempt freezing tuples */
- bool freeze;
+ bool attempt_freeze;
const struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -361,14 +361,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
HeapTupleData tup;
bool do_freeze;
bool do_prune;
- bool do_hint;
- bool hint_bit_fpi;
+ bool do_hint_prune;
+ bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
/* Copy parameters to prstate */
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
- prstate.freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -390,7 +390,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* initialize page freezing working state */
prstate.pagefrz.freeze_required = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
Assert(new_relfrozen_xid && new_relmin_mxid);
prstate.pagefrz.FreezePageRelfrozenXid = *new_relfrozen_xid;
@@ -437,7 +437,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* function, when we return the value to the caller, so that the caller
* doesn't set the VM bit incorrectly.
*/
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
@@ -551,7 +551,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* If checksums are enabled, heap_prune_satisfies_vacuum() may have caused
* an FPI to be emitted.
*/
- hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
+ did_tuple_hint_fpi = fpi_before != pgWalUsage.wal_fpi;
/*
* Process HOT chains.
@@ -659,7 +659,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* pd_prune_xid field or the page was marked full, we will update the hint
* bit.
*/
- do_hint = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
+ do_hint_prune = ((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
PageIsFull(page);
/*
@@ -667,7 +667,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* plans we prepared, or not.
*/
do_freeze = false;
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (prstate.pagefrz.freeze_required)
{
@@ -702,14 +702,14 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (RelationNeedsWAL(relation))
{
- if (hint_bit_fpi)
+ if (did_tuple_hint_fpi)
do_freeze = true;
else if (do_prune)
{
if (XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
}
- else if (do_hint)
+ else if (do_hint_prune)
{
if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
do_freeze = true;
@@ -752,7 +752,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
- if (do_hint)
+ if (do_hint_prune)
{
/*
* Update the page's pd_prune_xid field to either zero, or the lowest
@@ -893,7 +893,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
- if (prstate.freeze)
+ if (prstate.attempt_freeze)
{
if (presult->nfrozen > 0)
{
@@ -1475,7 +1475,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/* Consider freezing any normal tuples which will not be removed */
- if (prstate->freeze)
+ if (prstate->attempt_freeze)
{
bool totally_frozen;
--
2.43.0
[text/x-patch] v14-0002-Correct-prune-WAL-record-opcode-mention-in-comme.patch (1.3K, 4-v14-0002-Correct-prune-WAL-record-opcode-mention-in-comme.patch)
From d89c39061d008ccfe306c9c39e7b74f9555a4ac2 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 14:54:42 -0400
Subject: [PATCH v14 02/24] Correct prune WAL record opcode mention in comment
f83d709760d8 incorrectly refers to an XLOG_HEAP2_PRUNE_FREEZE WAL record
opcode. No such opcode exists. The relevant opcodes are
XLOG_HEAP2_PRUNE_ON_ACCESS, XLOG_HEAP2_PRUNE_VACUUM_SCAN, and
XLOG_HEAP2_PRUNE_VACUUM_CLEANUP. Correct it.
---
src/backend/access/heap/pruneheap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 7ebd22f00a3..d8ea0c78f77 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -794,7 +794,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
MarkBufferDirty(buffer);
/*
- * Emit a WAL XLOG_HEAP2_PRUNE_FREEZE record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
*/
if (RelationNeedsWAL(relation))
{
@@ -2026,7 +2026,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
}
/*
- * Write an XLOG_HEAP2_PRUNE_FREEZE WAL record
+ * Write an XLOG_HEAP2_PRUNE* WAL record
*
* This is used for several different page maintenance operations:
*
--
2.43.0
[text/x-patch] v14-0004-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch (5.3K, 5-v14-0004-Keep-all_frozen-updated-in-heap_page_prune_and_f.patch)
From 51729486db735989377d18bfc855d0d3d7f32114 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 23 Jul 2025 16:01:24 -0400
Subject: [PATCH v14 04/24] Keep all_frozen updated in
heap_page_prune_and_freeze
Previously, all_frozen was only considered valid when all_visible was
also set, so we didn't bother clearing all_frozen every time we cleared
all_visible. It is better to keep both flags up to date.
Future commits will separate usage of these fields, so it is best not to
rely on all_visible for all_frozen's validity.
---
src/backend/access/heap/pruneheap.c | 21 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 9 ++++-----
2 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 28bd6a56749..ea8216e0632 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -142,10 +142,6 @@ typedef struct
* whether to freeze the page or not. The all_visible and all_frozen
* values returned to the caller are adjusted to include LP_DEAD items at
* the end.
- *
- * all_frozen should only be considered valid if all_visible is also set;
- * we don't bother to clear the all_frozen flag every time we clear the
- * all_visible flag.
*/
bool all_visible;
bool all_frozen;
@@ -696,8 +692,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* used anymore. The opportunistic freeze heuristic must be
* improved; however, for now, try to approximate the old logic.
*/
- if (prstate.all_visible && prstate.all_frozen && prstate.nfrozen > 0)
+ if (prstate.all_frozen && prstate.nfrozen > 0)
{
+ Assert(prstate.all_visible);
+
/*
* Freezing would make the page all-frozen. Have already
* emitted an FPI or will do so anyway?
@@ -750,6 +748,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
}
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -819,7 +818,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
*/
if (do_freeze)
{
- if (prstate.all_visible && prstate.all_frozen)
+ if (prstate.all_frozen)
frz_conflict_horizon = prstate.visibility_cutoff_xid;
else
{
@@ -1382,7 +1381,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
if (!HeapTupleHeaderXminCommitted(htup))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1404,7 +1403,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
Assert(prstate->cutoffs);
if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
{
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
break;
}
@@ -1417,7 +1416,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
case HEAPTUPLE_RECENTLY_DEAD:
prstate->recently_dead_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple will soon become DEAD. Update the hint field so
@@ -1436,7 +1435,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* assumption is a bit shaky, but it is what acquire_sample_rows()
* does, so be consistent.
*/
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* If we wanted to optimize for aborts, we might consider marking
@@ -1454,7 +1453,7 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
* will commit and update the counters after we report.
*/
prstate->live_tuples++;
- prstate->all_visible = false;
+ prstate->all_visible = prstate->all_frozen = false;
/*
* This tuple may soon become DEAD. Update the hint field so that
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ddc9677694c..50cc898087f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2003,7 +2003,6 @@ lazy_scan_prune(LVRelState *vacrel,
* agreement with heap_page_is_all_visible() using an assertion.
*/
#ifdef USE_ASSERT_CHECKING
- /* Note that all_frozen value does not matter when !all_visible */
if (presult.all_visible)
{
TransactionId debug_cutoff;
@@ -2056,6 +2055,7 @@ lazy_scan_prune(LVRelState *vacrel,
*has_lpdead_items = (presult.lpdead_items > 0);
Assert(!presult.all_visible || !(*has_lpdead_items));
+ Assert(!presult.all_frozen || presult.all_visible);
/*
* Handle setting visibility map bit based on information from the VM (as
@@ -2161,11 +2161,10 @@ lazy_scan_prune(LVRelState *vacrel,
/*
* If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen. Note that all_frozen is only valid if all_visible is
- * true, so we must check both all_visible and all_frozen.
+ * it as all-frozen.
*/
- else if (all_visible_according_to_vm && presult.all_visible &&
- presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
+ else if (all_visible_according_to_vm && presult.all_frozen &&
+ !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
{
uint8 old_vmbits;
--
2.43.0
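The invariant patch 0004 now asserts, that all_frozen implies all_visible, can be illustrated with a cut-down model of the two PruneState flags. ToyPruneVis and the toy_* helpers are hypothetical. toy_old_saw_not_all_visible() reproduces the previous convention of clearing only all_visible, which is why all_frozen used to be valid only when all_visible was also set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down version of PruneState's visibility flags. */
typedef struct ToyPruneVis
{
    bool        all_visible;
    bool        all_frozen;
} ToyPruneVis;

/* Start optimistic, as the pruning code does when asked to freeze. */
static void
toy_init(ToyPruneVis *v)
{
    v->all_visible = true;
    v->all_frozen = true;
}

/* Old convention: clear only all_visible, leaving all_frozen stale. */
static void
toy_old_saw_not_all_visible(ToyPruneVis *v)
{
    v->all_visible = false;
}

/* New convention (patch 0004): a not-all-visible tuple clears both flags. */
static void
toy_new_saw_not_all_visible(ToyPruneVis *v)
{
    v->all_visible = v->all_frozen = false;
}

/* A visible tuple that will not be frozen clears only all_frozen. */
static void
toy_saw_unfrozen_visible(ToyPruneVis *v)
{
    v->all_frozen = false;
}

/* What the new Asserts check: all_frozen implies all_visible. */
static bool
toy_invariant_holds(const ToyPruneVis *v)
{
    return !v->all_frozen || v->all_visible;
}
```

With the old convention a reader had to check both flags before trusting all_frozen; with the new one, all_frozen alone is enough, which is what lets the patch simplify the conditions in lazy_scan_prune().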
[text/x-patch] v14-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch (12.1K, 6-v14-0001-Eliminate-COPY-FREEZE-use-of-XLOG_HEAP2_VISIBLE.patch)
From cacff6c95e38d370b87148bc48cf6ac5f086ed07 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 17 Jun 2025 17:22:10 -0400
Subject: [PATCH v14 01/24] Eliminate COPY FREEZE use of XLOG_HEAP2_VISIBLE
Instead of emitting a separate WAL XLOG_HEAP2_VISIBLE record for setting
bits in the VM, specify the changes to make to the VM block in the
XLOG_HEAP2_MULTI_INSERT record.
This halves the number of WAL records emitted by COPY FREEZE.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 44 ++++++++++------
src/backend/access/heap/heapam_xlog.c | 54 +++++++++++++++++++-
src/backend/access/heap/visibilitymap.c | 67 ++++++++++++++++++++++++-
src/backend/access/rmgrdesc/heapdesc.c | 5 ++
src/include/access/visibilitymap.h | 2 +
5 files changed, 154 insertions(+), 18 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4c5ae205a7a..c8cd9d22726 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2466,7 +2466,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ {
all_frozen_set = true;
+ /* Lock the vmbuffer before entering the critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+ }
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2506,7 +2510,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* going to add further frozen rows to it.
*
* If we're only adding already frozen rows to a previously empty
- * page, mark it as all-visible.
+ * page, mark it as all-frozen and update the visibility map. We're
+ * already holding a pin on the vmbuffer.
*/
if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
@@ -2517,7 +2522,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
else if (all_frozen_set)
+ {
PageSetAllVisible(page);
+ visibilitymap_set_vmbits(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+ }
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2565,6 +2577,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
xlrec->flags = 0;
if (all_visible_cleared)
xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+
+ /*
+ * We don't have to worry about including a conflict xid in the
+ * WAL record as HEAP_INSERT_FROZEN intentionally violates
+ * visibility rules.
+ */
if (all_frozen_set)
xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
@@ -2627,7 +2645,10 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
XLogBeginInsert();
XLogRegisterData(xlrec, tupledata - scratch.data);
+
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
+ if (all_frozen_set)
+ XLogRegisterBuffer(1, vmbuffer, 0);
XLogRegisterBufData(0, tupledata, totaldatalen);
@@ -2637,26 +2658,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
recptr = XLogInsert(RM_HEAP2_ID, info);
PageSetLSN(page, recptr);
+ if (all_frozen_set)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
}
END_CRIT_SECTION();
- /*
- * If we've frozen everything on the page, update the visibilitymap.
- * We're already holding pin on the vmbuffer.
- */
if (all_frozen_set)
- {
- /*
- * It's fine to use InvalidTransactionId here - this is only used
- * when HEAP_INSERT_FROZEN is specified, which intentionally
- * violates visibility rules.
- */
- visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
- InvalidXLogRecPtr, vmbuffer,
- InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
- }
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
UnlockReleaseBuffer(buffer);
ndone += nthispage;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index cf843277938..faa7c561a8a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -551,6 +551,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
int i;
bool isinit = (XLogRecGetInfo(record) & XLOG_HEAP_INIT_PAGE) != 0;
XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
/*
* Insertion doesn't overwrite MVCC data, so no conflict processing is
@@ -571,11 +572,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
{
Relation reln = CreateFakeRelcacheEntry(rlocator);
- Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
ReleaseBuffer(vmbuffer);
+ vmbuffer = InvalidBuffer;
FreeFakeRelcacheEntry(reln);
}
@@ -662,6 +663,57 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);
+ buffer = InvalidBuffer;
+
+ /*
+ * Now read and update the VM block.
+ *
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block due to the LSN interlock. However,
+ * even in that case, it's still safe to update the visibility map. Any
+ * WAL record that clears the visibility map bit does so before checking
+ * the page LSN, so any bits that need to be cleared will still be
+ * cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
+ XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ visibilitymap_set_vmbits(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
+ /*
+ * It is not possible that the VM was already set for this heap page,
+ * so the vmbuffer must have been modified and marked dirty.
+ */
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
+ FreeFakeRelcacheEntry(reln);
+ }
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
/*
* If the page is running low on free space, update the FSM as well.
* Arbitrarily, our definition of "low" is less than 20%. We can't do much
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 7306c16f05c..aa48a436108 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,7 +14,8 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set a bit in a previously pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page and log
+ * visibilitymap_set_vmbits - set bit(s) in a pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -321,6 +322,70 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
return status;
}
+/*
+ * Set flags in the VM block contained in the passed in vmBuf.
+ *
+ * This function is for callers which include the VM changes in the same WAL
+ * record as the modifications of the heap page which rendered it all-visible.
+ * Callers separately logging the VM changes should invoke visibilitymap_set()
+ * instead.
+ *
+ * Caller must have pinned and exclusive locked the correct block of the VM in
+ * vmBuf. This block should contain the VM bits for the given heapBlk.
+ *
+ * During normal operation (i.e. not recovery), this should be called in a
+ * critical section which also makes any necessary changes to the heap page
+ * and, if relevant, emits WAL.
+ *
+ * Caller is responsible for WAL logging the changes to the VM buffer and for
+ * making any changes needed to the associated heap page. This includes
+ * maintaining any invariants such as ensuring the buffer containing heapBlk
+ * is pinned and exclusive locked.
+ */
+uint8
+visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
+{
+ BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
+ uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
+ uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
+ Page page;
+ uint8 *map;
+ uint8 status;
+
+#ifdef TRACE_VISIBILITYMAP
+ elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
+ flags, RelationGetRelationName(rel), heapBlk);
+#endif
+
+ /* Call in same critical section where WAL is emitted. */
+ Assert(InRecovery || CritSectionCount > 0);
+
+ /* Flags should be valid. Also never clear bits with this function */
+ Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
+
+ /* Must never set all_frozen bit without also setting all_visible bit */
+ Assert(flags != VISIBILITYMAP_ALL_FROZEN);
+
+ /* Check that we have the right VM page pinned */
+ if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
+ elog(ERROR, "wrong VM buffer passed to visibilitymap_set_vmbits");
+
+ Assert(BufferIsExclusiveLocked(vmBuf));
+
+ page = BufferGetPage(vmBuf);
+ map = (uint8 *) PageGetContents(page);
+
+ status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
+ if (flags != status)
+ {
+ map[mapByte] |= (flags << mapOffset);
+ MarkBufferDirty(vmBuf);
+ }
+
+ return status;
+}
+
/*
* visibilitymap_get_status - get status of bits
*
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 82b62c95de5..b48d7dc1d24 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -16,6 +16,7 @@
#include "access/heapam_xlog.h"
#include "access/rmgrdesc_utils.h"
+#include "access/visibilitymapdefs.h"
#include "storage/standbydefs.h"
/*
@@ -354,6 +355,10 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, "ntuples: %d, flags: 0x%02X", xlrec->ntuples,
xlrec->flags);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ appendStringInfo(buf, ", vm_flags: 0x%02X",
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+
if (XLogRecHasBlockData(record, 0) && !isinit)
{
appendStringInfoString(buf, ", offsets:");
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index be21c6dd1a3..fc7056a91ea 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -37,6 +37,8 @@ extern uint8 visibilitymap_set(Relation rel,
Buffer vmBuf,
TransactionId cutoff_xid,
uint8 flags);
+extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
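The bit arithmetic in visibilitymap_set_vmbits() can be followed with a standalone toy model. This is a sketch, not PostgreSQL code. ToyVmPage, TOY_MAPSIZE and the toy_* helpers are hypothetical. The toy map has no page header, and the constants mirror, rather than include, visibilitymapdefs.h. Like the patch, the model compares the requested flags with the existing status, ORs the bits in and marks the page dirty only when the two differ, and returns the old status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Values mirrored from visibilitymapdefs.h / visibilitymap.c */
#define VM_ALL_VISIBLE      0x01
#define VM_ALL_FROZEN       0x02
#define VM_VALID_BITS       (VM_ALL_VISIBLE | VM_ALL_FROZEN)
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)

/* Hypothetical toy VM page: only the map bytes, no page header */
#define TOY_MAPSIZE             16
#define TOY_HEAPBLOCKS_PER_PAGE (TOY_MAPSIZE * HEAPBLOCKS_PER_BYTE)

typedef struct ToyVmPage
{
    uint8_t  map[TOY_MAPSIZE];
    uint32_t blkno;             /* which VM block this toy page is */
    bool     dirty;             /* stands in for MarkBufferDirty() */
} ToyVmPage;

static uint32_t toy_mapblock(uint32_t heapBlk) { return heapBlk / TOY_HEAPBLOCKS_PER_PAGE; }
static uint32_t toy_mapbyte(uint32_t heapBlk) { return (heapBlk % TOY_HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE; }
static uint8_t  toy_mapoffset(uint32_t heapBlk) { return (uint8_t) ((heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK); }

static uint8_t
toy_get_status(const ToyVmPage *page, uint32_t heapBlk)
{
    return (page->map[toy_mapbyte(heapBlk)] >> toy_mapoffset(heapBlk)) & VM_VALID_BITS;
}

/* Set (never clear) bits for heapBlk and return the previous status. */
static uint8_t
toy_set_vmbits(ToyVmPage *page, uint32_t heapBlk, uint8_t flags)
{
    uint8_t status;

    assert((flags & VM_VALID_BITS) == flags);   /* only valid bits */
    assert(flags != VM_ALL_FROZEN);             /* frozen implies visible */

    if (toy_mapblock(heapBlk) != page->blkno)
    {
        fprintf(stderr, "wrong VM page for heap block %u\n", (unsigned) heapBlk);
        abort();                                /* plays the role of elog(ERROR) */
    }

    status = toy_get_status(page, heapBlk);
    if (flags != status)
    {
        page->map[toy_mapbyte(heapBlk)] |= (uint8_t) (flags << toy_mapoffset(heapBlk));
        page->dirty = true;
    }
    return status;
}
```

With these toy sizes, heap block 5 maps to byte 1 at bit offset 2. Setting both bits for it turns that byte into 0x0C. A second identical call leaves the page clean and reports the old status as 0x03.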
[text/x-patch] v14-0007-Update-PruneState.all_-visible-frozen-sooner-in-.patch (7.3K, 7-v14-0007-Update-PruneState.all_-visible-frozen-sooner-in-.patch)
From de93f7eaffb009436cae2f80571ba0148f99db7a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 16:25:44 -0400
Subject: [PATCH v14 07/24] Update PruneState.all_[visible|frozen] sooner in
pruning
We don't clear PruneState.all_visible and all_frozen during pruning when
we see LP_DEAD items because we want to still opportunistically freeze a
page if it would become frozen after vacuum's third phase.
Currently, this is fine because heap_page_prune_and_freeze() doesn't set
PD_ALL_VISIBLE or set bits in the VM. If we want to do that in the
future, we need all_visible and all_frozen to be accurate earlier in
heap_page_prune_and_freeze(). To do this, we must also move up
determination of the freeze conflict horizon. We use the visibility
cutoff xid even if the whole page won't be frozen until after vacuum's
third phase.
---
src/backend/access/heap/pruneheap.c | 95 ++++++++++++++---------------
1 file changed, 45 insertions(+), 50 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4ed74de6f27..5e536bd0d4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -296,7 +296,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* pre-freeze checks.
*
* do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
- * been decided before calling this function.
+ * been decided before calling this function. *frz_conflict_horizon is set to
+ * the snapshot conflict horizon for the WAL record, should we decide to freeze
+ * tuples.
*
* prstate is an input/output parameter.
*
@@ -308,7 +310,8 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi,
bool do_prune,
bool do_hint_prune,
- PruneState *prstate)
+ PruneState *prstate,
+ TransactionId *frz_conflict_horizon)
{
bool do_freeze = false;
@@ -378,6 +381,22 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* critical section.
*/
heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+
+ /*
+ * Calculate what the snapshot conflict horizon should be for a record
+ * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
+ * for conflicts when the whole page is eligible to become all-frozen
+ * in the VM once we're done with it. Otherwise we generate a
+ * conservative cutoff by stepping back from OldestXmin.
+ */
+ if (prstate->all_frozen)
+ *frz_conflict_horizon = prstate->visibility_cutoff_xid;
+ else
+ {
+ /* Avoids false conflicts when hot_standby_feedback in use */
+ *frz_conflict_horizon = prstate->cutoffs->OldestXmin;
+ TransactionIdRetreat(*frz_conflict_horizon);
+ }
}
else if (prstate->nfrozen > 0)
{
@@ -478,6 +497,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_hint_prune;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId frz_conflict_horizon = InvalidTransactionId;
/* Copy parameters to prstate */
prstate.vistest = vistest;
@@ -546,10 +566,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* are tuples present that are not visible to everyone or if there are
* dead tuples which are not yet removable. However, dead tuples which
* will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not clear
- * all_visible when we see LP_DEAD items. We fix that at the end of the
- * function, when we return the value to the caller, so that the caller
- * doesn't set the VM bit incorrectly.
+ * opportunistically freezing. Because of that, we do not immediately
+ * clear all_visible when we see LP_DEAD items. We fix that after
+ * scanning the line pointers, before we return the value to the caller,
+ * so that the caller doesn't set the VM bit incorrectly.
*/
if (prstate.attempt_freeze)
{
@@ -784,8 +804,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
did_tuple_hint_fpi,
do_prune,
do_hint_prune,
- &prstate);
+ &prstate,
+ &frz_conflict_horizon);
+ /*
+ * While scanning the line pointers, we did not clear
+ * all_visible/all_frozen when encountering LP_DEAD items because we
+ * wanted the decision whether or not to freeze the page to be unaffected
+ * by the short-term presence of LP_DEAD items. These LP_DEAD items are
+ * effectively assumed to be LP_UNUSED items in the making. It doesn't
+ * matter which vacuum heap pass (initial pass or final pass) ends up
+ * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ *
+ * Now that we finished determining whether or not to freeze the page,
+ * update all_visible and all_frozen so that they reflect the true state
+ * of the page for setting PD_ALL_VISIBLE and VM bits.
+ */
+ if (prstate.lpdead_items > 0)
+ prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
@@ -846,27 +882,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId frz_conflict_horizon = InvalidTransactionId;
TransactionId conflict_xid;
- /*
- * We can use the visibility_cutoff_xid as our cutoff for
- * conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise we generate a
- * conservative cutoff by stepping back from OldestXmin.
- */
- if (do_freeze)
- {
- if (prstate.all_frozen)
- frz_conflict_horizon = prstate.visibility_cutoff_xid;
- else
- {
- /* Avoids false conflicts when hot_standby_feedback in use */
- frz_conflict_horizon = prstate.cutoffs->OldestXmin;
- TransactionIdRetreat(frz_conflict_horizon);
- }
- }
-
if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
conflict_xid = frz_conflict_horizon;
else
@@ -890,30 +907,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
-
- /*
- * It was convenient to ignore LP_DEAD items in all_visible earlier on to
- * make the choice of whether or not to freeze the page unaffected by the
- * short-term presence of LP_DEAD items. These LP_DEAD items were
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
- *
- * Now that freezing has been finalized, unset all_visible if there are
- * any LP_DEAD items on the page. It needs to reflect the present state
- * of the page, as expected by our caller.
- */
- if (prstate.all_visible && prstate.lpdead_items == 0)
- {
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
- }
- else
- {
- presult->all_visible = false;
- presult->all_frozen = false;
- }
-
+ presult->all_visible = prstate.all_visible;
+ presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
/*
--
2.43.0
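Patch 0007 moves the conflict horizon logic into heap_page_will_freeze(). That logic, together with the way heap_page_prune_and_freeze() then combines the result with latest_xid_removed, can be sketched using plain 32-bit XIDs. This is an approximation under stated assumptions. The helper names are hypothetical. xid_retreat() and xid_follows() are intended to mirror TransactionIdRetreat() and TransactionIdFollows() from transam.h; they are not the real macros and functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define XidIsNormal(xid)         ((xid) >= FirstNormalTransactionId)

/* Step back one XID, skipping the special XIDs 0..2 (wraps around). */
static TransactionId
xid_retreat(TransactionId xid)
{
    do
        xid--;
    while (xid < FirstNormalTransactionId);
    return xid;
}

/* Does id1 logically follow id2? Circular comparison for normal XIDs. */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
    if (!XidIsNormal(id1) || !XidIsNormal(id2))
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

/* Snapshot conflict horizon for freezing; Invalid when not freezing. */
static TransactionId
freeze_conflict_horizon(bool do_freeze, bool all_frozen,
                        TransactionId visibility_cutoff_xid,
                        TransactionId oldest_xmin)
{
    if (!do_freeze)
        return InvalidTransactionId;
    if (all_frozen)
        return visibility_cutoff_xid;
    /* Conservative: one step behind OldestXmin */
    return xid_retreat(oldest_xmin);
}

/* The record's conflict XID is the later of the two horizons. */
static TransactionId
record_conflict_xid(TransactionId frz_conflict_horizon,
                    TransactionId latest_xid_removed)
{
    return xid_follows(frz_conflict_horizon, latest_xid_removed)
        ? frz_conflict_horizon
        : latest_xid_removed;
}
```

If the page will not be frozen, the horizon stays InvalidTransactionId, so latest_xid_removed wins the comparison.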
[text/x-patch] v14-0006-Add-helper-for-freeze-determination-to-heap_page.patch (7.0K, 8-v14-0006-Add-helper-for-freeze-determination-to-heap_page.patch)
From aee92ee8a07beade81a82200fbbfe605d499ac4c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 14:22:10 -0400
Subject: [PATCH v14 06/24] Add helper for freeze determination to
heap_page_prune_and_freeze
After scanning through the line pointers on the heap page during
vacuum's first phase, we use the status information we collected to
determine whether or not to execute the freeze plans we assembled.
Do this in a helper for better readability.
---
src/backend/access/heap/pruneheap.c | 199 +++++++++++++++++-----------
1 file changed, 119 insertions(+), 80 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 740aa07cd83..4ed74de6f27 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -289,6 +289,120 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
}
}
+/*
+ * Decide if we want to go ahead with freezing according to the freeze plans
+ * we prepared for the given heap buffer or not. If the caller specified we
+ * should not freeze tuples, it exits early. Otherwise, it does a few
+ * pre-freeze checks.
+ *
+ * do_prune, do_hint_full_or_prunable, and did_tuple_hint_fpi must all have
+ * been decided before calling this function.
+ *
+ * prstate is an input/output parameter.
+ *
+ * Returns true if we should use our freeze plans and freeze tuples on the page
+ * and false otherwise.
+ */
+static bool
+heap_page_will_freeze(Relation relation, Buffer buffer,
+ bool did_tuple_hint_fpi,
+ bool do_prune,
+ bool do_hint_prune,
+ PruneState *prstate)
+{
+ bool do_freeze = false;
+
+ /*
+ * If the caller specified we should not attempt to freeze any tuples,
+ * validate that everything is in the right state and exit.
+ */
+ if (!prstate->attempt_freeze)
+ {
+ Assert(!prstate->all_frozen && prstate->nfrozen == 0);
+ Assert(prstate->lpdead_items == 0 || !prstate->all_visible);
+ return false;
+ }
+
+ if (prstate->pagefrz.freeze_required)
+ {
+ /*
+ * heap_prepare_freeze_tuple indicated that at least one XID/MXID from
+ * before FreezeLimit/MultiXactCutoff is present. Must freeze to
+ * advance relfrozenxid/relminmxid.
+ */
+ do_freeze = true;
+ }
+ else
+ {
+ /*
+ * Opportunistically freeze the page if we are generating an FPI
+ * anyway and if doing so means that we can set the page all-frozen
+ * afterwards (might not happen until VACUUM's final heap pass).
+ *
+ * XXX: Previously, we knew if pruning emitted an FPI by checking
+ * pgWalUsage.wal_fpi before and after pruning. Once the freeze and
+ * prune records were combined, this heuristic couldn't be used
+ * anymore. The opportunistic freeze heuristic must be improved;
+ * however, for now, try to approximate the old logic.
+ */
+ if (prstate->all_frozen && prstate->nfrozen > 0)
+ {
+ Assert(prstate->all_visible);
+
+ /*
+ * Freezing would make the page all-frozen. Have already emitted
+ * an FPI or will do so anyway?
+ */
+ if (RelationNeedsWAL(relation))
+ {
+ if (did_tuple_hint_fpi)
+ do_freeze = true;
+ else if (do_prune)
+ {
+ if (XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ else if (do_hint_prune)
+ {
+ if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
+ do_freeze = true;
+ }
+ }
+ }
+ }
+
+ if (do_freeze)
+ {
+ /*
+ * Validate the tuples we will be freezing before entering the
+ * critical section.
+ */
+ heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
+ }
+ else if (prstate->nfrozen > 0)
+ {
+ /*
+ * The page contained some tuples that were not already frozen, and we
+ * chose not to freeze them now. The page won't be all-frozen then.
+ */
+ Assert(!prstate->pagefrz.freeze_required);
+
+ prstate->all_frozen = false;
+ prstate->nfrozen = 0; /* avoid miscounts in instrumentation */
+ }
+ else
+ {
+ /*
+ * We have no freeze plans to execute. The page might already be
+ * all-frozen (perhaps only following pruning), though. Such pages
+ * can be marked all-frozen in the VM by our caller, even though none
+ * of its tuples were newly frozen here.
+ */
+ }
+
+ return do_freeze;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -666,87 +780,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Decide if we want to go ahead with freezing according to the freeze
* plans we prepared, or not.
*/
- do_freeze = false;
- if (prstate.attempt_freeze)
- {
- if (prstate.pagefrz.freeze_required)
- {
- /*
- * heap_prepare_freeze_tuple indicated that at least one XID/MXID
- * from before FreezeLimit/MultiXactCutoff is present. Must
- * freeze to advance relfrozenxid/relminmxid.
- */
- do_freeze = true;
- }
- else
- {
- /*
- * Opportunistically freeze the page if we are generating an FPI
- * anyway and if doing so means that we can set the page
- * all-frozen afterwards (might not happen until VACUUM's final
- * heap pass).
- *
- * XXX: Previously, we knew if pruning emitted an FPI by checking
- * pgWalUsage.wal_fpi before and after pruning. Once the freeze
- * and prune records were combined, this heuristic couldn't be
- * used anymore. The opportunistic freeze heuristic must be
- * improved; however, for now, try to approximate the old logic.
- */
- if (prstate.all_frozen && prstate.nfrozen > 0)
- {
- Assert(prstate.all_visible);
+ do_freeze = heap_page_will_freeze(relation, buffer,
+ did_tuple_hint_fpi,
+ do_prune,
+ do_hint_prune,
+ &prstate);
- /*
- * Freezing would make the page all-frozen. Have already
- * emitted an FPI or will do so anyway?
- */
- if (RelationNeedsWAL(relation))
- {
- if (did_tuple_hint_fpi)
- do_freeze = true;
- else if (do_prune)
- {
- if (XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- else if (do_hint_prune)
- {
- if (XLogHintBitIsNeeded() && XLogCheckBufferNeedsBackup(buffer))
- do_freeze = true;
- }
- }
- }
- }
- }
-
- if (do_freeze)
- {
- /*
- * Validate the tuples we will be freezing before entering the
- * critical section.
- */
- heap_pre_freeze_checks(buffer, prstate.frozen, prstate.nfrozen);
- }
- else if (prstate.nfrozen > 0)
- {
- /*
- * The page contained some tuples that were not already frozen, and we
- * chose not to freeze them now. The page won't be all-frozen then.
- */
- Assert(!prstate.pagefrz.freeze_required);
-
- prstate.all_frozen = false;
- prstate.nfrozen = 0; /* avoid miscounts in instrumentation */
- }
- else
- {
- /*
- * We have no freeze plans to execute. The page might already be
- * all-frozen (perhaps only following pruning), though. Such pages
- * can be marked all-frozen in the VM by our caller, even though none
- * of its tuples were newly frozen here.
- */
- }
Assert(!prstate.all_frozen || prstate.all_visible);
/* Any error while applying the changes is critical */
--
2.43.0
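The decision that heap_page_will_freeze() makes depends only on a handful of booleans, so it can be modeled as a pure function for testing. This sketch is illustrative only. FreezeInputs and will_freeze() are hypothetical. The buffer_needs_backup and hint_bit_needed fields stand in for XLogCheckBufferNeedsBackup() and XLogHintBitIsNeeded(). The sketch models only the return value. It does not model the adjustments the real helper makes to prstate, such as clearing all_frozen and nfrozen when it chooses not to freeze.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct FreezeInputs
{
    bool attempt_freeze;        /* caller asked us to consider freezing */
    bool freeze_required;       /* some XID/MXID is older than the cutoffs */
    bool all_frozen;            /* page would be all-frozen after freezing */
    int  nfrozen;               /* number of prepared freeze plans */
    bool relation_needs_wal;    /* RelationNeedsWAL() */
    bool did_tuple_hint_fpi;    /* hint bit setting already emitted an FPI */
    bool do_prune;              /* pruning will emit a WAL record */
    bool do_hint_prune;         /* only prune hints will be updated */
    bool buffer_needs_backup;   /* XLogCheckBufferNeedsBackup(buffer) */
    bool hint_bit_needed;       /* XLogHintBitIsNeeded() */
} FreezeInputs;

static bool
will_freeze(const FreezeInputs *in)
{
    if (!in->attempt_freeze)
        return false;
    if (in->freeze_required)
        return true;            /* must freeze to advance relfrozenxid */

    /* Opportunistic freezing: only if it makes the page all-frozen */
    if (!(in->all_frozen && in->nfrozen > 0) || !in->relation_needs_wal)
        return false;

    /* ...and only if an FPI was or will be emitted anyway */
    if (in->did_tuple_hint_fpi)
        return true;
    if (in->do_prune)
        return in->buffer_needs_backup;
    if (in->do_hint_prune)
        return in->hint_bit_needed && in->buffer_needs_backup;
    return false;
}
```

Writing the inputs as a struct makes the opportunistic branch easy to exercise one condition at a time. For example, a hint-only prune freezes the page only when both the hint-bit and the backup checks say an FPI is coming.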
[text/x-patch] v14-0008-Set-PD_ALL_VISIBLE-in-heap_page_prune_and_freeze.patch (16.1K, 9-v14-0008-Set-PD_ALL_VISIBLE-in-heap_page_prune_and_freeze.patch)
From 7ae7f9d9f1c05cf66d7fee964db801cbcf52a324 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 16:32:35 -0400
Subject: [PATCH v14 08/24] Set PD_ALL_VISIBLE in heap_page_prune_and_freeze
After phase I of vacuum, if the heap page was rendered all-visible, we
can set it as such in the VM. We also must set the page-level
PD_ALL_VISIBLE bit. By setting PD_ALL_VISIBLE while making the other
changes to the heap page instead of while updating the VM, we can omit
the heap page from the WAL chain during the VM update. The result is
that xl_heap_prune records include updates to PD_ALL_VISIBLE.
This commit doesn't yet remove the heap page from the WAL chain because
it does not change other users of visibilitymap_set().
Note that this is carefully coded such that if the only modification to
the page during heap_page_prune_and_freeze() is setting PD_ALL_VISIBLE
and checksums/wal_log_hints are disabled, we will never emit a full page
image of the heap page.
This also fixes a longstanding issue where, when checksums/wal_log_hints
are enabled, an all-visible page being set all-frozen may not mark the
buffer dirty before visibilitymap_set() stamps it with the
xl_heap_visible LSN.
It is noteworthy that the checks for page corruption and an inconsistent
state between the heap page and the VM in lazy_scan_prune() now happen
after having set PD_ALL_VISIBLE. That is not a functional change because
the corruption cases are mutually exclusive with cases where we would
set PD_ALL_VISIBLE.
---
src/backend/access/heap/heapam_xlog.c | 63 +++++++++++++++++++----
src/backend/access/heap/pruneheap.c | 72 ++++++++++++++++++++++++---
src/backend/access/heap/vacuumlazy.c | 29 +----------
src/include/access/heapam.h | 2 +
src/include/access/heapam_xlog.h | 2 +
5 files changed, 125 insertions(+), 43 deletions(-)
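For review, the two new boolean decisions in this patch can be isolated as pure functions. This is a hedged sketch. should_set_pd_all_visible() and redo_sets_page_lsn() are hypothetical names. Their parameters stand in for HEAP_PAGE_PRUNE_UPDATE_VIS, prstate.all_visible, PageIsAllVisible() and XLogHintBitIsNeeded() as they are used in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* do_set_pd_vis in heap_page_prune_and_freeze() */
static bool
should_set_pd_all_visible(bool update_vis_option, bool all_visible,
                          bool page_is_all_visible)
{
    return update_vis_option && all_visible && !page_is_all_visible;
}

/* Whether heap_xlog_prune_freeze() stamps the heap page LSN on redo */
static bool
redo_sets_page_lsn(int nredirected, int ndead, int nunused, int nplans,
                   bool set_pd_all_vis, bool hint_bit_needed)
{
    bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;

    return do_prune || nplans > 0 || (set_pd_all_vis && hint_bit_needed);
}
```

The second function shows why a record that only sets PD_ALL_VISIBLE leaves the heap page LSN alone when checksums/wal_log_hints are off. Stamping the LSN in that case could let a later modification of the page skip a full page image it actually needs.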
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index faa7c561a8a..a54238f2b59 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -90,6 +90,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
xlhp_freeze_plan *plans;
OffsetNumber *frz_offsets;
char *dataptr = XLogRecGetBlockData(record, 0, &datalen);
+ bool do_prune;
heap_xlog_deserialize_prune_and_freeze(dataptr, xlrec.flags,
&nplans, &plans, &frz_offsets,
@@ -97,11 +98,13 @@ heap_xlog_prune_freeze(XLogReaderState *record)
&ndead, &nowdead,
&nunused, &nowunused);
+ do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
*/
- if (nredirected > 0 || ndead > 0 || nunused > 0)
+ if (do_prune)
heap_page_prune_execute(buffer,
(xlrec.flags & XLHP_CLEANUP_LOCK) == 0,
redirected, nredirected,
@@ -138,17 +141,52 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/* There should be no more data */
Assert((char *) frz_offsets == dataptr + datalen);
+ /*
+ * The critical integrity requirement here is that we must never end
+ * up with a situation where the visibility map bit is set, and the
+ * page-level PD_ALL_VISIBLE bit is clear. If that were to occur,
+ * then a subsequent page modification would fail to clear the
+ * visibility map bit.
+ */
+ if (xlrec.flags & XLHP_SET_PD_ALL_VIS)
+ PageSetAllVisible(page);
+
/*
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
*/
-
- PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
+
+ /*
+ * We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
+ * careful not to emit a full page image unless
+ * checksums/wal_log_hints are enabled. We only set the heap page LSN
+ * if full page images were an option when emitting WAL. Otherwise,
+ * subsequent modifications of the page may incorrectly skip emitting
+ * a full page image.
+ */
+ if (do_prune || nplans > 0 ||
+ (xlrec.flags & XLHP_SET_PD_ALL_VIS && XLogHintBitIsNeeded()))
+ PageSetLSN(page, lsn);
}
/*
- * If we released any space or line pointers, update the free space map.
+ * If we released any space or line pointers, or set PD_ALL_VISIBLE, update
+ * the free space map.
+ *
+ * Even if we are just setting PD_ALL_VISIBLE (and thus not freeing up any
+ * space), we'll still update the FSM for this page. Since the FSM is not
+ * WAL-logged and only updated heuristically, it easily becomes stale in
+ * standbys. If the standby is later promoted and runs VACUUM, it will
+ * skip updating individual free space figures for pages that became
+ * all-visible (or all-frozen, depending on the vacuum mode,) which is
+ * troublesome when FreeSpaceMapVacuum propagates too optimistic free
+ * space values to upper FSM layers; later inserters try to use such pages
+ * only to find out that they are unusable. This can cause long stalls
+ * when there are many such pages.
+ *
+ * Forestall those problems by updating FSM's idea about a page that is
+ * becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
@@ -157,10 +195,16 @@ heap_xlog_prune_freeze(XLogReaderState *record)
{
if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
XLHP_HAS_DEAD_ITEMS |
- XLHP_HAS_NOW_UNUSED_ITEMS))
+ XLHP_HAS_NOW_UNUSED_ITEMS |
+ XLHP_SET_PD_ALL_VIS))
{
Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ /*
+ * We want to avoid holding an exclusive lock on the heap buffer
+ * while doing IO, so we'll release the lock on the heap buffer
+ * first.
+ */
UnlockReleaseBuffer(buffer);
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
@@ -173,10 +217,11 @@ heap_xlog_prune_freeze(XLogReaderState *record)
/*
* Replay XLOG_HEAP2_VISIBLE records.
*
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
+ * It is imperative that the previously emitted record set PD_ALL_VISIBLE on
+ * the heap page. We must never end up with a situation where the visibility
+ * map bit is set, and the page-level PD_ALL_VISIBLE bit is clear. If that
+ * were to occur, then a subsequent page modification would fail to clear the
+ * visibility map bit.
*/
static void
heap_xlog_visible(XLogReaderState *record)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 5e536bd0d4d..9b25131543b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -495,6 +495,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
@@ -824,6 +825,22 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+
+ /*
+ * Though callers should set the VM if PD_ALL_VISIBLE is set here, it is
+ * allowed for the page-level bit to be set and the VM to be clear.
+ * Setting PD_ALL_VISIBLE when we are making the changes to the page that
+ * render it all-visible allows us to omit the heap page from the WAL
+ * chain when later updating the VM -- even when checksums/wal_log_hints
+ * are enabled.
+ */
+ do_set_pd_vis = false;
+ if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+ {
+ if (prstate.all_visible && !PageIsAllVisible(page))
+ do_set_pd_vis = true;
+ }
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -844,14 +861,17 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_pd_vis)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_pd_vis)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -865,6 +885,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ if (do_set_pd_vis)
+ PageSetAllVisible(page);
+
MarkBufferDirty(buffer);
/*
@@ -891,7 +914,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
log_heap_prune_and_freeze(relation, buffer,
conflict_xid,
- true, reason,
+ true,
+ do_set_pd_vis,
+ reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -2078,6 +2103,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
+ * the page LSN when checksums/wal_log_hints are enabled even if we did not
+ * prune or freeze tuples on the page.
+ *
* Note: This function scribbles on the 'frozen' array.
*
* Note: This is called in a critical section, so careful what you do here.
@@ -2086,6 +2115,7 @@ void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
@@ -2095,6 +2125,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xl_heap_prune xlrec;
XLogRecPtr recptr;
uint8 info;
+ uint8 regbuf_flags;
/* The following local variables hold data registered in the WAL record: */
xlhp_freeze_plan plans[MaxHeapTuplesPerPage];
@@ -2103,8 +2134,21 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlhp_prune_items dead_items;
xlhp_prune_items unused_items;
OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+ bool do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
xlrec.flags = 0;
+ regbuf_flags = REGBUF_STANDARD;
+
+ /*
+ * We can avoid an FPI if the only modification we are making to the heap
+ * page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
+ * Note that if we explicitly skip an FPI, we must not set the heap page
+ * LSN later.
+ */
+ if (!do_prune &&
+ nfrozen == 0 &&
+ (!set_pd_all_vis || !XLogHintBitIsNeeded()))
+ regbuf_flags |= REGBUF_NO_IMAGE;
/*
* Prepare data for the buffer. The arrays are not actually in the
@@ -2112,7 +2156,7 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* page image, the arrays can be omitted.
*/
XLogBeginInsert();
- XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+ XLogRegisterBuffer(0, buffer, regbuf_flags);
if (nfrozen > 0)
{
int nplans;
@@ -2169,6 +2213,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (set_pd_all_vis)
+ xlrec.flags |= XLHP_SET_PD_ALL_VIS;
if (RelationIsAccessibleInLogicalDecoding(relation))
xlrec.flags |= XLHP_IS_CATALOG_REL;
if (TransactionIdIsValid(conflict_xid))
@@ -2201,5 +2247,17 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
- PageSetLSN(BufferGetPage(buffer), recptr);
+ /*
+ * We must bump the page LSN if pruning or freezing. If we are only
+ * updating PD_ALL_VISIBLE, though, we can skip doing this unless
+ * wal_log_hints/checksums are enabled. Torn pages are possible if we
+ * update PD_ALL_VISIBLE without bumping the LSN, but this is deemed okay
+ * for page hint updates.
+ */
+ if (do_prune || nfrozen > 0 ||
+ (set_pd_all_vis && XLogHintBitIsNeeded()))
+ {
+ Assert(BufferIsDirty(buffer));
+ PageSetLSN(BufferGetPage(buffer), recptr);
+ }
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 50cc898087f..308abff16ca 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1970,7 +1970,7 @@ lazy_scan_prune(LVRelState *vacrel,
* tuples. Pruning will have determined whether or not the page is
* all-visible.
*/
- prune_options = HEAP_PAGE_PRUNE_FREEZE;
+ prune_options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VIS;
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
@@ -2073,21 +2073,6 @@ lazy_scan_prune(LVRelState *vacrel,
flags |= VISIBILITYMAP_ALL_FROZEN;
}
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
@@ -2168,17 +2153,6 @@ lazy_scan_prune(LVRelState *vacrel,
{
uint8 old_vmbits;
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
/*
* Set the page all-frozen (and all-visible) in the VM.
*
@@ -2891,6 +2865,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
log_heap_prune_and_freeze(vacrel->rel, buffer,
InvalidTransactionId,
false, /* no cleanup lock required */
+ false,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 34206a6a7d5..2f77d8dbcd6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VIS (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
struct TupleTableSlot;
@@ -390,6 +391,7 @@ extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
TransactionId conflict_xid,
bool cleanup_lock,
+ bool set_pd_all_vis,
PruneReason reason,
HeapTupleFreeze *frozen, int nfrozen,
OffsetNumber *redirected, int nredirected,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index d4c0625b632..7d3fb75dda7 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -294,6 +294,8 @@ typedef struct xl_heap_prune
#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define XLHP_SET_PD_ALL_VIS (1 << 0)
+
/* to handle recovery conflict during logical decoding on standby */
#define XLHP_IS_CATALOG_REL (1 << 1)
--
2.43.0
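The FPI decision and the LSN decision in log_heap_prune_and_freeze() above are meant to be complementary. If the heap buffer is registered with REGBUF_NO_IMAGE, the page LSN must not be bumped; otherwise it must be. The following is a minimal standalone model that checks this exhaustively. It is not PostgreSQL code: the struct and function names are hypothetical, and hint_bits_needed stands in for XLogHintBitIsNeeded().

```c
#include <stdbool.h>

/*
 * Hypothetical model of the two decisions made in
 * log_heap_prune_and_freeze() (v14-0008). Not PostgreSQL code.
 */
typedef struct PruneWalDecision
{
	bool		no_image;		/* register heap buffer with REGBUF_NO_IMAGE? */
	bool		set_lsn;		/* call PageSetLSN() on the heap page? */
} PruneWalDecision;

/* hint_bits_needed stands in for XLogHintBitIsNeeded() */
static PruneWalDecision
prune_wal_decision(int nredirected, int ndead, int nunused, int nfrozen,
				   bool set_pd_all_vis, bool hint_bits_needed)
{
	PruneWalDecision d;
	bool		do_prune = nredirected > 0 || ndead > 0 || nunused > 0;

	/* Skip the FPI when nothing but a hint-like PD_ALL_VISIBLE change is made */
	d.no_image = !do_prune && nfrozen == 0 &&
		(!set_pd_all_vis || !hint_bits_needed);

	/* Bump the LSN whenever the record may carry an FPI of the heap page */
	d.set_lsn = do_prune || nfrozen > 0 ||
		(set_pd_all_vis && hint_bits_needed);

	return d;
}
```

By De Morgan's law the two expressions are exact negations of each other. Changing one without the other would break the "skip the FPI, so don't set the LSN" rule that the comment in log_heap_prune_and_freeze() relies on.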
[text/x-patch] v14-0009-Combine-vacuum-phase-I-VM-update-cases.patch (4.4K, 10-v14-0009-Combine-vacuum-phase-I-VM-update-cases.patch)
From a88a7f88097755d430d030753c4080aa4092ef7b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 15 Sep 2025 17:48:38 -0400
Subject: [PATCH v14 09/24] Combine vacuum phase I VM update cases
We update the VM after phase I of vacuum -- either setting both the VM
bits when all bits are currently unset or setting just the frozen bit
when the all-visible bit is already set.
Those two cases shared much of the same code -- leading to unnecessary
duplication. This commit combines them, which is simpler and easier to
understand.
---
src/backend/access/heap/vacuumlazy.c | 68 ++++++++--------------------
1 file changed, 18 insertions(+), 50 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 308abff16ca..5a6bbbd97f2 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2058,15 +2058,22 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_frozen || presult.all_visible);
/*
- * Handle setting visibility map bit based on information from the VM (as
+ * Handle setting visibility map bits based on information from the VM (as
* of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
+ * all_frozen variables.
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
+ /*
+ * If the page is all-frozen, we can pass InvalidTransactionId as our
+ * cutoff_xid, since a snapshotConflictHorizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were
+ * frozen.
+ */
if (presult.all_frozen)
{
Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
@@ -2079,6 +2086,12 @@ lazy_scan_prune(LVRelState *vacrel,
flags);
/*
+ * Even if we are only setting the all-frozen bit, there is a small
+ * chance that the VM was modified sometime between setting
+ * all_visible_according_to_vm and checking the visibility during
+ * pruning. Check the return value of old_vmbits to ensure the
+ * visibility map counters used for logging are accurate.
+ *
* If the page wasn't already set all-visible and/or all-frozen in the
* VM, count it as newly set for logging.
*/
@@ -2100,6 +2113,8 @@ lazy_scan_prune(LVRelState *vacrel,
}
/*
+ * Now handle two potential corruption cases:
+ *
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
* page-level bit is clear. However, it's possible that the bit got
* cleared after heap_vac_scan_next_block() was called, so we must recheck
@@ -2144,53 +2159,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
return presult.ndeleted;
}
--
2.43.0
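v14-0009 folds the "set all-frozen only" branch into the first branch. This relies on all_frozen implying all_visible, which lazy_scan_prune() asserts. The small model below checks that, under that invariant, the combined condition passes the same flags to visibilitymap_set() as the old two-branch code. It ignores the corruption-handling branches that sat between the two old cases, and the VM buffer pinning done by VM_ALL_FROZEN(). The flag constants and function names are hypothetical stand-ins, not the real VISIBILITYMAP_* definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for VISIBILITYMAP_ALL_VISIBLE / _ALL_FROZEN */
#define MODEL_ALL_VISIBLE 0x01
#define MODEL_ALL_FROZEN  0x02

/*
 * Old shape: two separate branches. Returns the flags that would be passed
 * to visibilitymap_set(), or 0 if it would not be called.
 */
static uint8_t
old_vm_flags(bool all_visible, bool all_frozen,
			 bool all_visible_according_to_vm, bool vm_all_frozen)
{
	if (!all_visible_according_to_vm && all_visible)
		return MODEL_ALL_VISIBLE | (all_frozen ? MODEL_ALL_FROZEN : 0);
	if (all_visible_according_to_vm && all_frozen && !vm_all_frozen)
		return MODEL_ALL_VISIBLE | MODEL_ALL_FROZEN;
	return 0;
}

/* New shape: one combined condition, as in v14-0009 */
static uint8_t
new_vm_flags(bool all_visible, bool all_frozen,
			 bool all_visible_according_to_vm, bool vm_all_frozen)
{
	if ((all_visible && !all_visible_according_to_vm) ||
		(all_frozen && !vm_all_frozen))
		return MODEL_ALL_VISIBLE | (all_frozen ? MODEL_ALL_FROZEN : 0);
	return 0;
}
```

Enumerating all inputs that satisfy "all_frozen implies all_visible" shows that the two functions agree. Only the impossible combination (all_frozen set, all_visible clear) distinguishes them, which is why the Assert before the combined branch matters.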
[text/x-patch] v14-0010-Vacuum-phase-III-set-PD_ALL_VISIBLE-in-vacuum-WA.patch (9.2K, 11-v14-0010-Vacuum-phase-III-set-PD_ALL_VISIBLE-in-vacuum-WA.patch)
From aafd0b18341a03d4b48574f28694d04891555c5e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 10:39:31 -0400
Subject: [PATCH v14 10/24] Vacuum phase III set PD_ALL_VISIBLE in vacuum WAL
record
Instead of setting PD_ALL_VISIBLE on the heap page when setting bits in
the VM, set it when flipping the line pointers on the page to LP_UNUSED.
This will allow us to omit the heap page from the VM WAL chain.
To do this, we must determine, before flipping the line pointers, whether
the page will be all-visible once they are flipped.
One functional change is that a single critical section now surrounds both
the heap update and the VM update. Previously each was in its own critical
section, so a crash between them could leave PD_ALL_VISIBLE set without the
corresponding VM bits set.
---
src/backend/access/heap/vacuumlazy.c | 140 ++++++++++++++++++++-------
1 file changed, 105 insertions(+), 35 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5a6bbbd97f2..9bfcd67a61b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -465,6 +465,11 @@ static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2793,6 +2798,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
TransactionId visibility_cutoff_xid;
bool all_frozen;
LVSavedErrInfo saved_err_info;
+ uint8 vmflags = 0;
Assert(vacrel->do_index_vacuuming);
@@ -2803,6 +2809,18 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
+ if (heap_page_would_be_all_visible(vacrel, buffer,
+ deadoffsets, num_offsets,
+ &all_frozen, &visibility_cutoff_xid))
+ {
+ vmflags |= VISIBILITYMAP_ALL_VISIBLE;
+ if (all_frozen)
+ {
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ Assert(!TransactionIdIsValid(visibility_cutoff_xid));
+ }
+ }
+
START_CRIT_SECTION();
for (int i = 0; i < num_offsets; i++)
@@ -2822,6 +2840,13 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
/* Attempt to truncate line pointer array now */
PageTruncateLinePointerArray(page);
+ /*
+ * The page will never have PD_ALL_VISIBLE already set, so if we are
+ * setting the VM, we must set PD_ALL_VISIBLE as well.
+ */
+ if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ PageSetAllVisible(page);
+
/*
* Mark buffer dirty before we write WAL.
*/
@@ -2833,7 +2858,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
log_heap_prune_and_freeze(vacrel->rel, buffer,
InvalidTransactionId,
false, /* no cleanup lock required */
- false,
+ (vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
NULL, 0, /* frozen */
NULL, 0, /* redirected */
@@ -2842,36 +2867,26 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
}
/*
- * End critical section, so we safely can do visibility tests (which
- * possibly need to perform IO and allocate memory!). If we crash now the
- * page (including the corresponding vm bit) might not be marked all
- * visible, but that's fine. A later vacuum will fix that.
+ * Note that we don't end the critical section until after emitting the VM
+ * record. This ensures both PD_ALL_VISIBLE and the VM bits are set or
+ * unset in the event of a crash. While it is correct for PD_ALL_VISIBLE
+ * to be set and the VM to be clear, we should do our best to keep these
+ * in sync. This does mean that we will take a lock on the VM buffer
+ * inside of a critical section, which is generally discouraged. There is
+ * precedent for this in other callers of visibilitymap_set(), though.
*/
- END_CRIT_SECTION();
/*
- * Now that we have removed the LP_DEAD items from the page, once again
- * check if the page has become all-visible. The page is already marked
- * dirty, exclusively locked, and, if needed, a full page image has been
- * emitted.
+ * Now that we have removed the LP_DEAD items from the page, set the
+ * visibility map if the page became all-visible/all-frozen. Changes to
+ * the heap page have already been logged.
*/
- Assert(!PageIsAllVisible(page));
- if (heap_page_is_all_visible(vacrel, buffer, &visibility_cutoff_xid,
- &all_frozen))
+ if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (all_frozen)
- {
- Assert(!TransactionIdIsValid(visibility_cutoff_xid));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- PageSetAllVisible(page);
visibilitymap_set(vacrel->rel, blkno, buffer,
InvalidXLogRecPtr,
vmbuffer, visibility_cutoff_xid,
- flags);
+ vmflags);
/* Count the newly set VM page for logging */
vacrel->vm_new_visible_pages++;
@@ -2879,6 +2894,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vacrel->vm_new_visible_frozen_pages++;
}
+ END_CRIT_SECTION();
+
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
}
@@ -3540,30 +3557,77 @@ dead_items_cleanup(LVRelState *vacrel)
}
/*
- * Check if every tuple in the given page is visible to all current and future
- * transactions. Also return the visibility_cutoff_xid which is the highest
- * xmin amongst the visible tuples. Set *all_frozen to true if every tuple
- * on this page is frozen.
- *
- * This is a stripped down version of lazy_scan_prune(). If you change
- * anything here, make sure that everything stays in sync. Note that an
- * assertion calls us to verify that everybody still agrees. Be sure to avoid
- * introducing new side-effects here.
+ * Wrapper for heap_page_would_be_all_visible() for callers that expect no
+ * LP_DEAD items on the page.
*/
static bool
heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
TransactionId *visibility_cutoff_xid,
bool *all_frozen)
{
+
+ return heap_page_would_be_all_visible(vacrel, buf,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid);
+}
+
+/*
+ * Determines whether or not the heap page in buf is all-visible other than
+ * the dead line pointers referred to by the provided deadoffsets array.
+ *
+ * deadoffsets are the offsets of LP_DEAD items whose index entries the
+ * caller has already removed. Vacuum calls this before setting those line
+ * pointers LP_UNUSED, so if no other LP_DEAD items exist and every tuple
+ * is visible to all, the caller can set the page all-visible in the VM.
+ *
+ * Returns true if the page is all-visible other than the provided
+ * deadoffsets and false otherwise.
+ *
+ * vacrel->cutoffs.OldestXmin is used to determine visibility.
+ *
+ * *all_frozen is an output parameter indicating to the caller if every tuple
+ * on the page is frozen.
+ *
+ * *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
+ * visible tuples. It is only valid if the page is all-visible.
+ *
+ * Callers looking to verify that the page is already all-visible can call
+ * heap_page_is_all_visible().
+ *
+ * This logic is similar to that in heap_prune_record_unchanged_lp_normal(). If
+ * you change anything here, make sure that everything stays in sync. Note
+ * that an assertion calls us to verify that everybody still agrees. Be sure
+ * to avoid introducing new side-effects here.
+ */
+static bool
+heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid)
+{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
OffsetNumber offnum,
maxoff;
bool all_visible = true;
+ int matched_dead_count = 0;
*visibility_cutoff_xid = InvalidTransactionId;
*all_frozen = true;
+ Assert(ndeadoffsets == 0 || deadoffsets);
+
+#ifdef USE_ASSERT_CHECKING
+ /* Confirm input deadoffsets[] is strictly sorted */
+ if (ndeadoffsets > 1)
+ {
+ for (int i = 1; i < ndeadoffsets; i++)
+ Assert(deadoffsets[i - 1] < deadoffsets[i]);
+ }
+#endif
+
maxoff = PageGetMaxOffsetNumber(page);
for (offnum = FirstOffsetNumber;
offnum <= maxoff && all_visible;
@@ -3591,9 +3655,15 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
*/
if (ItemIdIsDead(itemid))
{
- all_visible = false;
- *all_frozen = false;
- break;
+ if (!deadoffsets ||
+ matched_dead_count >= ndeadoffsets ||
+ deadoffsets[matched_dead_count] != offnum)
+ {
+ *all_frozen = all_visible = false;
+ break;
+ }
+ matched_dead_count++;
+ continue;
}
Assert(ItemIdIsNormal(itemid));
--
2.43.0
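The dead-offset matching in heap_page_would_be_all_visible() walks the page's line pointers and the caller's strictly sorted deadoffsets[] array in lockstep, so each LP_DEAD item must be the next entry in deadoffsets[]. The following is a simplified model of that walk. The item states, the offset typedef, and the function name are hypothetical, and tuple visibility is reduced to a single per-item flag instead of the real HeapTupleSatisfiesVacuum() checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t ModelOffset;	/* hypothetical stand-in for OffsetNumber */

typedef enum ModelItemState
{
	ITEM_UNUSED,
	ITEM_REDIRECT,
	ITEM_DEAD,
	ITEM_VISIBLE,				/* normal tuple visible to everyone */
	ITEM_NOT_VISIBLE			/* normal tuple not yet visible to everyone */
} ModelItemState;

/*
 * items[0] describes offset 1, items[1] offset 2, and so on.
 * deadoffsets, if not NULL, must be strictly ascending.
 */
static bool
model_would_be_all_visible(const ModelItemState *items, int maxoff,
						   const ModelOffset *deadoffsets, int ndeadoffsets)
{
	int			matched = 0;

	for (int off = 1; off <= maxoff; off++)
	{
		ModelItemState s = items[off - 1];

		/* Unused and redirect line pointers don't affect visibility */
		if (s == ITEM_UNUSED || s == ITEM_REDIRECT)
			continue;

		if (s == ITEM_DEAD)
		{
			/* Must be the next offset the caller will set LP_UNUSED */
			if (deadoffsets == NULL || matched >= ndeadoffsets ||
				deadoffsets[matched] != off)
				return false;
			matched++;
			continue;
		}

		if (s == ITEM_NOT_VISIBLE)
			return false;
	}
	return true;
}
```

Because the walk is in lockstep, an entry in deadoffsets[] that is not LP_DEAD on the page makes every later LP_DEAD item fail to match. The model therefore errs toward returning false, which is the safe direction for deciding whether to set VM bits.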
[text/x-patch] v14-0011-Log-setting-empty-pages-PD_ALL_VISIBLE-with-XLOG.patch (3.0K, 12-v14-0011-Log-setting-empty-pages-PD_ALL_VISIBLE-with-XLOG.patch)
From d774b80288042d9a31cbc6477c2f0151f1c9dc2e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 18:11:49 -0400
Subject: [PATCH v14 11/24] Log setting empty pages PD_ALL_VISIBLE with
XLOG_HEAP2_PRUNE_VACUUM_SCAN
Though not a big win for this particular case, if we use the
XLOG_HEAP2_PRUNE_VACUUM_SCAN record to log setting PD_ALL_VISIBLE on the
heap page, we can omit the heap page from the WAL chain when setting the
visibility map. A follow-on commit will actually remove the heap page
from the VM set WAL chain.
---
src/backend/access/heap/vacuumlazy.c | 43 +++++++++++++++++++---------
1 file changed, 29 insertions(+), 14 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 9bfcd67a61b..c016f8f7c25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1879,23 +1879,38 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
{
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ PageSetAllVisible(page);
MarkBufferDirty(buf);
- /*
- * It's possible that another backend has extended the heap,
- * initialized the page, and then failed to WAL-log the page due
- * to an ERROR. Since heap extension is not WAL-logged, recovery
- * might try to replay our record setting the page all-visible and
- * find that the page isn't initialized, which will cause a PANIC.
- * To prevent that, check whether the page has been previously
- * WAL-logged, and if not, do that now.
- */
- if (RelationNeedsWAL(vacrel->rel) &&
- PageGetLSN(page) == InvalidXLogRecPtr)
- log_newpage_buffer(buf, true);
+ if (RelationNeedsWAL(vacrel->rel))
+ {
+ /*
+ * It's possible that another backend has extended the heap,
+ * initialized the page, and then failed to WAL-log the page
+ * due to an ERROR. Since heap extension is not WAL-logged,
+ * recovery might try to replay our record setting the page
+ * all-visible and find that the page isn't initialized, which
+ * will cause a PANIC. To prevent that, check whether the page
+ * has been previously WAL-logged, and if not, do that now.
+ *
+ * Otherwise, just emit WAL for setting PD_ALL_VISIBLE on the
+ * heap page. Doing this in a separate record from setting the
+ * VM allows us to omit the heap page from the VM WAL chain.
+ */
+ if (PageGetLSN(page) == InvalidXLogRecPtr)
+ log_newpage_buffer(buf, true);
+ else
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ true, /* set_pd_all_vis */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+ }
- PageSetAllVisible(page);
visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
--
2.43.0
[text/x-patch] v14-0012-Remove-heap-buffer-from-XLOG_HEAP2_VISIBLE-WAL-c.patch (11.8K, 13-v14-0012-Remove-heap-buffer-from-XLOG_HEAP2_VISIBLE-WAL-c.patch)
download | inline diff:
From a63eed81ff73217a12cbb84b2a7f4def3366871a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 11:05:30 -0400
Subject: [PATCH v14 12/24] Remove heap buffer from XLOG_HEAP2_VISIBLE WAL
chain
Now that all users of visibilitymap_set() include setting PD_ALL_VISIBLE
in the WAL record capturing other changes to the heap page, we no longer
need to include the heap buffer in the WAL chain for setting the VM.
---
src/backend/access/heap/heapam.c | 16 +-----
src/backend/access/heap/heapam_xlog.c | 76 +++----------------------
src/backend/access/heap/vacuumlazy.c | 6 +-
src/backend/access/heap/visibilitymap.c | 31 +---------
src/include/access/heapam_xlog.h | 3 +-
src/include/access/visibilitymap.h | 2 +-
6 files changed, 16 insertions(+), 118 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c8cd9d22726..0323e2df409 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8807,21 +8807,14 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
*
* snapshotConflictHorizon comes from the largest xmin on the page being
* marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
*/
XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
+log_heap_visible(Relation rel, Buffer vm_buffer,
TransactionId snapshotConflictHorizon, uint8 vmflags)
{
xl_heap_visible xlrec;
XLogRecPtr recptr;
- uint8 flags;
- Assert(BufferIsValid(heap_buffer));
Assert(BufferIsValid(vm_buffer));
xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
@@ -8830,14 +8823,7 @@ log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
XLogBeginInsert();
XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
return recptr;
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a54238f2b59..68b41f39e69 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -229,15 +229,12 @@ heap_xlog_visible(XLogReaderState *record)
XLogRecPtr lsn = record->EndRecPtr;
xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
+ XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
/*
* If there are any Hot Standby transactions running that have an xmin
@@ -254,70 +251,11 @@ heap_xlog_visible(XLogReaderState *record)
rlocator);
/*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
+ * Even if the heap relation was dropped or truncated, or the record that
+ * set PD_ALL_VISIBLE skipped the heap page update due to the LSN interlock,
+ * it's still safe to update the visibility map. Any WAL record that
+ * clears the visibility map bit does so before checking the page LSN, so
+ * any bits that need to be cleared will still be cleared.
*/
if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
&vmbuffer) == BLK_NEEDS_REDO)
@@ -341,7 +279,7 @@ heap_xlog_visible(XLogReaderState *record)
reln = CreateFakeRelcacheEntry(rlocator);
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+ visibilitymap_set(reln, blkno, lsn, vmbuffer,
xlrec->snapshotConflictHorizon, vmbits);
ReleaseBuffer(vmbuffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index c016f8f7c25..735f1e7501e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1911,7 +1911,7 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
NULL, 0);
}
- visibilitymap_set(vacrel->rel, blkno, buf,
+ visibilitymap_set(vacrel->rel, blkno,
InvalidXLogRecPtr,
vmbuffer, InvalidTransactionId,
VISIBILITYMAP_ALL_VISIBLE |
@@ -2100,7 +2100,7 @@ lazy_scan_prune(LVRelState *vacrel,
flags |= VISIBILITYMAP_ALL_FROZEN;
}
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
+ old_vmbits = visibilitymap_set(vacrel->rel, blkno,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
flags);
@@ -2898,7 +2898,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
*/
if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- visibilitymap_set(vacrel->rel, blkno, buffer,
+ visibilitymap_set(vacrel->rel, blkno,
InvalidXLogRecPtr,
vmbuffer, visibility_cutoff_xid,
vmflags);
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index aa48a436108..75fcb3f067a 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -233,9 +233,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* when a page that is already all-visible is being marked all-frozen.
*
* Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
+ * this function.
*
* You must pass a buffer containing the correct map page to this function.
* Call visibilitymap_pin first to pin the right one. This function doesn't do
@@ -244,7 +242,7 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* Returns the state of the page's VM bits before setting flags.
*/
uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
uint8 flags)
{
@@ -261,18 +259,11 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
#endif
Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
/* Must never set all_frozen bit without also setting all_visible bit */
Assert(flags != VISIBILITYMAP_ALL_FROZEN);
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) || BufferIsExclusiveLocked(heapBuf));
-
/* Check that we have the right VM page pinned */
if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
@@ -294,23 +285,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
+ recptr = log_heap_visible(rel, vmBuf, cutoff_xid, flags);
}
PageSetLSN(page, recptr);
}
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 7d3fb75dda7..82b8f7f2bbc 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -440,7 +440,6 @@ typedef struct xl_heap_inplace
* This is what we need to know about setting a visibility map bit
*
* Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
*/
typedef struct xl_heap_visible
{
@@ -493,7 +492,7 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
+extern XLogRecPtr log_heap_visible(Relation rel,
Buffer vm_buffer,
TransactionId snapshotConflictHorizon,
uint8 vmflags);
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index fc7056a91ea..302adf4856a 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,7 +32,7 @@ extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
+ BlockNumber heapBlk,
XLogRecPtr recptr,
Buffer vmBuf,
TransactionId cutoff_xid,
--
2.43.0
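A minimal toy model of the contract in the visibilitymap_set() header comment above: the caller sets PD_ALL_VISIBLE first, all-frozen is never set without all-visible, and the function returns the bits that were set before the call. The ToyHeapPage/ToyVMPage structs and toy_vm_set() are hypothetical. Only the flag values and the two-bits-per-heap-block layout are meant to match visibilitymapdefs.h and visibilitymap.c. The sketch does no locking and emits no WAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Layout as in visibilitymap.c: two bits per heap block, four blocks per byte */
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE 4

/* Hypothetical stand-ins for a heap page header and a (tiny) VM page */
typedef struct ToyHeapPage
{
    bool        pd_all_visible;
} ToyHeapPage;

typedef struct ToyVMPage
{
    uint8_t     map[8];         /* covers heap blocks 0..31 */
    bool        dirty;
} ToyVMPage;

/* Hypothetical analogue of visibilitymap_set(): no locking, no WAL */
static uint8_t
toy_vm_set(ToyVMPage *vm, const ToyHeapPage *heap, uint32_t heapBlk,
           uint8_t flags)
{
    uint32_t    mapByte;
    uint32_t    mapOffset;
    uint8_t     old;

    assert(heapBlk < 8 * HEAPBLOCKS_PER_BYTE);
    assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
    /* Must never set all_frozen bit without also setting all_visible bit */
    assert(flags != VISIBILITYMAP_ALL_FROZEN);
    /* Caller must already have set PD_ALL_VISIBLE on the heap page */
    assert(heap->pd_all_visible);

    mapByte = heapBlk / HEAPBLOCKS_PER_BYTE;
    mapOffset = BITS_PER_HEAPBLOCK * (heapBlk % HEAPBLOCKS_PER_BYTE);
    old = (vm->map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;

    /* Only OR in the bits, and dirty the page, if the request differs */
    if (flags != old)
    {
        vm->map[mapByte] |= (uint8_t) (flags << mapOffset);
        vm->dirty = true;
    }

    /* Report the bits as they were before this call */
    return old;
}
```

Returning the old bits is what lets callers such as lazy_scan_prune() tell whether they newly set a page all-visible or all-frozen.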
[text/x-patch] v14-0014-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (21.5K, 14-v14-0014-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 1fc1a338e5d6621f89df46fe29d08c799267b39d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 15:52:18 -0400
Subject: [PATCH v14 14/24] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase III
Instead of emitting a separate XLOG_HEAP2_VISIBLE record for each page
that is rendered all-visible by vacuum's third phase, include the
updates to the VM in the already emitted XLOG_HEAP2_PRUNE_VACUUM_CLEANUP
record.
The visibility map bits are stored in the flags member of the
xl_heap_prune struct, which is widened from uint8 to uint16 to make room.
This can decrease the number of WAL records vacuum phase III emits by
as much as half.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam_xlog.c | 147 ++++++++++++++++++-------
src/backend/access/heap/pruneheap.c | 37 ++++++-
src/backend/access/heap/vacuumlazy.c | 38 +++----
src/backend/access/rmgrdesc/heapdesc.c | 11 +-
src/include/access/heapam.h | 1 +
src/include/access/heapam_xlog.h | 25 ++++-
6 files changed, 190 insertions(+), 69 deletions(-)
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 68b41f39e69..c1f332f7a9a 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -35,7 +35,9 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Buffer buffer;
RelFileLocator rlocator;
BlockNumber blkno;
- XLogRedoAction action;
+ Buffer vmbuffer = InvalidBuffer;
+ uint8 vmflags = 0;
+ Size freespace = 0;
XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
memcpy(&xlrec, maindataptr, SizeOfHeapPrune);
@@ -50,11 +52,22 @@ heap_xlog_prune_freeze(XLogReaderState *record)
Assert((xlrec.flags & XLHP_CLEANUP_LOCK) != 0 ||
(xlrec.flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
+ if (xlrec.flags & XLHP_VM_ALL_VISIBLE)
+ {
+ vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (xlrec.flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ }
+
/*
- * We are about to remove and/or freeze tuples. In Hot Standby mode,
- * ensure that there are no queries running for which the removed tuples
- * are still visible or which still consider the frozen xids as running.
- * The conflict horizon XID comes after xl_heap_prune.
+ * After xl_heap_prune is the optional snapshot conflict horizon.
+ *
+ * In Hot Standby mode, we must ensure that there are no running queries
+ * which would conflict with the changes in this record. That means we
+ * can't replay this record if it removes tuples that are still visible to
+ * transactions on the standby, freeze tuples with xids that are still
+ * considered running on the standby, or set a page as all-visible in the
+ * VM if it isn't all-visible to all transactions on the standby.
*/
if ((xlrec.flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
{
@@ -71,12 +84,12 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we have a full-page image, restore it and we're done.
+ * If we have a full-page image of the heap block, restore it and we're
+ * done with the heap block.
*/
- action = XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
- (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
- &buffer);
- if (action == BLK_NEEDS_REDO)
+ if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
+ (xlrec.flags & XLHP_CLEANUP_LOCK) != 0,
+ &buffer) == BLK_NEEDS_REDO)
{
Page page = BufferGetPage(buffer);
OffsetNumber *redirected;
@@ -100,6 +113,11 @@ heap_xlog_prune_freeze(XLogReaderState *record)
do_prune = nredirected > 0 || ndead > 0 || nunused > 0;
+ /* Ensure the record does something */
+ Assert(do_prune || nplans > 0 ||
+ vmflags & VISIBILITYMAP_VALID_BITS ||
+ xlrec.flags & XLHP_SET_PD_ALL_VIS);
+
/*
* Update all line pointers per the record, and repair fragmentation
* if needed.
@@ -147,15 +165,23 @@ heap_xlog_prune_freeze(XLogReaderState *record)
* page-level PD_ALL_VISIBLE bit is clear. If that were to occur,
* then a subsequent page modification would fail to clear the
* visibility map bit.
+ *
+ * Note: we don't worry about updating the page's prunability hints.
+ * At worst this will cause an extra prune cycle to occur soon.
*/
if (xlrec.flags & XLHP_SET_PD_ALL_VIS)
PageSetAllVisible(page);
/*
- * Note: we don't worry about updating the page's prunability hints.
- * At worst this will cause an extra prune cycle to occur soon.
+ * We must never end up with the VM bit set and the page-level
+ * PD_ALL_VISIBLE bit clear. If that were to occur, a subsequent page
+ * modification would fail to clear the VM bit.
*/
- MarkBufferDirty(buffer);
+ Assert(!(vmflags & VISIBILITYMAP_VALID_BITS) || PageIsAllVisible(page));
+
+ /* If this record only sets the VM, no need to dirty the heap page */
+ if (do_prune || nplans > 0 || xlrec.flags & XLHP_SET_PD_ALL_VIS)
+ MarkBufferDirty(buffer);
/*
* We always emit a WAL record when setting PD_ALL_VISIBLE, but we are
@@ -171,47 +197,94 @@ heap_xlog_prune_freeze(XLogReaderState *record)
}
/*
- * If we released any space or line pointers or set PD_ALL_VISIBLE update
- * the freespace map.
+ * If we released any space or line pointers or set PD_ALL_VISIBLE or the
+ * VM, update the freespace map.
*
- * Even if we are just setting PD_ALL_VISIBLE (and thus not freeing up any
- * space), we'll still update the FSM for this page. Since the FSM is not
- * WAL-logged and only updated heuristically, it easily becomes stale in
- * standbys. If the standby is later promoted and runs VACUUM, it will
- * skip updating individual free space figures for pages that became
- * all-visible (or all-frozen, depending on the vacuum mode,) which is
- * troublesome when FreeSpaceMapVacuum propagates too optimistic free
- * space values to upper FSM layers; later inserters try to use such pages
- * only to find out that they are unusable. This can cause long stalls
- * when there are many such pages.
+ * Even if we are just setting PD_ALL_VISIBLE or updating the VM (and thus
+ * not freeing up any space), we'll still update the FSM for this page.
+ * Since the FSM is not WAL-logged and only updated heuristically, it
+ * easily becomes stale in standbys. If the standby is later promoted and
+ * runs VACUUM, it will skip updating individual free space figures for
+ * pages that became all-visible (or all-frozen, depending on the vacuum
+ * mode,) which is troublesome when FreeSpaceMapVacuum propagates too
+ * optimistic free space values to upper FSM layers; later inserters try
+ * to use such pages only to find out that they are unusable. This can
+ * cause long stalls when there are many such pages.
*
* Forestall those problems by updating FSM's idea about a page that is
* becoming all-visible or all-frozen.
*
* Do this regardless of a full-page image being applied, since the FSM
* data is not in the page anyway.
+ *
+ * We want to avoid holding an exclusive lock on the heap buffer while
+ * doing IO (either of the FSM or the VM), so we'll release the lock on
+ * the heap buffer before doing either.
*/
if (BufferIsValid(buffer))
{
if (xlrec.flags & (XLHP_HAS_REDIRECTIONS |
XLHP_HAS_DEAD_ITEMS |
XLHP_HAS_NOW_UNUSED_ITEMS |
- XLHP_SET_PD_ALL_VIS))
- {
- Size freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
+ XLHP_SET_PD_ALL_VIS |
+ (vmflags & VISIBILITYMAP_VALID_BITS)))
+ freespace = PageGetHeapFreeSpace(BufferGetPage(buffer));
- /*
- * We want to avoid holding an exclusive lock on the heap buffer
- * while doing IO, so we'll release the lock on the heap buffer
- * first.
- */
- UnlockReleaseBuffer(buffer);
+ UnlockReleaseBuffer(buffer);
+ }
- XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
+ /*
+ * Now read and update the VM block.
+ *
+ * Note that the heap relation may have been dropped or truncated, leading
+ * us to skip updating the heap block due to the LSN interlock. However,
+ * even in that case, it's still safe to update the visibility map. Any
+ * WAL record that clears the visibility map bit does so before checking
+ * the page LSN, so any bits that need to be cleared will still be
+ * cleared.
+ *
+ * Note that the lock on the heap page was dropped above. In normal
+ * operation this would never be safe because a concurrent query could
+ * modify the heap page and clear PD_ALL_VISIBLE -- violating the
+ * invariant that PD_ALL_VISIBLE must be set if the corresponding bit in
+ * the VM is set.
+ *
+ * In recovery, we expect no other writers, so writing to the VM page
+ * without holding a lock on the heap page is considered safe enough. It
+ * is done this way when replaying xl_heap_visible records (see
+ * heap_xlog_visible()).
+ */
+ if (vmflags & VISIBILITYMAP_VALID_BITS &&
+ XLogReadBufferForRedoExtended(record, 1,
+ RBM_ZERO_ON_ERROR,
+ false,
+ &vmbuffer) == BLK_NEEDS_REDO)
+ {
+ Page vmpage = BufferGetPage(vmbuffer);
+ uint8 old_vmbits = 0;
+ Relation reln = CreateFakeRelcacheEntry(rlocator);
+
+ /* initialize the page if it was read as zeros */
+ if (PageIsNew(vmpage))
+ PageInit(vmpage, BLCKSZ, 0);
+
+ old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+
+ /* Only set VM page LSN if we modified the page */
+ if (old_vmbits != vmflags)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), lsn);
}
- else
- UnlockReleaseBuffer(buffer);
+
+ FreeFakeRelcacheEntry(reln);
}
+
+ if (BufferIsValid(vmbuffer))
+ UnlockReleaseBuffer(vmbuffer);
+
+ if (freespace > 0)
+ XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9b25131543b..9e00fbf3cd1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -20,6 +20,7 @@
#include "access/multixact.h"
#include "access/transam.h"
#include "access/xlog.h"
+#include "access/visibilitymapdefs.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
#include "executor/instrument.h"
@@ -913,6 +914,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
conflict_xid = prstate.latest_xid_removed;
log_heap_prune_and_freeze(relation, buffer,
+ InvalidBuffer, 0,
conflict_xid,
true,
do_set_pd_vis,
@@ -2088,14 +2090,18 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*
* This is used for several different page maintenance operations:
*
- * - Page pruning, in VACUUM's 1st pass or on access: Some items are
+ * - Page pruning, in vacuum phase I or on-access: Some items are
* redirected, some marked dead, and some removed altogether.
*
- * - Freezing: Items are marked as 'frozen'.
+ * - Freezing: During vacuum phase I, items are marked as 'frozen'.
*
- * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
+ * - Reaping: During vacuum phase III, items that are already LP_DEAD are
+ * marked as unused.
*
- * They have enough commonalities that we use a single WAL record for them
+ * - VM updates: After vacuum phase III, the heap page may be marked
+ * all-visible and all-frozen.
+ *
+ * These changes all happen together, so we use a single WAL record for them
* all.
*
* If replaying the record requires a cleanup lock, pass cleanup_lock = true.
@@ -2103,6 +2109,10 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* replaying 'unused' items depends on whether they were all previously marked
* as dead.
*
+ * If the VM is being updated, vmflags will contain the bits to set. In this
+ * case, vmbuffer should already have been updated and marked dirty and should
+ * still be pinned and locked.
+ *
* set_pd_all_vis indicates that we set PD_ALL_VISIBLE and thus should update
* the page LSN when checksums/wal_log_hints are enabled even if we did not
* prune or freeze tuples on the page.
@@ -2113,6 +2123,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
*/
void
log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
bool set_pd_all_vis,
@@ -2139,6 +2150,8 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
xlrec.flags = 0;
regbuf_flags = REGBUF_STANDARD;
+ Assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
+
/*
* We can avoid an FPI if the only modification we are making to the heap
* page is to set PD_ALL_VISIBLE and checksums/wal_log_hints are disabled.
@@ -2157,6 +2170,10 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
*/
XLogBeginInsert();
XLogRegisterBuffer(0, buffer, regbuf_flags);
+
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ XLogRegisterBuffer(1, vmbuffer, 0);
+
if (nfrozen > 0)
{
int nplans;
@@ -2213,6 +2230,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
* Prepare the main xl_heap_prune record. We already set the XLHP_HAS_*
* flag above.
*/
+ if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
+ {
+ xlrec.flags |= XLHP_VM_ALL_VISIBLE;
+ if (vmflags & VISIBILITYMAP_ALL_FROZEN)
+ xlrec.flags |= XLHP_VM_ALL_FROZEN;
+ }
if (set_pd_all_vis)
xlrec.flags |= XLHP_SET_PD_ALL_VIS;
if (RelationIsAccessibleInLogicalDecoding(relation))
@@ -2247,6 +2270,12 @@ log_heap_prune_and_freeze(Relation relation, Buffer buffer,
}
recptr = XLogInsert(RM_HEAP2_ID, info);
+ if (vmflags & VISIBILITYMAP_VALID_BITS)
+ {
+ Assert(BufferIsDirty(vmbuffer));
+ PageSetLSN(BufferGetPage(vmbuffer), recptr);
+ }
+
/*
* We must bump the page LSN if pruning or freezing. If we are only
* updating PD_ALL_VISIBLE, though, we can skip doing this unless
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a0f3984e37f..b6c973cd111 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1906,6 +1906,8 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
else
log_heap_prune_and_freeze(vacrel->rel, buf,
+ InvalidBuffer,
+ 0,
InvalidTransactionId, /* conflict xid */
false, /* cleanup lock */
true, /* set_pd_all_vis */
@@ -2817,6 +2819,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
OffsetNumber unused[MaxHeapTuplesPerPage];
int nunused = 0;
TransactionId visibility_cutoff_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
bool all_frozen;
LVSavedErrInfo saved_err_info;
uint8 vmflags = 0;
@@ -2842,6 +2845,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
vmflags |= VISIBILITYMAP_ALL_FROZEN;
Assert(!TransactionIdIsValid(visibility_cutoff_xid));
}
+
+ /* Take the lock on the vmbuffer before entering a critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
}
START_CRIT_SECTION();
@@ -2868,7 +2874,13 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* setting the VM, we must set PD_ALL_VISIBLE as well.
*/
if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
+ {
PageSetAllVisible(page);
+ visibilitymap_set_vmbits(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
+ conflict_xid = visibility_cutoff_xid;
+ }
/*
* Mark buffer dirty before we write WAL.
@@ -2879,7 +2891,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if (RelationNeedsWAL(vacrel->rel))
{
log_heap_prune_and_freeze(vacrel->rel, buffer,
- InvalidTransactionId,
+ vmbuffer, vmflags,
+ conflict_xid,
false, /* no cleanup lock required */
(vmflags & VISIBILITYMAP_VALID_BITS) != 0,
PRUNE_VACUUM_CLEANUP,
@@ -2889,36 +2902,17 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
unused, nunused);
}
- /*
- * Note that we don't end the critical section until after emitting the VM
- * record. This ensures both PD_ALL_VISIBLE and the VM bits are set or
- * unset in the event of a crash. While it is correct for PD_ALL_VISIBLE
- * to be set and the VM to be clear, we should do our best to keep these
- * in sync. This does mean that we will take a lock on the VM buffer
- * inside of a critical section, which is generally discouraged. There is
- * precedent for this in other callers of visibilitymap_set(), though.
- */
+ END_CRIT_SECTION();
- /*
- * Now that we have removed the LP_DEAD items from the page, set the
- * visibility map if the page became all-visible/all-frozen. Changes to
- * the heap page have already been logged.
- */
if ((vmflags & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- visibilitymap_set(vacrel->rel, blkno,
- InvalidXLogRecPtr,
- vmbuffer, visibility_cutoff_xid,
- vmflags);
-
/* Count the newly set VM page for logging */
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
vacrel->vm_new_visible_pages++;
if (all_frozen)
vacrel->vm_new_visible_frozen_pages++;
}
- END_CRIT_SECTION();
-
/* Revert to the previous phase information for error traceback */
restore_vacuum_error_info(vacrel, &saved_err_info);
}
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index b48d7dc1d24..1cb44ca32d3 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -103,7 +103,7 @@ plan_elem_desc(StringInfo buf, void *plan, void *data)
* code, the latter of which is used in frontend (pg_waldump) code.
*/
void
-heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
@@ -287,6 +287,15 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
appendStringInfo(buf, ", isCatalogRel: %c",
xlrec->flags & XLHP_IS_CATALOG_REL ? 'T' : 'F');
+ if (xlrec->flags & XLHP_VM_ALL_VISIBLE)
+ {
+ uint8 vmflags = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (xlrec->flags & XLHP_VM_ALL_FROZEN)
+ vmflags |= VISIBILITYMAP_ALL_FROZEN;
+ appendStringInfo(buf, ", vm_flags: 0x%02X", vmflags);
+ }
+
if (XLogRecHasBlockData(record, 0))
{
Size datalen;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2f77d8dbcd6..be66970c9f0 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -389,6 +389,7 @@ extern void heap_page_prune_execute(Buffer buffer, bool lp_truncate_only,
OffsetNumber *nowunused, int nunused);
extern void heap_get_root_tuples(Page page, OffsetNumber *root_offsets);
extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, uint8 vmflags,
TransactionId conflict_xid,
bool cleanup_lock,
bool set_pd_all_vis,
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 82b8f7f2bbc..833114e0a6e 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -249,7 +249,7 @@ typedef struct xl_heap_update
* Main data section:
*
* xl_heap_prune
- * uint8 flags
+ * uint16 flags
* TransactionId snapshot_conflict_horizon
*
* Block 0 data section:
@@ -284,7 +284,7 @@ typedef struct xl_heap_update
*/
typedef struct xl_heap_prune
{
- uint8 flags;
+ uint16 flags;
/*
* If XLHP_HAS_CONFLICT_HORIZON is set, the conflict horizon XID follows,
@@ -292,11 +292,17 @@ typedef struct xl_heap_prune
*/
} xl_heap_prune;
-#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint8))
+#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16))
#define XLHP_SET_PD_ALL_VIS (1 << 0)
-/* to handle recovery conflict during logical decoding on standby */
+/*
+ * To handle recovery conflict during logical decoding on standby, we must know
+ * if the table is a catalog table. Note that visibilitymapdefs.h separately
+ * defines VISIBILITYMAP_XLOG_CATALOG_REL as (1 << 2). xl_heap_prune records
+ * should use XLHP_IS_CATALOG_REL, not VISIBILITYMAP_XLOG_CATALOG_REL, even if
+ * they only contain updates to the VM.
+ */
#define XLHP_IS_CATALOG_REL (1 << 1)
/*
@@ -332,6 +338,15 @@ typedef struct xl_heap_prune
#define XLHP_HAS_DEAD_ITEMS (1 << 6)
#define XLHP_HAS_NOW_UNUSED_ITEMS (1 << 7)
+/*
+ * The xl_heap_prune record's flags may also contain which VM bits to set.
+ * xl_heap_prune should always use the XLHP_VM_ALL_VISIBLE and
+ * XLHP_VM_ALL_FROZEN flags and translate them to their visibilitymapdefs.h
+ * equivalents, VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.
+ */
+#define XLHP_VM_ALL_VISIBLE (1 << 8)
+#define XLHP_VM_ALL_FROZEN (1 << 9)
+
/*
* xlhp_freeze_plan describes how to freeze a group of one or more heap tuples
* (appears in xl_heap_prune's xlhp_freeze_plans sub-record)
@@ -498,7 +513,7 @@ extern XLogRecPtr log_heap_visible(Relation rel,
uint8 vmflags);
/* in heapdesc.c, so it can be shared between frontend/backend code */
-extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
+extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
OffsetNumber **frz_offsets,
int *nredirected, OffsetNumber **redirected,
--
2.43.0
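The commit message above says the VM bits now travel in xl_heap_prune's flags. The sketch below shows that round trip as log_heap_prune_and_freeze() and heap_xlog_prune_freeze() in the diff appear to perform it, and why flags had to grow to uint16: XLHP_VM_ALL_VISIBLE is (1 << 8), which does not fit in a uint8. encode_vmflags() and decode_vmflags() are hypothetical helper names; the patch open-codes this logic in both places.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as in visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Values as in the patched heapam_xlog.h */
#define XLHP_SET_PD_ALL_VIS (1 << 0)
#define XLHP_IS_CATALOG_REL (1 << 1)
#define XLHP_VM_ALL_VISIBLE (1 << 8)   /* needs more than 8 bits */
#define XLHP_VM_ALL_FROZEN  (1 << 9)

typedef struct xl_heap_prune
{
    uint16_t    flags;
} xl_heap_prune;

#define SizeOfHeapPrune (offsetof(xl_heap_prune, flags) + sizeof(uint16_t))

/* Hypothetical helper: translate VM bits into record flags (WAL insert side) */
static uint16_t
encode_vmflags(uint16_t xlflags, uint8_t vmflags)
{
    assert((vmflags & VISIBILITYMAP_VALID_BITS) == vmflags);
    if (vmflags & VISIBILITYMAP_ALL_VISIBLE)
    {
        xlflags |= XLHP_VM_ALL_VISIBLE;
        if (vmflags & VISIBILITYMAP_ALL_FROZEN)
            xlflags |= XLHP_VM_ALL_FROZEN;
    }
    return xlflags;
}

/* Hypothetical helper: translate record flags back into VM bits (redo side) */
static uint8_t
decode_vmflags(uint16_t xlflags)
{
    uint8_t     vmflags = 0;

    if (xlflags & XLHP_VM_ALL_VISIBLE)
    {
        vmflags = VISIBILITYMAP_ALL_VISIBLE;
        if (xlflags & XLHP_VM_ALL_FROZEN)
            vmflags |= VISIBILITYMAP_ALL_FROZEN;
    }
    return vmflags;
}
```

Keeping separate XLHP_VM_* bits, rather than reusing the VISIBILITYMAP_* values directly, avoids colliding with the low XLHP_* bits already in use, such as XLHP_IS_CATALOG_REL.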
[text/x-patch] v14-0013-Make-heap_page_is_all_visible-independent-of-LVR.patch (6.8K, 15-v14-0013-Make-heap_page_is_all_visible-independent-of-LVR.patch)
From 2f820f93bfe273ed9b9867d3ddc9f4c67dd94296 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 15:39:31 -0400
Subject: [PATCH v14 13/24] Make heap_page_is_all_visible independent of
LVRelState
Future commits will use this function inside of pruneheap.c where we do
not have access to the LVRelState. We only need a few parameters from
the LVRelState, so just pass those in explicitly.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 57 ++++++++++++++++++----------
1 file changed, 37 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 735f1e7501e..a0f3984e37f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,13 +463,18 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
- TransactionId *visibility_cutoff_xid, bool *all_frozen);
-static bool heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+static bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
- TransactionId *visibility_cutoff_xid);
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2030,8 +2035,9 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(presult.lpdead_items == 0);
- if (!heap_page_is_all_visible(vacrel, buf,
- &debug_cutoff, &debug_all_frozen))
+ if (!heap_page_is_all_visible(vacrel->rel, buf,
+ vacrel->cutoffs.OldestXmin, &debug_all_frozen,
+ &debug_cutoff, &vacrel->offnum))
Assert(false);
Assert(presult.all_frozen == debug_all_frozen);
@@ -2824,9 +2830,11 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
VACUUM_ERRCB_PHASE_VACUUM_HEAP, blkno,
InvalidOffsetNumber);
- if (heap_page_would_be_all_visible(vacrel, buffer,
+ if (heap_page_would_be_all_visible(vacrel->rel, buffer,
+ vacrel->cutoffs.OldestXmin,
deadoffsets, num_offsets,
- &all_frozen, &visibility_cutoff_xid))
+ &all_frozen, &visibility_cutoff_xid,
+ &vacrel->offnum))
{
vmflags |= VISIBILITYMAP_ALL_VISIBLE;
if (all_frozen)
@@ -3576,15 +3584,19 @@ dead_items_cleanup(LVRelState *vacrel)
* callers that expect no LP_DEAD on the page.
*/
static bool
-heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
TransactionId *visibility_cutoff_xid,
- bool *all_frozen)
+ OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(vacrel, buf,
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
NULL, 0,
all_frozen,
- visibility_cutoff_xid);
+ visibility_cutoff_xid,
+ logging_offnum);
}
/*
@@ -3599,7 +3611,7 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * vacrel->cutoffs.OldestXmin is used to determine visibility.
+ * OldestXmin is used to determine visibility.
*
* *all_frozen is an output parameter indicating to the caller if every tuple
* on the page is frozen.
@@ -3607,6 +3619,9 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* *visibility_cutoff_xid is an output parameter with the highest xmin amongst the
* visible tuples. It is only valid if the page is all-visible.
*
+ * *logging_offnum is set to the offset of the tuple currently being
+ * processed, for use by vacuum's error callback system.
+ *
* Callers looking to verify that the page is already all-visible can call
* heap_page_is_all_visible().
*
@@ -3616,11 +3631,13 @@ heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
* to avoid introducing new side-effects here.
*/
static bool
-heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
+heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
- TransactionId *visibility_cutoff_xid)
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
{
Page page = BufferGetPage(buf);
BlockNumber blockno = BufferGetBlockNumber(buf);
@@ -3655,7 +3672,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
* Set the offset number so that we can display it along with any
* error that occurred while processing this tuple.
*/
- vacrel->offnum = offnum;
+ *logging_offnum = offnum;
itemid = PageGetItemId(page, offnum);
/* Unused or redirect line pointers are of no interest */
@@ -3685,9 +3702,9 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
tuple.t_len = ItemIdGetLength(itemid);
- tuple.t_tableOid = RelationGetRelid(vacrel->rel);
+ tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->cutoffs.OldestXmin,
+ switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
buf))
{
case HEAPTUPLE_LIVE:
@@ -3708,7 +3725,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
if (!TransactionIdPrecedes(xmin,
- vacrel->cutoffs.OldestXmin))
+ OldestXmin))
{
all_visible = false;
*all_frozen = false;
@@ -3743,7 +3760,7 @@ heap_page_would_be_all_visible(LVRelState *vacrel, Buffer buf,
} /* scan along page */
/* Clear the offset information once we have processed the given page. */
- vacrel->offnum = InvalidOffsetNumber;
+ *logging_offnum = InvalidOffsetNumber;
return all_visible;
}
--
2.43.0
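A simplified, self-contained sketch of the shape heap_page_would_be_all_visible() takes after this refactor: the visibility horizon comes in as an explicit OldestXmin, and the offset for the error callback goes out through *logging_offnum instead of vacrel->offnum. ToyTuple and toy_page_is_all_visible() are hypothetical. The sketch compares xids as plain integers (no wraparound), treats every tuple as committed and live, and takes frozenness from a per-tuple flag rather than from the real freezing checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; real xids wrap around and are compared differently */
typedef uint32_t TransactionId;
typedef uint16_t OffsetNumber;
#define InvalidTransactionId ((TransactionId) 0)
#define InvalidOffsetNumber ((OffsetNumber) 0)

/* Hypothetical tuple: only the fields this sketch needs */
typedef struct ToyTuple
{
    TransactionId xmin;
    bool        frozen;
} ToyTuple;

/*
 * Hypothetical analogue of heap_page_is_all_visible(): no LVRelState, the
 * horizon is passed explicitly and the current offset is reported through
 * *logging_offnum so the caller's error callback can display it.
 */
static bool
toy_page_is_all_visible(const ToyTuple *tuples, int ntuples,
                        TransactionId OldestXmin,
                        bool *all_frozen,
                        TransactionId *visibility_cutoff_xid,
                        OffsetNumber *logging_offnum)
{
    bool        all_visible = true;

    *all_frozen = true;
    *visibility_cutoff_xid = InvalidTransactionId;

    for (int i = 0; i < ntuples; i++)
    {
        /* Offsets are 1-based, like OffsetNumber */
        *logging_offnum = (OffsetNumber) (i + 1);

        /* Inserter not yet visible to everyone: page is not all-visible */
        if (tuples[i].xmin >= OldestXmin)
        {
            all_visible = false;
            *all_frozen = false;
            break;
        }

        /* Track the newest xmin among the visible tuples */
        if (tuples[i].xmin > *visibility_cutoff_xid)
            *visibility_cutoff_xid = tuples[i].xmin;

        if (!tuples[i].frozen)
            *all_frozen = false;
    }

    /* Clear the offset once the page has been processed */
    *logging_offnum = InvalidOffsetNumber;
    return all_visible;
}
```

A caller holding an LVRelState would simply pass &vacrel->offnum, while a caller in pruneheap.c could pass the address of a local variable.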
[text/x-patch] v14-0015-Set-empty-pages-all-visible-in-XLOG_HEAP2_PRUNE_.patch (3.3K, 16-v14-0015-Set-empty-pages-all-visible-in-XLOG_HEAP2_PRUNE_.patch)
From ed61f88812f33cb96cebeabc5c9c43a11cdd5a3e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 16:04:18 -0400
Subject: [PATCH v14 15/24] Set empty pages all-visible in
XLOG_HEAP2_PRUNE_VACUUM_SCAN record
As part of a project to eliminate XLOG_HEAP2_VISIBLE records, stop
emitting them when vacuum phase I sets empty pages all-visible. The VM
update is now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
---
src/backend/access/heap/vacuumlazy.c | 55 +++++++++++++++++-----------
1 file changed, 34 insertions(+), 21 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b6c973cd111..e01fc5bb502 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1882,11 +1882,21 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ bool set_pd_all_vis = true;
+
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
PageSetAllVisible(page);
MarkBufferDirty(buf);
+ visibilitymap_set_vmbits(vacrel->rel, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
+
if (RelationNeedsWAL(vacrel->rel))
{
/*
@@ -1897,34 +1907,37 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
* all-visible and find that the page isn't initialized, which
* will cause a PANIC. To prevent that, check whether the page
* has been previously WAL-logged, and if not, do that now.
- *
- * Otherwise, just emit WAL for setting PD_ALL_VISIBLE on the
- * heap page. Doing this in a separate record from setting the
- * VM allows us to omit the heap page from the VM WAL chain.
*/
if (PageGetLSN(page) == InvalidXLogRecPtr)
+ {
log_newpage_buffer(buf, true);
- else
- log_heap_prune_and_freeze(vacrel->rel, buf,
- InvalidBuffer,
- 0,
- InvalidTransactionId, /* conflict xid */
- false, /* cleanup lock */
- true, /* set_pd_all_vis */
- PRUNE_VACUUM_SCAN, /* reason */
- NULL, 0,
- NULL, 0,
- NULL, 0,
- NULL, 0);
+ set_pd_all_vis = false;
+ }
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM. If we emitted a new page record for the
+ * page above, setting PD_ALL_VISIBLE will already have been
+ * included in that record.
+ */
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ set_pd_all_vis,
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
}
- visibilitymap_set(vacrel->rel, blkno,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
[text/x-patch] v14-0016-Set-VM-in-heap_page_prune_and_freeze.patch (22.3K, 17-v14-0016-Set-VM-in-heap_page_prune_and_freeze.patch)
From 6d11a7bf77706bc4ddbdb156f25f9c53d4b1e615 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 16 Sep 2025 15:46:40 -0400
Subject: [PATCH v14 16/24] Set VM in heap_page_prune_and_freeze
The determination as to whether or not the page can be set
all-visible/all-frozen has already been done by the end of
heap_page_prune_and_freeze(). Previously, however, vacuum waited until it
returned to lazy_scan_prune() to actually set the VM.
This commit moves setting the VM into heap_page_prune_and_freeze().
There are still two separate WAL records -- one for the changes to the
heap page and one for the changes to the VM. But this is an incremental
step toward logging setting the VM in the same WAL record as pruning and
freezing.
Note that this is not used by on-access pruning.
---
src/backend/access/heap/pruneheap.c | 221 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 146 ++----------------
src/include/access/heapam.h | 24 +--
3 files changed, 221 insertions(+), 170 deletions(-)
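This patch moves two decisions around: which VM bits to set, and how to count newly set bits afterwards. Both can be modeled in isolation. This is a minimal sketch under simplified inputs. choose_new_vmbits(), count_vm_change() and VmCounters are hypothetical names. vm_all_frozen stands in for the VM_ALL_FROZEN() lookup, and the corruption checks in heap_page_will_set_vis() are left out. The flag values match visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in visibilitymapdefs.h. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/*
 * Hypothetical simplification of the test in heap_page_will_set_vis():
 * returns the VM bits to set, or 0 if the VM should not be touched.
 */
static uint8_t
choose_new_vmbits(bool all_visible, bool all_frozen,
				  bool blk_known_av, bool vm_all_frozen)
{
	uint8_t		flags = 0;

	/* all-frozen implies all-visible, as asserted in pruneheap.c */
	assert(!all_frozen || all_visible);

	if ((all_visible && !blk_known_av) || (all_frozen && !vm_all_frozen))
	{
		flags = VISIBILITYMAP_ALL_VISIBLE;
		if (all_frozen)
			flags |= VISIBILITYMAP_ALL_FROZEN;
	}
	return flags;
}

/* Hypothetical counters mirroring the vacrel->vm_new_* fields. */
typedef struct VmCounters
{
	int			new_visible;
	int			new_visible_frozen;
	int			new_frozen;
} VmCounters;

/*
 * Count newly set bits from old/new VM bits, following the logic that
 * lazy_scan_prune() uses after this patch. Returns true if the page newly
 * became all-frozen (the *vm_page_frozen output).
 */
static bool
count_vm_change(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		c->new_visible++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			c->new_visible_frozen++;
			return true;
		}
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Setting all-frozen without all-visible would be a bug. */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->new_frozen++;
		return true;
	}
	return false;
}
```

old_vmbits comes from visibilitymap_set()'s return value, not from the earlier all_visible_according_to_vm snapshot. As a result, the counters stay accurate even if the VM changed between heap_vac_scan_next_block() and pruning.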
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9e00fbf3cd1..e3f9967e26c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,6 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/visibilitymapdefs.h"
#include "access/xloginsert.h"
@@ -257,7 +258,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* not the relation has indexes, since we cannot safely determine
* that during on-access pruning with the current implementation.
*/
- heap_page_prune_and_freeze(relation, buffer, PRUNE_ON_ACCESS, 0, NULL,
+ heap_page_prune_and_freeze(relation, buffer,
+ InvalidBuffer, false,
+ PRUNE_ON_ACCESS, 0, NULL,
vistest, &presult, &dummy_off_loc, NULL, NULL);
/*
@@ -423,16 +426,115 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Determine whether to set the visibility map bits based on information from
+ * the PruneState and blk_known_av, which some callers will provide after
+ * previously examining this heap page's VM bits (e.g. vacuum from the last
+ * heap_vac_scan_next_block() call).
+ *
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
+ * Returns true if the caller should set one or both of the VM bits and false
+ * otherwise.
+ */
+static bool
+heap_page_will_set_vis(Relation relation,
+ BlockNumber heap_blk,
+ Buffer heap_buf,
+ Buffer vmbuffer,
+ bool blk_known_av,
+ PruneState *prstate,
+ uint8 *vmflags,
+ bool *do_set_pd_vis)
+{
+ Page heap_page = BufferGetPage(heap_buf);
+ bool do_set_vm = false;
+
+ if (prstate->all_visible && !PageIsAllVisible(heap_page))
+ *do_set_pd_vis = true;
+
+ if ((prstate->all_visible && !blk_known_av) ||
+ (prstate->all_frozen && !VM_ALL_FROZEN(relation, heap_blk, &vmbuffer)))
+ {
+ *vmflags = VISIBILITYMAP_ALL_VISIBLE;
+ if (prstate->all_frozen)
+ *vmflags |= VISIBILITYMAP_ALL_FROZEN;
+
+ do_set_vm = true;
+ }
+
+ /*
+ * Now handle two potential corruption cases:
+ *
+ * These do not need to happen in a critical section and are not
+ * WAL-logged.
+ *
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that in vacuum the bit
+ * got cleared after heap_vac_scan_next_block() was called, so we must
+ * recheck with buffer lock before concluding that the VM is corrupt.
+ */
+ else if (blk_known_av && !PageIsAllVisible(heap_page) &&
+ visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (prstate->lpdead_items > 0 && PageIsAllVisible(heap_page))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(relation), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buf);
+ visibilitymap_clear(relation, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ }
+
+ /* We should only set the VM if PD_ALL_VISIBLE is set or will be */
+ Assert(!do_set_vm || PageIsAllVisible(heap_page) || *do_set_pd_vis);
+
+ return do_set_vm;
+}
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
* also need to account for a reduction in the length of the line pointer
* array following array truncation by us.
*
+ * vmbuffer is the buffer that must already contain the required block
+ * of the visibility map if we are to update it. blk_known_av is the
+ * visibility status of the heap block as of the last call to
+ * find_next_unskippable_block().
+ *
* reason indicates why the pruning is performed. It is included in the WAL
* record for debugging and analysis purposes, but otherwise has no effect.
*
@@ -443,15 +545,20 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* FREEZE indicates that we will also freeze tuples, and will return
* 'all_visible', 'all_frozen' flags to the caller.
*
- * If the HEAP_PRUNE_FREEZE option is set, we will freeze tuples if it's
+ * UPDATE_VIS indicates that we will set the page's status in the VM.
+ *
+ * If the HEAP_PRUNE_FREEZE option is set, we will also freeze tuples if it's
* required in order to advance relfrozenxid / relminmxid, or if it's
* considered advantageous for overall system performance to do so now. The
* 'cutoffs', 'presult', 'new_relfrozen_xid' and 'new_relmin_mxid' arguments
- * are required when freezing. When HEAP_PRUNE_FREEZE option is set, we also
- * set presult->all_visible and presult->all_frozen on exit, to indicate if
- * the VM bits can be set. They are always set to false when the
- * HEAP_PRUNE_FREEZE option is not set, because at the moment only callers
- * that also freeze need that information.
+ * are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VIS is set and the visibility status of the page
+ * has changed, we will update the VM at the same time as pruning and freezing
+ * the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
+ *
*
* cutoffs contains the freeze cutoffs, established by VACUUM at the beginning
* of vacuuming the relation. Required if HEAP_PRUNE_FREEZE option is set.
@@ -478,6 +585,7 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
*/
void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, bool blk_known_av,
PruneReason reason,
int options,
const struct VacuumCutoffs *cutoffs,
@@ -496,10 +604,13 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool do_set_pd_vis;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
TransactionId frz_conflict_horizon = InvalidTransactionId;
+ uint8 new_vmbits = 0;
+ uint8 old_vmbits = 0;
/* Copy parameters to prstate */
prstate.vistest = vistest;
@@ -828,19 +939,27 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
Assert(!prstate.all_frozen || prstate.all_visible);
/*
- * Though callers should set the VM if PD_ALL_VISIBLE is set here, it is
- * allowed for the page-level bit to be set and the VM to be clear.
+ * Determine whether or not to set the page-level PD_ALL_VISIBLE and the
+ * visibility map bits based on information from the VM and from
+ * all_visible and all_frozen variables.
+ *
+ * Though callers should set the VM if PD_ALL_VISIBLE is set, it is
+ * allowed for the page-level bit to be set and the VM to be clear. We log
+ * setting PD_ALL_VISIBLE on the heap page in a
+ * XLOG_HEAP2_PRUNE_VACUUM_SCAN record and setting the VM bits in a later
+ * emitted XLOG_HEAP2_VISIBLE record.
+ *
* Setting PD_ALL_VISIBLE when we are making the changes to the page that
* render it all-visible allows us to omit the heap page from the WAL
* chain when later updating the VM -- even when checksums/wal_log_hints
* are enabled.
*/
do_set_pd_vis = false;
+ do_set_vm = false;
if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
- {
- if (prstate.all_visible && !PageIsAllVisible(page))
- do_set_pd_vis = true;
- }
+ do_set_vm = heap_page_will_set_vis(relation,
+ blockno, buffer, vmbuffer, blk_known_av,
+ &prstate, &new_vmbits, &do_set_pd_vis);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -928,28 +1047,72 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * VACUUM will call heap_page_would_be_all_visible() during the second
+ * pass over the heap to determine all_visible and all_frozen for the page
+ * -- this is a specialized version of that logic. Now that we've finished
+ * pruning and freezing, make sure that we're in total agreement with
+ * heap_page_would_be_all_visible() using an assertion.
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
+
+ if (!heap_page_is_all_visible(relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc))
+ Assert(false);
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == prstate.visibility_cutoff_xid);
+ }
+#endif
+
+ /* Now set the VM */
+ if (do_set_vm)
+ {
+ TransactionId vm_conflict_horizon;
+
+ Assert((new_vmbits & VISIBILITYMAP_VALID_BITS) != 0);
+
+ /*
+ * The conflict horizon for that record must be the newest xmin on the
+ * page. However, if the page is completely frozen, there can be no
+ * conflict and the vm_conflict_horizon should remain
+ * InvalidTransactionId. This includes the case that we just froze
+ * all the tuples; the prune-freeze record included the conflict XID
+ * already so a snapshotConflictHorizon sufficient to make everything
+ * safe for REDO was logged when the page's tuples were frozen.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ old_vmbits = visibilitymap_set(relation, blockno,
+ InvalidXLogRecPtr,
+ vmbuffer, vm_conflict_horizon,
+ new_vmbits);
+ }
+
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
-
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e01fc5bb502..8ec0476a0d4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -463,11 +463,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
int num_offsets);
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
@@ -2014,7 +2009,9 @@ lazy_scan_prune(LVRelState *vacrel,
if (vacrel->nindexes == 0)
prune_options |= HEAP_PAGE_PRUNE_MARK_UNUSED_NOW;
- heap_page_prune_and_freeze(rel, buf, PRUNE_VACUUM_SCAN, prune_options,
+ heap_page_prune_and_freeze(rel, buf,
+ vmbuffer, all_visible_according_to_vm,
+ PRUNE_VACUUM_SCAN, prune_options,
&vacrel->cutoffs,
vacrel->vistest,
&presult,
@@ -2035,33 +2032,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- if (!heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum))
- Assert(false);
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -2095,112 +2065,28 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
/*
- * Handle setting visibility map bits based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables.
+ * For the purposes of logging, count whether or not the page was newly
+ * set all-visible and, potentially, all-frozen.
*/
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)))
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- /*
- * If the page is all-frozen, we can pass InvalidTransactionId as our
- * cutoff_xid, since a snapshotConflictHorizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were
- * frozen.
- */
- if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- flags |= VISIBILITYMAP_ALL_FROZEN;
- }
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * Even if we are only setting the all-frozen bit, there is a small
- * chance that the VM was modified sometime between setting
- * all_visible_according_to_vm and checking the visibility during
- * pruning. Check the return value of old_vmbits to ensure the
- * visibility map counters used for logging are accurate.
- *
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ vacrel->vm_new_visible_pages++;
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- vacrel->vm_new_frozen_pages++;
+ vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
-
- /*
- * Now handle two potential corruption cases:
- *
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
-
return presult.ndeleted;
}
@@ -3590,7 +3476,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Wrapper for heap_page_would_be_all_visible() which can be used for
* callers that expect no LP_DEAD on the page.
*/
-static bool
+bool
heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index be66970c9f0..797cd51145d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -235,19 +235,14 @@ typedef struct PruneFreezeResult
int recently_dead_tuples;
/*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
*
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PRUNE_FREEZE option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set.
*/
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/*
* Whether or not the page makes rel truncation unsafe. This is set to
@@ -375,6 +370,7 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
struct GlobalVisState;
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
+ Buffer vmbuffer, bool blk_known_av,
PruneReason reason,
int options,
const struct VacuumCutoffs *cutoffs,
@@ -403,6 +399,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
Buffer buffer);
--
2.43.0
[text/x-patch] v14-0017-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (12.6K, 18-v14-0017-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From 9904f827846bb2660dbc9ff0ecb1d24dbe9dc3bc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 17:29:59 -0400
Subject: [PATCH v14 17/24] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Instead of emitting a separate WAL record for every block rendered
all-visible/frozen by vacuum's phase I, include the changes to the VM in
the XLOG_HEAP2_PRUNE_VACUUM_SCAN record already emitted.
This is only enabled for vacuum's prune/freeze work, not for on-access
pruning.
---
src/backend/access/heap/pruneheap.c | 183 +++++++++++++++++-----------
src/include/access/heapam.h | 3 +-
2 files changed, 112 insertions(+), 74 deletions(-)
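The snapshot conflict horizon selection that this patch adds to heap_page_prune_and_freeze() can be sketched as a pure function. The sketch makes these assumptions. TransactionId is a uint32_t. xid_follows() is a simplified stand-in for TransactionIdFollows() that uses the same modulo-2^32 comparison for normal XIDs. choose_conflict_xid() is a hypothetical name; the real code reads these values from PruneState and local variables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_FROZEN 0x02

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Simplified TransactionIdFollows(): wraparound-aware for normal XIDs. */
static bool
xid_follows(TransactionId a, TransactionId b)
{
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a > b;
	return (int32_t) (a - b) > 0;
}

/* Hypothetical model of the conflict_xid choice in the prune/freeze record. */
static TransactionId
choose_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
					bool blk_known_av, uint8_t new_vmbits,
					TransactionId visibility_cutoff_xid,
					TransactionId frz_conflict_horizon,
					TransactionId latest_xid_removed)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* Setting the VM: newest xmin of the live tuples on the page. */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* Removed tuples with a younger xmax push the horizon forward. */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	/*
	 * Only marking an already all-visible page all-frozen: every tuple is
	 * already visible to all snapshots on the standby, so no conflict.
	 */
	if (!do_prune && !do_freeze && do_set_vm &&
		blk_known_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
		conflict_xid = InvalidTransactionId;

	return conflict_xid;
}
```

Checking latest_xid_removed last, against whichever horizon was picked first, ensures that the record never carries a horizon older than the newest xmax it removes.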
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index e3f9967e26c..a14c793da7e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -662,50 +662,58 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.deadoffsets = presult->deadoffsets;
/*
- * Caller may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Keep track of whether or not the page will be all-visible and
+ * all-frozen for use in opportunistic freezing and to update the VM if
+ * the caller requests it.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM attempts freezing. But other callers could. The
+ * visibility bookkeeping is required for opportunistic freezing (in
+ * addition to setting the VM bits) because we only consider
+ * opportunistically freezing tuples if the whole page would become
+ * all-frozen or if the whole page will be frozen except for dead tuples
+ * that will be removed by vacuum. But if HEAP_PAGE_PRUNE_UPDATE_VIS is not
+ * set, we will not set the VM even if the page is discovered to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible when we see LP_DEAD items. We fix that after
- * scanning the line pointers, before we return the value to the caller,
- * so that the caller doesn't set the VM bit incorrectly.
+ * If only HEAP_PAGE_PRUNE_UPDATE_VIS is passed and not
+ * HEAP_PAGE_PRUNE_FREEZE, prstate.all_frozen must be initialized to false
+ * because we will not call heap_prepare_freeze_tuple() on each tuple.
+ *
+ * Dead tuples which will be removed by the end of vacuuming should not
+ * preclude us from opportunistically freezing, so we do not clear
+ * all_visible when we see LP_DEAD items. We fix that after determining
+ * whether or not to freeze but before deciding whether or not to update
+ * the VM so that we don't set the VM bit incorrectly.
+ *
+ * If not freezing and not updating the VM, we avoid the extra
+ * bookkeeping. Initializing all_visible and all_frozen to false allows
+ * skipping the work to update them in heap_prune_record_unchanged_lp_normal().
*/
if (prstate.attempt_freeze)
{
prstate.all_visible = true;
prstate.all_frozen = true;
}
+ else if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+ {
+ prstate.all_visible = true;
+ prstate.all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate.all_visible = false;
prstate.all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * while the page is all-visible: once a tuple is encountered that is not
+ * visible to all, it is no longer maintained. While it is maintained, it
+ * can be used to calculate the snapshot conflict horizon, which is most
+ * likely to be needed when updating the VM and/or freezing all live
+ * tuples on the page. It is updated before returning to the caller
+ * because vacuum uses this field for validation in assert-enabled
+ * builds.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -943,16 +951,15 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* visibility map bits based on information from the VM and from
* all_visible and all_frozen variables.
*
- * Though callers should set the VM if PD_ALL_VISIBLE is set, it is
- * allowed for the page-level bit to be set and the VM to be clear. We log
- * setting PD_ALL_VISIBLE on the heap page in a
- * XLOG_HEAP2_PRUNE_VACUUM_SCAN record and setting the VM bits in a later
- * emitted XLOG_HEAP2_VISIBLE record.
+ * It is allowed for the page-level bit to be set and the VM to be clear;
+ * however, we have a strong preference for keeping them in sync.
*
- * Setting PD_ALL_VISIBLE when we are making the changes to the page that
- * render it all-visible allows us to omit the heap page from the WAL
- * chain when later updating the VM -- even when checksums/wal_log_hints
- * are enabled.
+ * Prior to Postgres 19, it was possible for the page-level bit to be set
+ * and the VM bit to be clear. This could happen if we crashed after
+ * setting PD_ALL_VISIBLE but before setting bits in the VM.
+ *
+ * As such, we may find PD_ALL_VISIBLE already set and only need to
+ * update the VM.
*/
do_set_pd_vis = false;
do_set_vm = false;
@@ -961,6 +968,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
blockno, buffer, vmbuffer, blk_known_av,
&prstate, &new_vmbits, &do_set_pd_vis);
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -991,7 +1002,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze || do_set_pd_vis)
+ if (do_prune || do_freeze || do_set_pd_vis || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1008,12 +1019,31 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (do_set_pd_vis)
PageSetAllVisible(page);
- MarkBufferDirty(buffer);
+ if (do_prune || do_freeze || do_set_pd_vis)
+ MarkBufferDirty(buffer);
+
+ if (do_set_vm)
+ {
+ Assert(PageIsAllVisible(page));
+
+ old_vmbits = visibilitymap_set_vmbits(relation, blockno,
+ vmbuffer, new_vmbits);
+ if (old_vmbits == new_vmbits)
+ {
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+ /* Unset so we don't emit WAL since no change occurred */
+ do_set_vm = false;
+ }
+ }
/*
- * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did
+ * Emit a WAL XLOG_HEAP2_PRUNE* record showing what we did. If we were
+ * only updating the VM and it turns out it was already set, we will
+ * have unset do_set_vm earlier. As such, check it again before
+ * emitting the record.
*/
- if (RelationNeedsWAL(relation))
+ if (RelationNeedsWAL(relation) &&
+ (do_prune || do_freeze || do_set_pd_vis || do_set_vm))
{
/*
* The snapshotConflictHorizon for the whole record should be the
@@ -1025,15 +1055,45 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* on the standby with xids older than the youngest tuple this
* record will freeze will conflict.
*/
- TransactionId conflict_xid;
+ TransactionId conflict_xid = InvalidTransactionId;
- if (TransactionIdFollows(frz_conflict_horizon, prstate.latest_xid_removed))
+ /*
+ * If we are updating the VM, the conflict horizon is almost
+ * always the visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization,
+ * we can use the visibility_cutoff_xid as the conflict horizon if
+ * the page will be all-frozen. This is true even if there are
+ * LP_DEAD line pointers because we ignored those when maintaining
+ * the visibility_cutoff_xid. This will have been calculated
+ * earlier as the frz_conflict_horizon when we determined we would
+ * freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = prstate.visibility_cutoff_xid;
+ else if (do_freeze)
conflict_xid = frz_conflict_horizon;
- else
+
+ /*
+ * If we are removing tuples with a younger xmax than the conflict_xid
+ * calculated so far, we must use that xmax as our horizon.
+ */
+ if (TransactionIdFollows(prstate.latest_xid_removed, conflict_xid))
conflict_xid = prstate.latest_xid_removed;
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning
+ * or freezing any tuples and are setting an already all-visible
+ * page all-frozen in the VM. In this case, all of the tuples on
+ * the page must already be visible to all MVCC snapshots on the
+ * standby.
+ */
+ if (!do_prune && !do_freeze && do_set_vm &&
+ blk_known_av && (new_vmbits & VISIBILITYMAP_ALL_FROZEN))
+ conflict_xid = InvalidTransactionId;
+
log_heap_prune_and_freeze(relation, buffer,
- InvalidBuffer, 0,
+ vmbuffer, new_vmbits,
conflict_xid,
true,
do_set_pd_vis,
@@ -1047,6 +1107,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
END_CRIT_SECTION();
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
/*
@@ -1078,32 +1141,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
}
#endif
- /* Now set the VM */
- if (do_set_vm)
- {
- TransactionId vm_conflict_horizon;
-
- Assert((new_vmbits & VISIBILITYMAP_VALID_BITS) != 0);
-
- /*
- * The conflict horizon for that record must be the newest xmin on the
- * page. However, if the page is completely frozen, there can be no
- * conflict and the vm_conflict_horizon should remain
- * InvalidTransactionId. This includes the case that we just froze
- * all the tuples; the prune-freeze record included the conflict XID
- * already so a snapshotConflictHorizon sufficient to make everything
- * safe for REDO was logged when the page's tuples were frozen.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
- old_vmbits = visibilitymap_set(relation, blockno,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
- }
-
/* Copy information back for caller */
presult->ndeleted = prstate.ndeleted;
presult->nnewlpdead = prstate.ndead;
@@ -2261,7 +2298,7 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phase III, the heap page may be marked
+ * - VM updates: After vacuum phases I and III, the heap page may be marked
* all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 797cd51145d..cac7a4c2899 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -239,7 +239,8 @@ typedef struct PruneFreezeResult
* visibility map before updating it during phase I of vacuuming.
* new_vmbits are the state of those bits after phase I of vacuuming.
*
- * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set.
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VIS option is set and
+ * we have actually updated the VM.
*/
uint8 new_vmbits;
uint8 old_vmbits;
--
2.43.0
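The conflict-horizon selection in the pruneheap.c hunk of patch 17 can be modeled on its own. The following is a minimal sketch, not PostgreSQL code: `HorizonInputs`, `choose_conflict_xid()` and `xid_follows()` are hypothetical names, and `xid_follows()` is a simplified re-implementation of the modulo-2^32 comparison that `TransactionIdFollows()` performs on normal XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define VM_ALL_VISIBLE 0x01 /* mirrors VISIBILITYMAP_ALL_VISIBLE */
#define VM_ALL_FROZEN 0x02  /* mirrors VISIBILITYMAP_ALL_FROZEN */

/* Simplified TransactionIdFollows(): circular comparison for normal XIDs. */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
    if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical bundle of the inputs the hunk consults. */
typedef struct HorizonInputs
{
    bool        do_prune;
    bool        do_freeze;
    bool        do_set_vm;
    bool        blk_known_av;
    uint8_t     new_vmbits;
    TransactionId visibility_cutoff_xid;
    TransactionId frz_conflict_horizon;
    TransactionId latest_xid_removed;
} HorizonInputs;

static TransactionId
choose_conflict_xid(const HorizonInputs *in)
{
    TransactionId conflict_xid = InvalidTransactionId;

    /* Only the two valid VM bits may be requested. */
    assert((in->new_vmbits & ~0x03) == 0);

    /* Setting the VM: use the newest xmin among the page's live tuples. */
    if (in->do_set_vm)
        conflict_xid = in->visibility_cutoff_xid;
    else if (in->do_freeze)
        conflict_xid = in->frz_conflict_horizon;

    /* Removed tuples with a younger xmax push the horizon forward. */
    if (xid_follows(in->latest_xid_removed, conflict_xid))
        conflict_xid = in->latest_xid_removed;

    /* Only marking an already all-visible page all-frozen: no conflict. */
    if (!in->do_prune && !in->do_freeze && in->do_set_vm &&
        in->blk_known_av && (in->new_vmbits & VM_ALL_FROZEN))
        conflict_xid = InvalidTransactionId;

    return conflict_xid;
}
```

Because `InvalidTransactionId` is not a normal XID, the plain unsigned comparison applies whenever either side is invalid, so any valid `latest_xid_removed` overrides a horizon that was never set.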
[text/x-patch] v14-0018-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (21.1K, 19-v14-0018-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
download | inline diff:
From 6f94908b0649956e1d1abbbd5c362a57282c2c26 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Sep 2025 17:42:54 -0400
Subject: [PATCH v14 18/24] Remove XLOG_HEAP2_VISIBLE entirely
There are now no users of this, so eliminate it entirely.
This includes the xl_heap_visible struct as well as all of the functions
used to emit and replay XLOG_HEAP2_VISIBLE records.
ci-os-only:
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 40 ++--------
src/backend/access/heap/heapam_xlog.c | 96 +++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 14 ++--
src/backend/access/heap/visibilitymap.c | 83 +-------------------
src/backend/access/rmgrdesc/heapdesc.c | 10 ---
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +--
src/include/access/heapam_xlog.h | 19 -----
src/include/access/visibilitymap.h | 11 +--
src/include/access/visibilitymapdefs.h | 9 ---
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 36 insertions(+), 268 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index bb260cffa68..5f07f179415 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_freeze() for
+ * more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0323e2df409..ab514ce65ec 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2524,11 +2524,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(relation,
- BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(relation,
+ BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
}
/*
@@ -8799,36 +8799,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
-
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
- XLogRegisterBuffer(0, vm_buffer, 0);
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index c1f332f7a9a..a8908373067 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -251,8 +251,8 @@ heap_xlog_prune_freeze(XLogReaderState *record)
*
* In recovery, we expect no other writers, so writing to the VM page
* without holding a lock on the heap page is considered safe enough. It
- * is done this way when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * is also done this way when replaying COPY FREEZE records (see
+ * heap_xlog_multi_insert()).
*/
if (vmflags & VISIBILITYMAP_VALID_BITS &&
XLogReadBufferForRedoExtended(record, 1,
@@ -268,7 +268,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- old_vmbits = visibilitymap_set_vmbits(reln, blkno, vmbuffer, vmflags);
+ old_vmbits = visibilitymap_set(reln, blkno, vmbuffer, vmflags);
/* Only set VM page LSN if we modified the page */
if (old_vmbits != vmflags)
@@ -287,81 +287,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * It is imperative that the previously emitted record set PD_ALL_VISIBLE on
- * the heap page. We must never end up with a situation where the visibility
- * map bit is set, and the page-level PD_ALL_VISIBLE bit is clear. If that
- * were to occur, then a subsequent page modification would fail to clear the
- * visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- RelFileLocator rlocator;
- BlockNumber blkno;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Even if the heap relation was dropped or truncated and the previously
- * emitted record skipped the heap page update due to this LSN interlock,
- * it's still safe to update the visibility map. Any WAL record that
- * clears the visibility map bit does so before checking the page LSN, so
- * any bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -739,8 +664,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* In recovery, we expect no other writers, so writing to the VM page
* without holding a lock on the heap page is considered safe enough. It
- * is done this way when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * is also done this way when replaying xl_heap_prune records (see
+ * heap_xlog_prune_freeze()).
*/
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -753,10 +678,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(reln, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(reln, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
/*
* It is not possible that the VM was already set for this heap page,
@@ -1342,9 +1267,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index a14c793da7e..39d59a43ff7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1026,8 +1026,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
{
Assert(PageIsAllVisible(page));
- old_vmbits = visibilitymap_set_vmbits(relation, blockno,
- vmbuffer, new_vmbits);
+ old_vmbits = visibilitymap_set(relation, blockno,
+ vmbuffer, new_vmbits);
if (old_vmbits == new_vmbits)
{
LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ec0476a0d4..28436389d63 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1887,10 +1887,10 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
PageSetAllVisible(page);
MarkBufferDirty(buf);
- visibilitymap_set_vmbits(vacrel->rel, blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set(vacrel->rel, blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN);
if (RelationNeedsWAL(vacrel->rel))
{
@@ -2775,9 +2775,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
if ((vmflags & VISIBILITYMAP_VALID_BITS) != 0)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(vacrel->rel,
- blkno,
- vmbuffer, vmflags);
+ visibilitymap_set(vacrel->rel,
+ blkno,
+ vmbuffer, vmflags);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 75fcb3f067a..38d3131e56b 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,82 +219,6 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
- */
-uint8
-visibilitymap_set(Relation rel, BlockNumber heapBlk,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (XLogRecPtrIsInvalid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, vmBuf, cutoff_xid, flags);
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
-}
/*
* Set flags in the VM block contained in the passed in vmBuf.
@@ -318,8 +241,8 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk,
* is pinned and exclusive locked.
*/
uint8
-visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags)
+visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 1cb44ca32d3..93505cb8c56 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -460,9 +453,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..2fdd4af90a8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -454,7 +454,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 4222bdab078..c619643e121 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 833114e0a6e..61ceaf2a98b 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -451,19 +450,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -507,11 +493,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 302adf4856a..c5b1e1f7adb 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "utils/relcache.h"
@@ -31,14 +30,8 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(Relation rel, BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags);
+extern uint8 visibilitymap_set(Relation rel, BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 5ad5c020877..e01bce4c99f 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e90af5b2ad3..32c0f4719c3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4268,7 +4268,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
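After the rename, `visibilitymap_set()` only sets bits in a map page that the caller has already pinned and exclusive-locked, and it returns the previous state; emitting WAL is left to the caller. The sketch below models that bit arithmetic on a plain byte array. It assumes two VM bits per heap block and the default 8 kB block size (8168 usable map bytes per page); `vm_set_bits()` and the `VM_*` constants are hypothetical stand-ins, and buffer pinning, locking and WAL are deliberately omitted.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_HEAPBLOCK 2
#define HEAPBLOCKS_PER_BYTE 4
/* Assumption: default 8 kB BLCKSZ minus a 24-byte page header. */
#define MAPSIZE 8168
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN 0x02
#define VM_VALID_BITS 0x03

/* Which map page, byte within it, and bit offset within that byte. */
#define HEAPBLK_TO_MAPBLOCK(x) ((x) / HEAPBLOCKS_PER_PAGE)
#define HEAPBLK_TO_MAPBYTE(x) (((x) % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE)
#define HEAPBLK_TO_OFFSET(x) (((x) % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK)

/*
 * Hypothetical stand-in for visibilitymap_set(): set flags for heapBlk in an
 * in-memory copy of its map page and return the bits that were there before.
 * The caller is assumed to have already chosen the right map page.
 */
static uint8_t
vm_set_bits(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
    uint32_t    mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
    uint8_t     mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
    uint8_t     status;

    assert((flags & VM_VALID_BITS) == flags);
    /* Must never set all-frozen without also setting all-visible. */
    assert(flags != VM_ALL_FROZEN);

    status = (map[mapByte] >> mapOffset) & VM_VALID_BITS;
    if (flags != status)
        map[mapByte] |= (uint8_t) (flags << mapOffset);
    return status;
}
```

A caller can compare the returned bits with the requested flags and skip emitting WAL when nothing changed, which is what heap_page_prune_and_freeze() does with old_vmbits.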
[text/x-patch] v14-0019-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch (7.1K, 20-v14-0019-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch)
download | inline diff:
From cbfb5ee8a412651c604307cd0bd611f187ed348a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Fri, 18 Jul 2025 16:30:04 -0400
Subject: [PATCH v14 19/24] Rename GlobalVisTestIsRemovableXid() to
GlobalVisXidVisibleToAll()
Currently, we only use GlobalVisTestIsRemovableXid() to check if a
tuple's xmax is visible to all, meaning we can remove it. But future
commits will use GlobalVisTestIsRemovableXid() to test if a tuple's xmin
is visible to all, for the purpose of determining whether the page can be
set all-visible in the VM. In that case, it makes more sense to call the
function GlobalVisXidVisibleToAll().
Reviewed-by: Kirill Reshke <[email protected]>
---
src/backend/access/heap/heapam_visibility.c | 6 +++---
src/backend/access/heap/pruneheap.c | 14 +++++++-------
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/storage/ipc/procarray.c | 13 ++++++-------
src/include/utils/snapmgr.h | 4 ++--
5 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05f6946fe60..4ebc8abdbeb 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1447,7 +1447,7 @@ HeapTupleSatisfiesNonVacuumable(HeapTuple htup, Snapshot snapshot,
{
Assert(TransactionIdIsValid(dead_after));
- if (GlobalVisTestIsRemovableXid(snapshot->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(snapshot->vistest, dead_after))
res = HEAPTUPLE_DEAD;
}
else
@@ -1512,8 +1512,8 @@ HeapTupleIsSurelyDead(HeapTuple htup, GlobalVisState *vistest)
return false;
/* Deleter committed, so tuple is dead if the XID is old enough. */
- return GlobalVisTestIsRemovableXid(vistest,
- HeapTupleHeaderGetRawXmax(tuple));
+ return GlobalVisXidVisibleToAll(vistest,
+ HeapTupleHeaderGetRawXmax(tuple));
}
/*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 39d59a43ff7..471151fae2e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -218,7 +218,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
*/
vistest = GlobalVisTestFor(relation);
- if (!GlobalVisTestIsRemovableXid(vistest, prune_xid))
+ if (!GlobalVisXidVisibleToAll(vistest, prune_xid))
return;
/*
@@ -727,9 +727,9 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* Determining HTSV only once for each tuple is required for correctness,
* to deal with cases where running HTSV twice could result in different
* results. For example, RECENTLY_DEAD can turn to DEAD if another
- * checked item causes GlobalVisTestIsRemovableFullXid() to update the
- * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
- * transaction aborts.
+ * checked item causes GlobalVisFullXidVisibleToAll() to update the
+ * horizon, or INSERT_IN_PROGRESS can change to DEAD if the inserting
+ * transaction aborts.
*
* It's also good for performance. Most commonly tuples within a page are
* stored at decreasing offsets (while the items are stored at increasing
@@ -1199,11 +1199,11 @@ heap_prune_satisfies_vacuum(PruneState *prstate, HeapTuple tup, Buffer buffer)
* Determine whether or not the tuple is considered dead when compared
* with the provided GlobalVisState. On-access pruning does not provide
* VacuumCutoffs. And for vacuum, even if the tuple's xmax is not older
- * than OldestXmin, GlobalVisTestIsRemovableXid() could find the row dead
- * if the GlobalVisState has been updated since the beginning of vacuuming
+ * than OldestXmin, GlobalVisXidVisibleToAll() could find the row dead if
+ * the GlobalVisState has been updated since the beginning of vacuuming
* the relation.
*/
- if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
+ if (GlobalVisXidVisibleToAll(prstate->vistest, dead_after))
return HEAPTUPLE_DEAD;
return res;
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 8f8a1ad7796..496cca69410 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -536,7 +536,7 @@ vacuumRedirectAndPlaceholder(Relation index, Relation heaprel, Buffer buffer)
*/
if (dt->tupstate == SPGIST_REDIRECT &&
(!TransactionIdIsValid(dt->xid) ||
- GlobalVisTestIsRemovableXid(vistest, dt->xid)))
+ GlobalVisXidVisibleToAll(vistest, dt->xid)))
{
dt->tupstate = SPGIST_PLACEHOLDER;
Assert(opaque->nRedirection > 0);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 200f72c6e25..f67f01c17c2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -4181,8 +4181,7 @@ GlobalVisUpdate(void)
* See comment for GlobalVisState for details.
*/
bool
-GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
- FullTransactionId fxid)
+GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid)
{
/*
* If fxid is older than maybe_needed bound, it definitely is visible to
@@ -4223,7 +4222,7 @@ GlobalVisTestIsRemovableFullXid(GlobalVisState *state,
* relfrozenxid).
*/
bool
-GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
+GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid)
{
FullTransactionId fxid;
@@ -4237,7 +4236,7 @@ GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid)
*/
fxid = FullXidRelativeTo(state->definitely_needed, xid);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
@@ -4251,12 +4250,12 @@ GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableFullXid(state, fxid);
+ return GlobalVisFullXidVisibleToAll(state, fxid);
}
/*
* Convenience wrapper around GlobalVisTestFor() and
- * GlobalVisTestIsRemovableXid(), see their comments.
+ * GlobalVisXidVisibleToAll(), see their comments.
*/
bool
GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
@@ -4265,7 +4264,7 @@ GlobalVisCheckRemovableXid(Relation rel, TransactionId xid)
state = GlobalVisTestFor(rel);
- return GlobalVisTestIsRemovableXid(state, xid);
+ return GlobalVisXidVisibleToAll(state, xid);
}
/*
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..a0ea2cfcea2 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -100,8 +100,8 @@ extern char *ExportSnapshot(Snapshot snapshot);
*/
typedef struct GlobalVisState GlobalVisState;
extern GlobalVisState *GlobalVisTestFor(Relation rel);
-extern bool GlobalVisTestIsRemovableXid(GlobalVisState *state, TransactionId xid);
-extern bool GlobalVisTestIsRemovableFullXid(GlobalVisState *state, FullTransactionId fxid);
+extern bool GlobalVisXidVisibleToAll(GlobalVisState *state, TransactionId xid);
+extern bool GlobalVisFullXidVisibleToAll(GlobalVisState *state, FullTransactionId fxid);
extern bool GlobalVisCheckRemovableXid(Relation rel, TransactionId xid);
extern bool GlobalVisCheckRemovableFullXid(Relation rel, FullTransactionId fxid);
--
2.43.0
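The renamed functions answer whether an XID is old enough to be visible to all snapshots. The sketch below is a simplified model, not the procarray.c implementation. It assumes a state holding two 64-bit bounds, `maybe_needed` and `definitely_needed`, widens a 32-bit XID relative to `definitely_needed` the way `FullXidRelativeTo()` does, and replaces the real horizon recomputation with a hypothetical `recompute_maybe_needed` hook. `VisBounds` and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t FullXid;       /* simplified FullTransactionId */

/* Hypothetical, simplified GlobalVisState: two horizon bounds. */
typedef struct VisBounds
{
    FullXid     maybe_needed;       /* below this: visible to all */
    FullXid     definitely_needed;  /* at or above this: not visible to all */
    /* Hypothetical hook standing in for recomputing the horizons. */
    FullXid     (*recompute_maybe_needed) (void);
} VisBounds;

/* Widen a 32-bit XID to 64 bits relative to a nearby full XID. */
static FullXid
full_xid_relative_to(FullXid rel, TransactionId xid)
{
    TransactionId rel_xid = (TransactionId) rel;

    /* The signed 32-bit distance lets xid lie on either side of rel. */
    return rel + (int32_t) (xid - rel_xid);
}

static bool
full_xid_visible_to_all(VisBounds *state, FullXid fxid)
{
    assert(state->maybe_needed <= state->definitely_needed);

    if (fxid < state->maybe_needed)
        return true;
    if (fxid >= state->definitely_needed)
        return false;

    /* In between: refresh the lower bound if possible, then retest. */
    if (state->recompute_maybe_needed != NULL)
    {
        state->maybe_needed = state->recompute_maybe_needed();
        return fxid < state->maybe_needed;
    }
    return false;
}

static bool
xid_visible_to_all(VisBounds *state, TransactionId xid)
{
    FullXid     fxid = full_xid_relative_to(state->definitely_needed, xid);

    return full_xid_visible_to_all(state, fxid);
}
```

The cheap cases (clearly old or clearly recent) need only one comparison; only XIDs between the two bounds may trigger the more expensive recomputation.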
[text/x-patch] v14-0020-Use-GlobalVisState-to-determine-page-level-visib.patch (10.7K, 21-v14-0020-Use-GlobalVisState-to-determine-page-level-visib.patch)
download | inline diff:
From aeb0c7ed54566dfd8b67d4ad50d46938b1ccf95d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:38:24 -0400
Subject: [PATCH v14 20/24] Use GlobalVisState to determine page level
visibility
During pruning and during vacuum's third phase, we try to determine if
the whole page can be set all-visible in the visibility map. Instead of
using OldestXmin to determine if all the tuples on a page are visible to
everyone, use the GlobalVisState. This allows us to start setting the VM
during on-access pruning in a future commit.
It is possible for the GlobalVisState to change during the course of a
vacuum. In all but extraordinary cases, it moves forward, meaning more
pages could potentially be set in the VM.
Because comparing a transaction ID to the GlobalVisState requires more
operations than comparing it to a single transaction ID, we now wait
until we have examined all the tuples on the page and then, if we have
maintained visibility_cutoff_xid, compare it to the GlobalVisState just
once per page. This works because, if the page is all-visible and has
live, committed tuples on it, visibility_cutoff_xid will hold the newest
xmin on the page. If everyone can see that xmin, the page is truly
all-visible.
Doing this may mean we examine more tuples' xmins than before, because
previously we set all_visible to false, and stopped tracking xmins, as
soon as we encountered a live tuple newer than OldestXmin. However, these
extra comparisons were found not to be significant in a profile.
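To make the once-per-page check concrete, here is a minimal C sketch under simplifying assumptions. It is not PostgreSQL code: `SketchVisState` and its single `horizon` field are hypothetical stand-ins for GlobalVisState (which tracks more than one horizon and can be refreshed), and the xid helper only mirrors the modulo-2^32 comparison in transam.h. The input is assumed to be the xmins of live, committed tuples; the loop tracks the newest normal xmin without consulting the visibility state, then makes a single comparison at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for PostgreSQL's xid type and macros (values as in transam.h). */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Mirrors TransactionIdFollows(): is id1 logically > id2? */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 > id2;
	return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical, simplified GlobalVisState: xids older than horizon are visible to all. */
typedef struct SketchVisState
{
	TransactionId horizon;
} SketchVisState;

static bool
sketch_xid_visible_to_all(const SketchVisState *vis, TransactionId xid)
{
	return xid_follows(vis->horizon, xid);
}

/*
 * xmins holds the xmin of each live, committed tuple on the page. Track the
 * newest normal xmin while scanning, then test it against the visibility
 * state once.
 */
static bool
sketch_page_all_visible(const SketchVisState *vis,
						const TransactionId *xmins, size_t n)
{
	TransactionId cutoff = InvalidTransactionId;

	assert(n == 0 || xmins != NULL);

	for (size_t i = 0; i < n; i++)
	{
		/* Cheap per-tuple work: no visibility-state lookup here. */
		if (TransactionIdIsNormal(xmins[i]) && xid_follows(xmins[i], cutoff))
			cutoff = xmins[i];
	}

	/* One (potentially more expensive) check per page. */
	if (TransactionIdIsNormal(cutoff) && !sketch_xid_visible_to_all(vis, cutoff))
		return false;
	return true;
}
```

In this sketch an xmin equal to the horizon counts as not yet visible to everyone, and a page whose only xmins are frozen (non-normal) is trivially all-visible.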
---
src/backend/access/heap/heapam_visibility.c | 28 +++++++++++++
src/backend/access/heap/pruneheap.c | 46 +++++++++------------
src/backend/access/heap/vacuumlazy.c | 20 ++++-----
src/include/access/heapam.h | 4 +-
4 files changed, 60 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 4ebc8abdbeb..edd529dc3c0 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1189,6 +1189,34 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Nearly the same as HeapTupleSatisfiesVacuum, but uses a GlobalVisState to
+ * determine whether or not a tuple is HEAPTUPLE_DEAD or
+ * HEAPTUPLE_RECENTLY_DEAD. It serves the same purpose but can be used by
+ * callers that have not calculated a single OldestXmin value.
+ */
+HTSV_Result
+HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup, GlobalVisState *vistest,
+ Buffer buffer)
+{
+ TransactionId dead_after = InvalidTransactionId;
+ HTSV_Result res;
+
+ res = HeapTupleSatisfiesVacuumHorizon(htup, buffer, &dead_after);
+
+ if (res == HEAPTUPLE_RECENTLY_DEAD)
+ {
+ Assert(TransactionIdIsValid(dead_after));
+
+ if (GlobalVisXidVisibleToAll(vistest, dead_after))
+ res = HEAPTUPLE_DEAD;
+ }
+ else
+ Assert(!TransactionIdIsValid(dead_after));
+
+ return res;
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 471151fae2e..bb7a1357a89 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -134,10 +134,9 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon, when setting the VM or when
+ * freezing all the live tuples on the page.
*
* NOTE: all_visible and all_frozen don't include LP_DEAD items. That's
* convenient for heap_page_prune_and_freeze(), to use them to decide
@@ -706,14 +705,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon.
- * This is most likely to happen when updating the VM and/or freezing all
- * live tuples on the page. It is updated before returning to the caller
- * because vacuum does assert-build only validation on the page using this
- * field.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState.
+ *
+ * If we encounter an uncommitted tuple, this field is unmaintained. If
+ * the page is being set all-visible or all live tuples on the page are
+ * being frozen, it is used to calculate the snapshot conflict horizon.
*/
prstate.visibility_cutoff_xid = InvalidTransactionId;
@@ -909,6 +906,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.ndead > 0 ||
prstate.nunused > 0;
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them is not visible to everyone, the page cannot be
+ * all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ !GlobalVisXidVisibleToAll(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* Even if we don't prune anything, if we found a new value for the
* pd_prune_xid field or the page was marked full, we will update the hint
@@ -1129,7 +1136,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc))
Assert(false);
@@ -1655,19 +1662,6 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' when freezing is requested. We
- * could use GlobalVisTestIsRemovableXid instead, if a
- * non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 28436389d63..341115dbbbe 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -464,7 +464,7 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2733,7 +2733,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
InvalidOffsetNumber);
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3478,14 +3478,13 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ return heap_page_would_be_all_visible(rel, buf, vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3504,7 +3503,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* *all_frozen is an output parameter indicating to the caller if every tuple
* on the page is frozen.
@@ -3525,7 +3524,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3597,8 +3596,8 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
tuple.t_len = ItemIdGetLength(itemid);
tuple.t_tableOid = RelationGetRelid(rel);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin,
- buf))
+ switch (HeapTupleSatisfiesVacuumGlobalVis(&tuple, vistest,
+ buf))
{
case HEAPTUPLE_LIVE:
{
@@ -3617,8 +3616,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
* that everyone sees it as committed?
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin,
- OldestXmin))
+ if (!GlobalVisXidVisibleToAll(vistest, xmin))
{
all_visible = false;
*all_frozen = false;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index cac7a4c2899..35a25cf0b04 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -401,7 +401,7 @@ extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -413,6 +413,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+extern HTSV_Result HeapTupleSatisfiesVacuumGlobalVis(HeapTuple htup,
+ GlobalVisState *vistest, Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
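The control flow of HeapTupleSatisfiesVacuumGlobalVis() in the patch above reduces to one decision: a RECENTLY_DEAD verdict is upgraded to DEAD when dead_after is visible to everyone. A minimal sketch, assuming the horizon-independent verdict has already been computed (as HeapTupleSatisfiesVacuumHorizon() does) and using hypothetical stand-in types (`SketchHTSV`, `SketchVisState`) in place of HTSV_Result and GlobalVisState:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Hypothetical subset of HTSV_Result */
typedef enum SketchHTSV
{
	SKETCH_LIVE,
	SKETCH_DEAD,
	SKETCH_RECENTLY_DEAD
} SketchHTSV;

/* Hypothetical single-horizon stand-in for GlobalVisState */
typedef struct SketchVisState
{
	TransactionId horizon;
} SketchVisState;

/* Assumes both xids are normal: modulo-2^32 "xid precedes horizon". */
static bool
sketch_xid_visible_to_all(const SketchVisState *vis, TransactionId xid)
{
	return (int32_t) (xid - vis->horizon) < 0;
}

/*
 * res and dead_after are what a horizon-independent check (like
 * HeapTupleSatisfiesVacuumHorizon()) would have produced.
 */
static SketchHTSV
sketch_satisfies_vacuum_globalvis(SketchHTSV res, TransactionId dead_after,
								  const SketchVisState *vis)
{
	if (res == SKETCH_RECENTLY_DEAD)
	{
		assert(TransactionIdIsValid(dead_after));
		/* dead_after is visible to all, so no snapshot still needs the tuple. */
		if (sketch_xid_visible_to_all(vis, dead_after))
			res = SKETCH_DEAD;
	}
	else
		assert(!TransactionIdIsValid(dead_after));

	return res;
}
```

Keeping the horizon test outside the core visibility routine is what lets the same routine serve both OldestXmin-based and GlobalVisState-based callers.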
[text/x-patch] v14-0021-Inline-TransactionIdFollows-Precedes.patch (5.0K, 22-v14-0021-Inline-TransactionIdFollows-Precedes.patch)
From 7ea26725c69aba6f269692387a6e923614181cc4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 09:57:13 -0400
Subject: [PATCH v14 21/24] Inline TransactionIdFollows/Precedes()
Calling these from on-access pruning code had noticeable overhead in a
profile. There does not seem to be a reason not to inline them.
Reviewed-by: Kirill Reshke <[email protected]>
---
src/backend/access/transam/transam.c | 64 -------------------------
src/include/access/transam.h | 70 ++++++++++++++++++++++++++--
2 files changed, 66 insertions(+), 68 deletions(-)
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index 9a39451a29a..553d6756cb3 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -273,70 +273,6 @@ TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids)
TRANSACTION_STATUS_ABORTED, InvalidXLogRecPtr);
}
-/*
- * TransactionIdPrecedes --- is id1 logically < id2?
- */
-bool
-TransactionIdPrecedes(TransactionId id1, TransactionId id2)
-{
- /*
- * If either ID is a permanent XID then we can just do unsigned
- * comparison. If both are normal, do a modulo-2^32 comparison.
- */
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 < id2);
-
- diff = (int32) (id1 - id2);
- return (diff < 0);
-}
-
-/*
- * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
- */
-bool
-TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 <= id2);
-
- diff = (int32) (id1 - id2);
- return (diff <= 0);
-}
-
-/*
- * TransactionIdFollows --- is id1 logically > id2?
- */
-bool
-TransactionIdFollows(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 > id2);
-
- diff = (int32) (id1 - id2);
- return (diff > 0);
-}
-
-/*
- * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
- */
-bool
-TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
-{
- int32 diff;
-
- if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
- return (id1 >= id2);
-
- diff = (int32) (id1 - id2);
- return (diff >= 0);
-}
-
/*
* TransactionIdLatest --- get latest XID among a main xact and its children
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 7d82cd2eb56..c9e20418275 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -255,6 +255,72 @@ typedef struct TransamVariablesData
} TransamVariablesData;
+
+/*
+ * TransactionIdPrecedes --- is id1 logically < id2?
+ */
+static inline bool
+TransactionIdPrecedes(TransactionId id1, TransactionId id2)
+{
+ /*
+ * If either ID is a permanent XID then we can just do unsigned
+ * comparison. If both are normal, do a modulo-2^32 comparison.
+ */
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 < id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff < 0);
+}
+
+/*
+ * TransactionIdPrecedesOrEquals --- is id1 logically <= id2?
+ */
+static inline bool
+TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 <= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff <= 0);
+}
+
+/*
+ * TransactionIdFollows --- is id1 logically > id2?
+ */
+static inline bool
+TransactionIdFollows(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 > id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff > 0);
+}
+
+/*
+ * TransactionIdFollowsOrEquals --- is id1 logically >= id2?
+ */
+static inline bool
+TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2)
+{
+ int32 diff;
+
+ if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
+ return (id1 >= id2);
+
+ diff = (int32) (id1 - id2);
+ return (diff >= 0);
+}
+
+
/* ----------------
* extern declarations
* ----------------
@@ -274,10 +340,6 @@ extern bool TransactionIdDidAbort(TransactionId transactionId);
extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-extern bool TransactionIdPrecedes(TransactionId id1, TransactionId id2);
-extern bool TransactionIdPrecedesOrEquals(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollows(TransactionId id1, TransactionId id2);
-extern bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2);
extern TransactionId TransactionIdLatest(TransactionId mainxid,
int nxids, const TransactionId *xids);
extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
--
2.43.0
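The inlined comparisons treat normal xids as points on a circle of 2^32 values, so each xid sees roughly 2^31 xids as older and 2^31 as newer, while permanent xids compare as plain unsigned integers. A standalone copy of the TransactionIdPrecedes() logic from the patch above, with local stand-ins for the PostgreSQL typedef and macros (FrozenTransactionId is 2 and the first normal xid is 3), illustrates the wraparound behavior:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the PostgreSQL definitions */
typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Same logic as the inlined TransactionIdPrecedes() */
static inline bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff;

	/* Permanent (non-normal) xids compare as plain unsigned integers. */
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 < id2;

	/* Normal xids: the sign of the 32-bit difference decides the order. */
	diff = (int32_t) (id1 - id2);
	return diff < 0;
}
```

The result is only meaningful while the two xids are less than 2^31 apart, which is why old xids must be frozen before wraparound. Because the bodies are this small, inlining them mostly removes call overhead.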
[text/x-patch] v14-0022-Unset-all-visible-sooner-if-not-freezing.patch (2.5K, 23-v14-0022-Unset-all-visible-sooner-if-not-freezing.patch)
From eea3df3f0660f868df56fa0043c182b2fb3c0258 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:35:13 -0400
Subject: [PATCH v14 22/24] Unset all-visible sooner if not freezing
In prune/freeze code, we delay unsetting all-visible/all-frozen in the
presence of dead items to allow opportunistically freezing tuples if the
whole page would be frozen except for those dead items -- which are
removed later in vacuum's third phase.
Future commits will allow on-access pruning to set the VM, which means
all-visible will be initialized to true instead of false and we will do
extra bookkeeping in heap_prune_record_unchanged_lp_normal() to keep
track of whether or not the page is all-visible.
Because on-access pruning will not freeze tuples, it makes sense to
unset all-visible as soon as we encounter an LP_DEAD item and
avoid continued bookkeeping since we know the page is not all-visible
and we won't be able to remove those dead items.
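A toy model of this behavior change, assuming a simplified page made of three kinds of items. `SketchPruneState`, `SketchItem`, and the `live_checked` counter are hypothetical; the counter only stands for the per-tuple visibility bookkeeping that heap_prune_record_unchanged_lp_normal() does while all_visible is still true, and freeze planning is not modeled. Freezing callers defer the decision to the end; non-freezing callers give up at the first LP_DEAD item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified item states */
typedef enum SketchItem
{
	SKETCH_LIVE_VISIBLE,		/* live tuple visible to everyone */
	SKETCH_LIVE_NOT_VISIBLE,	/* live tuple not yet visible to everyone */
	SKETCH_LP_DEAD				/* dead item only vacuum can remove */
} SketchItem;

typedef struct SketchPruneState
{
	bool		attempt_freeze;
	bool		all_visible;
	int			lpdead_items;
	int			live_checked;	/* tuples that got visibility bookkeeping */
} SketchPruneState;

static void
sketch_scan_page(SketchPruneState *st, const SketchItem *items, size_t n)
{
	assert(n == 0 || items != NULL);

	st->all_visible = true;
	st->lpdead_items = 0;
	st->live_checked = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (items[i] == SKETCH_LP_DEAD)
		{
			st->lpdead_items++;
			/* Freezing callers defer this so the page can still be frozen. */
			if (!st->attempt_freeze)
				st->all_visible = false;
			continue;
		}

		/* Per-tuple bookkeeping only while the page may still be all-visible. */
		if (st->all_visible)
		{
			st->live_checked++;
			if (items[i] == SKETCH_LIVE_NOT_VISIBLE)
				st->all_visible = false;
		}
	}

	/* As at the end of pruning: dead items always rule out all-visible. */
	if (st->lpdead_items > 0)
		st->all_visible = false;
}
```

Both modes reach the same final all_visible value; the non-freezing mode simply skips bookkeeping that could no longer change the outcome.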
---
src/backend/access/heap/pruneheap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bb7a1357a89..c29f47ab151 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1522,8 +1522,11 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible until later during pruning.
- * Removable dead tuples shouldn't preclude freezing the page.
+ * Removable dead tuples shouldn't preclude freezing the page. If we won't
+ * attempt freezing, just unset all-visible now, though.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1776,8 +1779,11 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible until later, at the end of
* heap_page_prune_and_freeze(). This will allow us to attempt to freeze
* the page after pruning. As long as we unset it before updating the
- * visibility map, this will be correct.
+ * visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all-visible now.
*/
+ if (!prstate->attempt_freeze)
+ prstate->all_visible = prstate->all_frozen = false;
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
[text/x-patch] v14-0024-Set-pd_prune_xid-on-insert.patch (6.5K, 24-v14-0024-Set-pd_prune_xid-on-insert.patch)
From 0134ca707f4c64620ff26c69d703b79ec421ac91 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v14 24/24] Set pd_prune_xid on insert
Now that we can set the VM during read-only queries, it makes sense to
start setting the page prunable hint on insert. This will allow
heap_page_prune_and_freeze() to be called when the page is full or
mostly full.
For years there has been a note in heap_insert() and heap_multi_insert()
pointing out that setting pd_prune_xid would help clean up aborted
inserted tuples that would otherwise not be cleaned up until vacuum.
So, that's another benefit of setting it.
Setting pd_prune_xid on insert causes a page to be pruned and then
written out, which affects the reported number of hits in the
index-killtuples isolation test. This is a quirk of how hits are
tracked, which sometimes leads to them being double-counted. This
should probably be fixed or changed independently.
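The pd_prune_xid hint keeps the oldest prunable xid on the page. A minimal sketch modeled on our reading of PageSetPrunable() in bufpage.h, assumed here to assert that the xid is normal and to only move pd_prune_xid to an older xid, with a hypothetical `SketchPageHeader` in place of the real page header. In bootstrap mode the current xid is BootstrapTransactionId (1), which is not a normal xid, so the patched heap_insert() skips the hint:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define BootstrapTransactionId ((TransactionId) 1)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return id1 < id2;
	return (int32_t) (id1 - id2) < 0;
}

/* Hypothetical stand-in for the one page-header field used here */
typedef struct SketchPageHeader
{
	TransactionId pd_prune_xid;
} SketchPageHeader;

/* Modeled on PageSetPrunable(): remember the oldest prunable xid. */
static void
sketch_page_set_prunable(SketchPageHeader *page, TransactionId xid)
{
	assert(TransactionIdIsNormal(xid));
	if (!TransactionIdIsValid(page->pd_prune_xid) ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/* What the patched heap_insert() does: skip the hint in bootstrap mode. */
static void
sketch_after_insert(SketchPageHeader *page, TransactionId xid)
{
	if (TransactionIdIsNormal(xid))
		sketch_page_set_prunable(page, xid);
}
```

Later inserts by newer transactions leave the hint alone, so the page becomes a pruning candidate as soon as its oldest inserter might be visible to everyone.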
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../isolation/expected/index-killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 94d673d92c0..47aa9638724 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2104,6 +2104,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2163,15 +2164,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts, making this tuple DEAD and hence available for
+ * pruning. If no other tuple on this page is UPDATEd/DELETEd, the aborted
+ * tuple would otherwise not be pruned until the next vacuum.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2181,7 +2186,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2545,8 +2549,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index a8908373067..a2c4e4f47fe 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -486,6 +486,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -635,9 +641,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/isolation/expected/index-killtuples.out b/src/test/isolation/expected/index-killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/isolation/expected/index-killtuples.out
+++ b/src/test/isolation/expected/index-killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
[text/x-patch] v14-0023-Allow-on-access-pruning-to-set-pages-all-visible.patch (27.9K, 25-v14-0023-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From feedf2af7c6e0f025d4c0b35d7f7cb9df71e18a9 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 14:34:30 -0400
Subject: [PATCH v14 23/24] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum marked pages as all-visible or all-frozen.
Supporting this requires passing information about whether the relation
is modified from the executor down to the scan descriptor.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
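The two gates this patch adds for on-access VM setting can be sketched as follows. Everything here is a hypothetical stand-in (`SketchScan`, `SKETCH_SO_ALLOW_VM_SET`, an int `Buffer`), not the real scan descriptor or flag values. First, a scan gets a VM buffer pointer only if the query does not modify the relation; a NULL pointer means "do not touch the VM". Second, modeled on the guard in heap_page_will_set_vis() further down, an on-access call that neither prunes nor freezes only sets the VM when the heap page is already dirty and would not need a full-page image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int Buffer;				/* stand-in for PostgreSQL's Buffer */
#define InvalidBuffer 0

#define SKETCH_SO_ALLOW_VM_SET (1u << 0)	/* hypothetical flag value */

typedef struct SketchScan
{
	unsigned	flags;
	Buffer		vmbuffer;		/* VM page kept pinned across heap pages */
} SketchScan;

/* Executor side: only relations the query won't modify may have the VM set. */
static unsigned
sketch_scan_flags(bool modifies_rel)
{
	return modifies_rel ? 0 : SKETCH_SO_ALLOW_VM_SET;
}

/* Scan side, as in heap_prepare_pagescan(): NULL forbids VM updates. */
static Buffer *
sketch_vmbuffer_arg(SketchScan *scan)
{
	assert(scan != NULL);
	return (scan->flags & SKETCH_SO_ALLOW_VM_SET) ? &scan->vmbuffer : NULL;
}

/*
 * For an all-visible page during on-access pruning: may we set the VM?
 * If we are pruning or freezing, the heap page is being dirtied and logged
 * anyway; otherwise avoid newly dirtying it or forcing a full-page image.
 */
static bool
sketch_on_access_may_set_vm(bool do_prune, bool do_freeze,
							bool heap_buf_dirty, bool needs_fpi)
{
	if (do_prune || do_freeze)
		return true;
	return heap_buf_dirty && !needs_fpi;
}
```

Passing a pointer rather than a flag lets the scan own the VM buffer pin, so it can be reused across heap pages and released at rescan or end of scan.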
---
src/backend/access/heap/heapam.c | 15 +++-
src/backend/access/heap/heapam_handler.c | 15 +++-
src/backend/access/heap/pruneheap.c | 73 +++++++++++++++----
src/backend/access/index/indexam.c | 46 ++++++++++++
src/backend/access/table/tableam.c | 39 +++++++++-
src/backend/executor/execMain.c | 4 +
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeBitmapHeapscan.c | 7 +-
src/backend/executor/nodeIndexscan.c | 18 +++--
src/backend/executor/nodeSeqscan.c | 24 ++++--
src/include/access/genam.h | 11 +++
src/include/access/heapam.h | 24 +++++-
src/include/access/relscan.h | 6 ++
src/include/access/tableam.h | 30 +++++++-
src/include/nodes/execnodes.h | 6 ++
.../t/035_standby_logical_decoding.pl | 3 +-
16 files changed, 285 insertions(+), 38 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ab514ce65ec..94d673d92c0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -555,6 +555,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -569,7 +570,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1246,6 +1249,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1284,6 +1288,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1316,6 +1326,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..f05b9e4968d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
return &hscan->xs_base;
}
@@ -99,6 +100,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -138,7 +145,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ scan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2471,6 +2479,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2517,7 +2526,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_ALLOW_VM_SET)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c29f47ab151..3eaee398735 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -45,6 +45,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
const struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -185,9 +187,13 @@ static void page_verify_redirects(Page page);
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -251,6 +257,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
{
OffsetNumber dummy_off_loc;
PruneFreezeResult presult;
+ int options = 0;
+
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ options = HEAP_PAGE_PRUNE_UPDATE_VIS;
+ }
/*
* For now, pass mark_unused_now as false regardless of whether or
@@ -258,8 +271,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
* that during on-access pruning with the current implementation.
*/
heap_page_prune_and_freeze(relation, buffer,
- InvalidBuffer, false,
- PRUNE_ON_ACCESS, 0, NULL,
+ vmbuffer ? *vmbuffer : InvalidBuffer,
+ false, /* blk_known_av */
+ PRUNE_ON_ACCESS, options, NULL,
vistest, &presult, &dummy_off_loc, NULL, NULL);
/*
@@ -443,6 +457,8 @@ heap_page_will_set_vis(Relation relation,
Buffer heap_buf,
Buffer vmbuffer,
bool blk_known_av,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
PruneState *prstate,
uint8 *vmflags,
bool *do_set_pd_vis)
@@ -450,6 +466,32 @@ heap_page_will_set_vis(Relation relation,
Page heap_page = BufferGetPage(heap_buf);
bool do_set_vm = false;
+ *do_set_pd_vis = false;
+
+ if (!prstate->attempt_update_vm)
+ {
+ Assert(!prstate->all_visible && !prstate->all_frozen);
+ Assert(*vmflags == 0);
+ return false;
+ }
+
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buf) || XLogCheckBufferNeedsBackup(heap_buf)))
+ {
+ prstate->all_visible = prstate->all_frozen = false;
+ return false;
+ }
+
if (prstate->all_visible && !PageIsAllVisible(heap_page))
*do_set_pd_vis = true;
@@ -473,6 +515,9 @@ heap_page_will_set_vis(Relation relation,
* page-level bit is clear. However, it's possible that in vacuum the bit
* got cleared after heap_vac_scan_next_block() was called, so we must
* recheck with buffer lock before concluding that the VM is corrupt.
+ *
+ * XXX: This will never trigger for on-access pruning because it passes
+ * blk_known_av as false. Should we remove that condition here?
*/
else if (blk_known_av && !PageIsAllVisible(heap_page) &&
visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
@@ -615,6 +660,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.vistest = vistest;
prstate.mark_unused_now = (options & HEAP_PAGE_PRUNE_MARK_UNUSED_NOW) != 0;
prstate.attempt_freeze = (options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate.attempt_update_vm = (options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0;
prstate.cutoffs = cutoffs;
/*
@@ -692,7 +738,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
prstate.all_visible = true;
prstate.all_frozen = true;
}
- else if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
+ else if (prstate.attempt_update_vm)
{
prstate.all_visible = true;
prstate.all_frozen = false;
@@ -951,7 +997,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
if (prstate.lpdead_items > 0)
prstate.all_visible = prstate.all_frozen = false;
- Assert(!prstate.all_frozen || prstate.all_visible);
+
/*
* Determine whether or not to set the page level PD_ALL_VISIBLE and the
@@ -968,12 +1014,12 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
* As such, it is possible to only update the VM when PD_ALL_VISIBLE is
* already set.
*/
- do_set_pd_vis = false;
- do_set_vm = false;
- if ((options & HEAP_PAGE_PRUNE_UPDATE_VIS) != 0)
- do_set_vm = heap_page_will_set_vis(relation,
- blockno, buffer, vmbuffer, blk_known_av,
- &prstate, &new_vmbits, &do_set_pd_vis);
+ do_set_vm = heap_page_will_set_vis(relation,
+ blockno, buffer, vmbuffer, blk_known_av,
+ reason, do_prune, do_freeze,
+ &prstate, &new_vmbits, &do_set_pd_vis);
+
+ Assert(!prstate.all_frozen || prstate.all_visible);
/* Lock vmbuffer before entering a critical section */
if (do_set_vm)
@@ -1133,7 +1179,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
if (!heap_page_is_all_visible(relation, buffer,
prstate.vistest,
@@ -2298,8 +2343,8 @@ heap_log_freeze_plan(HeapTupleFreeze *tuples, int ntuples,
* - Reaping: During vacuum phase III, items that are already LP_DEAD are
* marked as unused.
*
- * - VM updates: After vacuum phases I and III, the heap page may be marked
- * all-visible and all-frozen.
+ * - VM updates: After vacuum phases I and III and on-access, the heap page
+ * may be marked all-visible and all-frozen.
*
* These changes all happen together, so we use a single WAL record for them
* all.
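Illustrative sketch (not part of the patch): the new early exits at the top of heap_page_will_set_vis() reduce to a predicate over a few booleans. The standalone C below models them under stated assumptions. The SketchPruneReason enum and sketch_passes_early_checks() are hypothetical stand-ins, and the heap buffer's state (what BufferIsDirty() and XLogCheckBufferNeedsBackup() would report) is passed in as plain flags rather than read from a real Buffer. It covers only the early exits, not the later PD_ALL_VISIBLE and VM-corruption checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PostgreSQL's PruneReason enum. */
typedef enum
{
	SKETCH_PRUNE_ON_ACCESS,
	SKETCH_PRUNE_VACUUM_SCAN,
	SKETCH_PRUNE_VACUUM_CLEANUP
} SketchPruneReason;

/*
 * Models only the early exits added to heap_page_will_set_vis().
 * Returns false if the caller must not set the VM, true if it may go on
 * to the remaining checks.  buffer_dirty and needs_fpi stand in for
 * BufferIsDirty() and XLogCheckBufferNeedsBackup() on the heap buffer.
 */
static bool
sketch_passes_early_checks(bool attempt_update_vm, SketchPruneReason reason,
						   bool all_visible, bool do_prune, bool do_freeze,
						   bool buffer_dirty, bool needs_fpi)
{
	/* Caller did not request HEAP_PAGE_PRUNE_UPDATE_VIS. */
	if (!attempt_update_vm)
	{
		/* Mirrors the patch: all_visible is never set in this case. */
		assert(!all_visible);
		return false;
	}

	/*
	 * On-access with nothing to prune or freeze: setting the VM here would
	 * either newly dirty the heap page or force an FPI of it, so give up.
	 */
	if (reason == SKETCH_PRUNE_ON_ACCESS && all_visible &&
		!do_prune && !do_freeze &&
		(!buffer_dirty || needs_fpi))
		return false;

	return true;
}
```

For example, an on-access call on a clean, all-visible page with nothing to prune is rejected, while the same call on a page that is already dirty and does not need an FPI is allowed to proceed. In the patch, the rejecting path also clears prstate->all_visible and prstate->all_frozen.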
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 86d11f4ec79..4603ece09bd 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -289,6 +289,32 @@ index_beginscan(Relation heapRelation,
return scan;
}
+/*
+ * Similar to index_beginscan(), but allows the caller to indicate whether the
+ * query modifies the underlying base relation. This is used when the caller
+ * wants to attempt marking pages in the base relation as all-visible in the
+ * visibility map during on-access pruning.
+ */
+IndexScanDesc
+index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan(heapRelation,
+ indexRelation,
+ snapshot,
+ instrument,
+ nkeys, norderbys);
+
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+
+ return scan;
+}
+
/*
* index_beginscan_bitmap - start a scan of an index with amgetbitmap
*
@@ -620,6 +646,26 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
return scan;
}
+/*
+ * Parallel version of index_beginscan_vmset()
+ */
+IndexScanDesc
+index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel)
+{
+ IndexScanDesc scan;
+
+ scan = index_beginscan_parallel(heaprel, indexrel,
+ instrument,
+ nkeys, norderbys,
+ pscan);
+ scan->xs_heapfetch->modifies_base_rel = modifies_base_rel;
+ return scan;
+}
+
/* ----------------
* index_getnext_tid - get the next TID from a scan
*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index a56c5eceb14..67dbf99f5b5 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -49,6 +49,10 @@
char *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
bool synchronize_seqscans = true;
+/* Helper for table_beginscan_parallel() and table_beginscan_parallel_vmset() */
+static TableScanDesc table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags);
+
/* ----------------------------------------------------------------------------
* Slot functions.
@@ -162,12 +166,14 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
}
-TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+/*
+ * Common helper for table_beginscan_parallel() and table_beginscan_parallel_vmset()
+ */
+static TableScanDesc
+table_beginscan_parallel_common(Relation relation, ParallelTableScanDesc pscan,
+ uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
- SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -188,6 +194,31 @@ table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
pscan, flags);
}
+TableScanDesc
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
+/*
+ * Parallel version of table_beginscan_vmset()
+ */
+TableScanDesc
+table_beginscan_parallel_vmset(Relation relation, ParallelTableScanDesc pscan,
+ bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return table_beginscan_parallel_common(relation, pscan, flags);
+}
+
/* ----------------------------------------------------------------------------
* Index scan related functions.
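Illustrative sketch (not part of the patch) of the scan-option flags that the new helpers build. SO_ALLOW_VM_SET is OR-ed in only when the query does not modify the scan's relation. The bit values are mirrored from the ScanOptions enum in tableam.h and should be treated as an assumption here. seqscan_vmset_flags() and bitmapscan_flags() are hypothetical helpers, not functions in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit values mirrored from ScanOptions in tableam.h (assumed; check the
 * header before relying on them).  SO_ALLOW_VM_SET is added by the patch.
 */
enum
{
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_TYPE_BITMAPSCAN = 1 << 1,
	SO_ALLOW_STRAT = 1 << 6,
	SO_ALLOW_SYNC = 1 << 7,
	SO_ALLOW_PAGEMODE = 1 << 8,
	SO_ALLOW_VM_SET = 1 << 10
};

/* Flags built by table_beginscan_vmset() and table_beginscan_parallel_vmset(). */
static uint32_t
seqscan_vmset_flags(bool modifies_rel)
{
	uint32_t	flags = SO_TYPE_SEQSCAN |
		SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;

	/* Only scans of relations the query does not modify may set the VM. */
	if (!modifies_rel)
		flags |= SO_ALLOW_VM_SET;
	return flags;
}

/* Flags built by table_beginscan_bm() after the patch. */
static uint32_t
bitmapscan_flags(bool modifies_rel)
{
	uint32_t	flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;

	if (!modifies_rel)
		flags |= SO_ALLOW_VM_SET;
	return flags;
}
```

The rest of each flag word is unchanged, so a scan of a modified relation gets the same flags it received before the patch.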
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ff12e2e1364..2e0474c948a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation is modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index fdc65c2b42b..28a06dcd244 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index bf24f3d7fe0..af6db9f7919 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,16 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ modifies_rel);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..c2ffbd3b08e 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,16 +102,22 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+
+ bool modifies_base_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
*/
- scandesc = index_beginscan(node->ss.ss_currentRelation,
- node->iss_RelationDesc,
- estate->es_snapshot,
- &node->iss_Instrument,
- node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ scandesc = index_beginscan_vmset(node->ss.ss_currentRelation,
+ node->iss_RelationDesc,
+ estate->es_snapshot,
+ &node->iss_Instrument,
+ node->iss_NumScanKeys,
+ node->iss_NumOrderByKeys,
+ modifies_base_rel);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index 94047d29430..fd69275c181 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,18 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
- scandesc = table_beginscan(node->ss.ss_currentRelation,
- estate->es_snapshot,
- 0, NULL);
+ scandesc = table_beginscan_vmset(node->ss.ss_currentRelation,
+ estate->es_snapshot,
+ 0, NULL, modifies_rel);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -366,6 +371,7 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
ParallelContext *pcxt)
{
EState *estate = node->ss.ps.state;
+ bool modifies_rel;
ParallelTableScanDesc pscan;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
@@ -373,8 +379,11 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ modifies_rel = bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation, pscan,
+ modifies_rel);
}
/* ----------------------------------------------------------------
@@ -404,8 +413,13 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ bool modifies_rel =
+ bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids);
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel_vmset(node->ss.ss_currentRelation,
+ pscan,
+ modifies_rel);
}
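Illustrative sketch (not part of the patch) of how the executor changes fit together. Result relations (ExecInitResultRelation()) and row-marked relations (InitPlan()) are recorded in es_modified_relids. Each scan node then checks its scanrelid against that set before deciding whether to allow VM setting. A toy 64-bit mask stands in for PostgreSQL's Bitmapset, so the sketch handles only RT indexes 1..63. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset of RT indexes 1..63; not PostgreSQL's API. */
typedef uint64_t ToyRelids;

static ToyRelids
toy_add_member(ToyRelids set, int rti)
{
	assert(rti > 0 && rti < 64);
	return set | ((uint64_t) 1 << rti);
}

static bool
toy_is_member(int rti, ToyRelids set)
{
	return rti > 0 && rti < 64 && (set & ((uint64_t) 1 << rti)) != 0;
}

/*
 * Builds the modified-relids set as the patch does: every result relation
 * and every row-marked relation is added.
 */
static ToyRelids
toy_collect_modified(const int *result_rtis, int nresult,
					 const int *rowmark_rtis, int nrowmarks)
{
	ToyRelids	modified = 0;

	for (int i = 0; i < nresult; i++)
		modified = toy_add_member(modified, result_rtis[i]);
	for (int i = 0; i < nrowmarks; i++)
		modified = toy_add_member(modified, rowmark_rtis[i]);
	return modified;
}

/* A scan may try to set the VM only if its relation is not modified. */
static bool
toy_scan_may_set_vm(int scanrelid, ToyRelids modified)
{
	return !toy_is_member(scanrelid, modified);
}
```

In the patch, the same membership test (bms_is_member() on es_modified_relids) appears in several places: the sequential scan, bitmap heap scan, and index scan nodes, and the parallel sequential scan setup paths.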
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 5b2ab181b5f..bf272c2c37f 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -180,6 +180,11 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys);
+extern IndexScanDesc index_beginscan_vmset(Relation heapRelation,
+ Relation indexRelation,
+ Snapshot snapshot,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys, bool modifies_base_rel);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
@@ -206,6 +211,12 @@ extern IndexScanDesc index_beginscan_parallel(Relation heaprel,
IndexScanInstrumentation *instrument,
int nkeys, int norderbys,
ParallelIndexScanDesc pscan);
+
+extern IndexScanDesc index_beginscan_parallel_vmset(Relation heaprel, Relation indexrel,
+ IndexScanInstrumentation *instrument,
+ int nkeys, int norderbys,
+ ParallelIndexScanDesc pscan,
+ bool modifies_base_rel);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
struct TupleTableSlot;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 35a25cf0b04..4da629067d1 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -94,6 +94,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * Used by sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read the visibility map page
+ * covering the current heap page into this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -116,8 +123,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read the visibility map page covering the current heap
+ * page into this buffer.
+ */
+ Buffer xs_vmbuffer;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
@@ -369,7 +386,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
/* in heap/pruneheap.c */
struct GlobalVisState;
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(Relation relation, Buffer buffer,
Buffer vmbuffer, bool blk_known_av,
PruneReason reason,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index b5e0fb386c0..f496e0b4939 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -121,6 +121,12 @@ typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;
typedef struct IndexFetchTableData
{
Relation rel;
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchTableData;
struct IndexScanInstrumentation;
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index b2ce35e2a34..e31c21cf8eb 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -62,6 +62,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* whether or not scan should attempt to set the VM */
+ SO_ALLOW_VM_SET = 1 << 10,
} ScanOptions;
/*
@@ -881,6 +883,25 @@ table_beginscan(Relation rel, Snapshot snapshot,
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
}
+/*
+ * Similar to table_beginscan(), but allows the caller to indicate whether the
+ * query modifies the relation. This is used when the caller wants to attempt
+ * marking pages in the relation as all-visible in the visibility map during
+ * on-access pruning.
+ */
+static inline TableScanDesc
+table_beginscan_vmset(Relation rel, Snapshot snapshot,
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
+{
+ uint32 flags = SO_TYPE_SEQSCAN |
+ SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
+
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
+ return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
+}
+
/*
* Like table_beginscan(), but for scanning catalog. It'll automatically use a
* snapshot appropriate for scanning catalog relations.
@@ -918,10 +939,13 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, struct ScanKeyData *key)
+ int nkeys, struct ScanKeyData *key, bool modifies_rel)
{
uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ if (!modifies_rel)
+ flags |= SO_ALLOW_VM_SET;
+
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
}
@@ -1130,6 +1154,10 @@ extern void table_parallelscan_initialize(Relation rel,
extern TableScanDesc table_beginscan_parallel(Relation relation,
ParallelTableScanDesc pscan);
+extern TableScanDesc table_beginscan_parallel_vmset(Relation relation,
+ ParallelTableScanDesc pscan,
+ bool modifies_rel);
+
/*
* Restart a parallel scan. Call this in the leader process. Caller is
* responsible for making sure that all workers have finished the scan
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3a920cc7d17..c854be93436 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query, either as result relations
+ * of INSERT/UPDATE/DELETE/MERGE or via row marks (e.g. SELECT FOR UPDATE)
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index c9c182892cf..f5c0c65b260 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -745,7 +746,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0