public inbox for [email protected]
Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
17+ messages / 6 participants
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-01-06 09:40 Andrey Borodin <[email protected]>
2026-01-06 17:31 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
From: Andrey Borodin @ 2026-01-06 09:40 UTC
To: Melanie Plageman <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
> On 6 Jan 2026, at 00:24, Melanie Plageman <[email protected]> wrote:
>
> <v32-0014-Pass-down-information-on-table-modification-to-s.patch>
I've made an attempt to review some of the patches in this patchset. It's huge and mostly polished.
In the step "Pass down information on table modification to scan node", you pass the SO_HINT_REL_READ_ONLY flag in IndexNext() and BitmapTableScanSetup(), but not in IndexNextWithReorder() and IndexOnlyNext(). Is there a reason why index scans with ordering cannot use on-access VM setting?
Also, the comment about visibilitymap_set() says "Callers that log VM changes separately should use visibilitymap_set()", as if visibilitymap_set() were some other function.
Best regards, Andrey Borodin.
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
2026-01-06 09:40 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andrey Borodin <[email protected]>
@ 2026-01-06 17:31 ` Melanie Plageman <[email protected]>
2026-01-07 05:55 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
2026-01-07 08:14 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Chao Li <[email protected]>
2026-01-24 00:28 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
From: Melanie Plageman @ 2026-01-06 17:31 UTC
To: Andrey Borodin <[email protected]>; +Cc: Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
On Tue, Jan 6, 2026 at 4:40 AM Andrey Borodin <[email protected]> wrote:
>
> > <v32-0014-Pass-down-information-on-table-modification-to-s.patch>
>
> I've made an attempt to review some of the patches in this patchset. It's huge and mostly polished.
I've attributed your review on the patches you specifically
mentioned here (and in previous emails you sent). Let me know if there
are other patches you reviewed that you did not mention.
> In the step "Pass down information on table modification to scan node", you pass the SO_HINT_REL_READ_ONLY flag in IndexNext() and BitmapTableScanSetup(), but not in IndexNextWithReorder() and IndexOnlyNext(). Is there a reason why index scans with ordering cannot use on-access VM setting?
Great point, I simply hadn't tested those cases and didn't think to
add them. I've added them in attached v33.
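To make the hint-flag discussion concrete, here is a minimal sketch of the general idea: derive a read-only hint from whether the scanned relation is among those the query modifies, and apply it through one helper that every scan entry point calls, so that no path (plain, ordered, index-only, or bitmap) silently misses it. Everything here is an assumption for illustration. The `HYP_` flag values, the helper names, and the use of a plain array of range table indexes are hypothetical; they are not the patch's implementation or PostgreSQL's actual ScanOptions values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical option bits for illustration only; the real ScanOptions
 * values (including SO_HINT_REL_READ_ONLY) are defined by PostgreSQL.
 */
#define HYP_SO_ALLOW_PAGEMODE     (UINT32_C(1) << 0)
#define HYP_SO_HINT_REL_READ_ONLY (UINT32_C(1) << 1)

/* Is range table index rti one of the relations the query modifies? */
static bool
rti_is_modified(int rti, const int *modified_rtis, int n_modified)
{
    for (int i = 0; i < n_modified; i++)
    {
        if (modified_rtis[i] == rti)
            return true;
    }
    return false;
}

/*
 * Hypothetical helper: add the read-only hint only when the scanned
 * relation is not a target of the query. Routing every scan entry point
 * through one helper keeps the paths consistent.
 */
static uint32_t
scan_options_for_rti(uint32_t base_options, int scan_rti,
                     const int *modified_rtis, int n_modified)
{
    if (rti_is_modified(scan_rti, modified_rtis, n_modified))
        return base_options;    /* relation is modified: no hint */
    return base_options | HYP_SO_HINT_REL_READ_ONLY;
}
```

With `modified_rtis = {2}`, a scan of range table index 1 gets the hint, while a scan of index 2 keeps its base options unchanged. This also shows why a caller without the range table index, as in check_exclusion_or_unique_constraints(), has nothing to base the decision on.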
While looking at other callers of index_beginscan(), I was wondering
if systable_beginscan() and systable_beginscan_ordered() should ever
pass SO_HINT_REL_READ_ONLY. I guess we would need to pass it only if
the operation above index_beginscan() is known to be read-only -- I'm
not sure we always know in the caller of systable_beginscan() whether
the operation will modify the catalog. That seems like it could be a
separate project, though, so maybe it is better to say this feature is
just for regular tables.
As for the other cases: We don't have the relation range table index
in check_exclusion_or_unique_constraints(), so I don't think we can do
it there.
And I think that the other index scan cases, like those in replication
code or get_actual_variable_endpoint(), are too small to be worth it,
don't have the needed info, or don't do on-access pruning (because of
the snapshot type they use).
> Also, the comment about visibilitymap_set() says "Callers that log VM changes separately should use visibilitymap_set()", as if visibilitymap_set() were some other function.
Ah, yes, I forgot to remove that when I removed the old
visibilitymap_set() and renamed visibilitymap_set_vmbits() to
visibilitymap_set(). Done in v33.
- Melanie
Attachments:
[text/x-patch] v33-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch (10.2K, 2-v33-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch)
From 8e1286c1a6dbfe3309d111aaa21af5a8e6237bb8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 8 Dec 2025 15:49:54 -0500
Subject: [PATCH v33 01/16] Combine visibilitymap_set() cases in
lazy_scan_prune()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
lazy_scan_prune() previously had two separate cases that called
visibilitymap_set() after pruning and freezing. These branches were
nearly identical except that one attempted to avoid dirtying the heap
buffer. However, that situation can never occur — the heap buffer cannot
be clean at that point (and we would hit an assertion if it were).
In lazy_scan_prune(), when we change a previously all-visible page to
all-frozen and the page was recorded as all-visible in the visibility
map by find_next_unskippable_block(), the heap buffer will always be
dirty. Either we have just frozen a tuple and already dirtied the
buffer, or the buffer was modified between find_next_unskippable_block()
and heap_page_prune_and_freeze() and then pruned in
heap_page_prune_and_freeze().
Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
attempting to add a clean heap buffer to the WAL chain would assert out
anyway.
Since the “clean heap buffer with already set VM” case is impossible,
the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
Doing so makes the intent clearer and emphasizes that the heap buffer
must always be marked dirty before being added to the WAL chain.
This commit also adds a test case for vacuuming when no heap
modifications are required. Currently this ensures that the heap buffer
is marked dirty before it is added to the WAL chain, but if we later
remove the heap buffer from the VM-set WAL chain or pass it with the
REGBUF_NO_CHANGES flag, this test would guard that behavior.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Srinath Reddy Sadipiralla <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
---
.../pg_visibility/expected/pg_visibility.out | 44 ++++++++++
contrib/pg_visibility/sql/pg_visibility.sql | 20 +++++
src/backend/access/heap/vacuumlazy.c | 87 ++++---------------
3 files changed, 82 insertions(+), 69 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 09fa5933a35..e10f1706015 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
--
@@ -204,6 +205,49 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+ pg_truncate_visibility_map
+----------------------------
+
+(1 row)
+
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (0,0)
+(1 row)
+
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+ ?column?
+----------
+ t
+(1 row)
+
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+ pg_visibility_map_summary
+---------------------------
+ (1,1)
+(1 row)
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 5af06ec5b76..57af8a0c5b6 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,4 +1,5 @@
CREATE EXTENSION pg_visibility;
+CREATE EXTENSION pageinspect;
--
-- recently-dropped table
@@ -94,6 +95,25 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test the case where vacuum phase I does not need to modify the heap buffer
+-- and only needs to set the VM
+create table test_vac_unmodified_heap(a int);
+insert into test_vac_unmodified_heap values (1);
+vacuum (freeze) test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+-- though the VM is truncated, the heap page-level visibility hint,
+-- PD_ALL_VISIBLE should still be set
+SELECT (flags & x'0004'::int) <> 0
+ FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
+-- vacuum sets the VM
+vacuum test_vac_unmodified_heap;
+select pg_visibility_map_summary('test_vac_unmodified_heap');
+
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2086a577199..2da35c85e76 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2122,16 +2122,14 @@ lazy_scan_prune(LVRelState *vacrel,
* of last heap_vac_scan_next_block() call), and from all_visible and
* all_frozen variables
*/
- if (!all_visible_according_to_vm && presult.all_visible)
+ if ((presult.all_visible && !all_visible_according_to_vm) ||
+ (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
{
uint8 old_vmbits;
uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
if (presult.all_frozen)
- {
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
flags |= VISIBILITYMAP_ALL_FROZEN;
- }
/*
* It should never be the case that the visibility map page is set
@@ -2139,15 +2137,25 @@ lazy_scan_prune(LVRelState *vacrel,
* checksums are not enabled). Regardless, set both bits so that we
* get back in sync.
*
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page. However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
+ * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
+ * unnecessarily dirtying the heap buffer. Nearly the only scenario
+ * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
+ * removed -- and that isn't worth optimizing for. And if we add the
+ * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
+ * it must be marked dirty.
*/
PageSetAllVisible(page);
MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
InvalidXLogRecPtr,
vmbuffer, presult.vm_conflict_horizon,
@@ -2219,65 +2227,6 @@ lazy_scan_prune(LVRelState *vacrel,
VISIBILITYMAP_VALID_BITS);
}
- /*
- * If the all-visible page is all-frozen but not marked as such yet, mark
- * it as all-frozen.
- */
- else if (all_visible_according_to_vm && presult.all_frozen &&
- !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- {
- uint8 old_vmbits;
-
- /*
- * Avoid relying on all_visible_according_to_vm as a proxy for the
- * page-level PD_ALL_VISIBLE bit being set, since it might have become
- * stale -- even when all_visible is set
- */
- if (!PageIsAllVisible(page))
- {
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
- }
-
- /*
- * Set the page all-frozen (and all-visible) in the VM.
- *
- * We can pass InvalidTransactionId as our cutoff_xid, since a
- * snapshotConflictHorizon sufficient to make everything safe for REDO
- * was logged when the page's tuples were frozen.
- */
- Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
-
- /*
- * The page was likely already set all-visible in the VM. However,
- * there is a small chance that it was modified sometime between
- * setting all_visible_according_to_vm and checking the visibility
- * during pruning. Check the return value of old_vmbits anyway to
- * ensure the visibility map counters used for logging are accurate.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
-
- /*
- * We already checked that the page was not set all-frozen in the VM
- * above, so we don't need to test the value of old_vmbits.
- */
- else
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
-
return presult.ndeleted;
}
--
2.43.0
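The first patch's commit message argues that the old "all-visible in the VM but not yet all-frozen" branch is subsumed by the merged condition. The sketch below checks that claim exhaustively over boolean inputs. It is a simplified model, not code from the patch: the function names are hypothetical, the VM state is reduced to two booleans, and the corruption-handling branches that sat between the two old branches are omitted. The equivalence relies on the invariant asserted in lazy_scan_prune(), `Assert(!presult.all_frozen || presult.all_visible)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Old logic: the two separate branches (corruption branches omitted). */
static bool
old_should_set_vm(bool all_visible, bool all_frozen,
                  bool vm_all_visible, bool vm_all_frozen)
{
    if (all_visible && !vm_all_visible)
        return true;
    if (vm_all_visible && all_frozen && !vm_all_frozen)
        return true;
    return false;
}

/* Merged condition, modeled on the one in the patch. */
static bool
new_should_set_vm(bool all_visible, bool all_frozen,
                  bool vm_all_visible, bool vm_all_frozen)
{
    return (all_visible && !vm_all_visible) ||
           (all_frozen && !vm_all_frozen);
}

/*
 * Compare both conditions over every input combination that satisfies
 * the invariant "all_frozen implies all_visible". Returns the number of
 * combinations on which they disagree.
 */
static int
count_disagreements(void)
{
    int mismatches = 0;

    for (int i = 0; i < 16; i++)
    {
        bool av = (i & 1) != 0;     /* pruning: all_visible */
        bool af = (i & 2) != 0;     /* pruning: all_frozen */
        bool vv = (i & 4) != 0;     /* VM: all-visible bit */
        bool vf = (i & 8) != 0;     /* VM: all-frozen bit */

        if (af && !av)
            continue;               /* excluded by the invariant */
        if (old_should_set_vm(av, af, vv, vf) !=
            new_should_set_vm(av, af, vv, vf))
            mismatches++;
    }
    return mismatches;
}
```

Removing the `continue` makes `count_disagreements()` return 1: the only mismatch is all_frozen set while all_visible and both VM bits are clear, which the invariant rules out. Under the invariant, any input matching the second disjunct with the VM all-visible bit clear already satisfies the first disjunct.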
[text/x-patch] v33-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch (16.9K, 3-v33-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch)
From 4d37243f9fa0dc4e264a28bcee448787fb8d7f65 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Thu, 11 Dec 2025 10:48:13 -0500
Subject: [PATCH v33 02/16] Eliminate use of cached VM value in
lazy_scan_prune()
lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
whether the page was marked all-visible in the VM at the time it was
last checked in find_next_unskippable_block(). This behavior is
historical, dating back to commit 608195a3a365, when we did not pin the
VM page until deciding we must read it. Now that the VM page is already
pinned, there is no meaningful benefit to relying on a cached VM status.
Removing this cached value simplifies the logic in both lazy_scan_heap()
and lazy_scan_prune(). It also clarifies future work that will set the
visibility map on-access: such paths will not have a cached value
available, which would make the logic harder to reason about. And
eliminating it enables us to detect and repair VM corruption on-access.
Along with removing the cached value and unconditionally checking the
visibility status of the heap page, this commit also moves the VM
corruption handling to occur first. This reordering should have no
performance impact, since the checks are inexpensive and performed only
once per page. It does, however, make the control flow easier to
understand. The new restructuring also makes it possible to set the VM
after fixing corruption (if pruning found the page all-visible).
Now that no callers of visibilitymap_set() use its return value, change
its return type (and that of visibilitymap_set_vmbits()) to void.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Xuneng Zhou <[email protected]>
Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
---
src/backend/access/heap/vacuumlazy.c | 182 +++++++++++-------------
src/backend/access/heap/visibilitymap.c | 9 +-
src/include/access/visibilitymap.h | 18 +--
3 files changed, 94 insertions(+), 115 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 2da35c85e76..3733a1cbc47 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -248,13 +248,6 @@ typedef enum
*/
#define EAGER_SCAN_REGION_SIZE 4096
-/*
- * heap_vac_scan_next_block() sets these flags to communicate information
- * about the block it read to the caller.
- */
-#define VAC_BLK_WAS_EAGER_SCANNED (1 << 0)
-#define VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM (1 << 1)
-
typedef struct LVRelState
{
/* Target heap relation and its indexes */
@@ -360,7 +353,6 @@ typedef struct LVRelState
/* State maintained by heap_vac_scan_next_block() */
BlockNumber current_block; /* last block returned */
BlockNumber next_unskippable_block; /* next unskippable block */
- bool next_unskippable_allvis; /* its visibility status */
bool next_unskippable_eager_scanned; /* if it was eagerly scanned */
Buffer next_unskippable_vmbuffer; /* buffer containing its VM bit */
@@ -434,7 +426,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
bool sharelock, Buffer vmbuffer);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
- Buffer vmbuffer, bool all_visible_according_to_vm,
+ Buffer vmbuffer,
bool *has_lpdead_items, bool *vm_page_frozen);
static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
@@ -1277,7 +1269,6 @@ lazy_scan_heap(LVRelState *vacrel)
/* Initialize for the first heap_vac_scan_next_block() call */
vacrel->current_block = InvalidBlockNumber;
vacrel->next_unskippable_block = InvalidBlockNumber;
- vacrel->next_unskippable_allvis = false;
vacrel->next_unskippable_eager_scanned = false;
vacrel->next_unskippable_vmbuffer = InvalidBuffer;
@@ -1293,13 +1284,13 @@ lazy_scan_heap(LVRelState *vacrel)
MAIN_FORKNUM,
heap_vac_scan_next_block,
vacrel,
- sizeof(uint8));
+ sizeof(bool));
while (true)
{
Buffer buf;
Page page;
- uint8 blk_info = 0;
+ bool was_eager_scanned = false;
int ndeleted = 0;
bool has_lpdead_items;
void *per_buffer_data = NULL;
@@ -1368,13 +1359,13 @@ lazy_scan_heap(LVRelState *vacrel)
if (!BufferIsValid(buf))
break;
- blk_info = *((uint8 *) per_buffer_data);
+ was_eager_scanned = *((bool *) per_buffer_data);
CheckBufferIsPinnedOnce(buf);
page = BufferGetPage(buf);
blkno = BufferGetBlockNumber(buf);
vacrel->scanned_pages++;
- if (blk_info & VAC_BLK_WAS_EAGER_SCANNED)
+ if (was_eager_scanned)
vacrel->eager_scanned_pages++;
/* Report as block scanned, update error traceback information */
@@ -1445,7 +1436,6 @@ lazy_scan_heap(LVRelState *vacrel)
if (got_cleanup_lock)
ndeleted = lazy_scan_prune(vacrel, buf, blkno, page,
vmbuffer,
- blk_info & VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM,
&has_lpdead_items, &vm_page_frozen);
/*
@@ -1462,8 +1452,7 @@ lazy_scan_heap(LVRelState *vacrel)
* exclude pages skipped due to cleanup lock contention from eager
* freeze algorithm caps.
*/
- if (got_cleanup_lock &&
- (blk_info & VAC_BLK_WAS_EAGER_SCANNED))
+ if (got_cleanup_lock && was_eager_scanned)
{
/* Aggressive vacuums do not eager scan. */
Assert(!vacrel->aggressive);
@@ -1630,7 +1619,6 @@ heap_vac_scan_next_block(ReadStream *stream,
{
BlockNumber next_block;
LVRelState *vacrel = callback_private_data;
- uint8 blk_info = 0;
/* relies on InvalidBlockNumber + 1 overflowing to 0 on first call */
next_block = vacrel->current_block + 1;
@@ -1693,8 +1681,8 @@ heap_vac_scan_next_block(ReadStream *stream,
* otherwise they would've been unskippable.
*/
vacrel->current_block = next_block;
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- *((uint8 *) per_buffer_data) = blk_info;
+ /* Block was not eager scanned */
+ *((bool *) per_buffer_data) = false;
return vacrel->current_block;
}
else
@@ -1706,11 +1694,7 @@ heap_vac_scan_next_block(ReadStream *stream,
Assert(next_block == vacrel->next_unskippable_block);
vacrel->current_block = next_block;
- if (vacrel->next_unskippable_allvis)
- blk_info |= VAC_BLK_ALL_VISIBLE_ACCORDING_TO_VM;
- if (vacrel->next_unskippable_eager_scanned)
- blk_info |= VAC_BLK_WAS_EAGER_SCANNED;
- *((uint8 *) per_buffer_data) = blk_info;
+ *((bool *) per_buffer_data) = vacrel->next_unskippable_eager_scanned;
return vacrel->current_block;
}
}
@@ -1735,7 +1719,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
bool next_unskippable_eager_scanned = false;
- bool next_unskippable_allvis;
*skipsallvis = false;
@@ -1745,7 +1728,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
next_unskippable_block,
&next_unskippable_vmbuffer);
- next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
/*
* At the start of each eager scan region, normal vacuums with eager
@@ -1764,7 +1746,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
* A block is unskippable if it is not all visible according to the
* visibility map.
*/
- if (!next_unskippable_allvis)
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
break;
@@ -1821,7 +1803,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
/* write the local variables back to vacrel */
vacrel->next_unskippable_block = next_unskippable_block;
- vacrel->next_unskippable_allvis = next_unskippable_allvis;
vacrel->next_unskippable_eager_scanned = next_unskippable_eager_scanned;
vacrel->next_unskippable_vmbuffer = next_unskippable_vmbuffer;
}
@@ -1982,9 +1963,7 @@ cmpOffsetNumbers(const void *a, const void *b)
* Caller must hold pin and buffer cleanup lock on the buffer.
*
* vmbuffer is the buffer containing the VM block with visibility information
- * for the heap block, blkno. all_visible_according_to_vm is the saved
- * visibility status of the heap block looked up earlier by the caller. We
- * won't rely entirely on this status, as it may be out of date.
+ * for the heap block, blkno.
*
* *has_lpdead_items is set to true or false depending on whether, upon return
* from this function, any LP_DEAD items are still present on the page.
@@ -2001,7 +1980,6 @@ lazy_scan_prune(LVRelState *vacrel,
BlockNumber blkno,
Page page,
Buffer vmbuffer,
- bool all_visible_according_to_vm,
bool *has_lpdead_items,
bool *vm_page_frozen)
{
@@ -2015,6 +1993,8 @@ lazy_scan_prune(LVRelState *vacrel,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
+ uint8 old_vmbits = 0;
+ uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2117,70 +2097,7 @@ lazy_scan_prune(LVRelState *vacrel,
Assert(!presult.all_visible || !(*has_lpdead_items));
Assert(!presult.all_frozen || presult.all_visible);
- /*
- * Handle setting visibility map bit based on information from the VM (as
- * of last heap_vac_scan_next_block() call), and from all_visible and
- * all_frozen variables
- */
- if ((presult.all_visible && !all_visible_according_to_vm) ||
- (presult.all_frozen && !VM_ALL_FROZEN(rel, blkno, &vmbuffer)))
- {
- uint8 old_vmbits;
- uint8 flags = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- flags |= VISIBILITYMAP_ALL_FROZEN;
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * Even if PD_ALL_VISIBLE is already set, we don't need to worry about
- * unnecessarily dirtying the heap buffer. Nearly the only scenario
- * where PD_ALL_VISIBLE is set but the VM is not is if the VM was
- * removed -- and that isn't worth optimizing for. And if we add the
- * heap buffer to the WAL chain (without passing REGBUF_NO_CHANGES),
- * it must be marked dirty.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- flags);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the
- * VM, count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
+ old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
/*
* As of PostgreSQL 9.2, the visibility map bit should never be set if the
@@ -2188,8 +2105,8 @@ lazy_scan_prune(LVRelState *vacrel,
* cleared after heap_vac_scan_next_block() was called, so we must recheck
* with buffer lock before concluding that the VM is corrupt.
*/
- else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
- visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+ if (!PageIsAllVisible(page) &&
+ (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
{
ereport(WARNING,
(errcode(ERRCODE_DATA_CORRUPTED),
@@ -2198,6 +2115,8 @@ lazy_scan_prune(LVRelState *vacrel,
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
}
/*
@@ -2225,6 +2144,71 @@ lazy_scan_prune(LVRelState *vacrel,
MarkBufferDirty(buf);
visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
VISIBILITYMAP_VALID_BITS);
+ /* VM bits are now clear */
+ old_vmbits = 0;
+ }
+
+ if (!presult.all_visible)
+ return presult.ndeleted;
+
+ /* Set the visibility map and page visibility hint */
+ new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (presult.all_frozen)
+ new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ /* Nothing to do */
+ if (old_vmbits == new_vmbits)
+ return presult.ndeleted;
+
+ Assert(presult.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set while
+ * the page-level bit is clear, but the reverse is allowed (if checksums
+ * are not enabled). Regardless, set both bits so that we get back in
+ * sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL chain
+ * when setting the VM. We don't worry about unnecessarily dirtying the
+ * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
+ * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
+ * the VM bits clear, so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buf);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId as
+ * the cutoff_xid, since a snapshot conflict horizon sufficient to make
+ * everything safe for REDO was logged when the page's tuples were frozen.
+ */
+ Assert(!presult.all_frozen ||
+ !TransactionIdIsValid(presult.vm_conflict_horizon));
+
+ visibilitymap_set(vacrel->rel, blkno, buf,
+ InvalidXLogRecPtr,
+ vmbuffer, presult.vm_conflict_horizon,
+ new_vmbits);
+
+ /*
+ * If the page wasn't already set all-visible and/or all-frozen in the VM,
+ * count it as newly set for logging.
+ */
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ vacrel->vm_new_visible_pages++;
+ if (presult.all_frozen)
+ {
+ vacrel->vm_new_visible_frozen_pages++;
+ *vm_page_frozen = true;
+ }
+ }
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ presult.all_frozen)
+ {
+ vacrel->vm_new_frozen_pages++;
+ *vm_page_frozen = true;
}
return presult.ndeleted;
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 2382d18f72b..3047bd46def 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -240,10 +240,8 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
* You must pass a buffer containing the correct map page to this function.
* Call visibilitymap_pin first to pin the right one. This function doesn't do
* any I/O.
- *
- * Returns the state of the page's VM bits before setting flags.
*/
-uint8
+void
visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
uint8 flags)
@@ -320,7 +318,6 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
}
LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
- return status;
}
/*
@@ -343,7 +340,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
*
* rlocator is used only for debugging messages.
*/
-uint8
+void
visibilitymap_set_vmbits(BlockNumber heapBlk,
Buffer vmBuf, uint8 flags,
const RelFileLocator rlocator)
@@ -386,8 +383,6 @@ visibilitymap_set_vmbits(BlockNumber heapBlk,
map[mapByte] |= (flags << mapOffset);
MarkBufferDirty(vmBuf);
}
-
- return status;
}
/*
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index 47ad489a9a7..a0166c5b410 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -32,15 +32,15 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern uint8 visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern uint8 visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern void visibilitymap_set(Relation rel,
+ BlockNumber heapBlk, Buffer heapBuf,
+ XLogRecPtr recptr,
+ Buffer vmBuf,
+ TransactionId cutoff_xid,
+ uint8 flags);
+extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
--
2.43.0
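The second patch replaces the cached VM status with a fresh visibilitymap_get_status() lookup and restructures the tail of lazy_scan_prune() around `old_vmbits` and `new_vmbits`. The following is a minimal, self-contained model of that decision logic under simplifying assumptions: the VM bits and PD_ALL_VISIBLE are passed in as plain values, only the first corruption case (VM bits set while PD_ALL_VISIBLE is clear) is modeled, and `VmCounters` and `model_vm_update()` are hypothetical names. The counter names follow the patch, and the bit values match PostgreSQL's visibility map definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as PostgreSQL's visibility map definitions. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

/* Hypothetical stand-in for the LVRelState counters updated here. */
typedef struct VmCounters
{
    int vm_new_visible_pages;
    int vm_new_visible_frozen_pages;
    int vm_new_frozen_pages;
} VmCounters;

/*
 * Simplified model of the new tail of lazy_scan_prune(). Returns the bits
 * that would be passed to visibilitymap_set(), or 0 if it would not be
 * called.
 */
static uint8_t
model_vm_update(uint8_t old_vmbits, bool page_all_visible,
                bool all_visible, bool all_frozen,
                VmCounters *c, bool *vm_page_frozen)
{
    uint8_t new_vmbits;

    assert(!all_frozen || all_visible);

    /* VM bits set while PD_ALL_VISIBLE is clear: corruption, clear the VM. */
    if (!page_all_visible && (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
        old_vmbits = 0;         /* VM bits are now clear */

    if (!all_visible)
        return 0;

    new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
    if (all_frozen)
        new_vmbits |= VISIBILITYMAP_ALL_FROZEN;

    if (old_vmbits == new_vmbits)
        return 0;               /* nothing to do */

    /* Count pages newly set all-visible and/or all-frozen. */
    if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
    {
        c->vm_new_visible_pages++;
        if (all_frozen)
        {
            c->vm_new_visible_frozen_pages++;
            *vm_page_frozen = true;
        }
    }
    else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && all_frozen)
    {
        c->vm_new_frozen_pages++;
        *vm_page_frozen = true;
    }
    return new_vmbits;
}
```

Because corruption is handled first and resets `old_vmbits` to 0, a page whose VM bits were found to be corrupt can be set all-visible again in the same pass, which is the behavior the commit message describes.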
[text/x-patch] v33-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch (6.7K, 4-v33-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch)
From 0fc1b4cbb4e67b193eca8347dca1bf8053d2020e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 13:36:39 -0500
Subject: [PATCH v33 03/16] Refactor lazy_scan_prune() VM clear logic into
helper
Encapsulating the VM-clearing logic in a helper makes the whole function
clearer. There is no functional change other than moving it into a helper.
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/heap/vacuumlazy.c | 132 +++++++++++++++++----------
1 file changed, 85 insertions(+), 47 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 3733a1cbc47..5857fd1bfb6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -424,6 +424,11 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1957,6 +1962,83 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2099,54 +2181,10 @@ lazy_scan_prune(LVRelState *vacrel,
old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(page) &&
- (old_vmbits & VISIBILITYMAP_VALID_BITS) != 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
+ if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
+ presult.lpdead_items, vmbuffer,
+ old_vmbits))
old_vmbits = 0;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (presult.lpdead_items > 0 && PageIsAllVisible(page))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- vacrel->relname, blkno)));
-
- PageClearAllVisible(page);
- MarkBufferDirty(buf);
- visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- /* VM bits are now clear */
- old_vmbits = 0;
- }
if (!presult.all_visible)
return presult.ndeleted;
--
2.43.0
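The two checks that identify_and_fix_vm_corruption() performs can be modeled outside PostgreSQL as a small pure function. The sketch below is a hypothetical, standalone illustration, not code from the patch: `ModelPage`, `model_fix_vm_corruption()` and the `VM_*` macros are invented names, the bit values are assumed to mirror VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN, and the WARNING reports, buffer dirtying and locking are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror VISIBILITYMAP_ALL_VISIBLE / _ALL_FROZEN / _VALID_BITS. */
#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02
#define VM_VALID_BITS  (VM_ALL_VISIBLE | VM_ALL_FROZEN)

/* Hypothetical stand-in for a heap page plus its bits in the VM. */
typedef struct ModelPage
{
    bool    pd_all_visible;  /* page-level PD_ALL_VISIBLE hint */
    uint8_t vmbits;          /* this page's bits in the visibility map */
} ModelPage;

/*
 * Returns true if corruption was detected and cleared, following the same
 * two cases as identify_and_fix_vm_corruption().
 */
static bool
model_fix_vm_corruption(ModelPage *page, int nlpdead_items)
{
    /* Case 1: a VM bit is set although PD_ALL_VISIBLE is clear. */
    if (!page->pd_all_visible && (page->vmbits & VM_VALID_BITS) != 0)
    {
        page->vmbits = (uint8_t) (page->vmbits & ~VM_VALID_BITS);
        return true;
    }

    /* Case 2: LP_DEAD items on a page marked all-visible. */
    if (page->pd_all_visible && nlpdead_items > 0)
    {
        page->pd_all_visible = false;
        page->vmbits = (uint8_t) (page->vmbits & ~VM_VALID_BITS);
        return true;
    }

    return false;
}
```

In this model, case 2 clears both the page-level hint and the VM bits, while case 1 only needs to clear the VM bits because PD_ALL_VISIBLE is already clear.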
[text/x-patch] v33-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch (26.8K, 5-v33-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch)
From 5c65e73246b4968ddfa9d3739f53d0d8734b8727 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v33 04/16] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit; it is meant to ease review. As of
this commit, a separate WAL record is still emitted for setting the VM
after pruning and freezing. Moving the logic into pruneheap.c is kept
separate from setting the VM in the same WAL record because the two
changes are easier to review independently.
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/heap/pruneheap.c | 315 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 150 +------------
src/include/access/heapam.h | 20 ++
3 files changed, 299 insertions(+), 186 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index af788b29714..53b7711ab21 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits);
/*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible: once a tuple that is not visible to all is
+ * encountered, the field is no longer maintained. While it is still
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -775,10 +795,148 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * Returns true if it cleared corruption and false otherwise.
+ */
+static bool
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ *
+ * If the VM should not be set, it returns false. If we won't consider
+ * updating the VM, *old_vmbits will be 0, regardless of the current value of
+ * the VM bits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ *old_vmbits = 0;
+ *new_vmbits = 0;
+
+ if (!prstate->attempt_update_vm)
+ return false;
+
+ *old_vmbits = visibilitymap_get_status(relation, heap_blk,
+ &vmbuffer);
+
+ /* Check for and fix VM corruption even if the page is not all-visible */
+ if (identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+ nlpdead_items, vmbuffer,
+ *old_vmbits))
+ *old_vmbits = 0;
+
+ if (!prstate->all_visible)
+ return false;
+
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (prstate->all_frozen)
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ if (*new_vmbits == *old_vmbits)
+ {
+ *new_vmbits = 0;
+ return false;
+ }
+
+ return true;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -793,12 +951,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -823,13 +982,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1011,6 +1175,65 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * If we do not intend to set the VM, new_vmbits should be 0 whether or
+ * not the page is all-visible.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ /* Set the visibility map and page visibility hint, if relevant */
+ if (do_set_vm)
+ {
+ Assert(prstate.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear, but the reverse is allowed (if
+ * checksums are not enabled). Regardless, set both bits so that we
+ * get back in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL
+ * chain when setting the VM. We don't worry about unnecessarily
+ * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
+ * It is extremely rare to have a clean heap buffer with
+ * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
+ * point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!prstate.all_frozen ||
+ !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+ visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ new_vmbits);
+ }
+
+ /* Save the vmbits for caller */
+ presult->old_vmbits = old_vmbits;
+ presult->new_vmbits = new_vmbits;
}
@@ -1485,6 +1708,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_freeze || prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5857fd1bfb6..fe816299f4b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -424,11 +424,7 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits);
+
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1962,83 +1958,6 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * Returns true if it cleared corruption and false otherwise.
- */
-static bool
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 vmbits)
-{
- Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == vmbits);
-
- Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(heap_page) &&
- ((vmbits & VISIBILITYMAP_VALID_BITS) != 0))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- return true;
- }
-
- return false;
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2070,13 +1989,12 @@ lazy_scan_prune(LVRelState *vacrel,
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
- uint8 old_vmbits = 0;
- uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2176,75 +2094,25 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
- if (identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
- presult.lpdead_items, vmbuffer,
- old_vmbits))
- old_vmbits = 0;
-
- if (!presult.all_visible)
- return presult.ndeleted;
-
- /* Set the visibility map and page visibility hint */
- new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /* Nothing to do */
- if (old_vmbits == new_vmbits)
- return presult.ndeleted;
-
- Assert(presult.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set while
- * the page-level bit is clear, but the reverse is allowed (if checksums
- * are not enabled). Regardless, set both bits so that we get back in
- * sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL chain
- * when setting the VM. We don't worry about unnecessarily dirtying the
- * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
- * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
- * the VM bits clear, so there is no point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId as
- * the cutoff_xid, since a snapshot conflict horizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
/*
* If the page wasn't already set all-visible and/or all-frozen in the VM,
* count it as newly set for logging.
*/
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
{
vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
+ if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
vacrel->vm_new_visible_frozen_pages++;
*vm_page_frozen = true;
}
}
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
+ else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
{
+ Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
vacrel->vm_new_frozen_pages++;
*vm_page_frozen = true;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ce48fac42ba..2c07e197dc8 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * If we will consider updating the visibility map, vmbuffer should
+ * contain the correct block of the visibility map and be pinned.
+ */
+ Buffer vmbuffer;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +309,16 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * old_vmbits are the state of the all-visible and all-frozen bits in the
+ * visibility map before updating it during phase I of vacuuming.
+ * new_vmbits are the state of those bits after phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
+ */
+ uint8 new_vmbits;
+ uint8 old_vmbits;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
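In the lazy_scan_prune() hunk above, the logging counters now come only from presult.old_vmbits and presult.new_vmbits. The standalone C sketch below reproduces that branch structure so each case can be tested in isolation. It is an illustration, not code from the patch. VmCounters and count_vm_changes() are hypothetical names. The flag values 0x01 and 0x02 are assumed to match VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN in visibilitymapdefs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the bit values in src/include/access/visibilitymapdefs.h */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical stand-in for the LVRelState counters touched by the hunk */
typedef struct VmCounters
{
	int64_t		vm_new_visible_pages;
	int64_t		vm_new_visible_frozen_pages;
	int64_t		vm_new_frozen_pages;
} VmCounters;

/*
 * Update the counters from the VM bits before and after pruning, using the
 * same branch structure as the patched lazy_scan_prune().  Returns true if
 * the page was newly set all-frozen (the *vm_page_frozen output).
 */
static bool
count_vm_changes(VmCounters *c, uint8_t old_vmbits, uint8_t new_vmbits)
{
	bool		vm_page_frozen = false;

	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		/* Page newly became all-visible, and possibly all-frozen too */
		c->vm_new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			c->vm_new_visible_frozen_pages++;
			vm_page_frozen = true;
		}
	}
	else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
			 (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* Already all-visible page that is only now set all-frozen */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->vm_new_frozen_pages++;
		vm_page_frozen = true;
	}

	/* new_vmbits == 0 (VM not updated) matches neither branch */
	return vm_page_frozen;
}
```

A page whose VM bits were not updated (new_vmbits == 0) matches neither branch, so none of the counters change.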
[text/x-patch] v33-0005-Move-VM-assert-into-prune-freeze-code.patch (10.9K, 6-v33-0005-Move-VM-assert-into-prune-freeze-code.patch)
From 0162c78c42764cdb0ecf0ad82eb616954d15a94d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:57:34 -0500
Subject: [PATCH v33 05/16] Move VM assert into prune/freeze code
This is a step toward setting the VM in the same WAL record as pruning
and freezing. It moves the check of the heap page into prune/freeze code
before setting the VM. This allows us to remove some fields of the
PruneFreezeResult.
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/heap/pruneheap.c | 86 ++++++++++++++++++++++------
src/backend/access/heap/vacuumlazy.c | 68 +---------------------
src/include/access/heapam.h | 25 +++-----
3 files changed, 77 insertions(+), 102 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 53b7711ab21..85ac1a54882 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -932,6 +932,31 @@ heap_page_will_set_vm(PruneState *prstate,
return true;
}
+#ifdef USE_ASSERT_CHECKING
+
+/*
+ * Wrapper for heap_page_would_be_all_visible() which can be used for callers
+ * that expect no LP_DEAD on the page. Currently assert-only, but there is no
+ * reason not to use it outside of asserts.
+ */
+static bool
+heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum)
+{
+
+ return heap_page_would_be_all_visible(rel, buf,
+ OldestXmin,
+ NULL, 0,
+ all_frozen,
+ visibility_cutoff_xid,
+ logging_offnum);
+}
+#endif
+
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
@@ -985,6 +1010,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1142,23 +1168,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1176,6 +1187,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so we don't need to again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
/* Now update the visibility map and PD_ALL_VISIBLE hint */
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
@@ -1222,12 +1273,11 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* make everything safe for REDO was logged when the page's tuples
* were frozen.
*/
- Assert(!prstate.all_frozen ||
- !TransactionIdIsValid(presult->vm_conflict_horizon));
+ Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
new_vmbits);
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index fe816299f4b..b7d834969d6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -459,20 +459,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
-static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- OffsetNumber *deadoffsets,
- int ndeadoffsets,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
static void update_relstats_all_indexes(LVRelState *vacrel);
static void vacuum_error_callback(void *arg);
static void update_vacuum_error_info(LVRelState *vacrel,
@@ -2035,32 +2021,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3522,29 +3482,6 @@ dead_items_cleanup(LVRelState *vacrel)
vacrel->pvs = NULL;
}
-#ifdef USE_ASSERT_CHECKING
-
-/*
- * Wrapper for heap_page_would_be_all_visible() which can be used for callers
- * that expect no LP_DEAD on the page. Currently assert-only, but there is no
- * reason not to use it outside of asserts.
- */
-static bool
-heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum)
-{
-
- return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
- NULL, 0,
- all_frozen,
- visibility_cutoff_xid,
- logging_offnum);
-}
-#endif
/*
* Check whether the heap page in buf is all-visible except for the dead
@@ -3568,15 +3505,12 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* - *logging_offnum: OffsetNumber of current tuple being processed;
* used by vacuum's error callback system.
*
- * Callers looking to verify that the page is already all-visible can call
- * heap_page_is_all_visible().
- *
* This logic is closely related to heap_prune_record_unchanged_lp_normal().
* If you modify this function, ensure consistency with that code. An
* assertion cross-checks that both remain in agreement. Do not introduce new
* side-effects.
*/
-static bool
+bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2c07e197dc8..e0da1f7cdcc 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* old_vmbits are the state of the all-visible and all-frozen bits in the
* visibility map before updating it during phase I of vacuuming.
@@ -453,6 +437,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
+extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ OffsetNumber *deadoffsets,
+ int ndeadoffsets,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
--
2.43.0
[text/x-patch] v33-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (14.2K, 7-v33-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From cdf5776fadeae3430c692999b37f8a7ec944bda1 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v33 06/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 275 ++++++++++++++++------------
1 file changed, 157 insertions(+), 118 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 85ac1a54882..b3ea42f1be1 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid);
/*
@@ -795,6 +800,68 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be visible to all MVCC snapshots on the standby.
+ */
+ if (!do_prune &&
+ !do_freeze &&
+ do_set_vm &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshotConflictHorizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ */
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are updating the VM, the conflict horizon is almost always the
+ * visibility cutoff XID.
+ *
+ * Separately, if we are freezing any tuples, as an optimization, we can
+ * use the visibility_cutoff_xid as the conflict horizon if the page will
+ * be all-frozen. This is true even if there are LP_DEAD line pointers
+ * because we ignored those when maintaining the visibility_cutoff_xid.
+ * This will have been calculated earlier as the frz_conflict_horizon when
+ * we determined we would freeze.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Helper to correct any corruption detected on a heap page and its
* corresponding visibility map page after pruning but before setting the
@@ -1010,7 +1077,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1018,6 +1084,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid;
uint8 new_vmbits;
uint8 old_vmbits;
@@ -1081,6 +1148,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ old_vmbits, new_vmbits,
+ prstate.latest_xid_removed,
+ prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1102,14 +1200,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1123,6 +1224,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ /* Set the visibility map and page visibility hint */
+ if (do_set_vm)
+ {
+ /*
+ * While it is valid for PD_ALL_VISIBLE to be set when the
+ * corresponding VM bit is clear, we strongly prefer to keep them
+ * in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the
+ * WAL chain when setting the VM. We don't worry about
+ * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+ * already set, though. It is extremely rare to have a clean heap
+ * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+ * so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1130,29 +1251,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1162,43 +1266,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so we don't need to again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1213,7 +1282,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1223,67 +1293,36 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- do_set_vm = heap_page_will_set_vm(&prstate,
- params->relation,
- blockno,
- buffer,
- page,
- vmbuffer,
- prstate.lpdead_items,
- &old_vmbits,
- &new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
+ presult->new_vmbits = new_vmbits;
+ presult->old_vmbits = old_vmbits;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- /* Set the visibility map and page visibility hint, if relevant */
- if (do_set_vm)
+ if (prstate.attempt_freeze)
{
- Assert(prstate.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled). Regardless, set both bits so that we
- * get back in sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL
- * chain when setting the VM. We don't worry about unnecessarily
- * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
- * It is extremely rare to have a clean heap buffer with
- * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
- * point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
- visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
}
-
- /* Save the vmbits for caller */
- presult->old_vmbits = old_vmbits;
- presult->new_vmbits = new_vmbits;
}
--
2.43.0
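The get_conflict_xid() helper added in this patch chooses a single snapshot conflict horizon for the combined prune/freeze/VM record. To make its precedence rules easier to check, here is a self-contained C re-implementation built on a few assumptions. conflict_xid_sketch() and xid_follows() are hypothetical names. xid_follows() is written to follow the circular comparison used by TransactionIdFollows(): a plain integer comparison when either XID is below FirstNormalTransactionId, and a modulo-2^32 comparison otherwise. The VM flag values are assumed to be 0x01 and 0x02.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define VISIBILITYMAP_ALL_VISIBLE	0x01	/* assumed value */
#define VISIBILITYMAP_ALL_FROZEN	0x02	/* assumed value */

/* Does id1 logically follow id2?  Mirrors TransactionIdFollows() semantics. */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Invalid/permanent XIDs compare as plain integers */
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;
	/* Normal XIDs compare modulo 2^32 */
	return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical re-implementation of get_conflict_xid() for illustration */
static TransactionId
conflict_xid_sketch(bool do_prune, bool do_freeze, bool do_set_vm,
					uint8_t old_vmbits, uint8_t new_vmbits,
					TransactionId latest_xid_removed,
					TransactionId frz_conflict_horizon,
					TransactionId visibility_cutoff_xid)
{
	TransactionId conflict_xid = InvalidTransactionId;

	/* Only setting an already all-visible page all-frozen: no conflict */
	if (!do_prune && !do_freeze && do_set_vm &&
		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		return InvalidTransactionId;

	/* Setting the VM takes priority over the freeze horizon */
	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
		conflict_xid = frz_conflict_horizon;

	/* Removing tuples with a newer xmax pushes the horizon forward */
	if (xid_follows(latest_xid_removed, conflict_xid))
		conflict_xid = latest_xid_removed;

	return conflict_xid;
}
```

Because the horizon starts out invalid and xid_follows() treats InvalidTransactionId as older than any normal XID, a record that only prunes ends up with latest_xid_removed as its horizon.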
[text/x-patch] v33-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 8-v33-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From 8a3d02ccb9165d53e50c391dd4d71cc108c9ef15 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v33 07/16] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible in an XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b7d834969d6..afa2c3af833 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1896,9 +1896,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1915,13 +1918,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
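The lazy_scan_new_or_empty() hunk above marks an empty page all-visible in a fixed order. It locks the VM buffer, enters the critical section, dirties the heap buffer, sets PD_ALL_VISIBLE, sets the VM bits, and emits one XLOG_HEAP2_PRUNE_VACUUM_SCAN record. Then it leaves the critical section and unlocks. The toy C model below checks, after each step, the rule stated in the heap_xlog_visible() comment that patch 0008 removes: the VM bit must never be set while PD_ALL_VISIBLE is clear. All names are hypothetical. No buffer manager or WAL is involved, and the log_newpage_buffer() step for new pages is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01	/* assumed value */
#define VISIBILITYMAP_ALL_FROZEN  0x02	/* assumed value */

/* Hypothetical toy model of one heap page and its VM bits */
typedef struct ToyHeapPage
{
	bool		pd_all_visible;	/* page-level PD_ALL_VISIBLE flag */
	bool		dirty;			/* heap buffer marked dirty */
	bool		vm_locked;		/* VM buffer exclusively locked */
	uint8_t		vmbits;			/* this page's bits in the VM */
} ToyHeapPage;

/* VM all-visible requires PD_ALL_VISIBLE; all-frozen requires all-visible */
static bool
toy_state_is_valid(const ToyHeapPage *p)
{
	if ((p->vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 && !p->pd_all_visible)
		return false;
	if ((p->vmbits & VISIBILITYMAP_ALL_FROZEN) != 0 &&
		(p->vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
		return false;
	return true;
}

/* Follows the step order of the patched lazy_scan_new_or_empty() */
static void
toy_mark_empty_page_all_visible(ToyHeapPage *p)
{
	p->vm_locked = true;		/* lock VM buffer before the critical section */
	/* START_CRIT_SECTION() would go here */
	p->dirty = true;			/* mark heap buffer dirty before any WAL */
	p->pd_all_visible = true;	/* set the heap page flag first ... */
	assert(toy_state_is_valid(p));
	p->vmbits |= VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN;	/* ... then the VM */
	assert(toy_state_is_valid(p));
	/* one WAL record covering both buffers would be emitted here */
	/* END_CRIT_SECTION() would go here */
	p->vm_locked = false;		/* unlock VM buffer after the critical section */
}
```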
[text/x-patch] v33-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (24.9K, 9-v33-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From bdda4434c391863526b8f93a95ab398595a6906b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v33 08/16] Remove XLOG_HEAP2_VISIBLE entirely
No remaining code emits XLOG_HEAP2_VISIBLE records, so remove the record type.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 110 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 38 insertions(+), 371 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 1a9e7bea5d2..bce767d7b71 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ad9d6338ec2..f219c7a71cf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2542,11 +2542,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8831,50 +8831,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f765345e9e4..9a29fda3601 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
+ * heap_xlog_prune_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b3ea42f1be1..cac09dff31f 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1240,8 +1240,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* so there is no point in optimizing it.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ visibilitymap_set(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index afa2c3af833..4d7e1636526 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1918,11 +1918,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2800,9 +2800,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 3047bd46def..fc74e39e069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,112 +219,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
* This function is intended for callers that log VM changes together
* with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
*
* vmBuf must be pinned and exclusively locked, and it must cover the VM bits
* corresponding to heapBlk.
@@ -341,9 +239,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index e25dd6bc366..f7ddb56fc30 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -449,7 +449,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index afffab77106..f8681dcc9c7 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index a0166c5b410..001afb037f3 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b9e671fcda8..308cfff999e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4331,7 +4331,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
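The removed visibilitymap_set() and the renamed one both locate a heap block's two VM bits with HEAPBLK_TO_MAPBLOCK/MAPBYTE/OFFSET and then OR the flags into that map byte. Below is a minimal standalone sketch of that arithmetic. It assumes the default 8 kB block size and a 24-byte page header. The macro and function names are hypothetical stand-ins rather than the visibilitymap.c definitions, and the sketch does not model buffer locking or WAL logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed default build: 8 kB blocks, 24-byte page header (hypothetical names). */
#define SKETCH_BLCKSZ            8192
#define SKETCH_PAGE_HEADER_SIZE  24
#define SKETCH_MAPSIZE           (SKETCH_BLCKSZ - SKETCH_PAGE_HEADER_SIZE)

#define BITS_PER_HEAPBLOCK   2                          /* all-visible + all-frozen */
#define HEAPBLOCKS_PER_BYTE  (8 / BITS_PER_HEAPBLOCK)   /* 4 heap blocks per byte */
#define HEAPBLOCKS_PER_PAGE  (SKETCH_MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define VM_ALL_VISIBLE  0x01
#define VM_ALL_FROZEN   0x02
#define VM_VALID_BITS   0x03

typedef struct VmAddress
{
    uint32_t map_block;   /* which VM page covers the heap block */
    uint32_t map_byte;    /* byte within that VM page's contents */
    uint8_t  map_offset;  /* bit shift of the block's 2 bits in that byte */
} VmAddress;

static VmAddress
vm_address(uint32_t heap_blk)
{
    VmAddress a;

    a.map_block = heap_blk / HEAPBLOCKS_PER_PAGE;
    a.map_byte = (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
    a.map_offset = (uint8_t) ((heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
    return a;
}

/*
 * Set flags for heap_blk in one VM page's contents (map). Returns true if
 * this call set at least one bit that was not already set.
 */
static bool
vm_set_bits(uint8_t *map, uint32_t heap_blk, uint8_t flags)
{
    VmAddress a = vm_address(heap_blk);
    uint8_t status;

    assert((flags & VM_VALID_BITS) == flags);
    /* Never set all-frozen without also setting all-visible */
    assert(flags != VM_ALL_FROZEN);

    status = (map[a.map_byte] >> a.map_offset) & VM_VALID_BITS;
    if ((status | flags) == status)
        return false;           /* no new bits to set */
    map[a.map_byte] |= (uint8_t) (flags << a.map_offset);
    return true;
}

static uint8_t
vm_get_bits(const uint8_t *map, uint32_t heap_blk)
{
    VmAddress a = vm_address(heap_blk);

    return (map[a.map_byte] >> a.map_offset) & VM_VALID_BITS;
}
```

For example, heap block 5 lands in VM page 0, byte 1, at bit offset 2, so setting VM_ALL_VISIBLE on it turns that byte into 0x04.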
[text/x-patch] v33-0009-Simplify-heap_page_would_be_all_visible-visibili.patch (2.4K, 10-v33-0009-Simplify-heap_page_would_be_all_visible-visibili.patch)
From 6fee46f117980751d5ca5c73a08fe8823de50414 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v33 09/16] Simplify heap_page_would_be_all_visible visibility
check
heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.
Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().
This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4d7e1636526..4b2a26f7336 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3570,6 +3570,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after;
/*
* Set the offset number so that we can display it along with any
@@ -3609,12 +3610,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
{
case HEAPTUPLE_LIVE:
{
TransactionId xmin;
+ Assert(!TransactionIdIsValid(dead_after));
+
/* Check comments in lazy_scan_prune. */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
@@ -3647,8 +3650,10 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+ case HEAPTUPLE_DEAD:
case HEAPTUPLE_INSERT_IN_PROGRESS:
case HEAPTUPLE_DELETE_IN_PROGRESS:
{
--
2.43.0
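Every non-LIVE result makes heap_page_would_be_all_visible() return false. The extra OldestXmin step in HeapTupleSatisfiesVacuum() therefore only relabels tuples that already disqualify the page. The following is a minimal standalone sketch of that control flow. It assumes a hypothetical TupleVisibility enum in place of HTSV_Result and a plain array of per-tuple results. It keeps the patch's dead_after assertions but omits the xmin commit and freezing checks that the real function performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

/* Hypothetical mirror of the HTSV_Result values handled in the patch. */
typedef enum
{
    TUPLE_DEAD,
    TUPLE_LIVE,
    TUPLE_RECENTLY_DEAD,
    TUPLE_INSERT_IN_PROGRESS,
    TUPLE_DELETE_IN_PROGRESS
} TupleVisibility;

/*
 * res[i] is the horizon-only result for tuple i, and dead_after[i] is the
 * XID reported alongside it (valid only for recently dead tuples).
 * Returns false as soon as one tuple rules out all-visibility.
 */
static bool
page_could_be_all_visible(const TupleVisibility *res, const Xid *dead_after,
                          int ntuples)
{
    for (int i = 0; i < ntuples; i++)
    {
        switch (res[i])
        {
            case TUPLE_LIVE:
                assert(dead_after[i] == INVALID_XID);
                /* the real code checks xmin commit status and age here */
                break;
            case TUPLE_RECENTLY_DEAD:
                assert(dead_after[i] != INVALID_XID);
                /* FALLTHROUGH */
            case TUPLE_DEAD:
            case TUPLE_INSERT_IN_PROGRESS:
            case TUPLE_DELETE_IN_PROGRESS:
                /* dead vs. recently dead is irrelevant: not all-visible */
                return false;
        }
    }
    return true;
}
```

Because the DEAD and RECENTLY_DEAD arms share one exit, the sketch never needs to know where OldestXmin lies.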
[text/x-patch] v33-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch (4.7K, 11-v33-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch)
From 45fce23bcffa39701fc25ccd67ad455edfb99a0f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 22 Dec 2025 10:46:45 -0500
Subject: [PATCH v33 10/16] Remove table_scan_analyze_next_tuple unneeded
parameter OldestXmin
heapam_scan_analyze_next_tuple() doesn't distinguish between dead and
recently dead tuples when counting them, so it doesn't need OldestXmin.
Looking at other table AMs implementing table_scan_analyze_next_tuple(),
it appears most do not use OldestXmin either.
Suggested-by: Kirill Reshke <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/heapam_handler.c | 13 +++++++++----
src/backend/commands/analyze.c | 6 +-----
src/include/access/tableam.h | 5 ++---
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 09a456e9966..df2440e82a7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1026,7 +1026,7 @@ heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
}
static bool
-heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+heapam_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
@@ -1047,6 +1047,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
bool sample_it = false;
+ TransactionId dead_after;
itemid = PageGetItemId(targpage, hscan->rs_cindex);
@@ -1069,16 +1070,20 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
targtuple->t_len = ItemIdGetLength(itemid);
- switch (HeapTupleSatisfiesVacuum(targtuple, OldestXmin,
- hscan->rs_cbuf))
+ switch (HeapTupleSatisfiesVacuumHorizon(targtuple,
+ hscan->rs_cbuf,
+ &dead_after))
{
case HEAPTUPLE_LIVE:
sample_it = true;
*liverows += 1;
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
+
+ case HEAPTUPLE_DEAD:
/* Count dead and recently-dead rows */
*deadrows += 1;
break;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index a483424152c..53adac9139b 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1213,7 +1213,6 @@ acquire_sample_rows(Relation onerel, int elevel,
double rowstoskip = -1; /* -1 means not set yet */
uint32 randseed; /* Seed for block sampler(s) */
BlockNumber totalblocks;
- TransactionId OldestXmin;
BlockSamplerData bs;
ReservoirStateData rstate;
TupleTableSlot *slot;
@@ -1226,9 +1225,6 @@ acquire_sample_rows(Relation onerel, int elevel,
totalblocks = RelationGetNumberOfBlocks(onerel);
- /* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
- OldestXmin = GetOldestNonRemovableTransactionId(onerel);
-
/* Prepare for sampling block numbers */
randseed = pg_prng_uint32(&pg_global_prng_state);
nblocks = BlockSampler_Init(&bs, totalblocks, targrows, randseed);
@@ -1261,7 +1257,7 @@ acquire_sample_rows(Relation onerel, int elevel,
{
vacuum_delay_point(true);
- while (table_scan_analyze_next_tuple(scan, OldestXmin, &liverows, &deadrows, slot))
+ while (table_scan_analyze_next_tuple(scan, &liverows, &deadrows, slot))
{
/*
* The first targrows sample rows are simply copied into the
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e2ec5289d4d..c9fa9f259cd 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -683,7 +683,6 @@ typedef struct TableAmRoutine
* callback).
*/
bool (*scan_analyze_next_tuple) (TableScanDesc scan,
- TransactionId OldestXmin,
double *liverows,
double *deadrows,
TupleTableSlot *slot);
@@ -1714,11 +1713,11 @@ table_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
* tuples.
*/
static inline bool
-table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+table_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
- return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan, OldestXmin,
+ return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan,
liverows, deadrows,
slot);
}
--
2.43.0
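The commit message argues that ANALYZE's dead-row count does not depend on OldestXmin, because dead and recently dead tuples go into the same counter. Below is a minimal standalone sketch of that argument. All names are hypothetical. apply_oldest_xmin() models the extra step that turns a recently dead tuple into a dead one when its deleter precedes OldestXmin, and uses a modulo-2^32 XID comparison similar to TransactionIdPrecedes() for normal XIDs. The in-progress cases, which the real heapam_scan_analyze_next_tuple() treats specially, are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

/* Hypothetical mirror of HTSV_Result. */
typedef enum
{
    TUPLE_DEAD,
    TUPLE_LIVE,
    TUPLE_RECENTLY_DEAD,
    TUPLE_INSERT_IN_PROGRESS,
    TUPLE_DELETE_IN_PROGRESS
} TupleVisibility;

/* Wraparound-aware "a is older than b" for normal XIDs. */
static int
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * The extra step the OldestXmin variant applies on top of the horizon-only
 * result: a recently dead tuple whose deleter precedes OldestXmin is
 * reported as plain DEAD.
 */
static TupleVisibility
apply_oldest_xmin(TupleVisibility res, Xid dead_after, Xid oldest_xmin)
{
    if (res == TUPLE_RECENTLY_DEAD && xid_precedes(dead_after, oldest_xmin))
        return TUPLE_DEAD;
    return res;
}

/* ANALYZE-style tally: dead and recently dead land in the same counter. */
static void
count_rows(TupleVisibility res, double *liverows, double *deadrows)
{
    switch (res)
    {
        case TUPLE_LIVE:
            *liverows += 1;
            break;
        case TUPLE_RECENTLY_DEAD:
        case TUPLE_DEAD:
            *deadrows += 1;
            break;
        default:
            /* in-progress tuples: special handling omitted in this sketch */
            break;
    }
}
```

Feeding the same tuples through count_rows() with and without apply_oldest_xmin() yields identical liverows and deadrows. This is why the parameter can be dropped from the table AM callback.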
[text/x-patch] v33-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (11.9K, 12-v33-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 834ce896a3cc2d38b9506db702863182c0b3e166 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v33 11/16] Use GlobalVisState in vacuum to determine page
level visibility
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.
OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare occurrences where GlobalVisState moves
backwards.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. As long as visibility_cutoff_xid
is still maintained at that point (no tuple has already ruled out the
page), we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/heap/heapam_visibility.c | 22 +++++++++
src/backend/access/heap/pruneheap.c | 53 ++++++++++-----------
src/backend/access/heap/vacuumlazy.c | 38 ++++++++++-----
src/include/access/heapam.h | 4 +-
4 files changed, 76 insertions(+), 41 deletions(-)
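The commit message defers the GlobalVisState test to a single check of the newest live xmin (visibility_cutoff_xid). If that XID is no longer running for any snapshot, every older xmin on the page is also finished. Below is a minimal standalone sketch of the idea. It assumes a hypothetical VisHorizon struct with one "oldest XID that may still be running" bound in place of GlobalVisState, which in reality carries more state and can be refreshed. It also assumes that all inputs are normal, committed xmins of live tuples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

/*
 * Hypothetical stand-in for GlobalVisState: XIDs older than
 * oldest_maybe_running are treated as finished for every snapshot.
 */
typedef struct VisHorizon
{
    Xid oldest_maybe_running;
} VisHorizon;

/* Wraparound-aware "a is older than b" for normal XIDs. */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* Analogue of GlobalVisTestXidMaybeRunning() for this toy horizon. */
static bool
xid_maybe_running(const VisHorizon *h, Xid xid)
{
    return !xid_precedes(xid, h->oldest_maybe_running);
}

/*
 * xmins: committed inserting XIDs of the page's live tuples (assumed normal).
 * Tracks the newest one while scanning, then tests only that one.
 */
static bool
page_all_visible_deferred(const VisHorizon *h, const Xid *xmins, int n,
                          Xid *cutoff_out)
{
    Xid newest = INVALID_XID;

    for (int i = 0; i < n; i++)
    {
        if (newest == INVALID_XID || xid_precedes(newest, xmins[i]))
            newest = xmins[i];
    }
    *cutoff_out = newest;

    /* No live tuples: there is no xmin to test. */
    if (newest == INVALID_XID)
        return true;

    /* One horizon test per page instead of one per tuple. */
    return !xid_maybe_running(h, newest);
}
```

The trade-off noted in the commit message also shows up here. The loop always visits every xmin, whereas a per-tuple comparison against a fixed OldestXmin could stop at the first xmin that is too new.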
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 05e70b7d92a..b4489020609 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1053,6 +1053,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for "transaction still running" are identical to
+ * those used to decide whether an XID is removable.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+ return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cac09dff31f..da09c769b4d 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. As soon as a tuple is encountered that is not visible to
+ * all, this field is unmaintained. As long as it is maintained, it can be
+ * used to calculate the snapshot conflict horizon when updating the VM
+ * and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -1008,14 +1009,14 @@ heap_page_will_set_vm(PruneState *prstate,
*/
static bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -1102,6 +1103,16 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them may be considered running by any snapshot, the page cannot
+ * be all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+ prstate.all_visible = prstate.all_frozen = false;
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1283,10 +1294,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1807,28 +1817,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A FrozenTransactionId
- * is seen as committed to everyone. Otherwise, we check if
- * there is a snapshot that considers this xid to still be
- * running, and if so, we don't consider the page all-visible.
+ * The inserter definitely committed. But we don't know if it
+ * is old enough that everyone sees it as committed. Later,
+ * after processing all the tuples on the page, we'll check if
+ * there is any snapshot that still considers the newest xid
+ * on the page to be running. If so, we don't consider the
+ * page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4b2a26f7336..c97ad2a931a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2754,7 +2754,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3519,7 +3519,7 @@ dead_items_cleanup(LVRelState *vacrel)
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3535,7 +3535,7 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3618,7 +3618,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
Assert(!TransactionIdIsValid(dead_after));
- /* Check comments in lazy_scan_prune. */
+ /* Check heap_prune_record_unchanged_lp_normal comments */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
all_visible = false;
@@ -3627,16 +3627,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * The inserter definitely committed. But we don't know if
+ * it is old enough that everyone sees it as committed.
+ * Don't check that now.
+ *
+ * If we scan all tuples without finding one that prevents
+ * the page from being all-visible, we then check whether
+ * any snapshot still considers the newest XID on the page
+ * to be running. In that case, the page is not considered
+ * all-visible.
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
- {
- all_visible = false;
- *all_frozen = false;
- break;
- }
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3667,6 +3668,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
} /* scan along page */
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * among them may still be considered running by any snapshot, the page
+ * cannot be all-visible.
+ */
+ if (all_visible &&
+ TransactionIdIsNormal(*visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+ {
+ all_visible = false;
+ *all_frozen = false;
+ }
+
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index e0da1f7cdcc..ac771390a37 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -438,7 +438,7 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
extern bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -452,6 +452,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
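The patch above drops the per-tuple `OldestXmin` test. Instead it records the newest committed xmin on the page and asks once, after the scan, whether any snapshot may still consider that XID running. The standalone sketch below shows why one check is enough. Under a single fixed horizon, if the newest xmin is older than the horizon, every older xmin is too.

Everything in it is hypothetical and not a PostgreSQL API: `ToyXid`, `ToyVisState`, and the `toy_*` functions are simplified stand-ins. The single `horizon` field replaces `GlobalVisState`, and the modulo-2^32 comparison imitates `TransactionIdPrecedes()` for normal XIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID      ((ToyXid) 0)
#define TOY_FIRST_NORMAL_XID ((ToyXid) 3)   /* 2 would be "frozen" */

/* Hypothetical stand-in for GlobalVisState: one fixed horizon. */
typedef struct ToyVisState
{
    ToyXid horizon;     /* XIDs preceding this are visible to everyone */
} ToyVisState;

static bool
toy_xid_is_normal(ToyXid xid)
{
    return xid >= TOY_FIRST_NORMAL_XID;
}

/* Modulo-2^32 comparison, imitating TransactionIdPrecedes() for normal XIDs. */
static bool
toy_xid_precedes(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) < 0;
}

/* Same shape as GlobalVisTestXidMaybeRunning(): "not removable". */
static bool
toy_xid_maybe_running(const ToyVisState *vis, ToyXid xid)
{
    return !toy_xid_precedes(xid, vis->horizon);
}

/*
 * Track only the newest normal xmin while scanning, then test it once.
 * Non-normal xmins (e.g. frozen) are visible to everyone and are skipped.
 */
static bool
toy_page_all_visible(const ToyVisState *vis, const ToyXid *xmins, size_t n,
                     ToyXid *cutoff_out)
{
    ToyXid newest = TOY_INVALID_XID;

    for (size_t i = 0; i < n; i++)
    {
        ToyXid xmin = xmins[i];

        if (!toy_xid_is_normal(xmin))
            continue;
        if (newest == TOY_INVALID_XID || toy_xid_precedes(newest, xmin))
            newest = xmin;
    }

    *cutoff_out = newest;

    /* One horizon test for the whole page, after all tuples were seen. */
    if (toy_xid_is_normal(newest) && toy_xid_maybe_running(vis, newest))
        return false;
    return true;
}
```

The sketch covers only the horizon part. The real functions also reject the page for uncommitted inserters, dead items, and so on before this final check is reached.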
[text/x-patch] v33-0012-Unset-all_visible-sooner-if-not-freezing.patch (2.5K, 13-v33-0012-Unset-all_visible-sooner-if-not-freezing.patch)
From f01a815565075cc30ca43aadc577b51fa90f639e Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v33 12/16] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index da09c769b4d..9f1257529b9 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1682,8 +1682,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, though, just unset all_visible and all_frozen now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1943,8 +1948,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all_visible and all_frozen now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
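The effect of this patch can be modeled with a reduced prune state. When freezing will not be attempted, a dead item clears `all_visible` and `all_frozen` immediately, so later live tuples skip the xmin bookkeeping. When freezing may be attempted, the flags survive until the deferred unset at the end of pruning.

This is a sketch under stated assumptions. `ToyPruneState` and the `toy_*` functions are hypothetical simplifications of `PruneState`, `heap_prune_record_dead()`, `heap_prune_record_unchanged_lp_normal()`, and the end of `heap_page_prune_and_freeze()`. The `xmin_checks` counter exists only to make the skipped work visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for PruneState. */
typedef struct ToyPruneState
{
    bool     attempt_freeze;
    bool     all_visible;
    bool     all_frozen;
    int      lpdead_items;
    int      deadoffsets[8];
    unsigned newest_xmin;   /* ignores XID wraparound for brevity */
    int      xmin_checks;   /* counts per-tuple bookkeeping performed */
} ToyPruneState;

static void
toy_record_dead(ToyPruneState *ps, int offnum)
{
    /*
     * Keep the flags set only if a freeze decision still needs them;
     * otherwise the page can already be ruled out as all-visible.
     */
    if (!ps->attempt_freeze)
    {
        ps->all_visible = false;
        ps->all_frozen = false;
    }
    assert(ps->lpdead_items < 8);
    ps->deadoffsets[ps->lpdead_items++] = offnum;
}

static void
toy_record_live(ToyPruneState *ps, unsigned xmin)
{
    /* Visibility bookkeeping only matters while the page may be all-visible. */
    if (!ps->all_visible)
        return;
    ps->xmin_checks++;
    if (xmin > ps->newest_xmin)
        ps->newest_xmin = xmin;
}

/* Deferred unset: must happen before the visibility map is updated. */
static void
toy_finish(ToyPruneState *ps)
{
    if (ps->lpdead_items > 0)
    {
        ps->all_visible = false;
        ps->all_frozen = false;
    }
}
```

Both paths end with the same flags. The non-freezing path simply reaches that state earlier and does less per-tuple work along the way.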
[text/x-patch] v33-0013-Track-which-relations-are-modified-by-a-query.patch (2.6K, 14-v33-0013-Track-which-relations-are-modified-by-a-query.patch)
From 5b62fa1efe6cec0f92429da72b110927bf42418f Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v33 13/16] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether the scan may set the
visibility map during on-access pruning. We don't want to set the
visibility map if the query is just going to modify the page immediately
afterward.
Reviewed-by: Chao Li <[email protected]>
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ca14cdabdd0..6a0283985c3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -916,6 +916,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation may be modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cc3c5de71eb..dcb2ef2275c 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 02265456978..29e2e2da7ea 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -676,6 +676,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query through an
+ * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
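The bookkeeping this patch adds follows one pattern. Each result relation and each relation with a rowmark has its range table index added to `es_modified_relids`. The sketch below models that with a 64-bit mask instead of a real `Bitmapset`.

`ToyRelidSet`, `ToyEState`, and the `toy_*` functions are hypothetical stand-ins for `Bitmapset`, `EState`, `bms_add_member()`, `bms_is_member()`, and the two executor hunks. The mask form is an assumption made only to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset, limited to RT indexes 1..63. */
typedef uint64_t ToyRelidSet;

static ToyRelidSet
toy_add_member(ToyRelidSet set, int rti)
{
    assert(rti > 0 && rti < 64);
    return set | ((ToyRelidSet) 1 << rti);
}

static bool
toy_is_member(int rti, ToyRelidSet set)
{
    assert(rti > 0 && rti < 64);
    return ((set >> rti) & 1) != 0;
}

/* Hypothetical, reduced EState carrying only the new field. */
typedef struct ToyEState
{
    ToyRelidSet es_modified_relids;
} ToyEState;

/* Like the ExecInitResultRelation() hunk: a result relation is modified. */
static void
toy_init_result_relation(ToyEState *estate, int rti)
{
    estate->es_modified_relids = toy_add_member(estate->es_modified_relids, rti);
}

/* Like the InitPlan() hunk: a relation with a rowmark may be modified. */
static void
toy_init_rowmark(ToyEState *estate, int rti)
{
    estate->es_modified_relids = toy_add_member(estate->es_modified_relids, rti);
}
```

Range table indexes are 1-based, which is why bit 0 is never used here.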
[text/x-patch] v33-0014-Pass-down-information-on-table-modification-to-s.patch (24.7K, 15-v33-0014-Pass-down-information-on-table-modification-to-s.patch)
From 25f4a45c95cfdefe9eb96730270bfdab6a7d245c Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v33 14/16] Pass down information on table modification to scan
node
Pass down information to sequential scan, index [only] scan, and bitmap
table scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 8 +++----
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 9 +++++++-
src/backend/executor/nodeIndexscan.c | 18 ++++++++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 107 insertions(+), 46 deletions(-)
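The flag protocol in this patch has three pieces, each modeled in the sketch below:

- A scan node sets `SO_HINT_REL_READ_ONLY` only when its `scanrelid` is absent from `es_modified_relids`.
- `table_beginscan_parallel()` now ORs the caller's flags into its fixed ones.
- The heap AM reads a missing hint as "the query may modify the relation."

The two bit values come from `tableam.h` (`SO_TYPE_SEQSCAN` and the new hint). The `toy_*` helpers and the 64-bit mask standing in for the relid `Bitmapset` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as in tableam.h's ScanOptions (plus the hint this patch adds). */
#define TOY_SO_TYPE_SEQSCAN       ((uint32_t) 1 << 0)
#define TOY_SO_HINT_REL_READ_ONLY ((uint32_t) 1 << 10)

/*
 * Executor side, as in SeqNext()/IndexNext(): hint read-only only when the
 * scanned RT index is not among the modified relids (hypothetical mask).
 */
static uint32_t
toy_scan_hint_flags(int scanrelid, uint64_t modified_relids)
{
    uint32_t flags = 0;

    assert(scanrelid > 0 && scanrelid < 64);
    if (((modified_relids >> scanrelid) & 1) == 0)
        flags = TOY_SO_HINT_REL_READ_ONLY;
    return flags;
}

/* Like table_beginscan_parallel() after the patch: caller flags are kept. */
static uint32_t
toy_parallel_scan_flags(uint32_t caller_flags)
{
    return caller_flags | TOY_SO_TYPE_SEQSCAN;
}

/* Like heapam_index_fetch_begin(): no hint means "assume modified". */
static bool
toy_modifies_base_rel(uint32_t flags)
{
    return (flags & TOY_SO_HINT_REL_READ_ONLY) == 0;
}
```

Under this encoding, the many call sites that pass `0` (pgrowlocks, COPY TO, constraint validation, and so on) get the conservative behavior: no hint, so the relation is treated as possibly modified.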
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 6887e421442..4d9684b1b19 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2843,7 +2843,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index c08ea927ac5..b502d4088d7 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2058,7 +2058,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index df2440e82a7..e88db52fd7e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index a29be6f467b..5ac7d22e49f 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 4ed0508c605..4df56087841 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 90ab4e91b56..8ae54217f36 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 87491796523..2ff29b6e40b 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 4ab4a3893d5..4261baf4a41 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f976c0e5c7e..eb35dbbc853 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6378,7 +6378,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13768,7 +13768,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
@@ -22626,7 +22626,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+ scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -23090,7 +23090,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(splitRel, snapshot, 0, NULL);
+ scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index e5fa0578889..8c114fa56fa 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3157,7 +3157,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3238,7 +3238,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 6ae0f959592..6d3e9d2f311 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 173d2fe548d..db1b322c665 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -204,7 +204,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -382,7 +382,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -601,7 +601,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -665,7 +665,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 2c68327cb29..62dff010d10 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -105,11 +105,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c2d09374517..2fe724a323f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -84,6 +84,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index only scan is not parallel, or if we're
* serially executing an index only scan that was planned to be
@@ -94,7 +100,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys,
+ flags);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 84823f0b615..0ec660c8fa9 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -198,6 +205,12 @@ IndexNextWithReorder(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -207,7 +220,8 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index b8119face43..7718376bc2f 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 29fec655593..ac181853225 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7181,7 +7181,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index e37834c406d..43b9d8aaaf1 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -177,7 +177,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ac771390a37..a0e89365c70 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index c9fa9f259cd..6066ae156de 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -873,9 +875,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -918,9 +920,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1127,7 +1129,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1163,9 +1166,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
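Here is a minimal, self-contained sketch of how the flag plumbing in 0014 composes. It is not PostgreSQL code. A plain 64-bit mask (`modified_mask`, hypothetical) stands in for the executor's `es_modified_relids` Bitmapset. `SO_HINT_REL_READ_ONLY = 1 << 10` is taken from the hunk above; the other bit values are assumed to match upstream `tableam.h`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror ScanOptions in tableam.h; only the hint bit is from the patch. */
enum
{
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_TYPE_BITMAPSCAN = 1 << 1,
	SO_ALLOW_STRAT = 1 << 6,
	SO_ALLOW_SYNC = 1 << 7,
	SO_ALLOW_PAGEMODE = 1 << 8,
	SO_HINT_REL_READ_ONLY = 1 << 10,
};

/*
 * Hypothetical stand-in for bms_is_member(scanrelid, es_modified_relids):
 * bit "scanrelid" of modified_mask is set if the query modifies that rel.
 */
static bool
rel_is_modified(uint64_t modified_mask, unsigned scanrelid)
{
	return scanrelid < 64 && ((modified_mask >> scanrelid) & 1);
}

/* Executor side: pass the hint only for relations the query never modifies. */
static uint32_t
scan_hint_flags(uint64_t modified_mask, unsigned scanrelid)
{
	uint32_t	flags = 0;

	if (!rel_is_modified(modified_mask, scanrelid))
		flags = SO_HINT_REL_READ_ONLY;
	return flags;
}

/* Wrapper side: OR the caller's hints into the scan-type bits, as table_beginscan() now does. */
static uint32_t
seqscan_flags(uint32_t caller_flags)
{
	return caller_flags | SO_TYPE_SEQSCAN | SO_ALLOW_STRAT |
		SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
}
```

The hunk changes `uint32 flags =` to `flags |=` in `table_beginscan()` and `table_beginscan_bm()`. OR-ing the caller's hint into the scan-type bits, instead of overwriting them, is what lets the executor's hint survive the wrapper.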
[text/x-patch] v33-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch (11.0K, 16-v33-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From 5beb927efb98f05c12dfd84f584c11c48d18bd96 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v33 15/16] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 ++++++-
src/backend/access/heap/heapam_handler.c | 15 ++++++-
src/backend/access/heap/pruneheap.c | 40 ++++++++++++++++++-
src/include/access/heapam.h | 24 +++++++++--
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 89 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f219c7a71cf..8940297f6f3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -573,6 +573,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -587,7 +588,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1264,6 +1267,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1302,6 +1306,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1334,6 +1344,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e88db52fd7e..ab175948c5b 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2477,6 +2485,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2523,7 +2532,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9f1257529b9..04aa56e81b6 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(&params, &presult, &dummy_off_loc,
NULL, NULL);
@@ -951,6 +964,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
* corrupted, it will fix them by clearing the VM bits and visibility hint.
* This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with returning the
* current value of the VM bits in *old_vmbits and the desired new value of
* the VM bits in *new_vmbits.
@@ -964,6 +980,8 @@ heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits)
@@ -974,6 +992,24 @@ heap_page_will_set_vm(PruneState *prstate,
if (!prstate->attempt_update_vm)
return false;
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS &&
+ prstate->all_visible &&
+ !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
*old_vmbits = visibilitymap_get_status(relation, heap_blk,
&vmbuffer);
@@ -1171,6 +1207,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
buffer,
page,
vmbuffer,
+ params->reason,
+ do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits,
&new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a0e89365c70..7e68928f3e9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
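The on-access bail-out that 0015 adds to `heap_page_will_set_vm()` reduces to a single predicate. The sketch below isolates it as a pure function so the cases are easy to enumerate. The enum names and boolean parameters are hypothetical stand-ins for `PruneReason`, `BufferIsDirty()` and `XLogCheckBufferNeedsBackup()`, not real APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PruneReason; only PRUNE_ON_ACCESS appears in the hunk. */
typedef enum
{
	SKETCH_PRUNE_ON_ACCESS,
	SKETCH_PRUNE_VACUUM_SCAN,
} SketchPruneReason;

/*
 * Returns true when an on-access call that neither prunes nor freezes should
 * give up on setting the VM. Setting it would either newly dirty the heap
 * page or force a full-page image of it into WAL.
 * buffer_dirty and needs_fpi model BufferIsDirty() and
 * XLogCheckBufferNeedsBackup() on the heap buffer.
 */
static bool
skip_on_access_vm_set(SketchPruneReason reason, bool all_visible,
					  bool do_prune, bool do_freeze,
					  bool buffer_dirty, bool needs_fpi)
{
	return reason == SKETCH_PRUNE_ON_ACCESS &&
		all_visible &&
		!do_prune && !do_freeze &&
		(!buffer_dirty || needs_fpi);
}
```

Put another way: when on-access pruning has nothing else to write, setting the VM is only worthwhile if the heap page is already dirty and logging it will not require a full-page image.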
[text/x-patch] v33-0016-Set-pd_prune_xid-on-insert.patch (6.8K, 17-v33-0016-Set-pd_prune_xid-on-insert.patch)
From 5e27f30bd970c7546a2ec763533d03ec44c1d69b Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v33 16/16] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run the first time a page
filled with newly inserted tuples is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8940297f6f3..18413d5878f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2122,6 +2122,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2181,15 +2182,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2199,7 +2204,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2563,8 +2567,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 9a29fda3601..49cc83a6479 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
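The insert-side rule from 0016 can be sketched on its own. The sketch assumes the special XID values from `access/transam.h`: 0 is invalid, 1 is bootstrap, 2 is frozen, and 3 is the first normal XID. It models `PageSetPrunable()` as keeping the oldest normal XID, using a circular comparison like `TransactionIdPrecedes()`. The page struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Special XID values as assumed from access/transam.h. */
#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Hypothetical, stripped-down page header carrying only the prune hint. */
typedef struct
{
	TransactionId pd_prune_xid;
} SketchPageHeader;

/* Circular (modulo 2^32) comparison for normal XIDs, like TransactionIdPrecedes(). */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Models PageSetPrunable(): remember the oldest XID that may need pruning. */
static void
page_set_prunable(SketchPageHeader *page, TransactionId xid)
{
	assert(TransactionIdIsNormal(xid));
	if (page->pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/*
 * Insert-side rule from the patch. Skip in bootstrap mode (non-normal xid)
 * and when a multi-insert is already setting the page frozen. heap_insert()
 * corresponds to all_frozen_set == false.
 */
static void
maybe_set_prunable_on_insert(SketchPageHeader *page, TransactionId xid,
							 bool all_frozen_set)
{
	if (!all_frozen_set && TransactionIdIsNormal(xid))
		page_set_prunable(page, xid);
}
```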
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-01-07 05:55 ` Kirill Reshke <[email protected]>
From: Kirill Reshke @ 2026-01-07 05:55 UTC (permalink / raw)
To: Melanie Plageman <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
On Tue, 6 Jan 2026 at 22:32, Melanie Plageman <[email protected]> wrote:
>
> Ah, yes, I forgot to remove that when I removed the old
> visibilitymap_set() and made visibilitymap_set_vmbits() into
> visibilitymap_set(). Done in v33.
>
> - Melanie
I think 0001-0003 and 0009-0010 are ready.
> That test creates a table, inserts tuples, accesses one page, deletes
> all the data, accesses a single page again (until the table is
> vacuumed, the pages will still be there and have to be scanned even
> though the data is deleted). The first time we set the VM on-access,
> we have to extend the VM. That VM access is an extend and not a hit.
> Once we set pd_prune_xid on the page, the extend happens during the
> first access (before the delete), so when we access the VM after the
> delete step, that is counted as a hit and we end up with more hits in
> the stats.
Good
--
Best regards,
Kirill Reshke
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-01-07 08:14 ` Chao Li <[email protected]>
From: Chao Li @ 2026-01-07 08:14 UTC (permalink / raw)
To: Melanie Plageman <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
> On Jan 7, 2026, at 01:31, Melanie Plageman <[email protected]> wrote:
>
> On Tue, Jan 6, 2026 at 4:40 AM Andrey Borodin <[email protected]> wrote:
>>
>>> <v32-0014-Pass-down-information-on-table-modification-to-s.patch>
>>
>> I've tried to take an attempt to review some patches of this patchset. It's huge and mostly polished.
>
> I've attributed your review on the patches you specifically
> mention here (and from previous emails you sent). Let me know if there
> are other patches you reviewed that you did not mention.
>
>> In a step "Pass down information on table modification to scan node" you pass SO_HINT_REL_READ_ONLY flag in IndexNext() and BitmapTableScanSetup(), but not in IndexNextWithReorder() and IndexOnlyNext(). Is there a reason why index scans with ordering cannot use on-access VM setting?
>
> Great point, I simply hadn't tested those cases and didn't think to
> add them. I've added them in attached v33.
>
> While looking at other callers of index_beginscan(), I was wondering
> if systable_beginscan() and systable_beginscan_ordered() should ever
> pass SO_HINT_REL_READ_ONLY. I guess we would need to pass if the
> operation is read-only above the index_beginscan() -- I'm not sure if
> we always know in the caller of systable_beginscan() whether this
> operation will modify the catalog. That seems like it could be a
> separate project, though, so maybe it is better to say this feature is
> just for regular tables.
>
> As for the other cases: We don't have the relation range table index
> in check_exclusion_or_unique_constraints(), so I don't think we can do
> it there.
>
> And I think that the other index scan cases like in replication code
> or get_actual_variable_endpoint() are too small to be worth it, don't
> have the needed info, or don't do on-access pruning (bc of the
> snapshot type they use).
>
>> Also, comment about visibilitymap_set() says "Callers that log VM changes separately should use visibilitymap_set()" as if visibilitymap_set() is some other function.
>
> Ah, yes, I forgot to remove that when I removed the old
> visibilitymap_set() and made visibilitymap_set_vmbits() into
> visibilitymap_set(). Done in v33.
>
> - Melanie
> <v33-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch><v33-0002-Eliminate-use-of-cached-VM-value-in-lazy_scan_pr.patch><v33-0003-Refactor-lazy_scan_prune-VM-clear-logic-into-hel.patch><v33-0004-Set-the-VM-in-heap_page_prune_and_freeze.patch><v33-0005-Move-VM-assert-into-prune-freeze-code.patch><v33-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v33-0007-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v33-0008-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v33-0009-Simplify-heap_page_would_be_all_visible-visibili.patch><v33-0010-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch><v33-0011-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v33-0012-Unset-all_visible-sooner-if-not-freezing.patch><v33-0013-Track-which-relations-are-modified-by-a-query.patch><v33-0014-Pass-down-information-on-table-modification-to-s.patch><v33-0015-Allow-on-access-pruning-to-set-pages-all-visible.patch><v33-0016-Set-pd_prune_xid-on-insert.patch>
I see the same problem in 0009 and 0010:
0009
```
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3570,6 +3570,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after;
/*
* Set the offset number so that we can display it along with any
@@ -3609,12 +3610,14 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
{
case HEAPTUPLE_LIVE:
{
TransactionId xmin;
+ Assert(!TransactionIdIsValid(dead_after));
+
/* Check comments in lazy_scan_prune. */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
@@ -3647,8 +3650,10 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
```
0010:
```
static bool
-heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+heapam_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
@@ -1047,6 +1047,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
bool sample_it = false;
+ TransactionId dead_after;
itemid = PageGetItemId(targpage, hscan->rs_cindex);
@@ -1069,16 +1070,20 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
targtuple->t_len = ItemIdGetLength(itemid);
- switch (HeapTupleSatisfiesVacuum(targtuple, OldestXmin,
- hscan->rs_cbuf))
+ switch (HeapTupleSatisfiesVacuumHorizon(targtuple,
+ hscan->rs_cbuf,
+ &dead_after))
{
case HEAPTUPLE_LIVE:
sample_it = true;
*liverows += 1;
break;
- case HEAPTUPLE_DEAD:
case HEAPTUPLE_RECENTLY_DEAD:
+ Assert(TransactionIdIsValid(dead_after));
+ /* FALLTHROUGH */
```
I believe the reason we add Assert(TransactionIdIsValid(dead_after)) under HEAPTUPLE_RECENTLY_DEAD is to ensure that dead_after is set whenever HeapTupleSatisfiesVacuumHorizon() returns HEAPTUPLE_RECENTLY_DEAD. So the goal of the assert is to catch bugs in HeapTupleSatisfiesVacuumHorizon().
From this perspective, I now feel dead_after should be initialized to InvalidTransactionId. Otherwise, if HeapTupleSatisfiesVacuumHorizon() had a bug and failed to set dead_after, the assert most likely would not fire, because dead_after would hold an arbitrary uninitialized value that is most likely not 0.
I know this comment conflicts with one of my previous comments; sorry about that. As I reread this patch, I am understanding it better.
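Here is a small hypothetical sketch of the initialization point. It is not the real `HeapTupleSatisfiesVacuumHorizon()`. The classifier below is supposed to fill in `*dead_after` when it reports "recently dead", and a flag simulates forgetting to do so. Because the caller starts `dead_after` at `InvalidTransactionId`, a `TransactionIdIsValid()` check reliably detects the omission. With an uninitialized local, the check would read an indeterminate value. The sketch returns whether the check would fire instead of calling `assert()`, so it can be exercised without aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define TransactionIdIsValid(xid)	((xid) != InvalidTransactionId)

typedef enum
{
	SKETCH_TUPLE_LIVE,
	SKETCH_TUPLE_RECENTLY_DEAD,
} SketchTupleStatus;

/*
 * Hypothetical classifier: reports "recently dead" for a tuple with a valid
 * xmax and is supposed to set *dead_after in that case. forget_out_param
 * simulates the bug of not setting it.
 */
static SketchTupleStatus
classify_tuple(TransactionId xmax, bool forget_out_param,
			   TransactionId *dead_after)
{
	if (!TransactionIdIsValid(xmax))
		return SKETCH_TUPLE_LIVE;
	if (!forget_out_param)
		*dead_after = xmax;
	return SKETCH_TUPLE_RECENTLY_DEAD;
}

/* Returns true if the caller-side validity check would catch the missing write. */
static bool
check_catches_bug(TransactionId xmax)
{
	/* Start at the invalid sentinel so "never written" is detectable. */
	TransactionId dead_after = InvalidTransactionId;

	if (classify_tuple(xmax, true, &dead_after) == SKETCH_TUPLE_RECENTLY_DEAD)
		return !TransactionIdIsValid(dead_after);
	return false;
}
```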
0014
```
+ /* set if the query doesn't modify the rel */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
```
Nit: I think it’s better to replace “rel” with “relation”. In a function comment that has a parameter named “rel”, we can use “rel” to refer to that parameter; without such context, I think the whole word is better here.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
@ 2026-01-27 22:58 ` Melanie Plageman <[email protected]>
From: Melanie Plageman @ 2026-01-27 22:58 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Xuneng Zhou <[email protected]>; Andres Freund <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
On Wed, Jan 7, 2026 at 3:15 AM Chao Li <[email protected]> wrote:
>
> I believe the reason we add Assert(TransactionIdIsValid(dead_after)) under HEAPTUPLE_RECENTLY_DEAD is to ensure that dead_after is set whenever HeapTupleSatisfiesVacuumHorizon() returns HEAPTUPLE_RECENTLY_DEAD. So the goal of the assert is to catch bugs in HeapTupleSatisfiesVacuumHorizon().
>
> From this perspective, I now feel dead_after should be initialized to InvalidTransactionId. Otherwise, if HeapTupleSatisfiesVacuumHorizon() had a bug and failed to set dead_after, the assert most likely would not fire, because dead_after would hold an arbitrary uninitialized value that is most likely not 0.
Actually, thinking about it more, I decided to remove the assertions
on dead_after from those patches entirely. I don't use dead_after and
only pass it in because HeapTupleSatisfiesVacuumHorizon requires it.
In fact, I don't care if the function correctly sets dead_after since
I don't use it.
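Chao's reasoning can be illustrated with a minimal, self-contained sketch. It is not what the patch does (the assertions were dropped instead): sketch_satisfies_horizon() is a hypothetical stand-in for HeapTupleSatisfiesVacuumHorizon(), its buggy flag is an assumption used to simulate a callee that forgets to set its output, and the TransactionId typedef and InvalidTransactionId value of 0 mirror PostgreSQL's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Hypothetical subset of the HTSV_Result values */
typedef enum SketchHTSV
{
	SKETCH_HEAPTUPLE_LIVE,
	SKETCH_HEAPTUPLE_RECENTLY_DEAD
} SketchHTSV;

/*
 * Hypothetical stand-in for HeapTupleSatisfiesVacuumHorizon(): its
 * contract is to set *dead_after whenever it reports RECENTLY_DEAD.
 * When "buggy" is true it simulates a callee that forgets to do so.
 */
static SketchHTSV
sketch_satisfies_horizon(TransactionId xmax, bool buggy,
						 TransactionId *dead_after)
{
	if (!TransactionIdIsValid(xmax))
		return SKETCH_HEAPTUPLE_LIVE;
	if (!buggy)
		*dead_after = xmax;
	return SKETCH_HEAPTUPLE_RECENTLY_DEAD;
}

/*
 * Returns false if the contract was violated. Because dead_after starts
 * as InvalidTransactionId rather than uninitialized stack contents, a
 * missed assignment is reliably detected. In PostgreSQL the equivalent
 * check would be Assert(TransactionIdIsValid(dead_after)).
 */
static bool
sketch_dead_after_contract_holds(TransactionId xmax, bool buggy)
{
	TransactionId dead_after = InvalidTransactionId;

	if (sketch_satisfies_horizon(xmax, buggy, &dead_after) ==
		SKETCH_HEAPTUPLE_RECENTLY_DEAD)
		return TransactionIdIsValid(dead_after);

	return true;				/* nothing to check for live tuples */
}
```

Without the initializer, the buggy case would read indeterminate stack contents (undefined behavior), which is why such an assertion could pass silently.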
> + /* set if the query doesn't modify the rel */
> + SO_HINT_REL_READ_ONLY = 1 << 10,
> ```
>
> Nit: I think it’s better to replace “rel” with “relation”. In a function comment, if there is a parameter named “rel”, we can use it to refer to the parameter; without such context, I think the whole word is better here.
k
I'm currently working on a new version that incorporates Andres'
review feedback and will post soon.
- Melanie
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
2026-01-06 09:40 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andrey Borodin <[email protected]>
2026-01-06 17:31 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-01-24 00:28 ` Andres Freund <[email protected]>
2026-01-28 23:16 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
2 siblings, 1 reply; 17+ messages in thread
From: Andres Freund @ 2026-01-24 00:28 UTC
To: Melanie Plageman <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
Hi,
On 2026-01-06 12:31:57 -0500, Melanie Plageman wrote:
> Subject: [PATCH v33 01/16] Combine visibilitymap_set() cases in
> lazy_scan_prune()
>
> lazy_scan_prune() previously had two separate cases that called
> visibilitymap_set() after pruning and freezing. These branches were
> nearly identical except that one attempted to avoid dirtying the heap
> buffer. However, that situation can never occur — the heap buffer cannot
> be clean at that point (and we would hit an assertion if it were).
>
> In lazy_scan_prune(), when we change a previously all-visible page to
> all-frozen and the page was recorded as all-visible in the visibility
> map by find_next_unskippable_block(), the heap buffer will always be
> dirty. Either we have just frozen a tuple and already dirtied the
> buffer, or the buffer was modified between find_next_unskippable_block()
> and heap_page_prune_and_freeze() and then pruned in
> heap_page_prune_and_freeze().
>
> Additionally, XLogRegisterBuffer() asserts that the buffer is dirty, so
> attempting to add a clean heap buffer to the WAL chain would assert out
> anyway.
>
> Since the “clean heap buffer with already set VM” case is impossible,
> the two visibilitymap_set() branches in lazy_scan_prune() can be merged.
> Doing so makes the intent clearer and emphasizes that the heap buffer
> must always be marked dirty before being added to the WAL chain.
>
> This commit also adds a test case for vacuuming when no heap
> modifications are required. Currently this ensures that the heap buffer
> is marked dirty before it is added to the WAL chain, but if we later
> remove the heap buffer from the VM-set WAL chain or pass it with the
> REGBUF_NO_CHANGES flag, this test would guard that behavior.
>
> Author: Melanie Plageman <[email protected]>
> Reviewed-by: Chao Li <[email protected]>
> Reviewed-by: Srinath Reddy Sadipiralla <[email protected]>
> Reviewed-by: Kirill Reshke <[email protected]>
> Reviewed-by: Xuneng Zhou <[email protected]>
> Discussion: https://postgr.es/m/5CEAA162-67B1-44DA-B60D-8B65717E8B05%40gmail.com
> Discussion: https://postgr.es/m/flat/CAAKRu_ZWx5gCbeCf7PWCv8p5%3D%3Db7EEws0VD2wksDxpXCvCyHvQ%40mail.gmail.com
> ---
> .../pg_visibility/expected/pg_visibility.out | 44 ++++++++++
> contrib/pg_visibility/sql/pg_visibility.sql | 20 +++++
> src/backend/access/heap/vacuumlazy.c | 87 ++++---------------
> 3 files changed, 82 insertions(+), 69 deletions(-)
>
> diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
> index 09fa5933a35..e10f1706015 100644
> --- a/contrib/pg_visibility/expected/pg_visibility.out
> +++ b/contrib/pg_visibility/expected/pg_visibility.out
> @@ -1,4 +1,5 @@
> CREATE EXTENSION pg_visibility;
> +CREATE EXTENSION pageinspect;
I think this would need an EXTRA_INSTALL = contrib/pageinspect to work reliably
in make. You should be able to see a failure without the fix if you remove
the tmp_install/ dir in an autoconf build and then run make check for
pg_visibility.
I'm slightly wary of embedding numerical bitmask values in the tests, but I
don't see a better alternative right now.
Other than the EXTRA_INSTALL thing I think this is ready.
> From 4d37243f9fa0dc4e264a28bcee448787fb8d7f65 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Thu, 11 Dec 2025 10:48:13 -0500
> Subject: [PATCH v33 02/16] Eliminate use of cached VM value in
> lazy_scan_prune()
>
> lazy_scan_prune() takes a parameter from lazy_scan_heap() indicating
> whether the page was marked all-visible in the VM at the time it was
> last checked in find_next_unskippable_block(). This behavior is
> historical, dating back to commit 608195a3a365, when we did not pin the
> VM page until deciding we must read it. Now that the VM page is already
> pinned, there is no meaningful benefit to relying on a cached VM status.
>
> Removing this cached value simplifies the logic in both lazy_scan_heap()
> and lazy_scan_prune(). It also clarifies future work that will set the
> visibility map on-access: such paths will not have a cached value
> available, which would make the logic harder to reason about. And
> eliminating it enables us to detect and repair VM corruption on-access.
>
> Along with removing the cached value and unconditionally checking the
> visibility status of the heap page, this commit also moves the VM
> corruption handling to occur first. This reordering should have no
> performance impact, since the checks are inexpensive and performed only
> once per page. It does, however, make the control flow easier to
> understand. The new restructuring also makes it possible to set the VM
> after fixing corruption (if pruning found the page all-visible).
>
> Now that no callers of visibilitymap_set() use its return value, change
> its (and visibilitymap_set_vmbits()) return type to void.
> @@ -1735,7 +1719,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
> BlockNumber next_unskippable_block = vacrel->next_unskippable_block + 1;
> Buffer next_unskippable_vmbuffer = vacrel->next_unskippable_vmbuffer;
> bool next_unskippable_eager_scanned = false;
> - bool next_unskippable_allvis;
>
> *skipsallvis = false;
>
> @@ -1745,7 +1728,6 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
> next_unskippable_block,
> &next_unskippable_vmbuffer);
>
> - next_unskippable_allvis = (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
>
> /*
> * At the start of each eager scan region, normal vacuums with eager
> @@ -1764,7 +1746,7 @@ find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis)
> * A block is unskippable if it is not all visible according to the
> * visibility map.
> */
> - if (!next_unskippable_allvis)
> + if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> {
> Assert((mapbits & VISIBILITYMAP_ALL_FROZEN) == 0);
> break;
This feels a bit independent from the rest, but it doesn't matter.
> @@ -2225,6 +2144,71 @@ lazy_scan_prune(LVRelState *vacrel,
> MarkBufferDirty(buf);
> visibilitymap_clear(vacrel->rel, blkno, vmbuffer,
> VISIBILITYMAP_VALID_BITS);
> + /* VM bits are now clear */
> + old_vmbits = 0;
> + }
> +
> + if (!presult.all_visible)
> + return presult.ndeleted;
> +
> + /* Set the visibility map and page visibility hint */
> + new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
> +
> + if (presult.all_frozen)
> + new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
> +
> + /* Nothing to do */
> + if (old_vmbits == new_vmbits)
> + return presult.ndeleted;
>
> + Assert(presult.all_visible);
Given that there's an explicit return for this case a few lines above, I don't
understand what the assert is trying to do?
> + /*
> + * It should never be the case that the visibility map page is set while
> + * the page-level bit is clear
I'd perhaps add a parenthetical saying (and if so, we cleared it above) or
such.
> + visibilitymap_set(vacrel->rel, blkno, buf,
> + InvalidXLogRecPtr,
> + vmbuffer, presult.vm_conflict_horizon,
> + new_vmbits);
> +
> + /*
> + * If the page wasn't already set all-visible and/or all-frozen in the VM,
> + * count it as newly set for logging.
> + */
> + if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> + {
> + vacrel->vm_new_visible_pages++;
> + if (presult.all_frozen)
> + {
> + vacrel->vm_new_visible_frozen_pages++;
> + *vm_page_frozen = true;
Not this patch's fault, but I find "vm_new_visible_pages" and
"vm_new_visible_frozen_pages" pretty odd names. The concept is all-visible and
frozen. The page itself isn't visible or invisible...
> Subject: [PATCH v33 03/16] Refactor lazy_scan_prune() VM clear logic into
> helper
> +/*
> + * Helper to correct any corruption detected on a heap page and its
> + * corresponding visibility map page after pruning but before setting the
> + * visibility map. It examines the heap page, the associated VM page, and the
> + * number of dead items previously identified.
> + *
> + * This function must be called while holding an exclusive lock on the heap
> + * buffer, and the dead items must have been discovered under that same lock.
> +
> + * The provided vmbits must reflect the current state of the VM block
> + * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
> + * is pinned, and the heap buffer is exclusively locked, ensuring that no
> + * other backend can update the VM bits corresponding to this heap page.
> + *
> + * Returns true if it cleared corruption and false otherwise.
> + */
I don't love that the caller now has to assume that old_vmbits is zero after
this function returns. Somehow that feels like split responsibility.
I guess I'd take a pointer to old_vmbits and update it accordingly inside
identify_and_fix_vm_corruption() rather than in the caller.
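Andres' suggestion could look roughly like the sketch below. It assumes a hypothetical, heavily simplified SketchHeapPage model instead of real buffers and pages, stands in for visibilitymap_clear() with a plain bit clear, and only follows the two corruption cases lazy_scan_prune() has historically checked for (VM bits set while the page-level bit is clear, and LP_DEAD items on a page marked all-visible). The bit values mirror PostgreSQL's VISIBILITYMAP_* definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/*
 * Hypothetical, simplified model of what the helper inspects; the real
 * function takes the relation, buffers, page and block number.
 */
typedef struct SketchHeapPage
{
	bool		pd_all_visible;	/* page-level all-visible hint */
	int			nlpdead_items;	/* LP_DEAD items found under the lock */
	uint8_t		vm_on_disk;		/* stand-in for the VM page contents */
} SketchHeapPage;

/*
 * Sketch of the suggested interface: the helper updates the caller's copy
 * of the VM bits itself, so the caller need not assume they are now zero.
 */
static bool
sketch_identify_and_fix_vm_corruption(SketchHeapPage *page, uint8_t *vmbits)
{
	bool		corrupt = false;

	/* VM bits set, but the page-level bit is clear */
	if (!page->pd_all_visible && (*vmbits & VISIBILITYMAP_VALID_BITS) != 0)
		corrupt = true;
	/* page marked all-visible, yet it contains LP_DEAD items */
	else if (page->nlpdead_items > 0 && page->pd_all_visible)
	{
		page->pd_all_visible = false;
		corrupt = true;
	}

	if (!corrupt)
		return false;

	/* stand-in for visibilitymap_clear(..., VISIBILITYMAP_VALID_BITS) */
	page->vm_on_disk &= (uint8_t) ~VISIBILITYMAP_VALID_BITS;
	*vmbits = page->vm_on_disk;
	assert((*vmbits & VISIBILITYMAP_VALID_BITS) == 0);
	return true;
}
```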
> From 5c65e73246b4968ddfa9d3739f53d0d8734b8727 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 2 Dec 2025 15:07:42 -0500
> Subject: [PATCH v33 04/16] Set the VM in heap_page_prune_and_freeze()
>
> This has no independent benefit. It is meant for ease of review. As of
> this commit, there is still a separate WAL record emitted for setting
> the VM after pruning and freezing. But it is easier to review if moving
> the logic into pruneheap.c is separate from setting the VM in the same
> WAL record.
It seems a bit noisy to refactor the related code in some of the preceding
commits and then refactor it into a slightly different shape as part of this
commit (c.f. heap_page_will_set_vm()).
It's also a bit odd that a function that sounds rather read-only does stuff
like clearing VM/all-visible.
Why are we not doing fixing up of the page *before* we prune it? It's a bit
insane that we do the WAL logging for pruning, which in turn will often
include an FPI, before we do the fixups. The fixes aren't WAL logged, so this
actually leads to the standby getting further out of sync.
I realize this isn't your mess, but brrr.
Do we actually foresee a case where only one of HEAP_PAGE_PRUNE_FREEZE |
HEAP_PAGE_PRUNE_UPDATE_VM would be set?
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -424,11 +424,7 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
> static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
> BlockNumber blkno, Page page,
> bool sharelock, Buffer vmbuffer);
> -static bool identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
> - BlockNumber heap_blk, Page heap_page,
> - int nlpdead_items,
> - Buffer vmbuffer,
> - uint8 vmbits);
> +
> static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
> BlockNumber blkno, Page page,
> Buffer vmbuffer,
> @@ -1962,83 +1958,6 @@ cmpOffsetNumbers(const void *a, const void *b)
> return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
> }
Spurious newline inserted.
> /*
> * If the page wasn't already set all-visible and/or all-frozen in the VM,
> * count it as newly set for logging.
> */
> - if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> + if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
> + (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> {
> vacrel->vm_new_visible_pages++;
> - if (presult.all_frozen)
> + if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> {
> vacrel->vm_new_visible_frozen_pages++;
> *vm_page_frozen = true;
> }
> }
> - else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
> - presult.all_frozen)
> + else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
> + (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> {
> + Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
> vacrel->vm_new_frozen_pages++;
> *vm_page_frozen = true;
> }
It's a bit odd that we figure out all of this by inspecting old/new vmbits and
have that logic in multiple places.
> Subject: [PATCH v33 05/16] Move VM assert into prune/freeze code
Feels like a somewhat too narrow description, given that it changes the API
for heap_page_prune_and_freeze() by removing variables from PruneFreezeResult.
> +#ifdef USE_ASSERT_CHECKING
> +
> +/*
> + * Wrapper for heap_page_would_be_all_visible() which can be used for callers
> + * that expect no LP_DEAD on the page. Currently assert-only, but there is no
> + * reason not to use it outside of asserts.
> + */
If so, why would we want it in pruneheap.c? Seems a bit odd to have
heap_page_would_be_all_visible() defined in vacuumlazy.c but
heap_page_is_all_visible() defined in pruneheap.c.
> From cdf5776fadeae3430c692999b37f8a7ec944bda1 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Tue, 2 Dec 2025 16:16:22 -0500
> Subject: [PATCH v33 06/16] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
> prune/freeze
>
> Vacuum no longer emits a separate WAL record for each page set
> all-visible or all-frozen during phase I. Instead, visibility map
> updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
> is already emitted for pruning and freezing.
>
> Previously, heap_page_prune_and_freeze() determined whether a page was
> all-visible, but the corresponding VM bits were only set later in
> lazy_scan_prune(). Now the VM is updated immediately in
> heap_page_prune_and_freeze(), at the same time as the heap
> modifications.
>
> This change applies only to vacuum phase I, not to pruning performed
> during normal page access.
> +/*
> + * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
> + * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
> + */
> +static TransactionId
> +get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
> + uint8 old_vmbits, uint8 new_vmbits,
> + TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
> + TransactionId visibility_cutoff_xid)
> +{
The logic for horizons is now split between this and "Calculate what the
snapshot conflict horizon should be for a record" in heap_page_will_freeze().
Although I guess I don't understand that code:
/*
* Calculate what the snapshot conflict horizon should be for a record
* freezing tuples. We can use the visibility_cutoff_xid as our cutoff
* for conflicts when the whole page is eligible to become all-frozen
* in the VM once we're done with it. Otherwise, we generate a
* conservative cutoff by stepping back from OldestXmin.
*/
if (prstate->all_frozen)
	prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
else
{
	/* Avoids false conflicts when hot_standby_feedback in use */
	prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
	TransactionIdRetreat(prstate->frz_conflict_horizon);
}
Why does it make sense to use OldestXmin? Consider e.g. the case where there
is one very old tuple that needs to be frozen and one new live tuple on a
page. Because of the new tuple we can't mark the page all-frozen. But there's
also no reason not to use a much less aggressive horizon than OldestXmin, namely
the newer of xmin,xmax of the old frozen tuple?
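To make the cost of this alternative concrete, here is a hypothetical sketch of the extra bookkeeping it would need: tracking the newest normal XID among the xmin/xmax values of tuples being frozen. Nothing like SketchFreezeTracker exists in the patch set, sketch_xid_follows() only mirrors TransactionIdFollows() for two normal XIDs, and whether this value is a sufficient conflict horizon is exactly what is being debated; the sketch does not settle that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)
#define TransactionIdIsNormal(xid)	((xid) >= FirstNormalTransactionId)

/* Modulo-2^32 "id1 is newer than id2", valid for two normal XIDs */
static bool
sketch_xid_follows(TransactionId id1, TransactionId id2)
{
	return (int32_t) (id1 - id2) > 0;
}

/* Hypothetical per-page tracker */
typedef struct SketchFreezeTracker
{
	TransactionId newest_frozen_xid;	/* Invalid until a tuple is frozen */
} SketchFreezeTracker;

static void
sketch_track_xid(SketchFreezeTracker *t, TransactionId xid)
{
	if (!TransactionIdIsNormal(xid))
		return;					/* invalid, bootstrap and frozen XIDs don't count */
	if (t->newest_frozen_xid == InvalidTransactionId ||
		sketch_xid_follows(xid, t->newest_frozen_xid))
		t->newest_frozen_xid = xid;
}

/* Called once per tuple that will be frozen: the per-tuple extra work */
static void
sketch_note_frozen_tuple(SketchFreezeTracker *t,
						 TransactionId xmin, TransactionId xmax)
{
	sketch_track_xid(t, xmin);
	sketch_track_xid(t, xmax);
}
```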
I also don't understand what the "false conflicts" thing is referencing.
> + TransactionId conflict_xid;
> +
> + /*
> + * We can omit the snapshot conflict horizon if we are not pruning or
> + * freezing any tuples and are setting an already all-visible page
> + * all-frozen in the VM. In this case, all of the tuples on the page must
> + * already be visible to all MVCC snapshots on the standby.
> + */
The last sentence here is a bit confusing, because they don't just need to
already be visible to everyone, they already need to be frozen. Right?
> + if (!do_prune &&
> + !do_freeze &&
> + do_set_vm &&
I'm confused by the do_set_vm check here. Doesn't it mean that we will *not*
return InvalidTransactionId if !prstate->attempt_update_vm? I don't understand
why that would make sense.
I guess we'll compute a bogus cutoff in that case, but never use it, since
we'll also not emit WAL? Or maybe we'll just unnecessarily go through the
code below, because the code turns out to do ok regardless? It's confusing
either way.
> + (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
> + (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
I wonder if some of this code would end up cleaner if we tracked the bits we
intend to add, rather than what the target set of bits is.
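What tracking the bits to add would mean for the quoted early-return check can be sketched as follows. The helper names are hypothetical, and the sketch assumes that new_vmbits is zero when nothing changes and that all-frozen is never set without all-visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02
#define VISIBILITYMAP_VALID_BITS	0x03

/* Bits an update would newly set, given the current and target bits */
static uint8_t
sketch_bits_to_add(uint8_t old_vmbits, uint8_t new_vmbits)
{
	return (uint8_t) (new_vmbits & ~old_vmbits & VISIBILITYMAP_VALID_BITS);
}

/* The early-return condition as quoted, in terms of the target bits */
static bool
sketch_omit_horizon_target(bool do_prune, bool do_freeze,
						   uint8_t old_vmbits, uint8_t new_vmbits)
{
	return !do_prune && !do_freeze &&
		(old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0;
}

/* The same condition in terms of the bits being added */
static bool
sketch_omit_horizon_delta(bool do_prune, bool do_freeze, uint8_t add_bits)
{
	/* must test that exactly the all-frozen bit is being added */
	return !do_prune && !do_freeze && add_bits == VISIBILITYMAP_ALL_FROZEN;
}
```

Under those assumptions both forms agree, but the delta form still needs an exact-match test on the added bits, so it is not obviously simpler for this particular check.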
> + /*
> + * The snapshotConflictHorizon for the whole record should be the most
> + * conservative of all the horizons calculated for any of the possible
> + * modifications. If this record will prune tuples, any transactions on
> + * the standby older than the youngest xmax of the most recently removed
> + * tuple this record will prune will conflict. If this record will freeze
> + * tuples, any transactions on the standby with xids older than the
> + * youngest tuple this record will freeze will conflict.
> + */
> + conflict_xid = InvalidTransactionId;
I'd move this first assignment into an else.
> + /*
> + * If we are updating the VM, the conflict horizon is almost always the
> + * visibility cutoff XID.
> + *
> + * Separately, if we are freezing any tuples, as an optimization, we can
> + * use the visibility_cutoff_xid as the conflict horizon if the page will
> + * be all-frozen.
What does "as an optimization" mean here?
Note that the code actually uses visibility_cutoff_xid even if the page is
just marked all-visible, but not all-frozen (due to the do_set_vm check being
earlier)
> This is true even if there are LP_DEAD line pointers
> + * because we ignored those when maintaining the visibility_cutoff_xid.
I must just be missing something because I can't follow this at all. I guess
it could be correct because we later add in knowledge of removed xids
via the TransactionIdFollows check below? But if that's it, this is extremely
confusingly worded.
Sorry, running out of brain power. More another day.
Greetings,
Andres Freund
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
2026-01-06 09:40 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andrey Borodin <[email protected]>
2026-01-06 17:31 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
2026-01-24 00:28 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
@ 2026-01-28 23:16 ` Melanie Plageman <[email protected]>
2026-01-29 05:00 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Alexander Lakhin <[email protected]>
2026-02-20 21:34 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
0 siblings, 2 replies; 17+ messages in thread
From: Melanie Plageman @ 2026-01-28 23:16 UTC
To: Andres Freund <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
Thanks for the review!
I pushed v33 0001-0003 after incorporating your feedback.
On Fri, Jan 23, 2026 at 7:28 PM Andres Freund <[email protected]> wrote:
>
> On 2026-01-06 12:31:57 -0500, Melanie Plageman wrote:
>
> > + */
> > + if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> > + {
> > + vacrel->vm_new_visible_pages++;
> > + if (presult.all_frozen)
> > + {
> > + vacrel->vm_new_visible_frozen_pages++;
> > + *vm_page_frozen = true;
>
> Not this patch's fault, but I find "vm_new_visible_pages" and
> "vm_new_visible_frozen_pages" pretty odd names. The concept is all-visible and
> frozen. The page itself isn't visible or invisible...
I thought having the extra word "all" in there made it too long. And
since "vm" is there, that isn't set unless the page is
_all_-visible/all-frozen. But if you think it gives people the wrong
idea, I am willing to change it. I can omit vm and make it:
new_all_visible_all_frozen_pages
new_all_visible_pages
new_all_frozen_pages
Is that clearer?
> > From 5c65e73246b4968ddfa9d3739f53d0d8734b8727 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Tue, 2 Dec 2025 15:07:42 -0500
> > Subject: [PATCH v33 04/16] Set the VM in heap_page_prune_and_freeze()
> >
> > This has no independent benefit. It is meant for ease of review. As of
> > this commit, there is still a separate WAL record emitted for setting
> > the VM after pruning and freezing. But it is easier to review if moving
> > the logic into pruneheap.c is separate from setting the VM in the same
> > WAL record.
>
> It seems a bit noisy to refactor the related code in some of the preceding
> commits and then refactor it into a slightly different shape as part of this
> commit (c.f. heap_page_will_set_vm()).
I understand what you are saying. I don't see a good way to keep it
reviewable otherwise, though.
> It's also a bit odd that a function that sounds rather read-only does stuff
> like clearing VM/all-visible.
I thought about this a lot. Ultimately, I ended up keeping it the way it is.
I think the other option is changing from this:
do_set_vm = heap_page_will_set_vm(&prstate,
params->relation,
blockno, buffer, page,
vmbuffer,
params->reason,
do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits, &new_vmbits);
to this:
heap_page_prepare_vm_set(&prstate,
params->relation,
blockno, buffer, page,
vmbuffer,
params->reason,
do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits, &new_vmbits);
do_set_vm = (new_vmbits & VISIBILITYMAP_VALID_BITS) != 0;
or heap_page_plan_vm_set()
heap_page_will_set_vm() has symmetry with heap_page_will_freeze(), the
helper that decides whether or not we will freeze tuples. I like that
symmetry since heap_page_will_set_vm() decides whether or not to set
the VM.
Now, heap_page_plan/prepare_vm_set() does indirectly hint that
something like clearing VM/all-visible could happen -- if you
understand that preparing the VM to have bits set also includes
clearing any existing corruption. And "prepare" or "plan" has more
symmetry with prune_freeze_plan() -- though that function does not
make changes on the page.
Ultimately, clearing the VM/page of corruption is pretty anomalous
from the rest of the code in heap_page_prune_and_freeze(). All other
changes to the page are done in a single critical section at the
bottom of the function.
I could see an argument for moving identify_and_fix_vm_corruption()
out of the helper and into heap_page_prune_and_freeze() but then we'd
have to move visibilitymap_get_status() out too. And that takes away a
lot of the benefit of encapsulating all that logic.
Ultimately, I don't like any of those alternative structures. But if
you prefer the names and return value change I have above
(heap_page_prepare/plan_vm_set()), I'm fine with going with that.
> Why are we not doing fixing up of the page *before* we prune it? It's a bit
> insane that we do the WAL logging for pruning, which in turn will often
> include an FPI, before we do the fixups. The fixes aren't WAL logged, so this
> actually leads to the standby getting further out of sync.
>
> I realize this isn't your mess, but brrr.
Well, after this patch set, clearing the VM does happen before we emit
WAL for pruning. It wouldn't be hard to move the corruption fixups to
the beginning of heap_page_prune_and_freeze() in the new code
structure. But it would split visibility map-related logic into two
parts of heap_page_prune_and_freeze(). Would it be worth it? What
benefit would we get? Do you just feel that it should logically come
first?
> Do we actually foresee a case where only one of HEAP_PAGE_PRUNE_FREEZE |
> HEAP_PAGE_PRUNE_UPDATE_VM would be set?
Yes, when setting the VM on-access, it is too expensive to call
heap_prepare_freeze_tuple() on each tuple. I could work on trying to
optimize it, but it isn't currently viable.
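The case where only one of the two options is set can be modeled with ordinary bit flags. The SKETCH_* values and the caller enum are hypothetical; they only model the split described above, in which vacuum asks for both freezing and VM updates while on-access pruning would ask only for the VM update.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values; they are not the patch set's definitions */
#define SKETCH_PRUNE_FREEZE		(1 << 0)
#define SKETCH_PRUNE_UPDATE_VM	(1 << 1)

typedef enum SketchPruneCaller
{
	SKETCH_CALLER_VACUUM,		/* vacuum phase I */
	SKETCH_CALLER_ON_ACCESS		/* opportunistic pruning during a scan */
} SketchPruneCaller;

static int
sketch_prune_options(SketchPruneCaller caller)
{
	switch (caller)
	{
		case SKETCH_CALLER_VACUUM:
			return SKETCH_PRUNE_FREEZE | SKETCH_PRUNE_UPDATE_VM;
		case SKETCH_CALLER_ON_ACCESS:
			/* per-tuple freeze preparation is too expensive here */
			return SKETCH_PRUNE_UPDATE_VM;
	}
	return 0;
}

/* Freeze planning runs only when the caller asked for it */
static bool
sketch_will_plan_freeze(int options)
{
	return (options & SKETCH_PRUNE_FREEZE) != 0;
}
```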
> > - if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
> > + if ((presult.old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
> > + (presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> > {
> > vacrel->vm_new_visible_pages++;
> > - if (presult.all_frozen)
> > + if ((presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> > {
> > vacrel->vm_new_visible_frozen_pages++;
> > *vm_page_frozen = true;
> > }
> > }
> > - else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
> > - presult.all_frozen)
> > + else if ((presult.old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
> > + (presult.new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> > {
> > + Assert((presult.new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
> > vacrel->vm_new_frozen_pages++;
> > *vm_page_frozen = true;
> > }
>
> It's a bit odd that we figure out all of this by inspecting old/new vmbits and
> have that logic in multiple places.
I changed PruneFreezeResult to have just the counters that have to be
reflected in vacrel for logging (vm_new_frozen_pages, etc) instead of
passing the bits back.
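Classifying the VM change once and returning only counters might look like the following sketch. SketchVMCounters is a hypothetical stand-in for the counter fields now in PruneFreezeResult (the real field names may differ); the branch logic follows the quoted lazy_scan_prune() hunk, and the bit values mirror PostgreSQL's VISIBILITYMAP_ALL_VISIBLE and VISIBILITYMAP_ALL_FROZEN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE	0x01
#define VISIBILITYMAP_ALL_FROZEN	0x02

/* Hypothetical subset of PruneFreezeResult carrying only the counters */
typedef struct SketchVMCounters
{
	int			vm_new_visible_pages;	/* newly all-visible */
	int			vm_new_visible_frozen_pages;	/* newly all-visible and all-frozen */
	int			vm_new_frozen_pages;	/* already all-visible, newly all-frozen */
} SketchVMCounters;

/*
 * Classify one VM update in a single place. Returns true if the page
 * became all-frozen (the *vm_page_frozen output in lazy_scan_prune()).
 */
static bool
sketch_count_vm_update(uint8_t old_vmbits, uint8_t new_vmbits,
					   SketchVMCounters *c)
{
	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
	{
		c->vm_new_visible_pages++;
		if ((new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
		{
			c->vm_new_visible_frozen_pages++;
			return true;
		}
		return false;
	}

	if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
		(new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
	{
		/* all-frozen is only ever set together with all-visible */
		assert((new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0);
		c->vm_new_frozen_pages++;
		return true;
	}

	return false;
}
```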
> > Subject: [PATCH v33 05/16] Move VM assert into prune/freeze code
>
> Feels like a somewhat too narrow description, given that it changes the API
> for heap_page_prune_and_freeze() by removing variables from PruneFreezeResult.
I've tried to fix this.
> > +/*
> > + * Wrapper for heap_page_would_be_all_visible() which can be used for callers
> > + * that expect no LP_DEAD on the page. Currently assert-only, but there is no
> > + * reason not to use it outside of asserts.
> > + */
>
> If so, why would we want it in pruneheap.c? Seems a bit odd to have
> heap_page_would_be_all_visible() defined in vacuumlazy.c but
> heap_page_is_all_visible() defined in pruneheap.c.
You're right. I've kept them both in vacuumlazy.c
> > From cdf5776fadeae3430c692999b37f8a7ec944bda1 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <[email protected]>
> > Date: Tue, 2 Dec 2025 16:16:22 -0500
> > +static TransactionId
> > +get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
> > + uint8 old_vmbits, uint8 new_vmbits,
> > + TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
> > + TransactionId visibility_cutoff_xid)
> > +{
>
> The logic for horizons is now split between this and "Calculate what the
> snapshot conflict horizon should be for a record" in heap_page_will_freeze().
That is true in master too. We determine frz_conflict_horizon in
heap_page_will_freeze() and later, before emitting the WAL record,
decide which of latest_xid_removed and frz_conflict_horizon
we should use as the snapshot conflict horizon for the combined
record.
All I've done is expand that part (the part before emitting the WAL
record) a bit because now we have to consider what the horizon would
be if we set the VM.
If I really wanted to calculate it only in a single place, I could
maintain a new variable, all_frozen_except_dead, and remove the
frz_conflict_horizon from heap_page_will_freeze(). Then, in
get_conflict_xid(), I could have the following logic:
if (do_set_vm)
	conflict_xid = visibility_cutoff_xid;
else if (do_freeze)
{
	if (all_frozen_except_dead)
		conflict_xid = visibility_cutoff_xid;
	else
	{
		conflict_xid = OldestXmin;
		TransactionIdRetreat(conflict_xid);
	}
}
else
	conflict_xid = InvalidTransactionId;
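The alternative above can be turned into a standalone function for experimentation. This is a sketch, not committed code (the existing structure was kept): all_frozen_except_dead is the hypothetical flag described above, TransactionId and InvalidTransactionId mirror PostgreSQL's definitions, and the local TransactionIdRetreat() copies the PostgreSQL macro, which steps back past the special XIDs 0 through 2. Combining the result with latest_xid_removed for pruning is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId		((TransactionId) 0)
#define FirstNormalTransactionId	((TransactionId) 3)

/* Step back one XID, skipping the special (non-normal) XIDs */
#define TransactionIdRetreat(dest) \
	do { \
		(dest)--; \
	} while ((dest) < FirstNormalTransactionId)

/* Hypothetical: the freeze/VM part of the record's conflict horizon */
static TransactionId
sketch_conflict_xid(bool do_set_vm, bool do_freeze,
					bool all_frozen_except_dead,
					TransactionId visibility_cutoff_xid,
					TransactionId oldest_xmin)
{
	TransactionId conflict_xid;

	if (do_set_vm)
		conflict_xid = visibility_cutoff_xid;
	else if (do_freeze)
	{
		if (all_frozen_except_dead)
			conflict_xid = visibility_cutoff_xid;
		else
		{
			/* coarse cutoff: the XID just before OldestXmin */
			conflict_xid = oldest_xmin;
			TransactionIdRetreat(conflict_xid);
		}
	}
	else
		conflict_xid = InvalidTransactionId;

	return conflict_xid;
}
```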
I think using all_frozen_except_dead while maintaining
visibility_cutoff_xid (in heap_prune_record_unchanged_lp_normal()) has
the potential to be confusing, though. We'd need to keep updating
visibility_cutoff_xid when all_visible is false but
all_frozen_except_dead is true as well as when all_visible is true.
And because we don't care about all_visible_except_dead, it gets even
more confusing to make sure we are maintaining the right variables in
the right situations.
Alternatively, we could keep maintenance of visibility_cutoff_xid the
same and only use all_frozen_except_dead to avoid having the conflict
xid calculation in two places. We would just set it after
prune_freeze_plan() and use it the way it is in the snippet above. But
I don't know if this is better than just having a separate freeze
conflict horizon calculated in the will_freeze code. It is just as
confusing to understand and just as many variables but in a different
place.
For now, I've kept it as is.
> Although I guess I don't understand that code:
>
> /*
> * Calculate what the snapshot conflict horizon should be for a record
> * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
> * for conflicts when the whole page is eligible to become all-frozen
> * in the VM once we're done with it. Otherwise, we generate a
> * conservative cutoff by stepping back from OldestXmin.
> */
> if (prstate->all_frozen)
> prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
> else
> {
> /* Avoids false conflicts when hot_standby_feedback in use */
> prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
> TransactionIdRetreat(prstate->frz_conflict_horizon);
> }
>
> Why does it make sense to use OldestXmin? Consider e.g. the case where there
> is one very old tuple that needs to be frozen and one new live tuple on a
> page. Because of the new tuple we can't mark the page all-frozen. But there's
> also no reason not to use a much less aggressive horizon than OldestXmin, namely
> the newer of xmin,xmax of the old frozen tuple?
We don't track the newest frozen xmin right now. Doing so wouldn't be
free (i.e. more comparisons which may matter in a query without much
other overhead). And we can't get rid of any of the other cutoffs we
track. We'd still need to maintain the visibility_cutoff_xid and
latest_removed_xid. Which also means it doesn't simplify the code.
The only purpose it would serve is to make the snapshot conflict
horizon more accurate/more aggressive when we freeze tuples, which
would lead to canceling fewer queries than master -- which is outside
the purview of this patch. There's also a set of complications around
maintaining this number accurately mentioned by Peter in [1].
I've added a new patch to the series (0001) that expands the comment
about this in heap_page_will_freeze() and describes that we are using
a coarse cutoff because we don't track anything else.
But I don't think changing this behavior is a blocker for this feature.
> I also don't understand what the "false conflicts" thing is referencing.
Yea, neither do I. I ported that comment over (it's from before I
started modifying this code) and have never really understood what it
meant -- wouldn't we have more false conflicts if we use a newer
cutoff? (because OldestXmin will be newer than visibility_cutoff_xid).
If the page isn't all-visible, we don't maintain
visibility_cutoff_xid. But if we did track the newest live tuple xmin
on the page, it could very well be newer than OldestXmin, and then
using OldestXmin would cancel fewer queries and avoid more false
conflicts. That feels like a _very_ big stretch, though (a
hypothetical world that doesn't exist). Anyway, I've deleted the
comment (in 0001), since it clearly isn't adding value.
> > + TransactionId conflict_xid;
> > +
> > + /*
> > + * We can omit the snapshot conflict horizon if we are not pruning or
> > + * freezing any tuples and are setting an already all-visible page
> > + * all-frozen in the VM. In this case, all of the tuples on the page must
> > + * already be visible to all MVCC snapshots on the standby.
> > + */
>
> The last sentence here is a bit confusing, because they don't just need to
> already be visible to everyone, they already need to be frozen. Right?
Right. Well that also means that all tuples are visible to all MVCC
snapshots on the standby too -- I picked that language up from
somewhere else in the code and thought it sounded good. But I've
edited it to say frozen, which is more accurate.
> > + if (!do_prune &&
> > + !do_freeze &&
> > + do_set_vm &&
>
> I'm confused by the do_set_vm check here. Doesn't it mean that we will *not*
> return InvalidTransactionId if !prstate->attempt_update_vm? I don't understand
> why that would make sense.
>
> I guess we'll compute a bogus cutoff in that case, but never use it, since
> we'll also not emit WAL? Or maybe we'll just unnecessarily go through the
> code below, because the code turns out to do ok regardless? It's confusing
> either way.
Ah no, that was a relic from when I didn't clear the new vmbits in the
event that they were the same as old_vmbits. Now that I do that, I
don't need to qualify it with do_set_vm anymore. I've fixed that.
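With the do_set_vm qualifier gone, the "omit the horizon" condition depends only on the pruning/freezing decisions and the old and new VM bits. The following is a hypothetical sketch, assuming (as described above) that new_vmbits is zeroed whenever it equals old_vmbits or the VM will not be changed. The VM_* macros stand in for VISIBILITYMAP_ALL_VISIBLE/VISIBILITYMAP_ALL_FROZEN, and the function name is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_ALL_VISIBLE 0x01		/* stand-in for VISIBILITYMAP_ALL_VISIBLE */
#define VM_ALL_FROZEN  0x02		/* stand-in for VISIBILITYMAP_ALL_FROZEN */

/*
 * Hypothetical predicate: true when the record can carry an invalid
 * (omitted) snapshot conflict horizon. Nothing is pruned or frozen, and a
 * page that was already all-visible is becoming all-frozen, so every tuple
 * on it must already be frozen. Because new_vmbits is assumed to be 0 when
 * the VM will not change, no separate do_set_vm check is needed.
 */
static bool
can_omit_conflict_horizon(bool do_prune, bool do_freeze,
						  uint8_t old_vmbits, uint8_t new_vmbits)
{
	return !do_prune &&
		!do_freeze &&
		(old_vmbits & VM_ALL_VISIBLE) != 0 &&
		(new_vmbits & VM_ALL_FROZEN) != 0;
}
```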
> > + (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
> > + (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
>
> I wonder if some of this code would end up cleaner if we tracked the bits we
> intend to add, rather than the target set of bits.
I played around with this idea. Unfortunately, this specific check
wouldn't be much simpler, as I'd have to check that _only_ the
all-frozen bit is set. And I think that it is probably more clear with
explicit old and new vmbits. Those make it clear that we are setting a
formerly all-visible page all-frozen.
In some of the other places where I use old/new vmbits, I tried only
using the delta/bits that need to be newly set. But, for example, in
visibilitymap_set(), I need both the all-visible and all-frozen bits
to be passed (even if all-visible is already set), because it asserts
that all-visible is passed whenever all-frozen is. If I only kept track
of the new bits that need to be set, I would need extra logic before
calling visibilitymap_set(). And it makes sense that visibilitymap_set()
requires both, because you never want people setting just all-frozen,
and you don't want the API to make that easy.
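To make the trade-off concrete, here is a hypothetical sketch that contrasts the explicit old/new check with a delta-based one. It also shows the conversion a delta-tracking caller would need before calling a visibilitymap_set()-style function that expects the full target bits, where all-frozen never appears without all-visible. All names are illustrative and are not the patch's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02

/* A legal target state never has all-frozen without all-visible */
static bool
vm_target_is_valid(uint8_t bits)
{
	return (bits & VM_ALL_FROZEN) == 0 || (bits & VM_ALL_VISIBLE) != 0;
}

/*
 * Explicit old/new form: a formerly all-visible page gaining all-frozen.
 * Assumes new_bits was zeroed when it equals old_bits.
 */
static bool
gains_frozen_old_new(uint8_t old_bits, uint8_t new_bits)
{
	return (old_bits & VM_ALL_VISIBLE) != 0 &&
		(new_bits & VM_ALL_FROZEN) != 0;
}

/* Delta form: must check that all-frozen is the *only* bit being added */
static bool
gains_frozen_delta(uint8_t delta_bits)
{
	return delta_bits == VM_ALL_FROZEN;
}

/* Extra step a delta-tracking caller needs: rebuild the full target bits */
static uint8_t
vm_target_from_delta(uint8_t old_bits, uint8_t delta_bits)
{
	uint8_t		target = old_bits | delta_bits;

	assert(vm_target_is_valid(target));
	return target;
}
```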
> > + /*
> > + * If we are updating the VM, the conflict horizon is almost always the
> > + * visibility cutoff XID.
> > + *
> > + * Separately, if we are freezing any tuples, as an optimization, we can
> > + * use the visibility_cutoff_xid as the conflict horizon if the page will
> > + * be all-frozen.
>
> What does "as an optimization" mean here?
What I meant was that visibility_cutoff_xid is going to be older than
OldestXmin for an all-frozen page, so it will lead to canceling fewer
queries. So, using it is kind of an "optimization". But I re-read that
comment and it was way too confusing. I actually cut that whole
paragraph because it should be discussed in heap_page_will_freeze()
where we are actually handling it.
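For intuition on why an older horizon cancels fewer queries: during replay on a standby, a record with a valid snapshot conflict horizon conflicts with backends whose snapshot xmin is at or before that horizon, and an invalid horizon causes no snapshot conflict. The sketch below counts such conflicts using plain integer comparison, whereas the real code compares XIDs modulo wraparound. count_conflicting_snapshots() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/*
 * Hypothetical: count standby snapshots (given by their xmin) that would
 * conflict with a record carrying snapshot conflict horizon 'horizon'.
 * Plain integer comparison; XID wraparound is ignored in this sketch.
 */
static size_t
count_conflicting_snapshots(const TransactionId *snapshot_xmins, size_t n,
							TransactionId horizon)
{
	size_t		conflicts = 0;

	/* An invalid horizon means the record causes no snapshot conflict */
	if (horizon == InvalidTransactionId)
		return 0;

	for (size_t i = 0; i < n; i++)
	{
		/* Backends without a snapshot (invalid xmin) never conflict */
		if (snapshot_xmins[i] != InvalidTransactionId &&
			snapshot_xmins[i] <= horizon)
			conflicts++;
	}
	return conflicts;
}
```

With standby snapshots at xmins 90, 150 and 400, a horizon of 100 conflicts with one of them, while OldestXmin - 1 = 499 would conflict with all three.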
> Note that the code actually uses visibility_cutoff_xid even if the page is
> just marked all-visible, but not all-frozen (due to the do_set_vm check being
> earlier).
And that is correct. But you are right that the comment (now removed)
was misleading.
> > This is true even if there are LP_DEAD line pointers
> > + * because we ignored those when maintaining the visibility_cutoff_xid.
>
> I must just be missing something because I can't follow this at all. I guess
> it could be correct because we later add in knowledge of removed xids
> via the TransactionIdFollows check below? But if that's it, this is extremely
> confusingly worded.
Yes, we add in latest_xid_removed which is what makes it correct. But
I agree that the wording was terrible. I've cut those details from
get_conflict_xid() and kept them where they are relevant in
heap_page_will_freeze().
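The adjustment described above can be sketched as follows: the freeze horizon moves forward when the newest removed XID is newer. xid_follows() is modeled on PostgreSQL's TransactionIdFollows(). adjust_for_removed() is a hypothetical name, and this is not the patch's get_conflict_xid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Wraparound-aware "id1 is newer than id2", modeled on TransactionIdFollows() */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
	/* Special XIDs compare as plain integers */
	if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
		return id1 > id2;
	/* Normal XIDs compare modulo 2^32 */
	return (int32_t) (id1 - id2) > 0;
}

/*
 * Hypothetical: start from the freeze horizon (computed while ignoring
 * LP_DEAD items) and move it forward if the newest XID removed by pruning
 * is newer.
 */
static TransactionId
adjust_for_removed(TransactionId frz_horizon, TransactionId latest_xid_removed)
{
	if (latest_xid_removed != InvalidTransactionId &&
		(frz_horizon == InvalidTransactionId ||
		 xid_follows(latest_xid_removed, frz_horizon)))
		return latest_xid_removed;
	return frz_horizon;
}
```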
- Melanie
[1] https://www.postgresql.org/message-id/CAH2-WzkB-Pt3zPeTXvMik6jcJn%2BdcpUqO-tt_hc13bD6sGRLPg%40mail.g...
Attachments:
[text/x-patch] v34-0001-Clarify-some-heap-tuple-freezing-related-comment.patch (3.3K, 2-v34-0001-Clarify-some-heap-tuple-freezing-related-comment.patch)
From e76a68f196d96f269d5719b46d8ba1c9b8950870 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 28 Jan 2026 12:51:26 -0500
Subject: [PATCH v34 01/14] Clarify some heap tuple freezing related comments
Some of the comments about calculating the snapshot conflict horizon for
a record freezing tuples needed further elaboration.
---
src/backend/access/heap/pruneheap.c | 30 +++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 632c2427952..b9d2b48104c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -736,17 +736,26 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
heap_pre_freeze_checks(buffer, prstate->frozen, prstate->nfrozen);
/*
- * Calculate what the snapshot conflict horizon should be for a record
- * freezing tuples. We can use the visibility_cutoff_xid as our cutoff
- * for conflicts when the whole page is eligible to become all-frozen
- * in the VM once we're done with it. Otherwise, we generate a
+ * Determine the snapshot conflict horizon for freezing tuples on the
+ * page.
+ *
+ * We don't track the newest xmin that will become frozen, so we must
+ * use a coarser (more conservative) cutoff as the conflict horizon.
+ *
+ * We can use the visibility_cutoff_xid as our cutoff for conflicts
+ * when the whole page is eligible to become all-frozen in the VM once
+ * we're done with it. Otherwise, we generate an even more
* conservative cutoff by stepping back from OldestXmin.
+ *
+ * Ignoring dead items when all other tuples will be frozen allows us
+ * to pick an older horizon (visibility_cutoff_xid will be older than
+ * OldestXmin). We will later adjust this horizon to account for dead
+ * items, moving it forward if the newest removed xid is newer.
*/
if (prstate->all_frozen)
prstate->frz_conflict_horizon = prstate->visibility_cutoff_xid;
else
{
- /* Avoids false conflicts when hot_standby_feedback in use */
prstate->frz_conflict_horizon = prstate->cutoffs->OldestXmin;
TransactionIdRetreat(prstate->frz_conflict_horizon);
}
@@ -877,11 +886,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* While scanning the line pointers, we did not clear
* all_visible/all_frozen when encountering LP_DEAD items because we
- * wanted the decision whether or not to freeze the page to be unaffected
- * by the short-term presence of LP_DEAD items. These LP_DEAD items are
- * effectively assumed to be LP_UNUSED items in the making. It doesn't
- * matter which vacuum heap pass (initial pass or final pass) ends up
- * setting the page all-frozen, as long as the ongoing VACUUM does it.
+ * wanted the decision whether or not to opportunistically freeze the page
+ * to be unaffected by the short-term presence of LP_DEAD items. These
+ * LP_DEAD items are effectively assumed to be LP_UNUSED items in the
+ * making. It doesn't matter which vacuum heap pass (initial pass or final
+ * pass) ends up setting the page all-frozen, as long as the ongoing
+ * VACUUM does it.
*
* Now that we finished determining whether or not to freeze the page,
* update all_visible and all_frozen so that they reflect the true state
--
2.43.0
[text/x-patch] v34-0002-Set-the-VM-in-heap_page_prune_and_freeze.patch (27.4K, 3-v34-0002-Set-the-VM-in-heap_page_prune_and_freeze.patch)
From 9a6cc0cb97af38cf6eedf1a39eede8f3f9926cb8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 15:07:42 -0500
Subject: [PATCH v34 02/14] Set the VM in heap_page_prune_and_freeze()
This has no independent benefit. It is meant for ease of review. As of
this commit, there is still a separate WAL record emitted for setting
the VM after pruning and freezing. But it is easier to review if moving
the logic into pruneheap.c is separate from setting the VM in the same
WAL record.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/7ib3sa55sapwjlaz4sijbiq7iezna27kjvvvar4dpgkmadml6t%40gfpkkwmdnepx
---
src/backend/access/heap/pruneheap.c | 318 +++++++++++++++++++++++----
src/backend/access/heap/vacuumlazy.c | 163 +-------------
src/include/access/heapam.h | 20 ++
3 files changed, 303 insertions(+), 198 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b9d2b48104c..014c3c92d6c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -19,7 +19,7 @@
#include "access/htup_details.h"
#include "access/multixact.h"
#include "access/transam.h"
-#include "access/visibilitymapdefs.h"
+#include "access/visibilitymap.h"
#include "access/xlog.h"
#include "access/xloginsert.h"
#include "commands/vacuum.h"
@@ -44,6 +44,8 @@ typedef struct
bool mark_unused_now;
/* whether to attempt freezing tuples */
bool attempt_freeze;
+ /* whether or not to attempt updating the VM */
+ bool attempt_update_vm;
struct VacuumCutoffs *cutoffs;
/*-------------------------------------------------------
@@ -140,16 +142,17 @@ typedef struct
* all_visible and all_frozen indicate if the all-visible and all-frozen
* bits in the visibility map can be set for this page after pruning.
*
- * visibility_cutoff_xid is the newest xmin of live tuples on the page.
- * The caller can use it as the conflict horizon, when setting the VM
- * bits. It is only valid if we froze some tuples, and all_frozen is
- * true.
+ * visibility_cutoff_xid is the newest xmin of live tuples on the page. It
+ * can be used as the conflict horizon when setting the VM or when
+ * freezing all the tuples on the page. It is only valid when all the live
+ * tuples on the page are all-visible.
*
* NOTE: all_visible and all_frozen initially don't include LP_DEAD items.
* That's convenient for heap_page_prune_and_freeze() to use them to
- * decide whether to freeze the page or not. The all_visible and
- * all_frozen values returned to the caller are adjusted to include
- * LP_DEAD items after we determine whether to opportunistically freeze.
+ * decide whether to opportunistically freeze the page or not. The
+ * all_visible and all_frozen values ultimately used to set the VM are
+ * adjusted to include LP_DEAD items after we determine whether or not to
+ * opportunistically freeze.
*/
bool all_visible;
bool all_frozen;
@@ -191,6 +194,17 @@ static void page_verify_redirects(Page page);
static bool heap_page_will_freeze(Relation relation, Buffer buffer,
bool did_tuple_hint_fpi, bool do_prune, bool do_hint_prune,
PruneState *prstate);
+static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page, int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 *vmbits);
+static bool heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits);
/*
@@ -280,6 +294,7 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
PruneFreezeParams params = {
.relation = relation,
.buffer = buffer,
+ .vmbuffer = InvalidBuffer,
.reason = PRUNE_ON_ACCESS,
.options = 0,
.vistest = vistest,
@@ -341,6 +356,8 @@ prune_freeze_setup(PruneFreezeParams *params,
/* cutoffs must be provided if we will attempt freezing */
Assert(!(params->options & HEAP_PAGE_PRUNE_FREEZE) || params->cutoffs);
prstate->attempt_freeze = (params->options & HEAP_PAGE_PRUNE_FREEZE) != 0;
+ prstate->attempt_update_vm =
+ (params->options & HEAP_PAGE_PRUNE_UPDATE_VM) != 0;
prstate->cutoffs = params->cutoffs;
/*
@@ -396,51 +413,54 @@ prune_freeze_setup(PruneFreezeParams *params,
prstate->frz_conflict_horizon = InvalidTransactionId;
/*
- * Vacuum may update the VM after we're done. We can keep track of
- * whether the page will be all-visible and all-frozen after pruning and
- * freezing to help the caller to do that.
+ * Track whether the page could be marked all-visible and/or all-frozen.
+ * This information is used for opportunistic freezing and for updating
+ * the visibility map (VM) if requested by the caller.
*
- * Currently, only VACUUM sets the VM bits. To save the effort, only do
- * the bookkeeping if the caller needs it. Currently, that's tied to
- * HEAP_PAGE_PRUNE_FREEZE, but it could be a separate flag if you wanted
- * to update the VM bits without also freezing or freeze without also
- * setting the VM bits.
+ * Currently, only VACUUM performs freezing, but other callers may in the
+ * future. Visibility bookkeeping is required not just for setting the VM
+ * bits, but also for opportunistic freezing: we only consider freezing if
+ * the page would become all-frozen, or if it would be all-frozen except
+ * for dead tuples that VACUUM will remove. If attempt_update_vm is false,
+ * we will not set the VM bit even if the page is found to be all-visible.
*
- * In addition to telling the caller whether it can set the VM bit, we
- * also use 'all_visible' and 'all_frozen' for our own decision-making. If
- * the whole page would become frozen, we consider opportunistically
- * freezing tuples. We will not be able to freeze the whole page if there
- * are tuples present that are not visible to everyone or if there are
- * dead tuples which are not yet removable. However, dead tuples which
- * will be removed by the end of vacuuming should not preclude us from
- * opportunistically freezing. Because of that, we do not immediately
- * clear all_visible and all_frozen when we see LP_DEAD items. We fix
- * that after scanning the line pointers. We must correct all_visible and
- * all_frozen before we return them to the caller, so that the caller
- * doesn't set the VM bits incorrectly.
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is passed without HEAP_PAGE_PRUNE_FREEZE,
+ * prstate.all_frozen must be initialized to false, since we will not call
+ * heap_prepare_freeze_tuple() for each tuple.
+ *
+ * Dead tuples that will be removed by the end of vacuum should not
+ * prevent opportunistic freezing. Therefore, we do not clear all_visible
+ * and all_frozen when we encounter LP_DEAD items. Instead, we correct
+ * them after deciding whether to freeze, but before updating the VM, to
+ * avoid setting the VM bits incorrectly.
+ *
+ * If neither freezing nor VM updates are requested, we skip the extra
+ * bookkeeping. In this case, initializing all_visible to false allows
+ * heap_prune_record_unchanged_lp_normal() to bypass unnecessary work.
*/
if (prstate->attempt_freeze)
{
prstate->all_visible = true;
prstate->all_frozen = true;
}
+ else if (prstate->attempt_update_vm)
+ {
+ prstate->all_visible = true;
+ prstate->all_frozen = false;
+ }
else
{
- /*
- * Initializing to false allows skipping the work to update them in
- * heap_prune_record_unchanged_lp_normal().
- */
prstate->all_visible = false;
prstate->all_frozen = false;
}
/*
- * The visibility cutoff xid is the newest xmin of live tuples on the
- * page. In the common case, this will be set as the conflict horizon the
- * caller can use for updating the VM. If, at the end of freezing and
- * pruning, the page is all-frozen, there is no possibility that any
- * running transaction on the standby does not see tuples on the page as
- * all-visible, so the conflict horizon remains InvalidTransactionId.
+ * The visibility cutoff xid is the newest xmin of live, committed tuples
+ * older than OldestXmin on the page. This field is only kept up-to-date
+ * if the page is all-visible. As soon as a tuple is encountered that is
+ * not visible to all, this field is unmaintained. As long as it is
+ * maintained, it can be used to calculate the snapshot conflict horizon
+ * when updating the VM and/or freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -784,10 +804,145 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Helper to correct any corruption detected on a heap page and its
+ * corresponding visibility map page after pruning but before setting the
+ * visibility map. It examines the heap page, the associated VM page, and the
+ * number of dead items previously identified.
+ *
+ * This function must be called while holding an exclusive lock on the heap
+ * buffer, and the dead items must have been discovered under that same lock.
+ *
+ * The provided vmbits must reflect the current state of the VM block
+ * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
+ * is pinned, and the heap buffer is exclusively locked, ensuring that no
+ * other backend can update the VM bits corresponding to this heap page.
+ *
+ * If it clears corruption, it will zero out vmbits.
+ */
+static void
+identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
+ BlockNumber heap_blk, Page heap_page,
+ int nlpdead_items,
+ Buffer vmbuffer,
+ uint8 *vmbits)
+{
+ Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
+
+ Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
+
+ /*
+ * As of PostgreSQL 9.2, the visibility map bit should never be set if the
+ * page-level bit is clear. However, it's possible that the bit got
+ * cleared after heap_vac_scan_next_block() was called, so we must recheck
+ * with buffer lock before concluding that the VM is corrupt.
+ */
+ if (!PageIsAllVisible(heap_page) &&
+ ((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ *vmbits = 0;
+ }
+
+ /*
+ * It's possible for the value returned by
+ * GetOldestNonRemovableTransactionId() to move backwards, so it's not
+ * wrong for us to see tuples that appear to not be visible to everyone
+ * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
+ * never moves backwards, but GetOldestNonRemovableTransactionId() is
+ * conservative and sometimes returns a value that's unnecessarily small,
+ * so if we see that contradiction it just means that the tuples that we
+ * think are not visible to everyone yet actually are, and the
+ * PD_ALL_VISIBLE flag is correct.
+ *
+ * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
+ * however.
+ */
+ else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
+ {
+ ereport(WARNING,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
+ RelationGetRelationName(rel), heap_blk)));
+
+ PageClearAllVisible(heap_page);
+ MarkBufferDirty(heap_buffer);
+ visibilitymap_clear(rel, heap_blk, vmbuffer,
+ VISIBILITYMAP_VALID_BITS);
+ *vmbits = 0;
+ }
+}
+
+/*
+ * Decide whether to set the visibility map bits (all-visible and all-frozen)
+ * for heap_blk using information from the PruneState and VM.
+ *
+ * This function does not actually set the VM bits or page-level visibility
+ * hint, PD_ALL_VISIBLE.
+ *
+ * However, if it finds that the page-level visibility hint or VM is
+ * corrupted, it will fix them by clearing the VM bits and visibility hint.
+ * This does not need to be done in a critical section.
+ *
+ * Returns true if one or both VM bits should be set, along with returning the
+ * current value of the VM bits in *old_vmbits and the desired new value of
+ * the VM bits in *new_vmbits.
+ *
+ * If the VM should not be set, it returns false. If we won't consider
+ * updating the VM, *old_vmbits will be 0, regardless of the current value of
+ * the VM bits.
+ */
+static bool
+heap_page_will_set_vm(PruneState *prstate,
+ Relation relation,
+ BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
+ Buffer vmbuffer,
+ int nlpdead_items,
+ uint8 *old_vmbits,
+ uint8 *new_vmbits)
+{
+ *old_vmbits = 0;
+ *new_vmbits = 0;
+
+ if (!prstate->attempt_update_vm)
+ return false;
+
+ *old_vmbits = visibilitymap_get_status(relation, heap_blk,
+ &vmbuffer);
+
+ /* We do this even if not all-visible */
+ identify_and_fix_vm_corruption(relation, heap_buffer, heap_blk, heap_page,
+ nlpdead_items, vmbuffer,
+ old_vmbits);
+
+ if (!prstate->all_visible)
+ return false;
+
+ *new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
+
+ if (prstate->all_frozen)
+ *new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ if (*new_vmbits == *old_vmbits)
+ {
+ *new_vmbits = 0;
+ return false;
+ }
+
+ return true;
+}
+
/*
* Prune and repair fragmentation and potentially freeze tuples on the
- * specified page.
+ * specified page. If the page's visibility status has changed, update it in
+ * the VM.
*
* Caller must have pin and buffer cleanup lock on the page. Note that we
* don't update the FSM information for page on caller's behalf. Caller might
@@ -802,12 +957,13 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
* tuples if it's required in order to advance relfrozenxid / relminmxid, or
* if it's considered advantageous for overall system performance to do so
* now. The 'params.cutoffs', 'presult', 'new_relfrozen_xid' and
- * 'new_relmin_mxid' arguments are required when freezing. When
- * HEAP_PAGE_PRUNE_FREEZE option is passed, we also set presult->all_visible
- * and presult->all_frozen after determining whether or not to
- * opportunistically freeze, to indicate if the VM bits can be set. They are
- * always set to false when the HEAP_PAGE_PRUNE_FREEZE option is not passed,
- * because at the moment only callers that also freeze need that information.
+ * 'new_relmin_mxid' arguments are required when freezing.
+ *
+ * If HEAP_PAGE_PRUNE_UPDATE_VM is set in params and the visibility status of
+ * the page has changed, we will update the VM at the same time as pruning and
+ * freezing the heap page. We will also update presult->old_vmbits and
+ * presult->new_vmbits with the state of the VM before and after updating it
+ * for the caller to use in bookkeeping.
*
* presult contains output parameters needed by callers, such as the number of
* tuples removed and the offsets of dead items on the page after pruning.
@@ -832,13 +988,18 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
MultiXactId *new_relmin_mxid)
{
Buffer buffer = params->buffer;
+ Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
+ BlockNumber blockno = BufferGetBlockNumber(buffer);
PruneState prstate;
bool do_freeze;
bool do_prune;
bool do_hint_prune;
+ bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ uint8 new_vmbits;
+ uint8 old_vmbits;
/* Initialize prstate */
prune_freeze_setup(params,
@@ -1021,6 +1182,71 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
}
}
+
+ /* Now update the visibility map and PD_ALL_VISIBLE hint */
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ /* Set the visibility map and page visibility hint, if relevant */
+ if (do_set_vm)
+ {
+ Assert(prstate.all_visible);
+
+ /*
+ * It should never be the case that the visibility map page is set
+ * while the page-level bit is clear (and if so, we cleared it above),
+ * but the reverse is allowed (if checksums are not enabled).
+ * Regardless, set both bits so that we get back in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the WAL
+ * chain when setting the VM. We don't worry about unnecessarily
+ * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
+ * It is extremely rare to have a clean heap buffer with
+ * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
+ * point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+
+ /*
+ * If the page is being set all-frozen, we pass InvalidTransactionId
+ * as the cutoff_xid, since a snapshot conflict horizon sufficient to
+ * make everything safe for REDO was logged when the page's tuples
+ * were frozen.
+ */
+ Assert(!prstate.all_frozen ||
+ !TransactionIdIsValid(presult->vm_conflict_horizon));
+
+ visibilitymap_set(params->relation, blockno, buffer,
+ InvalidXLogRecPtr,
+ vmbuffer, presult->vm_conflict_horizon,
+ new_vmbits);
+
+ if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ {
+ presult->new_all_visible_pages = 1;
+ if (presult->all_frozen)
+ presult->new_all_visible_frozen_pages = 1;
+ }
+ else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
+ presult->all_frozen)
+ presult->new_all_frozen_pages = 1;
+ }
}
@@ -1495,6 +1721,8 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
{
TransactionId xmin;
+ Assert(prstate->attempt_update_vm);
+
if (!HeapTupleHeaderXminCommitted(htup))
{
prstate->all_visible = false;
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4be267ff657..323b8e3dde3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -422,11 +422,6 @@ static void find_next_unskippable_block(LVRelState *vacrel, bool *skipsallvis);
static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
bool sharelock, Buffer vmbuffer);
-static void identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 *vmbits);
static int lazy_scan_prune(LVRelState *vacrel, Buffer buf,
BlockNumber blkno, Page page,
Buffer vmbuffer,
@@ -1960,81 +1955,6 @@ cmpOffsetNumbers(const void *a, const void *b)
return pg_cmp_u16(*(const OffsetNumber *) a, *(const OffsetNumber *) b);
}
-/*
- * Helper to correct any corruption detected on a heap page and its
- * corresponding visibility map page after pruning but before setting the
- * visibility map. It examines the heap page, the associated VM page, and the
- * number of dead items previously identified.
- *
- * This function must be called while holding an exclusive lock on the heap
- * buffer, and the dead items must have been discovered under that same lock.
-
- * The provided vmbits must reflect the current state of the VM block
- * referenced by vmbuffer. Although we do not hold a lock on the VM buffer, it
- * is pinned, and the heap buffer is exclusively locked, ensuring that no
- * other backend can update the VM bits corresponding to this heap page.
- *
- * If it clears corruption, it will zero out vmbits.
- */
-static void
-identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
- BlockNumber heap_blk, Page heap_page,
- int nlpdead_items,
- Buffer vmbuffer,
- uint8 *vmbits)
-{
- Assert(visibilitymap_get_status(rel, heap_blk, &vmbuffer) == *vmbits);
-
- Assert(BufferIsLockedByMeInMode(heap_buffer, BUFFER_LOCK_EXCLUSIVE));
-
- /*
- * As of PostgreSQL 9.2, the visibility map bit should never be set if the
- * page-level bit is clear. However, it's possible that the bit got
- * cleared after heap_vac_scan_next_block() was called, so we must recheck
- * with buffer lock before concluding that the VM is corrupt.
- */
- if (!PageIsAllVisible(heap_page) &&
- ((*vmbits & VISIBILITYMAP_VALID_BITS) != 0))
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- *vmbits = 0;
- }
-
- /*
- * It's possible for the value returned by
- * GetOldestNonRemovableTransactionId() to move backwards, so it's not
- * wrong for us to see tuples that appear to not be visible to everyone
- * yet, while PD_ALL_VISIBLE is already set. The real safe xmin value
- * never moves backwards, but GetOldestNonRemovableTransactionId() is
- * conservative and sometimes returns a value that's unnecessarily small,
- * so if we see that contradiction it just means that the tuples that we
- * think are not visible to everyone yet actually are, and the
- * PD_ALL_VISIBLE flag is correct.
- *
- * There should never be LP_DEAD items on a page with PD_ALL_VISIBLE set,
- * however.
- */
- else if (PageIsAllVisible(heap_page) && nlpdead_items > 0)
- {
- ereport(WARNING,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("page containing LP_DEAD items is marked as all-visible in relation \"%s\" page %u",
- RelationGetRelationName(rel), heap_blk)));
-
- PageClearAllVisible(heap_page);
- MarkBufferDirty(heap_buffer);
- visibilitymap_clear(rel, heap_blk, vmbuffer,
- VISIBILITYMAP_VALID_BITS);
- *vmbits = 0;
- }
-}
-
/*
* lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
*
@@ -2066,13 +1986,12 @@ lazy_scan_prune(LVRelState *vacrel,
PruneFreezeParams params = {
.relation = rel,
.buffer = buf,
+ .vmbuffer = vmbuffer,
.reason = PRUNE_VACUUM_SCAN,
- .options = HEAP_PAGE_PRUNE_FREEZE,
+ .options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
.vistest = vacrel->vistest,
.cutoffs = &vacrel->cutoffs,
};
- uint8 old_vmbits = 0;
- uint8 new_vmbits = 0;
Assert(BufferGetBlockNumber(buf) == blkno);
@@ -2159,6 +2078,14 @@ lazy_scan_prune(LVRelState *vacrel,
}
/* Finally, add page-local counts to whole-VACUUM counts */
+ vacrel->vm_new_visible_pages += presult.new_all_visible_pages;
+ vacrel->vm_new_visible_frozen_pages += presult.new_all_visible_frozen_pages;
+ vacrel->vm_new_frozen_pages += presult.new_all_frozen_pages;
+
+ /* Capture if the page was newly set frozen */
+ *vm_page_frozen = presult.new_all_visible_frozen_pages > 0 ||
+ presult.new_all_frozen_pages > 0;
+
vacrel->tuples_deleted += presult.ndeleted;
vacrel->tuples_frozen += presult.nfrozen;
vacrel->lpdead_items += presult.lpdead_items;
@@ -2172,76 +2099,6 @@ lazy_scan_prune(LVRelState *vacrel,
/* Did we find LP_DEAD items? */
*has_lpdead_items = (presult.lpdead_items > 0);
- Assert(!presult.all_visible || !(*has_lpdead_items));
- Assert(!presult.all_frozen || presult.all_visible);
-
- old_vmbits = visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer);
-
- identify_and_fix_vm_corruption(vacrel->rel, buf, blkno, page,
- presult.lpdead_items, vmbuffer,
- &old_vmbits);
-
- if (!presult.all_visible)
- return presult.ndeleted;
-
- /* Set the visibility map and page visibility hint */
- new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
-
- if (presult.all_frozen)
- new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
-
- /* Nothing to do */
- if (old_vmbits == new_vmbits)
- return presult.ndeleted;
-
- /*
- * It should never be the case that the visibility map page is set while
- * the page-level bit is clear (and if so, we cleared it above), but the
- * reverse is allowed (if checksums are not enabled). Regardless, set both
- * bits so that we get back in sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL chain
- * when setting the VM. We don't worry about unnecessarily dirtying the
- * heap buffer if PD_ALL_VISIBLE is already set, though. It is extremely
- * rare to have a clean heap buffer with PD_ALL_VISIBLE already set and
- * the VM bits clear, so there is no point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buf);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId as
- * the cutoff_xid, since a snapshot conflict horizon sufficient to make
- * everything safe for REDO was logged when the page's tuples were frozen.
- */
- Assert(!presult.all_frozen ||
- !TransactionIdIsValid(presult.vm_conflict_horizon));
-
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, presult.vm_conflict_horizon,
- new_vmbits);
-
- /*
- * If the page wasn't already set all-visible and/or all-frozen in the VM,
- * count it as newly set for logging.
- */
- if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
- {
- vacrel->vm_new_visible_pages++;
- if (presult.all_frozen)
- {
- vacrel->vm_new_visible_frozen_pages++;
- *vm_page_frozen = true;
- }
- }
- else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult.all_frozen)
- {
- vacrel->vm_new_frozen_pages++;
- *vm_page_frozen = true;
- }
-
return presult.ndeleted;
}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3c0961ab36b..df9fa17a6f9 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -42,6 +42,7 @@
/* "options" flag bits for heap_page_prune_and_freeze */
#define HEAP_PAGE_PRUNE_MARK_UNUSED_NOW (1 << 0)
#define HEAP_PAGE_PRUNE_FREEZE (1 << 1)
+#define HEAP_PAGE_PRUNE_UPDATE_VM (1 << 2)
typedef struct BulkInsertStateData *BulkInsertState;
typedef struct GlobalVisState GlobalVisState;
@@ -238,6 +239,12 @@ typedef struct PruneFreezeParams
Relation relation; /* relation containing buffer to be pruned */
Buffer buffer; /* buffer to be pruned */
+ /*
+ * If we will consider updating the visibility map, vmbuffer should
+ * contain the correct block of the visibility map and be pinned.
+ */
+ Buffer vmbuffer;
+
/*
* The reason pruning was performed. It is used to set the WAL record
* opcode which is used for debugging and analysis purposes.
@@ -252,6 +259,9 @@ typedef struct PruneFreezeParams
*
* HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
* will return 'all_visible', 'all_frozen' flags to the caller.
+ *
+ * HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
+ * in the VM.
*/
int options;
@@ -299,6 +309,16 @@ typedef struct PruneFreezeResult
bool all_frozen;
TransactionId vm_conflict_horizon;
+ /*
+ * Whether or not the page was newly set all-visible and all-frozen during
+ * phase I of vacuuming.
+ *
+ * These are only set if the HEAP_PAGE_PRUNE_UPDATE_VM option is set.
+ */
+ BlockNumber new_all_visible_pages;
+ BlockNumber new_all_visible_frozen_pages;
+ BlockNumber new_all_frozen_pages;
+
/*
* Whether or not the page makes rel truncation unsafe. This is set to
* 'true', even if the page contains LP_DEAD items. VACUUM will remove
--
2.43.0
[text/x-patch] v34-0003-Move-VM-assert-into-prune-freeze-code-and-simpli.patch (9.4K, 4-v34-0003-Move-VM-assert-into-prune-freeze-code-and-simpli.patch)
From e591bf061ee673f3750d1180673e1ab48be43bb8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 27 Jan 2026 16:53:11 -0500
Subject: [PATCH v34 03/14] Move VM assert into prune/freeze code and simplify
returned values
After pruning and freezing, we do an assert-only validation that the
page's visibility status matches what we found during the pruning and
freezing pass over the page.
There's no reason to wait until lazy_scan_prune() to do this validation,
as all of the VM setting logic has already been moved to
heap_page_prune_and_freeze().
Doing so also allows us to remove some fields of PruneFreezeResult,
narrowing the scope of values the caller has to think about.
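For illustration, here is a minimal standalone sketch of how the three per-page counters that remain in PruneFreezeResult are derived from the page's previous VM bits. It mirrors the old_vmbits checks in heap_page_prune_and_freeze() and the *vm_page_frozen computation in lazy_scan_prune(). The NewVMCounts struct and both helper functions are hypothetical; only the VISIBILITYMAP_* flag values come from PostgreSQL.

```c
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Hypothetical stand-in for the three counters kept in PruneFreezeResult. */
typedef struct NewVMCounts
{
    int new_all_visible_pages;
    int new_all_visible_frozen_pages;
    int new_all_frozen_pages;
} NewVMCounts;

/*
 * Classify a page whose VM bits are being set, given the bits it had
 * before (old_vmbits) and whether it is now all-frozen.
 */
static NewVMCounts
classify_vm_update(uint8_t old_vmbits, bool all_frozen)
{
    NewVMCounts counts = {0, 0, 0};

    if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
    {
        /* Page was not all-visible before: newly all-visible. */
        counts.new_all_visible_pages = 1;
        if (all_frozen)
            counts.new_all_visible_frozen_pages = 1;
    }
    else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 && all_frozen)
    {
        /* Already all-visible, only the all-frozen bit is new. */
        counts.new_all_frozen_pages = 1;
    }

    return counts;
}

/* What lazy_scan_prune() derives for *vm_page_frozen from the counters. */
static bool
page_newly_frozen(NewVMCounts counts)
{
    return counts.new_all_visible_frozen_pages > 0 ||
        counts.new_all_frozen_pages > 0;
}
```

Keeping the classification next to the VM update means the caller only has to add the counters to its running totals.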
Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/7ib3sa55sapwjlaz4sijbiq7iezna27kjvvvar4dpgkmadml6t%40gfpkkwmdnepx
---
src/backend/access/heap/pruneheap.c | 65 +++++++++++++++++++---------
src/backend/access/heap/vacuumlazy.c | 35 +--------------
src/include/access/heapam.h | 26 ++++-------
3 files changed, 54 insertions(+), 72 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 014c3c92d6c..192df9a2218 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -991,6 +991,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
+ TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -1149,23 +1150,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
presult->nfrozen = prstate.nfrozen;
presult->live_tuples = prstate.live_tuples;
presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->all_visible = prstate.all_visible;
- presult->all_frozen = prstate.all_frozen;
presult->hastup = prstate.hastup;
- /*
- * For callers planning to update the visibility map, the conflict horizon
- * for that record must be the newest xmin on the page. However, if the
- * page is completely frozen, there can be no conflict and the
- * vm_conflict_horizon should remain InvalidTransactionId. This includes
- * the case that we just froze all the tuples; the prune-freeze record
- * included the conflict XID already so the caller doesn't need it.
- */
- if (presult->all_frozen)
- presult->vm_conflict_horizon = InvalidTransactionId;
- else
- presult->vm_conflict_horizon = prstate.visibility_cutoff_xid;
-
presult->lpdead_items = prstate.lpdead_items;
/* the presult->deadoffsets array was already filled in */
@@ -1183,6 +1169,46 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
}
}
+ /*
+ * If updating the visibility map, the conflict horizon for that record
+ * must be the newest xmin on the page. However, if the page is
+ * completely frozen, there can be no conflict and the vm_conflict_horizon
+ * should remain InvalidTransactionId. This includes the case that we
+ * just froze all the tuples; the prune-freeze record included the
+ * conflict XID already so we don't need to again.
+ */
+ if (prstate.all_frozen)
+ vm_conflict_horizon = InvalidTransactionId;
+ else
+ vm_conflict_horizon = prstate.visibility_cutoff_xid;
+
+ /*
+ * During its second pass over the heap, VACUUM calls
+ * heap_page_would_be_all_visible() to determine whether a page is
+ * all-visible and all-frozen. The logic here is similar. After completing
+ * pruning and freezing, use an assertion to verify that our results
+ * remain consistent with heap_page_would_be_all_visible().
+ */
+#ifdef USE_ASSERT_CHECKING
+ if (prstate.all_visible)
+ {
+ TransactionId debug_cutoff;
+ bool debug_all_frozen;
+
+ Assert(presult->lpdead_items == 0);
+
+ Assert(heap_page_is_all_visible(params->relation, buffer,
+ prstate.cutoffs->OldestXmin,
+ &debug_all_frozen,
+ &debug_cutoff, off_loc));
+
+ Assert(prstate.all_frozen == debug_all_frozen);
+
+ Assert(!TransactionIdIsValid(debug_cutoff) ||
+ debug_cutoff == vm_conflict_horizon);
+ }
+#endif
+
/* Now update the visibility map and PD_ALL_VISIBLE hint */
Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
@@ -1229,22 +1255,21 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* make everything safe for REDO was logged when the page's tuples
* were frozen.
*/
- Assert(!prstate.all_frozen ||
- !TransactionIdIsValid(presult->vm_conflict_horizon));
+ Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
visibilitymap_set(params->relation, blockno, buffer,
InvalidXLogRecPtr,
- vmbuffer, presult->vm_conflict_horizon,
+ vmbuffer, vm_conflict_horizon,
new_vmbits);
if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
presult->new_all_visible_pages = 1;
- if (presult->all_frozen)
+ if (prstate.all_frozen)
presult->new_all_visible_frozen_pages = 1;
}
else if ((old_vmbits & VISIBILITYMAP_ALL_FROZEN) == 0 &&
- presult->all_frozen)
+ prstate.all_frozen)
presult->new_all_frozen_pages = 1;
}
}
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 323b8e3dde3..90e94b2ac3f 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -456,13 +456,6 @@ static void dead_items_add(LVRelState *vacrel, BlockNumber blkno, OffsetNumber *
static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
-#ifdef USE_ASSERT_CHECKING
-static bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
- bool *all_frozen,
- TransactionId *visibility_cutoff_xid,
- OffsetNumber *logging_offnum);
-#endif
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
OffsetNumber *deadoffsets,
@@ -2032,32 +2025,6 @@ lazy_scan_prune(LVRelState *vacrel,
vacrel->new_frozen_tuple_pages++;
}
- /*
- * VACUUM will call heap_page_is_all_visible() during the second pass over
- * the heap to determine all_visible and all_frozen for the page -- this
- * is a specialized version of the logic from this function. Now that
- * we've finished pruning and freezing, make sure that we're in total
- * agreement with heap_page_is_all_visible() using an assertion.
- */
-#ifdef USE_ASSERT_CHECKING
- if (presult.all_visible)
- {
- TransactionId debug_cutoff;
- bool debug_all_frozen;
-
- Assert(presult.lpdead_items == 0);
-
- Assert(heap_page_is_all_visible(vacrel->rel, buf,
- vacrel->cutoffs.OldestXmin, &debug_all_frozen,
- &debug_cutoff, &vacrel->offnum));
-
- Assert(presult.all_frozen == debug_all_frozen);
-
- Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == presult.vm_conflict_horizon);
- }
-#endif
-
/*
* Now save details of the LP_DEAD items from the page in vacrel
*/
@@ -3511,7 +3478,7 @@ dead_items_cleanup(LVRelState *vacrel)
* that expect no LP_DEAD on the page. Currently assert-only, but there is no
* reason not to use it outside of asserts.
*/
-static bool
+bool
heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId OldestXmin,
bool *all_frozen,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index df9fa17a6f9..a0f7974942e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -257,8 +257,7 @@ typedef struct PruneFreezeParams
* HEAP_PAGE_PRUNE_MARK_UNUSED_NOW indicates that dead items can be set
* LP_UNUSED during pruning.
*
- * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples, and
- * will return 'all_visible', 'all_frozen' flags to the caller.
+ * HEAP_PAGE_PRUNE_FREEZE indicates that we will also freeze tuples.
*
* HEAP_PAGE_PRUNE_UPDATE_VM indicates that we will set the page's status
* in the VM.
@@ -294,21 +293,6 @@ typedef struct PruneFreezeResult
int live_tuples;
int recently_dead_tuples;
- /*
- * all_visible and all_frozen indicate if the all-visible and all-frozen
- * bits in the visibility map can be set for this page, after pruning.
- *
- * vm_conflict_horizon is the newest xmin of live tuples on the page. The
- * caller can use it as the conflict horizon when setting the VM bits. It
- * is only valid if we froze some tuples (nfrozen > 0), and all_frozen is
- * true.
- *
- * These are only set if the HEAP_PAGE_PRUNE_FREEZE option is set.
- */
- bool all_visible;
- bool all_frozen;
- TransactionId vm_conflict_horizon;
-
/*
* Whether or not the page was newly set all-visible and all-frozen during
* phase I of vacuuming.
@@ -453,7 +437,13 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
-
+#ifdef USE_ASSERT_CHECKING
+extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
+ TransactionId OldestXmin,
+ bool *all_frozen,
+ TransactionId *visibility_cutoff_xid,
+ OffsetNumber *logging_offnum);
+#endif
/* in heap/heapam_visibility.c */
extern bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot,
Buffer buffer);
--
2.43.0
[text/x-patch] v34-0004-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch (14.1K, 5-v34-0004-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch)
From a94267babeedec6705fd7f3b43242c6ba0e458c0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 2 Dec 2025 16:16:22 -0500
Subject: [PATCH v34 04/14] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
prune/freeze
Vacuum no longer emits a separate WAL record for each page set
all-visible or all-frozen during phase I. Instead, visibility map
updates are now included in the XLOG_HEAP2_PRUNE_VACUUM_SCAN record that
is already emitted for pruning and freezing.
Previously, heap_page_prune_and_freeze() determined whether a page was
all-visible, but the corresponding VM bits were only set later in
lazy_scan_prune(). Now the VM is updated immediately in
heap_page_prune_and_freeze(), at the same time as the heap
modifications.
This change applies only to vacuum phase I, not to pruning performed
during normal page access.
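Folding the VM update into the prune/freeze record means a single snapshot conflict horizon has to cover every change in it. The following minimal sketch restates the selection done by the new get_conflict_xid() so it can be compiled and exercised on its own. TransactionId, TransactionIdFollows(), and the flag values are simplified stand-ins for the PostgreSQL definitions (assuming 32-bit XIDs compared modulo 2^32 when both are normal); this is an illustration, not the patch's code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Simplified modulo-2^32 XID comparison: is id1 logically after id2? */
static bool
TransactionIdFollows(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

static TransactionId
get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
                 uint8_t old_vmbits, uint8_t new_vmbits,
                 TransactionId latest_xid_removed,
                 TransactionId frz_conflict_horizon,
                 TransactionId visibility_cutoff_xid)
{
    TransactionId conflict_xid;

    /* Only setting an already all-visible page all-frozen: no conflict. */
    if (!do_prune && !do_freeze &&
        (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
        (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
        return InvalidTransactionId;

    /* Start from the horizon required by the VM or freeze changes. */
    if (do_set_vm)
        conflict_xid = visibility_cutoff_xid;
    else if (do_freeze)
        conflict_xid = frz_conflict_horizon;
    else
        conflict_xid = InvalidTransactionId;

    /* Removed tuples with a newer xmax push the horizon forward. */
    if (TransactionIdFollows(latest_xid_removed, conflict_xid))
        conflict_xid = latest_xid_removed;

    return conflict_xid;
}
```

Picking the newest of the candidate horizons is what makes one horizon safe for the whole record.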
Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/pruneheap.c | 266 ++++++++++++++++------------
1 file changed, 152 insertions(+), 114 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 192df9a2218..b8ba5b7a681 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -205,6 +205,11 @@ static bool heap_page_will_set_vm(PruneState *prstate,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
+static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed,
+ TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid);
/*
@@ -804,6 +809,62 @@ heap_page_will_freeze(Relation relation, Buffer buffer,
return do_freeze;
}
+/*
+ * Calculate the conflict horizon for the whole XLOG_HEAP2_PRUNE_VACUUM_SCAN
+ * or XLOG_HEAP2_PRUNE_ON_ACCESS record.
+ */
+static TransactionId
+get_conflict_xid(bool do_prune, bool do_freeze, bool do_set_vm,
+ uint8 old_vmbits, uint8 new_vmbits,
+ TransactionId latest_xid_removed, TransactionId frz_conflict_horizon,
+ TransactionId visibility_cutoff_xid)
+{
+ TransactionId conflict_xid;
+
+ /*
+ * We can omit the snapshot conflict horizon if we are not pruning or
+ * freezing any tuples and are setting an already all-visible page
+ * all-frozen in the VM. In this case, all of the tuples on the page must
+ * already be seen as frozen by all MVCC snapshots on the standby.
+ */
+ if (!do_prune &&
+ !do_freeze &&
+ (old_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 &&
+ (new_vmbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ return InvalidTransactionId;
+
+ /*
+ * The snapshot conflict horizon for the whole record should be the most
+ * conservative of all the horizons calculated for any of the possible
+ * modifications. If this record will prune tuples, any transactions on
+ * the standby older than the youngest xmax of the most recently removed
+ * tuple this record will prune will conflict. If this record will freeze
+ * tuples, any transactions on the standby with xids older than the
+ * youngest tuple this record will freeze will conflict.
+ *
+ * If we are setting the VM, the conflict horizon is almost always the
+ * visibility cutoff XID, except in the situation described above.
+ *
+ * By picking the newest of all of those, we can ensure that all changes
+ * in the record have been taken into account.
+ */
+ if (do_set_vm)
+ conflict_xid = visibility_cutoff_xid;
+ else if (do_freeze)
+ conflict_xid = frz_conflict_horizon;
+ else
+ conflict_xid = InvalidTransactionId;
+
+ /*
+ * If we are removing tuples with a younger xmax than our so far
+ * calculated conflict_xid, we must use this as our horizon.
+ */
+ if (TransactionIdFollows(latest_xid_removed, conflict_xid))
+ conflict_xid = latest_xid_removed;
+
+ return conflict_xid;
+}
+
/*
* Helper to correct any corruption detected on a heap page and its
* corresponding visibility map page after pruning but before setting the
@@ -991,7 +1052,6 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Buffer vmbuffer = params->vmbuffer;
Page page = BufferGetPage(buffer);
BlockNumber blockno = BufferGetBlockNumber(buffer);
- TransactionId vm_conflict_horizon = InvalidTransactionId;
PruneState prstate;
bool do_freeze;
bool do_prune;
@@ -999,6 +1059,7 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool do_set_vm;
bool did_tuple_hint_fpi;
int64 fpi_before = pgWalUsage.wal_fpi;
+ TransactionId conflict_xid;
uint8 new_vmbits;
uint8 old_vmbits;
@@ -1063,6 +1124,37 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_visible = prstate.all_frozen = false;
Assert(!prstate.all_frozen || prstate.all_visible);
+ Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
+
+ /*
+ * Decide whether to set the VM bits based on information from the VM and
+ * the all_visible/all_frozen flags.
+ */
+ do_set_vm = heap_page_will_set_vm(&prstate,
+ params->relation,
+ blockno,
+ buffer,
+ page,
+ vmbuffer,
+ prstate.lpdead_items,
+ &old_vmbits,
+ &new_vmbits);
+
+ /*
+ * new_vmbits should be 0 regardless of whether or not the page is
+ * all-visible if we do not intend to set the VM.
+ */
+ Assert(do_set_vm || new_vmbits == 0);
+
+ conflict_xid = get_conflict_xid(do_prune, do_freeze, do_set_vm,
+ old_vmbits, new_vmbits,
+ prstate.latest_xid_removed,
+ prstate.frz_conflict_horizon,
+ prstate.visibility_cutoff_xid);
+
+ /* Lock vmbuffer before entering a critical section */
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
/* Any error while applying the changes is critical */
START_CRIT_SECTION();
@@ -1084,14 +1176,17 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
/*
* If that's all we had to do to the page, this is a non-WAL-logged
- * hint. If we are going to freeze or prune the page, we will mark
- * the buffer dirty below.
+ * hint. If we are going to freeze or prune the page or set
+ * PD_ALL_VISIBLE, we will mark the buffer dirty below.
+ *
+ * Setting PD_ALL_VISIBLE is fully WAL-logged because it is forbidden
+ * for the VM to be set and PD_ALL_VISIBLE to be clear.
*/
- if (!do_freeze && !do_prune)
+ if (!do_freeze && !do_prune && !do_set_vm)
MarkBufferDirtyHint(buffer, true);
}
- if (do_prune || do_freeze)
+ if (do_prune || do_freeze || do_set_vm)
{
/* Apply the planned item changes and repair page fragmentation. */
if (do_prune)
@@ -1105,6 +1200,26 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
if (do_freeze)
heap_freeze_prepared_tuples(buffer, prstate.frozen, prstate.nfrozen);
+ /* Set the visibility map and page visibility hint */
+ if (do_set_vm)
+ {
+ /*
+ * While it is valid for PD_ALL_VISIBLE to be set when the
+ * corresponding VM bit is clear, we strongly prefer to keep them
+ * in sync.
+ *
+ * The heap buffer must be marked dirty before adding it to the
+ * WAL chain when setting the VM. We don't worry about
+ * unnecessarily dirtying the heap buffer if PD_ALL_VISIBLE is
+ * already set, though. It is extremely rare to have a clean heap
+ * buffer with PD_ALL_VISIBLE already set and the VM bits clear,
+ * so there is no point in optimizing it.
+ */
+ PageSetAllVisible(page);
+ visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
+ }
+
MarkBufferDirty(buffer);
/*
@@ -1112,29 +1227,12 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
*/
if (RelationNeedsWAL(params->relation))
{
- /*
- * The snapshotConflictHorizon for the whole record should be the
- * most conservative of all the horizons calculated for any of the
- * possible modifications. If this record will prune tuples, any
- * transactions on the standby older than the youngest xmax of the
- * most recently removed tuple this record will prune will
- * conflict. If this record will freeze tuples, any transactions
- * on the standby with xids older than the youngest tuple this
- * record will freeze will conflict.
- */
- TransactionId conflict_xid;
-
- if (TransactionIdFollows(prstate.frz_conflict_horizon,
- prstate.latest_xid_removed))
- conflict_xid = prstate.frz_conflict_horizon;
- else
- conflict_xid = prstate.latest_xid_removed;
-
log_heap_prune_and_freeze(params->relation, buffer,
- InvalidBuffer, /* vmbuffer */
- 0, /* vmflags */
+ do_set_vm ? vmbuffer : InvalidBuffer,
+ do_set_vm ? new_vmbits : 0,
conflict_xid,
- true, params->reason,
+ true, /* cleanup lock */
+ params->reason,
prstate.frozen, prstate.nfrozen,
prstate.redirected, prstate.nredirected,
prstate.nowdead, prstate.ndead,
@@ -1144,43 +1242,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
END_CRIT_SECTION();
- /* Copy information back for caller */
- presult->ndeleted = prstate.ndeleted;
- presult->nnewlpdead = prstate.ndead;
- presult->nfrozen = prstate.nfrozen;
- presult->live_tuples = prstate.live_tuples;
- presult->recently_dead_tuples = prstate.recently_dead_tuples;
- presult->hastup = prstate.hastup;
-
- presult->lpdead_items = prstate.lpdead_items;
- /* the presult->deadoffsets array was already filled in */
-
- if (prstate.attempt_freeze)
- {
- if (presult->nfrozen > 0)
- {
- *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
- }
- else
- {
- *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
- *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
- }
- }
-
- /*
- * If updating the visibility map, the conflict horizon for that record
- * must be the newest xmin on the page. However, if the page is
- * completely frozen, there can be no conflict and the vm_conflict_horizon
- * should remain InvalidTransactionId. This includes the case that we
- * just froze all the tuples; the prune-freeze record included the
- * conflict XID already so we don't need to again.
- */
- if (prstate.all_frozen)
- vm_conflict_horizon = InvalidTransactionId;
- else
- vm_conflict_horizon = prstate.visibility_cutoff_xid;
+ if (do_set_vm)
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
/*
* During its second pass over the heap, VACUUM calls
@@ -1195,7 +1258,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
TransactionId debug_cutoff;
bool debug_all_frozen;
- Assert(presult->lpdead_items == 0);
+ Assert(prstate.lpdead_items == 0);
+ Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
prstate.cutoffs->OldestXmin,
@@ -1205,63 +1269,23 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
Assert(prstate.all_frozen == debug_all_frozen);
Assert(!TransactionIdIsValid(debug_cutoff) ||
- debug_cutoff == vm_conflict_horizon);
+ debug_cutoff == prstate.visibility_cutoff_xid);
}
#endif
- /* Now update the visibility map and PD_ALL_VISIBLE hint */
- Assert(!prstate.all_visible || (prstate.lpdead_items == 0));
-
- do_set_vm = heap_page_will_set_vm(&prstate,
- params->relation,
- blockno,
- buffer,
- page,
- vmbuffer,
- prstate.lpdead_items,
- &old_vmbits,
- &new_vmbits);
+ /* Copy information back for caller */
+ presult->ndeleted = prstate.ndeleted;
+ presult->nnewlpdead = prstate.ndead;
+ presult->nfrozen = prstate.nfrozen;
+ presult->live_tuples = prstate.live_tuples;
+ presult->recently_dead_tuples = prstate.recently_dead_tuples;
+ presult->hastup = prstate.hastup;
- /*
- * new_vmbits should be 0 regardless of whether or not the page is
- * all-visible if we do not intend to set the VM.
- */
- Assert(do_set_vm || new_vmbits == 0);
+ presult->lpdead_items = prstate.lpdead_items;
+ /* the presult->deadoffsets array was already filled in */
- /* Set the visibility map and page visibility hint, if relevant */
if (do_set_vm)
{
- Assert(prstate.all_visible);
-
- /*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear (and if so, we cleared it above),
- * but the reverse is allowed (if checksums are not enabled).
- * Regardless, set both bits so that we get back in sync.
- *
- * The heap buffer must be marked dirty before adding it to the WAL
- * chain when setting the VM. We don't worry about unnecessarily
- * dirtying the heap buffer if PD_ALL_VISIBLE is already set, though.
- * It is extremely rare to have a clean heap buffer with
- * PD_ALL_VISIBLE already set and the VM bits clear, so there is no
- * point in optimizing it.
- */
- PageSetAllVisible(page);
- MarkBufferDirty(buffer);
-
- /*
- * If the page is being set all-frozen, we pass InvalidTransactionId
- * as the cutoff_xid, since a snapshot conflict horizon sufficient to
- * make everything safe for REDO was logged when the page's tuples
- * were frozen.
- */
- Assert(!prstate.all_frozen || !TransactionIdIsValid(vm_conflict_horizon));
-
- visibilitymap_set(params->relation, blockno, buffer,
- InvalidXLogRecPtr,
- vmbuffer, vm_conflict_horizon,
- new_vmbits);
-
if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
{
presult->new_all_visible_pages = 1;
@@ -1272,6 +1296,20 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prstate.all_frozen)
presult->new_all_frozen_pages = 1;
}
+
+ if (prstate.attempt_freeze)
+ {
+ if (presult->nfrozen > 0)
+ {
+ *new_relfrozen_xid = prstate.pagefrz.FreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.FreezePageRelminMxid;
+ }
+ else
+ {
+ *new_relfrozen_xid = prstate.pagefrz.NoFreezePageRelfrozenXid;
+ *new_relmin_mxid = prstate.pagefrz.NoFreezePageRelminMxid;
+ }
+ }
}
--
2.43.0
[text/x-patch] v34-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch (2.6K, 6-v34-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch)
From dda6ceffc924333fd7587c92c336a01932967532 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:21 -0400
Subject: [PATCH v34 05/14] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
marks empty pages all-visible and all-frozen in an
XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
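The empty-page path calls PageSetAllVisible() before setting both VM bits, and both changes are covered by the same WAL record. Below is a minimal sketch of the state rule this preserves, as described in the heap_xlog_visible() comment removed later in this series and in the all_frozen/all_visible assertions in pruneheap.c. vm_state_is_permitted() is a hypothetical helper, not a PostgreSQL function.

```c
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/*
 * Hypothetical check: is this combination of the heap page's
 * PD_ALL_VISIBLE flag and its VM bits permitted?
 */
static bool
vm_state_is_permitted(bool pd_all_visible, uint8_t vmbits)
{
    /* All-frozen without all-visible is never valid. */
    if ((vmbits & VISIBILITYMAP_ALL_FROZEN) != 0 &&
        (vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0)
        return false;

    /*
     * A set VM bit with PD_ALL_VISIBLE clear is forbidden: a later heap
     * modification would then fail to clear the VM bit.
     */
    if ((vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0 && !pd_all_visible)
        return false;

    /* PD_ALL_VISIBLE set with the VM bits clear is allowed, if undesirable. */
    return true;
}
```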
Author: Melanie Plageman <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/heap/vacuumlazy.c | 35 +++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 90e94b2ac3f..86b3155717e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1900,9 +1900,12 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
*/
if (!PageIsAllVisible(page))
{
+ /* Lock vmbuffer before entering critical section */
+ LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE);
+
START_CRIT_SECTION();
- /* mark buffer dirty before writing a WAL record */
+ /* Mark buffer dirty before writing any WAL records */
MarkBufferDirty(buf);
/*
@@ -1919,13 +1922,33 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set(vacrel->rel, blkno, buf,
- InvalidXLogRecPtr,
- vmbuffer, InvalidTransactionId,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN);
+ visibilitymap_set_vmbits(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
+
+ /*
+ * Emit WAL for setting PD_ALL_VISIBLE on the heap page and
+ * setting the VM.
+ */
+ if (RelationNeedsWAL(vacrel->rel))
+ log_heap_prune_and_freeze(vacrel->rel, buf,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ InvalidTransactionId, /* conflict xid */
+ false, /* cleanup lock */
+ PRUNE_VACUUM_SCAN, /* reason */
+ NULL, 0,
+ NULL, 0,
+ NULL, 0,
+ NULL, 0);
+
END_CRIT_SECTION();
+ LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
+
/* Count the newly all-frozen pages for logging */
vacrel->vm_new_visible_pages++;
vacrel->vm_new_visible_frozen_pages++;
--
2.43.0
[text/x-patch] v34-0006-Remove-XLOG_HEAP2_VISIBLE-entirely.patch (24.9K, 7-v34-0006-Remove-XLOG_HEAP2_VISIBLE-entirely.patch)
From cfe59761babde9cf9aeab5d5ca0b50220a41b76d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 27 Sep 2025 11:55:36 -0400
Subject: [PATCH v34 06/14] Remove XLOG_HEAP2_VISIBLE entirely
No remaining code emits XLOG_HEAP2_VISIBLE records, so remove the
record type entirely.
This includes deleting the xl_heap_visible struct and all functions
responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
This changes the visibility map API, so any external users/consumers of
the VM-only WAL record will need to change.
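For readers unfamiliar with the VM layout, here is a rough sketch of what setting bits for a heap block amounts to on a VM page: each heap block has two bits, four heap blocks per byte. The arithmetic follows the HEAPBLK_TO_MAPBLOCK/HEAPBLK_TO_MAPBYTE/HEAPBLK_TO_OFFSET macros in visibilitymap.c, assuming the default 8 kB BLCKSZ and a 24-byte (MAXALIGNed) page header. vm_locate() and vm_set_bits() are hypothetical; buffer locking, dirtying, and WAL are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ              8192
#define PAGE_HEADER_SIZE    24      /* assumed MAXALIGN(SizeOfPageHeaderData) */
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE 4
#define MAPSIZE             (BLCKSZ - PAGE_HEADER_SIZE)
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
#define VISIBILITYMAP_VALID_BITS  0x03

typedef uint32_t BlockNumber;

/* Which VM page, byte within its contents, and bit offset hold heapBlk. */
static void
vm_locate(BlockNumber heapBlk, BlockNumber *mapBlock,
          uint32_t *mapByte, uint8_t *mapOffset)
{
    *mapBlock = heapBlk / HEAPBLOCKS_PER_PAGE;
    *mapByte = (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
    *mapOffset = (uint8_t) ((heapBlk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK);
}

/*
 * Set flags for heapBlk in an in-memory copy of one VM page's contents
 * (the caller is assumed to hold the right VM page). Returns true if any
 * bit changed.
 */
static bool
vm_set_bits(uint8_t *map, BlockNumber heapBlk, uint8_t flags)
{
    BlockNumber mapBlock;
    uint32_t    mapByte;
    uint8_t     mapOffset;
    uint8_t     old;

    vm_locate(heapBlk, &mapBlock, &mapByte, &mapOffset);
    (void) mapBlock;

    old = map[mapByte];
    map[mapByte] |= (uint8_t) ((flags & VISIBILITYMAP_VALID_BITS) << mapOffset);
    return map[mapByte] != old;
}
```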
Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/common/bufmask.c | 4 +-
src/backend/access/heap/heapam.c | 54 +-------
src/backend/access/heap/heapam_xlog.c | 155 ++---------------------
src/backend/access/heap/pruneheap.c | 4 +-
src/backend/access/heap/vacuumlazy.c | 16 +--
src/backend/access/heap/visibilitymap.c | 110 +---------------
src/backend/access/rmgrdesc/heapdesc.c | 10 --
src/backend/replication/logical/decode.c | 1 -
src/backend/storage/ipc/standby.c | 12 +-
src/include/access/heapam_xlog.h | 20 ---
src/include/access/visibilitymap.h | 13 +-
src/include/access/visibilitymapdefs.h | 9 --
src/tools/pgindent/typedefs.list | 1 -
13 files changed, 38 insertions(+), 371 deletions(-)
diff --git a/src/backend/access/common/bufmask.c b/src/backend/access/common/bufmask.c
index 1a9e7bea5d2..bce767d7b71 100644
--- a/src/backend/access/common/bufmask.c
+++ b/src/backend/access/common/bufmask.c
@@ -56,8 +56,8 @@ mask_page_hint_bits(Page page)
/*
* During replay, if the page LSN has advanced past our XLOG record's LSN,
- * we don't mark the page all-visible. See heap_xlog_visible() for
- * details.
+ * we don't mark the page all-visible. See heap_xlog_prune_and_freeze()
+ * for more details.
*/
PageClearAllVisible(page);
}
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f30a56ecf55..1a3fa8a76aa 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2586,11 +2586,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
else if (all_frozen_set)
{
PageSetAllVisible(page);
- visibilitymap_set_vmbits(BufferGetBlockNumber(buffer),
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- relation->rd_locator);
+ visibilitymap_set(BufferGetBlockNumber(buffer),
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ relation->rd_locator);
}
/*
@@ -8875,50 +8875,6 @@ bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate)
return nblocksfavorable;
}
-/*
- * Perform XLogInsert for a heap-visible operation. 'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block. Both should have already been modified
- * and dirtied.
- *
- * snapshotConflictHorizon comes from the largest xmin on the page being
- * marked all-visible. REDO routine uses it to generate recovery conflicts.
- *
- * If checksums or wal_log_hints are enabled, we may also generate a full-page
- * image of heap_buffer. Otherwise, we optimize away the FPI (by specifying
- * REGBUF_NO_IMAGE for the heap buffer), in which case the caller should *not*
- * update the heap page's LSN.
- */
-XLogRecPtr
-log_heap_visible(Relation rel, Buffer heap_buffer, Buffer vm_buffer,
- TransactionId snapshotConflictHorizon, uint8 vmflags)
-{
- xl_heap_visible xlrec;
- XLogRecPtr recptr;
- uint8 flags;
-
- Assert(BufferIsValid(heap_buffer));
- Assert(BufferIsValid(vm_buffer));
-
- xlrec.snapshotConflictHorizon = snapshotConflictHorizon;
- xlrec.flags = vmflags;
- if (RelationIsAccessibleInLogicalDecoding(rel))
- xlrec.flags |= VISIBILITYMAP_XLOG_CATALOG_REL;
- XLogBeginInsert();
- XLogRegisterData(&xlrec, SizeOfHeapVisible);
-
- XLogRegisterBuffer(0, vm_buffer, 0);
-
- flags = REGBUF_STANDARD;
- if (!XLogHintBitIsNeeded())
- flags |= REGBUF_NO_IMAGE;
- XLogRegisterBuffer(1, heap_buffer, flags);
-
- recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_VISIBLE);
-
- return recptr;
-}
-
/*
* Perform XLogInsert for a heap-update operation. Caller must already
* have modified the buffer(s) and marked them dirty.
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index f765345e9e4..9a29fda3601 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -236,7 +236,7 @@ heap_xlog_prune_freeze(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno, vmbuffer, vmflags, rlocator);
+ visibilitymap_set(blkno, vmbuffer, vmflags, rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -249,142 +249,6 @@ heap_xlog_prune_freeze(XLogReaderState *record)
XLogRecordPageWithFreeSpace(rlocator, blkno, freespace);
}
-/*
- * Replay XLOG_HEAP2_VISIBLE records.
- *
- * The critical integrity requirement here is that we must never end up with
- * a situation where the visibility map bit is set, and the page-level
- * PD_ALL_VISIBLE bit is clear. If that were to occur, then a subsequent
- * page modification would fail to clear the visibility map bit.
- */
-static void
-heap_xlog_visible(XLogReaderState *record)
-{
- XLogRecPtr lsn = record->EndRecPtr;
- xl_heap_visible *xlrec = (xl_heap_visible *) XLogRecGetData(record);
- Buffer vmbuffer = InvalidBuffer;
- Buffer buffer;
- Page page;
- RelFileLocator rlocator;
- BlockNumber blkno;
- XLogRedoAction action;
-
- Assert((xlrec->flags & VISIBILITYMAP_XLOG_VALID_BITS) == xlrec->flags);
-
- XLogRecGetBlockTag(record, 1, &rlocator, NULL, &blkno);
-
- /*
- * If there are any Hot Standby transactions running that have an xmin
- * horizon old enough that this page isn't all-visible for them, they
- * might incorrectly decide that an index-only scan can skip a heap fetch.
- *
- * NB: It might be better to throw some kind of "soft" conflict here that
- * forces any index-only scan that is in flight to perform heap fetches,
- * rather than killing the transaction outright.
- */
- if (InHotStandby)
- ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
- xlrec->flags & VISIBILITYMAP_XLOG_CATALOG_REL,
- rlocator);
-
- /*
- * Read the heap page, if it still exists. If the heap file has dropped or
- * truncated later in recovery, we don't need to update the page, but we'd
- * better still update the visibility map.
- */
- action = XLogReadBufferForRedo(record, 1, &buffer);
- if (action == BLK_NEEDS_REDO)
- {
- /*
- * We don't bump the LSN of the heap page when setting the visibility
- * map bit (unless checksums or wal_hint_bits is enabled, in which
- * case we must). This exposes us to torn page hazards, but since
- * we're not inspecting the existing page contents in any way, we
- * don't care.
- */
- page = BufferGetPage(buffer);
-
- PageSetAllVisible(page);
-
- if (XLogHintBitIsNeeded())
- PageSetLSN(page, lsn);
-
- MarkBufferDirty(buffer);
- }
- else if (action == BLK_RESTORED)
- {
- /*
- * If heap block was backed up, we already restored it and there's
- * nothing more to do. (This can only happen with checksums or
- * wal_log_hints enabled.)
- */
- }
-
- if (BufferIsValid(buffer))
- {
- Size space = PageGetFreeSpace(BufferGetPage(buffer));
-
- UnlockReleaseBuffer(buffer);
-
- /*
- * Since FSM is not WAL-logged and only updated heuristically, it
- * easily becomes stale in standbys. If the standby is later promoted
- * and runs VACUUM, it will skip updating individual free space
- * figures for pages that became all-visible (or all-frozen, depending
- * on the vacuum mode,) which is troublesome when FreeSpaceMapVacuum
- * propagates too optimistic free space values to upper FSM layers;
- * later inserters try to use such pages only to find out that they
- * are unusable. This can cause long stalls when there are many such
- * pages.
- *
- * Forestall those problems by updating FSM's idea about a page that
- * is becoming all-visible or all-frozen.
- *
- * Do this regardless of a full-page image being applied, since the
- * FSM data is not in the page anyway.
- */
- if (xlrec->flags & VISIBILITYMAP_VALID_BITS)
- XLogRecordPageWithFreeSpace(rlocator, blkno, space);
- }
-
- /*
- * Even if we skipped the heap page update due to the LSN interlock, it's
- * still safe to update the visibility map. Any WAL record that clears
- * the visibility map bit does so before checking the page LSN, so any
- * bits that need to be cleared will still be cleared.
- */
- if (XLogReadBufferForRedoExtended(record, 0, RBM_ZERO_ON_ERROR, false,
- &vmbuffer) == BLK_NEEDS_REDO)
- {
- Page vmpage = BufferGetPage(vmbuffer);
- Relation reln;
- uint8 vmbits;
-
- /* initialize the page if it was read as zeros */
- if (PageIsNew(vmpage))
- PageInit(vmpage, BLCKSZ, 0);
-
- /* remove VISIBILITYMAP_XLOG_* */
- vmbits = xlrec->flags & VISIBILITYMAP_VALID_BITS;
-
- /*
- * XLogReadBufferForRedoExtended locked the buffer. But
- * visibilitymap_set will handle locking itself.
- */
- LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
-
- reln = CreateFakeRelcacheEntry(rlocator);
-
- visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
- xlrec->snapshotConflictHorizon, vmbits);
-
- ReleaseBuffer(vmbuffer);
- FreeFakeRelcacheEntry(reln);
- }
- else if (BufferIsValid(vmbuffer))
- UnlockReleaseBuffer(vmbuffer);
-}
-
/*
* Given an "infobits" field from an XLog record, set the correct bits in the
* given infomask and infomask2 for the tuple touched by the record.
@@ -762,8 +626,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
*
* During recovery, however, no concurrent writers exist. Therefore,
* updating the VM without holding the heap page lock is safe enough. This
- * same approach is taken when replaying xl_heap_visible records (see
- * heap_xlog_visible()).
+ * same approach is taken when replaying XLOG_HEAP2_PRUNE* records (see
 
+ * heap_xlog_prune_freeze()).
*/
if ((xlrec->flags & XLH_INSERT_ALL_FROZEN_SET) &&
XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
@@ -775,11 +639,11 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (PageIsNew(vmpage))
PageInit(vmpage, BLCKSZ, 0);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- rlocator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ rlocator);
Assert(BufferIsDirty(vmbuffer));
PageSetLSN(vmpage, lsn);
@@ -1360,9 +1224,6 @@ heap2_redo(XLogReaderState *record)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
heap_xlog_prune_freeze(record);
break;
- case XLOG_HEAP2_VISIBLE:
- heap_xlog_visible(record);
- break;
case XLOG_HEAP2_MULTI_INSERT:
heap_xlog_multi_insert(record);
break;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b8ba5b7a681..2ba863be07c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1216,8 +1216,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
* so there is no point in optimizing it.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blockno, vmbuffer, new_vmbits,
- params->relation->rd_locator);
+ visibilitymap_set(blockno, vmbuffer, new_vmbits,
+ params->relation->rd_locator);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 86b3155717e..d4624010123 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1922,11 +1922,11 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
log_newpage_buffer(buf, true);
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer,
- VISIBILITYMAP_ALL_VISIBLE |
- VISIBILITYMAP_ALL_FROZEN,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer,
+ VISIBILITYMAP_ALL_VISIBLE |
+ VISIBILITYMAP_ALL_FROZEN,
+ vacrel->rel->rd_locator);
/*
* Emit WAL for setting PD_ALL_VISIBLE on the heap page and
@@ -2789,9 +2789,9 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* set PD_ALL_VISIBLE.
*/
PageSetAllVisible(page);
- visibilitymap_set_vmbits(blkno,
- vmbuffer, vmflags,
- vacrel->rel->rd_locator);
+ visibilitymap_set(blkno,
+ vmbuffer, vmflags,
+ vacrel->rel->rd_locator);
conflict_xid = visibility_cutoff_xid;
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 3047bd46def..fc74e39e069 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -14,8 +14,7 @@
* visibilitymap_clear - clear bits for one page in the visibility map
* visibilitymap_pin - pin a map page for setting a bit
* visibilitymap_pin_ok - check whether correct map page is already pinned
- * visibilitymap_set - set bit(s) in a previously pinned page and log
- * visibilitymap_set_vmbits - set bit(s) in a pinned page
+ * visibilitymap_set - set bit(s) in a previously pinned page
* visibilitymap_get_status - get status of bits
* visibilitymap_count - count number of bits set in visibility map
* visibilitymap_prepare_truncate -
@@ -220,112 +219,11 @@ visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf)
return BufferIsValid(vmbuf) && BufferGetBlockNumber(vmbuf) == mapBlock;
}
-/*
- * visibilitymap_set - set bit(s) on a previously pinned page
- *
- * recptr is the LSN of the XLOG record we're replaying, if we're in recovery,
- * or InvalidXLogRecPtr in normal running. The VM page LSN is advanced to the
- * one provided; in normal running, we generate a new XLOG record and set the
- * page LSN to that value (though the heap page's LSN may *not* be updated;
- * see below). cutoff_xid is the largest xmin on the page being marked
- * all-visible; it is needed for Hot Standby, and can be InvalidTransactionId
- * if the page contains no tuples. It can also be set to InvalidTransactionId
- * when a page that is already all-visible is being marked all-frozen.
- *
- * Caller is expected to set the heap page's PD_ALL_VISIBLE bit before calling
- * this function. Except in recovery, caller should also pass the heap
- * buffer. When checksums are enabled and we're not in recovery, we must add
- * the heap buffer to the WAL chain to protect it from being torn.
- *
- * You must pass a buffer containing the correct map page to this function.
- * Call visibilitymap_pin first to pin the right one. This function doesn't do
- * any I/O.
- */
-void
-visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr, Buffer vmBuf, TransactionId cutoff_xid,
- uint8 flags)
-{
- BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
- uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
- uint8 mapOffset = HEAPBLK_TO_OFFSET(heapBlk);
- Page page;
- uint8 *map;
- uint8 status;
-
-#ifdef TRACE_VISIBILITYMAP
- elog(DEBUG1, "vm_set flags 0x%02X for %s %d",
- flags, RelationGetRelationName(rel), heapBlk);
-#endif
-
- Assert(InRecovery || !XLogRecPtrIsValid(recptr));
- Assert(InRecovery || PageIsAllVisible(BufferGetPage(heapBuf)));
- Assert((flags & VISIBILITYMAP_VALID_BITS) == flags);
-
- /* Must never set all_frozen bit without also setting all_visible bit */
- Assert(flags != VISIBILITYMAP_ALL_FROZEN);
-
- /* Check that we have the right heap page pinned, if present */
- if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
- elog(ERROR, "wrong heap buffer passed to visibilitymap_set");
-
- Assert(!BufferIsValid(heapBuf) ||
- BufferIsLockedByMeInMode(heapBuf, BUFFER_LOCK_EXCLUSIVE));
-
- /* Check that we have the right VM page pinned */
- if (!BufferIsValid(vmBuf) || BufferGetBlockNumber(vmBuf) != mapBlock)
- elog(ERROR, "wrong VM buffer passed to visibilitymap_set");
-
- page = BufferGetPage(vmBuf);
- map = (uint8 *) PageGetContents(page);
- LockBuffer(vmBuf, BUFFER_LOCK_EXCLUSIVE);
-
- status = (map[mapByte] >> mapOffset) & VISIBILITYMAP_VALID_BITS;
- if (flags != status)
- {
- START_CRIT_SECTION();
-
- map[mapByte] |= (flags << mapOffset);
- MarkBufferDirty(vmBuf);
-
- if (RelationNeedsWAL(rel))
- {
- if (!XLogRecPtrIsValid(recptr))
- {
- Assert(!InRecovery);
- recptr = log_heap_visible(rel, heapBuf, vmBuf, cutoff_xid, flags);
-
- /*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
- if (XLogHintBitIsNeeded())
- {
- Page heapPage = BufferGetPage(heapBuf);
-
- PageSetLSN(heapPage, recptr);
- }
- }
- PageSetLSN(page, recptr);
- }
-
- END_CRIT_SECTION();
- }
-
- LockBuffer(vmBuf, BUFFER_LOCK_UNLOCK);
-}
-
/*
* Set VM (visibility map) flags in the VM block in vmBuf.
*
* This function is intended for callers that log VM changes together
* with the heap page modifications that rendered the page all-visible.
- * Callers that log VM changes separately should use visibilitymap_set().
*
* vmBuf must be pinned and exclusively locked, and it must cover the VM bits
* corresponding to heapBlk.
@@ -341,9 +239,9 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
* rlocator is used only for debugging messages.
*/
void
-visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator)
+visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator)
{
BlockNumber mapBlock = HEAPBLK_TO_MAPBLOCK(heapBlk);
uint32 mapByte = HEAPBLK_TO_MAPBYTE(heapBlk);
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 02ae91653c1..75ae6f9d375 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -349,13 +349,6 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
}
}
}
- else if (info == XLOG_HEAP2_VISIBLE)
- {
- xl_heap_visible *xlrec = (xl_heap_visible *) rec;
-
- appendStringInfo(buf, "snapshotConflictHorizon: %u, flags: 0x%02X",
- xlrec->snapshotConflictHorizon, xlrec->flags);
- }
else if (info == XLOG_HEAP2_MULTI_INSERT)
{
xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
@@ -461,9 +454,6 @@ heap2_identify(uint8 info)
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
id = "PRUNE_VACUUM_CLEANUP";
break;
- case XLOG_HEAP2_VISIBLE:
- id = "VISIBLE";
- break;
case XLOG_HEAP2_MULTI_INSERT:
id = "MULTI_INSERT";
break;
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index e25dd6bc366..f7ddb56fc30 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -449,7 +449,6 @@ heap2_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
case XLOG_HEAP2_PRUNE_ON_ACCESS:
case XLOG_HEAP2_PRUNE_VACUUM_SCAN:
case XLOG_HEAP2_PRUNE_VACUUM_CLEANUP:
- case XLOG_HEAP2_VISIBLE:
case XLOG_HEAP2_LOCK_UPDATED:
break;
default:
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index afffab77106..f8681dcc9c7 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -475,12 +475,12 @@ ResolveRecoveryConflictWithSnapshot(TransactionId snapshotConflictHorizon,
* If we get passed InvalidTransactionId then we do nothing (no conflict).
*
* This can happen when replaying already-applied WAL records after a
- * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
- * record that marks as frozen a page which was already all-visible. It's
- * also quite common with records generated during index deletion
- * (original execution of the deletion can reason that a recovery conflict
- * which is sufficient for the deletion operation must take place before
- * replay of the deletion record itself).
+ * standby crash or restart, or when replaying a record that marks as
+ * frozen a page which was already marked all-visible in the visibility
+ * map. It's also quite common with records generated during index
+ * deletion (original execution of the deletion can reason that a recovery
+ * conflict which is sufficient for the deletion operation must take place
+ * before replay of the deletion record itself).
*/
if (!TransactionIdIsValid(snapshotConflictHorizon))
return;
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index ce3566ba949..5eed567a8e5 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -60,7 +60,6 @@
#define XLOG_HEAP2_PRUNE_ON_ACCESS 0x10
#define XLOG_HEAP2_PRUNE_VACUUM_SCAN 0x20
#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP 0x30
-#define XLOG_HEAP2_VISIBLE 0x40
#define XLOG_HEAP2_MULTI_INSERT 0x50
#define XLOG_HEAP2_LOCK_UPDATED 0x60
#define XLOG_HEAP2_NEW_CID 0x70
@@ -443,20 +442,6 @@ typedef struct xl_heap_inplace
#define MinSizeOfHeapInplace (offsetof(xl_heap_inplace, nmsgs) + sizeof(int))
-/*
- * This is what we need to know about setting a visibility map bit
- *
- * Backup blk 0: visibility map buffer
- * Backup blk 1: heap buffer
- */
-typedef struct xl_heap_visible
-{
- TransactionId snapshotConflictHorizon;
- uint8 flags;
-} xl_heap_visible;
-
-#define SizeOfHeapVisible (offsetof(xl_heap_visible, flags) + sizeof(uint8))
-
typedef struct xl_heap_new_cid
{
/*
@@ -500,11 +485,6 @@ extern void heap2_desc(StringInfo buf, XLogReaderState *record);
extern const char *heap2_identify(uint8 info);
extern void heap_xlog_logical_rewrite(XLogReaderState *r);
-extern XLogRecPtr log_heap_visible(Relation rel, Buffer heap_buffer,
- Buffer vm_buffer,
- TransactionId snapshotConflictHorizon,
- uint8 vmflags);
-
/* in heapdesc.c, so it can be shared between frontend/backend code */
extern void heap_xlog_deserialize_prune_and_freeze(char *cursor, uint16 flags,
int *nplans, xlhp_freeze_plan **plans,
diff --git a/src/include/access/visibilitymap.h b/src/include/access/visibilitymap.h
index a0166c5b410..001afb037f3 100644
--- a/src/include/access/visibilitymap.h
+++ b/src/include/access/visibilitymap.h
@@ -15,7 +15,6 @@
#define VISIBILITYMAP_H
#include "access/visibilitymapdefs.h"
-#include "access/xlogdefs.h"
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/relfilelocator.h"
@@ -32,15 +31,9 @@ extern bool visibilitymap_clear(Relation rel, BlockNumber heapBlk,
extern void visibilitymap_pin(Relation rel, BlockNumber heapBlk,
Buffer *vmbuf);
extern bool visibilitymap_pin_ok(BlockNumber heapBlk, Buffer vmbuf);
-extern void visibilitymap_set(Relation rel,
- BlockNumber heapBlk, Buffer heapBuf,
- XLogRecPtr recptr,
- Buffer vmBuf,
- TransactionId cutoff_xid,
- uint8 flags);
-extern void visibilitymap_set_vmbits(BlockNumber heapBlk,
- Buffer vmBuf, uint8 flags,
- const RelFileLocator rlocator);
+extern void visibilitymap_set(BlockNumber heapBlk,
+ Buffer vmBuf, uint8 flags,
+ const RelFileLocator rlocator);
extern uint8 visibilitymap_get_status(Relation rel, BlockNumber heapBlk, Buffer *vmbuf);
extern void visibilitymap_count(Relation rel, BlockNumber *all_visible, BlockNumber *all_frozen);
extern BlockNumber visibilitymap_prepare_truncate(Relation rel,
diff --git a/src/include/access/visibilitymapdefs.h b/src/include/access/visibilitymapdefs.h
index 89153b3cd9a..e5794c8559e 100644
--- a/src/include/access/visibilitymapdefs.h
+++ b/src/include/access/visibilitymapdefs.h
@@ -21,14 +21,5 @@
#define VISIBILITYMAP_ALL_FROZEN 0x02
#define VISIBILITYMAP_VALID_BITS 0x03 /* OR of all valid visibilitymap
* flags bits */
-/*
- * To detect recovery conflicts during logical decoding on a standby, we need
- * to know if a table is a user catalog table. For that we add an additional
- * bit into xl_heap_visible.flags, in addition to the above.
- *
- * NB: VISIBILITYMAP_XLOG_* may not be passed to visibilitymap_set().
- */
-#define VISIBILITYMAP_XLOG_CATALOG_REL 0x04
-#define VISIBILITYMAP_XLOG_VALID_BITS (VISIBILITYMAP_VALID_BITS | VISIBILITYMAP_XLOG_CATALOG_REL)
#endif /* VISIBILITYMAPDEFS_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ddbe4c64971..a85b41e006b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4342,7 +4342,6 @@ xl_heap_prune
xl_heap_rewrite_mapping
xl_heap_truncate
xl_heap_update
-xl_heap_visible
xl_invalid_page
xl_invalid_page_key
xl_invalidations
--
2.43.0
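After this patch, the surviving visibilitymap_set() only flips bits in a VM page that the caller has already pinned and exclusively locked. WAL for the change is carried by the caller's own record (prune/freeze, multi-insert, and so on). The sketch below models just the bit addressing and the OR-in step. It is a standalone approximation, not the backend code. It assumes the default 8kB BLCKSZ, a 24-byte page header, and two VM bits per heap block, as visibilitymap.c uses. All toy_* and heapblk_to_* names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 8kB pages, 24-byte page header, 2 VM bits per heap block. */
#define TOY_BLCKSZ          8192
#define TOY_PAGE_HEADER     24
#define TOY_MAPSIZE         (TOY_BLCKSZ - TOY_PAGE_HEADER)
#define BITS_PER_HEAPBLOCK  2
#define HEAPBLOCKS_PER_BYTE (8 / BITS_PER_HEAPBLOCK)
#define HEAPBLOCKS_PER_PAGE (TOY_MAPSIZE * HEAPBLOCKS_PER_BYTE)

#define VM_ALL_VISIBLE 0x01
#define VM_ALL_FROZEN  0x02
#define VM_VALID_BITS  0x03

/* Which VM page, which byte within its contents, and which bit offset. */
static inline uint32_t heapblk_to_mapblock(uint32_t blk) { return blk / HEAPBLOCKS_PER_PAGE; }
static inline uint32_t heapblk_to_mapbyte(uint32_t blk) { return (blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE; }
static inline uint8_t heapblk_to_offset(uint32_t blk) { return (uint8_t) ((blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK); }

/*
 * OR flags into the bits for heapBlk in one VM page's contents ("map").
 * Returns true if the byte changed, i.e. the buffer would need dirtying.
 * No locking or WAL here: in the patch, the caller handles both.
 */
static inline bool toy_vm_set(uint8_t *map, uint32_t heapBlk, uint8_t flags)
{
    uint32_t mapByte = heapblk_to_mapbyte(heapBlk);
    uint8_t  mapOffset = heapblk_to_offset(heapBlk);
    uint8_t  old = map[mapByte];

    assert((flags & VM_VALID_BITS) == flags);
    assert(flags != VM_ALL_FROZEN);  /* never all-frozen without all-visible */

    map[mapByte] = (uint8_t) (old | (flags << mapOffset));
    return map[mapByte] != old;
}

/* Read back the two bits for heapBlk. */
static inline uint8_t toy_vm_get(const uint8_t *map, uint32_t heapBlk)
{
    return (map[heapblk_to_mapbyte(heapBlk)] >> heapblk_to_offset(heapBlk)) & VM_VALID_BITS;
}
```

Setting bits that are already set leaves the byte unchanged. This mirrors the status comparison the removed WAL-logging variant used to skip redundant work.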
[text/x-patch] v34-0007-Simplify-heap_page_would_be_all_visible-visibili.patch (1.9K, 8-v34-0007-Simplify-heap_page_would_be_all_visible-visibili.patch)
From a64707f1f2fa88d7292f7a2f2a760c613eea4950 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 13:57:16 -0500
Subject: [PATCH v34 07/14] Simplify heap_page_would_be_all_visible visibility
check
heap_page_would_be_all_visible() doesn't care about the distinction
between HEAPTUPLE_RECENTLY_DEAD and HEAPTUPLE_DEAD tuples -- any tuple
that is not HEAPTUPLE_LIVE means the page is not all-visible and causes
us to return false.
Therefore, we don't need to call HeapTupleSatisfiesVacuum(), which
includes an extra step to distinguish between dead and recently dead
tuples using OldestXmin. Replace it with the more minimal
HeapTupleSatisfiesVacuumHorizon().
This has the added benefit of making it easier to replace uses of
OldestXmin in heap_page_would_be_all_visible() in the future.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/vacuumlazy.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index d4624010123..5d7ec5c0240 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3585,6 +3585,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
ItemId itemid;
HeapTupleData tuple;
+ TransactionId dead_after;
/*
* Set the offset number so that we can display it along with any
@@ -3624,7 +3625,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
/* Visibility checks may do IO or allocate memory */
Assert(CritSectionCount == 0);
- switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
+ switch (HeapTupleSatisfiesVacuumHorizon(&tuple, buf, &dead_after))
{
case HEAPTUPLE_LIVE:
{
--
2.43.0
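The reasoning in patch 0007 can be checked with a toy model. Here HeapTupleSatisfiesVacuum() is treated as the horizon-based result plus one extra OldestXmin comparison, and that comparison can only turn RECENTLY_DEAD into DEAD. A caller that only asks "is this tuple LIVE?" therefore gets the same answer from either function. This is a hypothetical sketch. It assumes each tuple carries a precomputed horizon result and dead_after XID, treats XIDs as plain integers without wraparound, and omits the real functions' commit-status and hint-bit logic. All toy_* names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;  /* assumption: no wraparound in this model */

typedef enum
{
    TOY_LIVE,
    TOY_DEAD,
    TOY_RECENTLY_DEAD,
    TOY_INSERT_IN_PROGRESS,
    TOY_DELETE_IN_PROGRESS
} ToyHTSV;

/* A tuple reduced to what the horizon-based check would report for it. */
typedef struct
{
    ToyHTSV horizon_result;
    ToyXid  dead_after;  /* meaningful only for TOY_RECENTLY_DEAD */
} ToyTuple;

/* Stand-in for HeapTupleSatisfiesVacuumHorizon(): no cutoff XID needed. */
static inline ToyHTSV toy_vacuum_horizon(const ToyTuple *t, ToyXid *dead_after)
{
    *dead_after = t->dead_after;
    return t->horizon_result;
}

/* Stand-in for HeapTupleSatisfiesVacuum(): adds the OldestXmin step. */
static inline ToyHTSV toy_vacuum(const ToyTuple *t, ToyXid oldest_xmin)
{
    ToyXid  dead_after;
    ToyHTSV res = toy_vacuum_horizon(t, &dead_after);

    if (res == TOY_RECENTLY_DEAD && dead_after < oldest_xmin)
        res = TOY_DEAD;
    return res;
}

/* The all-visible test only cares whether every tuple is LIVE. */
static inline bool toy_page_would_be_all_visible(const ToyTuple *tuples, int n)
{
    for (int i = 0; i < n; i++)
    {
        ToyXid dead_after;

        if (toy_vacuum_horizon(&tuples[i], &dead_after) != TOY_LIVE)
            return false;
    }
    return true;
}
```

Whether the RECENTLY_DEAD tuple is upgraded to DEAD depends on OldestXmin, but in both cases it is "not LIVE", so the page is rejected either way.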
[text/x-patch] v34-0008-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch (4.4K, 9-v34-0008-Remove-table_scan_analyze_next_tuple-unneeded-pa.patch)
From 85ab0d4eb681eaba4668ee23602d425c27f56d07 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Mon, 22 Dec 2025 10:46:45 -0500
Subject: [PATCH v34 08/14] Remove table_scan_analyze_next_tuple unneeded
parameter OldestXmin
heapam_scan_analyze_next_tuple() doesn't distinguish between dead and
recently dead tuples when counting them, so it doesn't need OldestXmin.
Looking at other table AMs implementing table_scan_analyze_next_tuple(),
it appears most do not use OldestXmin either.
Suggested-by: Kirill Reshke <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/CALdSSPjvhGXihT_9f-GJabYU%3D_PjrFDUxYaURuTbfLyQM6TErg%40mail.gmail.com
---
src/backend/access/heap/heapam_handler.c | 8 +++++---
src/backend/commands/analyze.c | 6 +-----
src/include/access/tableam.h | 5 ++---
3 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cbef73e5d4b..da4ae236ca8 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1040,7 +1040,7 @@ heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
}
static bool
-heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+heapam_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
@@ -1061,6 +1061,7 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
ItemId itemid;
HeapTuple targtuple = &hslot->base.tupdata;
bool sample_it = false;
+ TransactionId dead_after;
itemid = PageGetItemId(targpage, hscan->rs_cindex);
@@ -1083,8 +1084,9 @@ heapam_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
targtuple->t_data = (HeapTupleHeader) PageGetItem(targpage, itemid);
targtuple->t_len = ItemIdGetLength(itemid);
- switch (HeapTupleSatisfiesVacuum(targtuple, OldestXmin,
- hscan->rs_cbuf))
+ switch (HeapTupleSatisfiesVacuumHorizon(targtuple,
+ hscan->rs_cbuf,
+ &dead_after))
{
case HEAPTUPLE_LIVE:
sample_it = true;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index a483424152c..53adac9139b 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1213,7 +1213,6 @@ acquire_sample_rows(Relation onerel, int elevel,
double rowstoskip = -1; /* -1 means not set yet */
uint32 randseed; /* Seed for block sampler(s) */
BlockNumber totalblocks;
- TransactionId OldestXmin;
BlockSamplerData bs;
ReservoirStateData rstate;
TupleTableSlot *slot;
@@ -1226,9 +1225,6 @@ acquire_sample_rows(Relation onerel, int elevel,
totalblocks = RelationGetNumberOfBlocks(onerel);
- /* Need a cutoff xmin for HeapTupleSatisfiesVacuum */
- OldestXmin = GetOldestNonRemovableTransactionId(onerel);
-
/* Prepare for sampling block numbers */
randseed = pg_prng_uint32(&pg_global_prng_state);
nblocks = BlockSampler_Init(&bs, totalblocks, targrows, randseed);
@@ -1261,7 +1257,7 @@ acquire_sample_rows(Relation onerel, int elevel,
{
vacuum_delay_point(true);
- while (table_scan_analyze_next_tuple(scan, OldestXmin, &liverows, &deadrows, slot))
+ while (table_scan_analyze_next_tuple(scan, &liverows, &deadrows, slot))
{
/*
* The first targrows sample rows are simply copied into the
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index e2ec5289d4d..c9fa9f259cd 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -683,7 +683,6 @@ typedef struct TableAmRoutine
* callback).
*/
bool (*scan_analyze_next_tuple) (TableScanDesc scan,
- TransactionId OldestXmin,
double *liverows,
double *deadrows,
TupleTableSlot *slot);
@@ -1714,11 +1713,11 @@ table_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
* tuples.
*/
static inline bool
-table_scan_analyze_next_tuple(TableScanDesc scan, TransactionId OldestXmin,
+table_scan_analyze_next_tuple(TableScanDesc scan,
double *liverows, double *deadrows,
TupleTableSlot *slot)
{
- return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan, OldestXmin,
+ return scan->rs_rd->rd_tableam->scan_analyze_next_tuple(scan,
liverows, deadrows,
slot);
}
--
2.43.0
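Patch 0008 relies on ANALYZE counting DEAD and RECENTLY_DEAD tuples into the same deadrows bucket. The toy switch below approximates the heap implementation of the counting: live tuples are sampled and counted live, both flavors of dead are counted dead, and in-progress tuples depend on whether the current transaction did the insert or delete. This is a hedged sketch. The "by_current_xact" flag is a hypothetical stand-in for the real TransactionIdIsCurrentTransactionId() checks, and the toy_* names are invented.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    TOY_LIVE,
    TOY_DEAD,
    TOY_RECENTLY_DEAD,
    TOY_INSERT_IN_PROGRESS,
    TOY_DELETE_IN_PROGRESS
} ToyHTSV;

typedef struct
{
    ToyHTSV res;
    bool    by_current_xact;  /* hypothetical: did our own xact insert/delete it? */
} ToySample;

/* Returns true if the tuple should go into the sample; updates the counters. */
static inline bool toy_analyze_tuple(const ToySample *t, double *liverows, double *deadrows)
{
    bool sample_it = false;

    switch (t->res)
    {
        case TOY_LIVE:
            sample_it = true;
            *liverows += 1;
            break;
        case TOY_DEAD:
        case TOY_RECENTLY_DEAD:
            /* Same bucket, so no cutoff XID is needed to tell them apart. */
            *deadrows += 1;
            break;
        case TOY_INSERT_IN_PROGRESS:
            if (t->by_current_xact)
            {
                sample_it = true;
                *liverows += 1;
            }
            break;
        case TOY_DELETE_IN_PROGRESS:
            if (t->by_current_xact)
                *deadrows += 1;
            else
            {
                sample_it = true;
                *liverows += 1;
            }
            break;
    }
    return sample_it;
}
```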
[text/x-patch] v34-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch (12.4K, 10-v34-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch)
From 8d350868206456f631883a40a955dff480e408d3 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 17 Dec 2025 16:51:05 -0500
Subject: [PATCH v34 09/14] Use GlobalVisState in vacuum to determine page
level visibility
During vacuum's first and third phases, we examine tuples' visibility
to determine if we can set the page all-visible in the visibility map.
Previously, this check compared tuple xmins against a single XID chosen at
the start of vacuum (OldestXmin). We now use GlobalVisState, which also
enables future work to set the VM during on-access pruning, since ordinary
queries have access to GlobalVisState but not OldestXmin.
This also benefits vacuum: in some cases, GlobalVisState may advance
during a vacuum, allowing more pages to be considered all-visible.
And, in the future, we could easily add a heuristic to update
GlobalVisState more frequently during vacuums of large tables.
OldestXmin is still used for freezing and as a backstop to ensure we
don't freeze a dead tuple that wasn't yet prunable according to
GlobalVisState in the rare cases where GlobalVisState moves
backwards.
Because comparing a transaction ID against GlobalVisState is more
expensive than comparing against a single XID, we defer this check until
after scanning all tuples on the page. If visibility_cutoff_xid was
maintained, we perform the GlobalVisState check only once per page.
This is safe because visibility_cutoff_xid records the newest xmin on
the page; if it is globally visible, then the entire page is all-visible.
This approach may result in examining more tuple xmins than before,
since with OldestXmin we could sometimes rule out the page being
all-visible earlier. However, profiling shows the additional cost is not
significant.
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/heap/heapam_visibility.c | 22 ++++++++
src/backend/access/heap/pruneheap.c | 57 +++++++++++----------
src/backend/access/heap/vacuumlazy.c | 44 ++++++++++------
src/include/access/heapam.h | 4 +-
4 files changed, 85 insertions(+), 42 deletions(-)
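The deferred check described in the commit message can be sketched as follows: track the newest normal xmin among the page's committed live tuples, then make a single "might this still be running?" test at the end. If the newest xmin is visible to everyone, every older one is too. In this simplified model, GlobalVisState is reduced to one hypothetical horizon XID; the real GlobalVisState keeps maybe_needed/definitely_needed bounds and can be refreshed. XID comparison uses the usual modulo-2^32 trick. All toy_* names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;
#define TOY_INVALID_XID      0
#define TOY_FIRST_NORMAL_XID 3  /* XIDs below this (e.g. frozen) are special */

/* Wraparound-aware "a is older than b" for normal XIDs. */
static inline bool toy_xid_precedes(ToyXid a, ToyXid b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical simplification: XIDs older than horizon are visible to all. */
typedef struct
{
    ToyXid horizon;
} ToyVisState;

static inline bool toy_xid_maybe_running(const ToyVisState *vs, ToyXid xid)
{
    return !toy_xid_precedes(xid, vs->horizon);
}

/*
 * xmins: inserting XIDs of the committed, live tuples on one page.
 * One horizon test per page instead of one per tuple. On success,
 * *cutoff is the newest normal xmin (invalid if there was none).
 */
static inline bool toy_page_all_visible(const ToyXid *xmins, int n,
                                        const ToyVisState *vs, ToyXid *cutoff)
{
    ToyXid newest = TOY_INVALID_XID;

    for (int i = 0; i < n; i++)
    {
        if (xmins[i] < TOY_FIRST_NORMAL_XID)
            continue;  /* e.g. frozen: visible to everyone */
        if (newest == TOY_INVALID_XID || toy_xid_precedes(newest, xmins[i]))
            newest = xmins[i];
    }

    if (newest != TOY_INVALID_XID && toy_xid_maybe_running(vs, newest))
    {
        *cutoff = TOY_INVALID_XID;  /* not maintained once the page fails */
        return false;
    }
    *cutoff = newest;
    return true;
}
```

The trade-off noted in the commit message shows up here: the loop always walks every xmin, whereas a per-tuple OldestXmin comparison could bail out at the first too-new tuple.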
diff --git a/src/backend/access/heap/heapam_visibility.c b/src/backend/access/heap/heapam_visibility.c
index 75ae268d753..aee88947393 100644
--- a/src/backend/access/heap/heapam_visibility.c
+++ b/src/backend/access/heap/heapam_visibility.c
@@ -1060,6 +1060,28 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
return res;
}
+/*
+ * Wrapper around GlobalVisTestIsRemovableXid() for use when examining live
+ * tuples. Returns true if the given XID may be considered running by at least
+ * one snapshot.
+ *
+ * This function alone is insufficient to determine tuple visibility; callers
+ * must also consider the XID's commit status. Its purpose is purely semantic:
+ * when applied to live tuples, GlobalVisTestIsRemovableXid() is checking
+ * whether the inserting transaction is still considered running, not whether
+ * the tuple is removable. Live tuples are, by definition, not removable, but
+ * the snapshot criteria for “transaction still running” are identical to
+ * those used for removable XIDs.
+ *
+ * See the comment above GlobalVisTestIsRemovable[Full]Xid() for details on the
+ * required preconditions for calling this function.
+ */
+bool
+GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid)
+{
+ return !GlobalVisTestIsRemovableXid(state, xid);
+}
+
/*
* Work horse for HeapTupleSatisfiesVacuum and similar routines.
*
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 2ba863be07c..c5bf0899c89 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -461,11 +461,12 @@ prune_freeze_setup(PruneFreezeParams *params,
/*
* The visibility cutoff xid is the newest xmin of live, committed tuples
- * older than OldestXmin on the page. This field is only kept up-to-date
- * if the page is all-visible. As soon as a tuple is encountered that is
- * not visible to all, this field is unmaintained. As long as it is
- * maintained, it can be used to calculate the snapshot conflict horizon
- * when updating the VM and/or freezing all the tuples on the page.
+ * on the page older than the visibility horizon represented in the
+ * GlobalVisState. This field is only kept up-to-date if the page is
+ * all-visible. It is invalid if there are any tuples on the page that are
+ * not visible to all. As long as it is maintained, it can be used to
+ * calculate the snapshot conflict horizon when updating the VM and/or
+ * freezing all the tuples on the page.
*/
prstate->visibility_cutoff_xid = InvalidTransactionId;
}
@@ -1077,6 +1078,24 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
prune_freeze_plan(RelationGetRelid(params->relation),
buffer, &prstate, off_loc);
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * amongst them may be considered running by any snapshot, the page cannot
+ * be all-visible.
+ */
+ if (prstate.all_visible &&
+ TransactionIdIsNormal(prstate.visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(prstate.vistest, prstate.visibility_cutoff_xid))
+ {
+ prstate.all_visible = prstate.all_frozen = false;
+
+ /*
+ * We won't try to use this if all_visible is false, but better to be
+ * safe and invalidate it.
+ */
+ prstate.visibility_cutoff_xid = InvalidTransactionId;
+ }
+
/*
* If checksums are enabled, calling heap_prune_satisfies_vacuum() while
* checking tuple visibility information in prune_freeze_plan() may have
@@ -1259,10 +1278,9 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
bool debug_all_frozen;
Assert(prstate.lpdead_items == 0);
- Assert(prstate.cutoffs);
Assert(heap_page_is_all_visible(params->relation, buffer,
- prstate.cutoffs->OldestXmin,
+ prstate.vistest,
&debug_all_frozen,
&debug_cutoff, off_loc));
@@ -1794,28 +1812,15 @@ heap_prune_record_unchanged_lp_normal(Page page, PruneState *prstate, OffsetNumb
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed? A FrozenTransactionId
- * is seen as committed to everyone. Otherwise, we check if
- * there is a snapshot that considers this xid to still be
- * running, and if so, we don't consider the page all-visible.
+ * The inserter definitely committed. But we don't know if it
+ * is old enough that everyone sees it as committed. Later,
+ * after processing all the tuples on the page, we'll check if
+ * there is any snapshot that still considers the newest xid
+ * on the page to be running. If so, we don't consider the
+ * page all-visible.
*/
xmin = HeapTupleHeaderGetXmin(htup);
- /*
- * For now always use prstate->cutoffs for this test, because
- * we only update 'all_visible' and 'all_frozen' when freezing
- * is requested. We could use GlobalVisTestIsRemovableXid
- * instead, if a non-freezing caller wanted to set the VM bit.
- */
- Assert(prstate->cutoffs);
- if (!TransactionIdPrecedes(xmin, prstate->cutoffs->OldestXmin))
- {
- prstate->all_visible = false;
- prstate->all_frozen = false;
- break;
- }
-
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, prstate->visibility_cutoff_xid) &&
TransactionIdIsNormal(xmin))
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 5d7ec5c0240..96ed31c9570 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -457,7 +457,7 @@ static void dead_items_reset(LVRelState *vacrel);
static void dead_items_cleanup(LVRelState *vacrel);
static bool heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -2743,7 +2743,7 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
* done outside the critical section.
*/
if (heap_page_would_be_all_visible(vacrel->rel, buffer,
- vacrel->cutoffs.OldestXmin,
+ vacrel->vistest,
deadoffsets, num_offsets,
&all_frozen, &visibility_cutoff_xid,
&vacrel->offnum))
@@ -3503,14 +3503,14 @@ dead_items_cleanup(LVRelState *vacrel)
*/
bool
heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum)
{
return heap_page_would_be_all_visible(rel, buf,
- OldestXmin,
+ vistest,
NULL, 0,
all_frozen,
visibility_cutoff_xid,
@@ -3531,7 +3531,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
* Returns true if the page is all-visible other than the provided
* deadoffsets and false otherwise.
*
- * OldestXmin is used to determine visibility.
+ * vistest is used to determine visibility.
*
* Output parameters:
*
@@ -3550,7 +3550,7 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
*/
static bool
heap_page_would_be_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
OffsetNumber *deadoffsets,
int ndeadoffsets,
bool *all_frozen,
@@ -3631,7 +3631,7 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
{
TransactionId xmin;
- /* Check comments in lazy_scan_prune. */
+ /* Check heap_prune_record_unchanged_lp_normal comments */
if (!HeapTupleHeaderXminCommitted(tuple.t_data))
{
all_visible = false;
@@ -3640,16 +3640,17 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
/*
- * The inserter definitely committed. But is it old enough
- * that everyone sees it as committed?
+ * The inserter definitely committed. But we don't know if
+ * it is old enough that everyone sees it as committed.
+ * Don't check that now.
+ *
+ * If we scan all tuples without finding one that prevents
+ * the page from being all-visible, we then check whether
+ * any snapshot still considers the newest XID on the page
+ * to be running. In that case, the page is not considered
+ * all-visible.
*/
xmin = HeapTupleHeaderGetXmin(tuple.t_data);
- if (!TransactionIdPrecedes(xmin, OldestXmin))
- {
- all_visible = false;
- *all_frozen = false;
- break;
- }
/* Track newest xmin on page. */
if (TransactionIdFollows(xmin, *visibility_cutoff_xid) &&
@@ -3678,6 +3679,19 @@ heap_page_would_be_all_visible(Relation rel, Buffer buf,
}
} /* scan along page */
+ /*
+ * After processing all the live tuples on the page, if the newest xmin
+ * among them may still be considered running by any snapshot, the page
+ * cannot be all-visible.
+ */
+ if (all_visible &&
+ TransactionIdIsNormal(*visibility_cutoff_xid) &&
+ GlobalVisTestXidMaybeRunning(vistest, *visibility_cutoff_xid))
+ {
+ all_visible = false;
+ *all_frozen = false;
+ }
+
/* Clear the offset information once we have processed the given page. */
*logging_offnum = InvalidOffsetNumber;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a0f7974942e..05ef8d8cd5e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -439,7 +439,7 @@ extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
#ifdef USE_ASSERT_CHECKING
extern bool heap_page_is_all_visible(Relation rel, Buffer buf,
- TransactionId OldestXmin,
+ GlobalVisState *vistest,
bool *all_frozen,
TransactionId *visibility_cutoff_xid,
OffsetNumber *logging_offnum);
@@ -451,6 +451,8 @@ extern TM_Result HeapTupleSatisfiesUpdate(HeapTuple htup, CommandId curcid,
Buffer buffer);
extern HTSV_Result HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
Buffer buffer);
+
+extern bool GlobalVisTestXidMaybeRunning(GlobalVisState *state, TransactionId xid);
extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer,
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
--
2.43.0
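The patch above drops the per-tuple OldestXmin comparison. It tracks the newest committed xmin instead and makes a single GlobalVisTestXidMaybeRunning() call after the scan. The sketch below models that idea in isolation and is not PostgreSQL code. TupleInfo, VisHorizon, and every helper are hypothetical, and the horizon is reduced to one XID rather than a real GlobalVisState. One check is enough because every other tracked xmin precedes or equals the newest one. If the newest inserter is no longer considered running, the older ones are not either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

/* One live tuple: its xmin and whether the inserter is known committed. */
typedef struct
{
    TransactionId xmin;
    bool          xmin_committed;
} TupleInfo;

/* Hypothetical single-XID horizon standing in for GlobalVisState. */
typedef struct
{
    TransactionId oldest_maybe_running;
} VisHorizon;

static bool
xid_is_normal(TransactionId xid)
{
    return xid >= FirstNormalTransactionId;
}

/* Same rule as TransactionIdPrecedes(): modulo-2^32 for normal XIDs. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    if (!xid_is_normal(id1) || !xid_is_normal(id2))
        return id1 < id2;
    return (int32_t) (id1 - id2) < 0;
}

/* Same rule as TransactionIdFollows(). */
static bool
xid_follows(TransactionId id1, TransactionId id2)
{
    if (!xid_is_normal(id1) || !xid_is_normal(id2))
        return id1 > id2;
    return (int32_t) (id1 - id2) > 0;
}

/* Analogue of GlobalVisTestXidMaybeRunning(): not removable => maybe running. */
static bool
xid_maybe_running(const VisHorizon *h, TransactionId xid)
{
    return !xid_precedes(xid, h->oldest_maybe_running);
}

static bool
page_is_all_visible(const TupleInfo *tuples, size_t ntuples,
                    const VisHorizon *h, TransactionId *cutoff_xid)
{
    bool all_visible = true;

    *cutoff_xid = InvalidTransactionId;
    for (size_t i = 0; i < ntuples; i++)
    {
        if (!tuples[i].xmin_committed)
        {
            all_visible = false;
            break;
        }
        /* Track the newest xmin on the page; no per-tuple horizon test. */
        if (xid_follows(tuples[i].xmin, *cutoff_xid) &&
            xid_is_normal(tuples[i].xmin))
            *cutoff_xid = tuples[i].xmin;
    }

    /* One horizon test on the newest xmin covers every older one. */
    if (all_visible && xid_is_normal(*cutoff_xid) &&
        xid_maybe_running(h, *cutoff_xid))
        all_visible = false;

    /* Invalidate the cutoff when it is not usable, as the prune path does. */
    if (!all_visible)
        *cutoff_xid = InvalidTransactionId;
    return all_visible;
}
```

With a horizon of 200, a page whose newest xmin is 150 stays all-visible and reports 150 as its cutoff. Lowering the horizon to 150 makes the same page fail the check.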
[text/x-patch] v34-0010-Unset-all_visible-sooner-if-not-freezing.patch (2.5K, 11-v34-0010-Unset-all_visible-sooner-if-not-freezing.patch)
From 8a0d8c5726f4c5821eb59396c1b48265a474a588 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 14 Oct 2025 15:22:35 -0400
Subject: [PATCH v34 10/14] Unset all_visible sooner if not freezing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In the prune/freeze path, we currently delay clearing all_visible and
all_frozen in the presence of dead items to allow opportunistic
freezing.
However, if no freezing will be attempted, there’s no need to delay.
Clearing the flags earlier avoids extra bookkeeping in
heap_prune_record_unchanged_lp_normal(). This currently has no runtime
effect because all callers that consider setting the VM also prepare
freeze plans, but upcoming changes will allow on-access pruning to set
the VM without freezing. The extra bookkeeping was noticeable in a
profile of on-access VM setting.
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/heap/pruneheap.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index c5bf0899c89..cc45728d25e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -1677,8 +1677,13 @@ heap_prune_record_dead(PruneState *prstate, OffsetNumber offnum,
/*
* Deliberately delay unsetting all_visible and all_frozen until later
* during pruning. Removable dead tuples shouldn't preclude freezing the
- * page.
+ * page. If we won't attempt freezing, though, unset both flags now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
@@ -1938,8 +1943,14 @@ heap_prune_record_unchanged_lp_dead(Page page, PruneState *prstate, OffsetNumber
* Similarly, don't unset all_visible and all_frozen until later, at the
* end of heap_page_prune_and_freeze(). This will allow us to attempt to
* freeze the page after pruning. As long as we unset it before updating
- * the visibility map, this will be correct.
+ * the visibility map, this will be correct. If we won't attempt freezing,
+ * though, just unset all_visible and all_frozen now.
*/
+ if (!prstate->attempt_freeze)
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ }
/* Record the dead offset for vacuum */
prstate->deadoffsets[prstate->lpdead_items++] = offnum;
--
2.43.0
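The following is a minimal sketch of the 0010 change. It uses a trimmed-down, hypothetical PruneSketch struct in place of PruneState. When freezing will not be attempted, recording a dead item clears all_visible and all_frozen immediately. Later live tuples then skip the visibility bookkeeping. When freezing is attempted, the flags survive until a final step that clears them before the VM would be updated. The xmin tracking uses a plain comparison and ignores wraparound, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DEAD_ITEMS 16   /* hypothetical fixed capacity for the sketch */

/* Hypothetical, trimmed-down stand-in for PruneState. */
typedef struct
{
    bool     attempt_freeze;
    bool     all_visible;
    bool     all_frozen;
    uint32_t newest_xmin;      /* plain compare; ignores XID wraparound */
    int      lpdead_items;
    int      deadoffsets[MAX_DEAD_ITEMS];
} PruneSketch;

static void
prune_sketch_init(PruneSketch *st, bool attempt_freeze)
{
    *st = (PruneSketch) {.attempt_freeze = attempt_freeze,
                         .all_visible = true, .all_frozen = true};
}

/* Like heap_prune_record_dead(): clear the flags now unless freezing needs them. */
static void
record_dead(PruneSketch *st, int offnum)
{
    if (!st->attempt_freeze)
    {
        st->all_visible = false;
        st->all_frozen = false;
    }
    assert(st->lpdead_items < MAX_DEAD_ITEMS);
    st->deadoffsets[st->lpdead_items++] = offnum;
}

/* Returns true if visibility bookkeeping was done for this live tuple. */
static bool
record_live(PruneSketch *st, uint32_t xmin)
{
    if (!st->all_visible)
        return false;            /* page already ruled out; skip tracking */
    if (xmin > st->newest_xmin)
        st->newest_xmin = xmin;
    return true;
}

/* Before the VM is updated, dead items always rule out all-visible. */
static void
prune_sketch_finish(PruneSketch *st)
{
    if (st->lpdead_items > 0)
        st->all_visible = st->all_frozen = false;
}
```

Both paths end with the same flag values. The non-freezing path simply reaches them earlier and skips the per-tuple work in between.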
[text/x-patch] v34-0011-Track-which-relations-are-modified-by-a-query.patch (2.6K, 12-v34-0011-Track-which-relations-are-modified-by-a-query.patch)
From 38c5a2bb61ce9df0035d01a37e1f5e5e806cb5ff Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:07:24 -0500
Subject: [PATCH v34 11/14] Track which relations are modified by a query
Save the relids in a bitmap in the estate. A later commit will pass this
information down to scan nodes to control whether the scan may set the
visibility map during on-access pruning. We don't want to set
the visibility map if the query is just going to modify the page
immediately after.
Reviewed-by: Chao Li <[email protected]>
---
src/backend/executor/execMain.c | 4 ++++
src/backend/executor/execUtils.c | 2 ++
src/include/nodes/execnodes.h | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..87772de3d33 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -920,6 +920,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
break;
}
+ /* If it has a rowmark, the relation may be modified */
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rc->rti);
+
/* Check that relation is a legal target for marking */
if (relation)
CheckValidRowMarkRel(relation, rc->markType);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index a7955e476f9..ac0ed6c68eb 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -893,6 +893,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
estate->es_result_relations = (ResultRelInfo **)
palloc0(estate->es_range_table_size * sizeof(ResultRelInfo *));
estate->es_result_relations[rti - 1] = resultRelInfo;
+ estate->es_modified_relids = bms_add_member(estate->es_modified_relids,
+ rti);
/*
* Saving in the list allows to avoid needlessly traversing the whole
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f8053d9e572..1e3cd73cf27 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -678,6 +678,12 @@ typedef struct EState
* ExecDoInitialPruning() */
const char *es_sourceText; /* Source text from QueryDesc */
+ /*
+ * RT indexes of relations modified by the query through an
+ * UPDATE/DELETE/INSERT/MERGE or targeted by a SELECT FOR UPDATE.
+ */
+ Bitmapset *es_modified_relids;
+
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
/* If query can insert/delete tuples, the command ID to mark them with */
--
2.43.0
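The 0011 patch adds range-table indexes to es_modified_relids from two places: ExecInitResultRelation() for result relations, and the row-mark loop in InitPlan(). The sketch below mirrors those two sources. It substitutes a hypothetical single-word RelidSet for Bitmapset, so it only handles RT indexes 1 to 63. collect_modified_relids() is also hypothetical, since the patch updates the set inline at each site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: RT indexes 1..63 in one word. */
typedef uint64_t RelidSet;

static RelidSet
relid_set_add(RelidSet set, int rti)
{
    assert(rti >= 1 && rti <= 63);
    return set | (UINT64_C(1) << rti);
}

static bool
relid_set_is_member(RelidSet set, int rti)
{
    return rti >= 1 && rti <= 63 && (set & (UINT64_C(1) << rti)) != 0;
}

/*
 * Collect the RT indexes a query may modify: its result relations plus
 * every relation that has a row mark, mirroring the two call sites in
 * the patch.
 */
static RelidSet
collect_modified_relids(const int *result_rtis, int nresult,
                        const int *rowmark_rtis, int nrowmarks)
{
    RelidSet modified = 0;

    for (int i = 0; i < nresult; i++)
        modified = relid_set_add(modified, result_rtis[i]);
    for (int i = 0; i < nrowmarks; i++)
        modified = relid_set_add(modified, rowmark_rtis[i]);
    return modified;
}
```

Consider a query with result relation 1 and a row mark on relation 3. Relations 1 and 3 end up in the set. Relation 2 stays out, so its scan would be treated as read-only.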
[text/x-patch] v34-0012-Pass-down-information-on-table-modification-to-s.patch (24.8K, 13-v34-0012-Pass-down-information-on-table-modification-to-s.patch)
From 8205b2d7da0c3ad3cbc5cead336ced677996b37d Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:12:18 -0500
Subject: [PATCH v34 12/14] Pass down information on table modification to scan
node
Pass down information to sequential scan, index [only] scan, and bitmap
table scan nodes on whether or not the query modifies the relation being
scanned. A later commit will use this information to update the VM
during on-access pruning only if the relation is not modified by the
query.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/4379FDA3-9446-4E2C-9C15-32EFE8D4F31B%40yandex-team.ru
---
contrib/pgrowlocks/pgrowlocks.c | 2 +-
src/backend/access/brin/brin.c | 3 ++-
src/backend/access/gin/gininsert.c | 3 ++-
src/backend/access/heap/heapam_handler.c | 7 +++---
src/backend/access/index/genam.c | 4 ++--
src/backend/access/index/indexam.c | 6 +++---
src/backend/access/nbtree/nbtsort.c | 2 +-
src/backend/access/table/tableam.c | 7 +++---
src/backend/commands/constraint.c | 2 +-
src/backend/commands/copyto.c | 2 +-
src/backend/commands/tablecmds.c | 8 +++----
src/backend/commands/typecmds.c | 4 ++--
src/backend/executor/execIndexing.c | 2 +-
src/backend/executor/execReplication.c | 8 +++----
src/backend/executor/nodeBitmapHeapscan.c | 9 +++++++-
src/backend/executor/nodeIndexonlyscan.c | 9 +++++++-
src/backend/executor/nodeIndexscan.c | 18 ++++++++++++++--
src/backend/executor/nodeSeqscan.c | 26 ++++++++++++++++++++---
src/backend/partitioning/partbounds.c | 2 +-
src/backend/utils/adt/selfuncs.c | 2 +-
src/include/access/genam.h | 2 +-
src/include/access/heapam.h | 6 ++++++
src/include/access/tableam.h | 19 ++++++++++-------
23 files changed, 107 insertions(+), 46 deletions(-)
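Each scan node touched by this patch performs the same test. If the scan's scanrelid is not in es_modified_relids, the node passes SO_HINT_REL_READ_ONLY. The sketch below factors that test into one hypothetical helper. It then shows how the hint flows through table_beginscan_parallel(), where it is OR-ed with the fixed sequential-scan options, and through heapam_index_fetch_begin(), where it is inverted into modifies_base_rel. The flag names come from the patch, but the bit values here are placeholders. RelidSet is a hypothetical single-word stand-in for Bitmapset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration; the real ScanOptions enum is in tableam.h. */
enum
{
    SO_TYPE_SEQSCAN = 1 << 0,
    SO_ALLOW_STRAT = 1 << 1,
    SO_ALLOW_SYNC = 1 << 2,
    SO_ALLOW_PAGEMODE = 1 << 3,
    SO_HINT_REL_READ_ONLY = 1 << 4
};

/* Hypothetical stand-in for Bitmapset: RT indexes 1..63 in one word. */
typedef uint64_t RelidSet;

static bool
relid_set_is_member(RelidSet set, int rti)
{
    return rti >= 1 && rti <= 63 && (set & (UINT64_C(1) << rti)) != 0;
}

/* Hypothetical helper for the check each scan node performs inline. */
static uint32_t
scan_hint_flags(int scanrelid, RelidSet modified_relids)
{
    uint32_t flags = 0;

    if (!relid_set_is_member(modified_relids, scanrelid))
        flags = SO_HINT_REL_READ_ONLY;
    return flags;
}

/* As in table_beginscan_parallel(): caller hints plus the fixed options. */
static uint32_t
parallel_seqscan_flags(uint32_t flags)
{
    flags |= SO_TYPE_SEQSCAN |
        SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
    return flags;
}

/* As in heapam_index_fetch_begin(): the read-only hint is inverted. */
static bool
index_fetch_modifies_base_rel(uint32_t flags)
{
    return !(flags & SO_HINT_REL_READ_ONLY);
}
```

Factoring the test into one function is only a design option; the patch inlines it at each call site.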
diff --git a/contrib/pgrowlocks/pgrowlocks.c b/contrib/pgrowlocks/pgrowlocks.c
index f88269332b6..27f01d8055f 100644
--- a/contrib/pgrowlocks/pgrowlocks.c
+++ b/contrib/pgrowlocks/pgrowlocks.c
@@ -114,7 +114,7 @@ pgrowlocks(PG_FUNCTION_ARGS)
RelationGetRelationName(rel));
/* Scan the relation */
- scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scan = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
hscan = (HeapScanDesc) scan;
attinmeta = TupleDescGetAttInMetadata(rsinfo->setDesc);
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 6887e421442..4d9684b1b19 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -2843,7 +2843,8 @@ _brin_parallel_scan_and_build(BrinBuildState *state,
indexInfo->ii_Concurrent = brinshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromBrinShared(brinshared));
+ ParallelTableScanFromBrinShared(brinshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, true,
brinbuildCallbackParallel, state, scan);
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 0d63fb4ba27..f02d6df40a2 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -2059,7 +2059,8 @@ _gin_parallel_scan_and_build(GinBuildState *state,
indexInfo->ii_Concurrent = ginshared->isconcurrent;
scan = table_beginscan_parallel(heap,
- ParallelTableScanFromGinBuildShared(ginshared));
+ ParallelTableScanFromGinBuildShared(ginshared),
+ 0);
reltuples = table_index_build_scan(heap, index, indexInfo, true, progress,
ginBuildCallbackParallel, state, scan);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index da4ae236ca8..6ce4350c2c8 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -79,12 +79,13 @@ heapam_slot_callbacks(Relation relation)
*/
static IndexFetchTableData *
-heapam_index_fetch_begin(Relation rel)
+heapam_index_fetch_begin(Relation rel, uint32 flags)
{
IndexFetchHeapData *hscan = palloc0_object(IndexFetchHeapData);
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
}
@@ -753,7 +754,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +763,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL, 0);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
index a29be6f467b..5ac7d22e49f 100644
--- a/src/backend/access/index/genam.c
+++ b/src/backend/access/index/genam.c
@@ -447,7 +447,7 @@ systable_beginscan(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, irel,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
@@ -708,7 +708,7 @@ systable_beginscan_ordered(Relation heapRelation,
}
sysscan->iscan = index_beginscan(heapRelation, indexRelation,
- snapshot, NULL, nkeys, 0);
+ snapshot, NULL, nkeys, 0, 0);
index_rescan(sysscan->iscan, idxkey, nkeys, NULL, 0);
sysscan->scan = NULL;
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 4ed0508c605..4df56087841 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -257,7 +257,7 @@ index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys)
+ int nkeys, int norderbys, uint32 flags)
{
IndexScanDesc scan;
@@ -284,7 +284,7 @@ index_beginscan(Relation heapRelation,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heapRelation);
+ scan->xs_heapfetch = table_index_fetch_begin(heapRelation, flags);
return scan;
}
@@ -615,7 +615,7 @@ index_beginscan_parallel(Relation heaprel, Relation indexrel,
scan->instrument = instrument;
/* prepare to fetch index matches from table */
- scan->xs_heapfetch = table_index_fetch_begin(heaprel);
+ scan->xs_heapfetch = table_index_fetch_begin(heaprel, 0);
return scan;
}
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 90ab4e91b56..8ae54217f36 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1925,7 +1925,7 @@ _bt_parallel_scan_and_sort(BTSpool *btspool, BTSpool *btspool2,
indexInfo = BuildIndexInfo(btspool->index);
indexInfo->ii_Concurrent = btshared->isconcurrent;
scan = table_beginscan_parallel(btspool->heap,
- ParallelTableScanFromBTShared(btshared));
+ ParallelTableScanFromBTShared(btshared), 0);
reltuples = table_index_build_scan(btspool->heap, btspool->index, indexInfo,
true, progress, _bt_build_callback,
&buildstate, scan);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 87491796523..2ff29b6e40b 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -163,10 +163,11 @@ table_parallelscan_initialize(Relation rel, ParallelTableScanDesc pscan,
}
TableScanDesc
-table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan)
+table_beginscan_parallel(Relation relation, ParallelTableScanDesc pscan, uint32 flags)
{
Snapshot snapshot;
- uint32 flags = SO_TYPE_SEQSCAN |
+
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
Assert(RelFileLocatorEquals(relation->rd_locator, pscan->phs_locator));
@@ -248,7 +249,7 @@ table_index_fetch_tuple_check(Relation rel,
bool found;
slot = table_slot_create(rel, NULL);
- scan = table_index_fetch_begin(rel);
+ scan = table_index_fetch_begin(rel, 0);
found = table_index_fetch_tuple(scan, tid, snapshot, slot, &call_again,
all_dead);
table_index_fetch_end(scan);
diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index cc11c47b6f2..37cfbd63938 100644
--- a/src/backend/commands/constraint.c
+++ b/src/backend/commands/constraint.c
@@ -106,7 +106,7 @@ unique_key_recheck(PG_FUNCTION_ARGS)
*/
tmptid = checktid;
{
- IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation);
+ IndexFetchTableData *scan = table_index_fetch_begin(trigdata->tg_relation, 0);
bool call_again = false;
if (!table_index_fetch_tuple(scan, &tmptid, SnapshotSelf, slot,
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 4ab4a3893d5..4261baf4a41 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1156,7 +1156,7 @@ CopyRelationTo(CopyToState cstate, Relation rel, Relation root_rel, uint64 *proc
AttrMap *map = NULL;
TupleTableSlot *root_slot = NULL;
- scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL);
+ scandesc = table_beginscan(rel, GetActiveSnapshot(), 0, NULL, 0);
slot = table_slot_create(rel, NULL);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f976c0e5c7e..eb35dbbc853 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6378,7 +6378,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
* checking all the constraints.
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(oldrel, snapshot, 0, NULL);
+ scan = table_beginscan(oldrel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -13768,7 +13768,7 @@ validateForeignKeyConstraint(char *conname,
*/
snapshot = RegisterSnapshot(GetLatestSnapshot());
slot = table_slot_create(rel, NULL);
- scan = table_beginscan(rel, snapshot, 0, NULL);
+ scan = table_beginscan(rel, snapshot, 0, NULL, 0);
perTupCxt = AllocSetContextCreate(CurrentMemoryContext,
"validateForeignKeyConstraint",
@@ -22626,7 +22626,7 @@ MergePartitionsMoveRows(List **wqueue, List *mergingPartitions, Relation newPart
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(mergingPartition, snapshot, 0, NULL);
+ scan = table_beginscan(mergingPartition, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
@@ -23090,7 +23090,7 @@ SplitPartitionMoveRows(List **wqueue, Relation rel, Relation splitRel,
/* Scan through the rows. */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(splitRel, snapshot, 0, NULL);
+ scan = table_beginscan(splitRel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 288edb25f2f..34ff8d041f1 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -3179,7 +3179,7 @@ validateDomainNotNullConstraint(Oid domainoid)
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
@@ -3260,7 +3260,7 @@ validateDomainCheckConstraint(Oid domainoid, const char *ccbin, LOCKMODE lockmod
/* Scan all tuples in this relation */
snapshot = RegisterSnapshot(GetLatestSnapshot());
- scan = table_beginscan(testrel, snapshot, 0, NULL);
+ scan = table_beginscan(testrel, snapshot, 0, NULL, 0);
slot = table_slot_create(testrel, NULL);
while (table_scan_getnextslot(scan, ForwardScanDirection, slot))
{
diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 6ae0f959592..6d3e9d2f311 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -816,7 +816,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
retry:
conflict = false;
found_self = false;
- index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0);
+ index_scan = index_beginscan(heap, index, &DirtySnapshot, NULL, indnkeyatts, 0, 0);
index_rescan(index_scan, scankeys, indnkeyatts, NULL, 0);
while (index_getnext_slot(index_scan, ForwardScanDirection, existing_slot))
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 72f2bff7708..4c965a35e05 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -205,7 +205,7 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
skey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
/* Start an index scan. */
- scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, &snap, NULL, skey_attoff, 0, 0);
retry:
found = false;
@@ -383,7 +383,7 @@ RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
/* Start a heap scan. */
InitDirtySnapshot(snap);
- scan = table_beginscan(rel, &snap, 0, NULL);
+ scan = table_beginscan(rel, &snap, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
retry:
@@ -602,7 +602,7 @@ RelationFindDeletedTupleInfoSeq(Relation rel, TupleTableSlot *searchslot,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = table_beginscan(rel, SnapshotAny, 0, NULL);
+ scan = table_beginscan(rel, SnapshotAny, 0, NULL, 0);
scanslot = table_slot_create(rel, NULL);
table_rescan(scan, NULL);
@@ -666,7 +666,7 @@ RelationFindDeletedTupleInfoByIndex(Relation rel, Oid idxoid,
* not yet committed or those just committed prior to the scan are
* excluded in update_most_recent_deletion_info().
*/
- scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0);
+ scan = index_beginscan(rel, idxrel, SnapshotAny, NULL, skey_attoff, 0, 0);
index_rescan(scan, skey, skey_attoff, NULL, 0);
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index c68c26cbf38..1017676fce0 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -103,11 +103,18 @@ BitmapTableScanSetup(BitmapHeapScanState *node)
*/
if (!node->ss.ss_currentScanDesc)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
table_beginscan_bm(node->ss.ss_currentRelation,
node->ss.ps.state->es_snapshot,
0,
- NULL);
+ NULL,
+ flags);
}
node->ss.ss_currentScanDesc->st.rs_tbmiterator = tbmiterator;
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index c2d09374517..2fe724a323f 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -84,6 +84,12 @@ IndexOnlyNext(IndexOnlyScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index only scan is not parallel, or if we're
* serially executing an index only scan that was planned to be
@@ -94,7 +100,8 @@ IndexOnlyNext(IndexOnlyScanState *node)
estate->es_snapshot,
&node->ioss_Instrument,
node->ioss_NumScanKeys,
- node->ioss_NumOrderByKeys);
+ node->ioss_NumOrderByKeys,
+ flags);
node->ioss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index a616abff04c..8730dab7469 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -102,6 +102,12 @@ IndexNext(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -111,7 +117,8 @@ IndexNext(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
@@ -198,6 +205,12 @@ IndexNextWithReorder(IndexScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the index scan is not parallel, or if we're
* serially executing an index scan that was planned to be parallel.
@@ -207,7 +220,8 @@ IndexNextWithReorder(IndexScanState *node)
estate->es_snapshot,
&node->iss_Instrument,
node->iss_NumScanKeys,
- node->iss_NumOrderByKeys);
+ node->iss_NumOrderByKeys,
+ flags);
node->iss_ScanDesc = scandesc;
diff --git a/src/backend/executor/nodeSeqscan.c b/src/backend/executor/nodeSeqscan.c
index af3c788ce8b..336354922a2 100644
--- a/src/backend/executor/nodeSeqscan.c
+++ b/src/backend/executor/nodeSeqscan.c
@@ -65,13 +65,20 @@ SeqNext(SeqScanState *node)
if (scandesc == NULL)
{
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
/*
* We reach here if the scan is not parallel, or if we're serially
* executing a scan that was planned to be parallel.
*/
scandesc = table_beginscan(node->ss.ss_currentRelation,
estate->es_snapshot,
- 0, NULL);
+ 0, NULL, flags);
+
node->ss.ss_currentScanDesc = scandesc;
}
@@ -367,14 +374,20 @@ ExecSeqScanInitializeDSM(SeqScanState *node,
{
EState *estate = node->ss.ps.state;
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
pscan = shm_toc_allocate(pcxt->toc, node->pscan_len);
table_parallelscan_initialize(node->ss.ss_currentRelation,
pscan,
estate->es_snapshot);
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pscan);
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ estate->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
+
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation, pscan,
+ flags);
}
/* ----------------------------------------------------------------
@@ -404,8 +417,15 @@ ExecSeqScanInitializeWorker(SeqScanState *node,
ParallelWorkerContext *pwcxt)
{
ParallelTableScanDesc pscan;
+ uint32 flags = 0;
+
+ if (!bms_is_member(((Scan *) node->ss.ps.plan)->scanrelid,
+ node->ss.ps.state->es_modified_relids))
+ flags = SO_HINT_REL_READ_ONLY;
pscan = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
node->ss.ss_currentScanDesc =
- table_beginscan_parallel(node->ss.ss_currentRelation, pscan);
+ table_beginscan_parallel(node->ss.ss_currentRelation,
+ pscan,
+ flags);
}
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 0ca312ac27d..b7c4e6d1071 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -3362,7 +3362,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
econtext = GetPerTupleExprContext(estate);
snapshot = RegisterSnapshot(GetLatestSnapshot());
tupslot = table_slot_create(part_rel, &estate->es_tupleTable);
- scan = table_beginscan(part_rel, snapshot, 0, NULL);
+ scan = table_beginscan(part_rel, snapshot, 0, NULL, 0);
/*
* Switch to per-tuple memory context and reset it for each tuple
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 29fec655593..ac181853225 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7181,7 +7181,7 @@ get_actual_variable_endpoint(Relation heapRel,
index_scan = index_beginscan(heapRel, indexRel,
&SnapshotNonVacuumable, NULL,
- 1, 0);
+ 1, 0, 0);
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4c0429cc613..3934fa44793 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -156,7 +156,7 @@ extern IndexScanDesc index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
- int nkeys, int norderbys);
+ int nkeys, int norderbys, uint32 flags);
extern IndexScanDesc index_beginscan_bitmap(Relation indexRelation,
Snapshot snapshot,
IndexScanInstrumentation *instrument,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 05ef8d8cd5e..6d54781609a 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -119,6 +119,12 @@ typedef struct IndexFetchHeapData
Buffer xs_cbuf; /* current heap buffer in scan, if any */
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+
+ /*
+ * Some optimizations can only be performed if the query does not modify
+ * the underlying relation. Track that here.
+ */
+ bool modifies_base_rel;
} IndexFetchHeapData;
/* Result codes for HeapTupleSatisfiesVacuum */
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index c9fa9f259cd..fb2a54b010d 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -63,6 +63,8 @@ typedef enum ScanOptions
/* unregister snapshot at scan end? */
SO_TEMP_SNAPSHOT = 1 << 9,
+ /* set if the query doesn't modify the relation */
+ SO_HINT_REL_READ_ONLY = 1 << 10,
} ScanOptions;
/*
@@ -420,7 +422,7 @@ typedef struct TableAmRoutine
*
* Tuples for an index scan can then be fetched via index_fetch_tuple.
*/
- struct IndexFetchTableData *(*index_fetch_begin) (Relation rel);
+ struct IndexFetchTableData *(*index_fetch_begin) (Relation rel, uint32 flags);
/*
* Reset index fetch. Typically this will release cross index fetch
@@ -873,9 +875,9 @@ extern TupleTableSlot *table_slot_create(Relation relation, List **reglist);
*/
static inline TableScanDesc
table_beginscan(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_SEQSCAN |
+ flags |= SO_TYPE_SEQSCAN |
SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key, NULL, flags);
@@ -918,9 +920,9 @@ table_beginscan_strat(Relation rel, Snapshot snapshot,
*/
static inline TableScanDesc
table_beginscan_bm(Relation rel, Snapshot snapshot,
- int nkeys, ScanKeyData *key)
+ int nkeys, ScanKeyData *key, uint32 flags)
{
- uint32 flags = SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
+ flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
return rel->rd_tableam->scan_begin(rel, snapshot, nkeys, key,
NULL, flags);
@@ -1127,7 +1129,8 @@ extern void table_parallelscan_initialize(Relation rel,
* Caller must hold a suitable lock on the relation.
*/
extern TableScanDesc table_beginscan_parallel(Relation relation,
- ParallelTableScanDesc pscan);
+ ParallelTableScanDesc pscan,
+ uint32 flags);
/*
* Begin a parallel tid range scan. `pscan` needs to have been initialized
@@ -1163,9 +1166,9 @@ table_parallelscan_reinitialize(Relation rel, ParallelTableScanDesc pscan)
* Tuples for an index scan can then be fetched via table_index_fetch_tuple().
*/
static inline IndexFetchTableData *
-table_index_fetch_begin(Relation rel)
+table_index_fetch_begin(Relation rel, uint32 flags)
{
- return rel->rd_tableam->index_fetch_begin(rel);
+ return rel->rd_tableam->index_fetch_begin(rel, flags);
}
/*
--
2.43.0
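The flag plumbing in this patch can be summarized with a minimal standalone C sketch. It is not PostgreSQL code. The `sketch_*` functions are hypothetical stand-ins. Only `SO_TEMP_SNAPSHOT` (1 << 9) and `SO_HINT_REL_READ_ONLY` (1 << 10) take their bit positions from the tableam.h hunk above; the other enum values are assumed for illustration. The sketch shows three steps: the executor picks the hint based on whether the scanned relid is in `es_modified_relids`, `table_beginscan()`/`table_beginscan_bm()` OR that hint into their scan-type bits, and the heap AM stores the inverse as `modifies_base_rel`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Standalone mirror of part of ScanOptions. SO_TEMP_SNAPSHOT and
 * SO_HINT_REL_READ_ONLY match the patch; the other bit positions are
 * assumed for illustration only.
 */
enum
{
	SO_TYPE_SEQSCAN = 1 << 0,
	SO_TYPE_BITMAPSCAN = 1 << 1,
	SO_ALLOW_STRAT = 1 << 6,
	SO_ALLOW_SYNC = 1 << 7,
	SO_ALLOW_PAGEMODE = 1 << 8,
	SO_TEMP_SNAPSHOT = 1 << 9,
	SO_HINT_REL_READ_ONLY = 1 << 10,
};

/* Executor side: hint only if the scanrelid is not in es_modified_relids. */
uint32_t
sketch_scan_hint_flags(bool rel_is_modified)
{
	return rel_is_modified ? 0 : (uint32_t) SO_HINT_REL_READ_ONLY;
}

/* Like the patched table_beginscan(): OR caller hints with seqscan bits. */
uint32_t
sketch_seqscan_flags(uint32_t flags)
{
	/* Assumption of this sketch: callers pass only the hint bit (or 0). */
	assert((flags & ~(uint32_t) SO_HINT_REL_READ_ONLY) == 0);

	flags |= SO_TYPE_SEQSCAN | SO_ALLOW_STRAT | SO_ALLOW_SYNC |
		SO_ALLOW_PAGEMODE;
	return flags;
}

/* Like the patched table_beginscan_bm(). */
uint32_t
sketch_bitmapscan_flags(uint32_t flags)
{
	flags |= SO_TYPE_BITMAPSCAN | SO_ALLOW_PAGEMODE;
	return flags;
}

/* Like heapam_index_fetch_begin(): store the inverse of the hint. */
bool
sketch_modifies_base_rel(uint32_t flags)
{
	return !(flags & SO_HINT_REL_READ_ONLY);
}
```

Passing the hint as an ORed option bit, rather than as a separate bool parameter, lets call sites that do not care (such as `check_default_partition_contents()` and `get_actual_variable_endpoint()` above) simply pass `0`.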
[text/x-patch] v34-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch (10.9K, 14-v34-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch)
From cef0d7f13467597bdec9dacfb1d586b857c37137 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 3 Dec 2025 15:24:08 -0500
Subject: [PATCH v34 13/14] Allow on-access pruning to set pages all-visible
Many queries do not modify the underlying relation. For such queries, if
on-access pruning occurs during the scan, we can check whether the page
has become all-visible and update the visibility map accordingly.
Previously, only vacuum and COPY FREEZE marked pages as all-visible or
all-frozen.
This commit implements on-access VM setting for sequential scans as well
as for the underlying heap relation in index scans and bitmap heap
scans.
Author: Melanie Plageman <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Discussion: https://postgr.es/m/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com
---
src/backend/access/heap/heapam.c | 15 +++++++-
src/backend/access/heap/heapam_handler.c | 15 +++++++-
src/backend/access/heap/pruneheap.c | 38 ++++++++++++++++++-
src/include/access/heapam.h | 24 ++++++++++--
.../t/035_standby_logical_decoding.pl | 3 +-
5 files changed, 87 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1a3fa8a76aa..7d22549a290 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -617,6 +617,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
Buffer buffer = scan->rs_cbuf;
BlockNumber block = scan->rs_cblock;
Snapshot snapshot;
+ Buffer *vmbuffer = NULL;
Page page;
int lines;
bool all_visible;
@@ -631,7 +632,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
+ if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &scan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
@@ -1308,6 +1311,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
sizeof(TBMIterateResult));
}
+ scan->rs_vmbuffer = InvalidBuffer;
return (TableScanDesc) scan;
}
@@ -1346,6 +1350,12 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params,
scan->rs_cbuf = InvalidBuffer;
}
+ if (BufferIsValid(scan->rs_vmbuffer))
+ {
+ ReleaseBuffer(scan->rs_vmbuffer);
+ scan->rs_vmbuffer = InvalidBuffer;
+ }
+
/*
* SO_TYPE_BITMAPSCAN would be cleaned up here, but it does not hold any
* additional data vs a normal HeapScan
@@ -1378,6 +1388,9 @@ heap_endscan(TableScanDesc sscan)
if (BufferIsValid(scan->rs_cbuf))
ReleaseBuffer(scan->rs_cbuf);
+ if (BufferIsValid(scan->rs_vmbuffer))
+ ReleaseBuffer(scan->rs_vmbuffer);
+
/*
* Must free the read stream before freeing the BufferAccessStrategy.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6ce4350c2c8..c9e67514aea 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -85,6 +85,7 @@ heapam_index_fetch_begin(Relation rel, uint32 flags)
hscan->xs_base.rel = rel;
hscan->xs_cbuf = InvalidBuffer;
+ hscan->xs_vmbuffer = InvalidBuffer;
hscan->modifies_base_rel = !(flags & SO_HINT_REL_READ_ONLY);
return &hscan->xs_base;
@@ -100,6 +101,12 @@ heapam_index_fetch_reset(IndexFetchTableData *scan)
ReleaseBuffer(hscan->xs_cbuf);
hscan->xs_cbuf = InvalidBuffer;
}
+
+ if (BufferIsValid(hscan->xs_vmbuffer))
+ {
+ ReleaseBuffer(hscan->xs_vmbuffer);
+ hscan->xs_vmbuffer = InvalidBuffer;
+ }
}
static void
@@ -139,7 +146,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
* Prune page, but only if we weren't already on this page
*/
if (prev_buf != hscan->xs_cbuf)
- heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf);
+ heap_page_prune_opt(hscan->xs_base.rel, hscan->xs_cbuf,
+ hscan->modifies_base_rel ? NULL : &hscan->xs_vmbuffer);
}
/* Obtain share-lock on the buffer so we can examine visibility */
@@ -2488,6 +2496,7 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
TBMIterateResult *tbmres;
OffsetNumber offsets[TBM_MAX_TUPLES_PER_PAGE];
int noffsets = -1;
+ Buffer *vmbuffer = NULL;
Assert(scan->rs_flags & SO_TYPE_BITMAPSCAN);
Assert(hscan->rs_read_stream);
@@ -2534,7 +2543,9 @@ BitmapHeapScanNextBlock(TableScanDesc scan,
/*
* Prune and repair fragmentation for the whole page, if possible.
*/
- heap_page_prune_opt(scan->rs_rd, buffer);
+ if (scan->rs_flags & SO_HINT_REL_READ_ONLY)
+ vmbuffer = &hscan->rs_vmbuffer;
+ heap_page_prune_opt(scan->rs_rd, buffer, vmbuffer);
/*
* We must hold share lock on the buffer content while examining tuple
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index cc45728d25e..59f647b7f77 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -202,6 +202,8 @@ static bool heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits);
@@ -223,9 +225,13 @@ static TransactionId get_conflict_xid(bool do_prune, bool do_freeze, bool do_set
* if there's not any use in pruning.
*
* Caller must have pin on the buffer, and must *not* have a lock on it.
+ *
+ * If vmbuffer is not NULL, it is okay for pruning to set the visibility map if
+ * the page is all-visible. We will take care of pinning and, if needed,
+ * reading in the page of the visibility map.
*/
void
-heap_page_prune_opt(Relation relation, Buffer buffer)
+heap_page_prune_opt(Relation relation, Buffer buffer, Buffer *vmbuffer)
{
Page page = BufferGetPage(buffer);
TransactionId prune_xid;
@@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
.cutoffs = NULL,
};
+ if (vmbuffer)
+ {
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
+ params.vmbuffer = *vmbuffer;
+ }
+
heap_page_prune_and_freeze(¶ms, &presult, &dummy_off_loc,
NULL, NULL);
@@ -952,6 +965,9 @@ identify_and_fix_vm_corruption(Relation rel, Buffer heap_buffer,
* corrupted, it will fix them by clearing the VM bits and visibility hint.
* This does not need to be done in a critical section.
*
+ * This should be called only after do_freeze has been decided (and do_prune
+ * has been set), as these factor into our heuristic-based decision.
+ *
* Returns true if one or both VM bits should be set, along with returning the
* current value of the VM bits in *old_vmbits and the desired new value of
* the VM bits in *new_vmbits.
@@ -965,6 +981,8 @@ heap_page_will_set_vm(PruneState *prstate,
Relation relation,
BlockNumber heap_blk, Buffer heap_buffer, Page heap_page,
Buffer vmbuffer,
+ PruneReason reason,
+ bool do_prune, bool do_freeze,
int nlpdead_items,
uint8 *old_vmbits,
uint8 *new_vmbits)
@@ -986,6 +1004,22 @@ heap_page_will_set_vm(PruneState *prstate,
if (!prstate->all_visible)
return false;
+ /*
+ * If this is an on-access call and we're not actually pruning, avoid
+ * setting the visibility map if it would newly dirty the heap page or, if
+ * the page is already dirty, if doing so would require including a
+ * full-page image (FPI) of the heap page in the WAL. This situation
+ * should be rare, as on-access pruning is only attempted when
+ * pd_prune_xid is valid.
+ */
+ if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
+ (!BufferIsDirty(heap_buffer) || XLogCheckBufferNeedsBackup(heap_buffer)))
+ {
+ prstate->all_visible = false;
+ prstate->all_frozen = false;
+ return false;
+ }
+
*new_vmbits = VISIBILITYMAP_ALL_VISIBLE;
if (prstate->all_frozen)
@@ -1155,6 +1189,8 @@ heap_page_prune_and_freeze(PruneFreezeParams *params,
buffer,
page,
vmbuffer,
+ params->reason,
+ do_prune, do_freeze,
prstate.lpdead_items,
&old_vmbits,
&new_vmbits);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 6d54781609a..55ac478dd67 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -95,6 +95,13 @@ typedef struct HeapScanDescData
*/
ParallelBlockTableScanWorkerData *rs_parallelworkerdata;
+ /*
+ * For sequential scans and bitmap heap scans. If the relation is not
+ * being modified, on-access pruning may read in the current heap page's
+ * corresponding VM block to this buffer.
+ */
+ Buffer rs_vmbuffer;
+
/* these fields only used in page-at-a-time mode and for bitmap scans */
uint32 rs_cindex; /* current tuple's index in vistuples */
uint32 rs_ntuples; /* number of visible tuples on page */
@@ -117,8 +124,18 @@ typedef struct IndexFetchHeapData
{
IndexFetchTableData xs_base; /* AM independent part of the descriptor */
- Buffer xs_cbuf; /* current heap buffer in scan, if any */
- /* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
+ /*
+ * Current heap buffer in scan, if any. NB: if xs_cbuf is not
+ * InvalidBuffer, we hold a pin on that buffer.
+ */
+ Buffer xs_cbuf;
+
+ /*
+ * For index scans that do not modify the underlying heap table, on-access
+ * pruning may read in the current heap page's corresponding VM block to
+ * this buffer.
+ */
+ Buffer xs_vmbuffer;
/*
* Some optimizations can only be performed if the query does not modify
@@ -419,7 +436,8 @@ extern TransactionId heap_index_delete_tuples(Relation rel,
TM_IndexDeleteOp *delstate);
/* in heap/pruneheap.c */
-extern void heap_page_prune_opt(Relation relation, Buffer buffer);
+extern void heap_page_prune_opt(Relation relation, Buffer buffer,
+ Buffer *vmbuffer);
extern void heap_page_prune_and_freeze(PruneFreezeParams *params,
PruneFreezeResult *presult,
OffsetNumber *off_loc,
diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl
index d264a698ff6..a5536ba4ff6 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -296,6 +296,7 @@ wal_level = 'logical'
max_replication_slots = 4
max_wal_senders = 4
autovacuum = off
+hot_standby_feedback = on
});
$node_primary->dump_info;
$node_primary->start;
@@ -748,7 +749,7 @@ check_pg_recvlogical_stderr($handle,
$logstart = -s $node_standby->logfile;
reactive_slots_change_hfs_and_wait_for_xmins('shared_row_removal_',
- 'no_conflict_', 0, 1);
+ 'no_conflict_', 1, 0);
# This should not trigger a conflict
wait_until_vacuum_can_remove(
--
2.43.0
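The new early exit in `heap_page_will_set_vm()` can be read as a pure decision over a handful of inputs. The standalone C sketch below models only that step. It is a simplification, not the server code. `buffer_is_dirty` and `needs_fpi` stand in for `BufferIsDirty()` and `XLogCheckBufferNeedsBackup()`, `PruneReason` is reduced to two members, and `sketch_new_vmbits()` is a hypothetical helper. The real function also compares against the existing VM bits and checks for VM corruption; both are omitted here. A return value of 0 means "leave the VM alone".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

/* Reduced stand-in for PruneReason; only PRUNE_ON_ACCESS matters here. */
typedef enum
{
	PRUNE_ON_ACCESS,
	PRUNE_VACUUM_SCAN,
} PruneReason;

uint8_t
sketch_new_vmbits(PruneReason reason, bool all_visible, bool all_frozen,
				  bool do_prune, bool do_freeze,
				  bool buffer_is_dirty, bool needs_fpi)
{
	/* A page can only be all-frozen if it is also all-visible. */
	assert(!all_frozen || all_visible);

	if (!all_visible)
		return 0;

	/*
	 * On-access call that neither prunes nor freezes: don't newly dirty a
	 * clean heap page, and don't pay for a full-page image of a dirty one.
	 */
	if (reason == PRUNE_ON_ACCESS && !do_prune && !do_freeze &&
		(!buffer_is_dirty || needs_fpi))
		return 0;

	return VISIBILITYMAP_ALL_VISIBLE |
		(all_frozen ? VISIBILITYMAP_ALL_FROZEN : 0);
}
```

When pruning or freezing does happen, the heap page is dirtied and WAL-logged anyway, so setting the VM in the same step adds little extra cost. The sketch only restricts the case where the VM update would be the sole reason to touch the page.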
[text/x-patch] v34-0014-Set-pd_prune_xid-on-insert.patch (6.8K, 15-v34-0014-Set-pd_prune_xid-on-insert.patch)
From ba527a1cf0fff9f2b80d64d9d0f80888c6e5db66 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Tue, 29 Jul 2025 16:12:56 -0400
Subject: [PATCH v34 14/14] Set pd_prune_xid on insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Now that visibility map (VM) updates can occur during read-only queries,
it makes sense to also set the page’s pd_prune_xid hint during inserts.
This enables heap_page_prune_and_freeze() to run after a page is
filled with newly inserted tuples the first time it is read.
This change also addresses a long-standing note in heap_insert() and
heap_multi_insert(), which observed that setting pd_prune_xid would
help clean up aborted insertions sooner. Without it, such tuples might
linger until VACUUM, whereas now they can be pruned earlier.
The index killtuples test had to be updated to reflect a larger number
of hits by some accesses. Since the prune_xid is set by the fill/insert
step, on-access pruning can happen during the first access step (before
the DELETE). This is when the VM is extended. After the DELETE, the next
access hits the VM block instead of extending it. Thus, an additional
buffer hit is counted for the table.
Reviewed-by: Chao Li <[email protected]>
---
src/backend/access/heap/heapam.c | 25 +++++++++++++------
src/backend/access/heap/heapam_xlog.c | 15 ++++++++++-
.../modules/index/expected/killtuples.out | 6 ++---
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7d22549a290..afafc7e2cb2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2166,6 +2166,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
@@ -2225,15 +2226,19 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
}
/*
- * XXX Should we set PageSetPrunable on this page ?
+ * Set pd_prune_xid to trigger heap_page_prune_and_freeze() once the page
+ * is full so that we can set the page all-visible in the VM.
*
- * The inserting transaction may eventually abort thus making this tuple
- * DEAD and hence available for pruning. Though we don't want to optimize
- * for aborts, if no other tuple in this page is UPDATEd/DELETEd, the
- * aborted tuple will never be pruned until next vacuum is triggered.
+ * Setting pd_prune_xid is also handy if the inserting transaction
+ * eventually aborts making this tuple DEAD and hence available for
+ * pruning. If no other tuple in this page is UPDATEd/DELETEd, the aborted
+ * tuple would never otherwise be pruned until next vacuum is triggered.
*
- * If you do add PageSetPrunable here, add it in heap_xlog_insert too.
+ * Don't set it if we are in bootstrap mode, though.
*/
+ page = BufferGetPage(buffer);
+ if (TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
@@ -2243,7 +2248,6 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xl_heap_insert xlrec;
xl_heap_header xlhdr;
XLogRecPtr recptr;
- Page page = BufferGetPage(buffer);
uint8 info = XLOG_HEAP_INSERT;
int bufflags = 0;
@@ -2607,8 +2611,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
}
/*
- * XXX Should we set PageSetPrunable on this page ? See heap_insert()
+ * Set pd_prune_xid. See heap_insert() for more on why we do this when
+ * inserting tuples. This only makes sense if we aren't already
+ * setting the page frozen in the VM. We also don't set it in
+ * bootstrap mode.
*/
+ if (!all_frozen_set && TransactionIdIsNormal(xid))
+ PageSetPrunable(page, xid);
MarkBufferDirty(buffer);
diff --git a/src/backend/access/heap/heapam_xlog.c b/src/backend/access/heap/heapam_xlog.c
index 9a29fda3601..49cc83a6479 100644
--- a/src/backend/access/heap/heapam_xlog.c
+++ b/src/backend/access/heap/heapam_xlog.c
@@ -447,6 +447,12 @@ heap_xlog_insert(XLogReaderState *record)
freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ /*
+ * Set the page prunable to trigger on-access pruning later which may
+ * set the page all-visible in the VM.
+ */
+ PageSetPrunable(page, XLogRecGetXid(record));
+
PageSetLSN(page, lsn);
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
@@ -596,9 +602,16 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
- /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ /*
+ * XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible. If
+ * we are not setting the page frozen, then set the page's prunable
+ * hint so that we trigger on-access pruning later which may set the
+ * page all-visible in the VM.
+ */
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
+ else
+ PageSetPrunable(page, XLogRecGetXid(record));
MarkBufferDirty(buffer);
}
diff --git a/src/test/modules/index/expected/killtuples.out b/src/test/modules/index/expected/killtuples.out
index be7ddd756ef..b29f2434b00 100644
--- a/src/test/modules/index/expected/killtuples.out
+++ b/src/test/modules/index/expected/killtuples.out
@@ -54,7 +54,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -130,7 +130,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
@@ -283,7 +283,7 @@ step flush: SELECT FROM pg_stat_force_next_flush();
step result: SELECT heap_blks_read + heap_blks_hit - counter.heap_accesses AS new_heap_accesses FROM counter, pg_statio_all_tables WHERE relname = 'kill_prior_tuple';
new_heap_accesses
-----------------
- 1
+ 2
(1 row)
step measure: UPDATE counter SET heap_accesses = (SELECT heap_blks_read + heap_blks_hit FROM pg_statio_all_tables WHERE relname = 'kill_prior_tuple');
--
2.43.0
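The insert-side rule from this patch is small enough to sketch in isolation. The C below is a simplified model, not the server code. `SketchPageHeader` and the `sketch_*` functions are hypothetical. `sketch_page_set_prunable()` mimics `PageSetPrunable()` by keeping the oldest prunable XID and asserting that the XID is normal, which is consistent with the patch skipping bootstrap mode. `heap_insert()` corresponds to calling `sketch_after_insert()` with `all_frozen_set = false`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define BootstrapTransactionId ((TransactionId) 1)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Hypothetical page header reduced to the pd_prune_xid hint. */
typedef struct
{
	TransactionId pd_prune_xid;		/* oldest prunable XID, or 0 if none */
} SketchPageHeader;

/* Modulo-2^32 comparison for normal XIDs, as TransactionIdPrecedes() does. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

/* Like PageSetPrunable(): only ever move the hint to an older XID. */
void
sketch_page_set_prunable(SketchPageHeader *page, TransactionId xid)
{
	assert(TransactionIdIsNormal(xid));
	if (page->pd_prune_xid == InvalidTransactionId ||
		xid_precedes(xid, page->pd_prune_xid))
		page->pd_prune_xid = xid;
}

/*
 * Insert-side rule from the patch: heap_multi_insert() skips the hint when
 * it is already marking the page all-frozen, and neither path sets it for
 * the bootstrap XID.
 */
void
sketch_after_insert(SketchPageHeader *page, TransactionId xid,
					bool all_frozen_set)
{
	if (!all_frozen_set && TransactionIdIsNormal(xid))
		sketch_page_set_prunable(page, xid);
}
```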
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
2026-01-06 09:40 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andrey Borodin <[email protected]>
2026-01-06 17:31 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
2026-01-24 00:28 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
2026-01-28 23:16 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
@ 2026-01-29 05:00 ` Alexander Lakhin <[email protected]>
2026-01-29 13:39 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
1 sibling, 1 reply; 17+ messages in thread
From: Alexander Lakhin @ 2026-01-29 05:00 UTC (permalink / raw)
To: Melanie Plageman <[email protected]>; Andres Freund <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
Hello Melanie,
29.01.2026 01:16, Melanie Plageman wrote:
> Thanks for the review!
> I pushed v33 0001-0003 after incorporating your feedback.
The buildfarm animal scorpion has detected an instability of the addition
to pg_visibility from 21796c267 [1]:
80/82 postgresql:pg_visibility-running / pg_visibility-running/regress ERROR 7.23s exit status 1
diff -U3 /home/bf/bf-build/scorpion/HEAD/pgsql/contrib/pg_visibility/expected/pg_visibility.out /home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/pg_visibility-running/regress/results/pg_visibility.out
--- /home/bf/bf-build/scorpion/HEAD/pgsql/contrib/pg_visibility/expected/pg_visibility.out 2026-01-26 22:07:12.923378464 +0100
+++ /home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/pg_visibility-running/regress/results/pg_visibility.out 2026-01-28 20:15:13.802517085 +0100
@@ -213,7 +213,7 @@
select pg_visibility_map_summary('test_vac_unmodified_heap');
pg_visibility_map_summary
---------------------------
- (1,1)
+ (0,0)
(1 row)
-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
@@ -237,7 +237,7 @@
FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
?column?
----------
- t
+ f
(1 row)
-- vacuum sets the VM
I've managed to reproduce it locally with the attached patch and:
echo "autovacuum_naptime = 1" > /tmp/temp.config
TEMP_CONFIG=/tmp/temp.config make -s check -C contrib/pg_visibility/
...
ok 85 - pg_visibility 30 ms
not ok 86 - pg_visibility 165 ms
ok 87 - pg_visibility 36 ms
...
# 1 of 100 tests failed.
Could you please look at this?
You'll probably find [2] helpful.
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-01-28%2019%3A07%3A32
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1c64d2fcb
Best regards,
Alexander
Attachments:
[text/x-patch] debug-pg_visibility.patch (1.6K, 3-debug-pg_visibility.patch)
diff --git a/contrib/pg_visibility/Makefile b/contrib/pg_visibility/Makefile
index e5a74f32c48..8e3434a808e 100644
--- a/contrib/pg_visibility/Makefile
+++ b/contrib/pg_visibility/Makefile
@@ -11,7 +11,7 @@ DATA = pg_visibility--1.1.sql pg_visibility--1.1--1.2.sql \
PGFILEDESC = "pg_visibility - page visibility information"
EXTRA_INSTALL = contrib/pageinspect
-REGRESS = pg_visibility
+REGRESS = $(shell printf 'pg_visibility %.0s' `seq 100`)
TAP_TESTS = 1
ifdef USE_PGXS
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index e10f1706015..1e444558ac8 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,5 +1,8 @@
-CREATE EXTENSION pg_visibility;
-CREATE EXTENSION pageinspect;
+SET client_min_messages TO 'warning';
+CREATE EXTENSION IF NOT EXISTS pg_visibility;
+CREATE EXTENSION IF NOT EXISTS pageinspect;
+DROP TABLE IF EXISTS test_vac_unmodified_heap;
+RESET client_min_messages;
--
-- recently-dropped table
--
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 57af8a0c5b6..78fd52a0b73 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,5 +1,8 @@
-CREATE EXTENSION pg_visibility;
-CREATE EXTENSION pageinspect;
+SET client_min_messages TO 'warning';
+CREATE EXTENSION IF NOT EXISTS pg_visibility;
+CREATE EXTENSION IF NOT EXISTS pageinspect;
+DROP TABLE IF EXISTS test_vac_unmodified_heap;
+RESET client_min_messages;
--
-- recently-dropped table
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
2026-01-06 09:40 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andrey Borodin <[email protected]>
2026-01-06 17:31 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
2026-01-24 00:28 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
2026-01-28 23:16 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
2026-01-29 05:00 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Alexander Lakhin <[email protected]>
@ 2026-01-29 13:39 ` Kirill Reshke <[email protected]>
2026-01-29 13:51 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Kirill Reshke <[email protected]>
2026-01-29 15:16 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
0 siblings, 2 replies; 17+ messages in thread
From: Kirill Reshke @ 2026-01-29 13:39 UTC (permalink / raw)
To: Alexander Lakhin <[email protected]>; +Cc: Melanie Plageman <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
On Thu, 29 Jan 2026 at 10:00, Alexander Lakhin <[email protected]> wrote:
>
> Hello Melanie,
>
> 29.01.2026 01:16, Melanie Plageman wrote:
>
> Thanks for the review!
> I pushed v33 0001-0003 after incorporating your feedback.
>
>
> The buildfarm animal scorpion has detected an instability of the addition
> to pg_visibility from 21796c267 [1]:
>
> 80/82 postgresql:pg_visibility-running / pg_visibility-running/regress ERROR 7.23s exit status 1
>
> diff -U3 /home/bf/bf-build/scorpion/HEAD/pgsql/contrib/pg_visibility/expected/pg_visibility.out /home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/pg_visibility-running/regress/results/pg_visibility.out
> --- /home/bf/bf-build/scorpion/HEAD/pgsql/contrib/pg_visibility/expected/pg_visibility.out 2026-01-26 22:07:12.923378464 +0100
> +++ /home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/pg_visibility-running/regress/results/pg_visibility.out 2026-01-28 20:15:13.802517085 +0100
> @@ -213,7 +213,7 @@
> select pg_visibility_map_summary('test_vac_unmodified_heap');
> pg_visibility_map_summary
> ---------------------------
> - (1,1)
> + (0,0)
> (1 row)
>
> -- the checkpoint cleans the buffer dirtied by freezing the sole tuple
> @@ -237,7 +237,7 @@
> FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
> ?column?
> ----------
> - t
> + f
> (1 row)
>
> -- vacuum sets the VM
>
> I've managed to reproduce it locally with the attached and:
> echo "autovacuum_naptime = 1" > /tmp/temp.config
> TEMP_CONFIG=/tmp/temp.config make -s check -C contrib/pg_visibility/
> ...
> ok 85 - pg_visibility 30 ms
> not ok 86 - pg_visibility 165 ms
> ok 87 - pg_visibility 36 ms
> ...
> # 1 of 100 tests failed.
>
> Could you please look at this?
>
> Probably you'll find [2] helpful.
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-01-28%2019%3A07%3A32
> [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1c64d2fcb
>
> Best regards,
> Alexander
Thanks Alexander!
This is a good and detailed report; I was able to reproduce it.
I added some logging to my copy of postgres with your patch applied, and I
think the problem causing this test to fail is this sequence:
1) Autovacuum starts, does its work, and acquires xid = 118518
2) insert into test_vac_unmodified_heap values (1); executes and
commits with xid = 118519 (from my log)
3) VACUUM FREEZE starts and computes cutoff xid = 118518, because the
oldest xmin is 118518, from (1)
*and so we cannot freeze the tuple*
```
2026-01-29 13:27:44.559 UTC [133670] DEBUG: CommitTransaction(1)
name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid:
118519/1/0 (used)
...
2026-01-29 13:27:44.559 UTC [133672] DEBUG: CommitTransaction(1)
name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid:
118518/1/2
...
2026-01-29 13:27:44.560 UTC [133670] INFO: finished vacuuming
"contrib_regression.public.test_vac_unmodified_heap": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total), 0
eagerly scanned
tuples: 0 removed, 1 remain, 0 are dead but not yet removable
removable cutoff: 118518, which was 2 XIDs old when operation ended
new relfrozenxid: 118518, which is 1 XIDs ahead of previous value
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set
all-frozen (0 were all-visible)
index scan not needed: 0 pages from table (0.00% of total) had
0 dead item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 54.253 MB/s
buffer usage: 22 hits, 0 reads, 3 dirtied
```
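Step 3 hinges on a single XID comparison. The sketch below is a simplified model, not vacuum's actual freezing logic. `sketch_xmin_visible_to_all()` is hypothetical, and `xid_precedes()` reimplements the modulo-2^32 comparison that `TransactionIdPrecedes()` uses for normal XIDs. With the values from the log, the inserted tuple's xmin (118519) does not precede the removable cutoff (118518). The tuple therefore cannot be treated as visible to everyone, which matches "0 tuples frozen" and "0 pages set all-visible" above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)

/* "a is older than b" for normal XIDs, using modulo-2^32 arithmetic. */
bool
xid_precedes(TransactionId a, TransactionId b)
{
	assert(a >= FirstNormalTransactionId && b >= FirstNormalTransactionId);
	return (int32_t) (a - b) < 0;
}

/*
 * Simplified: a committed inserter's tuple can only be treated as visible
 * to everyone (and so be frozen, or let the page become all-visible) if its
 * xmin precedes the removable cutoff (OldestXmin).
 */
bool
sketch_xmin_visible_to_all(TransactionId xmin, TransactionId oldest_xmin)
{
	return xid_precedes(xmin, oldest_xmin);
}
```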
I have not come up with a fix yet, though.
--
Best regards,
Kirill Reshke
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
From: Kirill Reshke @ 2026-01-29 13:51 UTC (permalink / raw)
To: Alexander Lakhin <[email protected]>; +Cc: Melanie Plageman <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
On Thu, 29 Jan 2026 at 18:39, Kirill Reshke <[email protected]> wrote:
>
> On Thu, 29 Jan 2026 at 10:00, Alexander Lakhin <[email protected]> wrote:
> >
> > Hello Melanie,
> >
> > 29.01.2026 01:16, Melanie Plageman wrote:
> >
> > Thanks for the review!
> > I pushed v33 0001-0003 after incorporating your feedback.
> >
> >
> > The buildfarm animal scorpion has detected an instability of the addition
> > to pg_visibility from 21796c267 [1]:
> >
> > 80/82 postgresql:pg_visibility-running / pg_visibility-running/regress ERROR 7.23s exit status 1
> >
> > diff -U3 /home/bf/bf-build/scorpion/HEAD/pgsql/contrib/pg_visibility/expected/pg_visibility.out /home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/pg_visibility-running/regress/results/pg_visibility.out
> > --- /home/bf/bf-build/scorpion/HEAD/pgsql/contrib/pg_visibility/expected/pg_visibility.out 2026-01-26 22:07:12.923378464 +0100
> > +++ /home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/pg_visibility-running/regress/results/pg_visibility.out 2026-01-28 20:15:13.802517085 +0100
> > @@ -213,7 +213,7 @@
> > select pg_visibility_map_summary('test_vac_unmodified_heap');
> > pg_visibility_map_summary
> > ---------------------------
> > - (1,1)
> > + (0,0)
> > (1 row)
> >
> > -- the checkpoint cleans the buffer dirtied by freezing the sole tuple
> > @@ -237,7 +237,7 @@
> > FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
> > ?column?
> > ----------
> > - t
> > + f
> > (1 row)
> >
> > -- vacuum sets the VM
> >
> > I've managed to reproduce it locally with the attached and:
> > echo "autovacuum_naptime = 1" > /tmp/temp.config
> > TEMP_CONFIG=/tmp/temp.config make -s check -C contrib/pg_visibility/
> > ...
> > ok 85 - pg_visibility 30 ms
> > not ok 86 - pg_visibility 165 ms
> > ok 87 - pg_visibility 36 ms
> > ...
> > # 1 of 100 tests failed.
> >
> > Could you please look at this?
> >
> > Probably you'll find [2] helpful.
> >
> > [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-01-28%2019%3A07%3A32
> > [2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1c64d2fcb
> >
> > Best regards,
> > Alexander
>
>
> Thanks Alexander!
> This is a good and detailed report; I was able to reproduce it.
>
> I added some logging to my copy of postgres with your patch applied, and I
> think the problem causing this test to fail is this sequence:
>
> 1) Autovacuum starts, does its work, and acquires xid = 118518
> 2) insert into test_vac_unmodified_heap values (1); executes and
> commits with xid = 118519 (from my log)
> 3) vacuum freeze starts and computes cutoff xid = 118518, because the
> oldest xmin is 118518, from (1),
>
> *and so we cannot freeze the tuple*
>
>
> ```
>
> 2026-01-29 13:27:44.559 UTC [133670] DEBUG: CommitTransaction(1)
> name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid:
> 118519/1/0 (used)
> ...
> 2026-01-29 13:27:44.559 UTC [133672] DEBUG: CommitTransaction(1)
> name: unnamed; blockState: STARTED; state: INPROGRESS, xid/subid/cid:
> 118518/1/2
> ...
> 2026-01-29 13:27:44.560 UTC [133670] INFO: finished vacuuming
> "contrib_regression.public.test_vac_unmodified_heap": index scans: 0
> pages: 0 removed, 1 remain, 1 scanned (100.00% of total), 0
> eagerly scanned
> tuples: 0 removed, 1 remain, 0 are dead but not yet removable
> removable cutoff: 118518, which was 2 XIDs old when operation ended
> new relfrozenxid: 118518, which is 1 XIDs ahead of previous value
> frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
> visibility map: 0 pages set all-visible, 0 pages set
> all-frozen (0 were all-visible)
> index scan not needed: 0 pages from table (0.00% of total) had
> 0 dead item identifiers removed
> avg read rate: 0.000 MB/s, avg write rate: 54.253 MB/s
> buffer usage: 22 hits, 0 reads, 3 dirtied
>
>
> ```
>
> I have not come up with a fix yet, though.
>
> --
> Best regards,
> Kirill Reshke
One possible approach here is to remove the regression test changes made in
21796c267 and rewrite them as a TAP test. In a TAP test, we can do something
akin to 006_signal_autovacuum.pl:
```
# From this point, autovacuum worker will wait at startup.
$node->safe_psql('postgres',
"SELECT injection_points_attach('autovacuum-worker-start', 'wait');");
```
That way, autovacuum cannot acquire any XIDs while the test runs.
Thoughts?
--
Best regards,
Kirill Reshke
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
From: Melanie Plageman @ 2026-01-29 15:16 UTC (permalink / raw)
To: Kirill Reshke <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
On Thu, Jan 29, 2026 at 8:39 AM Kirill Reshke <[email protected]> wrote:
>
> Thanks Alexander!
> This is a good and detailed report; I was able to reproduce it.
Thanks to both of you for looking into it!
> I added some logging to my copy of postgres with your patch applied, and I
> think the problem causing this test to fail is this sequence:
>
> 1) Autovacuum starts, does its work, and acquires xid = 118518
So, in this scenario, is the issue that autovacuum runs before vacuum
freeze? If so, we can change the table DDL to:
create table test_vac_unmodified_heap(a int) with (autovacuum_enabled = false);
which would prevent autovacuum from running on it.
That is, unless there is some other way for one of the other tests to hold
OldestXmin back to before the xid of the insert, but I don't see how.
> 2) insert into test_vac_unmodified_heap values (1); executes and
> commits with xid = 118519 (from my log)
> 3) vacuum freeze starts and computes cutoff xid = 118518, because the
> oldest xmin is 118518, from (1),
>
> *and so we cannot freeze the tuple*
- Melanie
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
From: Kirill Reshke @ 2026-01-30 09:25 UTC (permalink / raw)
To: Melanie Plageman <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
On Thu, 29 Jan 2026 at 20:16, Melanie Plageman
<[email protected]> wrote:
>
> On Thu, Jan 29, 2026 at 8:39 AM Kirill Reshke <[email protected]> wrote:
> >
> > Thanks Alexander!
> > This is a good and detailed report; I was able to reproduce it.
>
> Thanks to both of you for looking into it!
>
> > I added some logging to my copy of postgres with your patch applied, and I
> > think the problem causing this test to fail is this sequence:
> >
> > 1) Autovacuum starts, does its work, and acquires xid = 118518
>
> So, in this scenario, is the issue that autovacuum runs before vacuum
> freeze? If so, we can change the table DDL to:
>
> create table test_vac_unmodified_heap(a int) with (autovacuum_enabled = false);
>
> which would prevent autovacuum from running on it.
>
> That is, unless there is some other way for one of the other tests to hold
> OldestXmin back to before the xid of the insert, but I don't see how.
>
> > 2) insert into test_vac_unmodified_heap values (1); executes and
> > commits with xid = 118519 (from my log)
> > 3) vacuum freeze starts and computes cutoff xid = 118518, because the
> > oldest xmin is 118518, from (1),
> >
> > *and so we cannot freeze the tuple*
>
> - Melanie
Sorry, I messed up my previous email.
> create table test_vac_unmodified_heap(a int) with (autovacuum_enabled = false);
Yes, I did try this, but it does not help, because autovacuum still runs on
catalog relations, which still causes the failure.
We cannot disable autovacuum globally in the regression suite, so I propose
changing this to a TAP test.
--
Best regards,
Kirill Reshke
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
From: Kirill Reshke @ 2026-01-30 09:59 UTC (permalink / raw)
To: Melanie Plageman <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
On Fri, 30 Jan 2026 at 14:25, Kirill Reshke <[email protected]> wrote:
>
> Sorry, I messed up my previous email.
>
> > create table test_vac_unmodified_heap(a int) with (autovacuum_enabled = false);
>
> Yes, I did try this, but it does not help, because autovacuum still runs on
> catalog relations, which still causes the failure.
>
> We cannot disable autovacuum globally in the regression suite, so I propose
> changing this to a TAP test.
>
PFA POC v1.
I use the 'autovacuum-worker-start' injection point to 'disable'
autovacuum until the test is done.
--
Best regards,
Kirill Reshke
Attachments:
[application/octet-stream] v1-0001-Reimplement-regression-tests-from-21796c267-as-TA.patch (8.0K, 2-v1-0001-Reimplement-regression-tests-from-21796c267-as-TA.patch)
From b0761ac76175b583f913ef2ff5994f467e194731 Mon Sep 17 00:00:00 2001
From: reshke <[email protected]>
Date: Fri, 30 Jan 2026 09:54:26 +0000
Subject: [PATCH v1] Reimplement regression tests from 21796c267 as TAP-test.
Regression test changes introduces in 21796c267 appears to be flakky
due to possible concurrent autovacuum activity. This actuvity can
allocate xids, preventing VACUUM FREEZE to actaully freeze page tuples
due to hilding oldest xmin.
So, reimplement this test as TAP-test, and prevent autovacuum workers
start with injection point.
This was actually detected by scorpion build animal.
Thanks to Alexander Lakhin for detailed report.
---
contrib/pg_visibility/Makefile | 4 +-
.../pg_visibility/expected/pg_visibility.out | 44 ------------
contrib/pg_visibility/meson.build | 4 ++
contrib/pg_visibility/sql/pg_visibility.sql | 20 ------
contrib/pg_visibility/t/003_vacuum_freeze.pl | 70 +++++++++++++++++++
5 files changed, 77 insertions(+), 65 deletions(-)
create mode 100644 contrib/pg_visibility/t/003_vacuum_freeze.pl
diff --git a/contrib/pg_visibility/Makefile b/contrib/pg_visibility/Makefile
index e5a74f32c48..7768bdf72d7 100644
--- a/contrib/pg_visibility/Makefile
+++ b/contrib/pg_visibility/Makefile
@@ -10,7 +10,9 @@ DATA = pg_visibility--1.1.sql pg_visibility--1.1--1.2.sql \
pg_visibility--1.0--1.1.sql
PGFILEDESC = "pg_visibility - page visibility information"
-EXTRA_INSTALL = contrib/pageinspect
+EXTRA_INSTALL=src/test/modules/injection_points contrib/pageinspect
+export enable_injection_points
+
REGRESS = pg_visibility
TAP_TESTS = 1
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index e10f1706015..09fa5933a35 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -1,5 +1,4 @@
CREATE EXTENSION pg_visibility;
-CREATE EXTENSION pageinspect;
--
-- recently-dropped table
--
@@ -205,49 +204,6 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
--- test the case where vacuum phase I does not need to modify the heap buffer
--- and only needs to set the VM
-create table test_vac_unmodified_heap(a int);
-insert into test_vac_unmodified_heap values (1);
-vacuum (freeze) test_vac_unmodified_heap;
-select pg_visibility_map_summary('test_vac_unmodified_heap');
- pg_visibility_map_summary
----------------------------
- (1,1)
-(1 row)
-
--- the checkpoint cleans the buffer dirtied by freezing the sole tuple
-checkpoint;
--- truncating the VM ensures that the next vacuum will need to set it
-select pg_truncate_visibility_map('test_vac_unmodified_heap');
- pg_truncate_visibility_map
-----------------------------
-
-(1 row)
-
-select pg_visibility_map_summary('test_vac_unmodified_heap');
- pg_visibility_map_summary
----------------------------
- (0,0)
-(1 row)
-
--- though the VM is truncated, the heap page-level visibility hint,
--- PD_ALL_VISIBLE should still be set
-SELECT (flags & x'0004'::int) <> 0
- FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
- ?column?
-----------
- t
-(1 row)
-
--- vacuum sets the VM
-vacuum test_vac_unmodified_heap;
-select pg_visibility_map_summary('test_vac_unmodified_heap');
- pg_visibility_map_summary
----------------------------
- (1,1)
-(1 row)
-
-- test copy freeze
create table copyfreeze (a int, b char(1500));
-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
diff --git a/contrib/pg_visibility/meson.build b/contrib/pg_visibility/meson.build
index 8a17050f2ac..1d2dd3ee572 100644
--- a/contrib/pg_visibility/meson.build
+++ b/contrib/pg_visibility/meson.build
@@ -34,9 +34,13 @@ tests += {
],
},
'tap': {
+ 'env': {
+ 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+ },
'tests': [
't/001_concurrent_transaction.pl',
't/002_corrupt_vm.pl',
+ 't/003_vacuum_freeze.pl',
],
},
}
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index 57af8a0c5b6..5af06ec5b76 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -1,5 +1,4 @@
CREATE EXTENSION pg_visibility;
-CREATE EXTENSION pageinspect;
--
-- recently-dropped table
@@ -95,25 +94,6 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
--- test the case where vacuum phase I does not need to modify the heap buffer
--- and only needs to set the VM
-create table test_vac_unmodified_heap(a int);
-insert into test_vac_unmodified_heap values (1);
-vacuum (freeze) test_vac_unmodified_heap;
-select pg_visibility_map_summary('test_vac_unmodified_heap');
--- the checkpoint cleans the buffer dirtied by freezing the sole tuple
-checkpoint;
--- truncating the VM ensures that the next vacuum will need to set it
-select pg_truncate_visibility_map('test_vac_unmodified_heap');
-select pg_visibility_map_summary('test_vac_unmodified_heap');
--- though the VM is truncated, the heap page-level visibility hint,
--- PD_ALL_VISIBLE should still be set
-SELECT (flags & x'0004'::int) <> 0
- FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));
--- vacuum sets the VM
-vacuum test_vac_unmodified_heap;
-select pg_visibility_map_summary('test_vac_unmodified_heap');
-
-- test copy freeze
create table copyfreeze (a int, b char(1500));
diff --git a/contrib/pg_visibility/t/003_vacuum_freeze.pl b/contrib/pg_visibility/t/003_vacuum_freeze.pl
new file mode 100644
index 00000000000..382539fb9c4
--- /dev/null
+++ b/contrib/pg_visibility/t/003_vacuum_freeze.pl
@@ -0,0 +1,70 @@
+
+# Copyright (c) 2026-2026, PostgreSQL Global Development Group
+
+# Check that vacuum phase I does not need to modify the heap buffer.
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+ plan skip_all => 'Injection points not supported by this build';
+}
+
+# Initialize the primary node
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->safe_psql('postgres', 'CREATE EXTENSION injection_points;');
+
+
+# From this point, autovacuum worker will wait at startup.
+$node->safe_psql('postgres',
+ "SELECT injection_points_attach('autovacuum-worker-start', 'wait');");
+
+# Create a sample table and run vacuum
+$node->safe_psql("postgres",
+ "CREATE EXTENSION pg_visibility;\n"
+ . "CREATE EXTENSION pageinspect;\n"
+ . "create table test_vac_unmodified_heap(a int);\n"
+ . "insert into test_vac_unmodified_heap values (1);\n"
+ . "vacuum (freeze) test_vac_unmodified_heap;");
+
+my $result = $node->safe_psql('postgres', qq(select pg_visibility_map_summary('test_vac_unmodified_heap');));
+like($result, qr/(1,1)/, 'pg_visibility_map_summary returned as expected');
+
+
+# truncating the VM ensures that the next vacuum will need to set it
+$node->safe_psql("postgres",
+ "CHECKPOINT;\n"
+ . "select pg_truncate_visibility_map('test_vac_unmodified_heap');\n");
+
+$result = $node->safe_psql('postgres', qq(
+ select pg_visibility_map_summary('test_vac_unmodified_heap');));
+like($result, qr/(0,0)/, 'page_header returned as expected');
+
+
+# though the VM is truncated, the heap page-level visibility hint,
+# PD_ALL_VISIBLE should still be set
+
+$result = $node->safe_psql('postgres', qq(
+ SELECT 'page flags is: '||(flags & x'0004'::int)::text FROM page_header(get_raw_page('test_vac_unmodified_heap', 0));));
+like($result, qr/page flags is: 4/, 'page_header returned as expected');
+
+
+# vacuum sets the VM
+$node->safe_psql("postgres",
+ "vacuum test_vac_unmodified_heap;\n");
+
+$result = $node->safe_psql('postgres', qq(
+ select pg_visibility_map_summary('test_vac_unmodified_heap');));
+like($result, qr/(1,1)/, 'page_header returned as expected');
+
+# Release injection point.
+$node->safe_psql('postgres',
+ "SELECT injection_points_detach('autovacuum-worker-start');");
+
+done_testing();
--
2.43.0
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
From: Andrey Borodin @ 2026-01-30 11:09 UTC (permalink / raw)
To: Kirill Reshke <[email protected]>; +Cc: Melanie Plageman <[email protected]>; Alexander Lakhin <[email protected]>; Andres Freund <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
Well, converting to TAP seems feasible to me. This will make the test more stable.
> On 30 Jan 2026, at 14:59, Kirill Reshke <[email protected]> wrote:
>
> I use the 'autovacuum-worker-start' injection point to 'disable'
> autovacuum until the test is done.
If you just set autovacuum off, the test will run even on buildfarm animals that have no injection points.
Also
x4mmm@x4mmm-osx postgres % git am ~/Downloads/v1-0001-Reimplement-regression-tests-from-21796c267-as-TA.patch
Applying: Reimplement regression tests from 21796c267 as TAP-test.
.git/rebase-apply/patch:148: trailing whitespace.
# Check that vacuum phase I does not need to modify the heap buffer.
warning: 1 line adds whitespace errors.
There are some typos: "flakky", "introduces", "actuvity", "actaully", "hilding".
Test descriptions are off: you used 'page_header returned as expected' when it's actually about pg_visibility_map_summary.
And, of course, the total lack of comments is not good.
However, in principle the approach seems good to me.
(FWIW, I'm looking into patches 0003-0005 of v33; I'll post when I find some nits.)
Best regards, Andrey Borodin.
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
From: Melanie Plageman @ 2026-01-30 21:36 UTC (permalink / raw)
To: Kirill Reshke <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; Andres Freund <[email protected]>; Andrey Borodin <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
On Fri, Jan 30, 2026 at 4:25 AM Kirill Reshke <[email protected]> wrote:
>
> > create table test_vac_unmodified_heap(a int) with (autovacuum_enabled = false);
>
> Yes, I did try this, but it does not help, because autovacuum still runs on
> catalog relations, which still causes the failure.
>
> We cannot disable autovacuum globally in the regression suite, so I propose
> changing this to a TAP test.
Andres suggested making the table a temp table. He said other sessions
vacuuming catalog tables shouldn't affect the temp table horizon. If
you try that in your repro, does it fix the problem?
- Melanie
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
From: Andres Freund @ 2026-02-20 21:34 UTC (permalink / raw)
To: Melanie Plageman <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
Hi,
On 2026-01-28 18:16:10 -0500, Melanie Plageman wrote:
> Subject: [PATCH v34 13/14] Allow on-access pruning to set pages all-visible
>
> Many queries do not modify the underlying relation. For such queries, if
> on-access pruning occurs during the scan, we can check whether the page
> has become all-visible and update the visibility map accordingly.
> Previously, only vacuum and COPY FREEZE marked pages as all-visible or
> all-frozen.
>
> This commit implements on-access VM setting for sequential scans as well
> as for the underlying heap relation in index scans and bitmap heap
> scans.
For evaluating this, did you build anything that measures how often this
succeeds and then causes unnecessary un-all-visibling, etc., during benchmarks?
> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> index 1a3fa8a76aa..7d22549a290 100644
> --- a/src/backend/access/heap/heapam.c
> +++ b/src/backend/access/heap/heapam.c
> @@ -617,6 +617,7 @@ heap_prepare_pagescan(TableScanDesc sscan)
> Buffer buffer = scan->rs_cbuf;
> BlockNumber block = scan->rs_cblock;
> Snapshot snapshot;
> + Buffer *vmbuffer = NULL;
> Page page;
> int lines;
> bool all_visible;
> @@ -631,7 +632,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
> /*
> * Prune and repair fragmentation for the whole page, if possible.
> */
> - heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
> + if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
> + vmbuffer = &scan->rs_vmbuffer;
> + heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
I don't love that heap_page_prune_opt() is told about this by passing
either vmbuffer or NULL.
We clearly don't want to actually freeze rows if we're doing an update and
might just update the rows again. But it's less clear to me that, if we are
pruning dead row versions *and* the page is already all-visible after that
(say because only HOT versions were removed), we shouldn't mark the page as
such?
> @@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
> .cutoffs = NULL,
> };
>
> + if (vmbuffer)
> + {
> + visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
> + params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
> + params.vmbuffer = *vmbuffer;
Why do we pin the buffer at this time, rather than deferring that until we
actually need it? I guess we currently always end up accessing it, but that
doesn't seem inherent (cf. my earlier points about a faster exit when looking
at an already all-frozen page or such).
It's not clear to me why we pin the page in lazy_scan_heap() before it's
clear that we need it, either. But there the cost is often very low,
because accesses are largely sequential. Here, however, we might be called
from an index scan, with very little locality of access.
Greetings,
Andres Freund
* Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
2026-01-06 09:40 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andrey Borodin <[email protected]>
2026-01-06 17:31 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
2026-01-24 00:28 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
2026-01-28 23:16 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Melanie Plageman <[email protected]>
2026-02-20 21:34 ` Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andres Freund <[email protected]>
@ 2026-03-03 00:04 ` Melanie Plageman <[email protected]>
From: Melanie Plageman @ 2026-03-03 00:04 UTC (permalink / raw)
To: Andres Freund <[email protected]>; +Cc: Andrey Borodin <[email protected]>; Kirill Reshke <[email protected]>; Chao Li <[email protected]>; Xuneng Zhou <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>; Heikki Linnakangas <[email protected]>
On Fri, Feb 20, 2026 at 4:34 PM Andres Freund <[email protected]> wrote:
>
> On 2026-01-28 18:16:10 -0500, Melanie Plageman wrote:
> > Subject: [PATCH v34 13/14] Allow on-access pruning to set pages all-visible
> >
> > Many queries do not modify the underlying relation. For such queries, if
> > on-access pruning occurs during the scan, we can check whether the page
> > has become all-visible and update the visibility map accordingly.
> > Previously, only vacuum and COPY FREEZE marked pages as all-visible or
> > all-frozen.
> >
> > This commit implements on-access VM setting for sequential scans as well
> > as for the underlying heap relation in index scans and bitmap heap
> > scans.
>
> For evaluating this, did you build anything that evaluates the frequency of
> this succeeding, causing unnecessary un-all-visibling etc during benchmarks?
I didn't develop a specific micro-benchmark for this, but I did run
some generic pgbench workloads (which do a single-tuple update on
accounts followed by a select) because I expected a good amount of
un-all-visibling there. I didn't gather stats to confirm, though, and
it's hard to say how much there was with a random data distribution
(IIRC it was a relatively small working set, but still). I can develop
something more targeted.
> > @@ -631,7 +632,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
> > /*
> > * Prune and repair fragmentation for the whole page, if possible.
> > */
> > - heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
> > + if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
> > + vmbuffer = &scan->rs_vmbuffer;
> > + heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
>
> I don't love that the signalling to heap_page_prune_opt() about this is by
> passing vmbuffer or NULL.
v35 is more explicit: heap_page_prune_opt() now takes a rel_read_only flag.
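The difference between inferring intent from a NULL pointer and passing it explicitly can be sketched as follows. The flag values, `ToyPruneParams`, `toy_scan_is_read_only()` and `toy_prune_params()` are hypothetical and do not reproduce the actual v35 signature of heap_page_prune_opt(); the sketch only assumes that the read-only hint is derived from the scan flags, as in the quoted hunk, and passed as a separate boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only; not PostgreSQL's definitions. */
#define TOY_SO_HINT_REL_READ_ONLY (1u << 0)
#define TOY_PRUNE_UPDATE_VM       (1u << 0)

typedef struct ToyPruneParams
{
    uint32_t options;    /* TOY_PRUNE_* bits */
    int     *vmbuffer;   /* where the caller keeps its VM pin, if any */
} ToyPruneParams;

/* Derive the read-only hint from the scan flags, as the quoted hunk does. */
static bool
toy_scan_is_read_only(uint32_t rs_flags)
{
    return (rs_flags & TOY_SO_HINT_REL_READ_ONLY) != 0;
}

/*
 * Intent is carried by rel_read_only, not by whether vmbuffer is NULL.
 * A read-only caller must still supply a slot to hold the VM pin.
 */
static ToyPruneParams
toy_prune_params(bool rel_read_only, int *vmbuffer)
{
    ToyPruneParams params = {0, vmbuffer};

    if (rel_read_only)
    {
        assert(vmbuffer != NULL);
        params.options |= TOY_PRUNE_UPDATE_VM;
    }
    return params;
}
```

The design point is that a reader of the call site sees the decision (`rel_read_only`) directly, rather than having to know that a non-NULL buffer pointer doubles as a request to update the VM.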
> We clearly don't want to actually freeze rows if we're doing an update and
> might just update the rows again. But it's less clear to me that, if we are
> pruning dead row versions *and* the page is already all-visible after that
> (say because only HOT versions were removed), we shouldn't mark the page as
> such?
If we're doing an update and the new tuple fits on the same page, then
the page will not be all-visible by the time the update is over,
right? And if the new tuple doesn't fit on the same page as the old
tuple, then while it would be nice to mark the old page as
all-visible, don't we prune the page on access before actually
updating the tuple? That is, we read in the old page to update it,
prune it on access at that point to make space for the new version,
and then make the page modification.
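The argument above reduces to a small truth table. In this hypothetical model (`toy_on_access_outcome()` and the `allow_for_updates` policy knob are illustrative names, not PostgreSQL code), the page pruned on access is assumed to be the same page whose tuple the UPDATE then modifies, so any VM bit set during that pruning is cleared again before the statement finishes.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyVmOutcome
{
    bool set_during_prune;   /* did on-access pruning set the VM bit? */
    bool set_at_end;         /* is it still set once the statement is done? */
} ToyVmOutcome;

/*
 * is_update:               the scan feeds an UPDATE of a tuple on this page
 * allow_for_updates:       hypothetical policy: also set the VM for updates
 * all_visible_after_prune: pruning left only tuples visible to everyone
 */
static ToyVmOutcome
toy_on_access_outcome(bool is_update, bool allow_for_updates,
                      bool all_visible_after_prune)
{
    ToyVmOutcome out = {false, false};

    out.set_during_prune = all_visible_after_prune &&
                           (!is_update || allow_for_updates);

    /*
     * The update then modifies the page (xmax on the old version, possibly
     * a new version on the same page), so it is no longer all-visible and
     * a bit set during pruning would have to be cleared again.
     */
    out.set_at_end = out.set_during_prune && !is_update;
    return out;
}
```

In this model, allowing updates to set the VM only adds a set-then-clear cycle on the same page; the read-only case is the only one where the bit survives.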
> > @@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
> > .cutoffs = NULL,
> > };
> >
> > + if (vmbuffer)
> > + {
> > + visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
> > + params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
> > + params.vmbuffer = *vmbuffer;
>
> Why do we pin the buffer at this time, rather than deferring that until we
> actually need it? I guess we will just always access it, but that doesn't
> seem inherent (cf. my earlier points about a faster exit when
> looking at an already all-frozen page or such).
We would need to pin the VM page to see whether the heap page is
all-frozen in order to exit early. For the on-access case, since we
won't freeze, we could rely on PD_ALL_VISIBLE to exit early, but then
we wouldn't be able to identify and fix PD_ALL_VISIBLE/VM-all-visible
mismatches.
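The trade-off between a header-only early exit and one that consults the VM can be sketched as below. All names are hypothetical and the two mismatch labels are this sketch's own; the point is only that a check based on PD_ALL_VISIBLE alone never reads the VM bit, so a page it skips can carry a mismatch that goes unnoticed.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum ToyMismatch
{
    TOY_MATCH,               /* page flag and VM bit agree */
    TOY_VM_BIT_MISSING,      /* PD_ALL_VISIBLE set, VM bit clear */
    TOY_VM_BIT_UNEXPECTED    /* VM bit set, PD_ALL_VISIBLE clear */
} ToyMismatch;

/* Needs both inputs, i.e. the VM page must be pinned to read vm_all_visible. */
static ToyMismatch
toy_classify(bool pd_all_visible, bool vm_all_visible)
{
    if (pd_all_visible == vm_all_visible)
        return TOY_MATCH;
    return pd_all_visible ? TOY_VM_BIT_MISSING : TOY_VM_BIT_UNEXPECTED;
}

/* Header-only early exit for on-access pruning: skip if the page flag is set. */
static bool
toy_header_only_exit(bool pd_all_visible)
{
    return pd_all_visible;
}

/*
 * True if a mismatch is skipped without ever being seen. Pages with the
 * flag clear are not skipped, so a later step that reads the VM could
 * still catch TOY_VM_BIT_UNEXPECTED there.
 */
static bool
toy_mismatch_missed(bool pd_all_visible, bool vm_all_visible)
{
    return toy_header_only_exit(pd_all_visible) &&
           toy_classify(pd_all_visible, vm_all_visible) != TOY_MATCH;
}
```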
> It's not clear to me why we are pinning the page in lazy_scan_heap() before
> it's clear that we need it, either. There the cost is often very low,
> because most accesses are sequential. But here we might be called
> from an index scan, with very little locality of access.
Now that, as of v35, we unconditionally check for VM corruption at the
start of heap_page_prune_and_freeze() and consult the VM to potentially
exit early, there's no benefit to deferring the VM pin in either
vacuum or on-access pruning.
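The ordering described above (a corruption check, then an early exit based on the VM) can be modeled as follows. This is a hypothetical simplification, not heap_page_prune_and_freeze(): `ToyVmBits`, `ToyPruneStats` and `toy_prune_and_freeze()` are illustrative names. Treating "VM bit set while PD_ALL_VISIBLE is clear" as the corruption case, and using all-frozen as the exit condition for vacuum (which may still freeze) versus all-visible for on-access pruning (which never freezes), are assumptions of this sketch. Because every path reads the VM bits first, every call needs the VM page pinned.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyVmBits
{
    bool all_visible;
    bool all_frozen;
} ToyVmBits;

typedef struct ToyPruneStats
{
    int vm_reads;           /* times the VM bits were consulted */
    int corruption_fixes;   /* VM bits cleared because the page flag was clear */
    int early_exits;
    int full_passes;
} ToyPruneStats;

static void
toy_prune_and_freeze(bool pd_all_visible, ToyVmBits *vm, bool is_vacuum,
                     ToyPruneStats *stats)
{
    /* Every call reads the VM first, so the VM page must already be pinned. */
    stats->vm_reads++;

    /* Assumed corruption case: VM says all-visible, page header does not. */
    if (vm->all_visible && !pd_all_visible)
    {
        vm->all_visible = false;
        vm->all_frozen = false;
        stats->corruption_fixes++;
    }

    /* Vacuum may still freeze, so it needs all-frozen to skip the page. */
    if (is_vacuum ? vm->all_frozen : vm->all_visible)
    {
        stats->early_exits++;
        return;
    }
    stats->full_passes++;
}
```

Since `vm_reads` is incremented on every path, deferring the pin until "later" in this model would only move it, not avoid it.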
- Melanie
Thread overview: 17+ messages
2026-01-06 09:40 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) Andrey Borodin <[email protected]>
2026-01-06 17:31 ` Melanie Plageman <[email protected]>
2026-01-07 05:55 ` Kirill Reshke <[email protected]>
2026-01-07 08:14 ` Chao Li <[email protected]>
2026-01-27 22:58 ` Melanie Plageman <[email protected]>
2026-01-24 00:28 ` Andres Freund <[email protected]>
2026-01-28 23:16 ` Melanie Plageman <[email protected]>
2026-01-29 05:00 ` Alexander Lakhin <[email protected]>
2026-01-29 13:39 ` Kirill Reshke <[email protected]>
2026-01-29 13:51 ` Kirill Reshke <[email protected]>
2026-01-29 15:16 ` Melanie Plageman <[email protected]>
2026-01-30 09:25 ` Kirill Reshke <[email protected]>
2026-01-30 09:59 ` Kirill Reshke <[email protected]>
2026-01-30 11:09 ` Andrey Borodin <[email protected]>
2026-01-30 21:36 ` Melanie Plageman <[email protected]>
2026-02-20 21:34 ` Andres Freund <[email protected]>
2026-03-03 00:04 ` Melanie Plageman <[email protected]>