public inbox for [email protected]  
help / color / mirror / Atom feed
From: Chao Li <[email protected]>
To: Melanie Plageman <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Kirill Reshke <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date: Thu, 11 Dec 2025 12:06:57 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAAKRu_bvQiesumRhgvbcym1T9Z9=ddGfUbi-dSNxLRc6JvL1-w@mail.gmail.com>
References: <CAAKRu_ZP-3=SaZykpwDBMJOdUKyW3Wm5JZfPFRR3L5Ac8ouq4w@mail.gmail.com>
	<CAAKRu_bgkOQqu3K5n4YLRsNBZqJ9Rjg80ROqgKSr2UGz4b5hUg@mail.gmail.com>
	<2wk7jo4m4qwh5sn33pfgerdjfujebbccsmmlownybddbh6nawl@mdyyqpqzxjek>
	<CAAKRu_YR-COJ9aGnMUQqt5yoWmUBjikqrd4jGNZYouHaXpis9g@mail.gmail.com>
	<CALdSSPhiCwJwWwgJP1NmqRmnp9RS2tGOBY0gQrfLCbB+OS5_KQ@mail.gmail.com>
	<CAAKRu_YS+Ocm=OzMaZnG4egFiE9v4VYfZ25DXd6jbwegqmGYbQ@mail.gmail.com>
	<CAAKRu_ZGZSqhGt-RcmmfiSheC+1fjQdxy6_+oM-1jMn8hyVptQ@mail.gmail.com>
	<CALdSSPg+B8RTzTXhJvCcKJBqgzhPZkq0E2oqxQdv74ZNZOMVzg@mail.gmail.com>
	<CAAKRu_Zha7HcdQBv8tTtQrcry5332J6kHnOc1X=TT03LzUXDow@mail.gmail.com>
	<CAAKRu_amF00f2T_H8N6pbbe75C22EeX1OqA=svpj8LNO1sdUuw@mail.gmail.com>
	<mhf4vkmh3j57zx7vuxp4jagtdzwhu3573pgfpmnjwqa6i6yj5y@sy4ymcdtdklo>
	<CAAKRu_agCLAQ9OjmrTdJe-X=Xr7QnU4d=cfxdQGwc9jNx9w31w@mail.gmail.com>
	<[email protected]>
	<CAAKRu_bvkCZ+xBzBjujJMkA5STU+b6AtrUUTjcvAH=ZnnpTtzA@mail.gmail.com>
	<CAAKRu_bvQiesumRhgvbcym1T9Z9=ddGfUbi-dSNxLRc6JvL1-w@mail.gmail.com>



> On Dec 11, 2025, at 07:35, Melanie Plageman <[email protected]> wrote:
> 
> On Tue, Dec 9, 2025 at 12:48 PM Melanie Plageman
> <[email protected]> wrote:
>> 
>> In this set 0001 and 0002 are independent. 0003-0007 are all small
>> steps toward the single change in 0007 which combines the VM updates
>> into the same WAL record as pruning and freezing. 0008 and 0009 are
>> removing the rest of XLOG_HEAP2_VISIBLE. 0010 - 0012 are refactoring
>> needed to set the VM during on-access pruning. 0013 - 0015 are small
>> steps toward setting the VM on-access. And 0016 sets the prune xid on
>> insert so we may set the VM on-access for pages that have only new
>> data.
> 
> I committed 0001 and 0002. attached v25 reflects that.
> 0001-0004 refactoring steps for eliminate visible record from phase I
> (not probably independent commits in the end)
> 0005 eliminate XLOG_HEAP2_VISIBLE from phase I vac
> 0006-0007 removing the rest of XLOG_HEAP2_VISIBLE
> 0008-0010 refactoring for setting VM on-access
> 0011-0013 setting the VM on-access
> 0014 - setting pd_prune_xid on insert
> 
> - Melanie
> <v25-0001-Combine-visibilitymap_set-cases-in-lazy_scan_pru.patch><v25-0002-Refactor-lazy_scan_prune-VM-set-logic-into-helpe.patch><v25-0003-Set-the-VM-in-heap_page_prune_and_freeze.patch><v25-0004-Move-VM-assert-into-prune-freeze-code.patch><v25-0005-Eliminate-XLOG_HEAP2_VISIBLE-from-vacuum-phase-I.patch><v25-0006-Eliminate-XLOG_HEAP2_VISIBLE-from-empty-page-vac.patch><v25-0007-Remove-XLOG_HEAP2_VISIBLE-entirely.patch><v25-0008-Rename-GlobalVisTestIsRemovableXid-to-GlobalVisX.patch><v25-0009-Use-GlobalVisState-in-vacuum-to-determine-page-l.patch><v25-0011-Track-which-relations-are-modified-by-a-query.patch><v25-0012-Pass-down-information-on-table-modification-to-s.patch><v25-0013-Allow-on-access-pruning-to-set-pages-all-visible.patch><v25-0014-Set-pd_prune_xid-on-insert.patch>

A few more small comments. Sorry for keeping come out new comments. Actually I learned a lot about vacuum from reviewing this patch.

1 - 0001
```
+-- the checkpoint cleans the buffer dirtied by freezing the sole tuple
+checkpoint;
+-- truncating the VM ensures that the next vacuum will need to set it
+select pg_truncate_visibility_map('test_vac_unmodified_heap');
+-- vacuum sets the VM but does not need to set PD_ALL_VISIBLE so no heap page
+-- modification
+vacuum test_vac_unmodified_heap;
```

The last vacuum is expected to set vm bits, but the test doesn’t verify that. Should we verify that like:
```
evantest=# SELECT blkno, all_visible, all_frozen FROM pg_visibility_map('test_vac_unmodified_heap');
 blkno | all_visible | all_frozen
-------+-------------+------------
     0 | t           | t
(1 row)
```

As you have been using the extension pg_visibility, adding the verification with pg_visibility_map() should not be a burden.

2 - 0001
```
 		if (presult.all_frozen)
 		{
+			/*
+			 * We can pass InvalidTransactionId as our cutoff_xid, since a
+			 * snapshotConflictHorizon sufficient to make everything safe for
+			 * REDO was logged when the page's tuples were frozen.
+			 */
 			Assert(!TransactionIdIsValid(presult.vm_conflict_horizon));
-			flags |= VISIBILITYMAP_ALL_FROZEN;
+			new_vmbits |= VISIBILITYMAP_ALL_FROZEN;
 		}
```

The comment here is a little confusing. In the old code, the Assert() as immediately above the call visibilitymap_set(), and cutoff_xid is a parameter to the call. But the new code moves the Assert() as well as the comment far away from the call visibilitymap_set(), so I think the comment should stay together with the call of visibilitymap_set().

3 - 0002
```
 * If it finds that the page-level visibility hint or VM is corrupted, it will
* fix them by clearing the VM bits and visibility page hint. This does not
```

In the second line, “visibility page hint” is understandable but feels not quite good. I know it’s actually “page-level visibility hint”, so how about just “visibility hint”.

4 - 0002
```
 	/*
-	 * As of PostgreSQL 9.2, the visibility map bit should never be set if the
-	 * page-level bit is clear.  However, it's possible that the bit got
-	 * cleared after heap_vac_scan_next_block() was called, so we must recheck
-	 * with buffer lock before concluding that the VM is corrupt.
+	 * For the purposes of logging, count whether or not the page was newly
+	 * set all-visible and, potentially, all-frozen.
 	 */
-	else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
-			 visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)
+	if ((old_vmbits & VISIBILITYMAP_ALL_VISIBLE) == 0 &&
+		(new_vmbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
 	{
```

Without do_set_vm==true, old_vmbits will only be 0, thus this “if-elseif” that uses old_vmbits should be moved into “if (do_set_vm)”. From this perspective, if not do_set_vm, this function can return early, like:

```
Do_set_vm = heap_page_will_set_vm(&new_vmbits)
If (!do_set_vm)
   Return presult.ndeleted;

PageSetAllVisible(page);
MarkBufferDirty(buf);
old_vmbits = visibilitymap_set(new_vmbits);
If (old_vmbits..)
{
..
}
Else if (old_vmbits…)
{
…
}

Return presult.ndeleted;
```

5 - 0003
```
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
@@ -2076,15 +1979,14 @@ lazy_scan_prune(LVRelState *vacrel,
 				bool *vm_page_frozen)
 {
 	Relation	rel = vacrel->rel;
-	bool		do_set_vm = false;
-	uint8		new_vmbits = 0;
-	uint8		old_vmbits = 0;
 	PruneFreezeResult presult;
 	PruneFreezeParams params = {
 		.relation = rel,
 		.buffer = buf,
+		.vmbuffer = vmbuffer,
+		.blk_known_av = all_visible_according_to_vm,
 		.reason = PRUNE_VACUUM_SCAN,
-		.options = HEAP_PAGE_PRUNE_FREEZE,
+		.options = HEAP_PAGE_PRUNE_FREEZE | HEAP_PAGE_PRUNE_UPDATE_VM,
 		.vistest = vacrel->vistest,
 		.cutoffs = &vacrel->cutoffs,
 	};
```

This maybe a legacy bug. Here presult is not initialized, and it is immediately passed to heap_page_prune_and_freeze():

```
	heap_page_prune_and_freeze(&params,
							   &presult, <=== here
							   &vacrel->offnum,
							   &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid);
```

Then heap_page_prune_and_freeze() immediately calls prune_freeze_setup():
```
	/* Initialize prstate */
	prune_freeze_setup(params,
					   new_relfrozen_xid, new_relmin_mxid,
					   presult, &prstate);
```

And prune_freeze_setup() takes presult as a const pointer:
```
static void
prune_freeze_setup(PruneFreezeParams *params,
				   TransactionId *new_relfrozen_xid,
				   MultiXactId *new_relmin_mxid,
				   const PruneFreezeResult *presult, <=== here
				   PruneState *prstate)
{
    prstate->deadoffsets = (OffsetNumber *) presult->deadoffsets; <== here, presult->deadoffsets could be a random value
}
```

As this is a separate issue off the current patch, I just filed a new patch to fix it. Please take a look at:
https://www.postgresql.org/message-id/CAEoWx2%3DjiD1nqch4JQN%2BodAxZSD7mRvdoHUGJYN2r6tQG_66yQ%40mail...

6 - 0003
```
+ * Returns true if one or both VM bits should be set, along with returning the
+ * desired what bits should be set in the VM in *new_vmbits.
```

Looks like a typo: “returning the desired what bits should be set”, maybe change to “returning the desired bits to be set”.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









view thread (143+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox