MIME-Version: 1.0
References: <6BC5DBAB-6084-4BB8-8450-52E9648AB021@gmail.com>
 <CAAKRu_ZCjHoRPfQ8AbMrFY8TOMCPAvZ0_m9SX7yg0edfTk45-g@mail.gmail.com>
 <7F5BCD7A-764D-4D8D-8E27-6F2CAAEA1CEE@gmail.com>
 <CAAKRu_a04jbDACwzRYwzDND31aPyf7Yvz9TAZrTr=+F5bK1aVA@mail.gmail.com>
 <CALdSSPjcv25jmXm29X-MRWZBae6+HwcWfVH1PE8NfD=EMTnkAg@mail.gmail.com>
 <CAAKRu_bwtBEzDwemyim1r6yYonw7FTyFr1HXG8vywCe-MdbPBQ@mail.gmail.com>
 <4379FDA3-9446-4E2C-9C15-32EFE8D4F31B@yandex-team.ru>
 <CAAKRu_YQd=2KvomM+RHcpeDKj0bq+peJ=3W-fip+pkvzA-Jq9w@mail.gmail.com>
 <7ib3sa55sapwjlaz4sijbiq7iezna27kjvvvar4dpgkmadml6t@gfpkkwmdnepx>
 <CAAKRu_bs+gZ83QDacmBxunPvCGnXJ05hxP2BDPJ3BGwdbGRXzg@mail.gmail.com>
 <j3g4ggo2f2bjrtbrxzf2rrypxzvb52u7p7etf2hnfrukqibytt@ruhpo4cg4ty7>
In-Reply-To: <j3g4ggo2f2bjrtbrxzf2rrypxzvb52u7p7etf2hnfrukqibytt@ruhpo4cg4ty7>
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Mon, 2 Mar 2026 19:04:24 -0500
Message-ID: 
 <CAAKRu_a1V7TUUYM7qO2c5Z-JyTKOsrryQBrk7Eu69ESzhqgd9w@mail.gmail.com>
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM
 on-access)
To: Andres Freund <andres@anarazel.de>
Cc: Andrey Borodin <x4mmm@yandex-team.ru>,
 Kirill Reshke <reshkekirill@gmail.com>,
	Chao Li <li.evan.chao@gmail.com>, Xuneng Zhou <xunengzhou@gmail.com>,
	Robert Haas <robertmhaas@gmail.com>,
	PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>,
 Heikki Linnakangas <hlinnaka@iki.fi>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CAAKRu_a1V7TUUYM7qO2c5Z-JyTKOsrryQBrk7Eu69ESzhqgd9w%40mail.gmail.com>
Precedence: bulk

On Fri, Feb 20, 2026 at 4:34 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2026-01-28 18:16:10 -0500, Melanie Plageman wrote:
> > Subject: [PATCH v34 13/14] Allow on-access pruning to set pages all-visible
> >
> > Many queries do not modify the underlying relation. For such queries, if
> > on-access pruning occurs during the scan, we can check whether the page
> > has become all-visible and update the visibility map accordingly.
> > Previously, only vacuum and COPY FREEZE marked pages as all-visible or
> > all-frozen.
> >
> > This commit implements on-access VM setting for sequential scans as well
> > as for the underlying heap relation in index scans and bitmap heap
> > scans.
>
> For evaluating this, did you build anything that evaluates the frequency of
> this succeeding, causing unnecessary un-all-visibling etc during benchmarks?

I didn't develop a specific micro-benchmark for this, but I did run
some generic pgbench runs (which do a single-tuple update on accounts
followed by a select) because I thought there would be a good amount
of un-all-visibling there. I didn't gather stats to confirm that,
though, and with a random data distribution it's hard to say (IIRC it
was a relatively small working set, but still). I can develop
something more targeted, though.

> > @@ -631,7 +632,9 @@ heap_prepare_pagescan(TableScanDesc sscan)
> >       /*
> >        * Prune and repair fragmentation for the whole page, if possible.
> >        */
> > -     heap_page_prune_opt(scan->rs_base.rs_rd, buffer);
> > +     if (sscan->rs_flags & SO_HINT_REL_READ_ONLY)
> > +             vmbuffer = &scan->rs_vmbuffer;
> > +     heap_page_prune_opt(scan->rs_base.rs_rd, buffer, vmbuffer);
>
> I don't love that the signalling to heap_page_prune_opt() about this is by
> passing vmbuffer or NULL.

v35 is more explicit and heap_page_prune_opt() has a rel_read_only flag.
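
As a rough illustration of the difference (a standalone toy, not code
from the patch set; every name below is a hypothetical stand-in and the
actual v35 signature may differ), the two signalling styles look
roughly like this:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a buffer handle; NOT PostgreSQL's Buffer type. */
typedef int ToyBuffer;

/* Hypothetical option bit, standing in for HEAP_PAGE_PRUNE_UPDATE_VM. */
#define TOY_PRUNE_UPDATE_VM 0x01

/*
 * Implicit style (as in the quoted v34 hunk): whether the VM may be
 * updated is inferred from whether the caller passed a vmbuffer at all.
 */
static int
toy_prune_options_implicit(ToyBuffer *vmbuffer)
{
    return (vmbuffer != NULL) ? TOY_PRUNE_UPDATE_VM : 0;
}

/*
 * Explicit style (assumed shape): the caller states its intent with a
 * flag, and the vmbuffer pointer is only a place to keep the pin.
 */
static int
toy_prune_options_explicit(ToyBuffer *vmbuffer, bool rel_read_only)
{
    (void) vmbuffer;            /* no longer carries any meaning */
    return rel_read_only ? TOY_PRUNE_UPDATE_VM : 0;
}
```

With the explicit form, a caller can hand over a vmbuffer without that
implying permission to set the VM.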

> We clearly don't want to actually freeze rows if we're doing an update and
> might just update the rows again. But it's less clear to me that, if we are
> pruning dead row versions *and* the page is already all-visible after that
> (say because only HOT versions were removed), we shouldn't mark the page as
> such?

If we're doing an update and the new tuple fits on the same page, then
the page will not be all-visible by the time the update is over,
right? And if the new tuple doesn't fit on the same page as the old
tuple, then while it would be nice to mark the old page as
all-visible, don't we on-access prune the page before actually
updating the tuple? That is, we read in the old page to update it,
on-access prune at that point to make space for the new tuple, and only
then make the page modification.

> > @@ -306,6 +312,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
> >                               .cutoffs = NULL,
> >                       };
> >
> > +                     if (vmbuffer)
> > +                     {
> > +                             visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
> > +                             params.options |= HEAP_PAGE_PRUNE_UPDATE_VM;
> > +                             params.vmbuffer = *vmbuffer;
>
> Why do we pin the buffer at this time, rather than deferring that until we
> actually need it?  I guess we just always will access it, but that doesn't
> seem like it's inherent (c.f. my earlier points about a faster exit when
> looking at an already all-frozen page or such).

We would need to pin the VM to see if it is all-frozen to exit early.
For the on-access case, since we won't freeze, we could rely on
PD_ALL_VISIBLE to exit early, but that means we wouldn't be able to
identify and fix PD_ALL_VISIBLE/VM-all-visible mismatches.
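
To make that trade-off concrete, here is a toy model (not PostgreSQL
code; the types and function names are hypothetical, and treating any
disagreement between the two bits as a "mismatch" is a simplification):

```c
#include <stdbool.h>

/* Toy page header: only the PD_ALL_VISIBLE-like hint bit. */
typedef struct ToyPage
{
    bool        pd_all_visible;
} ToyPage;

/* Toy VM entry, plus a counter of how often it was consulted. */
typedef struct ToyVM
{
    bool        all_visible;
    int         reads;
} ToyVM;

typedef enum
{
    TOY_EXIT_EARLY,             /* nothing to do for this page */
    TOY_FIX_MISMATCH,           /* page bit and VM bit disagree */
    TOY_PRUNE                   /* go on and prune */
} ToyAction;

/* Page-only check: needs no VM pin, but can never see a mismatch. */
static ToyAction
toy_check_page_only(const ToyPage *page)
{
    return page->pd_all_visible ? TOY_EXIT_EARLY : TOY_PRUNE;
}

/* Page-and-VM check: costs a VM lookup, but notices disagreement. */
static ToyAction
toy_check_page_and_vm(const ToyPage *page, ToyVM *vm)
{
    vm->reads++;
    if (page->pd_all_visible != vm->all_visible)
        return TOY_FIX_MISMATCH;
    return page->pd_all_visible ? TOY_EXIT_EARLY : TOY_PRUNE;
}
```

For a page whose hint bit is set while the VM bit is not, the page-only
check exits early and the disagreement goes unnoticed; the second
variant reports it, at the price of always touching the VM.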

> It's not clear to me why we are pinning the page in lazy_scan_heap(), before
> it's clear that we need it, either.  But there the cost is often very low,
> because we have a lot of sequential accesses.  But here we might be called
> from an index scan, with very little locality of access.

Now that, as of v35, we check for VM corruption unconditionally at the
start of heap_page_prune_and_freeze() and check the VM to potentially
exit early, there's no benefit in deferring pinning the VM in either
vacuum or on-access.

- Melanie