public inbox for [email protected]  
help / color / mirror / Atom feed
From: Xuneng Zhou <[email protected]>
To: Andres Freund <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Subject: Re: Streamify more code paths
Date: Mon, 16 Mar 2026 11:45:05 +0800
Message-ID: <CABPTF7XFkUM9jm3cvcrr7sPAdmCVVW+PP=GfwUt3N90CeX3u1Q@mail.gmail.com> (raw)
In-Reply-To: <CABPTF7XE1AE2B2Jf_jzRhzPZpU3DZ+ZGCdzx27U55HP=f0vY1w@mail.gmail.com>
References: <CABPTF7VyePb8O-WDgs2hCCXYhZzGzdjg0N3NkxojZ=ke4SB3pA@mail.gmail.com>
	<CAN55FZ39HSsXKTSi66ASq+i4Ed5FuGXD11hmJ+8c0F0O0+ozew@mail.gmail.com>
	<CABPTF7Vd4JWSHi9N7pGTzn6xmOdtAToCe1NGbZAH8U9_mXOqpw@mail.gmail.com>
	<CABPTF7W-f_zPN442FCp4Xaopi721oDmGYimq=VhAk=F7jwYZDQ@mail.gmail.com>
	<CABPTF7VUaRnvsXqa+628YkuR4oPVRr1mR2seXTkxabfiqQ3NHw@mail.gmail.com>
	<CABPTF7VtSYmC5LZSnkJWYn9PCkxgOJd9QbtAM79qftBK-fbA4w@mail.gmail.com>
	<CABPTF7UVCkub6jFXVk-qrYd4xjgiwRt1FTFL2=rBVV9SYcgfkQ@mail.gmail.com>
	<[email protected]>
	<CABPTF7XD51Qx2043p80npKmYEd67qMagK5AW=s6LNXyZt5s2nw@mail.gmail.com>
	<6vfwrw7xci7pccrfcne5ekvjw5l2uvjsjzpmgi356h7y2rkojx@6ogr34zctcue>
	<r2lwb3342lwrvyoz4fuwctfgfx776rm22ttaytvlinbkarakmp@gujbzwverhra>
	<CABPTF7Us4JtnkDr31khm0=Jn8OxKbbhKC3A0SK9dS+pwDdWSjw@mail.gmail.com>
	<CABPTF7XE1AE2B2Jf_jzRhzPZpU3DZ+ZGCdzx27U55HP=f0vY1w@mail.gmail.com>

On Wed, Mar 11, 2026 at 11:00 PM Xuneng Zhou <[email protected]> wrote:
>
> Hi,
>
> On Wed, Mar 11, 2026 at 10:23 AM Xuneng Zhou <[email protected]> wrote:
> >
> > Hi,
> >
> > On Wed, Mar 11, 2026 at 8:16 AM Andres Freund <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > On 2026-03-10 19:27:59 -0400, Andres Freund wrote:
> > > > > > pgstattuple_large          base= 12429.3ms  patch= 11916.8ms   1.04x
> > > > > > (  4.1%)  (reads=206945->12983, io_time=6501.91->32.24ms)
> > > > >
> > > > > > pgstattuple_large          base= 12642.9ms  patch= 11873.5ms   1.06x
> > > > > > (  6.1%)  (reads=206945->12983, io_time=6516.70->143.46ms)
> > > > >
> > > > > Yeah, this looks somewhat strange. The io_time has been reduced
> > > > > significantly, which should also lead to a substantial reduction in
> > > > > runtime.
> > > >
> > > > It's possible that the bottleneck just moved, e.g to the checksum computation,
> > > > if you have data checksums enabled.
> > > >
> > > > It's also worth noting that likely each of the test reps measures
> > > > something different, as likely
> > > >   psql_run "$ROOT" "$PORT" -c "UPDATE heap_test SET data = data || '!' WHERE id % 5 = 0;"
> > > >
> > > > leads to some out-of-page updates.
> > > >
> > > > You're probably better off deleting some of the data in a transaction that is
> > > > then rolled back. That will also unset all-visible, but won't otherwise change
> > > > the layout, no matter how many test iterations you run.
> > > >
> > > >
> > > > I'd also guess that you're seeing a relatively small win because you're
> > > > updating every page. When reading every page from disk, the OS can do
> > > > efficient readahead.  If there are only occasional misses, that does not work.
> > >
> > > I think that last one is a big part - if I use
> > >   BEGIN; DELETE FROM heap_test WHERE id % 500 = 0; ROLLBACK;
> > > (which leaves a lot of
> > >
> > > I see much bigger wins due to the pgstattuple changes.
> > >
> > >                        time buffered          time DIO
> > > w/o read stream        2222.078 ms            2090.239 ms
> > > w   read stream         299.455 ms             155.124 ms
> > >
> > > That's with local storage. io_uring, but numbers with worker are similar.
> > >
> >
> > The results look great and interesting. This looks far better than
> > what I observed in my earlier tests. I’ll run perf for pgstattuple
> > without the switching to see what is keeping the CPU busy.
> >
> > --
> > Best,
> > Xuneng
>
> io_uring
> pgstattuple_large          base=  1090.6ms  patch=   143.3ms   7.61x
> ( 86.9%)  (reads=20049→20049, io_time=1040.80→46.91ms)
>
> I observed a similar magnitude of runtime reduction after switching to
> pg_buffercache_evict_relation() and using BEGIN; DELETE FROM heap_test
> WHERE id % 500 = 0; ROLLBACK. However, I lost the original flame
> graphs after running many performance tests. I will regenerate them
> and post them later.

In the original setup, UPDATE ... WHERE id % 5 = 0 touches a large
fraction of heap pages. On touched pages, we clears PD_ALL_VISIBLE and
the corresponding visibility-map bit, and UPDATE also creates new
tuple versions, adding tuple-chain/page churn. Since
pgstattuple_approx skips only pages still marked all-visible, it ends
up reading most heap pages. With shared-buffer eviction and OS
page-cache drop before timing, many of those reads are cold misses, so
runtime is spent a lot in the buffer-miss path as seen from the
original flamegraph.

The rollback-delete setup changes this in two ways. First, BEGIN;
DELETE ... WHERE id % 500 = 0; ROLLBACK; still clears all-visible
state on touched pages, but does not leave persistent successor tuple
versions the way UPDATE does. Pages are still modified (tuple
headers/page flags), but physical churn is lower. Second, the
predicate is much sparser, so fewer pages lose all-visible status, and
pgstattuple_approx reads fewer heap pages.

A warmup pass further stabilizes results by setting hint bits for
aborted xmax once up front, so later repetitions avoid repeating part
of transaction-status resolution during visibility checks.

With fewer non-all-visible pages, fewer cold misses, less
eviction/victim churn, we can observe more pronounced speed-up in the
improved version of pgstattuple test.

-- 
Best,
Xuneng


Attachments:

  [image/svg+xml] patch_pgstattuple_original_medium.svg (215.2K, 2-patch_pgstattuple_original_medium.svg)
  download | view image

  [image/svg+xml] patch_pgstattuple_medium.svg (145.2K, 3-patch_pgstattuple_medium.svg)
  download | view image

view thread (36+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Streamify more code paths
  In-Reply-To: <CABPTF7XFkUM9jm3cvcrr7sPAdmCVVW+PP=GfwUt3N90CeX3u1Q@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox