public inbox for [email protected]  
help / color / mirror / Atom feed
From: Peter Geoghegan <[email protected]>
To: Tomas Vondra <[email protected]>
To: Andres Freund <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Thu, 14 Aug 2025 12:56:21 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAH2-WzmdiO4fHA1O06SYUjgMQZG7haysY7Tu5DS5z-CHsv5MLQ@mail.gmail.com>
	<[email protected]>
	<CAH2-Wz=Y-PsC6_tZOPhHWvPx0geGnrh9VKjUZ-168ezUM_XM2Q@mail.gmail.com>
	<CA+hUKGKMaZLmNQHaa_DZMw9MJJKGegjrqnTY3KOZB-_nvFa3wQ@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAH2-Wzko86NwiENCJGtakJ=fOhWpr-Yz-F+1oxgv2Ku1mvXwvA@mail.gmail.com>
	<[email protected]>
	<c7a77pcyc5eynme376wvyojryijtlieyxsu3bvxp4eiy6au6uf@caniulyi4jr5>
	<CAH2-Wz=L7h-koDKa3_NEg39Faw7MrOkSVOsodvQ4toSQahvWjQ@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>

On Wed Aug 13, 2025 at 8:59 PM EDT, Tomas Vondra wrote:
> On 8/14/25 01:50, Peter Geoghegan wrote:
>> I first made the order of the table random, except among groups of index tuples
>> that have exactly the same value. Those will still point to the same 1 or 2 heap
>> blocks in virtually all cases, so we have "heap clustering without any heap
>> correlation" in the newly rewritten table.  To set things up this way, I first
>> made another index, and then clustered the table using that new index:

> Interesting. It's really surprising random I/O beats the sequential.

It should be noted that the effect seems to be limited to io_method=io_uring.
I find that with io_method=worker, the execution time of the original
"sequential heap access" backwards scan is very similar to the execution time
of the variant with the index that exhibits "heap clustering without any heap
correlation" (the variant where individual heap blocks appear in random order).

Benchmark that includes both io_uring and worker
================================================

I performed the usual procedure of prewarming the index and evicting the heap
relation, and then actually running the relevant query through EXPLAIN
ANALYZE. Direct I/O was used throughout.

io_method=worker
----------------

Original backwards scan: 1498.024 ms (shared read=48.080)
"No heap correlation" backwards scan: 1483.348 ms (shared read=22.036)

Original forwards scan: 656.884 ms (shared read=19.904)
"No heap correlation" forwards scan: 578.076 ms (shared read=10.159)

io_method=io_uring
------------------

Original backwards scan: 1052.807 ms (shared read=187.876)
"No heap correlation" backwards scan: 649.473 ms (shared read=365.802)

Original forwards scan: 593.126 ms (shared read=55.837)
"No heap correlation" forwards scan: 429.888 ms (shared read=188.619)

Summary
-------

As of this morning, io_method=io_uring also shows that the forwards scan is
faster with random heap accesses than without (not just the backwards scan).
I double-checked, to make sure that the effect was real; it seems to be.
I'm aware that some of these numbers (those for the original/sequential
forward scan case) don't match what I reported on Tuesday. I believe that
this is due to changes I made to my SSD's readahead using blockdev, though
it's possible that there's some other explanation. (In case it matters, I'm
running Debian unstable with liburing2 "2.9-1".)

The important point remains: at least with io_uring, the backwards scan query
is much faster with random I/O than it is with descending sequential I/O. It
might make sense if they were at least at parity, but clearly they're not.

-- 
Peter Geoghegan





view thread (348+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: index prefetching
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox