public inbox for [email protected]  
help / color / mirror / Atom feed
From: Tomas Vondra <[email protected]>
To: Peter Geoghegan <[email protected]>
To: Thomas Munro <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Tue, 12 Aug 2025 23:52:17 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<CAH2-WzmER9kc4OtmkDh+h51QV=v6Yc5BGsJikwJHtucf1C1HWw@mail.gmail.com>
	<[email protected]>
	<CAH2-Wz=0enySZ5g0k0BLY3tHRs=wyG=7yXDYP=Abt=6GM=7XkQ@mail.gmail.com>
	<[email protected]>
	<CAH2-WzkX2fwtiNOX4RrBR8=XKta999NM_5+ghTUnwUBkeyfcHQ@mail.gmail.com>
	<[email protected]>
	<CAH2-WzkPh+L2u8_4jG=NgGgzFNqW7ZZhSxGb6mJR=2YdouL1_Q@mail.gmail.com>
	<[email protected]>
	<xhe4l63oxs36sunvqbf3bzlzrondjyhumm7ywt4arkxu622f52@yriy7rejf5g2>
	<[email protected]>
	<CAH2-WzmdiO4fHA1O06SYUjgMQZG7haysY7Tu5DS5z-CHsv5MLQ@mail.gmail.com>
	<[email protected]>
	<CAH2-Wz=Y-PsC6_tZOPhHWvPx0geGnrh9VKjUZ-168ezUM_XM2Q@mail.gmail.com>
	<CA+hUKGKMaZLmNQHaa_DZMw9MJJKGegjrqnTY3KOZB-_nvFa3wQ@mail.gmail.com>
	<[email protected]>


On 8/12/25 23:22, Peter Geoghegan wrote:
> ...
>
> It looks like the patch does significantly better with the forwards scan,
> compared to the backwards scan (though both are improved by a lot).  But that's
> not the main thing about these results that I find interesting.
> 
> The really odd thing is that we get "shared hit=6619 read=49933" for the
> forwards scan, and "shared hit=10350 read=49933" for the backwards scan.  The
> latter matches master (regardless of the scan direction used on master), while
> the former just looks wrong.  What explains the "missing buffer hits" seen with
> the forwards scan?
> 
> Discrepancies
> -------------
> 
> All 4 query executions agree that "rows=1048576.00", so the patch doesn't appear
> to simply be broken/giving wrong answers.  Might it be that the "Buffers"
> instrumentation is broken?
> 

I think a bug in the prefetch patch is more likely. I tried with a patch
that adds various prefetch-related counters to explain, and I see this:


test=# EXPLAIN (ANALYZE, VERBOSE, COSTS OFF) SELECT * FROM t WHERE a
BETWEEN 16336 AND 49103 ORDER BY a;

                                QUERY PLAN
------------------------------------------------------------------------
 Index Scan using idx on public.t (actual time=0.682..527.055
rows=1048576.00 loops=1)
   Output: a, b
   Index Cond: ((t.a >= 16336) AND (t.a <= 49103))
   Index Searches: 1
   Prefetch Distance: 271.263
   Prefetch Count: 60888
   Prefetch Stalls: 1
   Prefetch Skips: 991211
   Prefetch Resets: 3
   Prefetch Histogram: [2,4) => 2, [4,8) => 8, [8,16) => 17, [16,32) =>
24, [32,64) => 34, [64,128) => 52, [128,256) => 82, [256,512) => 60669
   Buffers: shared hit=5027 read=50872
   I/O Timings: shared read=33.528
 Planning:
   Buffers: shared hit=78 read=23
   I/O Timings: shared read=2.349
 Planning Time: 3.686 ms
 Execution Time: 559.659 ms
(17 rows)


test=# EXPLAIN (ANALYZE, VERBOSE, COSTS OFF) SELECT * FROM t WHERE a
BETWEEN 16336 AND 49103 ORDER BY a DESC;
                                QUERY PLAN
------------------------------------------------------------------------
 Index Scan Backward using idx on public.t (actual time=1.110..4116.201
rows=1048576.00 loops=1)
   Output: a, b
   Index Cond: ((t.a >= 16336) AND (t.a <= 49103))
   Index Searches: 1
   Prefetch Distance: 271.061
   Prefetch Count: 118806
   Prefetch Stalls: 1
   Prefetch Skips: 962515
   Prefetch Resets: 3
   Prefetch Histogram: [2,4) => 2, [4,8) => 7, [8,16) => 12, [16,32) =>
17, [32,64) => 24, [64,128) => 3, [128,256) => 4, [256,512) => 118737
   Buffers: shared hit=30024 read=50872
   I/O Timings: shared read=581.353
 Planning:
   Buffers: shared hit=82 read=23
   I/O Timings: shared read=3.168
 Planning Time: 4.289 ms
 Execution Time: 4185.407 ms
(17 rows)

These two parts are interesting:

   Prefetch Count: 60888
   Prefetch Skips: 991211

   Prefetch Count: 118806
   Prefetch Skips: 962515

It looks like the backwards scan skips fewer blocks. This is based on
the lastBlock optimization, i.e. looking for runs of the same block
number. I don't quite see why would it affect just the backwards scan,
though. Seems weird.

> The premise of my original complaint was that big inconsistencies in performance
> shouldn't happen between similar forwards and backwards scans (at least not with
> direct I/O).  I now have serious doubts about that premise, since it looks like
> OS readahead remains a big factor with direct I/O.  Did I just miss something
> obvious?
> 

I don't think you missed anything. It does seem the assumption relies on
the OS handling the underlying I/O patterns equally, and unfortunately
that does not seem to be the case. Maybe we could "invert" the data set,
i.e. make it "descending" instead of "ascending"? That would make the
heap access direction "forward" again ...

regards

-- 
Tomas Vondra






view thread (348+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: index prefetching
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox