Re: index prefetching - Andres Freund

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Andres Freund <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Mon, 25 Aug 2025 19:48:17 -0400
Message-ID: <y32ow7vfdrrcchxrwgfr5tbwg3ncm4f3rlp65svcbqorbm3tmn@cwvipbplnqyt> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<6wyxbnry2unm3kbcu2sabhzhs7baoedlg77xqm42chpofjq45g@igst42zpl7ok>
	<CAH2-WzntgDeopLJpyEbUh23Qr1vgoYv5jbFkYsymTScEKxBj7A@mail.gmail.com>
	<CAH2-WzkaTHg2X9R-gLRNBEoL82t2mkrQq-3f=y3GAzrj40fFZw@mail.gmail.com>
	<kvyser45imw3xmisfvpeoshisswazlzw35el3fq5zg73zblpql@f56enfj45nf7>
	<CAH2-WzkWNtCRTcUajGYrCkp9-+btteAthg21BzxbKV09AJuSrA@mail.gmail.com>
	<CAH2-WzkgkvbN_GqR+pfE7uKwhWxQ6h4jst7Rpjgrt68Vc1=FDA@mail.gmail.com>
	<5v2wuxg65l5e3s6uf373zskcqqoukmraxiucnvgn4t7b5cmeqx@5mhqsurdj6xn>
	<6butbqln6ewi5kuxz3kfv2mwomnlgtate4mb4lpa7gb2l63j4t@stlwbi2dvvev>
	<[email protected]>

Hi,

On 2025-08-25 15:00:39 +0200, Tomas Vondra wrote:
> Thanks. Based on the testing so far, the patch seems to be a substantial
> improvement. What's needed to make this prototype committable?

Mainly some testing infrastructure that can trigger this kind of stream. The
logic is too finnicky for me to commit it without that.


> I assume this is PG19+ improvement, right? It probably affects PG18 too,
> but it's harder to hit / the impact is not as bad as on PG19.

Yea. It does apply to 18 too, but I can't come up with realistic scenarios
where it's a real issue. I can repro a slowdown when using many parallel
seqscans with debug_io_direct=data - but that's even slower in 17...


> On a related note, my test that generates random datasets / queries, and
> compares index prefetching with different io_method values found a
> pretty massive difference between worker and io_uring. I wonder if this
> might be some issue in io_method=worker.

> while with index prefetching (with the aio prototype patch), it looks
> like this:
> 
>                                 QUERY PLAN
>   ----------------------------------------------------------------------
>    Index Scan using idx on t (actual rows=9048576.00 loops=1)
>      Index Cond: ((a >= 16150) AND (a <= 4540437))
>      Index Searches: 1
>      Prefetch Distance: 2.032
>      Prefetch Count: 868165
>      Prefetch Stalls: 2140228
>      Prefetch Skips: 6039906
>      Prefetch Resets: 0
>      Stream Ungets: 0
>      Stream Forwarded: 4
>      Prefetch Histogram: [2,4) => 855753, [4,8) => 12412
>      Buffers: shared hit=2577599 read=455610
>    Planning:
>      Buffers: shared hit=78 read=26 dirtied=1
>    Planning Time: 1.032 ms
>    Execution Time: 3150.578 ms
>   (16 rows)
> 
> So it's about 2x slower. The prefetch distance collapses, because
> there's a lot of cache hits (about 50% of requests seem to be hits of
> already visited blocks). I think that's a problem with how we adjust the
> distance, but I'll post about that separately.
> 
> Let's try to simply set io_method=io_uring:
> 
>                                 QUERY PLAN
>   ----------------------------------------------------------------------
>    Index Scan using idx on t  (actual rows=9048576.00 loops=1)
>      Index Cond: ((a >= 16150) AND (a <= 4540437))
>      Index Searches: 1
>      Prefetch Distance: 2.032
>      Prefetch Count: 868165
>      Prefetch Stalls: 2140228
>      Prefetch Skips: 6039906
>      Prefetch Resets: 0
>      Stream Ungets: 0
>      Stream Forwarded: 4
>      Prefetch Histogram: [2,4) => 855753, [4,8) => 12412
>      Buffers: shared hit=2577599 read=455610
>    Planning:
>      Buffers: shared hit=78 read=26
>    Planning Time: 2.212 ms
>    Execution Time: 1837.615 ms
>   (16 rows)
> 
> That's much closer to master (and the difference could be mostly noise).
> 
> I'm not sure what's causing this, but almost all regressions my script
> is finding look like this - always io_method=worker, with distance close
> to 2.0. Is this some inherent io_method=worker overhead?

I think what you might be observing might be the inherent IPC / latency
overhead of the worker based approach. This is particularly pronounced if the
workers are idle (and the CPU they get scheduled on is clocked down). The
latency impact of that is small, but if you never actually get to do much
readahead it can be visible.

Greetings,

Andres Freund

view thread (348+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: index prefetching
  In-Reply-To: <y32ow7vfdrrcchxrwgfr5tbwg3ncm4f3rlp65svcbqorbm3tmn@cwvipbplnqyt>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox