public inbox for [email protected]
help / color / mirror / Atom feedFrom: Andres Freund <[email protected]>
To: Peter Geoghegan <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Fri, 15 Aug 2025 11:09:41 -0400
Message-ID: <6butbqln6ewi5kuxz3kfv2mwomnlgtate4mb4lpa7gb2l63j4t@stlwbi2dvvev> (raw)
In-Reply-To: <5v2wuxg65l5e3s6uf373zskcqqoukmraxiucnvgn4t7b5cmeqx@5mhqsurdj6xn>
References: <[email protected]>
<CAH2-WzmuGzTH-62EWTgQ4F66XEBJtJk25psF4GDuAGqeC4a34g@mail.gmail.com>
<[email protected]>
<6wyxbnry2unm3kbcu2sabhzhs7baoedlg77xqm42chpofjq45g@igst42zpl7ok>
<CAH2-WzntgDeopLJpyEbUh23Qr1vgoYv5jbFkYsymTScEKxBj7A@mail.gmail.com>
<CAH2-WzkaTHg2X9R-gLRNBEoL82t2mkrQq-3f=y3GAzrj40fFZw@mail.gmail.com>
<kvyser45imw3xmisfvpeoshisswazlzw35el3fq5zg73zblpql@f56enfj45nf7>
<CAH2-WzkWNtCRTcUajGYrCkp9-+btteAthg21BzxbKV09AJuSrA@mail.gmail.com>
<CAH2-WzkgkvbN_GqR+pfE7uKwhWxQ6h4jst7Rpjgrt68Vc1=FDA@mail.gmail.com>
<5v2wuxg65l5e3s6uf373zskcqqoukmraxiucnvgn4t7b5cmeqx@5mhqsurdj6xn>
Hi,
On 2025-08-14 19:36:49 -0400, Andres Freund wrote:
> On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote:
> > On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan <[email protected]> wrote:
> > > > We can optimize that by deferring the StartBufferIO() if we're encountering a
> > > > buffer that is undergoing IO, at the cost of some complexity. I'm not sure
> > > > real-world queries will often encounter the pattern of the same block being
> > > > read in by a read stream multiple times in close proximity sufficiently often
> > > > to make that worth it.
> > >
> > > We definitely need to be prepared for duplicate prefetch requests in
> > > the context of index scans.
> >
> > Can you (or anybody else) think of a quick and dirty way of working
> > around the problem on the read stream side? I would like to prioritize
> > getting the patch into a state where its overall performance profile
> > "feels right". From there we can iterate on fixing the underlying
> > issues in more principled ways.
>
> I think I can see a way to fix the issue, below read stream. Basically,
> whenever AsyncReadBuffers() finds a buffer that has ongoing IO, instead of
> waiting, as we do today, copy the wref to the ReadBuffersOperation() and set a
> new flag indicating that we are waiting for an IO that was not started by the
> wref. Then, in WaitReadBuffers(), we wait for such foreign started IOs. That
> has to be somewhat different code from today, because we have to deal with the
> fact of the "foreign" IO potentially having failed.
>
> I'll try writing a prototype for that tomorrow. I think to actually get that
> into a committable shape we need a test harness (probably a read stream
> controlled by an SQL function that gets an array of buffers).
Attached is a prototype of this approach. It does seem to fix this issue.
New code disabled:
#### backwards sequential table ####
┌──────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────┤
│ Index Scan Backward using t_pk on t (actual rows=1048576.00 loops=1) │
│ Index Cond: ((a >= 16336) AND (a <= 49103)) │
│ Index Searches: 1 │
│ Buffers: shared hit=10291 read=49933 │
│ I/O Timings: shared read=213.277 │
│ Planning: │
│ Buffers: shared hit=91 read=19 │
│ I/O Timings: shared read=2.124 │
│ Planning Time: 3.269 ms │
│ Execution Time: 1023.279 ms │
└──────────────────────────────────────────────────────────────────────┘
(10 rows)
New code enabled:
#### backwards sequential table ####
┌──────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────┤
│ Index Scan Backward using t_pk on t (actual rows=1048576.00 loops=1) │
│ Index Cond: ((a >= 16336) AND (a <= 49103)) │
│ Index Searches: 1 │
│ Buffers: shared hit=10291 read=49933 │
│ I/O Timings: shared read=217.225 │
│ Planning: │
│ Buffers: shared hit=91 read=19 │
│ I/O Timings: shared read=2.009 │
│ Planning Time: 2.685 ms │
│ Execution Time: 602.987 ms │
└──────────────────────────────────────────────────────────────────────┘
(10 rows)
With the change enabled, the sequential query is faster than the random query:
#### backwards random table ####
┌────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan Backward using t_randomized_pk on t_randomized (actual rows=1048576.00 loops=1) │
│ Index Cond: ((a >= 16336) AND (a <= 49103)) │
│ Index Searches: 1 │
│ Buffers: shared hit=6085 read=77813 │
│ I/O Timings: shared read=347.285 │
│ Planning: │
│ Buffers: shared hit=127 read=5 │
│ I/O Timings: shared read=1.001 │
│ Planning Time: 1.751 ms │
│ Execution Time: 820.544 ms │
└────────────────────────────────────────────────────────────────────────────────────────────┘
(10 rows)
Greetings,
Andres Freund
view thread (348+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: index prefetching
In-Reply-To: <6butbqln6ewi5kuxz3kfv2mwomnlgtate4mb4lpa7gb2l63j4t@stlwbi2dvvev>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox