From: Peter Geoghegan <[email protected]>
To: Andres Freund <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Thu, 14 Aug 2025 17:06:07 -0400
Message-ID: <CAH2-WzkWNtCRTcUajGYrCkp9-+btteAthg21BzxbKV09AJuSrA@mail.gmail.com>
In-Reply-To: <kvyser45imw3xmisfvpeoshisswazlzw35el3fq5zg73zblpql@f56enfj45nf7>
References: <CAH2-Wz=L7h-koDKa3_NEg39Faw7MrOkSVOsodvQ4toSQahvWjQ@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<CAH2-WzmuGzTH-62EWTgQ4F66XEBJtJk25psF4GDuAGqeC4a34g@mail.gmail.com>
<[email protected]>
<6wyxbnry2unm3kbcu2sabhzhs7baoedlg77xqm42chpofjq45g@igst42zpl7ok>
<CAH2-WzntgDeopLJpyEbUh23Qr1vgoYv5jbFkYsymTScEKxBj7A@mail.gmail.com>
<CAH2-WzkaTHg2X9R-gLRNBEoL82t2mkrQq-3f=y3GAzrj40fFZw@mail.gmail.com>
<kvyser45imw3xmisfvpeoshisswazlzw35el3fq5zg73zblpql@f56enfj45nf7>
On Thu, Aug 14, 2025 at 4:44 PM Andres Freund <[email protected]> wrote:
> Interesting. In the sequential case I see some waits that are not attributed
> in explain, due to the waits happening within WaitIO(), not WaitReadBuffers().
> Which indicates that the read stream is trying to re-read a buffer that
> previously started being read.
I *knew* that something had to be up here. Thanks for your help with debugging!
> read_stream_start_pending_read()
> -> StartReadBuffers()
> -> AsyncReadBuffers()
> -> ReadBuffersCanStartIO()
> -> StartBufferIO()
> -> WaitIO()
>
> There are far fewer cases of this in the random case.
Index tuples whose TIDs are slightly out of order are entirely normal.
Even with *perfectly* sequential inserts, the FSM (free space map) tends
to hand out the last piece of free space on a heap page some time after
that page first becomes "almost full". I recently described this to
Tomas on this thread [1].
> From what I can tell the sequential case so often will re-read a buffer that
> it is already in the process of reading - and thus wait for that IO before
> continuing - that we don't actually keep enough IO in flight.
Oops.
There is an existing stop-gap mechanism in the patch that is supposed
to deal with this problem: index_scan_stream_read_next, the read
stream callback, has logic to suppress duplicate block requests. But
that's clearly not fully effective, since it only remembers the most
recent heap block request.
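As a toy model of that limitation (not the patch's actual code), the C sketch below filters a stream of heap block numbers, in index order, while remembering only the last block requested. BlockNumber is redefined locally, and filter_next(), run_filter(), and the input sequence are all hypothetical; the sequence mimics a TID that points back to an earlier heap block after the scan has already moved on, as with the FSM behavior described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local stand-ins for PostgreSQL's BlockNumber/InvalidBlockNumber. */
typedef uint32_t BlockNumber;
#define INVALID_BLOCK ((BlockNumber) 0xFFFFFFFF)

/* Toy filter state: remembers only the most recently requested heap block. */
typedef struct
{
    BlockNumber last;
} last_block_filter;

/* Returns 1 if blkno should be handed to the read stream, 0 if suppressed. */
static int
filter_next(last_block_filter *f, BlockNumber blkno)
{
    if (blkno == f->last)
        return 0;               /* same block as the previous request: skip */
    f->last = blkno;
    return 1;
}

/*
 * Feeds n heap block numbers (index order) through the filter and copies
 * the requests that survive into out[], which must have room for n entries.
 * Returns the number of requests made.
 */
static int
run_filter(const BlockNumber *in, int n, BlockNumber *out)
{
    last_block_filter f = {INVALID_BLOCK};
    int nout = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (filter_next(&f, in[i]))
            out[nout++] = in[i];
    return nout;
}
```

Fed the sequence 10 10 10 11 11 10 11 12 12, run_filter() emits 10 11 10 11 12, so blocks 10 and 11 each reach the read stream twice. In the real patch, a second request like that could find the first read still in flight and end up waiting in WaitIO().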
If this same mechanism remembered (say) the last two heap blocks it
requested, that might be enough to fix this particular problem
entirely. This isn't a serious proposal, but it should be simple
enough to implement. Hopefully, once I do that (which I plan to do
soon), it'll fully validate your theory.
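A hedged sketch of that "remember the last few blocks" idea follows. It is again a toy model with hypothetical names (recent_blocks_filter, recent_init(), recent_filter_next(), recent_run(), RECENT_MAX), not anything from the patch: it keeps a small ring buffer of recently requested blocks and suppresses any block found in it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local stand-ins for PostgreSQL's BlockNumber/InvalidBlockNumber. */
typedef uint32_t BlockNumber;
#define INVALID_BLOCK ((BlockNumber) 0xFFFFFFFF)
#define RECENT_MAX 8

/* Toy filter state: the last `size` accepted block requests, as a ring buffer. */
typedef struct
{
    BlockNumber recent[RECENT_MAX];
    int size;                   /* how many recent blocks to remember */
    int next;                   /* ring slot overwritten by the next accepted block */
} recent_blocks_filter;

static void
recent_init(recent_blocks_filter *f, int size)
{
    assert(size >= 1 && size <= RECENT_MAX);
    f->size = size;
    f->next = 0;
    for (int i = 0; i < RECENT_MAX; i++)
        f->recent[i] = INVALID_BLOCK;
}

/* Returns 1 if blkno should be requested, 0 if it matches a recent request. */
static int
recent_filter_next(recent_blocks_filter *f, BlockNumber blkno)
{
    for (int i = 0; i < f->size; i++)
        if (f->recent[i] == blkno)
            return 0;           /* duplicate of a recent request: suppress */

    /* FIFO eviction: overwrite the oldest remembered block. */
    f->recent[f->next] = blkno;
    f->next = (f->next + 1) % f->size;
    return 1;
}

/*
 * Feeds n heap block numbers (index order) through a filter remembering
 * `size` blocks; copies surviving requests into out[] (room for n entries).
 */
static int
recent_run(int size, const BlockNumber *in, int n, BlockNumber *out)
{
    recent_blocks_filter f;
    int nout = 0;

    recent_init(&f, size);
    for (int i = 0; i < n; i++)
        if (recent_filter_next(&f, in[i]))
            out[nout++] = in[i];
    return nout;
}
```

With size 1 this behaves like a last-block-only filter: the sequence 10 10 10 11 11 10 11 12 12 yields five requests. With size 2 the same sequence yields only 10 11 12. Eviction is FIFO, so a suppressed duplicate does not refresh its slot. The sketch only models which requests reach the stream; it ignores how the scan side would later locate the buffer for a block whose request was suppressed.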
> We can optimize that by deferring the StartBufferIO() if we're encountering a
> buffer that is undergoing IO, at the cost of some complexity. I'm not sure
> real-world queries will often encounter the pattern of the same block being
> read in by a read stream multiple times in close proximity sufficiently often
> to make that worth it.
We definitely need to be prepared for duplicate prefetch requests in
the context of index scans, though I'm not at all sure how
sophisticated that handling actually needs to be. Obviously, the
design choices in this area are far from settled right now.
[1] [email protected]
--
Peter Geoghegan