Re: AIO / read stream heuristics adjustments for index prefetching

From: Melanie Plageman <[email protected]>
To: Andres Freund <[email protected]>
Cc: [email protected]
Cc: Thomas Munro <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Subject: Re: AIO / read stream heuristics adjustments for index prefetching
Date: Thu, 2 Apr 2026 10:31:50 -0400
Message-ID: <CAAKRu_bfwBzg7=Zy88st6gBJf97Wkd3k=+m1ecApn=59SwmKSw@mail.gmail.com>
In-Reply-To: <f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu>
References: <f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu>

On Tue, Mar 31, 2026 at 12:02 PM Andres Freund <[email protected]> wrote:
>
> 0005+0006:  Only increase distance when waiting for IO
>
>     Until now we have increased the read ahead distance whenever we
>     needed to do IO (doubling the distance every miss). But that will often be
>     way too aggressive, with the IO subsystem being able to keep up with a
>     much lower distance.
>
>     The idea here is to use information about whether we needed to wait for IO
>     before returning the buffer in read_stream_next_buffer() to control
>     whether we should increase the readahead distance.
>
>     This seems to work extremely well for worker.
>
>     Unfortunately with io_uring the situation is more complicated, because
>     io_uring performs reads synchronously during submission if the data is in
>     the kernel page cache.  This can reduce performance substantially compared to
>     worker, because it prevents parallelizing the copy from the page cache.
>     There is an existing heuristic for that in method_io_uring.c that adds a
>     flag to the IO submissions forcing the IO to be processed asynchronously,
>     allowing for parallelism.  Unfortunately the heuristic is triggered by the
>     number of IOs in flight - which will never become big enough to trigger
>     after using "needed to wait" to control how far to read ahead.

On some level, relying on worker mode overhead feels fragile. If
worker overhead decreases (say, by moving to IO worker threads), we won't
be able to rely on it to keep the distance at an advantageous level.

If io_uring async copying is advantageous even when the consumer never
needs to wait, then it seems like parallelizing copying to/from the
kernel buffer cache will always be advantageous to do at some level.

The case where it is not (as you've stated before) is when the
consumer doesn't need the extra blocks, so it is just wasted time
spent acquiring them.

So, it feels odd to try and find a heuristic that allows the readahead
distance to increase even when the consumer is not having to wait. I'm
not saying we should do this for this release, but I'm just wondering
if in the medium term, we should try to find a better way to identify
the situation where async processing is not beneficial because the
blocks won't be needed.
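
To make the two policies concrete, here is a toy model. It is only a
sketch and not PostgreSQL code: the function name, the fixed IO latency,
the fixed per-block consume time, the cap of 64, and the assumption that
every block misses the cache are all made up for illustration. It also
ignores io_uring's synchronous page cache copies entirely. "on_miss"
doubles the distance on every block that needed IO, while "on_wait"
doubles it only when the consumer actually had to wait.

```python
def simulate(num_blocks, io_latency, consume_time, policy, max_distance=64):
    """Toy read-ahead model; returns (final_distance, total_wait).

    Hypothetical assumptions: every block needs IO, each IO completes
    exactly io_latency after it is issued, and IOs never contend.
    """
    now = 0.0
    distance = 1          # how many blocks ahead of the consumer we read
    issued = 0            # number of blocks whose IO has been started
    ready_at = []         # ready_at[i] = completion time of block i
    total_wait = 0.0

    for consumed in range(num_blocks):
        # Start IOs until we are `distance` blocks ahead of the consumer.
        while issued < num_blocks and issued < consumed + distance:
            ready_at.append(now + io_latency)
            issued += 1

        # Block until the next block the consumer needs has arrived.
        wait = max(0.0, ready_at[consumed] - now)
        now += wait
        total_wait += wait

        # "on_miss": grow on every IO. "on_wait": grow only after a stall.
        if policy == "on_miss" or (policy == "on_wait" and wait > 0):
            distance = min(distance * 2, max_distance)

        now += consume_time   # consumer processes the block

    return distance, total_wait


if __name__ == "__main__":
    for policy in ("on_miss", "on_wait"):
        distance, waited = simulate(1000, 4.0, 1.0, policy)
        print(policy, "final distance:", distance, "total wait:", waited)
```

In this model "on_miss" always climbs to the cap, while "on_wait" stops
growing once the consumer no longer stalls. What the model cannot show is
the case above, where going further ahead would still help (parallel
copies from the page cache) even though the consumer never waits.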

>     So 0005 expands the io_uring heuristic to also trigger based on the sizes
>     of IOs - but that's decidedly not perfect, we e.g. have some experiments
>     showing it regressing some parallel bitmap heap scan cases.  It may be
>     better to somehow tweak the logic to only trigger for worker.
>
>     As is, this has another issue, which is that it prevents IO combining in
>     situations where it shouldn't, because right now we are using the distance
>     to control both. See 0008 for an attempt at splitting those concerns.

Yea, I think running ahead far enough to get bigger IOs needs to
happen and can't be based on the consumer having to wait.

- Melanie
