MIME-Version: 1.0
References: <f3xxfrkafjxpyqxywcxricxgyizjirfceychyxsgn7bwjp5eda@kwbduhy7tfmu>
 <CAAKRu_bfwBzg7=Zy88st6gBJf97Wkd3k=+m1ecApn=59SwmKSw@mail.gmail.com>
 <pj4kgtdrevvkfbmlri6p27belctxru7ytyprcb6v74c7zbh3l6@m4dcu2rljedv>
 <CAAKRu_b+tymZQc2vv__k8oPHsVQppg9QmaeQLiupZ3t2AO=z3A@mail.gmail.com>
 <dj6njk2kfy4ztfduptrk5olv43j2zhebhbz62ip2y7gyeddirm@s2ywu64xxftw>
In-Reply-To: <dj6njk2kfy4ztfduptrk5olv43j2zhebhbz62ip2y7gyeddirm@s2ywu64xxftw>
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Fri, 3 Apr 2026 15:04:51 -0400
Message-ID: 
 <CAAKRu_b24CSa1ja8yEu+5PnTpWcpL0EFdRL1TwQye7VffvYjJA@mail.gmail.com>
Subject: Re: AIO / read stream heuristics adjustments for index prefetching
To: Andres Freund <andres@anarazel.de>
Cc: pgsql-hackers@postgresql.org, Thomas Munro <thomas.munro@gmail.com>,
	Peter Geoghegan <pg@bowt.ie>, Tomas Vondra <tv@fuzzy.cz>,
 Nazir Bilal Yavuz <byavuz81@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CAAKRu_b24CSa1ja8yEu%2B5PnTpWcpL0EFdRL1TwQye7VffvYjJA%40mail.gmail.com>
Precedence: bulk

On Fri, Apr 3, 2026 at 1:30=E2=80=AFPM Andres Freund <andres@anarazel.de> w=
rote:
>
> > - why not remove the combine_distance requirement from the fast path
> > entry criteria (could save resume_combine_distance in the fast path
> > and restore it after a miss)
>
> Because entering the fast path prevents IO combining from happening. So i=
t's
> absolutely crucial that we do *not* enter it while we would combine.

But if it is a buffer hit, obviously we can't do IO combining anyway,
or am I misunderstanding the fast path's common case?

> > > In my experiments the batching was primarily useful to allow to reduc=
e the
> > > command submission & interrupt overhead when interacting with storage=
, i.e.
> > > when actually doing IO, not just copying from the page cache.
> >
> > It still seems to me that batching would help guarantee parallel
> > memcpys for workers when data is in the kernel buffer cache because if
> > the IO is quick, the same worker may pick up the next IO.
>
> When you say workers, do you mean for io_method=3Dworker? Or the internal=
 kernel
> threads of io_uring for async IOs?

I'm talking about io_method worker.

> > You mentioned that we don't want to read too far ahead (including for
> > a single combined IO) in part because:
> >
> > > The resowner and private refcount mechanisms take more CPU cycles if =
you have
> > > more buffers pinned
> >
> > But I don't see how either distance is responding to this or
> > self-calibrating with this in mind
>
> Using the minimal required distance to avoid needing to wait for IO compl=
etion
> is responding to that, no?  Without these patches we read ahead as far as
> possible, even if all the data is in the page cache, which makes this iss=
ue
> way worse (without these patches it's a major source of regressions in th=
e
> index prefetching patch).

But we aren't using the minimal distance to avoid needing to wait for
IO completion. We are also using a higher distance to try and get IO
combining and  toallow for async copying into the kernel buffer cache,
etc, etc. There's a lot of different considerations; it isn't just two
opposing forces. And, I'd imagine that the relationship between the
number of buffers pinned and CPU cycles required for resowner/refcount
isn't perfectly linear.

> > I think it is weird to have combine_distance only be relevant when
> > readahead_distance is low. You said:
> >
> > > We could, but I don't think there would be a benefit in doing so. In =
my mind,
> > > what combine_distance is used for, is to control how far to look ahea=
d to
> > > allow for IO combining when the readahead_distance is too low to allo=
w for IO
> > > combining.  But if pinned_buffers > 0, we already have another IO in =
flight,
> > > so the combine_distance mechanism doesn't have to kick in.
> >
> > But it seems like for a completely uncached workload bigger IOs is
> > still beneficial.
>
> Massively so - the storage layer getting too small IOs really hurts.
>
> But, as the code stands, we *do* end up with large IOs in that case, beca=
use
> we will not issue the IO until it is "complete". If we need to do actual =
IO,
> the readahead_distance will be larger, and allow multiple full sized IOs =
to be
> issued.
>
> /*
>  * We don't start the pending read just because we've hit the distance li=
mit,
>  * preferring to give it another chance to grow to full io_combine_limit =
size
>  * once more buffers have been consumed.  But this is not desirable in al=
l
>  * situations - see below.
>  */
> static inline bool
> read_stream_should_issue_now(ReadStream *stream)
> {
> ...
>         /*
>          * If we've already reached io_combine_limit, there's no chance o=
f growing
>          * the read further.
>          */
>         if (pending_read_nblocks >=3D stream->io_combine_limit)
>                 return true;
>
>         /* same if capped not by io_combine_limit but combine_distance */
>         if (stream->combine_distance > 0 &&
>                 pending_read_nblocks >=3D stream->combine_distance)
>                 return true;
>
>         /*
>          * If we currently have no reads in flight or prepared, issue the=
 IO once
>          * we are not looking ahead further. This ensures there's always =
at least
>          * one IO prepared.
>          */
>         if (stream->pinned_buffers =3D=3D 0 &&
>                 !read_stream_should_look_ahead(stream))
>                 return true;
>
>         return false;
>
> So, unless we are not reading ahead, we are not issuing IOs until they ar=
e
> fully sized (or can't be combined, obviously).
>
> Am I misunderstanding what you're talking about?

I'm not saying that we don't do IO combining at high distances, I'm
more saying that it is confusing that combine_distance controls how
far we look ahead when readahead_distance is low but when
readahead_distance is high, it controls when we issue the IO and not
how far we look ahead. I don't think we should change course now, but
I wanted to call out that this felt a little uncomfortable to me.

- Melanie