public inbox for [email protected]
help / color / mirror / Atom feedFrom: Thomas Munro <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Sun, 20 Jul 2025 01:07:16 +1200
Message-ID: <CA+hUKG+WWr4-8TYemyU=ucQsNe6bUBN_Sq3mCnBoBtxaJ9w3ug@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAH2-Wzn7vqmt=qE_hDrOx4NETkUoCbdn74G1gswMXi1APUuYrA@mail.gmail.com>
<CAH2-WznFwgU3AddTqnvJABX5xo-9upG6NiX+2s0eaFhFj6tRAg@mail.gmail.com>
<CA+Tgmobav+-oR9-jJUGbHj3j7bhwPpz7qVkfr_9zUSF-kens9A@mail.gmail.com>
<CAH2-WzkqnVGLEQ31W1vm8T_uzy-ma-6A8QL-C56=0QUqs12b=Q@mail.gmail.com>
<CAH2-WznmrgwFShyKuKjF2v7M_Eid6VzKd+SRPjh4-y68T6uCDw@mail.gmail.com>
<esftck6ayqkkdtzijd736oazhve577sp7hthnwouyg2stlwlqj@rmhohbhl7tuz>
<[email protected]>
<[email protected]>
<[email protected]>
<CAH2-WzmVvU2KmKyq8sUeYXrc_roA5LfOgDE1-vdtorpk_M3DfA@mail.gmail.com>
<[email protected]>
<[email protected]>
<CAH2-Wzm-u6b4gDbLNP=1pkfqJbEyPyey9M-8wG0C+QOTit963Q@mail.gmail.com>
<[email protected]>
<[email protected]>
<CA+hUKG+P0RnG_c4vL6=d8ACwpmux5ZF91FO4UpX8PDu1WqEg9w@mail.gmail.com>
<[email protected]>
On Sat, Jul 19, 2025 at 11:23 PM Tomas Vondra <[email protected]> wrote:
> Thanks for the link. It seems I came up with an almost the same patch,
> with three minor differences:
>
> 1) There's another place that sets "distance = 0" in
> read_stream_next_buffer, so maybe this should preserve the distance too?
>
> 2) I suspect we need to preserve the distance at the beginning of
> read_stream_reset, like
>
> stream->reset_distance = Max(stream->reset_distance,
> stream->distance);
>
> because what if you call _reset before reaching the end of the stream?
>
> 3) Shouldn't it reset the reset_distance to 0 after restoring it?
Probably. Hmm... an earlier version of this code didn't use distance
== 0 to indicate end-of-stream, but instead had a separate internal
end_of_stream flag. If we brought that back and didn't clobber
distance, we wouldn't need this save-and-restore dance. It seemed
shorter and sweeter without it back then, before _reset() existed in
its present form, but I wonder if end_of_stream would be nicer than
having to add this kind of stuff, without measurable downsides.
> > There was also some discussion at the time about whether "reset so I
> > can rescan", and "reset so I can continue after a temporary stop"
> > should be different operations requiring different APIs. It now seems
> > like one operation is sufficient, but it should preserve the distance
> > as you showed and then let the algorithm learn about already-cached
> > data in the rescan case (if it is even true then, which is also
> > debatable since it depends on the size of the scan). So, I think we
> > should just go ahead and commit a patch like that.
>
> Not sure. To me it seems more like two distinct cases, but I'm not sure
> if it requires two distinct "operations" with distinct API. Perhaps a
> simple flag for the _reset() would be enough? It'd need to track the
> distance anyway, just in case.
>
> Consider for example a nested loop, which does a rescan every time the
> outer row changes. Is there a reason to believe the outer rows will need
> the same number of inner rows? Aren't those "distinct streams"? Maybe
> I'm thinking about this wrong, of course.
Good question. Yeah, your flag idea seems like a good way to avoid
baking opinion into this level. I wonder if it should be a bitmask
rather than a boolean, in case we think of more things that need to be
included or not when resetting.
> The thing that however concerns me is that what I observed was not the
> distance getting reset to 1, and then ramping up. Which should happen
> pretty quickly, thanks to the doubling. In my experiments it *never*
> ramped up again, it stayed at 1. I still don't quite understand why.
Huh. Will look into that on Monday.
view thread (348+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: index prefetching
In-Reply-To: <CA+hUKG+WWr4-8TYemyU=ucQsNe6bUBN_Sq3mCnBoBtxaJ9w3ug@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox