From: Tomas Vondra <[email protected]>
To: Peter Geoghegan <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Thu, 15 Feb 2024 18:26:09 +0100
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAH2-Wz=gMnsLQph1KM_xxTu-ZFRFqbDbK9tFBPTKcfXB1Z8=og@mail.gmail.com>
References: <[email protected]>
<CAAKRu_ZPDhNwwFxQwS8NdeTFkycM1c=tNLKdU0J-M6KxCjdEmQ@mail.gmail.com>
<[email protected]>
<CAAKRu_ad8cywU4_X+5e4A9Wy_PZmrCx2aoLpdHPBRvb9inGDgQ@mail.gmail.com>
<[email protected]>
<CAAKRu_Y=go4u99udRaj3k0vuF6EAfWqM86BONhxB4MV9X4FRpQ@mail.gmail.com>
<CAAKRu_ZPuaO5XwXfM7wowf4sSmPSJ2LT9+zfmD5=LQ=WhV_j=Q@mail.gmail.com>
<[email protected]>
<CAH2-WznsJqDgr_0yUwApgYXi3cRZQbimFkiYRqqXhpMcw4s8ZQ@mail.gmail.com>
<[email protected]>
<CAH2-WznBuxhvsEgX3mYDjxKhQk9GFdF46vMfE2ugU6SUekHp_A@mail.gmail.com>
<CAAKRu_ZKp1BCT+V324jENxKTfsetxJwxh309rJGWxebSggPisw@mail.gmail.com>
<CAH2-Wzkrej9cXjERrA5p8pgD9QfR0LZwCCcgPPu6wiRgFpYVQQ@mail.gmail.com>
<[email protected]>
<CAH2-Wz=gMnsLQph1KM_xxTu-ZFRFqbDbK9tFBPTKcfXB1Z8=og@mail.gmail.com>
On 2/15/24 17:42, Peter Geoghegan wrote:
> On Thu, Feb 15, 2024 at 9:36 AM Tomas Vondra
> <[email protected]> wrote:
>> On 2/15/24 00:06, Peter Geoghegan wrote:
>>> I suppose that it might be much more important than I imagine it is
>>> right now, but it'd be nice to have something a bit more concrete to
>>> go on.
>>>
>>
>> This probably depends on which corner cases are considered important.
>>
>> The page-at-a-time approach essentially means index items at the
>> beginning of the page won't get prefetched (or vice versa, prefetch
>> distance drops to 0 when we get to end of index page).
>
> I don't think that's true. At least not for nbtree scans.
>
> As I went into last year, you'd get the benefit of the work I've done
> on "boundary cases" (most recently in commit c9c0589f from just a
> couple of months back), which helps us get the most out of suffix
> truncation. This maximizes the chances of only having to scan a single
> index leaf page in many important cases. So I can see no reason why
> index items at the beginning of the page are at any particular
> disadvantage (compared to those from the middle or the end of the
> page).
>
I may be missing something, but it seems fairly self-evident to me that an
entry at the beginning of an index page won't get prefetched (assuming
the page-at-a-time thing).

If I understand your point about boundary cases / suffix truncation,
that helps us by (a) picking the split point so that a single key spans
multiple pages as rarely as possible, and (b) increasing the number of
entries that fit onto a single index page.

That's certainly true / helpful, and it makes the "first entry" issue
much less common. But the issue is still there. Of course, this says
nothing about the importance of the issue - the impact may easily be so
small it's not worth worrying about.

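
To make the "first entry" issue concrete, here is a toy model of what I
mean. It is only a sketch, not PostgreSQL code: the function name and the
simplifications (one heap block per index item, lookahead that never
crosses a leaf page boundary, prefetches issued only while processing an
item) are assumptions made for illustration.

```python
def simulate_page_at_a_time(page_sizes, target_distance):
    """Toy model of page-at-a-time prefetching (illustrative only).

    page_sizes      -- number of index items on each leaf page, in scan order
    target_distance -- how many items ahead we would like to prefetch

    Returns (not_prefetched, distances):
      not_prefetched -- global positions of items that were never prefetched
                        before the scan reached them
      distances      -- effective prefetch distance while processing each item
    """
    not_prefetched = []
    distances = []
    pos = 0  # global position of the first item on the current page

    for n in page_sizes:
        prefetched = set()  # lookahead state is reset for every leaf page
        for i in range(n):
            if i not in prefetched:
                not_prefetched.append(pos + i)

            # We can only look ahead to the end of the current leaf page.
            last = min(i + target_distance, n - 1)
            for j in range(i + 1, last + 1):
                prefetched.add(j)

            distances.append(last - i)
        pos += n

    return not_prefetched, distances


if __name__ == "__main__":
    missed, dist = simulate_page_at_a_time([4, 3], target_distance=2)
    print("never prefetched:", missed)
    print("effective distance:", dist)
```

In this model, with two leaf pages of 4 and 3 items and a target distance
of 2, the first item of each page is never prefetched, and the effective
distance decays to 0 at the end of each page. Larger pages (more items per
page) make this rarer, but don't eliminate it.
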
> Where you might have a problem is cases where it's just inherently
> necessary to visit more than a single leaf page, despite the best
> efforts of the nbtsplitloc.c logic -- cases where the scan just
> inherently needs to return tuples that "straddle the boundary between
> two neighboring pages". That isn't a particularly natural restriction,
> but it's also not obvious that it's all that much of a disadvantage in
> practice.
>
One case I've been thinking about is sorting using an index, where we often
read a large part of the index.

>> It certainly was a great improvement, no doubt about that. I dislike the
>> restriction, but that's partially for aesthetic reasons - it just seems
>> it'd be nice to not have this.
>>
>> That being said, I'd be OK with having this restriction if it makes v1
>> feasible. For me, the big question is whether it'd mean we're stuck with
>> this restriction forever, or whether there's a viable way to improve
>> this in v2.
>
> I think that there is no question that this will need to not
> completely disable kill_prior_tuple -- I'd be surprised if one single
> person disagreed with me on this point. There is also a more nuanced
> way of describing this same restriction, but we don't necessarily need
> to agree on what exactly that is right now.
>
Even for the page-at-a-time approach? Or are you talking about v2?

>> And I don't have answer to that :-( I got completely lost in the ongoing
>> discussion about the locking implications (which I happily ignored while
>> working on the PoC patch), layering tensions and questions which part
>> should be "in control".
>
> Honestly, I always thought that it made sense to do things on the
> index AM side. When you went the other way I was surprised. Perhaps I
> should have said more about that, sooner, but I'd already said quite a
> bit at that point, so...
>
> Anyway, I think that it's pretty clear that "naive desynchronization"
> is just not acceptable, because that'll disable kill_prior_tuple
> altogether. So you're going to have to do this in a way that more or
> less preserves something like the current kill_prior_tuple behavior.
> It's going to have some downsides, but those can be managed. They can
> be managed from within the index AM itself, a bit like the
> _bt_killitems() no-pin stuff does things already.
>
> Obviously this interpretation suggests that doing things at the index
> AM level is indeed the right way to go, layering-wise. Does it make
> sense to you, though?
>
Yeah. The basic idea was that by moving this above the index AM, it would
work for all indexes automatically - but given the current discussion about
kill_prior_tuple, locking etc., I'm not sure that's really feasible.

The index AM clearly needs to have more control over this.

regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company