Message-ID: <f0fda1a3-23ef-4cbe-b6d5-48f0bd9432cb@enterprisedb.com>
Date: Thu, 15 Feb 2024 18:26:09 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: index prefetching
Content-Language: en-US
To: Peter Geoghegan <pg@bowt.ie>
Cc: Melanie Plageman <melanieplageman@gmail.com>,
 Robert Haas <robertmhaas@gmail.com>, Andres Freund <andres@anarazel.de>,
 PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>,
 Georgios <gkokolatos@protonmail.com>, Thomas Munro <thomas.munro@gmail.com>,
 Konstantin Knizhnik <knizhnik@garret.ru>, Dilip Kumar <dilipbalaut@gmail.com>
References: <8ec36f51-b863-60e3-20e2-b9c981c5ce5e@enterprisedb.com>
 <CAAKRu_ZPDhNwwFxQwS8NdeTFkycM1c=tNLKdU0J-M6KxCjdEmQ@mail.gmail.com>
 <56176b8d-956c-487e-ab09-310db4581c07@enterprisedb.com>
 <CAAKRu_ad8cywU4_X+5e4A9Wy_PZmrCx2aoLpdHPBRvb9inGDgQ@mail.gmail.com>
 <4867452a-b853-4813-a6da-9bb06a336f8b@enterprisedb.com>
 <CAAKRu_Y=go4u99udRaj3k0vuF6EAfWqM86BONhxB4MV9X4FRpQ@mail.gmail.com>
 <CAAKRu_ZPuaO5XwXfM7wowf4sSmPSJ2LT9+zfmD5=LQ=WhV_j=Q@mail.gmail.com>
 <4f5f16ef-df1e-4e09-9b3f-2e0961ab5117@enterprisedb.com>
 <CAH2-WznsJqDgr_0yUwApgYXi3cRZQbimFkiYRqqXhpMcw4s8ZQ@mail.gmail.com>
 <c332fe83-c187-48ea-8520-458d4738f3a1@enterprisedb.com>
 <CAH2-WznBuxhvsEgX3mYDjxKhQk9GFdF46vMfE2ugU6SUekHp_A@mail.gmail.com>
 <CAAKRu_ZKp1BCT+V324jENxKTfsetxJwxh309rJGWxebSggPisw@mail.gmail.com>
 <CAH2-Wzkrej9cXjERrA5p8pgD9QfR0LZwCCcgPPu6wiRgFpYVQQ@mail.gmail.com>
 <4736207c-8ea6-40cb-ac52-41af00b58bbc@enterprisedb.com>
 <CAH2-Wz=gMnsLQph1KM_xxTu-ZFRFqbDbK9tFBPTKcfXB1Z8=og@mail.gmail.com>
From: Tomas Vondra <tomas.vondra@enterprisedb.com>
In-Reply-To: <CAH2-Wz=gMnsLQph1KM_xxTu-ZFRFqbDbK9tFBPTKcfXB1Z8=og@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Archived-At: <https://www.postgresql.org/message-id/f0fda1a3-23ef-4cbe-b6d5-48f0bd9432cb%40enterprisedb.com>
Precedence: bulk


On 2/15/24 17:42, Peter Geoghegan wrote:
> On Thu, Feb 15, 2024 at 9:36 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> On 2/15/24 00:06, Peter Geoghegan wrote:
>>> I suppose that it might be much more important than I imagine it is
>>> right now, but it'd be nice to have something a bit more concrete to
>>> go on.
>>>
>>
>> This probably depends on which corner cases are considered important.
>>
>> The page-at-a-time approach essentially means index items at the
>> beginning of the page won't get prefetched (or vice versa, prefetch
>> distance drops to 0 when we get to end of index page).
> 
> I don't think that's true. At least not for nbtree scans.
> 
> As I went into last year, you'd get the benefit of the work I've done
> on "boundary cases" (most recently in commit c9c0589f from just a
> couple of months back), which helps us get the most out of suffix
> truncation. This maximizes the chances of only having to scan a single
> index leaf page in many important cases. So I can see no reason why
> index items at the beginning of the page are at any particular
> disadvantage (compared to those from the middle or the end of the
> page).
> 

I may be missing something, but it seems fairly self-evident to me an
entry at the beginning of an index page won't get prefetched (assuming
the page-at-a-time thing).

If I understand your point about boundary cases / suffix truncation,
that helps us by (a) picking the split in a way to minimize a single key
spanning multiple pages, if possible and (b) increasing the number of
entries that fit onto a single index page.

That's certainly true / helpful, and it makes the "first entry" issue
much less common. But the issue is still there. Of course, this says
nothing about the importance of the issue - the impact may easily be so
small it's not worth worrying about.

> Where you might have a problem is cases where it's just inherently
> necessary to visit more than a single leaf page, despite the best
> efforts of the nbtsplitloc.c logic -- cases where the scan just
> inherently needs to return tuples that "straddle the boundary between
> two neighboring pages". That isn't a particularly natural restriction,
> but it's also not obvious that it's all that much of a disadvantage in
> practice.
> 

One case I've been thinking about is sorting using index, where we often
read large part of the index.

>> It certainly was a great improvement, no doubt about that. I dislike the
>> restriction, but that's partially for aesthetic reasons - it just seems
>> it'd be nice to not have this.
>>
>> That being said, I'd be OK with having this restriction if it makes v1
>> feasible. For me, the big question is whether it'd mean we're stuck with
>> this restriction forever, or whether there's a viable way to improve
>> this in v2.
> 
> I think that there is no question that this will need to not
> completely disable kill_prior_tuple -- I'd be surprised if one single
> person disagreed with me on this point. There is also a more nuanced
> way of describing this same restriction, but we don't necessarily need
> to agree on what exactly that is right now.
> 

Even for the page-at-a-time approach? Or are you talking about the v2?

>> And I don't have answer to that :-( I got completely lost in the ongoing
>> discussion about the locking implications (which I happily ignored while
>> working on the PoC patch), layering tensions and questions which part
>> should be "in control".
> 
> Honestly, I always thought that it made sense to do things on the
> index AM side. When you went the other way I was surprised. Perhaps I
> should have said more about that, sooner, but I'd already said quite a
> bit at that point, so...
> 
> Anyway, I think that it's pretty clear that "naive desynchronization"
> is just not acceptable, because that'll disable kill_prior_tuple
> altogether. So you're going to have to do this in a way that more or
> less preserves something like the current kill_prior_tuple behavior.
> It's going to have some downsides, but those can be managed. They can
> be managed from within the index AM itself, a bit like the
> _bt_killitems() no-pin stuff does things already.
> 
> Obviously this interpretation suggests that doing things at the index
> AM level is indeed the right way to go, layering-wise. Does it make
> sense to you, though?
> 

Yeah. The basic idea was that by moving this above index AM it will work
for all indexes automatically - but given the current discussion about
kill_prior_tuple, locking etc. I'm not sure that's really feasible.

The index AM clearly needs to have more control over this.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company