From: Peter Geoghegan <[email protected]>
To: Andres Freund <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Thu, 15 Feb 2024 15:30:06 -0500
Message-ID: <CAH2-WzkToUXuqfktW3CPoKq3odtNChkFwQFWHARz=n-h_Zm2Kw@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAH2-WznsJqDgr_0yUwApgYXi3cRZQbimFkiYRqqXhpMcw4s8ZQ@mail.gmail.com>
	<[email protected]>
	<CAH2-WznBuxhvsEgX3mYDjxKhQk9GFdF46vMfE2ugU6SUekHp_A@mail.gmail.com>
	<CAAKRu_ZKp1BCT+V324jENxKTfsetxJwxh309rJGWxebSggPisw@mail.gmail.com>
	<CAH2-Wzkrej9cXjERrA5p8pgD9QfR0LZwCCcgPPu6wiRgFpYVQQ@mail.gmail.com>
	<[email protected]>
	<CAH2-Wz=gMnsLQph1KM_xxTu-ZFRFqbDbK9tFBPTKcfXB1Z8=og@mail.gmail.com>
	<[email protected]>
	<CAH2-WzkpNN1+sovB8G=5dVwYW25=J6Qj4V9L7DzD26NTVQWM2w@mail.gmail.com>
	<[email protected]>

On Thu, Feb 15, 2024 at 3:13 PM Andres Freund <[email protected]> wrote:
> > This is why I don't think that the tuples with lower page offset
> > numbers are in any way significant here.  The significant part is
> > whether or not you'll actually need to visit more than one leaf page
> > in the first place (plus the penalty from not being able to reorder
> > the work across page boundaries in your initial v1 of prefetching).
>
> To me your phrasing just seems to reformulate the issue.

What I said to Tomas seems very obvious to me. I think that there
might have been some kind of miscommunication (not a real
disagreement). I was just trying to work through that.

> In practical terms you'll have to wait for the full IO latency when fetching
> the table tuple corresponding to the first tid on a leaf page. Of course
> that's also the moment you had to visit another leaf page. Whether the stall
> is due to visiting another leaf page or due to processing the first entry on
> such a leaf page is a distinction without a difference.

I don't think anybody said otherwise?

> > > That's certainly true / helpful, and it makes the "first entry" issue
> > > much less common. But the issue is still there. Of course, this says
> > > nothing about the importance of the issue - the impact may easily be so
> > > small it's not worth worrying about.
> >
> > Right. And I want to be clear: I'm really *not* sure how much it
> > matters. I just doubt that it's worth worrying about in v1 -- time
> > grows short. Although I agree that we should commit a v1 that leaves
> > the door open to improving matters in this area in v2.
>
> I somewhat doubt that it's realistic to aim for 17 at this point.

That's a fair point. Tomas?

> We seem to
> still be doing fairly fundamental architectural work. I think it might be the
> right thing even for 18 to go for the simpler only-a-single-leaf-page
> approach though.

I definitely think it's a good idea to have that as a fallback
option. And to not commit ourselves to having something better than
that for v1 (though we probably should commit to making that possible
in v2).

> I wonder if there are prerequisites that can be tackled for 17. One idea is to
> work on infrastructure to provide executor nodes with information about the
> number of tuples likely to be fetched - I suspect we'll trigger regressions
> without that in place.

I don't think that there'll be regressions if we just take the simpler
only-a-single-leaf-page approach. At least it seems much less likely.

> One way to *sometimes* process more than a single leaf page, without having to
> redesign kill_prior_tuple, would be to use the visibilitymap to check if the
> target pages are all-visible. If all the table pages on a leaf page are
> all-visible, we know that we don't need to kill index entries, and thus can
> move on to the next leaf page.

It's possible that we'll need a variety of different strategies.
nbtree already has two such strategies in _bt_killitems(), in a way.
Though its "Modified while not pinned means hinting is not safe" path
(LSN doesn't match canary value path) seems pretty naive. The
prefetching stuff might present us with a good opportunity to replace
that with something fundamentally better.
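
To make the visibility map idea quoted above concrete, here is a
minimal sketch in C. It is a toy model, not nbtree code: ToyTid,
ToyVisibilityMap, and both functions are hypothetical names invented
for illustration, and the bitmap only stands in for the real
visibility map. It assumes that "all-visible" for every table block
referenced from a leaf page is enough to skip kill_prior_tuple
bookkeeping for that leaf page, and it ignores concurrency entirely
(a real check would have to cope with bits being cleared while the
scan is in progress).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a table TID; only the block number matters here. */
typedef struct ToyTid
{
    uint32_t block;     /* table (heap) block number */
    uint16_t offset;    /* item offset within the block; unused by the check */
} ToyTid;

/* Hypothetical stand-in for the visibility map: one all-visible bit per block. */
typedef struct ToyVisibilityMap
{
    const uint8_t *bits;
    uint32_t nblocks;
} ToyVisibilityMap;

/* Report whether a single table block is marked all-visible in the toy map. */
bool
toy_vm_all_visible(const ToyVisibilityMap *vm, uint32_t block)
{
    if (block >= vm->nblocks)
        return false;   /* block not covered by the map: be conservative */
    return ((vm->bits[block / 8] >> (block % 8)) & 1) != 0;
}

/*
 * Return true if every table block referenced by the TIDs read from one
 * leaf page is all-visible.  Under the assumption stated above, no index
 * entry from that leaf page would need to be killed, so the scan could
 * move on to the next leaf page without remembering this one.
 */
bool
toy_leaf_can_skip_kill_tracking(const ToyVisibilityMap *vm,
                                const ToyTid *tids, size_t ntids)
{
    assert(vm != NULL && (tids != NULL || ntids == 0));

    for (size_t i = 0; i < ntids; i++)
    {
        if (!toy_vm_all_visible(vm, tids[i].block))
            return false;   /* at least one block may hold dead tuples */
    }
    return true;
}
```

A caller would run the check once per leaf page, after copying out the
matching TIDs, and fall back to the existing single-leaf-page behavior
whenever it returns false.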

-- 
Peter Geoghegan