Re: index prefetching - Tomas Vondra

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Tomas Vondra <[email protected]>
To: Melanie Plageman <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Subject: Re: index prefetching
Date: Tue, 23 Jan 2024 18:43:25 +0100
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAAKRu_ZPDhNwwFxQwS8NdeTFkycM1c=tNLKdU0J-M6KxCjdEmQ@mail.gmail.com>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CA+Tgmoaf0EHzcfWRydntzqhfc1Nv96C-60fEBbdf1pYZmT0SLA@mail.gmail.com>
	<[email protected]>
	<CA+Tgmob5Nh7sQO8GENzEDSjYJQuKCgfhWtwDLjMtdueFMCh05A@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CA+TgmoacmxNQBZsgMPSn9b7oE6QQLOqr_EP=YZTzdfHrqa5dqw@mail.gmail.com>
	<[email protected]>
	<CAAKRu_ZPDhNwwFxQwS8NdeTFkycM1c=tNLKdU0J-M6KxCjdEmQ@mail.gmail.com>

On 1/19/24 22:43, Melanie Plageman wrote:
> On Fri, Jan 12, 2024 at 11:42 AM Tomas Vondra
> <[email protected]> wrote:
>>
>> On 1/9/24 21:31, Robert Haas wrote:
>>> On Thu, Jan 4, 2024 at 9:55 AM Tomas Vondra
>>> <[email protected]> wrote:
>>>> Here's a somewhat reworked version of the patch. My initial goal was to
>>>> see if it could adopt the StreamingRead API proposed in [1], but that
>>>> turned out to be less straight-forward than I hoped, for two reasons:
>>>
>>> I guess we need Thomas or Andres or maybe Melanie to comment on this.
>>>
>>
>> Yeah. Or maybe Thomas if he has thoughts on how to combine this with the
>> streaming I/O stuff.
> 
> I've been studying your patch with the intent of finding a way to
> change it and or the streaming read API to work together. I've
> attached a very rough sketch of how I think it could work.
> 

Thanks.

> We fill a queue with blocks from TIDs that we fetched from the index.
> The queue is saved in a scan descriptor that is made available to the
> streaming read callback. Once the queue is full, we invoke the table
> AM specific index_fetch_tuple() function which calls
> pg_streaming_read_buffer_get_next(). When the streaming read API
> invokes the callback we registered, it simply dequeues a block number
> for prefetching.

So in a way there are two queues in IndexFetchTableData. One (blk_queue)
is being filled from IndexNext, and then the queue in StreamingRead.

> The only change to the streaming read API is that now, even if the
> callback returns InvalidBlockNumber, we may not be finished, so make
> it resumable.
> 

Hmm, not sure when can the callback return InvalidBlockNumber before
reaching the end. Perhaps for the first index_fetch_heap call? Any
reason not to fill the blk_queue before calling index_fetch_heap?


> Structurally, this changes the timing of when the heap blocks are
> prefetched. Your code would get a tid from the index and then prefetch
> the heap block -- doing this until it filled a queue that had the
> actual tids saved in it. With my approach and the streaming read API,
> you fetch tids from the index until you've filled up a queue of block
> numbers. Then the streaming read API will prefetch those heap blocks.
> 

And is that a good/desirable change? I'm not saying it's not, but maybe
we should not be filling either queue in one go - we don't want to
overload the prefetching.

> I didn't actually implement the block queue -- I just saved a single
> block number and pretended it was a block queue. I was imagining we
> replace this with something like your IndexPrefetch->blockItems --
> which has light deduplication. We'd probably have to flesh it out more
> than that.
> 

I don't understand how this passes the TID to the index_fetch_heap.
Isn't it working only by accident, due to blk_queue only having a single
entry? Shouldn't the first queue (blk_queue) store TIDs instead?

> There are also table AM layering violations in my sketch which would
> have to be worked out (not to mention some resource leakage I didn't
> bother investigating [which causes it to fail tests]).
> 
> 0001 is all of Thomas' streaming read API code that isn't yet in
> master and 0002 is my rough sketch of index prefetching using the
> streaming read API
> 
> There are also numerous optimizations that your index prefetching
> patch set does that would need to be added in some way. I haven't
> thought much about it yet. I wanted to see what you thought of this
> approach first. Basically, is it workable?
> 

It seems workable, yes. I'm not sure it's much simpler than my patch
(considering a lot of the code is in the optimizations, which are
missing from this patch).

I think the question is where should the optimizations happen. I suppose
some of them might/should happen in the StreamingRead API itself - like
the detection of sequential patterns, recently prefetched blocks, ...

But I'm not sure what to do about optimizations that are more specific
to the access path. Consider for example the index-only scans. We don't
want to prefetch all the pages, we need to inspect the VM and prefetch
just the not-all-visible ones. And then pass the info to the index scan,
so that it does not need to check the VM again. It's not clear to me how
to do this with this approach.


The main


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

view thread (349+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: index prefetching
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox