Re: index prefetching - Alexandre Felipe

public inbox for [email protected]  

From: Alexandre Felipe <[email protected]>
To: Andres Freund <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Thu, 5 Mar 2026 13:47:10 +0000
Message-ID: <CAE8JnxNPK1T79gpS3PTPbFJB7W9rCLMwCZvJfYPR_mAZ-xibZw@mail.gmail.com> (raw)
In-Reply-To: <s5p7iou7pdhxhvmv4rohmskwqmr36dc4rybvwlep5yvwrjs4pa@6oxsemms5mw4>
References: <issqornf6vdn3vb64fjuoathypmu3e5pgputd3lpfuvoeqyvzr@qfordnhplp2v>
	<CAE8JnxOn4+xUAnce+M7LfZWOqfrMMxasMaEmSKwiKbQtZr65uA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CAE8JnxPtia9m8y7b5s+gOMjZ_3QP=pTo+A6p_HmtrAV4PMo3ZQ@mail.gmail.com>
	<[email protected]>
	<CAE8JnxOJ48NU3rwW+gS67NUDKgxDS5pKNUywbUBGCBJkgUf+Hg@mail.gmail.com>
	<[email protected]>
	<3cbwjhwkomjv7jifau4yhb357gfnckut3sdrlbmhwzesd3kngj@affs2mpxg4gh>
	<CAE8JnxOQG=m_6-v-M_Nude5KRrOzAcbi2QPhySCJQ+e771BQHA@mail.gmail.com>
	<s5p7iou7pdhxhvmv4rohmskwqmr36dc4rybvwlep5yvwrjs4pa@6oxsemms5mw4>

Thank you Andres,

I see. It combines an array that is fast for a few buffers with a hash table
that, in theory, scales well to a very large number of buffers. It also avoids
using an array that would be fast but would multiply the memory usage by the
number of backends.

> Index prefetching patch:
> uncorrelated: 228.936 ms
> correlated:   71.684  ms

I did some tests.

> Possible improvements to refcount tracking:
>
> - increase REFCOUNT_ARRAY_ENTRIES - there's a very significant cliff at 8
>  right now, and with vectorized lookup it might not hurt too much to go to 16
>  or so

Yes, that is true, but it only helps up to 16. The index prefetch test I was
running was getting to 90 or so pinned buffers, and even that was capped by
max_pinned_buffers. I also noticed a commit from 3 months ago that removed the
mid-loop return, which effectively means the first few pins are added right to
left instead of left to right.

Maybe this works well with vectorization, but while I see an optimization for
the (pin/unpin)+ sequence, what about the pin(pin/unpin)+ sequence? The
previous code would always find the buffers on the first or second iteration,
whereas the new implementation will have to go to the 7th or 8th iteration
(unless I am missing something very important).

> - To make the cliff at REFCOUNT_ARRAY_ENTRIES smaller, replace dynahash with
>  simplehash. That should reduce the perf penalty a good bit.

This is also true; it might even allow removing the refcount array completely.

> Unfortunately it's not just the refcount tracking, it's also resowner
> management that gets more expensive.

I hadn't read this sentence until I came back to reply, and it is exactly what
I noticed. Even once we fix the reference counting, the resowner still puts a
floor on the cost. That is even more important when a buffer is pinned
multiple times, because the resowner adds one entry for each pin of the
buffer.

There is another problem: ResOwnerReleaseBuffer unlocks buffers even when it
is not the owner of the lock.

I think this deserves a separate thread.
