Re: index prefetching - Peter Geoghegan


From: Peter Geoghegan <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Wed, 16 Jul 2025 09:36:48 -0400
Message-ID: <CAH2-Wzm-u6b4gDbLNP=1pkfqJbEyPyey9M-8wG0C+QOTit963Q@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAH2-Wzn7vqmt=qE_hDrOx4NETkUoCbdn74G1gswMXi1APUuYrA@mail.gmail.com>
	<[email protected]>
	<CAH2-Wz==zkbOumyX-M4quqbX9GCLcjW_zXmdsaK37q-55rj_fQ@mail.gmail.com>
	<[email protected]>
	<CAH2-Wz=G_HKYZ34-UkBMQjk7SCNtu9O5V4t=A0u82=WT-rtBuw@mail.gmail.com>
	<CA+TgmoaEvBnx2npWycGt1ChTe8m800wMiioUCspaSb0qzc3=Kg@mail.gmail.com>
	<CAH2-WznFwgU3AddTqnvJABX5xo-9upG6NiX+2s0eaFhFj6tRAg@mail.gmail.com>
	<CA+Tgmobav+-oR9-jJUGbHj3j7bhwPpz7qVkfr_9zUSF-kens9A@mail.gmail.com>
	<CAH2-WzkqnVGLEQ31W1vm8T_uzy-ma-6A8QL-C56=0QUqs12b=Q@mail.gmail.com>
	<CAH2-WznmrgwFShyKuKjF2v7M_Eid6VzKd+SRPjh4-y68T6uCDw@mail.gmail.com>
	<esftck6ayqkkdtzijd736oazhve577sp7hthnwouyg2stlwlqj@rmhohbhl7tuz>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAH2-WzmVvU2KmKyq8sUeYXrc_roA5LfOgDE1-vdtorpk_M3DfA@mail.gmail.com>
	<[email protected]>
	<[email protected]>

On Wed, Jul 16, 2025 at 4:40 AM Tomas Vondra <[email protected]> wrote:
> But the thing I don't really understand is the "cyclic" dataset (for
> example). And the "simple" patch performs really badly here. This data
> set is designed to not work for prefetching, it's pretty much an
> adversarial case. There's ~100 TIDs from 100 pages for each key value, and
> once you read the 100 pages you'll hit them many times for following
> values. Prefetching is pointless, and skipping duplicate blocks can't
> help, because the blocks are not effective.
>
> But how come the "complex" patch does so much better? It can't really
> benefit from prefetching TID from the next leaf - not this much. Yet it
> does a bit better than master. I've been looking at this since yesterday,
> and it makes no sense to me. Per "perf trace" it actually does 2x as many
> fadvise calls compared to the "simple" patch (which is strange on its
> own, I think), yet it's apparently so much faster?

The "simple" patch has _bt_readpage reset the read stream. That
doesn't make any sense to me. Though it does explain why the "complex"
patch does so many more fadvise calls.

Another issue with the "simple" patch: it adds 2 bool fields to
"BTScanPosItem". That increases its size considerably. We're very
sensitive to the size of this struct (I think that you know about this
already). Bloating it like this will blow up our memory usage, since
right now we allocate MaxTIDsPerBTreePage (1358) such structs for
so->currPos (and so->markPos). Wasting all that memory on alignment
padding is probably going to have consequences beyond memory bloat.
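
To make the size concern concrete, here is a rough sketch using stand-in
types. It does not include the real nbtree headers: the Sketch* structs
only approximate the per-item layout (a 6-byte TID plus two 16-bit
offsets), the two flag names are hypothetical, and 1358 is the
MaxTIDsPerBTreePage figure quoted above. The exact growth in a real build
depends on where the new fields are placed and on the platform ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the per-page item count quoted in the message above. */
#define SKETCH_MAX_TIDS_PER_BTREE_PAGE 1358

/* Stand-in for a heap TID: three 16-bit fields (not the real type). */
typedef struct SketchItemPointer
{
    uint16_t block_hi;
    uint16_t block_lo;
    uint16_t offset;
} SketchItemPointer;

/* Stand-in for the existing per-item struct. */
typedef struct SketchItem
{
    SketchItemPointer heapTid;
    uint16_t indexOffset;
    uint16_t tupleOffset;
} SketchItem;

/* Same layout plus two bool fields; the flag names are hypothetical. */
typedef struct SketchItemWithFlags
{
    SketchItemPointer heapTid;
    uint16_t indexOffset;
    uint16_t tupleOffset;
    bool flag_a;
    bool flag_b;
} SketchItemWithFlags;

/* Adding the two flags must make each item strictly larger. */
_Static_assert(sizeof(SketchItemWithFlags) > sizeof(SketchItem),
               "two extra bools grow the per-item struct");

/* Bytes needed for one fixed-size items array of the given element size. */
static inline size_t
sketch_items_array_bytes(size_t item_size)
{
    return item_size * SKETCH_MAX_TIDS_PER_BTREE_PAGE;
}
```

Comparing sketch_items_array_bytes(sizeof(SketchItem)) with
sketch_items_array_bytes(sizeof(SketchItemWithFlags)) gives the extra
bytes per items array in this sketch. A scan that keeps both a current
and a marked position pays that cost twice.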

-- 
Peter Geoghegan
