From: Tomas Vondra <[email protected]>
To: Alexandre Felipe <[email protected]>
To: Andres Freund <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Tue, 17 Feb 2026 00:33:21 +0100
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAE8JnxNmKP+iAhfRwt9C8BTHK1KYBUBZLQav5=1wudEzSFmMSg@mail.gmail.com>
References: <CAH2-WzmH7pVQ0-mYAxb82aWbz29_BiBPq2wV5p7+1o2sRFqDRQ@mail.gmail.com>
	<CAH2-Wz=6a7fGz2rALDX+xiFDuEaGQWpZ49xEaBUDKiPH8gcL+Q@mail.gmail.com>
	<CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com>
	<CAH2-WzmymSyOt5Y2RGbm6cJXg18J_ttfqjdcpodHe6Gp23ConQ@mail.gmail.com>
	<CAH2-Wznv9_KGqHQ1vCW2pkiA6QskBGcx5NC_-UXnD6GEQasvAQ@mail.gmail.com>
	<CAE8JnxN_EwnTLLMWGhvgwaomYZ0ysm7NeogA-BqBd=Rs3S7Oqw@mail.gmail.com>
	<64a2re223ajj4popowsyu4xekbuvvyfwkrihn5yzyrkwsmsuvp@2lls3tpww5dl>
	<a67mvhyi2q45eg4eimhpwdg6l3s3dmpahti2svffvmvzwmss27@r4nohusvndbq>
	<[email protected]>
	<CAE8JnxNOV9kOgmU1-WUWts9Q-Jj_Nf0K480wyEwJXUQYMnYu3g@mail.gmail.com>
	<uwwucxl5psl5ycwnebhn3pwfyb7jdjrrwgif2yqqtboeuscsfh@lo33ijtfdrbd>
	<CAE8JnxNmKP+iAhfRwt9C8BTHK1KYBUBZLQav5=1wudEzSFmMSg@mail.gmail.com>

On 2/17/26 00:05, Alexandre Felipe wrote:
> Hi guys,
> 
> There seems to be some very interesting stuff here, I have to try to
> catch up with your analysis Andres.
> 
> In the meantime.
> 
> I am sharing the results I have got on a well behaved Linux system.
> 

Can you share how the system and Postgres are configured? It's good
practice to provide all the information others might need to reproduce
your results.

In particular, what is shared_buffers set to? Are you still using
io_method=worker? With how many io workers?

> No sophisticated algorithm here but evicting OS cache helps to verify
> the benefit of prefetching at a much smaller scale, and I think this is
> useful
> % gcc drop_cache.c -o drop_cache;
> % sudo chown root:root drop_cache;
> % sudo chmod 4755 drop_cache;
> 
> I was executing like this
> python3 .../run_regression_test.py --port 5433 --iterations 10 \
>             --columns sequential,random --workers 0 --evict os,off \
>             --payload-size 50 \
>             --rows 10000 \
>             --reset \
>             --ntables 5
> 
> 1 table: significant benefit with HDD cold, SSD random cold access.
> 5 tables: significant benefit for random cold access. Somewhat
> detrimental for sequential cold access, and random hot access.
> 10 tables: significant benefit for random cold access. Slightly better
> than 5 tables for cold sequential access, and somewhat detrimental for
> random hot access.
> 
> These results are hard to explain, but maybe Andres has the answer:
>> I think this specific issue is a bit different, because today you get
>> drastically different behaviour if you have
>> 
>> a) [miss, (miss, hit)+]
>> vs
>> b) [(miss, hit)+]
> 

What's the distance in those cases? You may need to add some logging to
read_stream to show that. If the distance is not ~1.0 then it's not the
issue described by Andres, I think.

There are other ways to look at issued IOs, either using iostat, or
tools like perf-trace.
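
To make the distance question concrete, here is a toy model of how a
look-ahead distance could evolve for the two access patterns quoted above.
It is only a sketch under a stated assumption: the distance doubles after a
miss (capped at a maximum) and drops by one after a hit. That is a
simplification, not the actual read_stream code, and the names
simulate_distance and max_distance are made up for illustration.

```python
def simulate_distance(pattern, max_distance=16, start=1):
    """Return the look-ahead distance after each access in `pattern`.

    Assumed heuristic (a simplification, not the real read_stream logic):
      - "miss": distance doubles, capped at max_distance
      - "hit":  distance decreases by one, but never below 1
    """
    distance = start
    history = []
    for access in pattern:
        if access == "miss":
            distance = min(distance * 2, max_distance)
        elif access == "hit":
            distance = max(distance - 1, 1)
        else:
            raise ValueError(f"unknown access type: {access!r}")
        history.append(distance)
    return history


# Pattern a) [miss, (miss, hit)+]: the extra leading miss lets the distance grow.
pattern_a = ["miss"] + ["miss", "hit"] * 4
# Pattern b) [(miss, hit)+]: every hit undoes the preceding doubling.
pattern_b = ["miss", "hit"] * 4

if __name__ == "__main__":
    print("a:", simulate_distance(pattern_a))
    print("b:", simulate_distance(pattern_b))
```

Under this assumption pattern b) keeps bouncing between 1 and 2, while
pattern a) ramps up towards the cap. Logging the real distance would show
whether the benchmark is in either of these regimes.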

> 
> Tomas said
>> I think a "proper" solution would require some sort of cost model for
>> the I/O part, so that we can schedule the I/Os just so that the I/O
>> completes right before we actually need the page.
> 
> I dare to ask
> Why not use this on a feedback loop?
> 
> while (!current_buffer.ready && reasonable to prefetch) {
>   fetch next index tuple.
>   if necessary prefetch one more buffer
> }
> 

What does "reasonable to prefetch" mean in practice, and how do you
determine it at runtime, before initiating the buffer prefetch?
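
To show where the difficulty sits, here is a toy simulation of the quoted
loop. Everything in it is hypothetical: the clock model (one tick per index
tuple or heap tuple, a fixed io_latency per read) and, above all, the
max_in_flight cap that stands in for "reasonable to prefetch". The sketch
does not answer the question, it only shows that the outcome depends
entirely on how that predicate is chosen.

```python
def run_feedback_loop(n_blocks, io_latency, max_in_flight):
    """Simulate: while the current buffer is not ready and prefetching is
    'reasonable', fetch one more index tuple and prefetch its heap block.

    Returns (total_ticks, stalled_ticks). All parameters are illustrative.
    """
    clock = 0
    ready_at = {}        # block index -> tick at which its I/O completes
    next_to_issue = 0    # next block that has not been requested yet
    stalls = 0

    for current in range(n_blocks):
        if current not in ready_at:
            # Nothing was prefetched for this block, so start its read now.
            ready_at[current] = clock + io_latency
            next_to_issue = current + 1

        while ready_at[current] > clock:
            in_flight = sum(1 for t in ready_at.values() if t > clock)
            # Hypothetical stand-in for "reasonable to prefetch".
            reasonable = in_flight < max_in_flight and next_to_issue < n_blocks
            if reasonable:
                ready_at[next_to_issue] = clock + io_latency
                next_to_issue += 1
                clock += 1   # cost of fetching one more index tuple
            else:
                # Nothing useful left to do: wait for the current buffer.
                stalls += ready_at[current] - clock
                clock = ready_at[current]

        clock += 1           # cost of processing the heap tuple

    return clock, stalls


if __name__ == "__main__":
    print(run_feedback_loop(4, 3, 1))   # cap of 1: every read is waited for
    print(run_feedback_loop(4, 3, 4))   # cap of 4: reads overlap
```

With a cap of 1 the loop degenerates to synchronous reads, and with a large
cap it hides the latency in this model. The simulation says nothing about
how to pick the cap without knowing the I/O latency in advance, which is the
part a cost model would have to supply.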

> I also dare to ask
> Is it possible to skip an unavailable buffer and gain time processing
> the rows that will be needed afterwards?
> This could also help by releasing buffers more quickly if they need to
> be recycled.
> 

Not at the moment, AFAIK. And for most index-only scans that would not
really work anyway, because those need to produce sorted output.


regards

-- 
Tomas Vondra






