Re: index prefetching - Alexandre Felipe

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Alexandre Felipe <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Mon, 16 Feb 2026 05:30:00 +0000
Message-ID: <CAE8JnxNOV9kOgmU1-WUWts9Q-Jj_Nf0K480wyEwJXUQYMnYu3g@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAH2-WzkC_EdGB_tdEQr63BiOuJbcvZiV=xO+OJYux4wOaxPD8w@mail.gmail.com>
	<CAH2-Wzn8whv-RE4E5L2WQB_Ha_jy0UReJq1yypu83gDYHeXq5g@mail.gmail.com>
	<CAH2-Wzm7-QuDOs6TcqfhhDsGEZCuHtn=D-SriOTnTZ_fiXNBvA@mail.gmail.com>
	<CAH2-WzmH7pVQ0-mYAxb82aWbz29_BiBPq2wV5p7+1o2sRFqDRQ@mail.gmail.com>
	<CAH2-Wz=6a7fGz2rALDX+xiFDuEaGQWpZ49xEaBUDKiPH8gcL+Q@mail.gmail.com>
	<CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com>
	<CAH2-WzmymSyOt5Y2RGbm6cJXg18J_ttfqjdcpodHe6Gp23ConQ@mail.gmail.com>
	<CAH2-Wznv9_KGqHQ1vCW2pkiA6QskBGcx5NC_-UXnD6GEQasvAQ@mail.gmail.com>
	<CAE8JnxN_EwnTLLMWGhvgwaomYZ0ysm7NeogA-BqBd=Rs3S7Oqw@mail.gmail.com>
	<64a2re223ajj4popowsyu4xekbuvvyfwkrihn5yzyrkwsmsuvp@2lls3tpww5dl>
	<a67mvhyi2q45eg4eimhpwdg6l3s3dmpahti2svffvmvzwmss27@r4nohusvndbq>
	<[email protected]>

Thank you for your comments Tomas and Andres,

I will try to be brief.

## Large table MacOS test

I did a 1000x larger test in MacOS, with the sql script . prefetch had
negligible effect for random and periodic, and made sequential 16% slower
[a].

## Small scale linux test.

I have repeated the tests with a 10k table row on a linux system, this time
I tried using either a SSD or a HDD, with shared_buffers of either 128kB or
128MB, and either psql or python with psycopg. I used a slightly different
table [b].

Prefetch makes...

SDD with low available buffers is significantly slower, except for
sequential reads.
SDD with cold OS reads is significantly faster for random access.
HDD faster, but not reliably (high variance).

I am not questioning the usefulness of the patch, and I know that there is
a log of work already put into it. The reason why I decided to review it is
because I believe this can be one important step forward. But I hope that
it is not hard to agree that these (counter)examples suggest that there are
some edges to be pruned. Where I work, most of the queries will access
at most a few hundred lines and are expected to be perceived as
instantaneous.

If the tests I am doing are pointless, should we consider having something
in the planner to prevent these scans from using prefetch?

Should we introduce centralized coordination for IO? As far as I know this
is an area where we just let each query request what they need and hope for
the best. What happens if we have two sequential scans in different tables?
the disk access could interleave pages of the two scans, falling into a
random access pattern, right?

### Cache control

This is a way to make the script run without sudo in linux, you give
ownership to root, and then you pin this program to the owner
% gcc drop_cache.c -o drop_cache;
% sudo chown root:root drop_cache;
% sudo chmod 4755 drop_cache;

In MacOS  purge in the sudoers[2] temporarily, similar to [3]
user ALL=(ALL) NOPASSWD: /usr/bin/purge
So that I don't need to give sudo privileges to the script (that imports
code that I am not even aware of).

Notes:

[a] I did some profiling with sample [4], and tried to spot functions with
the highest increase or decrease in run time, but I was too confused, no
point in dumping raw logs here.

[b] This time I used a (SELECT string_agg((i*j)::text, '+') FROM
generate_series(1, 50)) instead of repeat('x', 100), just to prevent it
from compressing to nothing when I try larger payloads, and hit the TOAST
thresholds. I removed the primary key `id` because it was annoying to take
20 minutes to insert the data in the large scale test.

References:

[1]
https://www.reddit.com/r/sysadmin/comments/pwiboy/whats_the_purpose_of_chmod_4755_binsh_command/
[2] https://man7.org/linux/man-pages/man5/sudoers.5.html
[3] https://unix.stackexchange.com/a/13058
[4] https://www.unix.com/man_page/osx/1/sample/

Regards,
Alexandre

Attachments:

  [image/png] hdd-linux-psql-128MB.png (53.6K, 3-hdd-linux-psql-128MB.png)
  download | view image

  [image/png] hdd-linux-psql-128kB.png (60.2K, 4-hdd-linux-psql-128kB.png)
  download | view image

  [image/png] sdd-linux-psql-128kB.png (58.7K, 5-sdd-linux-psql-128kB.png)
  download | view image

  [image/png] hdd-linux-psycopg-128kB.png (85.7K, 6-hdd-linux-psycopg-128kB.png)
  download | view image

  [image/png] hdd-linux-psycopg-128MB.png (87.2K, 7-hdd-linux-psycopg-128MB.png)
  download | view image

  [image/png] sdd-linux-psycopg-128kB.png (87.4K, 8-sdd-linux-psycopg-128kB.png)
  download | view image

  [image/png] sdd-linux-psql-128MB.png (57.6K, 9-sdd-linux-psql-128MB.png)
  download | view image

  [image/png] sdd-linux-psycopg-128MB.png (83.3K, 10-sdd-linux-psycopg-128MB.png)
  download | view image

view thread (87+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: index prefetching
  In-Reply-To: <CAE8JnxNOV9kOgmU1-WUWts9Q-Jj_Nf0K480wyEwJXUQYMnYu3g@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox