Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vrvH7-00D6B1-1A for pgsql-hackers@arkaria.postgresql.org; Mon, 16 Feb 2026 09:52:17 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vrvH5-000chq-2n for pgsql-hackers@arkaria.postgresql.org; Mon, 16 Feb 2026 09:52:16 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vrvDV-000BZn-1U for pgsql-hackers@lists.postgresql.org; Mon, 16 Feb 2026 09:48:33 +0000 Received: from relay6-d.mail.gandi.net ([2001:4b98:dc4:8::226]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vruAJ-00000000z6k-2Ga9 for pgsql-hackers@lists.postgresql.org; Mon, 16 Feb 2026 08:41:13 +0000 Received: by mail.gandi.net (Postfix) with ESMTPSA id 8A3B14441D; Mon, 16 Feb 2026 08:41:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vondra.me; s=gm1; t=1771231265; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2qgNvybcZnxNPo3kD/P1xd3nunmElPr1LcVS9Lzc65Q=; b=HSDR2RchzcCDssDoWpTn5+CcEfL/tzMT9B7xN9aITxjJc5ZXSxtTTRTKCfDBjooV9l3FYS 8tjAhoVwM7/uunAklFmgqOvA1nRCBE2Ez6IYvxJvG5s4//ay242pPHtAFmS1EAi6cOagvY 4q7oyMbPYGoWqjvnW1JBfBFiJ3o0PGNdrVUn6yUOvK09p1+tSTdLj9IzU9mePj2GBJLwrY k2XfGb+s6QdqoqhRRENkk0DtOPoAKxMTMZMO+Db4DJUZbsu+N+U83dSEhNU8z2iqGNr+77 RfySN0zmFSCIiv0sDGK4eqQsSgiGApBz+nMCIjYL2CeMrpz0WhkBieWrmeursQ== Message-ID: Date: Mon, 16 Feb 2026 09:41:01 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: index prefetching To: Alexandre Felipe Cc: Andres Freund , Peter Geoghegan , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar References: <64a2re223ajj4popowsyu4xekbuvvyfwkrihn5yzyrkwsmsuvp@2lls3tpww5dl> Content-Language: en-US From: Tomas Vondra In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-GND-Sasl: tomas@vondra.me X-GND-State: clean X-GND-Score: -100 X-GND-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgddvudeigedvucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuifetpfffkfdpucggtfgfnhhsuhgsshgtrhhisggvnecuuegrihhlohhuthemuceftddunecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjughrpefkffggfgfuvfevfhfhjggtgfesthekredttddvjeenucfhrhhomhepvfhomhgrshcugghonhgurhgruceothhomhgrshesvhhonhgurhgrrdhmvgeqnecuggftrfgrthhtvghrnhepuedvvdeifefffeekudeggfdtieeglefggeduheffveeihefggfehgfdvudetffevnecukfhppeekiedrgeelrddvfedtrddvtdeinecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehinhgvthepkeeirdegledrvdeftddrvddtiedphhgvlhhopegluddtrddufeejrddtrddukegnpdhmrghilhhfrhhomhepthhomhgrshesvhhonhgurhgrrdhmvgdpqhhiugepkeetfeeuudeggeegudffpdhmohguvgepshhmthhpohhuthdpnhgspghrtghpthhtohepuddupdhrtghpthhtohepohgrlhgvgigrnhgurhgvfhgvlhhiphgvsehgmhgrihhlrdgtohhmpdhrtghpthhtoheprghnughrvghssegrnhgrrhgriigvlhdruggvpdhrtghpthhtohepphhgsegsohifthdrihgvpdhrtghpthhtohepthhhohhmrghsmhhunhhrohesghhmrghilhdrtghomhdprhgtphhtthhopegshigrvhhuiiekudesghhmrghilhdrtghomhdprhgtphhtt hhopehrohgsvghrthhmhhgrrghssehgmhgrihhlrdgtohhm List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 2/16/26 06:30, Alexandre Felipe wrote: > Thank you for your comments Tomas and Andres, > > I will try to be brief. > > ## Large table MacOS test > > I did a 1000x larger test in MacOS, with the sql script . prefetch had > negligible effect for random and periodic, and made sequential 16% > slower [a]. > How did you do that? Did you increase the number of rows, make the rows wider (by increasing the 'repeat' parameter in the script), or something else? Did you verify the table really is 1000x larger? > > ## Small scale linux test. > > I have repeated the tests with a 10k table row on a linux system, this > time I tried using either a SSD or a HDD, with shared_buffers of either > 128kB or 128MB, and either psql or python with psycopg. I used a > slightly different table [b]. > The "10k table row" means repeat('x',10000) when generating data? Oh, I see you're using some string_agg(), to make it not compress. But note that if it's TOASTed, it become entirely irrelevant for the prefetching test because it's in a separate relation. Unfortunately, you have not included the new script, so we can't try reproducing your results. 128kB shared buffers is a little bit ... insane. I refuse to optimize anything for this value, and I don't even call about regressions. Even 128MB is not really practical, any serious system caring about performance will use tens or hundreds of GBs of shared buffers. > Prefetch makes... > > SDDwith low available buffers is significantly slower, except for > sequential reads. > SDD with cold OS reads is significantly faster for random access. > HDD faster, but not reliably (high variance). > > I am not questioning the usefulness of the patch, and I know that there > is a log of work already put into it. The reason why I decided to review > it is because I believe this can be one important step forward. But I > hope that it is not hard to agree that these (counter)examples suggest > that there are some edges to be pruned. Where I work, most of the > queries will access at most a few hundred lines and are expected to be > perceived as instantaneous. > No comment unless I can reproduce these results. > If the tests I am doing are pointless, should we consider having > something in the planner to prevent these scans from using prefetch? > How would you do that? Please explain. The planner would need to know things it fundamentally does not (and can't) know - for example it does not know if it'll touch only cached data, or the particular I/O pattern (which may matter a lot). There are runtime provisions to not enable prefetching in some cases, and we may improve them in the future. But all of this is heuristics, and every heuristics has a set of adversary cases. > Should we introduce centralized coordination for IO? As far as I know > this is an area where we just let each query request what they need and > hope for the best. What happens if we have two sequential scans in > different tables? the disk access could interleave pages of the two > scans, falling into a random access pattern, right? > The patch does nothing about that, it's the same as now. And we have no ambition to work on that. > ### Cache control > > This is a way to make the script run without sudo in linux, you give > ownership to root, and then you pin this program to the owner > % gcc drop_cache.c -o drop_cache; > % sudo chown root:root drop_cache; > % sudo chmod 4755 drop_cache; > > In MacOS  purge in the sudoers[2] temporarily, similar to [3] > user ALL=(ALL) NOPASSWD: /usr/bin/purge > So that I don't need to give sudo privileges to the script (that imports > code that I am not even aware of). > ... or you could modify the script to simply use sudo. > > Notes: > > [a] I did some profiling with sample [4], and tried to spot functions > with the highest increase or decrease in run time, but I was too > confused, no point in dumping raw logs here. > > [b] This time I used a (SELECT string_agg((i*j)::text, '+') FROM > generate_series(1, 50)) instead of repeat('x', 100), just to prevent it > from compressing to nothing when I try larger payloads, and hit the > TOAST thresholds. I removed the primary key `id` because it was annoying > to take 20 minutes to insert the data in the large scale test. > It's better to load the data first, and then create indexes. regards -- Tomas Vondra