Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vs2Bz-001qay-1S for pgsql-hackers@arkaria.postgresql.org; Mon, 16 Feb 2026 17:15:27 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vs2By-004FmC-0d for pgsql-hackers@arkaria.postgresql.org; Mon, 16 Feb 2026 17:15:26 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vs2Bx-004Flv-1S for pgsql-hackers@lists.postgresql.org; Mon, 16 Feb 2026 17:15:26 +0000 Received: from fout-b4-smtp.messagingengine.com ([202.12.124.147]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vs2Bu-00000000vMd-2ufh for pgsql-hackers@lists.postgresql.org; Mon, 16 Feb 2026 17:15:24 +0000 Received: from phl-compute-04.internal (phl-compute-04.internal [10.202.2.44]) by mailfout.stl.internal (Postfix) with ESMTP id 7BE2B1D00426; Mon, 16 Feb 2026 12:15:22 -0500 (EST) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-04.internal (MEProxy); Mon, 16 Feb 2026 12:15:22 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm3; t=1771262122; x=1771348522; bh=bHdVNKiHJyMOyJvt4XhBxbIbH2+/cEYpCpw22o452Qg=; b= JVI+HoZWYy0v3HJzUkSitCWLyqGk7rIrxF4xx1ClzUU6QlIsM6EBSOvgYkoZyEhQ EAMull1gq9+IVkY4NvK9wj5yr33RRTt/NGbFOVSee+R5bRT55M4ymi4+16/rnPx7 tIYug4Ie65rqbv/+sEnHMNqOBgHNsw3YWbDvN3wFJthNAMJ0THQibXR7RJDzXu7r Pm2eVu7TKWP2uJYHELi4WFrrqZzsPnWaFPnjna9YB6Sfw3ZSc8c7qBx1jHOoAKp2 YbbVww72DmpbK6C2vKbHAZkGAZBdL9a69MTpV0untGZyQPYqmOqd72EFPFAgEKzp 4ROx+AtrOjUPV8di8b0mgg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1771262122; x= 1771348522; bh=bHdVNKiHJyMOyJvt4XhBxbIbH2+/cEYpCpw22o452Qg=; b=i sq5FNU0LX23cif3kRx5sxn3EJU085baHqAq4KMqbUJNs1WCNJ0RbasR892t4M5Lc FOLsD2/o53aWALZjviAhDGaDgtg20ukldi++HMA3yRwu5gblz9ezPOOvU3OqgBzB kdBeNWMCQvSI32GcgmaM/LWCiLKlstyL2SZUXweeFU4wbMXvvmDkx7QnreWzovaD qAXINvtsQkHklYvRB+hG+gpb/s3i/6lsQUFZRqtLUbcund0Eiyok3zGU8XLVBFxg aJbGefxuZ7JpnzZNwCgQMuMIko2hEvdRYW3zKXMi7rstfUhsiehgXJWYUQucX8XA vKTJ+HMYEp++hNndDG7RQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgddvudejgeegucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkfhggtggugfgjsehtkefstddttdejnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtffrrg htthgvrhhnpedtleelvdfgjedvffeiueekfeeuleffhfegfffhgfffkeevueehieehhfei gffhvdenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpe grnhgurhgvshesrghnrghrrgiivghlrdguvgdpnhgspghrtghpthhtohepuddupdhmohgu vgepshhmthhpohhuthdprhgtphhtthhopehpghessghofihtrdhivgdprhgtphhtthhope hknhhiiihhnhhikhesghgrrhhrvghtrdhruhdprhgtphhtthhopegshigrvhhuiiekudes ghhmrghilhdrtghomhdprhgtphhtthhopeguihhlihhpsggrlhgruhhtsehgmhgrihhlrd gtohhmpdhrtghpthhtohepmhgvlhgrnhhivghplhgrghgvmhgrnhesghhmrghilhdrtgho mhdprhgtphhtthhopehordgrlhgvgigrnhgurhgvrdhfvghlihhpvgesghhmrghilhdrtg homhdprhgtphhtthhopehrohgsvghrthhmhhgrrghssehgmhgrihhlrdgtohhmpdhrtghp thhtohepthhhohhmrghsrdhmuhhnrhhosehgmhgrihhlrdgtohhmpdhrtghpthhtohepph hgshhqlhdqhhgrtghkvghrsheslhhishhtshdrphhoshhtghhrvghsqhhlrdhorhhg X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 16 Feb 2026 12:15:21 -0500 (EST) Date: Mon, 16 Feb 2026 12:15:20 -0500 From: Andres Freund To: Alexandre Felipe Cc: Tomas Vondra , Peter Geoghegan , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Subject: Re: index prefetching Message-ID: References: <64a2re223ajj4popowsyu4xekbuvvyfwkrihn5yzyrkwsmsuvp@2lls3tpww5dl> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2026-02-16 05:30:00 +0000, Alexandre Felipe wrote: > ## Large table MacOS test > > I did a 1000x larger test in MacOS, with the sql script . prefetch had > negligible effect for random and periodic, and made sequential 16% slower > [a]. Quite surprised by that, I don't see that at all. > ## Small scale linux test. > > I have repeated the tests with a 10k table row on a linux system, this time > I tried using either a SSD or a HDD, with shared_buffers of either 128kB or > 128MB, and either psql or python with psycopg. I used a slightly different > table [b]. 128kB is a useless test. That's 16 buffers. Just about any real workload will trigger errors about having too many buffer pinned, independent of this patch. We support it because it's useful for some testing, not because anybody should ever use postgres that way. We should not spend a single second optimizing performance on such toy setups. Even 128MB borders on irrelevant for anything performance sensitive. It actually will limit how much AIO you can do (due to now allowing one session to pin all buffers). If you have a workload that is at all IO bound, you should use a significantly larger s_b. > Prefetch makes... > > SDD with low available buffers is significantly slower, except for > sequential reads. > SDD with cold OS reads is significantly faster for random access. > HDD faster, but not reliably (high variance). > > I am not questioning the usefulness of the patch, and I know that there is > a log of work already put into it. The reason why I decided to review it is > because I believe this can be one important step forward. But I hope that > it is not hard to agree that these (counter)examples suggest that there are > some edges to be pruned. Where I work, most of the queries will access > at most a few hundred lines and are expected to be perceived as > instantaneous. I think some of the tests do show something useful (like readahead not happening for some cases where it should). But testing with pointless configurations is not useful. > If the tests I am doing are pointless, should we consider having something > in the planner to prevent these scans from using prefetch? No, that'd just be wasted cycles. > Should we introduce centralized coordination for IO? I don't think it's any of the next few steps. The amount of contention you can create by doing that is significant, so you have to be really careful. There *are* interesting bits around more central coordination of IOs: I'd e.g. love allowing autovacuum to do IO more aggressively as long as the system otherwise is idle IO wise. That's much better than having an upper cap on how much IO AV is allowed to do at all time, because it allows to use spare IO capacity when available *and* scales down AV activity when the system is otherwise very busy. But it's decidedly nontrivial to do right, and it requires every IO to go through the AIO subsystem, which isn't yet the case (buffered writes & all WAL IO don't yet, for example). > As far as I know this is an area where we just let each query request what > they need and hope for the best. What happens if we have two sequential > scans in different tables? the disk access could interleave pages of the > two scans, falling into a random access pattern, right? Each stream would generate 128kB sized reads (that's the default io_combine_limit) that would then be interspersed between the two streams. Because of the larger size that's fine. Central coordination of IOs couldn't do anything better either. You have, after all, two streams processing data that are on different parts of the disk. So you will have some non-sequential IO. > Notes: > > [a] I did some profiling with sample [4], and tried to spot functions with > the highest increase or decrease in run time, but I was too confused, no > point in dumping raw logs here. > > [b] This time I used a (SELECT string_agg((i*j)::text, '+') FROM > generate_series(1, 50)) instead of repeat('x', 100), just to prevent it > from compressing to nothing when I try larger payloads, and hit the TOAST > thresholds. I removed the primary key `id` because it was annoying to take > 20 minutes to insert the data in the large scale test. FWIW, because your query uses length(payload), it actually ends up spending a lot of CPU time in that, reducing the effect size of IO boundedness. postgres[1077180][1]=# EXPLAIN (ANALYZE) SELECT length(payload) FROM prefetch_test_data_1000000 ORDER BY sequential; ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Index Scan using idx_sequential_1000000 on prefetch_test_data_1000000 (cost=0.42..59733.43 rows=1000000 width=8) (actual time=0.046..543.910 rows=1000000.00 loops=1) │ │ Index Searches: 1 │ │ Buffers: shared hit=33985 │ │ Planning Time: 0.125 ms │ │ Execution Time: 576.192 ms │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ postgres[1077180][1]=# EXPLAIN (ANALYZE) SELECT id FROM prefetch_test_data_1000000 ORDER BY sequential; ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Index Scan using idx_sequential_1000000 on prefetch_test_data_1000000 (cost=0.42..57233.43 rows=1000000 width=8) (actual time=0.036..165.802 rows=1000000.00 loops=1) │ │ Index Searches: 1 │ │ Buffers: shared hit=33985 │ │ Planning Time: 0.119 ms │ │ Execution Time: 199.965 ms │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ Greetings, Andres Freund