Date: Sun, 15 Feb 2026 17:39:21 -0500
From: Andres Freund
To: Tomas Vondra
Cc: Alexandre Felipe, Peter Geoghegan, Thomas Munro, Nazir Bilal Yavuz,
    Robert Haas, Melanie Plageman, PostgreSQL Hackers, Georgios,
    Konstantin Knizhnik, Dilip Kumar
Subject: Re: index prefetching
Message-ID: <7herwtpae3ptqdng3s7tcft4ljkc23fyocp3mbrvc7xyk7s2lk@uq3qbm4blizo>
References: <9411f220-007d-4f1e-9c8f-ca8eb09e6788@vondra.me> <984dcf9e-ada0-4dff-ae58-1f97bc904ccb@vondra.me>
In-Reply-To: <984dcf9e-ada0-4dff-ae58-1f97bc904ccb@vondra.me>

Hi,

On 2026-02-15 22:17:05 +0100, Tomas Vondra wrote:
> I don't have access to a M1 machine (and it also does not say what type
> of storage is it using, which seems pretty important for a patch aiming
> to improve I/O behavior). But I tried running this on my ryzen machine
> with local SSDs (in RAID0), and with the 100k rows (and fixed handling
> of page cache) I get this:
>
> column_name  io_method  evict   n  master_ms  off_ms  on_ms  effect_pct
> periodic     worker     off    10       35.8    35.1   36.5         2.0
> periodic     worker     os     10       49.4    49.9   58.8         8.1
> periodic     worker     pg     10       39.5    39.9   47.1         8.3
> random       worker     off    10       35.9    35.6   35.7         0.2
> random       worker     os     10       49.0    49.0   42.6        -7.0
> random       worker     pg     10       39.6    39.9   40.9         1.2
> sequential   worker     off    10       28.2    27.9   27.7        -0.4
> sequential   worker     os     10       39.3    39.2   34.8        -6.0
> sequential   worker     pg     10       30.1    30.1   29.4        -1.3
>
> column_name  io_method  evict   n  master_ms  off_ms  on_ms  effect_pct
> periodic     io_uring   off    10       35.9    35.8   35.8        -0.1
> periodic     io_uring   os     10       49.3    49.9   50.0         0.1
> periodic     io_uring   pg     10       40.1    39.8   41.7         2.4
> random       io_uring   off    10       35.6    35.2   35.7         0.8
> random       io_uring   os     10       49.1    48.9   46.1        -3.0
> random       io_uring   pg     10       39.8    40.1   42.6         3.1
> sequential   io_uring   off    10       28.0    27.8   28.0         0.4
> sequential   io_uring   os     10       39.8    39.1   40.7         1.9
> sequential   io_uring   pg     10       30.2    30.0   29.6        -0.8
>
> This is on default config with io_workers=12 and data_checksums=off. I'm
> not showing results for parallel query, because it's irrelevant.
>
> This also has timings for master, for worker and io_uring (which you
> could not get on M1, at least no in MacOS). For "worker" the differences
> are much smaller (within 10% in the worst case), and almost non-existent
> for io_uring. Which suggests this is likely due to the "signal" overhead
> associated with worker, which can be annoying for certain data patterns
> (where we end up issuing an I/O for individual blocks at distance 1).

I don't think this is just the signalling issue. For "periodic" I think
it's the signalling issue triggered by the read stream distance being kept
too low. Due to the small distance, the latency affects us much more.

On my system, with turbo boost etc. disabled:

worker w/ enable_indexscan_prefetch=0:

 Index Scan using idx_periodic_100000 on prefetch_test_data_100000  (cost=0.29..15101.09 rows=100000 width=208) (actual time=0.157..84.129 rows=100000.00 loops=1)
   Index Searches: 1
   Buffers: shared hit=97150 read=3125
   I/O Timings: shared read=31.274
 Planning:
   Buffers: shared hit=97 read=7
   I/O Timings: shared read=0.595
 Planning Time: 0.944 ms
 Execution Time: 89.319 ms

worker w/ enable_indexscan_prefetch=1:

 Index Scan using idx_periodic_100000 on prefetch_test_data_100000  (cost=0.29..15101.09 rows=100000 width=208) (actual time=0.158..115.279 rows=100000.00 loops=1)
   Index Searches: 1
   Prefetch: distance=1.060 count=99635 stalls=3004 skipped=0 resets=0 pauses=0 ungets=0 forwarded=0 histogram [1,2) => 93627, [2,4) => 6008
   Buffers: shared hit=97150 read=3125
   I/O Timings: shared read=56.077
 Planning:
   Buffers: shared hit=97 read=7
   I/O Timings: shared read=0.612
 Planning Time: 0.994 ms
 Execution Time: 120.575 ms

Right, a regression. But note how low the distance is - no wonder the
worker latency has a bad effect - we only have the downside, never the
upside, as there's pretty much no IO concurrency.

After applying this diff:

@@ -1006,7 +1038,9 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 			stream->oldest_io_index = 0;
 		/* Look-ahead distance ramps up rapidly after we do I/O. */
-		distance = stream->distance * 2;
+		distance = stream->distance * 2
+			+ 1
+			;
 		distance = Min(distance, stream->max_pinned_buffers);
 		stream->distance = distance;

worker w/ enable_indexscan_prefetch=1 + patch:

 Index Scan using idx_periodic_100000 on prefetch_test_data_100000  (cost=0.29..15101.09 rows=100000 width=208) (actual time=0.157..82.673 rows=100000.00 loops=1)
   Index Searches: 1
   Prefetch: distance=70.892 count=103109 stalls=5 skipped=0 resets=0 pauses=0 ungets=3474 forwarded=0 histogram [1,2) => 88975, [2,4) => 5, [4,8) => 11, [8,16) => 26, [16,32) => 28, [32,64) => 64, [64,128) => 104, [128,256) => 136, [256,512) => 602, [512,1024) => 13158
   Buffers: shared hit=97150 read=3125
   I/O Timings: shared read=19.711
 Planning:
   Buffers: shared hit=97 read=7
   I/O Timings: shared read=0.596
 Planning Time: 0.951 ms
 Execution Time: 87.887 ms

By no means a huge win compared to prefetching being disabled, but the
regression does vanish.

The problem that this fixes is that the periodic workload has cache hits
frequently, which reduce the stream->distance by 1. Then, on a miss, we
double the distance. But that means that if you have the trivial pattern of
one hit and one miss, which this workload very often has, you *never* get
above 1. I.e. we increase the distance as quickly as we decrease it.

Greetings,

Andres Freund