Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vrjD2-002t8M-16 for pgsql-hackers@arkaria.postgresql.org; Sun, 15 Feb 2026 20:59:16 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vrjD1-002FxX-0n for pgsql-hackers@arkaria.postgresql.org; Sun, 15 Feb 2026 20:59:15 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vrjD0-002FxO-1H for pgsql-hackers@lists.postgresql.org; Sun, 15 Feb 2026 20:59:14 +0000 Received: from fout-b3-smtp.messagingengine.com ([202.12.124.146]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vrjCx-00000000mgj-1r3l for pgsql-hackers@lists.postgresql.org; Sun, 15 Feb 2026 20:59:13 +0000 Received: from phl-compute-03.internal (phl-compute-03.internal [10.202.2.43]) by mailfout.stl.internal (Postfix) with ESMTP id 14FE11D00009; Sun, 15 Feb 2026 15:59:11 -0500 (EST) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-03.internal (MEProxy); Sun, 15 Feb 2026 15:59:11 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm3; t=1771189150; x=1771275550; bh=s2XbtZ52J9RJEnSPdKonlFKWBnrLE6Rj8Hr5P1y+C4Y=; b= q3qPTvKbsnfegnzjHfZ8xpU9voIyk8o6KFhiewIXul9RZohOWIVboee0ldgTlE8H k7ihHdwjo5f+JU3G8muyKcQY3hMP17BdsLkqv7UN2ceXiISzFOOhZcvaocb72KlZ LO9C/debEgmeMHjRFHxs7suymiAzc/BihFSjwrsj6TlKB4S2/bAaDGSbPd5UhV8X h2PcbbrYSNq9TgsjMmrS9on6Jvj+6OYgZGUmCNl4e2sgJyAvqjGwkW4sTH7jDCUq eT1xu440U5ZLfkMqvRXcqfE4BhIONnwtw//2M7VvHn+ualqSt++AZUDtIfz48h4n U1x7/3PgGrJJmMiaqVIqiw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1771189150; x= 1771275550; bh=s2XbtZ52J9RJEnSPdKonlFKWBnrLE6Rj8Hr5P1y+C4Y=; b=Q dI2e6riuO/S07+4gqCA4UGbi6+KwD5OGJGVXRTo8f9pPmBRvM4V83JVEMfE3BJ/w w3HwbCkYR0dWqbejneHRPr8pl2IXh3uw6IpejOZjc/0ZqDL9N0t9UW+SoPMXH/Dg F+hP1qUPo9YFQTZvu/djAiSvjGhlXQB2ALNWBclGBNNPBZoZpndlKfL72CKGhszP KeMqcw5VuRd2GI5SfV5+KQpFbiuHMTw8ArdGRphRIBFESHw9w93TgoZ5fmjp+hPx NU9FVlEWyqqjbZ50n+odM2Pw+LHWIvQFniFXQJqjzBjw7npnJreU3LGvQBam4nu/ AIXXhKnvUlefOPJ/YQTTA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgddvudehtdduucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkfhggtggugfgjsehtkefstddttdejnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtffrrg htthgvrhhnpedtleelvdfgjedvffeiueekfeeuleffhfegfffhgfffkeevueehieehhfei gffhvdenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpe grnhgurhgvshesrghnrghrrgiivghlrdguvgdpnhgspghrtghpthhtohepuddupdhmohgu vgepshhmthhpohhuthdprhgtphhtthhopehpghessghofihtrdhivgdprhgtphhtthhope hknhhiiihhnhhikhesghgrrhhrvghtrdhruhdprhgtphhtthhopegshigrvhhuiiekudes ghhmrghilhdrtghomhdprhgtphhtthhopeguihhlihhpsggrlhgruhhtsehgmhgrihhlrd gtohhmpdhrtghpthhtohepmhgvlhgrnhhivghplhgrghgvmhgrnhesghhmrghilhdrtgho mhdprhgtphhtthhopehordgrlhgvgigrnhgurhgvrdhfvghlihhpvgesghhmrghilhdrtg homhdprhgtphhtthhopehrohgsvghrthhmhhgrrghssehgmhgrihhlrdgtohhmpdhrtghp thhtohepthhhohhmrghsrdhmuhhnrhhosehgmhgrihhlrdgtohhmpdhrtghpthhtohepph hgshhqlhdqhhgrtghkvghrsheslhhishhtshdrphhoshhtghhrvghsqhhlrdhorhhg X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sun, 15 Feb 2026 15:59:09 -0500 (EST) Date: Sun, 15 Feb 2026 15:59:09 -0500 From: Andres Freund To: Tomas Vondra Cc: Peter Geoghegan , Alexandre Felipe , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Subject: Re: index prefetching Message-ID: References: <64a2re223ajj4popowsyu4xekbuvvyfwkrihn5yzyrkwsmsuvp@2lls3tpww5dl> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <64a2re223ajj4popowsyu4xekbuvvyfwkrihn5yzyrkwsmsuvp@2lls3tpww5dl> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2026-02-15 14:34:07 -0500, Andres Freund wrote: > debug_io_direct=data, enable_indexscan_prefetch=1, w/ stream->distance * 2 + 1 > ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ > │ QUERY PLAN │ > ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ > │ Index Scan using idx_periodic_100000 on prefetch_test_data_100000 (cost=0.29..15351.09 rows=100000 width=8) (actual time=0.316..176.703 rows=100000.00 loops=1) │ > │ Index Searches: 1 │ > │ Prefetch: distance=707.476 count=11158 stalls=88503 skipped=0 resets=0 pauses=26 ungets=0 forwarded=0 │ > │ histogram [2,4) => 5, [4,8) => 11, [8,16) => 26, [16,32) => 30, [32,64) => 54, [64,128) => 109, [128,256) => 221, [256,512) => 428, [512,1024) => 10274 │ > │ Buffers: shared hit=96875 read=3400 │ > │ I/O Timings: shared read=33.874 │ > │ Planning: │ > │ Buffers: shared hit=78 read=21 │ > │ I/O Timings: shared read=2.772 │ > │ Planning Time: 3.065 ms │ > │ Execution Time: 182.959 ms │ > └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ > > The stall stats are bogus, because they get increased even when we correctly > are not prefetching due to everything being in shared buffers. I think the > if (distance == 1) stats.nstalls++ > would need to be just before the WaitReadBuffers(). The histogram and distance are also somewhat misleading: They measure what the distance is at the time the next block is determined, but that's not really informative, as the distance can be much bigger than what we are actually doing IO wise (to allow for IO combining etc). The limit for the number of in-flight IOs will be the limiting factor in a case with random-ish IOs and it's also really what matters for performance. For a query with lots of random prefetching, run with effective_io_concurrency=8, compare: stats using stream->distance: Limit (cost=0.43..150968.68 rows=1000000 width=8) (actual time=0.138..5425.851 rows=1000000.00 loops=1) Buffers: shared hit=701118 read=301757 I/O Timings: shared read=3919.597 -> Index Scan using idx_random_10000000 on prefetch_test_data_10000000 (cost=0.43..1509682.87 rows=10000000 width=8) (actual time=0.137..5331.243 rows=1000000.00 loops=1) Index Searches: 1 Prefetch: distance=143.999 count=1297237 stalls=6 skipped=3 resets=0 pauses=0 ungets=297462 forwarded=2 histogram [16,32) => 9, [32,64) => 2, [64,128) => 2, [128,256) => 1297224 Buffers: shared hit=701118 read=301757 I/O Timings: shared read=3919.597 Planning: Buffers: shared hit=96 read=7 I/O Timings: shared read=0.812 Planning Time: 1.310 ms Execution Time: 5496.622 ms stats using stream->ios_in_progress: Limit (cost=0.43..150968.68 rows=1000000 width=8) (actual time=0.157..5427.138 rows=1000000.00 loops=1) Buffers: shared hit=701118 read=301757 I/O Timings: shared read=3919.857 -> Index Scan using idx_random_10000000 on prefetch_test_data_10000000 (cost=0.43..1509682.87 rows=10000000 width=8) (actual time=0.156..5332.607 rows=1000000.00 loops=1) Index Searches: 1 Prefetch: distance=6.864 count=1297237 stalls=6 skipped=3 resets=0 pauses=0 ungets=297462 forwarded=106 histogram [1,2) => 1164, [2,4) => 12634, [4,8) => 1283335 Buffers: shared hit=701118 read=301757 I/O Timings: shared read=3919.857 Planning: Buffers: shared hit=96 read=7 I/O Timings: shared read=0.809 Planning Time: 1.305 ms Execution Time: 5502.332 ms The former sure makes it look like that the bottleneck isn't the number of IOs in flight, after all the distance is quite deep. Yet, if you increase effective_io_concurrency: stats using stream->distance: Limit (cost=0.43..150968.68 rows=1000000 width=8) (actual time=0.138..2769.847 rows=1000000.00 loops=1) Buffers: shared hit=701960 read=301786 I/O Timings: shared read=1100.819 -> Index Scan using idx_random_10000000 on prefetch_test_data_10000000 (cost=0.43..1509682.87 rows=10000000 width=8) (actual time=0.137..2667.780 rows=1000000.00 loops=1) Index Searches: 1 Prefetch: distance=1038.301 count=1293137 stalls=0 skipped=3 resets=0 pauses=694 ungets=291801 forwarded=2 histogram [16,32) => 16, [32,64) => 18, [64,128) => 33, [128,256) => 2, [256,512) => 3, [512,1024) => 125045, [1024,2048) => 1168020 Buffers: shared hit=701960 read=301786 I/O Timings: shared read=1100.819 Planning: Buffers: shared hit=96 read=7 I/O Timings: shared read=0.819 Planning Time: 1.346 ms Execution Time: 2845.891 ms (14 rows) stats using stream->ios_in_progress: Limit (cost=0.43..150968.68 rows=1000000 width=8) (actual time=0.136..2767.839 rows=1000000.00 loops=1) Buffers: shared hit=701960 read=301786 I/O Timings: shared read=1102.115 -> Index Scan using idx_random_10000000 on prefetch_test_data_10000000 (cost=0.43..1509682.87 rows=10000000 width=8) (actual time=0.135..2665.838 rows=1000000.00 loops=1) Index Searches: 1 Prefetch: distance=61.054 count=1293134 stalls=0 skipped=3 resets=0 pauses=694 ungets=291798 forwarded=4 histogram [1,2) => 1, [2,4) => 2, [4,8) => 4, [8,16) => 9, [16,32) => 8737, [32,64) => 1284379 Buffers: shared hit=701960 read=301786 I/O Timings: shared read=1102.115 Planning: Buffers: shared hit=96 read=7 I/O Timings: shared read=0.819 Planning Time: 1.334 ms Execution Time: 2846.017 ms If you want to measure in the domain of blocks, rather than, I'd suggest at least using the number of currently pinned buffers, rather than the distance. But I think IOs is a much more useful measure. E.g. for prefetching a index scan on a sequential column, with: io_effective_io_concurrency=128 io_combine_limit=16 stats using stream->distance: Prefetch: distance=18.872 count=320174 stalls=30736 skipped=9687128 resets=0 pauses=7684 ungets=0 forwarded=0 histogram [1,2) => 3416, [2,4) => 23052, [4,8) => 53788, [8,16) => 115244, [16,32) => 124654, [32,64) => 20 Buffers: shared hit=27325 read=312500 I/O Timings: shared read=1794.060 stats using stream->pinned_buffers: Prefetch: distance=8.259 count=320174 stalls=30736 skipped=9687128 resets=0 pauses=7684 ungets=0 forwarded=29897 histogram [1,2) => 23052, [2,4) => 58070, [4,8) => 84497, [8,16) => 105873, [16,32) => 18785 Buffers: shared hit=27325 read=312500 I/O Timings: shared read=887.850 stats using stream->ios_in_progress: Prefetch: distance=0.059 count=320174 stalls=30736 skipped=9687128 resets=0 pauses=7684 ungets=0 forwarded=301389 histogram [1,2) => 18785 Buffers: shared hit=27325 read=312500 I/O Timings: shared read=881.723 Only the ios-in_progress one really shows that we are basically not doing any readahead, due to heapam pausing, thinking we're far enough ahead - but due to IO combining the batchdistance limit is reached before we can issue multiple IOs. io_effective_io_concurrency=128 io_combine_limit=4: stats using stream->ios_in_progress: Prefetch: distance=4.200 count=328882 stalls=1 skipped=9687128 resets=0 pauses=16391 ungets=0 forwarded=9 histogram [1,2) => 8, [2,4) => 129775, [4,8) => 199090 Buffers: shared hit=27325 read=312500 I/O Timings: shared read=300.284 FWIW, if I change the batchdistance <= 2 check to <= 8, I get good perf even with io_combine_limit=16: stats using stream->ios_in_progress: Prefetch: distance=2.605 count=315526 stalls=3 skipped=9687128 resets=0 pauses=3035 ungets=0 forwarded=50 histogram [1,2) => 72679, [2,4) => 170115, [4,8) => 72682 Buffers: shared hit=27325 read=312500 I/O Timings: shared read=125.902 but that was just an experiment. Greetings, Andres Freund