Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vrkNM-003uKA-0f for pgsql-hackers@arkaria.postgresql.org; Sun, 15 Feb 2026 22:14:00 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vrkNL-002Z5j-0O for pgsql-hackers@arkaria.postgresql.org; Sun, 15 Feb 2026 22:13:59 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vrkNK-002Z5b-0Z for pgsql-hackers@lists.postgresql.org; Sun, 15 Feb 2026 22:13:58 +0000 Received: from fout-b4-smtp.messagingengine.com ([202.12.124.147]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vrkNH-00000000qwc-0kwk for pgsql-hackers@lists.postgresql.org; Sun, 15 Feb 2026 22:13:57 +0000 Received: from phl-compute-12.internal (phl-compute-12.internal [10.202.2.52]) by mailfout.stl.internal (Postfix) with ESMTP id 130121D0020F; Sun, 15 Feb 2026 17:13:52 -0500 (EST) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-12.internal (MEProxy); Sun, 15 Feb 2026 17:13:52 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm3; t=1771193631; x=1771280031; bh=Ghj+SOcOjEhHq27aacQIECh2AcOrR8KxftK/UBPKGFk=; b= ZEKPimPQ9lOXLi56HbVoK4T4sbzLNQHMj8SACkU+PS/hWXUEvjctBY2Xewdcf/GO Wla+Z3TY1B85SDiWmgE6KvgzNPKdpbmWL83flvb9s/PFf+dIfoSdAzXKeJuz7OSl XvtXXd6OWIENbqR8ssdMMGwmGx+q61Sig2HcmOkfkzrH91sH3/2clFoxOqoRKrOg EahMXQxuOS5A4eeLZgaA0sPWp4e3MC4xy3GaeetLGEaC9IWkYQz1PH+cSa61xYWP j2frWRhcqQmkvzDjkPQqNBncxVuIC6ellDHIHaGHW+LEiADxkTo3vyJia1Vk4QXn 9W/A63SCf39+Nf0axr/6eg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1771193631; x= 1771280031; bh=Ghj+SOcOjEhHq27aacQIECh2AcOrR8KxftK/UBPKGFk=; b=V +7SjEmxruBDnc17U83BGURETAyxpQH4Tfhp9vvC3KdigidqUhN64QbUjwRGu7ggJ OpgUoEXRKYofCOc8yaYZ7OgtNsg0tO3PRpueFqXFdRKQTDK1ItXnPCe3NJKh6MpF K3wX1/pa6n5uTzhjYd0E2qR04rmu8A7X325s0oGAuCZtlP0loh03ZTPUbf+vAR+L vtCpCdGtd3Gt0kToUX4bkL9341FTD6LyOcanAmFmxxtGdx3msYncvQJP0M4XCIik 2shb99/Wq/2gktIPIKUr9wyqlGs/KyEuI60JyCMLg5v/yMiH53vJy9Hd/xSlMQSz hn8xLJb9lFO84nS4C53Fg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgddvudehudeiucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkfhggtggugfgjsehtkefstddttdejnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtffrrg htthgvrhhnpedtleelvdfgjedvffeiueekfeeuleffhfegfffhgfffkeevueehieehhfei gffhvdenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpe grnhgurhgvshesrghnrghrrgiivghlrdguvgdpnhgspghrtghpthhtohepuddupdhmohgu vgepshhmthhpohhuthdprhgtphhtthhopehpghessghofihtrdhivgdprhgtphhtthhope hknhhiiihhnhhikhesghgrrhhrvghtrdhruhdprhgtphhtthhopegshigrvhhuiiekudes ghhmrghilhdrtghomhdprhgtphhtthhopeguihhlihhpsggrlhgruhhtsehgmhgrihhlrd gtohhmpdhrtghpthhtohepmhgvlhgrnhhivghplhgrghgvmhgrnhesghhmrghilhdrtgho mhdprhgtphhtthhopehordgrlhgvgigrnhgurhgvrdhfvghlihhpvgesghhmrghilhdrtg homhdprhgtphhtthhopehrohgsvghrthhmhhgrrghssehgmhgrihhlrdgtohhmpdhrtghp thhtohepthhhohhmrghsrdhmuhhnrhhosehgmhgrihhlrdgtohhmpdhrtghpthhtohepph hgshhqlhdqhhgrtghkvghrsheslhhishhtshdrphhoshhtghhrvghsqhhlrdhorhhg X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sun, 15 Feb 2026 17:13:50 -0500 (EST) Date: Sun, 15 Feb 2026 17:13:50 -0500 From: Andres Freund To: Tomas Vondra Cc: Peter Geoghegan , Alexandre Felipe , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Subject: Re: index prefetching Message-ID: References: <64a2re223ajj4popowsyu4xekbuvvyfwkrihn5yzyrkwsmsuvp@2lls3tpww5dl> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2026-02-15 22:35:35 +0100, Tomas Vondra wrote: > On 2/15/26 21:59, Andres Freund wrote: > > Hi, > > > > On 2026-02-15 14:34:07 -0500, Andres Freund wrote: > >> debug_io_direct=data, enable_indexscan_prefetch=1, w/ stream->distance * 2 + 1 > >> ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ > >> │ QUERY PLAN │ > >> ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ > >> │ Index Scan using idx_periodic_100000 on prefetch_test_data_100000 (cost=0.29..15351.09 rows=100000 width=8) (actual time=0.316..176.703 rows=100000.00 loops=1) │ > >> │ Index Searches: 1 │ > >> │ Prefetch: distance=707.476 count=11158 stalls=88503 skipped=0 resets=0 pauses=26 ungets=0 forwarded=0 │ > >> │ histogram [2,4) => 5, [4,8) => 11, [8,16) => 26, [16,32) => 30, [32,64) => 54, [64,128) => 109, [128,256) => 221, [256,512) => 428, [512,1024) => 10274 │ > >> │ Buffers: shared hit=96875 read=3400 │ > >> │ I/O Timings: shared read=33.874 │ > >> │ Planning: │ > >> │ Buffers: shared hit=78 read=21 │ > >> │ I/O Timings: shared read=2.772 │ > >> │ Planning Time: 3.065 ms │ > >> │ Execution Time: 182.959 ms │ > >> └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ > >> > >> The stall stats are bogus, because they get increased even when we correctly > >> are not prefetching due to everything being in shared buffers. I think the > >> if (distance == 1) stats.nstalls++ > >> would need to be just before the WaitReadBuffers(). > > > > The histogram and distance are also somewhat misleading: They measure what the > > distance is at the time the next block is determined, but that's not really > > informative, as the distance can be much bigger than what we are actually > > doing IO wise (to allow for IO combining etc). The limit for the number of > > in-flight IOs will be the limiting factor in a case with random-ish IOs and > > it's also really what matters for performance. > > > > This EXPLAIN part was hacked together as something to help us during > development, and a lot of the information is wonky and not well defined. > Which is why we chose not to include it in the patches posted to > hackers, so I'm a bit confused which patch / branch you're looking at. I was looking at Peter's git tree and unreverted the stats, after seeing some odd performance. If found the stats quite valuable, without them it'd have been quite hard to figure out why right now lager sequential or periodic tables (from Alexandre's workload) have very subpar performance. > For stalls you're probably right. I'll think about it. Thx. > I'm not sure about the distance shown. What do you mean by "the distance > can be much bigger than what we are actually doing IO wise"? stream->distance is just a cap of how far we *may* look ahead, not how far we are currently looking ahead. E.g. if you have a stream that full tilt blazes ahead with 1 block random IOs, none of then in s_b, you'll soon have a distance that's large, as it gets doubled for every miss until hitting the cap (of io_combine_limit * effective_io_concurrency, capped by the buffer pin limit). But because you're doing random IO, you're just effective_io_concurrency IOs, not effective_io_concurrency * io_combine_limit. This gets even more extreme if you yield often, because that will lead to the distance staying relatively high, while preventing actually issuing much concurrent IO. > IIRC in that particular case we needed to know how far ahead is the > "prefetch position" (I mean, how many index entries are in between). Right - but that's not what looking at ->distance tells you :). I think you could use ->pinned_buffers for it, if you want to look at the number of blocks, not the number of IOs. > > FWIW, if I change the batchdistance <= 2 check to <= 8, I get good perf even > > with io_combine_limit=16: > > > > stats using stream->ios_in_progress: > > Prefetch: distance=2.605 count=315526 stalls=3 skipped=9687128 resets=0 pauses=3035 ungets=0 forwarded=50 > > histogram [1,2) => 72679, [2,4) => 170115, [4,8) => 72682 > > Buffers: shared hit=27325 read=312500 > > I/O Timings: shared read=125.902 > > > > but that was just an experiment. > > > > I'll take a close look tomorrow, but AFAICS we really aim to measure two > rather different things. I've been interested in "how far ahead are we > looking" and you're more interested in the number of I/Os initiated by > the stream. Which both seem interesting and necessary to understand > what's going on. When do you care about the distance purely in blocks, rather than IOs? If you can't actually have IO concurrency, due to io combining and yielding / low pin limits never actually allowing multiple IOs, you'll have no gain from AIO. Greetings, Andres Freund