Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umhVC-002AtG-R9 for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Aug 2025 23:36:59 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1umhVA-00Aglw-7x for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Aug 2025 23:36:56 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umhV9-00Aglo-11 for pgsql-hackers@lists.postgresql.org; Thu, 14 Aug 2025 23:36:56 +0000 Received: from fhigh-b2-smtp.messagingengine.com ([202.12.124.153]) by makus.postgresql.org with smtp (Exim 4.96) (envelope-from ) id 1umhV6-000crV-2Q for pgsql-hackers@lists.postgresql.org; Thu, 14 Aug 2025 23:36:54 +0000 Received: from phl-compute-10.internal (phl-compute-10.internal [10.202.2.50]) by mailfhigh.stl.internal (Postfix) with ESMTP id 8DC197A00BF; Thu, 14 Aug 2025 19:36:51 -0400 (EDT) Received: from phl-mailfrontend-01 ([10.202.2.162]) by phl-compute-10.internal (MEProxy); Thu, 14 Aug 2025 19:36:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm3; t=1755214611; x=1755301011; bh=Ajoctt0xpNQDzDPljC/40wm5zwEWHIFvHaAZRA8h6Gw=; b= Q18g//73MQ+oLJHlnn0fh4zZNpWwa4/qrihPieQ8qpjqE7I19UGiTlTL4TU3zxgp MTN5AhdEMlg7ADrGXFgEDzLlKLxAaC/d+GdOBk6QPLIVMieS1qHsCyHlZVG5Jdvo z0wJN7Lf3YauK/T86oWeWLFv17D/HsshcHJZLkXmofcg56k1F2M3q0+rCNpGzx89 zYwvDzXtgJGExGm66hUWlxLB23eYN0g36OTHTQVqn6PNapmtAJvt61fUybMJhqNx jmAFdBjkSIGkttoc9Lk4DcACwxWdeLam5X+O1+DFK1scRxoxdyf0v53l6Erp1Bh4 mbRDEWgO9f+O1Hzl7ReABw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1755214611; x= 1755301011; bh=Ajoctt0xpNQDzDPljC/40wm5zwEWHIFvHaAZRA8h6Gw=; b=j oOW+Gicp6NG7mvardKctOq8H/pIwZY3o/9v0DAM72Vp5yJVhkOC2q7tgwFeEx+z9 JXtP7yD9mzRr2acJQ38B/MzcTZZNyBkzkowh9L6irSD07/+6SSrzR0vP1UqAI7pN QjQLEQTrArQNKE0b6zthkTqhYTgVwieUkLn4bj7EIjHNiQNiLCKMJlS4mDlCloyy F0S48lGW8WaTQelTqZR5eyh9/LxntnSLDhBdvxA9l7f+r27s+wEhnSIZDsqxvVF0 uBfxw93hysf5PkKxysVw9tRQhOCn7BqLkpM0ELKoqVnp4wq8OLTPtU3GhK+4RbAO nBOs/WdyVZzuR4Eyxj9yg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdefgddugedvgedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkfhggtggugfgjsehtkefstddttdejnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtffrrg htthgvrhhnpedtleelvdfgjedvffeiueekfeeuleffhfegfffhgfffkeevueehieehhfei gffhvdenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpe grnhgurhgvshesrghnrghrrgiivghlrdguvgdpnhgspghrtghpthhtohepuddtpdhmohgu vgepshhmthhpohhuthdprhgtphhtthhopehpghessghofihtrdhivgdprhgtphhtthhope hknhhiiihhnhhikhesghgrrhhrvghtrdhruhdprhgtphhtthhopegshigrvhhuiiekudes ghhmrghilhdrtghomhdprhgtphhtthhopeguihhlihhpsggrlhgruhhtsehgmhgrihhlrd gtohhmpdhrtghpthhtohepmhgvlhgrnhhivghplhgrghgvmhgrnhesghhmrghilhdrtgho mhdprhgtphhtthhopehrohgsvghrthhmhhgrrghssehgmhgrihhlrdgtohhmpdhrtghpth htohepthhhohhmrghsrdhmuhhnrhhosehgmhgrihhlrdgtohhmpdhrtghpthhtohepphhg shhqlhdqhhgrtghkvghrsheslhhishhtshdrphhoshhtghhrvghsqhhlrdhorhhgpdhrtg hpthhtohepghhkohhkohhlrghtohhssehprhhothhonhhmrghilhdrtghomh X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 14 Aug 2025 19:36:50 -0400 (EDT) Date: Thu, 14 Aug 2025 19:36:49 -0400 From: Andres Freund To: Peter Geoghegan Cc: Tomas Vondra , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Subject: Re: index prefetching Message-ID: <5v2wuxg65l5e3s6uf373zskcqqoukmraxiucnvgn4t7b5cmeqx@5mhqsurdj6xn> References: <7c308720-5cd4-4508-8cc0-6e55a6352731@vondra.me> <6wyxbnry2unm3kbcu2sabhzhs7baoedlg77xqm42chpofjq45g@igst42zpl7ok> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote: > On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote: > > > We can optimize that by deferring the StartBufferIO() if we're encountering a > > > buffer that is undergoing IO, at the cost of some complexity. I'm not sure > > > real-world queries will often encounter the pattern of the same block being > > > read in by a read stream multiple times in close proximity sufficiently often > > > to make that worth it. > > > > We definitely need to be prepared for duplicate prefetch requests in > > the context of index scans. > > Can you (or anybody else) think of a quick and dirty way of working > around the problem on the read stream side? I would like to prioritize > getting the patch into a state where its overall performance profile > "feels right". From there we can iterate on fixing the underlying > issues in more principled ways. I think I can see a way to fix the issue, below read stream. Basically, whenever AsyncReadBuffers() finds a buffer that has ongoing IO, instead of waiting, as we do today, copy the wref to the ReadBuffersOperation() and set a new flag indicating that we are waiting for an IO that was not started by the wref. Then, in WaitReadBuffers(), we wait for such foreign started IOs. That has to be somewhat different code from today, because we have to deal with the fact of the "foreign" IO potentially having failed. I'll try writing a prototype for that tomorrow. I think to actually get that into a committable shape we need a test harness (probably a read stream controlled by an SQL function that gets an array of buffers). > FWIW it wouldn't be that hard to require the callback (in our case > index_scan_stream_read_next) to explicitly point out that it knows > that the block number it's requesting has to be a duplicate. It might > make sense to at least place that much of the burden on the > callback/client side. The problem actually exists outside of your case. E.g. if you have multiple backends doing a synchronized seqscan on the same relation, performance regresses, because we often end up synchronously waiting for IOs started by another backend. I don't think it has quite as large an effect for that as it has here, because the different scans basically desynchronize whenever it happens due to the synchronous waits slowing down the waiting backend a lot), limiting the impact somewhat. Greetings, Andres Freund