Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umw4S-0004WD-6t for pgsql-hackers@arkaria.postgresql.org; Fri, 15 Aug 2025 15:10:22 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1umw4R-0005L4-LC for pgsql-hackers@arkaria.postgresql.org; Fri, 15 Aug 2025 15:10:20 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umw4Q-0005Km-3R for pgsql-hackers@lists.postgresql.org; Fri, 15 Aug 2025 15:10:19 +0000 Received: from fhigh-b4-smtp.messagingengine.com ([202.12.124.155]) by makus.postgresql.org with smtp (Exim 4.96) (envelope-from ) id 1umw3s-0000Ab-0o for pgsql-hackers@lists.postgresql.org; Fri, 15 Aug 2025 15:10:17 +0000 Received: from phl-compute-01.internal (phl-compute-01.internal [10.202.2.41]) by mailfhigh.stl.internal (Postfix) with ESMTP id 895077A015F; Fri, 15 Aug 2025 11:09:43 -0400 (EDT) Received: from phl-mailfrontend-01 ([10.202.2.162]) by phl-compute-01.internal (MEProxy); Fri, 15 Aug 2025 11:09:43 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm3; t=1755270583; x=1755356983; bh=PtP0KO/P7VCITHi58wP2lByaOwQzOVbn3K698xVB3QI=; b= MmDvChF6qgw/mdxUq8P2wj/iBZbaok866Tkqva5siNVa8XB9NspyTqnEERp20G/u e/xHwa7cPyCHMqObKY5y/zgNLcRgRzhcThJMIsH/XEMTaQrhGiZy2vgmaDkfndvw IAVFd5pB60I1qhY3NytkzUGnhPDGbzN0RMwarrKFLTToI3NCfhiD7zXmJoQAQmlA +LZNVV4wDytVSy62HNPO99FtErxW6d9VSv45HVQYiXKBvUSrxE4xVa+I09Yl5d5j n5QAP5z4DKicKPmNLDDvneqAJujqWuVyMxYAfFm2VH1O4FrkNzeUJ4r5DVtU8Bmy a1xeBVSDQq/ewYD7rKNpRg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1755270583; x= 1755356983; bh=PtP0KO/P7VCITHi58wP2lByaOwQzOVbn3K698xVB3QI=; b=c 4VNEw8iT+a2rWEGXXqOiGKiA7sntmssfqIGhNYCzOW5eUMCgARHBkuWeuSOWB9iw YyEJRoq4P9TRxbm96tfCPGfDK58r1H1ncau+YhS4XKuUi6d0JSIW8t1joBnV8Mn/ j0lEChtM8n6lXhdwECrvFvDvIDfujK6awg1CZ0dKr3xqPEWq3DYhjHQDWh+Kr5vC fqLmwfUzVZHtm6bNdDAis/n7ZucHTkUqn4fE/Anp+xAXXu4Ze+0bccXcAnVGZ3hF 03DkwHyGQrFB4KU2Rto10gn7/YLNIGOOY9s9FKjk/zT5Acsa5z1w/GBSjfiQ5UOt mn2x80fBX+yY3O0PhWHLA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdefgddugeegvdelucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkfhggtggugfgjsehmkefsredttdejnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtffrrg htthgvrhhnpeejudejfeffhffhgfdvteefteekveeflefgleejjeejtefhteehuefhvedt ffevueenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpe grnhgurhgvshesrghnrghrrgiivghlrdguvgdpnhgspghrtghpthhtohepuddtpdhmohgu vgepshhmthhpohhuthdprhgtphhtthhopehpghessghofihtrdhivgdprhgtphhtthhope hknhhiiihhnhhikhesghgrrhhrvghtrdhruhdprhgtphhtthhopegshigrvhhuiiekudes ghhmrghilhdrtghomhdprhgtphhtthhopeguihhlihhpsggrlhgruhhtsehgmhgrihhlrd gtohhmpdhrtghpthhtohepmhgvlhgrnhhivghplhgrghgvmhgrnhesghhmrghilhdrtgho mhdprhgtphhtthhopehrohgsvghrthhmhhgrrghssehgmhgrihhlrdgtohhmpdhrtghpth htohepthhhohhmrghsrdhmuhhnrhhosehgmhgrihhlrdgtohhmpdhrtghpthhtohepphhg shhqlhdqhhgrtghkvghrsheslhhishhtshdrphhoshhtghhrvghsqhhlrdhorhhgpdhrtg hpthhtohepghhkohhkohhlrghtohhssehprhhothhonhhmrghilhdrtghomh X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 15 Aug 2025 11:09:42 -0400 (EDT) Date: Fri, 15 Aug 2025 11:09:41 -0400 From: Andres Freund To: Peter Geoghegan Cc: Tomas Vondra , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Subject: Re: index prefetching Message-ID: <6butbqln6ewi5kuxz3kfv2mwomnlgtate4mb4lpa7gb2l63j4t@stlwbi2dvvev> References: <6wyxbnry2unm3kbcu2sabhzhs7baoedlg77xqm42chpofjq45g@igst42zpl7ok> <5v2wuxg65l5e3s6uf373zskcqqoukmraxiucnvgn4t7b5cmeqx@5mhqsurdj6xn> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="j4imdtjxqt5tb6mo" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5v2wuxg65l5e3s6uf373zskcqqoukmraxiucnvgn4t7b5cmeqx@5mhqsurdj6xn> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --j4imdtjxqt5tb6mo Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit Hi, On 2025-08-14 19:36:49 -0400, Andres Freund wrote: > On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote: > > On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan wrote: > > > > We can optimize that by deferring the StartBufferIO() if we're encountering a > > > > buffer that is undergoing IO, at the cost of some complexity. I'm not sure > > > > real-world queries will often encounter the pattern of the same block being > > > > read in by a read stream multiple times in close proximity sufficiently often > > > > to make that worth it. > > > > > > We definitely need to be prepared for duplicate prefetch requests in > > > the context of index scans. > > > > Can you (or anybody else) think of a quick and dirty way of working > > around the problem on the read stream side? I would like to prioritize > > getting the patch into a state where its overall performance profile > > "feels right". From there we can iterate on fixing the underlying > > issues in more principled ways. > > I think I can see a way to fix the issue, below read stream. Basically, > whenever AsyncReadBuffers() finds a buffer that has ongoing IO, instead of > waiting, as we do today, copy the wref to the ReadBuffersOperation() and set a > new flag indicating that we are waiting for an IO that was not started by the > wref. Then, in WaitReadBuffers(), we wait for such foreign started IOs. That > has to be somewhat different code from today, because we have to deal with the > fact of the "foreign" IO potentially having failed. > > I'll try writing a prototype for that tomorrow. I think to actually get that > into a committable shape we need a test harness (probably a read stream > controlled by an SQL function that gets an array of buffers). Attached is a prototype of this approach. It does seem to fix this issue. New code disabled: #### backwards sequential table #### ┌──────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├──────────────────────────────────────────────────────────────────────┤ │ Index Scan Backward using t_pk on t (actual rows=1048576.00 loops=1) │ │ Index Cond: ((a >= 16336) AND (a <= 49103)) │ │ Index Searches: 1 │ │ Buffers: shared hit=10291 read=49933 │ │ I/O Timings: shared read=213.277 │ │ Planning: │ │ Buffers: shared hit=91 read=19 │ │ I/O Timings: shared read=2.124 │ │ Planning Time: 3.269 ms │ │ Execution Time: 1023.279 ms │ └──────────────────────────────────────────────────────────────────────┘ (10 rows) New code enabled: #### backwards sequential table #### ┌──────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├──────────────────────────────────────────────────────────────────────┤ │ Index Scan Backward using t_pk on t (actual rows=1048576.00 loops=1) │ │ Index Cond: ((a >= 16336) AND (a <= 49103)) │ │ Index Searches: 1 │ │ Buffers: shared hit=10291 read=49933 │ │ I/O Timings: shared read=217.225 │ │ Planning: │ │ Buffers: shared hit=91 read=19 │ │ I/O Timings: shared read=2.009 │ │ Planning Time: 2.685 ms │ │ Execution Time: 602.987 ms │ └──────────────────────────────────────────────────────────────────────┘ (10 rows) With the change enabled, the sequential query is faster than the random query: #### backwards random table #### ┌────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├────────────────────────────────────────────────────────────────────────────────────────────┤ │ Index Scan Backward using t_randomized_pk on t_randomized (actual rows=1048576.00 loops=1) │ │ Index Cond: ((a >= 16336) AND (a <= 49103)) │ │ Index Searches: 1 │ │ Buffers: shared hit=6085 read=77813 │ │ I/O Timings: shared read=347.285 │ │ Planning: │ │ Buffers: shared hit=127 read=5 │ │ I/O Timings: shared read=1.001 │ │ Planning Time: 1.751 ms │ │ Execution Time: 820.544 ms │ └────────────────────────────────────────────────────────────────────────────────────────────┘ (10 rows) Greetings, Andres Freund --j4imdtjxqt5tb6mo Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename*0="v1-0001-bufmgr-aio-Prototype-for-not-waiting-for-already-.pat"; filename*1="ch.txt"