Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w5PxA-003EPX-24 for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Mar 2026 15:15:29 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w5Px8-00EcKs-04 for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Mar 2026 15:15:26 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w5Px7-00EcKk-0W for pgsql-hackers@lists.postgresql.org; Wed, 25 Mar 2026 15:15:26 +0000 Received: from fout-a2-smtp.messagingengine.com ([103.168.172.145]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1w5Px4-000000015Qx-1ALv for pgsql-hackers@postgresql.org; Wed, 25 Mar 2026 15:15:25 +0000 Received: from phl-compute-02.internal (phl-compute-02.internal [10.202.2.42]) by mailfout.phl.internal (Postfix) with ESMTP id 7C2B6EC0273; Wed, 25 Mar 2026 11:15:20 -0400 (EDT) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-02.internal (MEProxy); Wed, 25 Mar 2026 11:15:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm1; t=1774451720; x=1774538120; bh=e1acYOpklLmE5KHGnPS5tnYLIctasDPvokNl5QF3E3Q=; b= rHgC7CTl99cLHEbvv67ZSjVYz00FH0/sf/UWh6MfUwLJlkdNep31qroz9g9V6vto Kcibj4JZY3qnqapq+i0Xla2Wf0zE3oOsaMNNSC14V4qkU+Li82GW6wl+1HIr1c3P Gm5LfFzeNI/bIK57WNmvN4pkt4/NLnZ/aVhkv+BBCtQSqhvm1R7VgjWFbFifaswh a1aFPRNVDzMY8KOoizhxuDk5FpsK9wuv0/0Ue55kWMCXyycKBrRJSMrZ5bhDzut7 2kviOnqK5xWS+h9mfY9Uj+4T8Cyz3/SuJKMHTiA8qLqZaZR2GgIdBC7EhmUu/nG+ tnbFZZ0nwnZbwSUUL3GhvQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1774451720; x= 1774538120; bh=e1acYOpklLmE5KHGnPS5tnYLIctasDPvokNl5QF3E3Q=; b=d 89ER4tJJJXksXs+t9FCr7ZWvjM6ava+YY1E4pNmVO3v3+p6smFLtzqwCSQbc9ZkL AXkv8jD8oBmjT8/lJriFCLRTiZAljSrCYzfdwXdU3vnN8MZ+vyQ/jwwsn5H1yvgP 1AN/knE/bkZAV+iFI0Th8pjXX/QUJ1/OUpC95Pdx1lArduWirc5lYU1jhBvZgsj4 foTzFkcboLTbVw5ljrXJP3uXRdy16sKZRICssohcS3WfuvlQZh+AaxtlW6qNysAe t97BHZCKDh3hm5lGrXfQO1p9eT9wkZ6MC0fhpYUjdqEkPJ0Yn/22foFsmCQcLZF/ 57bGMjMmR+iNyfTSUcgVQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdefvdegjeelucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkfhggtggugfgjsehtkefstddttdejnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtffrrg htthgvrhhnpedtleelvdfgjedvffeiueekfeeuleffhfegfffhgfffkeevueehieehhfei gffhvdenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpe grnhgurhgvshesrghnrghrrgiivghlrdguvgdpnhgspghrtghpthhtohepiedpmhhouggv pehsmhhtphhouhhtpdhrtghpthhtohepphhgsegsohifthdrihgvpdhrtghpthhtohepth hvsehfuhiiiiihrdgtiidprhgtphhtthhopegshigrvhhuiiekudesghhmrghilhdrtgho mhdprhgtphhtthhopehmvghlrghnihgvphhlrghgvghmrghnsehgmhgrihhlrdgtohhmpd hrtghpthhtohepthhhohhmrghsrdhmuhhnrhhosehgmhgrihhlrdgtohhmpdhrtghpthht ohepphhgshhqlhdqhhgrtghkvghrshesphhoshhtghhrvghsqhhlrdhorhhg X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 25 Mar 2026 11:15:19 -0400 (EDT) Date: Wed, 25 Mar 2026 11:15:18 -0400 From: Andres Freund To: Melanie Plageman Cc: Nazir Bilal Yavuz , Thomas Munro , pgsql-hackers@postgresql.org, Peter Geoghegan , Tomas Vondra Subject: Re: Don't synchronously wait for already-in-progress IO in read stream Message-ID: References: <42rdu4q44kvsq53fz5qgzuawqpaytvnemsnquynlfch5mqhc2m@6ytnlgivtzro> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2026-03-20 15:50:59 -0400, Melanie Plageman wrote: > On Thu, Mar 19, 2026 at 6:22 PM Andres Freund wrote: > > > > Thinking about it more, I also got worried about the duplicating of > > logic. It's perhaps acceptable with the patches as-is, but we'll soon need > > something very similar for AIO writes. Then we'd end up with like 5 variants, > > because we'd still need the existing StartBufferIO() for some cases where we > > do want to wait (e.g. the edge case in ExtendBufferedRelShared()). > > > > In the attached prototype I replaced your patch introducing > > PrepareHeadBufferReadIO()/ PrepareAdditionalBufferReadIO() with one that > > instead revises StartBufferIO() to have the enum return value you introduced > > and a PgAioWaitRef * argument that callers that would like to asynchronously > > wait for the IO to complete (others pass NULL). There are some other cleanups > > in it too, see the commit message for more details. > > I've come around to this. The aspect I like least is that io_wref is > used both as an output parameter _and_ a decision input parameter. Do you have an alternative suggestion? We could add a dedicated parameter for that, but then it just opens up different ways of calling the function with the wrong arguments. > Some callers never want to wait on in-progress IO (and don't have to), > while others must eventually wait but can defer that waiting as long > as they have a wait reference. If they can't get a wait reference, > they have no way to wait later, so they must wait now. The presence of > io_wref indicates this difference. > > I think it's important to express that less mechanically than in your > current header comment and comment in the else block of > StartSharedBufferIO() where we do the waiting. Explaining first—before > detailing argument combinations—why a caller would want to pass > io_wref might help. I'm not entirely sure what you'd like here. Would the following comment do the trick? * In several scenarios the buffer may already be undergoing I/O in this or * another backend. How to best handle that depends on the caller's * situation. It might be appropriate to wait synchronously (e.g., because the * buffer is about to be invalidated); wait asynchronously, using the buffer's * IO wait reference (e.g., because the caller is doing readahead and doesn't * need the buffer to be ready immediately); or to not wait at all (e.g., * because the caller is trying to combine IO for this buffer with another * buffer). * * How and whether to wait is controlled by the wait in io_wref parameters. In * detail: * * > However, I do think you need to enumerate the different combinations > of wait and io_wref (as you've done) to make it clear what they are. > > I, for example, find it very confusing what wait == false and io_wref > not NULL would mean. If IO is in progress on the buffer and the > io_wref is not valid yet, the caller would get the expected > BUFFER_IO_IN_PROGRESS return value but io_wref won't be set. I could > see callers easily misinterpreting the API and passing this > combination when what they want is wait == true and io_wref not NULL > -- because they don't want to synchronously wait. Hm. I started out proposing that we should just add an assert documenting this is a nonsensical combination. But when writing the comment for that I realized that it theoretically could make sense to pass in wait == false and io_wref != NULL, if you wanted to get a wait reference, but would not want to do a WaitIO() if there's no wait reference set. I don't think that's something we need right now, but ... > I don't have any good suggestions despite thinking about it, though. > > Two other things about 0007: > > for (int i = nblocks_done + 1; i < operation->nblocks; i++) > { > /* Must be consecutive block numbers. */ > Assert(BufferGetBlockNumber(buffers[i - 1]) == > BufferGetBlockNumber(buffers[i]) - 1); > > status = StartBufferIO(buffers[nblocks_done], true, false, NULL); > > Copy-paste bug above, should be StartBufferIO(buffers[i],... > > I would mention that currently BUFFER_IO_IN_PROGRESS is not used in > the first StartBufferIO() case, so that is dead code as of this commit Whaaat. Why did this even pass tests??? I guess it just always failed to start IO because there already was IO on the buffer and that was good enough to get through. Clearly a testing gap. Not entirely trivial to test though :(. Greetings, Andres Freund