Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vo2ep-00EwbS-25 for pgsql-hackers@arkaria.postgresql.org; Thu, 05 Feb 2026 16:56:43 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vo2eo-000AYu-1a for pgsql-hackers@arkaria.postgresql.org; Thu, 05 Feb 2026 16:56:42 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vo2eo-000AYm-0M for pgsql-hackers@lists.postgresql.org; Thu, 05 Feb 2026 16:56:42 +0000 Received: from mail-dy1-x132c.google.com ([2607:f8b0:4864:20::132c]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vo2el-00000001Dg2-2hkO for pgsql-hackers@postgresql.org; Thu, 05 Feb 2026 16:56:41 +0000 Received: by mail-dy1-x132c.google.com with SMTP id 5a478bee46e88-2b74f839bdfso1411031eec.1 for ; Thu, 05 Feb 2026 08:56:38 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1770310596; cv=none; d=google.com; s=arc-20240605; b=h0Cj9IwPDHUpuGrd77UdIDVvG/Qms569Nk53mbBQuYeNZo9NNy3rXD6z4q7pi/szst BACOpFH9mUi4I/fHPZaap1Les48GgKcnc1BGSD6QIriNUQxLnXPQ0LvkwehWAuKR+M0G s76ZX1QFyvPV0UIg1boCxDieIsC3OHy7GIuO1gXuGi7gmANwV/1gZvIhKEOkufG3dihg yuU/76ihrx40YpcbESG6JCcWIQLkHl98zRFDmHKX3DLj3zdzgphQYeZTcWE1K+mAn4Jz RixeRzCgmO1rXBpbCtye4Cbn/IHqhtB/ib6K/6SA0WZFqPYuFah2yVeqSFAThAdG69po sPGg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=J6T2Gf0wZt43XOAtCqXxK9XSqvHalU8V3Hcp85XNUNc=; fh=Axk88V8DagfAhH3G55yjyXVh6wBqCRQmg0wWFdSddu4=; b=fCewOmVQ3XSo/UZP06Ggtpj6pTnx34zyHW5u+9c1m8RT8q4Tr69gouWsyBd7Ld9wpW SX1Id8laqbftQFCt7T7GwJBWiZhMFkiQkmsj58VUTLl5CdBMEwmkIZAvUM+ZxWK8qVoD eI89KtoKIl2rjBGeyegttayu887QVkhkN57/vcYaK5x7Ejk+YnTiBO/qr5mDTNE6TZqf LuBb4DpBF3atJL68v6tjhdLEwQcsSQfE+R1lM8LVTy36u5MJxdJJXYttshg+s/O1oBE8 vonRoJr2ti1yR5wgZEwRQ9xTCr+31srHYef+2wvSBa9R6ddp5PRyPWJCeLdRwru4y5oM fYfQ==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1770310596; x=1770915396; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=J6T2Gf0wZt43XOAtCqXxK9XSqvHalU8V3Hcp85XNUNc=; b=c9xlwc/8cn78RiPgE4MIYeRzByk4LG7hI6A0wY8poQ9wM/qSkbKUicRfPolb9s62oj vpCoM1Um0tIWKV8MWchpd2Lk3PelmIHYSRAJKK5rQz04jJK1KiAHlRXm0LndyA5aUn0i +qdsYDSgMjXdehbURpdTo9Et5QniRuJCR+SS/IejiFVaUgZZt4T5dkSAGeV2AV+W8E/d 8gDKOpu1k2j5/+pehUA6eWTgvbHaJdH1HyquF8eK/XY7yiazJUfCMxGpx2iq2dAaMuJG qczo1EfvVv+fYNuITW+MsrtYf0QgSRaBV2q3MF0dLYJCJa/KN44s7Dj4ozR3UQqVidJa uMHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770310596; x=1770915396; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=J6T2Gf0wZt43XOAtCqXxK9XSqvHalU8V3Hcp85XNUNc=; b=nKVpFkA9iFE4CNUVG4aAxvO8veWD/P/s072Xh4DSVzzmkw8SIcE8bCgad53tUL8Amb 3bSsxkXgb22bG+jjFlhkGMgVUFRdjLLNOB8gbqn/1pO86TcKyoGntuwLQnLq8V0r6wtn kB5LN2D56P1l5b/0b8XXort88cR9ZUpdCVuptpUppg2TXj/Yerh0VAmTrZlQ5qc6cBKD VCZqXI9TsDDiZ+CzWjuVCXphy5asTgjJbi231eH/kitdI+IJese/IIoh5/s+SalQULOy T7jBpBPDMvHhX6s7VkF35mv+ktDCucDzVyHwwDW81DKwinpDblT5jv5HbHiqEdNXviU/ Z0ag== X-Forwarded-Encrypted: i=1; AJvYcCUQDaV6JmcVGVt/c6itmmI/WO5M4HeQjNttNem3GfIe7I4hYWM+a2Jpxt3Y4sudKURf1wEREpqmTylB5NGA@postgresql.org X-Gm-Message-State: AOJu0YwT+ODdvK1ZpyQoY1GBRGPivwBti13gMBCkczKt9/YW0n85+tKW emi2GA7eWQ/r7Dr0g52sR4hcGbikG/4REa4OqtlIL66NLEudIuZku1mJdo/aF5MjsO5v1OPuUkp pP3W9EW5ZbJLA5PALQ+RZv3QWGm62alk= X-Gm-Gg: AZuq6aKz+iEVs9eETL/ruBZ71XnGBdAGjLmmbeqwXltr9fQo0C58kmxnDBf2khIohl6 aGlHbUB9npPUn6L0BsADMC009a9bxXOONnIZC4x2a2W8E8jrahi5XEhgm0TJRh+chG4FtZsvEks paQubOhjh3PimnCfE83d8ORJ56n2pgpuFYvti186R6XXMP1RveuJt8foAmzOQLu86vr0MUn61Of F7+6xiS68TGqrkL+mDVfbytIMS05IMAFzN2ctuTtN5FN0WtdO2N2idOtoJMDbJx7XIQmAL0HqvQ Bzym1Q== X-Received: by 2002:a05:7300:730c:b0:2b7:c285:837d with SMTP id 5a478bee46e88-2b845e15046mr1620751eec.4.1770310596296; Thu, 05 Feb 2026 08:56:36 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Nazir Bilal Yavuz Date: Thu, 5 Feb 2026 19:56:24 +0300 X-Gm-Features: AZwV_QhR-_LXsmitiUC8GAjOyB6T7ruV08RViVrBAwnQ6imV-DVfeqZh-yda284 Message-ID: Subject: Re: Don't synchronously wait for already-in-progress IO in read stream To: Melanie Plageman Cc: Thomas Munro , Andres Freund , pgsql-hackers@postgresql.org, Peter Geoghegan , Tomas Vondra Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, Thank you for working on this! On Sat, 24 Jan 2026 at 00:04, Melanie Plageman wrote: > > On Sun, Nov 9, 2025 at 5:21=E2=80=AFPM Thomas Munro wrote: > > > > I suppose (or perhaps vaguely recall from an off-list discussion?) > > that you must have considered merging the new > > is-it-already-in-progress check into ReadBuffersCanStartIO(). I > > suppose the nowait argument would become a tri-state argument with a > > value that means "don't wait for an in-progress read, just give me the > > IO handle so I can 'join' it as a foreign waiter", with a new output > > argument to receive the handle, or something along those lines, and I > > guess you'd need a tri-state result, and perhaps s/Can/Try/ in the > > name. That'd remove the double-check (extra header lock-unlock cycle) > > and associated race that can cause that rare synchronous wait (which > > must still happen sometimes in the duelling concurrent scan use > > case?), at the slight extra cost of having to allocate and free a > > handle in the case of repeated blocks (eg the index->heap scan use > > case), but at least that's just backend-local list pushups and doesn't > > do extra work otherwise. Is there some logical problem with that > > approach? Is the code just too clumsy? > > Attached v3 basically does what you suggested above. Now, we should > only have to wait if the backend encounters a buffer after another > backend has set BM_IO_IN_PROGRESS but before that other backend has > set the buffer descriptor's wait reference. > > 0001 and 0002 are Andres' test-related patches. 0003 is a change I > think is required to make one of the tests stable (esp on the BSDs). > 0004 is a bit of preliminary refactoring and 0005 is Andres' foreign > IO concept but with your suggested structure and my suggested styling. > I could potentially break out more into smaller refactoring commits, > but I don't think it's too bad the way it is. I confirm that I am able to produce the regression that Andres mentioned with the patches excluding 0005, and 0005 fixes the regression. > A few things about the patch that I'm not sure about: > > - I don't know if pgaio_submit_staged() is in all the right places > (and not in too many places). I basically do it before we would wait > when starting read IO on the buffer. In the permanent buffers case, > that's now only when BM_IO_IN_PROGRESS is set but the wait reference > isn't valid yet. This can't happen in the temporary buffers case, so > I'm not sure we need to call pgaio_submit_staged(). I agree with you, I think we don't need to call pgaio_submit_staged() for the temporary buffers case. > - StartBufferIO() is no longer invoked in the AsyncReadBuffers() path. > We could refactor it so that it works for AsyncReadBuffers(), but that > would involve returning something that distinguishes between > IO_IN_PROGRESS and IO already done. And StartBufferIO()'s comment > explicitly says it wants to avoid that. > If we keep my structure, with AsyncReadBuffers() using its own helper > (PrepareNewReadBufferIO()) instead of StartBufferIO(), then it seems > like we need some way to make it clear what StartBufferIO() is for. > I'm not sure what would collectively describe its current users, > though. It also now has no non-test callers passing nowait as true. > However, once we add write combining, it will, so it seems like we > should leave it the way it is to avoid churn. However, other > developers might be confused in the interim. I don't have a comment for this. > - In the 004_read_stream tests, I wonder if there is a way to test > that we don't wait for foreign IO until WaitReadBuffers(). We have > tests for the stream accessing the same block, which in some cases > will exercise the foreign IO path. But it doesn't distinguish between > the old behavior -- waiting for the IO to complete when starting read > IO on it -- and the new behavior -- not waiting until > WaitReadBuffers(). That may not be possible to test, though. Won't 'stream accessing the same block test' almost always test the new behavior (not waiting until WaitReadBuffers())? Having dedicated tests for both cases would be helpful, though. My review: 0001: 0001 LGTM. --------------- 0002: diff --git a/src/test/modules/test_aio/t/004_read_stream.pl b/src/test/modules/test_aio/t/004_read_stream.pl +foreach my $method (TestAio::supported_io_methods()) +{ + $node->adjust_conf('postgresql.conf', 'io_method', 'worker'); + $node->start(); + test_io_method($method, $node); + $node->stop(); +} This seems wrong, we always test io_method=3Dworker. I think we need to replace 'worker' with the $method. Also, we can add check below to the test_io_method function in the 004_read_stream.pl: ``` is($node->safe_psql('postgres', 'SHOW io_method'), $io_method, "$io_method: io_method set correctly"); ``` Other than that, 0002 LGTM. --------------- 0003: > 0003 is a change I > think is required to make one of the tests stable (esp on the BSDs). 0003 LGTM. --------------- > 0004 is a bit of preliminary refactoring and 0005 is Andres' foreign > IO concept but with your suggested structure and my suggested styling. > I could potentially break out more into smaller refactoring commits, > but I don't think it's too bad the way it is. 0004: Nitpick but I prefer something like TrackBufferHit() or CountBufferHit() as a function name instead of ProcessBufferHit(). ProcessBufferHit() gives the impression that the function is doing a job more than it currently does. Other than that, 0004 LGTM. --------------- 0005: 0005 LGTM. However, I am still looking into the AIO code. I wanted to share my review so far. --------------- -- Regards, Nazir Bilal Yavuz Microsoft