Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vIDmi-000A2b-Bm for pgsql-hackers@arkaria.postgresql.org; Sun, 09 Nov 2025 22:21:20 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vIDmg-00AWKw-Hh for pgsql-hackers@arkaria.postgresql.org; Sun, 09 Nov 2025 22:21:18 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vIDmg-00AWKf-6M for pgsql-hackers@lists.postgresql.org; Sun, 09 Nov 2025 22:21:18 +0000 Received: from mail-pj1-x1035.google.com ([2607:f8b0:4864:20::1035]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vIDme-006GzL-0g for pgsql-hackers@postgresql.org; Sun, 09 Nov 2025 22:21:17 +0000 Received: by mail-pj1-x1035.google.com with SMTP id 98e67ed59e1d1-340c2dfc1daso389839a91.2 for ; Sun, 09 Nov 2025 14:21:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1762726875; x=1763331675; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ZQILIfzPq4z/vrn7eQESNuoHjar59yezg5EzadmStts=; b=KCW+RuQVEpfWACWvix39uLM8rjLTLxprH6vkioJcu0xcUgUB2eNw/pCBjjGuf8OUCv LBWKiH8YZGi1vxBUcKzp0CY4lBcy4EQscCcGSnfhBH9kV8KY5qMsDuatt8SXY4khfF9E Pr7sAJWLOlnEfSk6DJekVoHTJYNCmEpHF+3nUW+q8YRyn9c0ZIB9Dh7QAo5Ye8G5rXP5 6PGgHXEkel4x47uynCFhq7US+wBMWZ0BuDrcjTc+/Mev0UFMciWOg+Iq3Mgr+Y7hE66T w/FvW4K5Nor71/r1Wo8862QOosaYbiaPkWZ8e7bXigA6QN8Lcc58i7XyP/f4pyUVwozq tPuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1762726875; x=1763331675; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=ZQILIfzPq4z/vrn7eQESNuoHjar59yezg5EzadmStts=; b=KSW3Q8AYdmRhTUharHXNdrghwS8q8D9F9AiIWoADonuRZwRQRAZ/VptWll5f1MNwio eR94Y5noEtNOeIFm8jiwwzp0pXTjijdC9KqE0iTNRB40hhppVVDfkd1UxWQqX5rplO2R 69GYdt+QqNPv4IH6+FKp014Kb/l6P5k2Pc4pFVZTZaNqdmDYHkIOINZIYpqMUG1xBLFl mnbhZz/9YL8J2JeZFZNePq2uU2v4fxnwHII0mJ27ZEmC2W1H8gi7N4XB6yqu+XXEPGyW VymSpr68PM3V/qKTfwogXLRua9MEyKHnwFrFf5Gg5EEaZJrF/zLGlFF9CrX3V5bAkV54 jsVg== X-Gm-Message-State: AOJu0YxYMX127iAi6c2s+/fFoTw2Dhy32x56RtDHIYPsYiykJ13S8EVY Z48OOM6rGr99Wv3VTeXVncTApxIsJQ7Q3EkMjWsYpLeC2l+pjSfFC1Q7PASsn/cyOtzekyX2hsj ddp9evAJO5wBEl/WWStrPgyx2WAid430= X-Gm-Gg: ASbGnctVFiSyPN8haePaX9SI7MDcoc0rQfqLP9zAUZ8RJAJ+IaoCXqJohqK5Y77gRhD M1+CB/hqRHlvY+BPPhVfauzV4ILWhK++Yp4jlh7nVOatluNvLyXn6/XdhC9j3/cMC/prJU/jL9f 7s0MA0lpr2fguFKzcRCe6Vr1TzTbohCuSSe+XhGui7iLs3UrrxZMGFT3M9R12loK1wW4jinQTbO YhS8YVdIzgzq6M/M5BqQumea+NDRsDgxTUwUPygD2SjZdYJUUUFudetTlBXWocMOnKhKeOMYrct 5vGNBJS51/frQZnJAN0Fl170KQkOy+Fcr5QiIR1owLj6Jq8CrVO0 X-Google-Smtp-Source: AGHT+IEX5eSlyXXhpsgBsAWWB65Ea5AZyg0ghq0xoAAthiGTUkqu7XLWwr6GaS+jY/8fzbaDm9iEJJK0lAkUhLLjiWo= X-Received: by 2002:a17:902:ea0c:b0:298:f0c:6d36 with SMTP id d9443c01a7336-2980f0c6f48mr13671265ad.5.1762726875129; Sun, 09 Nov 2025 14:21:15 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Thomas Munro Date: Mon, 10 Nov 2025 11:20:38 +1300 X-Gm-Features: AWmQ_bnizhyWGeJaWmi3ByDbLRHHNmoyC5YALJp5mk7L-SXUomTugDNRLXxN35w Message-ID: Subject: Re: Don't synchronously wait for already-in-progress IO in read stream To: Andres Freund Cc: pgsql-hackers@postgresql.org, Peter Geoghegan , Tomas Vondra Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, Sep 12, 2025 at 9:46=E2=80=AFAM Andres Freund = wrote: + * It's possible that another backend starts IO on the buffer between = this + * check and the ReadBuffersCanStartIO(nowait =3D false) below. In tha= t case + * we will synchronously wait for the IO below, but the window for tha= t is + * small enough that it won't happen often enough to have a significan= t + * performance impact. + */ + if (ReadBuffersIOAlreadyInProgress(operation, buffers[nblocks_done])) ... /* * Check if we can start IO on the first to-be-read buffer. * - * If an I/O is already in progress in another backend, we want to wai= t - * for the outcome: either done, or something went wrong and we will - * retry. + * If a synchronous I/O is in progress in another backend (it can't be + * this backend), we want to wait for the outcome: either done, or + * something went wrong and we will retry. */ if (!ReadBuffersCanStartIO(buffers[nblocks_done], false)) "..., or an asynchronous IO was started after ReadBuffersIOAlreadyInProgress() (unlikely), ..."? I suppose (or perhaps vaguely recall from an off-list discussion?) that you must have considered merging the new is-it-already-in-progress check into ReadBuffersCanStartIO(). I suppose the nowait argument would become a tri-state argument with a value that means "don't wait for an in-progress read, just give me the IO handle so I can 'join' it as a foreign waiter", with a new output argument to receive the handle, or something along those lines, and I guess you'd need a tri-state result, and perhaps s/Can/Try/ in the name. That'd remove the double-check (extra header lock-unlock cycle) and associated race that can cause that rare synchronous wait (which must still happen sometimes in the duelling concurrent scan use case?), at the slight extra cost of having to allocate and free a handle in the case of repeated blocks (eg the index->heap scan use case), but at least that's just backend-local list pushups and doesn't do extra work otherwise. Is there some logical problem with that approach? Is the code just too clumsy?