Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w6G9g-0047IC-2c for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Mar 2026 22:59:53 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w6G9f-00Cf78-0s for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Mar 2026 22:59:51 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w6G9e-00Cf70-2q for pgsql-hackers@lists.postgresql.org; Fri, 27 Mar 2026 22:59:51 +0000 Received: from mail-ed1-x533.google.com ([2a00:1450:4864:20::533]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w6G9c-00000001Z1s-2p5F for pgsql-hackers@postgresql.org; Fri, 27 Mar 2026 22:59:51 +0000 Received: by mail-ed1-x533.google.com with SMTP id 4fb4d7f45d1cf-66a9a2187a5so4126847a12.0 for ; Fri, 27 Mar 2026 15:59:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774652388; cv=none; d=google.com; s=arc-20240605; b=DUlRF604muPooiH8PH58T0IkOCQFQTPL8MrhbCJ2b2yBn+8cgExCBB+Fpcrv295J8X YkD7FQ0D4HPNr2JcMUfkZ0Na3exHnNNqVStCUwQwnorI+yOrgdA2KenTLcZOJmZDs7fl 4RO7lbmou8z+hl5lDW2NBh+gxmQdWKrId501ZZ6kvid2ub1yhMvLXlXffzuMfYweCWNN F2c8//5nMJjFRo/FR3OlOUxGNJQMLYsm4jWP0/EAVb1ETnnqtQlup3AFVDsqEObIDbl2 JR3J12KdvaqhWwSGKTyVSkShOIrrghBlPs0TI5XITEJHQMJnvJYpxbfje8ha96NMQcxE QNBg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=UPt1VQuwgXyGyNGx29Hb5077gIfB22Py8IQMR5oZslo=; fh=LwwinJ608iGi75ucoQtCMWpz4+kDziLSsTISwNYX740=; b=Jd+0NCoJjYGc79GdzrkgoHHew0TamNLjlvTR9KP9UVqaL8eyZKT9lkcreaCU5keYGt /YF6VMMM2jnBPg/6Lw6N/qHLV98wfXZLCQbn7G8jog8OG3pSvAkJDMQX7MQc/+vclwND 12OWuKNCI7VFfvTrIoWJxIS/D1VxEK9zSMvSx/r1xvMe00HpWamBG5miNiG/ak70NPyT Xnr8y8bP96/+MgUixhfWRKAXHHEBDvsBFqigl7WYwWKI/mK6uCEvSiyhsB5FcJuUEE5e WovsLDq9IEcIY7l3NPX2LmVhPGk665w3ErOSas71ZW/CJtg4UZqXLyXU2zhqWtMyLacy GNsA==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1774652388; x=1775257188; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=UPt1VQuwgXyGyNGx29Hb5077gIfB22Py8IQMR5oZslo=; b=gbbgINLv9ZOCRFA37yMw7tuBxqiZECh8MGWA3sm2jrc5ewjK1XIs0CfcClfQfmixMy vldDrXQDHAeh6CfRbSi0uDRlElHNZ4r9F3PtwCa3dK2n+19lBjK/ZnpDKPrjADQxvifb xqMMzc8rIFewOmVUaoKUuL+0FTA6Nj1g6w+xsqIaO7VWx8sGeRib/v9E7wYX0nToUoax lQMMFYlFSWTFAyS0zWg8SEmSsc1kORQCKWSei0fsONvYeKw67d79H92KP/gjL2aKaO1p 29e29NcfHZQMFnfNOvdIONTzfMUge6jyw4WJAX9RvMCTx1yXI13w3/NxXHIBXtNa1JdY BqYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774652388; x=1775257188; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=UPt1VQuwgXyGyNGx29Hb5077gIfB22Py8IQMR5oZslo=; b=pbGVwz+1LCbGW6p5SxEs4Jj9fqdV3jJxRKwfU5fcZ6eu8rj9xbYJTRT+y51Y8W15No 3sZbLULxB3xKqfFS2ymuZCbr/RFyBAvjfsf9oQ8CtscFq9dvRVqXKNi4XdnAykZo7j3C WX3kcFZp16EbUcXqBNLlgxO0BPevo1ld5YEcyxdpRij31zmbC/9Vmu5oEXWLVL5/N/gh Oh6I20iWatly54BHOlr86YLL6PyiP1k35jRgPXCKQJBinZFGM2BbI2yuGAeSrHBkP4nD HCrBthzyEHiuG/J7/kEzHqJacW1g1zwAgBdcLbuAREevkjhVWmohrlUhLWxCsnFX8gsB PcHQ== X-Forwarded-Encrypted: i=1; AJvYcCXvLPXD5uNpHBq1gW5yCpWwC3n88PxVNxiXvkfQQXyHoPuzm6r6U7oV0MoX2uqvTkHWNX6W7KT/XPmUjlZ1@postgresql.org X-Gm-Message-State: AOJu0Yy+Atyzdvd4PDLJc5szOtlrr4nmazB6bIPFs+J8WOluUZ8h9LzD r6rZefA3HC02uXbbaTEl2DTK4pClfHVmSK5kDdhJ+ra/hgr6pqn7ilkR3XZ3hfZ88uode3P3oBE QcPYn97edCYBBuc6AlI26J7kvFEjJSVc= X-Gm-Gg: ATEYQzwrVVYGgxEkVuvZLN1UqFq1yp1ZExl/MmEdj5zYfqpyjtshRa/d8VEC3/BF5/7 gSXSAxg7zA8DcaimDWTM3K4Y0L21pcajOYy0cYkDHmm+pvKSI9rFcz8i069BDpWJRsQeu6qH1gN mjVxXHRMWQZdFYDAPvz6oXxseQuDxdF+fjjbO94jjIFSU/8hfcudLC3t+GGanMvTaGKFCOYtTaw x8oRdJy/n/bH46JPeJhU1D1cGbTt3Iwkw2gei1ZIn5PGd9BoqnUeWDN/gHitjbnj5nsPk2kQiSO xJoURJWSFwjzhzUie3AWccW7pzjW496xdCAQyi+VUmMCps78tBouX1I+Cq4OJEnNGM1Rw0u1nNW HuA0Z6Cwf X-Received: by 2002:a05:6402:a0d6:b0:668:8085:28f4 with SMTP id 4fb4d7f45d1cf-66b28c527c4mr2331027a12.16.1774652387418; Fri, 27 Mar 2026 15:59:47 -0700 (PDT) MIME-Version: 1.0 References: <42rdu4q44kvsq53fz5qgzuawqpaytvnemsnquynlfch5mqhc2m@6ytnlgivtzro> In-Reply-To: From: Melanie Plageman Date: Fri, 27 Mar 2026 18:59:36 -0400 X-Gm-Features: AQROBzCh5WQPVvwEC2uJikzaz14ZGrUN-ejNXaYvABmzg4Bnfgf0mq2mT-Ru6P4 Message-ID: Subject: Re: Don't synchronously wait for already-in-progress IO in read stream To: Andres Freund Cc: Nazir Bilal Yavuz , Thomas Munro , pgsql-hackers@postgresql.org, Peter Geoghegan , Tomas Vondra Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, Mar 27, 2026 at 1:29=E2=80=AFPM Andres Freund = wrote: > > Hi, > > I think I forgot to update the thread in my last message to note that I > had committed some of the preliminary changes. <--snip--> > I've also done a bunch of cleanup in the commits. A few typos in commit > messages and the actual code changes and a few larger changes in the test= code > & infrastructure. Mostly as part of allowing the aforementioned testing > (read_buffers() now only waits at the end, to make some of the tests > possible), but also just making the modified code a bit cleaner. Review of 0004: I noticed I use the word "backend" _a lot_ in this commit message. Here is an attempt at using it less: aio: Don't wait for already in-progress IO When a backend attempts to start a read IO and finds the first buffer already has I/O in progress, previously it waited for that I/O to complete before initiating reads for any of the subsequent buffers. Although it must wait for the I/O to finish when acquiring the buffer, there's no reason to wait when setting up the read operation. Waiting at this point prevents starting I/O on subsequent buffers and can significantly reduce concurrency. This matters in two workloads: when multiple backends scan the same relation concurrently, and when a single backend requests the same block multiple times within the readahead distance. Waiting each time an in-progress read is encountered effectively degenerates the access pattern into synchronous I/O. To fix this, when encountering an already in-progress IO for the head buffer, the wait reference is now recorded and waiting is deferred until WaitReadBuffers(), when the buffer actually needs to be acquired. In rare cases, it may still be necessary to wait synchronously at IO start time: if another backend has set BM_IO_IN_PROGRESS on the buffer but has not yet set the wait reference. Such windows should be brief and uncommon. Also, I'd say the tests (and the original prototype) qualify you for co-authorship of the patch, but you do you. Code and test stuff: in query_wait_block() - $node->poll_query_until('postgres', - qq(SELECT wait_event FROM pg_stat_activity WHERE pid =3D $pid), - $waitfor); + my $waitquery; + if ($wait_current_session) + { + $waitquery =3D + qq(SELECT wait_event FROM pg_stat_activity WHERE pid =3D $pid); + } + else + { + $waitquery =3D + qq(SELECT wait_event FROM pg_stat_activity WHERE wait_event =3D '$waitfor'); + } + + note "polling for completion with $waitquery"; + $node->poll_query_until('postgres', $waitquery, $waitfor); I guess you need WHERE wait_event =3D $waitfor to keep from getting more than one row returned and then failing to parse properly. But I got tripped up on this thinking won't poll_query_until() already give you that? I started thinking maybe wait_current_session should be a hash so we can pass it with a name and it will make the query_wait_block() call sites less inscrutable, but maybe that's over-engineering. # Because no IO wref was assigned, block 2 should not report foreign IO pump_until($psql_a->{run}, $psql_a->{timeout}, \$psql_a->{stdout}, qr/0\|1\|t\|f\|2\n2\|3\|t\|f\|3/); you mean block 3 # Because no IO wref was assigned, block 2 should not report foreign IO pump_until($psql_a->{run}, $psql_a->{timeout}, \$psql_a->{stdout}, should say block 3 # Tests for StartReadBuffers() that dependent on injection point support s/dependent/depend You could change the first test # Test if a read buffers encounters AIO in progress by another backend, it # recognizes that other IO as a foreign IO. To have 0 as the foreign IO (which is a slightly different code path than non-head blocks being the foreign IO) and then you still basically have coverage of a non-head block being a foreign IO in the following test case that looks for multiple contiguous blocks being foreign IO. In test_read_buffers_inject: # recognizes that other IO as a foreign IO. This time we encounter the # foreign IO multiple times. I find "foriegn IO multiple times" hard to parse. I prefer something like "multiple buffers undergoing foreign IO" # B: Read block 2 and wait for the completion hook to be reached (which cou= ld # be in B itself or in an IO worker) should say blocks 2-3 I wonder if it is also worth testing a failed foreign IO (i.e. operation->foreign_io && !(buf_state & BM_VALID) in WaitReadBuffers()). I don't know that it is much different than the other failed IO_IN_PROGRESS cases you are already testing. Otherwise, I think we're ready to go! - Melanie