From: Melanie Plageman
Date: Thu, 2 Apr 2026 10:31:50 -0400
Subject: Re: AIO / read stream heuristics adjustments for index prefetching
To: Andres Freund
Cc: pgsql-hackers@postgresql.org, Thomas Munro, Peter Geoghegan, Tomas Vondra, Nazir Bilal Yavuz

On Tue, Mar 31, 2026 at 12:02 PM Andres Freund wrote:
>
> 0005+0006: Only increase distance when waiting for IO
>
> Until now we have increased the read ahead distance whenever we
> needed to do IO (doubling the distance every miss). But that will often be
> way too aggressive, with the IO subsystem being able to keep up with a
> much lower distance.
>
> The idea here is to use information about whether we needed to wait for IO
> before returning the buffer in read_stream_next_buffer() to control
> whether we should increase the readahead distance.
>
> This seems to work extremely well for worker.
>
> Unfortunately with io_uring the situation is more complicated, because
> io_uring performs reads synchronously during submission if the data is in the
> kernel page cache. This can reduce performance substantially compared to
> worker, because it prevents parallelizing the copy from the page cache.
> There is an existing heuristic for that in method_io_uring.c that adds a
> flag to the IO submissions forcing the IO to be processed asynchronously,
> allowing for parallelism. Unfortunately the heuristic is triggered by the
> number of IOs in flight - which will never become big enough to trigger
> after using "needed to wait" to control how far to read ahead.

On some level, relying on worker mode overhead feels fragile. If
worker overhead decreases (say, by moving to IO worker threads), we
won't be able to rely on this to keep the distance at an
advantageous level.

If io_uring async copying is advantageous even when the consumer never
needs to wait, then it seems like parallelizing copying to/from the
kernel buffer cache will always be advantageous to do at some level.
The case where it is not (as you've stated before) is when the
consumer doesn't need the extra blocks, so it is just wasted time
spent acquiring them. So, it feels odd to try and find a heuristic
that allows the readahead distance to increase even when the consumer
is not having to wait.

I'm not saying we should do this for this release, but I'm just
wondering if in the medium term, we should try to find a better way to
identify the situation where async processing is not beneficial
because the blocks won't be needed.

> So 0005 expands the io_uring heuristic to also trigger based on the sizes
> of IOs - but that's decidedly not perfect, we e.g. have some experiments
> showing it regressing some parallel bitmap heap scan cases. It may be
> better to somehow tweak the logic to only trigger for worker.
>
> As is this has another issue, which is that it prevents IO combining in
> situations where it shouldn't, because right now we are using the distance to
> control both. See 0008 for an attempt at splitting those concerns.

Yea, I think running ahead far enough to get bigger IOs needs to
happen and can't be based on the consumer having to wait.

- Melanie
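
For readers following the thread, the two growth policies being compared
(double the distance on every miss vs. double it only when the consumer
had to wait) can be sketched roughly as below. This is a minimal
illustration under stated assumptions, not the code in read_stream.c:
the names next_distance and run, the starting distance of 1, and the cap
of 64 are all hypothetical, and the sketch ignores any logic that
shrinks the distance again.

```python
def next_distance(distance: int, needed_io: bool, had_to_wait: bool,
                  max_distance: int, wait_based: bool) -> int:
    """Return the look-ahead distance after one buffer is consumed.

    Hypothetical model: with wait_based=False the distance doubles on
    every miss (needed_io); with wait_based=True it doubles only when
    the consumer actually had to wait for the IO to complete.
    """
    grow = had_to_wait if wait_based else needed_io
    if grow:
        return min(distance * 2, max_distance)
    return distance


def run(events, max_distance: int = 64, wait_based: bool = True) -> int:
    """Apply next_distance over (needed_io, had_to_wait) pairs."""
    distance = 1  # assumed starting distance
    for needed_io, had_to_wait in events:
        distance = next_distance(distance, needed_io, had_to_wait,
                                 max_distance, wait_based)
    return distance


# Two misses where the consumer waited, then eight misses where the IO
# had already completed by the time the buffer was requested.
events = [(True, True)] * 2 + [(True, False)] * 8
```

With these events, the miss-based policy keeps doubling until it reaches
the cap, while the wait-based policy stops growing once the IO subsystem
keeps up. That is also why a trigger based on the number of IOs in
flight may never fire under the wait-based policy.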