Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uttF5-00FZMk-Q6 for pgsql-hackers@arkaria.postgresql.org; Wed, 03 Sep 2025 19:34:04 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uttF4-00BFTc-I6 for pgsql-hackers@arkaria.postgresql.org; Wed, 03 Sep 2025 19:34:02 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uttF4-00BFTT-4L for pgsql-hackers@lists.postgresql.org; Wed, 03 Sep 2025 19:34:02 +0000 Received: from mail-wr1-x435.google.com ([2a00:1450:4864:20::435]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uttF1-000NgO-0v for pgsql-hackers@lists.postgresql.org; Wed, 03 Sep 2025 19:34:01 +0000 Received: by mail-wr1-x435.google.com with SMTP id ffacd0b85a97d-3df2f4aedc7so175182f8f.2 for ; Wed, 03 Sep 2025 12:33:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1756928037; x=1757532837; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=VRB0mgb1bTCi6hE9I90yzZw+2em/sMrnccS7h8ti1co=; b=xge3MZptcd1Nw2pQXB9bnuBXOZ0nk6uDAYTVmYY0sUnCoiqc27pMJ1Hf+H/VvscELa yI0d+u8TKMSO2lWeWhwxoif7yEBLYJmQ1k2dMfyfmU9eKmy+z0uEI95GfA6UFLXk90BD Ei4airbi9KSm7kdnDL7sVplVrUPOe27po+K505mmygJvaWROwOs/nWuBrXCQICq85GB5 /nBZzxq6Fs8NGF3V32vWNq8Zt699UeD515eymeJY4eX3o8rKvSmWgYqUqWqOdbyYDnho 5yDaiLohoK7rTadibj/lYB51+82/nzND9ohSzU3RJjRk4Qjjd3TR6JLf1vHgSsfsLcma +aCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1756928037; x=1757532837; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=VRB0mgb1bTCi6hE9I90yzZw+2em/sMrnccS7h8ti1co=; b=wT3Hu19ro8wfbWMDNpMVRicKV0PLwgU2MbcJagBMI1E/jtUb1YmMEl1G1elPkQ8PFk lL9/JCVtLgj/BEazqe2wZD54lQnR9b+f1bglAej44Ta55k5RPB3NDkMWXVb/geW/SCe3 T82oPfZn/a4xOP0TOyUg9ADf7fZkfUhXTjpJunN1d0JN8J9RD+QuR7gDav3S/8Arl05J jcaMEuXUH/ChglXGwonRhklMGKHwO/d6dfVGM6cWVBDvEAQF/uOfRglw4osji2ZNxSPt +H8H220CAwjo0Xk0mtLx7ekNo+PcP7bz5zRDXbQ+5M4NQ7Y9DmG/pJUnWQMAJ21lAfYl Nl/Q== X-Forwarded-Encrypted: i=1; AJvYcCXlqfCl5PzUgP2rgUu/XXF7GBXzmEGqnPwEAr7TGoxUSVi3QHrUIutrxuPaynAjXxwH8wchqsPjPU4XaJw9@lists.postgresql.org X-Gm-Message-State: AOJu0Yz+ZKdgo0JRPNl8nbVxfxFgX3GUhjZCsDLc2d15dTWs62Ah1m6o VsspT/wbjcIbOrawNfV4xgQr2ORA6DIvWpIDHJ+lSm/+Xeb4owwL3mvYJ4dvAGd5Kpdr2qCj5Tg k2ntj+Fv7E8oNUdD3yNSSjYSw9MkXDrSG4nSYrY/9DA== X-Gm-Gg: ASbGncvZ4VThDNcUtCybGZFMM8ySv74OmuxiUZqSWk1OS3Rq/MzQbrgKwIVxvJmaDTl TJBxmn0umNeMKI1fWi+0IkrumDRWEuXQJ0mVZuxqPBm5sWjW19hS61R5HsqYCb59uwtc/UF9aS0 cgyLrZHdRZKOXsF7aq8s/UWP5dD5biV0huo4hl2jHAKOwZpDi88h8bFKH/sicnrg5dnC1DGwesB 1aLb9ouBjk0Vm5Kb0BHqKDHVwqH X-Google-Smtp-Source: AGHT+IGc1a0NoO7QkHAbvAiBkaONx95vJvbnypfWuJKVNy1vWlzv1lmUWL8sn7DrDyRA6P3D3qOGraJRw8YhuktmrWU= X-Received: by 2002:a05:6000:42c9:b0:3d1:e22d:df2f with SMTP id ffacd0b85a97d-3d1e22e12d1mr9090943f8f.23.1756928036967; Wed, 03 Sep 2025 12:33:56 -0700 (PDT) MIME-Version: 1.0 References: <1c9302da-c834-4773-a527-1c1a7029c5a3@vondra.me> <6d59c277-c440-4d1f-a46e-157958c06a5f@vondra.me> <5pltwb73d7cynsxo2yb54ygjk7haviatkrx43mnzihc6kkield@ahnstpgof46i> <931afce3-8c86-4c96-9861-0ffa17c6560f@vondra.me> In-Reply-To: From: Peter Geoghegan Date: Wed, 3 Sep 2025 15:33:30 -0400 X-Gm-Features: Ac12FXxM8R45l5TnGPMsaMXj2mlgOUgyfD5kp9QfzQiMlySI4V6OdXvLeY42WOc Message-ID: Subject: Re: index prefetching To: Andres Freund Cc: Tomas Vondra , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, Sep 3, 2025 at 2:47=E2=80=AFPM Andres Freund w= rote: > I still don't think I fully understand why the impact of this is so large= . The > branch misses appear to be the only thing differentiating the two cases, = but > with resowners neutralized, the remaining difference in branch misses see= ms > too large - it's not like the sequence of block numbers is more predictab= le > without prefetching... > > The main increase in branch misses is in index_scan_stream_read_next... I've been working on fixing the same regressed query, but using a completely different (though likely complementary) approach: by adding a test to index_scan_stream_read_next that detects when prefetching isn't favorable. If it isn't favorable, then we stop prefetching entirely (we fall back on regular sync I/O). Although this experimental approach is still very rough, it seems promising. It ~100% fixes the problem at hand, without really creating any new problems (at least as far as our testing has been able to determine, so far). The key idea is to wait until a few batches have already been read, and then test whether the index-tuple-wise "distance" between readPos (the read position) and streamPos (the stream position used by index_scan_stream_read_next) remained excessively low within index_scan_stream_read_next. If, after processing 20 batches/leaf pages, readPos and streamPos still read from the same batch *and* have a low index-tuple-wise position within that batch (they're within 10 or 20 items of each other), we expect "thrashing", which makes prefetching unfavorable -- and so we just stop using our read stream. It's worth noting that (given the current structure of the patch) it is inherently impossible to do something like this from within the read stream. We're suppressing duplicate heap block requests iff the blocks are contiguous within the index. So read stream just doesn't see anything like what I'm calling the "index-tuple-wise distance" between readPos and streamPos. Note that the baseline behavior for the test case (the behavior with master, or with prefetching disabled) appears to be very I/O bound, due to readahead. I've confirmed this using iostat. So "synchronous" I/O isn't very synchronous here. (Prefetching actually does make sense when this query is run with direct I/O, but that's far slower with or without the use of explicit prefetching, so that likely doesn't tell us much.) -- Peter Geoghegan