Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vw4Br-000qP1-17 for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Feb 2026 20:11:59 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vw4Bo-006WVM-2u for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Feb 2026 20:11:56 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vw4Bo-006WV9-1K for pgsql-hackers@lists.postgresql.org; Fri, 27 Feb 2026 20:11:56 +0000 Received: from mail-wm1-x32b.google.com ([2a00:1450:4864:20::32b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vw4Bk-00000001ckb-2gBp for pgsql-hackers@lists.postgresql.org; Fri, 27 Feb 2026 20:11:55 +0000 Received: by mail-wm1-x32b.google.com with SMTP id 5b1f17b1804b1-4806cc07ce7so26406765e9.1 for ; Fri, 27 Feb 2026 12:11:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1772223111; cv=none; d=google.com; s=arc-20240605; b=gPRDuPUhDEpeg8yczkWnqpeiCC0jP9AYYuCmx6Ybupt2gnzmNoInFvT92/mqPr7zBh CG1g9CNk+APmFhZ6uuFSt+REqJmJXLn1frnsYw0y3bGbY67B7dW2vT80+saaMZH5a3Hs pO8hh8EUVeMLpBKuLMmAtKJtfUVzdAei+0NQL5kTtBQr3Q7dVpbfuj+DyU8JYypHYeGo fWfI0n6AIJ74ks+b7Vcyet0r2eFSPkW2iOE5AixJgnQlVx+3ZaHCoasuTGOcCofHAls1 8NDMfmcAp0e/TJH6LI4gGy2q7CU/IXcQlS/iz9tSAccgQMGtR9QL4kaPFwPMI5Cm5UH8 3Imw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=dV+wnOX5HVj6FLQA/qImYETh+V5wpDHeGxai6U4j2yI=; fh=QkE7Bi27ergWGGJIpRwQ4w+0PSsoYhmkB0bbepF4vkA=; b=SQCGZw83WvjOu1mFvhtolnfSQ3SJE0qWhKRIqXN3yKlNQaaW8GyulZX5vp7quljSIe x5f8FZgHIZf+SNE16BsX8zO9qTCnL/l1avhIxXT9N+j3HefoGxmiQKTMUTJDl44FgmCx 6EWSmS3FYu5T8IE8nlVLFckWsDFT5lOgy961r1atOIWtYz/LGAzLyr9IUYFu5f54Dzkj YnWpsf1CaLvW2Yf87FVh7xMFIbChs4qtzwzDBZ7b65AOwooM09/l2S7o8pRdZp6INI55 ctbwsozVRK4VgEV0qNxY40MrK9TljxnvXpkNylyIODm0HPyWg+s7g4aewVH0tM5rAoTL il9w==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1772223111; x=1772827911; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=dV+wnOX5HVj6FLQA/qImYETh+V5wpDHeGxai6U4j2yI=; b=fnib0iUzHBJ//QKDCnWz4Dk1yJIOMf6jzLAFSoh9mtAU9+CInJIOUsCNWeXmamcKzd FbUkoTG2/6XAM09B/tECObKeOc2HuBvW9wk2JIrwjuPV3iQQhC9u3vBFZ/blm+VivrP7 LP5WgFKVnQ3bpwEzU40/n5bVuWagoTnxl9vBia93ZkvuvKSDBQf0tVeklRnh5qiaO3Tw vmcZyelabI/+F1PfVqaO3Rl6B+esliTarCAKGMZ6z92un6mgbjnQgWUK51N3hUxU9FT1 a/97ifUADflCrKoTwmAFkwM2hYs1r65I5lHlczXPLLhro1W9HBs3ra099x3VGm5qK21W uSqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772223111; x=1772827911; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=dV+wnOX5HVj6FLQA/qImYETh+V5wpDHeGxai6U4j2yI=; b=MXpojBxUZv+3P+fH1BqeQls7aXicDzqPUXVYv7b1HPJy4+R7VarBodAJWODSk26GIu DVolespPjJAa8/Jbtb5r88ji7F80om6KHqj8b04tJWsowvtUad6UrDnmyrOcY415NEUY Aj0naA4WEfv0PazaLnoZUfwpsuzgv92NJJpovWgfJwoZ2iC5iys1yySfKURx9SJOlBNF NmHLSgLW6P0sFU8p6zi0BxQvWJrSAc0ei1a7Rfl8W1hw6/6dKvfEAnjN+cmFZmCmQwWc xuG0bbiyNkJJgtjsYRIV+UObZjzfa9q2GsZMwq4+4BtLCMYrwXqsTHvWqn/0bmg3U/cT UP6Q== X-Forwarded-Encrypted: i=1; AJvYcCW/aA2MjkCiaToyFT6dnuVvf2HiUItSDztYmgHwXzges39Y9kSZwoxb4LEKaFEiluxj1XTmgyWK0SYvnJhP@lists.postgresql.org X-Gm-Message-State: AOJu0YyuOT7hUemvPRlP0DZxwl/hFVpntIxSVOjhmu1uLauTs90MCtV9 CNco770oZUzkodtEEZP6zYp5gnH14Q5MsAk4B+vl3q23IeftvHOpGWRWn5goUPVAjtnk7ldEWD3 OOAccRTvqBWmerSg5wLG0k6bbSeiHtBAOgJCRmDGSmQ== X-Gm-Gg: ATEYQzyQhCejI4Qb05LeRigeRJGuiKATKDRRVb7MhV9Y5nyAbxC7QrWMdiMc/enfGFE cxqxtyBzD/kWwVhoXR84r37DQU4x1M0T6HN26eJ4iU4VtIhtQlfvb+7M13Wiq/jbu7yRW6NvmxL JhMr6A2Uh0QUfkeuMpb0nPVjfuslduBSkrbGlrF9frDHDOc/CB6dWGRD+f3kYFHefhaQXxi65Vd xuKWde2S4kybe5V8q/Sv3gQKzTfVQHnYS27/snRUJNYrODGKrphE+PIZC28I5xxHprVGFG2/3ec 9AyfDEw= X-Received: by 2002:a05:600c:b86:b0:480:4a90:1b06 with SMTP id 5b1f17b1804b1-483c9c196a5mr62925445e9.34.1772223110624; Fri, 27 Feb 2026 12:11:50 -0800 (PST) MIME-Version: 1.0 References: <64a2re223ajj4popowsyu4xekbuvvyfwkrihn5yzyrkwsmsuvp@2lls3tpww5dl> <52512325-b1f2-4fff-819e-f68122b2e427@vondra.me> <64mfcfv7iihc4pmqlxarii4esnmqry52ckz5m7lmwylnfnuxuz@oxh4ioxkjtep> In-Reply-To: From: Peter Geoghegan Date: Fri, 27 Feb 2026 15:11:24 -0500 X-Gm-Features: AaiRm50xPyiVnmdV0_Xq579z5YCITCZBucIVDlkM_6TsdeGEDejAEbMECjGgznI Message-ID: Subject: Re: index prefetching To: Andres Freund Cc: Tomas Vondra , Alexandre Felipe , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Feb 26, 2026 at 11:18=E2=80=AFPM Andres Freund = wrote: > Note how the increase in scanned heap pages actually *decreases* the over= all > time rather substantially. > > It's quite visible, both in iostat, and a query like > SELECT pid, target_desc, off, length FROM pg_aios \watch 0.5 > > that for the first query has basically no IO concurrency, the second has = very > intermittent IO concurrency and the third one has nice IO concurrency. > > > If I disable the yield logic, the fillfactor=3D90 case is good: I can recreate your results. Including the part where you found that the problem would go away once yields were completely disabled. I can certainly understand why you're suspicious of the yielding mechanism. I wonder if I gave undue weight to the merge join query I showed you [1] (and one or two others like it). Declaring that the underlying merge join/yielding issue is not worth the complexity required to yield would certainly be convenient. Yielding *isn't* helpful for the vast majority of individual queries, so I'm certainly tempted. But I can't help but feel nervous about the large disparity in the number of *index* pages read by that particular query, once the yielding mechanism is disabled. Just in case there's any doubt: I'm flexible about whether a yielding mechanism is needed and how it should work. Ideally we can come up with a design that gives us the best of all possible worlds -- but everything is on the table. It's not that I'm attached to the idea of yielding; I'm just nervous about one or two funny looking cases [1]. With that being said, it seems as if yielding isn't the only factor in play here. I also notice that even master exhibits roughly the same performance disparity (also while using direct I/O, though with shared_buffers set to 16GB rather than your 2GB): =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D EXPLAIN OUTPUT (best run, master) =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D --- Fillfactor 90 --- Index Scan using pgbench_accounts_ff90_pkey on pgbench_accounts_ff90 Index Searches: 1 Buffers: shared hit=3D27325 read=3D181819 I/O Timings: shared read=3D16822.256 Planning Time: 0.035 ms Execution Time: 18048.198 ms --- Fillfactor 50 --- Index Scan using pgbench_accounts_ff50_pkey on pgbench_accounts_ff50 Index Searches: 1 Buffers: shared hit=3D27325 read=3D333334 I/O Timings: shared read=3D30685.965 Planning Time: 0.028 ms Execution Time: 32005.962 ms --- Fillfactor 25 --- Index Scan using pgbench_accounts_ff25_pkey on pgbench_accounts_ff25 Index Searches: 1 Buffers: shared hit=3D27325 read=3D666667 I/O Timings: shared read=3D10278.124 Planning Time: 0.034 ms Execution Time: 11796.573 ms While fillfactor 90 is fastest, fillfactor 25 is almost 3x faster than fillfactor 50, despite performing about twice as many reads. I have to imagine this relates to my Samsung 980 Pro SSD performing its own read-ahead, in a way that works inconsistently across workloads. Note again that this effect with master only appears when shared_buffers is set to 16GB. With your 2GB shared_buffers setting, master takes 17930.381 ms for FF 90, 31822.473 ms for FF 50, and 61094.676 ms for FF 25 (which is at least consistent-ish in the way that one would expect). For context, here is how the patch compares to master with shared_buffers=3D16GB (here master uses the same query execution/query plans as those shown above) once the patch/Pfetch's yielding is disabled: FF Heap Pages Master Pfetch ON ON/Master ---------------------------------------------------- 90 181819 18048.2 1465.0 0.081x 50 333334 32006.0 1682.2 0.053x 25 666667 11796.6 1928.4 0.163x I also noticed that the patch isn't at all sensitive to whether shared_buffers is set to 2GB or 16GB -- not once yielding is disabled like this. Obviously that insensitivity is desirable, which argues for removing yielding. For context, with standard v11/with yielding enabled, the "2GB vs 16GB" matters to quite a surprising degree: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D Patch + yielding, 2GB shared_buffers =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D FF Heap Pages Pfetch ON ------------------------------ 90 181819 4276.5 50 333334 1523.3 25 666667 6805.7 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D Patch + yielding, 16GB shared_buffers =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D FF Heap Pages Pfetch ON ------------------------------ 90 181819 4384.6 50 333334 1683.2 25 666667 2002.0 Notice that with 16GB shared_buffers, the perverse effect from yielding is even more pronounced! I'm not sure how relevant this later point about "shared_buffers sensitivity with yielding" really is. Nor am I sure if the effect with master (and the possible role of device-level readahead) is all that significant. I'm pointing all of this out in the hope that you can offer an explanation that'll help me to improve my own intutions about this stuff. [1] https://postgr.es/m/CAH2-Wzk-89uCvdJ1Q6NsM6LvDvUEt6Qy66T6A60J=3DD_voWxZ= Dg@mail.gmail.com -- Peter Geoghegan