Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uak05-009AiL-Op for pgsql-hackers@arkaria.postgresql.org; Sat, 12 Jul 2025 23:51:26 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uak00-00C4JF-Vh for pgsql-hackers@arkaria.postgresql.org; Sat, 12 Jul 2025 23:51:21 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uak00-00C4J7-I8 for pgsql-hackers@lists.postgresql.org; Sat, 12 Jul 2025 23:51:21 +0000 Received: from mail-wr1-x42d.google.com ([2a00:1450:4864:20::42d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uajzy-007GKB-00 for pgsql-hackers@lists.postgresql.org; Sat, 12 Jul 2025 23:51:20 +0000 Received: by mail-wr1-x42d.google.com with SMTP id ffacd0b85a97d-3a4fb9c2436so1984500f8f.1 for ; Sat, 12 Jul 2025 16:51:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1752364275; x=1752969075; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=RgyGjDaGGPWi2qPwoVuPKUlGqm5pvPfJmP7xTT8JyYY=; b=kHdBJoZ0QoagESuSxE5lF5/BJvWls4r7mvw6jOZRjy7L2ruSg+iIGDPfo11yAkLqps Y5El5PyvcWXRCpj6mesvtalmfwaZURR5ddRsAGEsdZc1wdl2xw3E0CBMIf8OG+oK9hLs cRVuPGci5TVdyQyGAj2XTRlIplhj3iOM7ZKXudOe9Aywz6sGVF6VGz4bfwk1JwBq5tiC DQtt61ypTleOg/DxI1GFsm94gKgcwCV+EKQp2Q/vMterqUWriCMD/0s9rijQNq/qYUDr SVNarl77CUN24gtyZdr/UnOFY7F79Iu/+nLqZxJAMaZpXZXKXZ5HCxPN0VsrQUhpbHmb Aa+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1752364275; x=1752969075; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=RgyGjDaGGPWi2qPwoVuPKUlGqm5pvPfJmP7xTT8JyYY=; b=osgwEzfRoyOcX11dM1khFNcFq+HXT4E+MAUIby0zPiCtMwVNUPZHO5ll3Uq3L9agFc g1mmmYwb6xOKVDgIPLMp6EGOEsGzChCTUJwEx5O1sqXLpL47hOUtz7VnpH4ZuyYrcHni Hcp+7OVpDwxE3ceJg0uw2Ny8r0b++6EvXIUdS1vtqTBQD9OY3RIDPHhUMOECK8w1pnr1 kgM/dbn+ctLVkxz1ZfSJRUxYfEZ7JEqF8XAlUHQZHyPNA/8sF0RUjvp7/2eAIOdIIajD 1WbNhtE/sOIk9Glr/FrpB+UWPM/3rU1rl4pS8tL6ViqZPXyB6YnwDHVdxom+j2NSQvMd obCQ== X-Forwarded-Encrypted: i=1; AJvYcCUvCM/5G3JUyTemvExWcCYZ4vmXc5tiBHIl6RtjgRB0vYaqxVAhby0yLdpk2suvBq6+yEOjAtC+V61kar/P@lists.postgresql.org X-Gm-Message-State: AOJu0YyOEMNjo1BF/U5NzDp7ATS12IXNymA9nBkFgfHez6VcxAlvqu08 xJGsNJk0rhjWPhkEdb9Ql5pWqRsHIuZYxTMJqsZI4k7l/xpPfxL2qAfBYxUKmMzQrM7O7FhxCpy nzsgJqgoPcKWoVU5vWrlooLeKPOBx2mDXr9k3LRlUjw== X-Gm-Gg: ASbGncttKHIfOkZPDVmMBQowZst8xGWHqCoj5IC2QqT0laz0slWJtC1vn0bpNqPSV/F aGGDc0Na08/17VSda++CscFgE7tPOYDfuLU9ZcpBMhlSvM2Xd4XKHFC4QT8+8KTqgJQacnsLqYu dZm/vd6Xce51EoUvnT6lCtzaqKSVO+Sq+WVoRxkJ7JWY6gKXQDYhtAKxiGNmH31NetevTBE/slJ dDb0lg= X-Google-Smtp-Source: AGHT+IHGqadCkqC4K6AsTkmC4WkHnWRQbJostCpn0D1nHhVLaWvGKpTvlMsuoTnHqpX5Qk/EeiPjiNYOX0h7t/sdo5w= X-Received: by 2002:adf:f3c7:0:b0:3a4:f63b:4bfc with SMTP id ffacd0b85a97d-3b5f18cec80mr6173552f8f.34.1752364274755; Sat, 12 Jul 2025 16:51:14 -0700 (PDT) MIME-Version: 1.0 References: <57d0e292-73d5-4ab9-9855-110ee9cbd90a@vondra.me> <32c15a30-6e25-4f6d-9191-76a19482c556@vondra.me> In-Reply-To: <32c15a30-6e25-4f6d-9191-76a19482c556@vondra.me> From: Peter Geoghegan Date: Sat, 12 Jul 2025 19:50:48 -0400 X-Gm-Features: Ac12FXwswJdi2-xnqEwaVrRYykWCAp3KRyDc1_F806DhNvWNxrMlmmpA8R0WrOQ Message-ID: Subject: Re: index prefetching To: Tomas Vondra Cc: Andres Freund , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Thomas Munro , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, May 1, 2025 at 7:02=E2=80=AFPM Tomas Vondra wrote= : > There's two "fix" patches trying to make this work - it does not crash, > and almost all the "incorrect" query results are actually stats about > buffer hits etc. And that is expected to change with prefetching, not a > bug. But then there are a bunch of explains where the number of index > scans changed, e.g. like > > - Index Searches: 5 > + Index Searches: 4 > > And that is almost certainly a bug. > > I haven't figured this out yet, and I feel a bit lost again :-( For the benefit of other people reading this thread: I sent Tomas a revised version of this "complex" patch this week, fixing all these bugs. It only took me a few hours, and I regret not doing that work sooner. I also cleaned up nbtree aspects of the "complex" patch considerably. The nbtree footprint was massively reduced: 17 files changed, 422 insertions(+), 685 deletions(-) So there's a net negative nbtree code footprint. We're effectively just moving things out of nbtree that are already completely index-AM-generic. I think that the amount of code that can be removed from nbtree (and other AMs that currently use amgettuple) will be even higher if we go this way. > The one real limitation of the simpler approach is that prefetching is > limited to a single leaf page - we can't prefetch from the next one, > until the scan advances to it. But based on experiments comparing this > simpler and the "complex" approach, I don't think that really matters > that much. I haven't seen any difference for regular queries. Did you model/benchmark it? > The one case where I think it might matter is queries with array keys, > where each array key matches a single tuple on a different leaf page. > The complex patch might prefetch tuples for later array values, while > the simpler patch won't be able to do that. If an array key matches > multiple tuples, the simple patch can prefetch those just fine, of > course. I don't know which case is more likely. We discussed this in Montreal, but I'd like to respond to this point again on list: I don't think that array keys are in any way relevant to the design of this patch. Nothing I've said about this project has anything to do with array keys, except when I was concerned about specific bugs in the patch. (Bugs that I've now fixed in a way that is wholly confined to nbtree.) The overarching goal of my work on nbtree array scans was to make them work just like other scans to the maximum extent possible. Array scans "where each array key matches a single tuple on a different leaf page" are virtually identical to any other scan that'll return only one or two tuples from each neighboring page. You could see a similar pattern with literally any kind of key. Again, what I'm concerned about is coming up with a design that gives scans maximum freedom to reorder work (not necessarily in the first committed version), so that we can keep the read stream busy by giving it sufficiently many heap pages to read: a truly adaptive design, that weighs all relevant costs. Sometimes that'll necessitate eagerly reading leaf pages. There is nothing fundamentally complicated about that idea. Nothing in index AMs cares about how or when heap accesses take place. Again, it just *makes sense* to centralize the code that controls the progress of ordered/amgettuple scans. Every affected index AM is already doing virtually the same thing as each other. They're all following the rules around index locking/pinning for amgettuple [1]. Individual index AMs are *already* required to read leaf pages a certain way, in a certain order *relative to the heap accesses*. All for the benefit of scan correctness (to avoid breaking things in a way that relates to heapam implementation details). Why wouldn't we want to relieve all AMs of that responsibility? Leaving it up to index AMs has resulted in subtle bugs [2][3], and AFAICT has no redeeming quality. If affected index AMs were *forced* to do *exactly* the same thing as each other (not just *oblidged* to do *almost* the same thing), it would make life easier for everybody. [1] https://www.postgresql.org/docs/current/index-locking.html [2] https://commitfest.postgresql.org/patch/5721/ [3] https://commitfest.postgresql.org/patch/5542/ --=20 Peter Geoghegan