Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rZyrf-007Tq8-Vv for pgsql-hackers@arkaria.postgresql.org; Tue, 13 Feb 2024 19:54:48 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rZyre-0026nt-FG for pgsql-hackers@arkaria.postgresql.org; Tue, 13 Feb 2024 19:54:46 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rZyre-0026nk-0H for pgsql-hackers@lists.postgresql.org; Tue, 13 Feb 2024 19:54:46 +0000 Received: from mail-ua1-x92c.google.com ([2607:f8b0:4864:20::92c]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rZyrb-0072L6-2M for pgsql-hackers@lists.postgresql.org; Tue, 13 Feb 2024 19:54:45 +0000 Received: by mail-ua1-x92c.google.com with SMTP id a1e0cc1a2514c-7d5cbc4a585so1419559241.3 for ; Tue, 13 Feb 2024 11:54:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1707854081; x=1708458881; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=PClgBTVzb35QZXK78OFzUhumKJRGa9hx31l2hHfom3k=; b=ffkoCcJe9nx5y4XSK45b4SwDYqrHWV0yM92EvqhmUTx86LfXwa9zAPEgAHOEhEFSCV RIvSmeT6sQ2aYnu9Ot95DdbFX7UrK/xUemHtfhtRbfm5ppot1Lojvzya/SwITIs3fMxY xcS2BWGDfy0qGn2G8f8r8XqSmcC4P50DLH7ew5JPZruVJ0dsFGMeHJYCJWjl6JSXk3UR F/2ZYEay39bS892GqrXo1wh3VEp9nDPPYPLmO48oEWg1Svtc1R++mTOK45VDgWVI+Uwb oIbfetIiQ9UY3W4TJA/DLzDaxo7UlfTGuzSQ9waV7KN6nMH3Yqh2oLjm/yqGeE6Dkpif QRVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707854081; x=1708458881; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=PClgBTVzb35QZXK78OFzUhumKJRGa9hx31l2hHfom3k=; b=Db/GBMwfPxgDi5CPXgd3K4zlza/krvV+/CEltITvUd/lVmoaMx/AlUhgKxexFUuTzo 4tWC3fKeSXzWkjUnEVS8KGv7Ar6vjZKLxR34Tzgba4/D4faUAtLgbKZAxAuKwhYL+ouN QLSpIgvrSk9HlS1NK2lzAUmbRWRMRTCL66KJyMTvBwHjtKgCtgQmQhPi+IyqBH32/6lW JH6jyrx8C49ay/KTGLCbBy0Z1vMSNOik7ZGJrjw1n79DqnbXAxBdAAZOJbJFuDZUtz78 lRfKyKLhm+dkaCvE6ay5NbAUuWUJ4aiEz0ZX7PLz6ypPMKKVGNMj2ouUpNIvyYj3kxJd b0SA== X-Forwarded-Encrypted: i=1; AJvYcCVSWgb7y9w9jFnvf2V0MobRPmTMJsr8ab16GrJ6uqYKxIA87P4lRLWIKqTGddOXFtdoYakULewl45t4iLfzVYSE270wJ/n/qss6JFN9FC+7ML4l X-Gm-Message-State: AOJu0YzoheF5fJ6AFq4PTaPvJz2LUhYjATPAv9jyRUIJ5dfQuTCoh5bn kLNWBY5bGHwXAsSqsbhZ73v1h2G3jod6KfXQZiuzrySbPwWafkbp3zn+pYrbKBpj13MliiSHEZc Oya1zvJC5AQJ64Uz+u1LmWRF38cIJ0InpX3Ykhw== X-Google-Smtp-Source: AGHT+IGCCNnAY3I1hVbP1EnBwpAKr1uvTGU9mztSRumiM6Sa7wIm67c1ZCYLKahxlDcgCihSvZEId4Qs9PwKDq/5h64= X-Received: by 2002:a05:6102:5492:b0:46d:3976:2112 with SMTP id bk18-20020a056102549200b0046d39762112mr608182vsb.3.1707854080922; Tue, 13 Feb 2024 11:54:40 -0800 (PST) MIME-Version: 1.0 References: <8ec36f51-b863-60e3-20e2-b9c981c5ce5e@enterprisedb.com> <280dc83c-a16f-4424-1319-95e7e3f798bd@enterprisedb.com> <98ba4b25-fae8-c1f4-1597-8093375a1986@enterprisedb.com> <20231221134314.wf2rs62d37u62j7t@alap3.anarazel.de> <20231221154352.ijtg6wloa3nowivh@alap3.anarazel.de> <482ec3ff-52ad-415d-96fd-f3832a894023@enterprisedb.com> <56176b8d-956c-487e-ab09-310db4581c07@enterprisedb.com> <4867452a-b853-4813-a6da-9bb06a336f8b@enterprisedb.com> <4f5f16ef-df1e-4e09-9b3f-2e0961ab5117@enterprisedb.com> In-Reply-To: <4f5f16ef-df1e-4e09-9b3f-2e0961ab5117@enterprisedb.com> From: Peter Geoghegan Date: Tue, 13 Feb 2024 14:54:14 -0500 Message-ID: Subject: Re: index prefetching To: Tomas Vondra Cc: Melanie Plageman , Robert Haas , Andres Freund , PostgreSQL Hackers , Georgios , Thomas Munro , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, Feb 13, 2024 at 2:01=E2=80=AFPM Tomas Vondra wrote: > On 2/7/24 22:48, Melanie Plageman wrote: > I admit I haven't thought about kill_prior_tuple until you pointed out. > Yeah, prefetching separates (de-synchronizes) the two scans (index and > heap) in a way that prevents this optimization. Or at least makes it > much more complex :-( Another thing that argues against doing this is that we might not need to visit any more B-Tree leaf pages when there is a LIMIT n involved. We could end up scanning a whole extra leaf page (including all of its tuples) for want of the ability to "push down" a LIMIT to the index AM (that's not what happens right now, but it isn't really needed at all right now). This property of index scans is fundamental to how index scans work. Pinning an index page as an interlock against concurrently TID recycling by VACUUM is directly described by the index API docs [1], even (the docs actually use terms like "buffer pin" rather than something more abstract sounding). I don't think that anything affecting that behavior should be considered an implementation detail of the nbtree index AM as such (nor any particular index AM). I think that it makes sense to put the index AM in control here -- that almost follows from what I said about the index AM API. The index AM already needs to be in control, in about the same way, to deal with kill_prior_tuple (plus it helps with the LIMIT issue I described). There doesn't necessarily need to be much code duplication to make that work. Offhand I suspect it would be kind of similar to how deletion of LP_DEAD-marked index tuples by non-nbtree index AMs gets by with generic logic implemented by index_compute_xid_horizon_for_tuples -- that's all that we need to determine a snapshotConflictHorizon value for recovery conflict purposes. Note that index_compute_xid_horizon_for_tuples() reads *index* pages, despite not being aware of the caller's index AM and index tuple format. (The only reason why nbtree needs a custom solution is because it has posting list tuples to worry about, unlike GiST and unlike Hash, which consistently use unadorned generic IndexTuple structs with heap TID represented in the standard/generic way only. While these concepts probably all originated in nbtree, they're still not nbtree implementation details.) > > Having disabled kill_prior_tuple is why the mvcc test fails. Perhaps > > there is an easier way to fix this, as I don't think the mvcc test > > failed on Tomas' version. > > > > I kinda doubt it worked correctly, considering I simply ignored the > optimization. It's far more likely it just worked by luck. The test that did fail will have only revealed that the kill_prior_tuple wasn't operating as expected -- which isn't the same thing as giving wrong answers. Note that there are various ways that concurrent TID recycling might prevent _bt_killitems() from setting LP_DEAD bits. It's totally unsurprising that breaking kill_prior_tuple in some way could be missed. Andres wrote the MVCC test in question precisely because certain aspects of kill_prior_tuple were broken for months without anybody noticing. [1] https://www.postgresql.org/docs/devel/index-locking.html --=20 Peter Geoghegan