Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tAYlJ-00GROE-2Q for pgsql-hackers@arkaria.postgresql.org; Mon, 11 Nov 2024 18:03:40 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tAYlG-00GdGb-A9 for pgsql-hackers@arkaria.postgresql.org; Mon, 11 Nov 2024 18:03:38 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tAYlF-00GdGS-OU for pgsql-hackers@lists.postgresql.org; Mon, 11 Nov 2024 18:03:38 +0000 Received: from mail-wm1-x32d.google.com ([2a00:1450:4864:20::32d]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tAYlD-001LeA-0q for pgsql-hackers@lists.postgresql.org; Mon, 11 Nov 2024 18:03:37 +0000 Received: by mail-wm1-x32d.google.com with SMTP id 5b1f17b1804b1-4316f3d3c21so38557585e9.3 for ; Mon, 11 Nov 2024 10:03:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1731348213; x=1731953013; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=2XKe8GpJLcfg0Cpgwypu54Q2xBoOhXObPZo6KzIiTUw=; b=PmndFjfPsf8QaxEgCxeYpWnzvdfKKrdumbznmFQfLK3FrS1uDuOlBvTYk7EIXojP16 g5mahhwDqXYsZ0prvj4TrEhDhmiybZ7R6gw/+3STFLMEkppVyxENvzmrww9KkkzX9azb DR9Wl5u7X9BIIlnXHWpUgslQI2FfqWni8d4Y0j/mlIsFjU0SEQ+Cppgd4fk489CS7/lI XHeH5fOosizcIzzwdwBu7/6JzBc/dV6O7u5oRoszegzDD2/4rUw7S9GW+wGP6jPOZCUW ErlZKHdIoMEuVD1ykA6BxnhZ1thIfL6C73tX7xle4aRHGaVQfnL739DfmgfWIeB0Zg0D B7fw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731348213; x=1731953013; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2XKe8GpJLcfg0Cpgwypu54Q2xBoOhXObPZo6KzIiTUw=; b=fX1mCtugp+/suyY9L5iUnDGCxdm+NgRRgWeDCemDCEoxstPAy1ii8mA8DsgPgzZBQr vv6pki1LtF1b/m/1yqf/B7UCAlB7pW4t6uGu9JhYda/p6Q21ss7zswOp3lsWZD8NRbbm CXDr5wEPfnRzb5eaxjxBi8WpWpBkySCeRakjLwq/I44x7Re8sv8F9UWj08cVeUPR02z3 vMjVRjeNFCzDNrJ40gVQ/mQ2YnCrlV/y6bnHqAJBAcC0EKetrt5gJCht/klBikz056S3 WZB6XhMP050HtnoC6taPBWL1GjQRzQM+n6TrKAPtYyj+NqAXdcfCV0KQG/zkynVyHmCU jtzA== X-Forwarded-Encrypted: i=1; AJvYcCVRUdvxrK7pLeWV7gtEblMP8D44clwQCL7VKbMO57lVaewigj+W8lxjpyw5ST46tdgvl1vvsr96rBo0W2CR@lists.postgresql.org X-Gm-Message-State: AOJu0YxeGIRKVWfVxwzasJO2bcOKAW8hCYRLDGrESymcDU1gF7bNlXR+ OnD6i7WvE2hMmpOOZ0gPE8xVK79zm/DfDTmGEqM3rFX1682N8xD9bsTF+XaikNK+BydmsvAGGiq GJIur9kqKo4gl0Hy1DnAz+LYG4vv6rEzNzwNkiQ== X-Google-Smtp-Source: AGHT+IHyTC7NNHowiXeZVBeyjG6RNX/CmRkPRt2GbMd6m48jt6YJgnMnyCuCKVcaPMrQtRBwYkySYTO42UiiC54Kgx8= X-Received: by 2002:a05:600c:1c9a:b0:42c:ae30:fc4d with SMTP id 5b1f17b1804b1-432b74fc981mr116617085e9.7.1731348213282; Mon, 11 Nov 2024 10:03:33 -0800 (PST) MIME-Version: 1.0 References: <4f5f16ef-df1e-4e09-9b3f-2e0961ab5117@enterprisedb.com> <20240215201337.7amzw3hpvng7wphb@awork3.anarazel.de> <48d3ff87-a435-488f-b803-258dab6485d6@enterprisedb.com> <6761860c-c66e-400a-b45f-aa741f548771@vondra.me> <8b58ae08-9ed7-43aa-92b0-1976ea6385ed@vondra.me> <225323b8-2c93-4784-8616-c58931a776bc@vondra.me> In-Reply-To: From: Peter Geoghegan Date: Mon, 11 Nov 2024 13:03:07 -0500 Message-ID: Subject: Re: index prefetching To: Robert Haas Cc: Tomas Vondra , Andres Freund , Melanie Plageman , PostgreSQL Hackers , Georgios , Thomas Munro , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon, Nov 11, 2024 at 12:23=E2=80=AFPM Robert Haas wrote: > > I think that holding onto pins and whatnot has almost nothing to do > > with the index AM as such -- it's about protecting against unsafe > > concurrent TID recycling, which is a table AM/heap issue. You can make > > a rather weak argument that the index AM needs it for _bt_killitems, > > but that seems very secondary to me (if you go back long enough there > > are no _bt_killitems, but the pin thing itself still existed). > > Much of this discussion is going over my head, but I have a comment on > this part. I suppose that when any code in the system takes a pin on a > buffer page, the initial concern is almost always to keep the page > from disappearing out from under it. That almost never comes up in index AM code, though -- cases where you simply want to avoid having an index page evicted do exist, but are naturally very rare. I think that nbtree only does this during page deletion by VACUUM, since it works out to be slightly more convenient to hold onto just the pin at one point where we quickly drop and reacquire the lock. Index AMs find very little use for pins that don't naturally coexist with buffer locks. And even the supposed exception that happens for page deletion could easily be replaced by just dropping the pin and the lock (there'd just be no point in it). I almost think of "pin held" and "buffer lock held" as synonymous when working on the nbtree code, even though you have this one obscure page deletion case where that isn't quite true (plus the TID recycle safety business imposed by heapam). As far as protecting the structure of the index itself is concerned, holding on to buffer pins alone does not matter at all. I have a vague recollection of hash doing something novel with cleanup locks, but I also seem to recall that that had problems -- I think that we got rid of it not too long back. In any case my mental model is that cleanup locks are for the benefit of heapam, never for the benefit of index AMs themselves. This is why we require cleanup locks for nbtree VACUUM but not nbtree page deletion, even though both operations perform precisely the same kinds of page-level modifications to the index leaf page. > There might be a few exceptions, > but hopefully not many. So I suppose what is happening here is that > index AM pins an index page so that it can read that page -- and then > it defers releasing the pin because of some interlocking concern. So > at any given moment, there's some set of pins (possibly empty) that > the index AM is holding for its own purposes, and some other set of > pins (also possibly empty) that the index AM no longer requires for > its own purposes but which are still required for heap/index > interlocking. That summary is correct, but FWIW I find the emphasis on index pins slightly odd from an index AM point of view. The nbtree code virtually always calls _bt_getbuf and _bt_relbuf, as opposed to independently acquiring pins and locks -- that's why "lock" and "pin" seem almost synonymous to me in nbtree contexts. Clearly no index AM should hold onto a buffer lock for more than an instant, so my natural instinct is to wonder why you're even talking about buffer pins or buffer locks that the index AM cares about directly. As I said to Tomas, yeah, the index AM kinda sometimes needs to hold onto a leaf page pin to be able to correctly perform _bt_killitems. But this is only because it needs to reason about concurrent TID recycling. So this is also not really any kind of exception. (_bt_killitems is even prepared to reason about cases where no pin was held at all, and has been since commit 2ed5b87f96.) > The second set of pins could possibly be managed in some > AM-agnostic way. The AM could communicate that after the heap is done > with X set of TIDs, it can unpin Y set of pages. But the first set of > pins are of direct and immediate concern to the AM. > > Or at least, so it seems to me. Am I confused? I think that this is exactly what I propose to do, said in a different way. (Again, I wouldn't have expressed it in this way because it seems obvious to me that buffer pins don't have nearly the same significance to an index AM as they do to heapam -- they have no value in protecting the index structure, or helping an index scan to reason about concurrency that isn't due to a heapam issue.) Does that make sense? --=20 Peter Geoghegan