Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tAZej-00GW8T-MO for pgsql-hackers@arkaria.postgresql.org; Mon, 11 Nov 2024 19:00:57 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tAZeg-00HBgM-C0 for pgsql-hackers@arkaria.postgresql.org; Mon, 11 Nov 2024 19:00:54 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tAZef-00HBgD-UD for pgsql-hackers@lists.postgresql.org; Mon, 11 Nov 2024 19:00:54 +0000 Received: from mail-wm1-x32b.google.com ([2a00:1450:4864:20::32b]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tAZed-001MFU-KH for pgsql-hackers@lists.postgresql.org; Mon, 11 Nov 2024 19:00:53 +0000 Received: by mail-wm1-x32b.google.com with SMTP id 5b1f17b1804b1-43155abaf0bso42571825e9.0 for ; Mon, 11 Nov 2024 11:00:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1731351650; x=1731956450; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=1q3BH2558mdsdpaysCq4afl6xvOLv0UnJ1ulzIYntTQ=; b=1Y0kBl2xW+i3TCbOz7hDpvJQh+iuV8Sg8PB2IfGKy6/TgrqTsNzfGiEcZ+D5njLOyS 6AQlbcK/ZW80rs+Vl4perrQO+lQyElS8nmxdwzWpxJJDObwdBspWMwwEDXmMg92sG1x3 YYIe953Tiu910xi3bTN5mQy4Wbmsba9ltKcx+12g2Pmssg0rTb4MIJ2bpdtU5BUj3NWs 02FefUe7W3IPqymR1A6oUJeTFu0rR7YwNKMoD7AS3s8l3ZVhu93JnLIM5avhnuLAjlDS 5U28j2p+4nG84Zp6idxAq7/SWgeeFnJtRIUWFCoITMizbZerGh64/FaWbpGZ15bcN/qk Qkxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731351650; x=1731956450; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1q3BH2558mdsdpaysCq4afl6xvOLv0UnJ1ulzIYntTQ=; b=Y4x4a3fKL1n7d7mkHdgsEbHz/c02G5VGodRbd9Sd/wzHHSVJvMupJlafcjX3yyqW+E S81PeSOSQa7W3UXR0GEaomtxfgqRS3GBj1Oy8mZeOeIQl0pXt7V+0zEi8ZwvLiMQbBwM 5WRQOZ8KRAQ71C6DdzQr2cyU8Ogr2bYjq5mrtsUrFa6gfFJLwXxBQY6gxvsbI7+5recS Mi2gbRYT7jbpBblFsPM5NQI2omiIocAAgLMQ+5N2x9AzOxt1I6ozhWXY5p+OVm3zuX3z /L8IfFo5fI800tm34uZun3wdHIjw4sflrT3Ovz+5r7wZMhJ2Qv/gCKocob2saAWzxpQy //9Q== X-Forwarded-Encrypted: i=1; AJvYcCWla057kuGobxkTpahDg2owi8odcOEZIrk4tWg9fY7lwIyZ1zdekGDKN/tZnIFf2PDCy42uvhgsl0acPfes@lists.postgresql.org X-Gm-Message-State: AOJu0YwwgkQwRPEc0zTbRxXn15NWnxXAUsEdYSfhmPejkjv7of/PDbCI 5a7+lGlTOaS02dutnQDe4d60XJKYi5UzHc1jka+8RodnLMCiO6/YQO295JIAbsrOF0IQ40jVjCg UDWCR0cdi3dC4T+X3Z/HpbGoOqyinqrSnPCrJTA== X-Google-Smtp-Source: AGHT+IG90e1kmodVrVnBzT02qjbVdxXRwac+QHoTOU1gRhTYdA4lcXbpixfQ+G1CY78iJx2V+12QGyMwwN1kfhqMXig= X-Received: by 2002:a05:600c:1d0e:b0:42d:a024:d6bb with SMTP id 5b1f17b1804b1-432b7517ac2mr106483955e9.20.1731351649914; Mon, 11 Nov 2024 11:00:49 -0800 (PST) MIME-Version: 1.0 References: <4f5f16ef-df1e-4e09-9b3f-2e0961ab5117@enterprisedb.com> <20240215201337.7amzw3hpvng7wphb@awork3.anarazel.de> <48d3ff87-a435-488f-b803-258dab6485d6@enterprisedb.com> <6761860c-c66e-400a-b45f-aa741f548771@vondra.me> <8b58ae08-9ed7-43aa-92b0-1976ea6385ed@vondra.me> <225323b8-2c93-4784-8616-c58931a776bc@vondra.me> In-Reply-To: From: Peter Geoghegan Date: Mon, 11 Nov 2024 14:00:24 -0500 Message-ID: Subject: Re: index prefetching To: Robert Haas Cc: Tomas Vondra , Andres Freund , Melanie Plageman , PostgreSQL Hackers , Georgios , Thomas Munro , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon, Nov 11, 2024 at 1:33=E2=80=AFPM Robert Haas = wrote: > That makes sense from the point of view of working with the btree code > itself, but from a system-wide perspective, it's weird to pretend like > the pins don't exist or don't matter just because a buffer lock is > also held. I can see how that could cause confusion. If you're working on nbtree all day long, it becomes natural, though. Both points are true, and relevant to the discussion. I prefer to over-communicate when discussing these points -- it's too easy to talk past each other here. I think that the precise reasons why the index AM does things with buffer pins will need to be put on a more rigorous and formalized footing with Tomas' patch. The different requirements/safety considerations will have to be carefully teased apart. > I had actually forgotten that the btree code tends to > pin+lock together; now that you mention it, I remember that I knew it > at one point, but it fell out of my head a long time ago... The same thing appears to mostly be true of hash, which mostly uses _hash_getbuf + _hash_relbuf (hash's idiosyncratic use of cleanup locks notwithstanding). To be fair it does look like GiST's gistdoinsert function holds onto multiple buffer pins at a time, for its own reasons -- index AM reasons. But this looks to be more or less an optimization to deal with navigating the tree with a loose index order, where multiple descents and ascents are absolutely expected. (This makes it a bit like the nbtree "drop lock but not pin" case that I mentioned in my last email.) It's not as if these gistdoinsert buffer pins persist across calls to amgettuple, though, so for the purposes of this discussion about the new batch API to replace amgettuple they are not relevant -- they don't actually undermine my point. (Though to be fair their existence does help to explain why you found my characterization of buffer pins as irrelevant to index AMs confusing.) The real sign that what I said is generally true of index AMs is that you'll see so few calls to LockBufferForCleanup/ConditionalLockBufferForCleanup. Only hash calls ConditionalLockBufferForCleanup at all (which I find a bit weird). Both GiST and SP-GiST call neither functions -- even during VACUUM. So GiST and SP-GiST make clear that index AMs (that support only MVCC snapshot scans) can easily get by without any use of cleanup locks (and with no externally significant use of buffer pins). > > I think that this is exactly what I propose to do, said in a different > > way. (Again, I wouldn't have expressed it in this way because it seems > > obvious to me that buffer pins don't have nearly the same significance > > to an index AM as they do to heapam -- they have no value in > > protecting the index structure, or helping an index scan to reason > > about concurrency that isn't due to a heapam issue.) > > > > Does that make sense? > > Yeah, it just really throws me for a loop that you're using "pin" to > mean "pin at a time when we don't also hold a lock." I'll try to be more careful about that in the future, then. > The fundamental > purpose of a pin is to prevent a buffer from being evicted while > someone is in the middle of looking at it, and nothing that uses > buffers can possibly work correctly without that guarantee. Everything > you've written in parentheses there is, AFAICT, 100% wrong if you mean > "any pin" and 100% correct if you mean "a pin held without a > corresponding lock." I agree. --=20 Peter Geoghegan