Date: Wed, 27 Aug 2025 15:29:02 -0400
From: Andres Freund
To: Noah Misch
Cc: Robert Haas,
    pgsql-hackers@postgresql.org, Melanie Plageman, Thomas Munro,
    Heikki Linnakangas
Subject: Re: Buffer locking is special (hints, checksums, AIO writes)
References: <20250827001449.fb.nmisch@google.com> <20250827191441.1c.nmisch@google.com>
In-Reply-To: <20250827191441.1c.nmisch@google.com>

Hi,

On 2025-08-27 12:14:41 -0700, Noah Misch wrote:
> On Wed, Aug 27, 2025 at 12:18:27PM -0400, Andres Freund wrote:
> > One way to do that would be to maintain a back-pointer from the BufferDesc to
> > the BufferLookupEnt, since the latter *already* contains the BufferTag. We
> > probably don't want to add another indirection to the buffer mapping hash
> > table, otherwise we could deduplicate the other way round and just put padding
> > between the modified and read-only part of a buffer desc.
>
> I think you're saying clients would save the back-pointer once and dereference
> it many times, with each dereference of a saved back-pointer avoiding a shmem
> read of BufferDesc.tag. Is that right?

I was thinking that we'd not have BufferDesc.tag at all, and would instead
store the tag solely in BufferLookupEnt. To get the tag of a BufferDesc, you'd
have to follow the back-reference every time. But that's actually why it
doesn't work: reading the back-reference pointer would have the same issue as
just reading BufferDesc.tag...

> On Tue, Aug 26, 2025 at 05:00:13PM -0400, Andres Freund wrote:
> > On 2025-08-26 16:21:36 -0400, Robert Haas wrote:
> > > On Fri, Aug 22, 2025 at 3:45 PM Andres Freund wrote:
> > > > DOES ANYBODY HAVE A BETTER NAME THAN SHARE-EXCLUSIVE???!?
> > >
> > > I would consider {AccessShare, Exclusive, AccessExclusive}.
> >
> > One thing I forgot to mention is that with the proposed re-architecture in
> > place, we could subsequently go further and make pinning just be a very
> > lightweight lock level, instead of that being separate, dedicated
> > infrastructure. One nice outgrowth of that would be that acquiring a
> > cleanup lock would just be a real lock acquisition, instead of the dedicated
> > limited machinery we have right now.
> >
> > Which would leave us with:
> > - reference (pins today)
> > - share
> > - share-exclusive
> > - exclusive
> > - cleanup
> >
> > This doesn't quite seem to map onto the heavyweight lock levels in a sensible
> > way...
>
> Could map it like this:
>
> AccessShare - pins today
> RowShare - check tuple visibility (BUFFER_LOCK_SHARE today)
> Share - set hint bits
> ShareUpdateExclusive - clean/write out (borrowing Robert's idea)
> Exclusive - add tuples, change xmax, etc. (BUFFER_LOCK_EXCLUSIVE today)
> AccessExclusive - cleanup lock or evict the buffer

I tend to think that having levels like RowShare for buffer locking would be
confusing enough that the similarity to the heavyweight locks would not
actually be a win...

> That has a separate level for hint bits vs. I/O, so multiple backends could
> set hint bits. I don't know whether the benchmarks would favor maintaining
> that distinction.

I don't think they would: I actually found multiple backends setting the same
hint bits to *hurt* performance a bit. But more importantly, I don't think we
have the space for it. Every lock level that can be acquired multiple times
needs an 18-bit lock count, and we also need to store the buffer state flags
(10 bits). There's just not enough space in 64 bits for three 18-bit counters
as well as the flag bits etc.

> Compared to share-exclusive, I think I'd prefer a name that describes the use
> cases, "set-hints-or-write" (or separate "write" and "set-hints" levels).
I would too; I just couldn't come up with something that conveys the meanings
in a sufficiently concise way :)

> What do you think of that? I don't know whether that should win vs. names
> like ShareUpdateExclusive, though.

I think it'd be a win compared to the heavyweight lock names...

Greetings,

Andres Freund