Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vfRGF-001tXI-0o for pgsql-hackers@arkaria.postgresql.org; Mon, 12 Jan 2026 23:23:48 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vfRFE-001mfi-0T for pgsql-hackers@arkaria.postgresql.org; Mon, 12 Jan 2026 23:22:44 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vfRFC-001mfX-2w for pgsql-hackers@lists.postgresql.org; Mon, 12 Jan 2026 23:22:44 +0000 Received: from fout-b2-smtp.messagingengine.com ([202.12.124.145]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vfRF9-00063r-29 for pgsql-hackers@postgresql.org; Mon, 12 Jan 2026 23:22:42 +0000 Received: from phl-compute-05.internal (phl-compute-05.internal [10.202.2.45]) by mailfout.stl.internal (Postfix) with ESMTP id E2AC71D00110; Mon, 12 Jan 2026 18:22:38 -0500 (EST) Received: from phl-frontend-03 ([10.202.2.162]) by phl-compute-05.internal (MEProxy); Mon, 12 Jan 2026 18:22:39 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm2; t=1768260158; x=1768346558; bh=02+y4iQk/BEpjnIFWIrHXTBzf+muxITjEOdJJwvoweU=; b= I8rQA1CRrwxSs+/uroDpxItbAzfsL7FcdJDRTSLfgVuKrI/ge7ddGRw4s/NLr2Br y92ZamTG58Xd5ld+1SlSk+6kznmMpwDx4F7DfNonEQvxKppujzqaV/Fsxj9CUHDc Szx31+gRRTekdHNLsS2KE1nKJ4dUMIvOsnuJIoFqUaXF7eWBT2GRzgA0dVTlyCBh JYv7AwzqtoT9wGWg0gU5pqO4JFzuhuhHJKtrZ1/59GANvZqPyOR/P8Z6b4LoVYwg pL442DL0xpw/i9Oa0Ht4ITQ+IIWzro2BMd8ghyR802gcsO4//4X5OwBXX8u6jArC bwil+IJ5yXXAbjMo4nrbcw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm2; t=1768260158; x= 1768346558; bh=02+y4iQk/BEpjnIFWIrHXTBzf+muxITjEOdJJwvoweU=; b=X GCQeubyv1a4kTrHl7IZO8EOF/CAlvZ4UWBT+pNkcZbB2RnGL7IAVsb6COOuVBQls EaZue+mBTMdjiPSeQ8NLwDgsoGx8alCs0GHUTpYfGPi0NDboepU6OOcINLZ3U8iA RdDI9T5ZOtFjNd46Xb+vrgIx2LO0c0Bbe0cwmq1heRxFf8tN6u4+SOoRX1yUWDFN zDle/IG3od7WcVjdw2oxFTG8ZBLQlly0HPrtEE3ojW7TRDj3x5zlZgA3THCEF3V4 HHiC+d6d63Akty0MkXJMgkH2zKy+vMs1sn0SvXINuXw9Y5gqoYG74bQobQ/+wfWn 7AV2gJyzJVq1gjd0iPLhg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgdduudekjeeiucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkfhggtggugfgjsehtkefstddttdejnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtffrrg htthgvrhhnpefghffgvefgffevledvheeivdeltdeuudeuffehfedufeelvdfhtdeifeeh hfegfeenucffohhmrghinhepshhtrghtvgdrrghspdhpohhsthhgrhdrvghsnecuvehluh hsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomheprghnughrvghssegr nhgrrhgriigvlhdruggvpdhnsggprhgtphhtthhopeelpdhmohguvgepshhmthhpohhuth dprhgtphhtthhopegsohgvkhgvfihurhhmodhpohhsthhgrhgvshesghhmrghilhdrtgho mhdprhgtphhtthhopehmvghlrghnihgvphhlrghgvghmrghnsehgmhgrihhlrdgtohhmpd hrtghpthhtohepmhhitghhrggvlhdrphgrqhhuihgvrhesghhmrghilhdrtghomhdprhgt phhtthhopehrvghshhhkvghkihhrihhllhesghhmrghilhdrtghomhdprhgtphhtthhope hrohgsvghrthhmhhgrrghssehgmhgrihhlrdgtohhmpdhrtghpthhtohepthhhohhmrghs rdhmuhhnrhhosehgmhgrihhlrdgtohhmpdhrtghpthhtohephhhlihhnnhgrkhgrsehikh hirdhfihdprhgtphhtthhopehnohgrhheslhgvrggusghorghtrdgtohhmpdhrtghpthht ohepphhgshhqlhdqhhgrtghkvghrshesphhoshhtghhrvghsqhhlrdhorhhg X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 12 Jan 2026 18:22:38 -0500 (EST) Date: Mon, 12 Jan 2026 18:22:37 -0500 From: Andres Freund To: Melanie Plageman Cc: Kirill Reshke , Heikki Linnakangas , Matthias van de Meent , pgsql-hackers@postgresql.org, Thomas Munro , Noah Misch , Robert Haas , Michael Paquier Subject: Re: Buffer locking is special (hints, checksums, AIO writes) Message-ID: References: <1108f18d-cf7c-4f17-b29c-a119fe42f7e5@iki.fi> <5dwlfu2jyzkyf3nrlzxxblxctb6xio5es73ptgsahjnmfu5miu@772rc764hfhi> <4csodkvvfbfloxxjlkgsnl2lgfv2mtzdl7phqzd4jxjadxm4o5@usw7feyb5bzf> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2026-01-12 17:27:30 -0500, Melanie Plageman wrote: > On Mon, Jan 12, 2026 at 12:45 PM Andres Freund wrote: > > > > Also working on doing comment polishing of the later patches, found a few > > things, but not quite enough to be worth reposting yet. > > I looked at 0004 and 0005 and re-looked at 0006 - 0007. > > For 0004, I think you should clarify the commit message a bit. I had > trouble understanding when WAKE_IN_PROGRESS is set. So, before > RELEASE_OK was set all the time except when a process woke up and > hadn't run yet. Now, WAKE_IN_PROGRESS is only set when a process is > woken up but hasn't run yet. Personally, I just needed a bit more > specificity (maybe even a bit more formality and grammatically > correctness) from the commit message to get it. Is this better? lwlock: Invert meaning of LW_FLAG_RELEASE_OK Previously, a flag was set to indicate that a lock release should wake up waiters. Since waking waiters is the default behavior in the majority of cases, this logic has been inverted. The new LW_FLAG_WAKE_IN_PROGRESS flag is now set iff wakeups are explicitly inhibited. The motivation for this change is that in an upcoming commit, content locks will be implemented independently of lwlocks, with the lock state stored as part of BufferDesc.state. As all of a buffer's flags are cleared when the buffer is invalidated, without this change we would have to re-add the RELEASE_OK flag after clearing the flags; otherwise, the next lock release would not wake waiters. It seems good to keep the implementation of lwlocks and buffer content locks as similar as reasonably possible. Discussion: https://postgr.es/m/4csodkvvfbfloxxjlkgsnl2lgfv2mtzdl7phqzd4jxjadxm4o5@usw7feyb5bzf > I agree that separating 0005 is helpful. Kewl. > 0007 looks basically fine to me. I'd comb through it with an AI tool > to catch a few nits I saw like an outdated reference to RELEASE_OK and > a missing word in the commit message, etc. Found a few that way and with some manual searching. > Otherwise, I mostly looked to see if the wakeup semantics seemed right > and if anything jumped out at me while skimming (i.e. I didn't go > through every line with a fine-toothed comb). > > The two things I came up with were: > > I wondered why this was needed (i.e. why it wasn't needed before) > @@ -6688,7 +7428,25 @@ ResOwnerReleaseBufferPin(Datum res) > if (BufferIsLocal(buffer)) > UnpinLocalBufferNoOwner(buffer); > else > + { > + PrivateRefCountEntry *ref; > + > + ref = GetPrivateRefCountEntry(buffer, false); > + > + /* > + * If the buffer was locked at the time of the resowner release, > + * release the lock now. This should only happen after errors. > + */ > + if (ref->data.lockmode != BUFFER_LOCK_UNLOCK) > + { > + BufferDesc *buf = GetBufferDescriptor(buffer - 1); > + > + HOLD_INTERRUPTS(); /* match the upcoming RESUME_INTERRUPTS */ > + BufferLockUnlock(buffer, buf); > + } > + > UnpinBufferNoOwner(GetBufferDescriptor(buffer - 1)); > + } > } It's needed because previously content locks were released as part of the LWLockReleaseAll() that are sprinkled across various error recovery paths. Now that content locks aren't implemented via lwlocks anymore, something new is needed. > is it related to your comment in the commit message > 2) Error recovery for content locks is implemented as part of the > already existing private-refcount tracking mechanism in combination > with resowners? Yes. > As for your FIXMEs, > + /* > + * FIXME: This is reusing the lwlock fields. That's not a correctness > + * issue, a backend can't wait for both an lwlock and a buffer content > + * lock at the same time. However, it seems pretty ugly, particularly > + * given that the field names have an lw* prefix. But duplicating the > + * fields also seems somewhat superfluous. > + */ > > personally I can live with reusing the lwlock fields now that it's > fairly well documented. Cool. That's the conclusion I also came to. So unless somebody pipes up soon, I'll remove the FIXME from the commit message and code. > + /* XXX: combine with fetch_and above? */ > + UnlockBufHdr(buf_hdr); > > Are you thinking about adding a helper that stops waiting and unlocks? I'm not sure what you mean by that? Just whether I plan to implement the FIXME? > > Perhaps move the locking code into a buffer_locking.h or such? Needs to be inline functions for efficiency unfortunately. > > So you mean put all of the static buffer locking functions you added > to bufmgr.c inline into a header file? Yes, that's what I was wondering about. > bufmgr.c is super long anyway, so it's not like making it separate > makes the file manageable. On the other hand, it's probably better to > not keep making it worse. Yea. OTOH I don't know if a header that's just included by one file is really an improvement :/ > For example, I find it really annoying that the helper function prototypes > for res owner and ref count related functions are grouped before their > implementations and then below that there is another seemingly arbitrary > group of prototypes and then their implementations. Like, what is the logic > there? I agree it's pretty awful this way. I don't know how the hell that happened, despite probably being the party to blame (4b4b680c3d6d). Nobody in that thread commented upon it, it was that way starting in the first version. Odd. I guess I should propose fixing that :/ Greetings, Andres Freund