Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vwUTo-00A8ey-0C for pgsql-hackers@arkaria.postgresql.org; Sun, 01 Mar 2026 00:16:16 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vwUTm-00Bogu-2s for pgsql-hackers@arkaria.postgresql.org; Sun, 01 Mar 2026 00:16:14 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vwUTl-00Bogm-2M for pgsql-hackers@lists.postgresql.org; Sun, 01 Mar 2026 00:16:14 +0000 Received: from fhigh-b8-smtp.messagingengine.com ([202.12.124.159]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vwUTh-00000001nXJ-0i8h for pgsql-hackers@postgresql.org; Sun, 01 Mar 2026 00:16:13 +0000 Received: from phl-compute-04.internal (phl-compute-04.internal [10.202.2.44]) by mailfhigh.stl.internal (Postfix) with ESMTP id AEB077A014D; Sat, 28 Feb 2026 19:16:06 -0500 (EST) Received: from phl-frontend-04 ([10.202.2.163]) by phl-compute-04.internal (MEProxy); Sat, 28 Feb 2026 19:16:06 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm1; t=1772324166; x=1772410566; bh=i6va3d99N1aJFVqXxxD5x5oT9JTSOuC5Ty+ykbmd1ak=; b= SpEP6XTxRr2wy7+Hd9bNkY/qRd56PXBYwkc8AGJBGAKp8EoTW43WM/ySGEn5HT/m BzHyOgrQW589UfPRQRIIQLALaOgDUtcb7H2hvIBubPKvwZ54fN8sokl3mXcgsgBH M6NWMmCrbiKcpnTMt2qnad/d9f8eSYUEjAG6J0RrhMbZMnx2iLdNSKXFMqNQC9BU N5cEmOVH4HbU5+3IJFUsbH55kQ2pmhOURrq8P/muXN9M1QXY8z1M20s0D9BrdMlP f6fp1G6p5uk24bTzqLb8+69iyoxof9d+fpnnDs6q9UDpZM8PKXmA/GkX5W0IP89u tpVoNz07AzGzPIJm6Jxp9g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1772324166; x= 1772410566; bh=i6va3d99N1aJFVqXxxD5x5oT9JTSOuC5Ty+ykbmd1ak=; b=o QFVxs6dxIeE7tV9sJGMdY5ZF3Suw610/McPVvtnqdL+zK0hoqU3f9KAt3V6W0tKg OXJQ+heclX0TsUWfywvQustuy6B3pkrLo/sTo7JzIQPKvonvNhCngJ4wwehJkUqi M2L6XG9JoVCIAHkbD6WEsjoB+6EbFPuETafgqycfXqibEQkKlsqMoqGSheSDa1Co 0YyTSNgxDTDTLptIu8lTiajIo08WAufvyUJkHTRsnblPbidMJq5tqGEMyf7HqQ+Y oVldzkr+YqbWFsithsFQmqC35nfdOoM7eoeOdLg/B73ux/S3i13SwciFc7u87D+u z5XOGiwUY/0t5La15joOQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgddvheeffedvucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkfhggtggugfgjsehtkefstddttdejnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtffrrg htthgvrhhnpeeluefgueegfeevvddttddtieegveelkeetgfeuhefhvdfhueetudehhfdt gffgieenucffohhmrghinhepphhoshhtghhrrdgvshenucevlhhushhtvghrufhiiigvpe dtnecurfgrrhgrmhepmhgrihhlfhhrohhmpegrnhgurhgvshesrghnrghrrgiivghlrdgu vgdpnhgspghrtghpthhtohepgedpmhhouggvpehsmhhtphhouhhtpdhrtghpthhtohepph hgsegsohifthdrihgvpdhrtghpthhtohepmhgvlhgrnhhivghplhgrghgvmhgrnhesghhm rghilhdrtghomhdprhgtphhtthhopehhlhhinhhnrghkrgesihhkihdrfhhipdhrtghpth htohepphhgshhqlhdqhhgrtghkvghrshesphhoshhtghhrvghsqhhlrdhorhhg X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sat, 28 Feb 2026 19:16:05 -0500 (EST) Date: Sat, 28 Feb 2026 19:16:04 -0500 From: Andres Freund To: Melanie Plageman Cc: pgsql-hackers@postgresql.org, Heikki Linnakangas , Peter Geoghegan Subject: Re: Unlogged rel fake lsn vs GetVictimBuffer() Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2026-02-28 13:58:48 -0500, Melanie Plageman wrote: > On Sat, Feb 28, 2026 at 1:09 PM Andres Freund wrote: > > > > I think we ought to fix it, even if it is currently harmless. It seems like > > it'd not be too hard for this to go wrong in the future, it'd e.g. make sense > > for XLogNeedsFlush() to XLogCtl->LogwrtRqst.Write and wake up walwriter. Which > > would have bad consequences if it were done with a fake LSN. > > > > I also think it might be good for XLogNeedsFlush() to have an assertion > > verifying that the LSN in the realm of the possible. It's too easy to feed it > > something entirely bogus right now and never notice. > > I agree we should fix it. I noticed it while working on write > combining. I wrote the attached patch to clean up. It's more than you > were mentioning doing, but I think it makes the code nicer. You can of > course also add assertions to XLogNeedsFlush() too. Ah. > From dd75df0df8184ff87034df30982f763fe4cd15c8 Mon Sep 17 00:00:00 2001 > From: Melanie Plageman > Date: Wed, 7 Jan 2026 13:32:18 -0500 > Subject: [PATCH v0 03/15] Streamline buffer rejection for bulkreads of > unlogged tables > > Bulk-read buffer access strategies reject reusing a buffer from the > buffer access strategy ring if doing so would require flushing WAL. > Unlogged relations never require WAL flushes, so this check can be > skipped. It can also be skipped for buffer access strategies other than > bulkread. This avoids taking the buffer header lock unnecessarily. This under-sells the need for the change a bit, given that we'll just pass a bogus value to XLogNeedsFlush() today for unlogged rels. > Author: Melanie Plageman > Reviewed-by: Chao Li > Discussion: https://postgr.es/m/flat/0198DBB9-4A76-49E4-87F8-43D46DD0FD76%40gmail.com#1d8677fc75dc8b39f0eb5dd6bbafb65d > --- > src/backend/storage/buffer/bufmgr.c | 60 ++++++++++++++++++++------- > src/backend/storage/buffer/freelist.c | 13 +++++- > src/include/storage/buf_internals.h | 1 + > 3 files changed, 57 insertions(+), 17 deletions(-) > > diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c > index 286742e6968..0b5adc9e4a6 100644 > --- a/src/backend/storage/buffer/bufmgr.c > +++ b/src/backend/storage/buffer/bufmgr.c > @@ -2520,25 +2520,14 @@ again: > * If using a nondefault strategy, and writing the buffer would > * require a WAL flush, let the strategy decide whether to go ahead > * and write/reuse the buffer or to choose another victim. We need a > - * lock to inspect the page LSN, so this can't be done inside > + * content lock to inspect the page LSN, so this can't be done inside > * StrategyGetBuffer. > */ > - if (strategy != NULL) > + if (strategy && StrategyRejectBuffer(strategy, buf_hdr, from_ring)) > { > - XLogRecPtr lsn; > - > - /* Read the LSN while holding buffer header lock */ > - buf_state = LockBufHdr(buf_hdr); > - lsn = BufferGetLSN(buf_hdr); > - UnlockBufHdr(buf_hdr); > - > - if (XLogNeedsFlush(lsn) > - && StrategyRejectBuffer(strategy, buf_hdr, from_ring)) > - { > - LockBuffer(buf, BUFFER_LOCK_UNLOCK); > - UnpinBuffer(buf_hdr); > - goto again; > - } > + LockBuffer(buf, BUFFER_LOCK_UNLOCK); > + UnpinBuffer(buf_hdr); > + goto again; > } > > /* OK, do the I/O */ > @@ -3438,6 +3427,45 @@ TrackNewBufferPin(Buffer buf) > BLCKSZ); > } > > +/* > + * Returns true if the buffer needs WAL flushed before it can be written out. > + * > + * If the result is required to be correct, the caller must hold a buffer > + * content lock. If they only hold a shared content lock, we'll need to > + * acquire the buffer header spinlock, so they must not already hold it. > + */ > +bool > +BufferNeedsWALFlush(BufferDesc *bufdesc, bool exclusive_locked) > +{ > + uint32 buf_state = pg_atomic_read_u64(&bufdesc->state); > + Buffer buffer; > + char *page; > + XLogRecPtr lsn; > + > + /* > + * Unlogged buffers can't need WAL flush. See FlushBuffer() for more > + * details on unlogged relations with LSNs. > + */ > + if (!(buf_state & BM_PERMANENT)) > + return false; > + > + buffer = BufferDescriptorGetBuffer(bufdesc); > + page = BufferGetPage(buffer); > + > + if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer) || exclusive_locked) > + lsn = PageGetLSN(page); > + else > + { > + /* Buffer is either share locked or not locked */ > + LockBufHdr(bufdesc); > + lsn = PageGetLSN(page); > + UnlockBufHdr(bufdesc); > + } I buy the exclusive_locked branch, but the rest seems a bit dubious: - BufferIsLocal(buffer) is impossible, the buffer would not be permanent, and I don't think we use strategy - XLogHintBitIsNeeded() isn't *that* cheap and if we dirtied the buffer, the relative cost of locking the buffer header isn't going to matter. > diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c > index b7687836188..37398865d19 100644 > --- a/src/backend/storage/buffer/freelist.c > +++ b/src/backend/storage/buffer/freelist.c > @@ -780,12 +780,17 @@ IOContextForStrategy(BufferAccessStrategy strategy) > * be written out and doing so would require flushing WAL too. This gives us > * a chance to choose a different victim. > * > + * The buffer must be pinned and content locked and the buffer header spinlock > + * must not be held. > + * > * Returns true if buffer manager should ask for a new victim, and false > * if this buffer should be written and re-used. > */ > bool > StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_ring) > { > + Assert(strategy); > + > /* We only do this in bulkread mode */ > if (strategy->btype != BAS_BULKREAD) > return false; > @@ -795,8 +800,14 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_r > strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf)) > return false; > > + Assert(BufferIsLockedByMe(BufferDescriptorGetBuffer(buf))); > + Assert(!(pg_atomic_read_u64(&buf->state) & BM_LOCKED)); > + > + if (!BufferNeedsWALFlush(buf, false)) > + return false; > + > /* > - * Remove the dirty buffer from the ring; necessary to prevent infinite > + * Remove the dirty buffer from the ring; necessary to prevent an infinite > * loop if all ring members are dirty. > */ > strategy->buffers[strategy->current] = InvalidBuffer; It's a bit unfortunate that now you have an external function call from bufmgr.c (for StrategyRejectBuffer()) and then an external call back to bufmgr.c, just to then return back to freelist.c and then to bufmgr.c. And you need to re-read the buf_state in BufferNeedsWALFlush() again. I'd probably just call BufferNeedsWALFlush() in GetVictimBuffer() and pass it the old buf_state, to avoid having to re-fetch it. Greetings, Andres Freund