public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Unlogged rel fake lsn vs GetVictimBuffer()
7+ messages / 2 participants
[nested] [flat]

* Re: Unlogged rel fake lsn vs GetVictimBuffer()
@ 2026-02-28 18:58 Melanie Plageman <[email protected]>
  2026-03-01 00:16 ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Andres Freund <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Melanie Plageman @ 2026-02-28 18:58 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: pgsql-hackers; Heikki Linnakangas <[email protected]>; Peter Geoghegan <[email protected]>

On Sat, Feb 28, 2026 at 1:09 PM Andres Freund <[email protected]> wrote:
>
> I think we ought to fix it, even if it is currently harmless. It seems like
> it'd not be too hard for this to go wrong in the future, it'd e.g. make sense
> for XLogNeedsFlush() to XLogCtl->LogwrtRqst.Write and wake up walwriter. Which
> would have bad consequences if it were done with a fake LSN.
>
> I also think it might be good for XLogNeedsFlush() to have an assertion
> verifying that the LSN in the realm of the possible. It's too easy to feed it
> something entirely bogus right now and never notice.

I agree we should fix it. I noticed it while working on write
combining. I wrote the attached patch to clean up. It's more than you
were mentioning doing, but I think it makes the code nicer. You can of
course also add assertions to XLogNeedsFlush() too.

- Melanie


Attachments:

  [text/x-patch] buffer-rejection.patch (5.5K, 2-buffer-rejection.patch)
  download | inline diff:
From dd75df0df8184ff87034df30982f763fe4cd15c8 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 7 Jan 2026 13:32:18 -0500
Subject: [PATCH v0 03/15] Streamline buffer rejection for bulkreads of
 unlogged tables

Bulk-read buffer access strategies reject reusing a buffer from the
buffer access strategy ring if doing so would require flushing WAL.
Unlogged relations never require WAL flushes, so this check can be
skipped. It can also be skipped for buffer access strategies other than
bulkread. This avoids taking the buffer header lock unnecessarily.

Author: Melanie Plageman <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/0198DBB9-4A76-49E4-87F8-43D46DD0FD76%40gmail.com#1d8677fc75dc8b39f0eb5dd6bbafb65d
---
 src/backend/storage/buffer/bufmgr.c   | 60 ++++++++++++++++++++-------
 src/backend/storage/buffer/freelist.c | 13 +++++-
 src/include/storage/buf_internals.h   |  1 +
 3 files changed, 57 insertions(+), 17 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 286742e6968..0b5adc9e4a6 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2520,25 +2520,14 @@ again:
 		 * If using a nondefault strategy, and writing the buffer would
 		 * require a WAL flush, let the strategy decide whether to go ahead
 		 * and write/reuse the buffer or to choose another victim.  We need a
-		 * lock to inspect the page LSN, so this can't be done inside
+		 * content lock to inspect the page LSN, so this can't be done inside
 		 * StrategyGetBuffer.
 		 */
-		if (strategy != NULL)
+		if (strategy && StrategyRejectBuffer(strategy, buf_hdr, from_ring))
 		{
-			XLogRecPtr	lsn;
-
-			/* Read the LSN while holding buffer header lock */
-			buf_state = LockBufHdr(buf_hdr);
-			lsn = BufferGetLSN(buf_hdr);
-			UnlockBufHdr(buf_hdr);
-
-			if (XLogNeedsFlush(lsn)
-				&& StrategyRejectBuffer(strategy, buf_hdr, from_ring))
-			{
-				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-				UnpinBuffer(buf_hdr);
-				goto again;
-			}
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+			UnpinBuffer(buf_hdr);
+			goto again;
 		}
 
 		/* OK, do the I/O */
@@ -3438,6 +3427,45 @@ TrackNewBufferPin(Buffer buf)
 							  BLCKSZ);
 }
 
+/*
+ * Returns true if the buffer needs WAL flushed before it can be written out.
+ *
+ * If the result is required to be correct, the caller must hold a buffer
+ * content lock. If they only hold a shared content lock, we'll need to
+ * acquire the buffer header spinlock, so they must not already hold it.
+ */
+bool
+BufferNeedsWALFlush(BufferDesc *bufdesc, bool exclusive_locked)
+{
+	uint32		buf_state = pg_atomic_read_u64(&bufdesc->state);
+	Buffer		buffer;
+	char	   *page;
+	XLogRecPtr	lsn;
+
+	/*
+	 * Unlogged buffers can't need WAL flush. See FlushBuffer() for more
+	 * details on unlogged relations with LSNs.
+	 */
+	if (!(buf_state & BM_PERMANENT))
+		return false;
+
+	buffer = BufferDescriptorGetBuffer(bufdesc);
+	page = BufferGetPage(buffer);
+
+	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer) || exclusive_locked)
+		lsn = PageGetLSN(page);
+	else
+	{
+		/* Buffer is either share locked or not locked */
+		LockBufHdr(bufdesc);
+		lsn = PageGetLSN(page);
+		UnlockBufHdr(bufdesc);
+	}
+
+	return XLogNeedsFlush(lsn);
+}
+
+
 #define ST_SORT sort_checkpoint_bufferids
 #define ST_ELEMENT_TYPE CkptSortItem
 #define ST_COMPARE(a, b) ckpt_buforder_comparator(a, b)
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index b7687836188..37398865d19 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -780,12 +780,17 @@ IOContextForStrategy(BufferAccessStrategy strategy)
  * be written out and doing so would require flushing WAL too.  This gives us
  * a chance to choose a different victim.
  *
+ * The buffer must be pinned and content locked and the buffer header spinlock
+ * must not be held.
+ *
  * Returns true if buffer manager should ask for a new victim, and false
  * if this buffer should be written and re-used.
  */
 bool
 StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_ring)
 {
+	Assert(strategy);
+
 	/* We only do this in bulkread mode */
 	if (strategy->btype != BAS_BULKREAD)
 		return false;
@@ -795,8 +800,14 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_r
 		strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
 		return false;
 
+	Assert(BufferIsLockedByMe(BufferDescriptorGetBuffer(buf)));
+	Assert(!(pg_atomic_read_u64(&buf->state) & BM_LOCKED));
+
+	if (!BufferNeedsWALFlush(buf, false))
+		return false;
+
 	/*
-	 * Remove the dirty buffer from the ring; necessary to prevent infinite
+	 * Remove the dirty buffer from the ring; necessary to prevent an infinite
 	 * loop if all ring members are dirty.
 	 */
 	strategy->buffers[strategy->current] = InvalidBuffer;
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 215f2dea95c..0899be8da48 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -552,6 +552,7 @@ extern void WritebackContextInit(WritebackContext *context, int *max_pending);
 extern void IssuePendingWritebacks(WritebackContext *wb_context, IOContext io_context);
 
 extern void TrackNewBufferPin(Buffer buf);
+extern bool BufferNeedsWALFlush(BufferDesc *bufdesc, bool exclusive_locked);
 
 /* solely to make it easier to write tests */
 extern bool StartBufferIO(BufferDesc *buf, bool forInput, bool nowait);
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Unlogged rel fake lsn vs GetVictimBuffer()
  2026-02-28 18:58 Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
@ 2026-03-01 00:16 ` Andres Freund <[email protected]>
  2026-03-07 17:33   ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Andres Freund @ 2026-03-01 00:16 UTC (permalink / raw)
  To: Melanie Plageman <[email protected]>; +Cc: pgsql-hackers; Heikki Linnakangas <[email protected]>; Peter Geoghegan <[email protected]>

Hi,

On 2026-02-28 13:58:48 -0500, Melanie Plageman wrote:
> On Sat, Feb 28, 2026 at 1:09 PM Andres Freund <[email protected]> wrote:
> >
> > I think we ought to fix it, even if it is currently harmless. It seems like
> > it'd not be too hard for this to go wrong in the future, it'd e.g. make sense
> > for XLogNeedsFlush() to XLogCtl->LogwrtRqst.Write and wake up walwriter. Which
> > would have bad consequences if it were done with a fake LSN.
> >
> > I also think it might be good for XLogNeedsFlush() to have an assertion
> > verifying that the LSN in the realm of the possible. It's too easy to feed it
> > something entirely bogus right now and never notice.
> 
> I agree we should fix it. I noticed it while working on write
> combining. I wrote the attached patch to clean up. It's more than you
> were mentioning doing, but I think it makes the code nicer. You can of
> course also add assertions to XLogNeedsFlush() too.

Ah.

> From dd75df0df8184ff87034df30982f763fe4cd15c8 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <[email protected]>
> Date: Wed, 7 Jan 2026 13:32:18 -0500
> Subject: [PATCH v0 03/15] Streamline buffer rejection for bulkreads of
>  unlogged tables
> 
> Bulk-read buffer access strategies reject reusing a buffer from the
> buffer access strategy ring if doing so would require flushing WAL.
> Unlogged relations never require WAL flushes, so this check can be
> skipped. It can also be skipped for buffer access strategies other than
> bulkread. This avoids taking the buffer header lock unnecessarily.

This under-sells the need for the change a bit, given that we'll just pass a
bogus value to XLogNeedsFlush() today for unlogged rels.


> Author: Melanie Plageman <[email protected]>
> Reviewed-by: Chao Li <[email protected]>
> Discussion: https://postgr.es/m/flat/0198DBB9-4A76-49E4-87F8-43D46DD0FD76%40gmail.com#1d8677fc75dc8b39f0eb5dd6bb...
> ---
>  src/backend/storage/buffer/bufmgr.c   | 60 ++++++++++++++++++++-------
>  src/backend/storage/buffer/freelist.c | 13 +++++-
>  src/include/storage/buf_internals.h   |  1 +
>  3 files changed, 57 insertions(+), 17 deletions(-)
> 
> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 286742e6968..0b5adc9e4a6 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -2520,25 +2520,14 @@ again:
>  		 * If using a nondefault strategy, and writing the buffer would
>  		 * require a WAL flush, let the strategy decide whether to go ahead
>  		 * and write/reuse the buffer or to choose another victim.  We need a
> -		 * lock to inspect the page LSN, so this can't be done inside
> +		 * content lock to inspect the page LSN, so this can't be done inside
>  		 * StrategyGetBuffer.
>  		 */
> -		if (strategy != NULL)
> +		if (strategy && StrategyRejectBuffer(strategy, buf_hdr, from_ring))
>  		{
> -			XLogRecPtr	lsn;
> -
> -			/* Read the LSN while holding buffer header lock */
> -			buf_state = LockBufHdr(buf_hdr);
> -			lsn = BufferGetLSN(buf_hdr);
> -			UnlockBufHdr(buf_hdr);
> -
> -			if (XLogNeedsFlush(lsn)
> -				&& StrategyRejectBuffer(strategy, buf_hdr, from_ring))
> -			{
> -				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> -				UnpinBuffer(buf_hdr);
> -				goto again;
> -			}
> +			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> +			UnpinBuffer(buf_hdr);
> +			goto again;
>  		}
>  
>  		/* OK, do the I/O */
> @@ -3438,6 +3427,45 @@ TrackNewBufferPin(Buffer buf)
>  							  BLCKSZ);
>  }
>  
> +/*
> + * Returns true if the buffer needs WAL flushed before it can be written out.
> + *
> + * If the result is required to be correct, the caller must hold a buffer
> + * content lock. If they only hold a shared content lock, we'll need to
> + * acquire the buffer header spinlock, so they must not already hold it.
> + */
> +bool
> +BufferNeedsWALFlush(BufferDesc *bufdesc, bool exclusive_locked)
> +{
> +	uint32		buf_state = pg_atomic_read_u64(&bufdesc->state);
> +	Buffer		buffer;
> +	char	   *page;
> +	XLogRecPtr	lsn;
> +
> +	/*
> +	 * Unlogged buffers can't need WAL flush. See FlushBuffer() for more
> +	 * details on unlogged relations with LSNs.
> +	 */
> +	if (!(buf_state & BM_PERMANENT))
> +		return false;
> +
> +	buffer = BufferDescriptorGetBuffer(bufdesc);
> +	page = BufferGetPage(buffer);
> +
> +	if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer) || exclusive_locked)
> +		lsn = PageGetLSN(page);
> +	else
> +	{
> +		/* Buffer is either share locked or not locked */
> +		LockBufHdr(bufdesc);
> +		lsn = PageGetLSN(page);
> +		UnlockBufHdr(bufdesc);
> +	}

I buy the exclusive_locked branch, but the rest seems a bit dubious:

- BufferIsLocal(buffer) is impossible, the buffer would not be permanent, and
  I don't think we use strategy

- XLogHintBitIsNeeded() isn't *that* cheap and if we dirtied the buffer, the
  relative cost of locking the buffer header isn't going to matter.



> diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
> index b7687836188..37398865d19 100644
> --- a/src/backend/storage/buffer/freelist.c
> +++ b/src/backend/storage/buffer/freelist.c
> @@ -780,12 +780,17 @@ IOContextForStrategy(BufferAccessStrategy strategy)
>   * be written out and doing so would require flushing WAL too.  This gives us
>   * a chance to choose a different victim.
>   *
> + * The buffer must be pinned and content locked and the buffer header spinlock
> + * must not be held.
> + *
>   * Returns true if buffer manager should ask for a new victim, and false
>   * if this buffer should be written and re-used.
>   */
>  bool
>  StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_ring)
>  {
> +	Assert(strategy);
> +
>  	/* We only do this in bulkread mode */
>  	if (strategy->btype != BAS_BULKREAD)
>  		return false;
> @@ -795,8 +800,14 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_r
>  		strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
>  		return false;
>  
> +	Assert(BufferIsLockedByMe(BufferDescriptorGetBuffer(buf)));
> +	Assert(!(pg_atomic_read_u64(&buf->state) & BM_LOCKED));
> +
> +	if (!BufferNeedsWALFlush(buf, false))
> +		return false;
> +
>  	/*
> -	 * Remove the dirty buffer from the ring; necessary to prevent infinite
> +	 * Remove the dirty buffer from the ring; necessary to prevent an infinite
>  	 * loop if all ring members are dirty.
>  	 */
>  	strategy->buffers[strategy->current] = InvalidBuffer;

It's a bit unfortunate that now you have an external function call from
bufmgr.c (for StrategyRejectBuffer()) and then an external call back to
bufmgr.c, just to then return back to freelist.c and then to bufmgr.c.  And
you need to re-read the buf_state in BufferNeedsWALFlush() again.

I'd probably just call BufferNeedsWALFlush() in GetVictimBuffer() and pass it
the old buf_state, to avoid having to re-fetch it.

Greetings,

Andres Freund






^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Unlogged rel fake lsn vs GetVictimBuffer()
  2026-02-28 18:58 Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  2026-03-01 00:16 ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Andres Freund <[email protected]>
@ 2026-03-07 17:33   ` Melanie Plageman <[email protected]>
  2026-03-10 19:31     ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Melanie Plageman @ 2026-03-07 17:33 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: pgsql-hackers; Heikki Linnakangas <[email protected]>; Peter Geoghegan <[email protected]>

On Sat, Feb 28, 2026 at 7:16 PM Andres Freund <[email protected]> wrote:
>
> > +BufferNeedsWALFlush(BufferDesc *bufdesc, bool exclusive_locked)
> > +{
> > +     uint32          buf_state = pg_atomic_read_u64(&bufdesc->state);
> > +     Buffer          buffer;
> > +     char       *page;
> > +     XLogRecPtr      lsn;
> > +
> > +     /*
> > +      * Unlogged buffers can't need WAL flush. See FlushBuffer() for more
> > +      * details on unlogged relations with LSNs.
> > +      */
> > +     if (!(buf_state & BM_PERMANENT))
> > +             return false;
> > +
> > +     buffer = BufferDescriptorGetBuffer(bufdesc);
> > +     page = BufferGetPage(buffer);
> > +
> > +     if (!XLogHintBitIsNeeded() || BufferIsLocal(buffer) || exclusive_locked)
> > +             lsn = PageGetLSN(page);
> > +     else
> > +     {
> > +             /* Buffer is either share locked or not locked */
> > +             LockBufHdr(bufdesc);
> > +             lsn = PageGetLSN(page);
> > +             UnlockBufHdr(bufdesc);
> > +     }
>
> I buy the exclusive_locked branch, but the rest seems a bit dubious:
>
> - BufferIsLocal(buffer) is impossible, the buffer would not be permanent, and
>   I don't think we use strategy

Changed it to an assert.

> - XLogHintBitIsNeeded() isn't *that* cheap and if we dirtied the buffer, the
>   relative cost of locking the buffer header isn't going to matter.

BufferGetLSNAtomic() checks XLogHintBitIsNeeded() as part of what it
calls a "fast path". I suppose in the case of BufferNeedsWALFlush() we
expect it only to be called on dirty buffers, so perhaps we don't need
to worry about being "fast". But why would XLogHintBitIsNeeded() be
slower than the buffer header lock? Is accessing the ControlFile
variable expensive or is it just the external function call?

> > diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
>
> > @@ -795,8 +800,14 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_r
> >               strategy->buffers[strategy->current] != BufferDescriptorGetBuffer(buf))
> >               return false;
> >
> > +     Assert(BufferIsLockedByMe(BufferDescriptorGetBuffer(buf)));
> > +     Assert(!(pg_atomic_read_u64(&buf->state) & BM_LOCKED));
> > +
> > +     if (!BufferNeedsWALFlush(buf, false))
> > +             return false;
>
> It's a bit unfortunate that now you have an external function call from
> bufmgr.c (for StrategyRejectBuffer()) and then an external call back to
> bufmgr.c, just to then return back to freelist.c and then to bufmgr.c.  And
> you need to re-read the buf_state in BufferNeedsWALFlush() again.
>
> I'd probably just call BufferNeedsWALFlush() in GetVictimBuffer() and pass it
> the old buf_state, to avoid having to re-fetch it.

My idea was to avoid taking the buffer header lock if it is a
non-bulkread strategy -- which you can't do if you need to call
BufferNeedsWALFlush() from bufmgr.c. You have to check
BufferNeedsWALFlush() before StrategyRejectBuffer() because
StrategyRejectBuffer() sets the buffer to InvalidBuffer in the ring.
So you think that the external function call will be more expensive
than taking the buffer header lock?

And, I find the current StrategyRejectBuffer() kind of silly and hard
to understand -- its comment mentions needing WAL flush, but, of
course it doesn't check that at all. I've changed it as you suggest,
but I did really prefer it the other way from a code readability
perspective.

- Melanie


Attachments:

  [text/x-patch] v2-0001-Avoid-WAL-flush-checks-for-unlogged-buffers-in-Ge.patch (5.4K, 2-v2-0001-Avoid-WAL-flush-checks-for-unlogged-buffers-in-Ge.patch)
  download | inline diff:
From 7db7aa69f5c56aa2ae419d99b3308a6209f6f82a Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Sat, 7 Mar 2026 11:38:14 -0500
Subject: [PATCH v2] Avoid WAL-flush checks for unlogged buffers in
 GetVictimBuffer()

GetVictimBuffer() rejects a victim buffer if it is from a bulkread
strategy ring and reusing it would require flushing WAL. Unlogged table
buffers can have fake LSNs (e.g. unlogged GiST pages) and calling
XLogNeedsFlush() on a fake LSN is meaningless.

This isn't an issue currently because the bulkread strategy is not used
for relations with fake LSNs. However, fixing it avoids future issues
and avoids acquiring the buffer header lock for unlogged table buffers.
While we're at it, avoid taking the buffer header lock for buffers not
from the strategy ring.

Author: Melanie Plageman <[email protected]>
Reported-by: Andres Freund <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/fmkqmyeyy7bdpvcgkheb6yaqewemkik3ls6aaveyi5ibmvtxnd%40nu2kvy5rq3a6
---
 src/backend/access/transam/xlog.c     |  3 ++
 src/backend/storage/buffer/bufmgr.c   | 68 +++++++++++++++++++++------
 src/backend/storage/buffer/freelist.c |  4 +-
 3 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b9b678f3722..f846f3814b4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3182,6 +3182,9 @@ XLogNeedsFlush(XLogRecPtr record)
 			return true;
 	}
 
+	/* Protect against callers using fake LSNs */
+	Assert(record <= GetXLogInsertRecPtr());
+
 	/* Quick exit if already known flushed */
 	if (record <= LogwrtResult.Flush)
 		return false;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 5f3d083e938..6d16f872674 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -625,6 +625,8 @@ static bool PinBuffer(BufferDesc *buf, BufferAccessStrategy strategy,
 static void PinBuffer_Locked(BufferDesc *buf);
 static void UnpinBuffer(BufferDesc *buf);
 static void UnpinBufferNoOwner(BufferDesc *buf);
+static bool BufferNeedsWALFlush(BufferDesc *bufdesc, uint64 buf_state,
+								bool exclusive_locked);
 static void BufferSync(int flags);
 static int	SyncOneBuffer(int buf_id, bool skip_recently_used,
 						  WritebackContext *wb_context);
@@ -2522,22 +2524,13 @@ again:
 		 * lock to inspect the page LSN, so this can't be done inside
 		 * StrategyGetBuffer.
 		 */
-		if (strategy != NULL)
+		if (strategy && from_ring &&
+			BufferNeedsWALFlush(buf_hdr, buf_state, false) &&
+			StrategyRejectBuffer(strategy, buf_hdr, from_ring))
 		{
-			XLogRecPtr	lsn;
-
-			/* Read the LSN while holding buffer header lock */
-			buf_state = LockBufHdr(buf_hdr);
-			lsn = BufferGetLSN(buf_hdr);
-			UnlockBufHdr(buf_hdr);
-
-			if (XLogNeedsFlush(lsn)
-				&& StrategyRejectBuffer(strategy, buf_hdr, from_ring))
-			{
-				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-				UnpinBuffer(buf_hdr);
-				goto again;
-			}
+			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+			UnpinBuffer(buf_hdr);
+			goto again;
 		}
 
 		/* OK, do the I/O */
@@ -3437,6 +3430,51 @@ TrackNewBufferPin(Buffer buf)
 							  BLCKSZ);
 }
 
+/*
+ * Returns true if the buffer needs WAL flushed before it can be written out.
+ *
+ * The caller must pass a buf_state value read from bufdesc->state.
+ *
+ * If the result is required to be correct, the caller must hold a buffer
+ * content lock. If they only hold a shared content lock, we'll need to
+ * acquire the buffer header spinlock, so they must not already hold it.
+ */
+static bool
+BufferNeedsWALFlush(BufferDesc *bufdesc, uint64 buf_state,
+					bool exclusive_locked)
+{
+	Buffer		buffer;
+	char	   *page;
+	XLogRecPtr	lsn;
+
+	/*
+	 * Unlogged buffers can't need WAL flush. See FlushBuffer() for more
+	 * details on unlogged relations with LSNs.
+	 */
+	if (!(buf_state & BM_PERMANENT))
+		return false;
+
+	buffer = BufferDescriptorGetBuffer(bufdesc);
+	Assert(!BufferIsLocal(buffer));
+	page = BufferGetPage(buffer);
+
+	if (exclusive_locked)
+	{
+		Assert(BufferIsLockedByMeInMode(buffer, BUFFER_LOCK_EXCLUSIVE));
+		lsn = PageGetLSN(page);
+	}
+	else
+	{
+		/* Buffer is either share locked or not locked. */
+		LockBufHdr(bufdesc);
+		lsn = PageGetLSN(page);
+		UnlockBufHdr(bufdesc);
+	}
+
+	return XLogNeedsFlush(lsn);
+}
+
+
 #define ST_SORT sort_checkpoint_bufferids
 #define ST_ELEMENT_TYPE CkptSortItem
 #define ST_COMPARE(a, b) ckpt_buforder_comparator(a, b)
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index b7687836188..6f31cd22289 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -786,6 +786,8 @@ IOContextForStrategy(BufferAccessStrategy strategy)
 bool
 StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_ring)
 {
+	Assert(strategy);
+
 	/* We only do this in bulkread mode */
 	if (strategy->btype != BAS_BULKREAD)
 		return false;
@@ -796,7 +798,7 @@ StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_r
 		return false;
 
 	/*
-	 * Remove the dirty buffer from the ring; necessary to prevent infinite
+	 * Remove the dirty buffer from the ring; necessary to prevent an infinite
 	 * loop if all ring members are dirty.
 	 */
 	strategy->buffers[strategy->current] = InvalidBuffer;
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Unlogged rel fake lsn vs GetVictimBuffer()
  2026-02-28 18:58 Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  2026-03-01 00:16 ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Andres Freund <[email protected]>
  2026-03-07 17:33   ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
@ 2026-03-10 19:31     ` Melanie Plageman <[email protected]>
  2026-03-10 19:46       ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Melanie Plageman @ 2026-03-10 19:31 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: pgsql-hackers; Heikki Linnakangas <[email protected]>; Peter Geoghegan <[email protected]>

None of my points above are blockers, so barring any objections I plan
to push this later today/tomorrow. It hasn't been sitting out long but
it is pretty trivial and I don't think it has any correctness issues.

- Melanie





^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Unlogged rel fake lsn vs GetVictimBuffer()
  2026-02-28 18:58 Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  2026-03-01 00:16 ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Andres Freund <[email protected]>
  2026-03-07 17:33   ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  2026-03-10 19:31     ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
@ 2026-03-10 19:46       ` Melanie Plageman <[email protected]>
  2026-03-11 18:03         ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Melanie Plageman @ 2026-03-10 19:46 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: pgsql-hackers; Heikki Linnakangas <[email protected]>; Peter Geoghegan <[email protected]>

On Tue, Mar 10, 2026 at 3:31 PM Melanie Plageman
<[email protected]> wrote:
>
> None of my points above are blockers, so barring any objections I plan
> to push this later today/tomorrow. It hasn't been sitting out long but
> it is pretty trivial and I don't think it has any correctness issues.

Now, I'm thinking that I should allow BufferNeedsWALFlush() to be
called on local buffers. I removed it in v2 because Andres mentioned
it could never happen when called by StrategyRejectBuffer() (because
we don't use strategies on local buffers), but there's no reason
BufferNeedsWALFlush() can't be used more widely in the future.

- Melanie





^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Unlogged rel fake lsn vs GetVictimBuffer()
  2026-02-28 18:58 Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  2026-03-01 00:16 ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Andres Freund <[email protected]>
  2026-03-07 17:33   ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  2026-03-10 19:31     ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  2026-03-10 19:46       ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
@ 2026-03-11 18:03         ` Melanie Plageman <[email protected]>
  2026-03-11 18:54           ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Melanie Plageman @ 2026-03-11 18:03 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: pgsql-hackers; Heikki Linnakangas <[email protected]>; Peter Geoghegan <[email protected]>

On Tue, Mar 10, 2026 at 3:46 PM Melanie Plageman
<[email protected]> wrote:
>
> Now, I'm thinking that I should allow BufferNeedsWALFlush() to be
> called on local buffers. I removed it in v2 because Andres mentioned
> it could never happen when called by StrategyRejectBuffer() (because
> we don't use strategies on local buffers), but there's no reason
> BufferNeedsWALFlush() can't be used more widely in the future.

Well, due to 82467f627bd478569de, this is now a tiny one-liner. Will
push in an hour or so barring objections.

- Melanie


Attachments:

  [text/x-patch] v3-0001-Avoid-WAL-flush-checks-for-unlogged-buffers-in-Ge.patch (1.6K, 2-v3-0001-Avoid-WAL-flush-checks-for-unlogged-buffers-in-Ge.patch)
  download | inline diff:
From 0e26a8f6b54121432f2b5ef78764c8586521c766 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <[email protected]>
Date: Wed, 11 Mar 2026 13:57:13 -0400
Subject: [PATCH v3] Avoid WAL-flush checks for unlogged buffers in
 GetVictimBuffer()

GetVictimBuffer() rejects a victim buffer if it is from a bulkread
strategy ring and reusing it would require flushing WAL. Unlogged table
buffers can have fake LSNs (e.g. unlogged GiST pages) and calling
XLogNeedsFlush() on a fake LSN is meaningless.

This is a bit of future-proofing because currently the bulkread strategy
is not used for relations with fake LSNs.

Author: Melanie Plageman <[email protected]>
Reported-by: Andres Freund <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Earlier version reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/flat/fmkqmyeyy7bdpvcgkheb6yaqewemkik3ls6aaveyi5ibmvtxnd%40nu2kvy5rq3a6
---
 src/backend/storage/buffer/bufmgr.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 0546ee0193c..26b195b5359 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2527,8 +2527,9 @@ again:
 		{
 			XLogRecPtr	lsn = BufferGetLSN(buf_hdr);
 
-			if (XLogNeedsFlush(lsn)
-				&& StrategyRejectBuffer(strategy, buf_hdr, from_ring))
+			if (buf_state & BM_PERMANENT &&
+				XLogNeedsFlush(lsn) &&
+				StrategyRejectBuffer(strategy, buf_hdr, from_ring))
 			{
 				LockBuffer(buf, BUFFER_LOCK_UNLOCK);
 				UnpinBuffer(buf_hdr);
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: Unlogged rel fake lsn vs GetVictimBuffer()
  2026-02-28 18:58 Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  2026-03-01 00:16 ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Andres Freund <[email protected]>
  2026-03-07 17:33   ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  2026-03-10 19:31     ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  2026-03-10 19:46       ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
  2026-03-11 18:03         ` Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
@ 2026-03-11 18:54           ` Melanie Plageman <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Melanie Plageman @ 2026-03-11 18:54 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: pgsql-hackers; Heikki Linnakangas <[email protected]>; Peter Geoghegan <[email protected]>

On Wed, Mar 11, 2026 at 2:03 PM Melanie Plageman
<[email protected]> wrote:
>
> Well, due to 82467f627bd478569de, this is now a tiny one-liner. Will
> push in an hour or so barring objections.

This has been committed.

- Melanie





^ permalink  raw  reply  [nested|flat] 7+ messages in thread


end of thread, other threads:[~2026-03-11 18:54 UTC | newest]

Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-28 18:58 Re: Unlogged rel fake lsn vs GetVictimBuffer() Melanie Plageman <[email protected]>
2026-03-01 00:16 ` Andres Freund <[email protected]>
2026-03-07 17:33   ` Melanie Plageman <[email protected]>
2026-03-10 19:31     ` Melanie Plageman <[email protected]>
2026-03-10 19:46       ` Melanie Plageman <[email protected]>
2026-03-11 18:03         ` Melanie Plageman <[email protected]>
2026-03-11 18:54           ` Melanie Plageman <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox