public inbox for [email protected]  
help / color / mirror / Atom feed
From: Bharath Rupireddy <[email protected]>
To: Jingtang Zhang <[email protected]>
Cc: [email protected]
Cc: Nitin Jadhav <[email protected]>
Subject: Re: Use WALReadFromBuffers in more places
Date: Sat, 13 Sep 2025 22:04:58 -0700
Message-ID: <CALj2ACX+LKR7=3TkP83_9cdcXZd+9zhXWokjXyh5tTSi25+ogw@mail.gmail.com> (raw)
In-Reply-To: <CAPsk3_A7079UtVqm2WXXiwadGJ7DucpenmLwnXZgDgXee703Rw@mail.gmail.com>
References: <CALj2ACVfF2Uj9NoFy-5m98HNtjHpuD17EDE9twVeJng-jTAe7A@mail.gmail.com>
	<CAPsk3_CzLbMe-D07H5Vo6yWFvyXHh5is7AoPUCFcztrUmf1haw@mail.gmail.com>
	<CALj2ACVzL4uU=hxFpSfkqP4HeFCPucbBTEg6HNf_MPTYm52pHg@mail.gmail.com>
	<CAMm1aWYa1fGKcuG69xGJPNXLQ_9zHrPqhr-ZGdj4so6Exq66MQ@mail.gmail.com>
	<CALj2ACXa-2eEHHaNRwjcF1k9rtH=EJrWvbGJkucdSOD3zP-OUw@mail.gmail.com>
	<CAPsk3_A7079UtVqm2WXXiwadGJ7DucpenmLwnXZgDgXee703Rw@mail.gmail.com>

Hi,

On Tue, Oct 15, 2024 at 1:22 AM Jingtang Zhang <[email protected]> wrote:
>
> I've been back to this patch for a while recently. I witness that if a WAL
> writer works fast, the already flushed WAL buffers will be zeroed out and
> re-initialized for future use by AdvanceXLInsertBuffer in
> XLogBackgroundFlush, so that WALReadFromBuffers will miss even though the
> space of WAL buffer is enough. It is much more unfriendly for logical
> walsenders than physical walsenders, because logical ones consume WAL
> slower than physical ones due to the extra decoding phase. Seems that the aim
> of AdvanceXLInsertBuffer in WAL writer contradicts with our reading from
> WAL buffer. Any thoughts?

Thanks for looking at this. Yes, the WAL writers can zero out flushed
buffers before WALReadFromBuffers gets to them. However,
WALReadFromBuffers was intentionally designed as an opportunistic
optimization - it's a "try this first, quickly" approach before
falling back to reading from WAL files. The no-locks design ensures it
never gets in the way of backends generating WAL, which is critical
for overall system performance.

I rebased and attached the v3 patch. I discarded the test extension
patch that demonstrated WALReadFromBuffers' behavior (i.e., waiting
for WAL to be fully copied to WAL buffers with
WaitXLogInsertionsToFinish), as I believe the comment at the top of
WALReadFromBuffers is sufficient documentation. I can reintroduce the
test extension if there's interest.

Thoughts?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/octet-stream] v3-0001-Use-WALReadFromBuffers-in-more-places.patch (5.5K, 2-v3-0001-Use-WALReadFromBuffers-in-more-places.patch)
  download | inline diff:
From 60aa2e18df060fcc341847585aa8cd4dcddea5ed Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Sun, 14 Sep 2025 04:54:12 +0000
Subject: [PATCH v3] Use WALReadFromBuffers in more places

Commit 91f2cae introduced WALReadFromBuffers but used it only for
physical replication walsenders. There are several other callers
that use the read_local_xlog_page page_read callback, and logical
replication walsenders can also benefit from reading WAL from WAL
buffers using the new function. This commit extends the use of
WALReadFromBuffers to these callers.

Author: Bharath Rupireddy
Reviewed-by: Jingtang Zhang, Nitin Jadhav
Discussion: https://www.postgresql.org/message-id/CALj2ACVfF2Uj9NoFy-5m98HNtjHpuD17EDE9twVeJng-jTAe7A%40mail.gmail.com
---
 src/backend/access/transam/xlogutils.c | 23 +++++++-
 src/backend/replication/walsender.c    | 77 +++++++++++++++++---------
 2 files changed, 70 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 38176d9688e..3e5b4f75feb 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -876,6 +876,7 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,
 	int			count;
 	WALReadError errinfo;
 	TimeLineID	currTLI;
+	Size		bytesRead;
 
 	loc = targetPagePtr + reqLen;
 
@@ -995,9 +996,25 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,
 		count = read_upto - targetPagePtr;
 	}
 
-	if (!WALRead(state, cur_page, targetPagePtr, count, tli,
-				 &errinfo))
-		WALReadRaiseError(&errinfo);
+	/* First attempt to read from WAL buffers */
+	bytesRead = WALReadFromBuffers(cur_page, targetPagePtr, count, currTLI);
+
+	/* If we still have bytes to read, get them from WAL file */
+	if (bytesRead < count)
+	{
+		if (!WALRead(state,
+					 cur_page + bytesRead,
+					 targetPagePtr + bytesRead,
+					 count - bytesRead,
+					 tli,
+					 &errinfo))
+		{
+			WALReadRaiseError(&errinfo);
+		}
+		bytesRead = count;		/* All requested bytes read */
+	}
+
+	Assert(bytesRead == count);
 
 	/* number of valid bytes in the buffer */
 	return count;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 59822f22b8d..937936c3550 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1036,6 +1036,7 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 	WALReadError errinfo;
 	XLogSegNo	segno;
 	TimeLineID	currTLI;
+	Size		bytesRead;
 
 	/*
 	 * Make sure we have enough WAL available before retrieving the current
@@ -1073,16 +1074,29 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 	else
 		count = flushptr - targetPagePtr;	/* part of the page available */
 
-	/* now actually read the data, we know it's there */
-	if (!WALRead(state,
-				 cur_page,
-				 targetPagePtr,
-				 count,
-				 currTLI,		/* Pass the current TLI because only
+	/* First attempt to read from WAL buffers */
+	bytesRead = WALReadFromBuffers(cur_page, targetPagePtr, count, currTLI);
+
+	targetPagePtr += bytesRead;
+
+	/* If we still have bytes to read, get them from WAL file */
+	if (bytesRead < count)
+	{
+		if (!WALRead(state,
+					 cur_page + bytesRead,
+					 targetPagePtr,
+					 count - bytesRead,
+					 currTLI,	/* Pass the current TLI because only
 								 * WalSndSegmentOpen controls whether new TLI
 								 * is needed. */
-				 &errinfo))
-		WALReadRaiseError(&errinfo);
+					 &errinfo))
+		{
+			WALReadRaiseError(&errinfo);
+		}
+		bytesRead = count;		/* All requested bytes read */
+	}
+
+	Assert(bytesRead == count);
 
 	/*
 	 * After reading into the buffer, check that what we read was valid. We do
@@ -3176,7 +3190,7 @@ XLogSendPhysical(void)
 	Size		nbytes;
 	XLogSegNo	segno;
 	WALReadError errinfo;
-	Size		rbytes;
+	Size		bytesRead;
 
 	/* If requested switch the WAL sender to the stopping state. */
 	if (got_STOPPING)
@@ -3392,24 +3406,33 @@ XLogSendPhysical(void)
 	enlargeStringInfo(&output_message, nbytes);
 
 retry:
-	/* attempt to read WAL from WAL buffers first */
-	rbytes = WALReadFromBuffers(&output_message.data[output_message.len],
-								startptr, nbytes, xlogreader->seg.ws_tli);
-	output_message.len += rbytes;
-	startptr += rbytes;
-	nbytes -= rbytes;
-
-	/* now read the remaining WAL from WAL file */
-	if (nbytes > 0 &&
-		!WALRead(xlogreader,
-				 &output_message.data[output_message.len],
-				 startptr,
-				 nbytes,
-				 xlogreader->seg.ws_tli,	/* Pass the current TLI because
-											 * only WalSndSegmentOpen controls
-											 * whether new TLI is needed. */
-				 &errinfo))
-		WALReadRaiseError(&errinfo);
+	/* First attempt to read from WAL buffers */
+	bytesRead = WALReadFromBuffers(&output_message.data[output_message.len],
+								   startptr,
+								   nbytes,
+								   xlogreader->seg.ws_tli);
+
+	startptr += bytesRead;
+
+	/* If we still have bytes to read, get them from WAL file */
+	if (bytesRead < nbytes)
+	{
+		if (!WALRead(xlogreader,
+					 &output_message.data[output_message.len + bytesRead],
+					 startptr,
+					 nbytes - bytesRead,
+					 xlogreader->seg.ws_tli,	/* Pass the current TLI
+												 * because only
+												 * WalSndSegmentOpen controls
+												 * whether new TLI is needed. */
+					 &errinfo))
+		{
+			WALReadRaiseError(&errinfo);
+		}
+		bytesRead = nbytes;		/* All requested bytes read */
+	}
+
+	Assert(bytesRead == nbytes);
 
 	/* See logical_read_xlog_page(). */
 	XLByteToSeg(startptr, segno, xlogreader->segcxt.ws_segsize);
-- 
2.47.3



view thread (12+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Use WALReadFromBuffers in more places
  In-Reply-To: <CALj2ACX+LKR7=3TkP83_9cdcXZd+9zhXWokjXyh5tTSi25+ogw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox