From: Bharath Rupireddy <[email protected]>
To: Jeff Davis <[email protected]>
Cc: Jingtang Zhang <[email protected]>
Cc: [email protected]
Cc: Nitin Jadhav <[email protected]>
Subject: Re: Use WALReadFromBuffers in more places
Date: Fri, 20 Mar 2026 09:00:00 -0700
Message-ID: <CALj2ACUyvZ=MF82MEeJOyAFHOxFJ02jgGg4gbUCb3iemMgn4cA@mail.gmail.com>
In-Reply-To: <CALj2ACWn0nS=ejzHwdnnmWhRtWT6=+NAMbcMB5w37sCw97_zBA@mail.gmail.com>
References: <CALj2ACVfF2Uj9NoFy-5m98HNtjHpuD17EDE9twVeJng-jTAe7A@mail.gmail.com>
<CAPsk3_CzLbMe-D07H5Vo6yWFvyXHh5is7AoPUCFcztrUmf1haw@mail.gmail.com>
<CALj2ACVzL4uU=hxFpSfkqP4HeFCPucbBTEg6HNf_MPTYm52pHg@mail.gmail.com>
<CAMm1aWYa1fGKcuG69xGJPNXLQ_9zHrPqhr-ZGdj4so6Exq66MQ@mail.gmail.com>
<CALj2ACXa-2eEHHaNRwjcF1k9rtH=EJrWvbGJkucdSOD3zP-OUw@mail.gmail.com>
<CAPsk3_A7079UtVqm2WXXiwadGJ7DucpenmLwnXZgDgXee703Rw@mail.gmail.com>
<CALj2ACX+LKR7=3TkP83_9cdcXZd+9zhXWokjXyh5tTSi25+ogw@mail.gmail.com>
<[email protected]>
<CALj2ACW4BvWL_PyvS-ZF5Z70bymLPJWLHXVLbGCZoAwjA6EzeA@mail.gmail.com>
<[email protected]>
<CALj2ACWn0nS=ejzHwdnnmWhRtWT6=+NAMbcMB5w37sCw97_zBA@mail.gmail.com>
On Wed, Sep 24, 2025 at 12:32 PM Bharath Rupireddy
<[email protected]> wrote:
>
> > On Wed, 2025-09-24 at 07:26 -0700, Bharath Rupireddy wrote:
> > > Right. Reading unflushed WAL buffers for replication was one of the
> > > motivations. But, in general, WALReadFromBuffers has more benefits
> > > since it lets WAL buffers act as a cache for reads, avoiding the need
> > > to re-read WAL from disk for (both physical and logical) replication.
> > > For example, it makes the use of direct I/O for WAL more realistic and
> > > can provide significant performance benefits [1].
>
> Thanks for looking into this. I did a performance analysis with WAL direct
> I/O to see how reading from WAL buffers affects walsenders:
> https://www.postgresql.org/message-id/CALj2ACV6rS%2B7iZx5%2BoAvyXJaN4AG-djAQeM1mrM%3DYSDkVrUs7g%40ma....
> The following is from that thread. Please let me know if you have any
> specific cases in mind. I'm happy to run the same test for logical
> replication.
>
> It helps WAL DIO: since there's no OS page cache, using WAL buffers as
> a read cache helps a lot. This is evident from my experiment with the
> WAL DIO patch [1]; see the results [2] and the attached graph. As
> expected, WAL DIO brings down the TPS, whereas reading from WAL buffers
> (i.e., this patch) brings it back up.
>
> [2] Test case is an insert pgbench workload.
> clients |   HEAD | WAL DIO | WAL DIO & WAL BUFFERS READ | WAL BUFFERS READ
>       1 |   1404 |    1070 |                       1424 |             1375
>       2 |   1487 |     796 |                       1454 |             1517
>       4 |   3064 |    1743 |                       3011 |             3019
>       8 |   6114 |    3556 |                       6026 |             5954
>      16 |  11560 |    7051 |                      12216 |            12132
>      32 |  23181 |   13079 |                      23449 |            23561
>      64 |  43607 |   26983 |                      43997 |            45636
>     128 |  80723 |   45169 |                      81515 |            81911
>     256 | 110925 |   90185 |                     107332 |           114046
>     512 | 119354 |  109817 |                     110287 |           117506
>     768 | 112435 |  105795 |                     106853 |           111605
>    1024 | 107554 |  105541 |                     105942 |           109370
>    2048 |  88552 |   79024 |                      80699 |            90555
>    4096 |  61323 |   54814 |                      58704 |            61743
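
To put the quoted numbers in relative terms, a small helper along the
following lines could compute each configuration's change against HEAD.
This is only a sketch: the names TpsRow, sample_rows, pct_change and
print_relative_to_head are made up for illustration, and only four rows
of the table above are copied in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the quoted pgbench results; all values are TPS. */
typedef struct TpsRow
{
	int		clients;
	double	head;
	double	wal_dio;
	double	wal_dio_buffers_read;
	double	buffers_read;
} TpsRow;

/* Four rows copied from the table above (illustrative subset). */
static const TpsRow sample_rows[] = {
	{1, 1404, 1070, 1424, 1375},
	{64, 43607, 26983, 43997, 45636},
	{256, 110925, 90185, 107332, 114046},
	{4096, 61323, 54814, 58704, 61743},
};

/* Percentage change of value relative to base; negative means slower. */
double
pct_change(double base, double value)
{
	assert(base != 0.0);
	return (value - base) / base * 100.0;
}

/* Print each configuration's change against HEAD for the sample rows. */
void
print_relative_to_head(FILE *out)
{
	size_t		nrows = sizeof(sample_rows) / sizeof(sample_rows[0]);

	fprintf(out, "clients  dio%%  dio+buf%%  buf%%\n");
	for (size_t i = 0; i < nrows; i++)
	{
		const TpsRow *r = &sample_rows[i];

		fprintf(out, "%7d %+6.1f %+8.1f %+6.1f\n",
				r->clients,
				pct_change(r->head, r->wal_dio),
				pct_change(r->head, r->wal_dio_buffers_read),
				pct_change(r->head, r->buffers_read));
	}
}
```

Calling print_relative_to_head(stdout) prints one line per sample row,
which makes it easier to see where WAL DIO alone falls behind HEAD and
where reading from WAL buffers closes the gap.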
Thank you all for reviewing this. Please find the attached rebased
patch for further review.
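
For anyone skimming the patch, the pattern it applies at each call site
(try WAL buffers first, then read whatever is left from the WAL file)
can be sketched roughly as below. This is a toy model under stated
assumptions, not PostgreSQL code: DemoWal, demo_read_from_buffers,
demo_read_from_file and demo_read are hypothetical stand-ins, and the
"buffers" here are just a byte range [cache_start, cache_end) of an
in-memory string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: "disk" is the whole WAL, the cache is a sub-range. */
typedef struct DemoWal
{
	const char *disk;			/* complete WAL contents "on disk" */
	size_t		disk_len;
	size_t		cache_start;	/* first offset held in the buffer cache */
	size_t		cache_end;		/* one past the last cached offset */
} DemoWal;

/*
 * Copy as many bytes as the cache can serve, starting at start.
 * Returns the number of bytes copied, which may be 0 or less than count.
 */
size_t
demo_read_from_buffers(const DemoWal *wal, char *dst, size_t start, size_t count)
{
	size_t		avail;

	if (start < wal->cache_start || start >= wal->cache_end)
		return 0;				/* start is not cached at all */

	avail = wal->cache_end - start;
	if (count > avail)
		count = avail;
	memcpy(dst, wal->disk + start, count);
	return count;
}

/* Read exactly count bytes from "disk"; returns 0 on failure, 1 on success. */
int
demo_read_from_file(const DemoWal *wal, char *dst, size_t start, size_t count)
{
	if (start > wal->disk_len || count > wal->disk_len - start)
		return 0;
	memcpy(dst, wal->disk + start, count);
	return 1;
}

/* Buffer-first read with file fallback for the remainder. */
int
demo_read(const DemoWal *wal, char *dst, size_t start, size_t count)
{
	size_t		nread = demo_read_from_buffers(wal, dst, start, count);

	/* If we still have bytes to read, get them from the file. */
	if (nread < count)
	{
		if (!demo_read_from_file(wal, dst + nread, start + nread,
								 count - nread))
			return 0;
		nread = count;			/* all requested bytes read */
	}

	assert(nread == count);
	return 1;
}
```

The sketch leaves start unmodified and applies the nread offset only in
the fallback call, so any later use of start still refers to the
beginning of the request.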
--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com
Attachments:
[application/x-patch] v4-0001-Use-WALReadFromBuffers-in-more-places.patch (5.5K, 2-v4-0001-Use-WALReadFromBuffers-in-more-places.patch)
From b5f6fc083caaa3648f8abfdc370d0289e637931f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 20 Mar 2026 06:44:48 +0000
Subject: [PATCH v4] Use WALReadFromBuffers in more places
Commit 91f2cae introduced WALReadFromBuffers but used it only for
physical replication walsenders. There are several other callers
that use the read_local_xlog_page page_read callback, and logical
replication walsenders can also benefit from reading WAL from WAL
buffers using the new function. This commit extends the use of
WALReadFromBuffers to these callers.
Author: Bharath Rupireddy
Reviewed-by: Jingtang Zhang, Nitin Jadhav
Discussion: https://www.postgresql.org/message-id/CALj2ACVfF2Uj9NoFy-5m98HNtjHpuD17EDE9twVeJng-jTAe7A%40mail.gmail.com
---
src/backend/access/transam/xlogutils.c | 23 +++++++-
src/backend/replication/walsender.c | 77 +++++++++++++++++---------
2 files changed, 70 insertions(+), 30 deletions(-)
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index 5fbe39133b8..c4c677f69fd 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -876,6 +876,7 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,
int count;
WALReadError errinfo;
TimeLineID currTLI;
+ Size bytesRead;
loc = targetPagePtr + reqLen;
@@ -995,9 +996,25 @@ read_local_xlog_page_guts(XLogReaderState *state, XLogRecPtr targetPagePtr,
count = read_upto - targetPagePtr;
}
- if (!WALRead(state, cur_page, targetPagePtr, count, tli,
- &errinfo))
- WALReadRaiseError(&errinfo);
+ /* First attempt to read from WAL buffers */
+ bytesRead = WALReadFromBuffers(cur_page, targetPagePtr, count, currTLI);
+
+ /* If we still have bytes to read, get them from WAL file */
+ if (bytesRead < count)
+ {
+ if (!WALRead(state,
+ cur_page + bytesRead,
+ targetPagePtr + bytesRead,
+ count - bytesRead,
+ tli,
+ &errinfo))
+ {
+ WALReadRaiseError(&errinfo);
+ }
+ bytesRead = count; /* All requested bytes read */
+ }
+
+ Assert(bytesRead == count);
/* number of valid bytes in the buffer */
return count;
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 08253103cb3..95255948eca 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1054,6 +1054,7 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
WALReadError errinfo;
XLogSegNo segno;
TimeLineID currTLI;
+ Size bytesRead;
/*
* Make sure we have enough WAL available before retrieving the current
@@ -1091,16 +1092,29 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
else
count = flushptr - targetPagePtr; /* part of the page available */
- /* now actually read the data, we know it's there */
- if (!WALRead(state,
- cur_page,
- targetPagePtr,
- count,
- currTLI, /* Pass the current TLI because only
+ /* First attempt to read from WAL buffers */
+ bytesRead = WALReadFromBuffers(cur_page, targetPagePtr, count, currTLI);
+
+ targetPagePtr += bytesRead;
+
+ /* If we still have bytes to read, get them from WAL file */
+ if (bytesRead < count)
+ {
+ if (!WALRead(state,
+ cur_page + bytesRead,
+ targetPagePtr,
+ count - bytesRead,
+ currTLI, /* Pass the current TLI because only
* WalSndSegmentOpen controls whether new TLI
* is needed. */
- &errinfo))
- WALReadRaiseError(&errinfo);
+ &errinfo))
+ {
+ WALReadRaiseError(&errinfo);
+ }
+ bytesRead = count; /* All requested bytes read */
+ }
+
+ Assert(bytesRead == count);
/*
* After reading into the buffer, check that what we read was valid. We do
@@ -3219,7 +3233,7 @@ XLogSendPhysical(void)
Size nbytes;
XLogSegNo segno;
WALReadError errinfo;
- Size rbytes;
+ Size bytesRead;
/* If requested switch the WAL sender to the stopping state. */
if (got_STOPPING)
@@ -3435,24 +3449,33 @@ XLogSendPhysical(void)
enlargeStringInfo(&output_message, nbytes);
retry:
- /* attempt to read WAL from WAL buffers first */
- rbytes = WALReadFromBuffers(&output_message.data[output_message.len],
- startptr, nbytes, xlogreader->seg.ws_tli);
- output_message.len += rbytes;
- startptr += rbytes;
- nbytes -= rbytes;
-
- /* now read the remaining WAL from WAL file */
- if (nbytes > 0 &&
- !WALRead(xlogreader,
- &output_message.data[output_message.len],
- startptr,
- nbytes,
- xlogreader->seg.ws_tli, /* Pass the current TLI because
- * only WalSndSegmentOpen controls
- * whether new TLI is needed. */
- &errinfo))
- WALReadRaiseError(&errinfo);
+ /* First attempt to read from WAL buffers */
+ bytesRead = WALReadFromBuffers(&output_message.data[output_message.len],
+ startptr,
+ nbytes,
+ xlogreader->seg.ws_tli);
+
+ startptr += bytesRead;
+
+ /* If we still have bytes to read, get them from WAL file */
+ if (bytesRead < nbytes)
+ {
+ if (!WALRead(xlogreader,
+ &output_message.data[output_message.len + bytesRead],
+ startptr,
+ nbytes - bytesRead,
+ xlogreader->seg.ws_tli, /* Pass the current TLI
+ * because only
+ * WalSndSegmentOpen controls
+ * whether new TLI is needed. */
+ &errinfo))
+ {
+ WALReadRaiseError(&errinfo);
+ }
+ bytesRead = nbytes; /* All requested bytes read */
+ }
+
+ Assert(bytesRead == nbytes);
/* See logical_read_xlog_page(). */
XLByteToSeg(startptr, segno, xlogreader->segcxt.ws_segsize);
--
2.47.3