MIME-Version: 1.0
References: 
 <CAOZWJqPc+s_vA-UfWWLR0s6Mt+DCffjXXVyLHJNJiuMrDLTYcA@mail.gmail.com>
 <CAOJCrX91Xf3HU5J0Vn_FdrRDpMevNiZUEN3oAWwk4J1H0ibo-Q@mail.gmail.com>
 <ce7a0632877f187246a96bde9c41e590c8986703.camel@cybertec.at>
 <CAOJCrX-3S-afnX=DqTwb=+SS8-_0Gexqs_D+z12jNbg8xZ5ccw@mail.gmail.com>
In-Reply-To: 
 <CAOJCrX-3S-afnX=DqTwb=+SS8-_0Gexqs_D+z12jNbg8xZ5ccw@mail.gmail.com>
From: OMPRAKASH SAHU <sahuop2121@gmail.com>
Date: Fri, 31 Oct 2025 13:17:48 +0530
Message-ID: 
 <CAOZWJqNR3dxnwn+HGPszQB8BY67_E=eoa7SzArL=t=PMOtUAMQ@mail.gmail.com>
Subject: Re: WAL replay is too slow on secondary server
To: Shubhang Joshi <shubhangjoshi2405@gmail.com>
Cc: Laurenz Albe <laurenz.albe@cybertec.at>, pgsql-admin@lists.postgresql.org
Content-Type: multipart/alternative; boundary="000000000000bfa5be06426f95d0"
Archived-At: 
 <https://www.postgresql.org/message-id/CAOZWJqNR3dxnwn%2BHGPszQB8BY67_E%3Deoa7SzArL%3Dt%3DPMOtUAMQ%40mail.gmail.com>
Precedence: bulk

--000000000000bfa5be06426f95d0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Everyone,

Thank you for the suggestions.

I have changed a few settings on the DB side, on the secondary only. As of
yesterday it seems fine; I will keep monitoring it.

These are the parameters I changed:

wal_decode_buffer_size
maintenance_io_concurrency
bgwriter_delay

I also checked with AWS support whether micro-bursting was happening, but
according to them the allocation is sufficient.
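
For the ongoing monitoring, a rough sketch of how the replay backlog on the
secondary could be tracked. It assumes the two LSN strings come from
pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() run on the standby;
the helper names and the sample LSN values are illustrative only, not taken
from our system.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position.

    An LSN is printed as two hexadecimal numbers: the high 32 bits and
    the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_backlog_bytes(receive_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL received by the standby but not yet replayed."""
    return parse_lsn(receive_lsn) - parse_lsn(replay_lsn)


def backlog_trend(samples):
    """Backlog in bytes for each (receive_lsn, replay_lsn) sample, in order."""
    return [replay_backlog_bytes(recv, replay) for recv, replay in samples]


if __name__ == "__main__":
    # Hypothetical samples taken a few minutes apart (made-up values).
    samples = [
        ("16/B374D848", "16/B3000000"),
        ("16/B4000000", "16/B3F00000"),
    ]
    for backlog in backlog_trend(samples):
        print(f"replay backlog: {backlog / (1024 * 1024):.1f} MiB")
```

If the backlog keeps shrinking between samples, replay is catching up; if it
keeps growing, WAL is still arriving faster than it is applied.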


Regards,
OM


On Fri, 31 Oct 2025, 09:54 Shubhang Joshi, <shubhangjoshi2405@gmail.com>
wrote:

> Hi OM,
> Hi Laurenz,
>
> Thank you for your insights.
>
> I apologize for my previous suggestion regarding network speed; upon
> further review, it was not the correct cause in this scenario.
>
> Based on the current observations and system metrics, the accumulation of
> WAL on the standby server points to disk I/O limitations during replay, not
> network speed. CPU and RAM usage remain low, and WAL traffic is reaching
> the replica without delay, but replay/apply on disk is slow.
>
> The root cause appears to be disk subsystem performance and the
> single-threaded nature of WAL replay in PostgreSQL recovery. Optimizing
> disk throughput or reconfiguring memory may help, but network latency does
> not seem to be affecting this scenario.
>
> Regards,
> Shubhang
>
> On Thu, 30 Oct 2025 at 17:45, Laurenz Albe <laurenz.albe@cybertec.at>
> wrote:
>
>> On Thu, 2025-10-30 at 17:08 +0530, Shubhang Joshi wrote:
>> > On Thu, 30 Oct, 2025, 10:07 am OMPRAKASH SAHU, <sahuop2121@gmail.com> wrote:
>> > > We have a PostgreSQL cluster set up using Patroni.
>> > > The DB is used by a heavy transactional application, and the problem
>> > > is that WAL replay on the replica server is too slow.
>> > > We have increased the IOPS to 6k and the throughput to 600 on the
>> > > NVMe EBS volume holding the WAL directory, and to 10k and 800 on the
>> > > data directory.
>> > >
>> > > However, WAL keeps accumulating on the replica as before, and WAL
>> > > apply shows no improvement.
>> >
>> > Please check the network speed; we faced a similar issue earlier, and
>> > it turned out to be related to network performance.
>> > Kindly verify the network latency with your network team as well.
>>
>> If WAL is piling up on the standby, how can network speed be the problem?
>>
>> Yours,
>> Laurenz Albe
>>
>


--000000000000bfa5be06426f95d0--