MIME-Version: 1.0
References: 
 <CAOZWJqPc+s_vA-UfWWLR0s6Mt+DCffjXXVyLHJNJiuMrDLTYcA@mail.gmail.com>
 <CAOJCrX91Xf3HU5J0Vn_FdrRDpMevNiZUEN3oAWwk4J1H0ibo-Q@mail.gmail.com>
 <ce7a0632877f187246a96bde9c41e590c8986703.camel@cybertec.at>
In-Reply-To: <ce7a0632877f187246a96bde9c41e590c8986703.camel@cybertec.at>
From: Shubhang Joshi <shubhangjoshi2405@gmail.com>
Date: Fri, 31 Oct 2025 09:54:40 +0530
Message-ID: 
 <CAOJCrX-3S-afnX=DqTwb=+SS8-_0Gexqs_D+z12jNbg8xZ5ccw@mail.gmail.com>
Subject: Re: WAL replay is too slow on secondary server
To: Laurenz Albe <laurenz.albe@cybertec.at>
Cc: OMPRAKASH SAHU <sahuop2121@gmail.com>, pgsql-admin@lists.postgresql.org
Content-Type: multipart/alternative; boundary="00000000000031c24206426cbf2a"
Archived-At: 
 <https://www.postgresql.org/message-id/CAOJCrX-3S-afnX%3DDqTwb%3D%2BSS8-_0Gexqs_D%2Bz12jNbg8xZ5ccw%40mail.gmail.com>
Precedence: bulk

--00000000000031c24206426cbf2a
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi OM,
Hi Laurenz,

Thank you for your insights.

I apologize for my earlier suggestion about network speed; on further
review, the network does not appear to be the cause in this scenario.

Based on the observations so far, the accumulation of WAL on the standby
points to slow replay, most likely limited by disk I/O, rather than to
network speed. The WAL is reaching the replica without delay, and CPU and
RAM usage remain low, but applying it to the data files is slow.
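
One way to check that WAL reaches the replica promptly but is applied
slowly is to compare three WAL positions: the primary's current LSN, the
LSN the standby has received, and the LSN it has replayed. In SQL,
pg_wal_lsn_diff() does this arithmetic directly. The Python sketch below
does the same on pg_lsn text values so the logic is visible. The helper
names, the sample LSNs and the 16 MiB threshold are hypothetical, and
since the primary and standby readings are taken on different servers at
slightly different moments, the transfer gap is only approximate.

```python
# Hypothetical helper: classify standby lag from three pg_lsn readings.
# The LSN values are assumed to have been fetched separately, e.g. with
# pg_current_wal_lsn() on the primary and pg_last_wal_receive_lsn() /
# pg_last_wal_replay_lsn() on the standby.


def lsn_to_bytes(lsn: str) -> int:
    """Convert a pg_lsn text value such as '16/B374D848' to a byte position.

    The text form is the high and low 32 bits of the 64-bit WAL position,
    each written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def classify_lag(primary_lsn: str, receive_lsn: str, replay_lsn: str,
                 threshold_bytes: int = 16 * 1024 * 1024) -> str:
    """Return 'ok', 'network' or 'replay' depending on where WAL is waiting.

    threshold_bytes is an arbitrary illustrative cut-off (16 MiB, the
    default WAL segment size), not a PostgreSQL setting.
    """
    # WAL written on the primary but not yet received by the standby.
    transfer_gap = lsn_to_bytes(primary_lsn) - lsn_to_bytes(receive_lsn)
    # WAL already on the standby but not yet applied.
    replay_gap = lsn_to_bytes(receive_lsn) - lsn_to_bytes(replay_lsn)
    if transfer_gap <= threshold_bytes and replay_gap <= threshold_bytes:
        return "ok"
    return "replay" if replay_gap >= transfer_gap else "network"


if __name__ == "__main__":
    # Made-up readings: the standby has received almost everything,
    # but replay is far behind.
    print(classify_lag("16/B4000000", "16/B3F00000", "15/00000000"))
    # prints "replay"
```

If the network were the bottleneck, the received LSN would trail the
primary while replay kept up with what had arrived; WAL piling up on the
standby is the opposite pattern.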

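The root cause appears to be the disk subsystem combined with the
single-threaded nature of WAL replay in PostgreSQL recovery: the startup
process applies WAL records one at a time, and without prefetching it
waits for each data page that is not already in memory to be read. That
makes replay sensitive to per-read latency more than to provisioned IOPS
or throughput, which may explain why raising the IOPS did not help.
Improving disk latency and throughput or tuning memory-related settings
may help, but network latency does not seem to be a factor here.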
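
A rough way to see why extra provisioned IOPS may not speed up a single
replay process is Little's law: a process that keeps only one read
outstanding can complete at most 1/latency reads per second, however many
IOPS the volume allows. The sketch below is only a model; the 1 ms
latency and the in-flight counts are assumed numbers, not measurements
from this cluster, and it ignores WAL reads, writes and cache hits.

```python
# Back-of-the-envelope model (Little's law): the rate of random reads a
# process can complete is at most (reads in flight) / (latency per read),
# and never more than what the volume is provisioned for.


def max_reads_per_second(latency_ms: float, reads_in_flight: int = 1,
                         provisioned_iops: float = float("inf")) -> float:
    """Upper bound on completed random reads per second.

    latency_ms       -- average time for one read to complete (assumed value)
    reads_in_flight  -- how many reads the process keeps outstanding;
                        1 models a single process issuing synchronous reads
    provisioned_iops -- the volume's IOPS limit
    """
    if latency_ms <= 0 or reads_in_flight < 1:
        raise ValueError("latency_ms must be > 0 and reads_in_flight >= 1")
    return min(reads_in_flight * 1000.0 / latency_ms, float(provisioned_iops))


if __name__ == "__main__":
    # Hypothetical 1 ms read latency on volumes provisioned for 6k / 10k IOPS.
    for iops in (6000, 10000):
        print(iops, max_reads_per_second(1.0, 1, iops))  # 1000.0 both times
    # Keeping 8 reads in flight would hit the 6k IOPS cap instead.
    print(max_reads_per_second(1.0, 8, 6000))
```

In this model, raising IOPS from 6k to 10k changes nothing while only one
read is in flight; only lower latency or more concurrent reads raises the
bound. PostgreSQL 15 and later can prefetch blocks referenced in upcoming
WAL during recovery (the recovery_prefetch setting, limited by
maintenance_io_concurrency and wal_decode_buffer_size), which roughly
corresponds to raising reads_in_flight; whether it helps here depends on
the PostgreSQL version in use and has not been tested in this thread.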

Regards,
Shubhang

On Thu, 30 Oct 2025 at 17:45, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

> On Thu, 2025-10-30 at 17:08 +0530, Shubhang Joshi wrote:
> > On Thu, 30 Oct, 2025, 10:07 am OMPRAKASH SAHU, <sahuop2121@gmail.com> wrote:
> > > We have a postgresql cluster setup using patroni.
> > > The DB is being used for heavy transactional application, now the
> > > problem is that on replica server WAL replay is too slow.
> > > We have increased the IOPS to 6k and Throughput to 600 on nvme EBS
> > > volume of wal directory and 10k &800 on data directory.
> > >
> > > but the WAL is being accumulated on the replica as usual and applying
> > > wal is having no improvement.
> >
> > Please check the network speed - we faced a similar issue earlier, and
> > it turned out to be related to network performance.
> > Kindly verify the network latency with your network team as well.
>
> If WAL is piling up on the standby, how can network speed be the problem?
>
> Yours,
> Laurenz Albe
>

--00000000000031c24206426cbf2a--