Re: WAL replay is too slow on secondary server

From: OMPRAKASH SAHU <[email protected]>
To: Shubhang Joshi <[email protected]>
Cc: Laurenz Albe <[email protected]>
Cc: [email protected]
Subject: Re: WAL replay is too slow on secondary server
Date: Fri, 31 Oct 2025 13:17:48 +0530
Message-ID: <CAOZWJqNR3dxnwn+HGPszQB8BY67_E=eoa7SzArL=t=PMOtUAMQ@mail.gmail.com>
In-Reply-To: <CAOJCrX-3S-afnX=DqTwb=+SS8-_0Gexqs_D+z12jNbg8xZ5ccw@mail.gmail.com>
References: <CAOZWJqPc+s_vA-UfWWLR0s6Mt+DCffjXXVyLHJNJiuMrDLTYcA@mail.gmail.com>
	<CAOJCrX91Xf3HU5J0Vn_FdrRDpMevNiZUEN3oAWwk4J1H0ibo-Q@mail.gmail.com>
	<[email protected]>
	<CAOJCrX-3S-afnX=DqTwb=+SS8-_0Gexqs_D+z12jNbg8xZ5ccw@mail.gmail.com>

Hi Everyone,

Thank you for the suggestions.

I have changed a few settings on the DB side, on the secondary only. As of
yesterday it seems fine; I will keep monitoring it.

Below are the changes:

wal_decode_buffer_size
maintenance_io_concurrency
bgwriter_delay

I also checked with AWS support whether micro-bursting was happening, but
according to them the provisioned allocation is sufficient.
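
For the monitoring, here is a minimal sketch of how the replay backlog on the
standby can be computed from two LSNs. It assumes the values come from
pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() run on the standby.
The helper names and the sample LSNs are hypothetical, for illustration only;
they are not taken from our cluster.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to an absolute byte position.

    A pg_lsn is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_backlog_bytes(receive_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL already received by the standby but not yet replayed."""
    return lsn_to_bytes(receive_lsn) - lsn_to_bytes(replay_lsn)


if __name__ == "__main__":
    # Hypothetical sample values, in the format returned on the standby by:
    #   SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();
    backlog = replay_backlog_bytes("2/10000000", "1/F0000000")
    print(f"replay backlog: {backlog / (1024 * 1024):.0f} MiB")
```

If this number keeps growing while the receive LSN advances normally, WAL is
arriving fine and replay itself is the bottleneck.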


Regards,
OM




On Fri, 31 Oct 2025, 09:54 Shubhang Joshi, <[email protected]>
wrote:

> Hi OM,
> Hi Laurenz,
>
> Thank you for your insights.
>
> I apologize for my previous suggestion regarding network speed; upon
> further review, it was not the correct cause in this scenario.
>
> Based on the current observations and system metrics, the accumulation of
> WAL on the standby server points to disk I/O limitations during replay, not
> network speed. CPU and RAM usage remain low, and WAL traffic is reaching
> the replica without delay, but replay/apply on disk is slow.
>
> The root cause appears to be disk subsystem performance and the
> single-threaded nature of WAL replay in PostgreSQL recovery. Optimizing
> disk throughput or reconfiguring memory may help, but network latency does
> not seem to be affecting this scenario.
>
> Regards,
> Shubhang
>
> On Thu, 30 Oct 2025 at 17:45, Laurenz Albe <[email protected]>
> wrote:
>
>> On Thu, 2025-10-30 at 17:08 +0530, Shubhang Joshi wrote:
>> > On Thu, 30 Oct, 2025, 10:07 am OMPRAKASH SAHU, <[email protected]> wrote:
>> > > We have a PostgreSQL cluster set up using Patroni.
>> > > The DB is used by a heavily transactional application; the problem is
>> > > that WAL replay on the replica server is too slow.
>> > > We have increased the IOPS to 6k and throughput to 600 on the NVMe EBS
>> > > volume of the WAL directory, and to 10k and 800 on the data directory.
>> > >
>> > > But WAL keeps accumulating on the replica as before, and applying
>> > > WAL shows no improvement.
>> >
>> > Please check the network speed; we faced a similar issue earlier, and
>> > it turned out to be related to network performance.
>> > Kindly verify the network latency with your network team as well.
>>
>> If WAL is piling up on the standby, how can network speed be the problem?
>>
>> Yours,
>> Laurenz Albe
>>
>
