Subject: Re: WAL replay is too slow on secondary server
From: OMPRAKASH SAHU
Date: Fri, 31 Oct 2025 13:17:48 +0530
To: Shubhang Joshi
Cc: Laurenz Albe, pgsql-admin@lists.postgresql.org

Hi Everyone,

Thank you for the suggestions.

I have changed a few settings on the DB side, on the secondary only. As of yesterday it seems fine, and I will keep monitoring it.

Below are the changes:

wal_decode_buffer_size
maintenance_io_concurrency
bgwriter_delay
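
For anyone wanting to check where these three settings currently stand on a standby, here is a minimal sketch that builds a query against the `pg_settings` view. The helper name `settings_query` is hypothetical, and the `%s` placeholders assume a DB-API driver with the format paramstyle (psycopg, for example); the thread does not say which values were chosen, so none are shown.

```python
# Settings named in this message; no values are implied here.
SETTINGS = ("wal_decode_buffer_size", "maintenance_io_concurrency", "bgwriter_delay")


def settings_query(names):
    """Return (sql, params) listing the current value of each named setting.

    Hypothetical helper: the %s placeholders assume a driver that uses the
    DB-API "format" paramstyle.
    """
    placeholders = ", ".join(["%s"] * len(names))
    sql = (
        "SELECT name, setting, unit, context, pending_restart "
        "FROM pg_settings "
        f"WHERE name IN ({placeholders}) ORDER BY name"
    )
    return sql, tuple(names)


sql, params = settings_query(SETTINGS)
print(sql)
print(params)
```

The `context` and `pending_restart` columns are included because they show whether a changed value takes effect on reload or is still waiting for a restart.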

I also checked with AWS support whether micro-bursting was happening, but according to them the allocated capacity is sufficient.


Regards,
OM




On Fri, 31 Oct 2025, 09:54 Shubhang Joshi, <shubhangjoshi2405@gmail.com> wrote:

Hi OM,
Hi Laurenz,

Thank you for your insights.

I apologize for my previous suggestion regarding network speed; upon further review, it was not the correct cause in this scenario.

Based on the current observations and system metrics, the accumulation of WAL on the standby server points to disk I/O limitations during replay, not network speed. CPU and RAM usage remain low, and WAL traffic is reaching the replica without delay, but replay/apply on disk is slow.

The root cause appears to be disk subsystem performance and the single-threaded nature of WAL replay in PostgreSQL recovery. Optimizing disk throughput or reconfiguring memory may help, but network latency does not seem to be affecting this scenario.
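
The distinction between "WAL arrives on time" and "WAL is replayed slowly" can be put into numbers by comparing the standby's receive and replay positions. A minimal sketch, assuming the two LSNs were read on the standby with `pg_last_wal_receive_lsn()` and `pg_last_wal_replay_lsn()`; the helper names and the sample LSN values are made up for illustration:

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replay_backlog_bytes(receive_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL the standby has received but not yet replayed."""
    return parse_lsn(receive_lsn) - parse_lsn(replay_lsn)


# On the standby, the two inputs would come from:
#   SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();
# The LSNs below are illustrative only, not taken from the thread.
backlog = replay_backlog_bytes("16/B4000000", "16/B3000000")
print(f"{backlog / 1024**2:.0f} MiB waiting to be replayed")
```

A backlog that keeps growing while the receive position tracks the primary is consistent with a replay bottleneck on the standby rather than a network one. The same difference can be computed server-side with `pg_wal_lsn_diff()`.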

Regards,
Shubhang

On Thu, 30 Oct 2025 at 17:45, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2025-10-30 at 17:08 +0530, Shubhang Joshi wrote:
> On Thu, 30 Oct, 2025, 10:07 am OMPRAKASH SAHU, <sahuop2121@gmail.com> wrote:
> > We have a PostgreSQL cluster set up using Patroni.
> > The DB is used by a heavy transactional application. The problem is that WAL replay on the replica server is too slow.
> > We have increased the IOPS to 6k and throughput to 600 on the NVMe EBS volume holding the WAL directory, and to 10k and 800 on the data directory.
> >
> > But WAL still accumulates on the replica as before, and WAL apply has not improved.
>
> Please check the network speed; we faced a similar issue earlier, and it turned out to be related to network performance.
> Kindly verify the network latency with your network team as well.

If WAL is piling up on the standby, how can network speed be the problem?
Yours,
Laurenz Albe