From: Shubhang Joshi
Date: Fri, 31 Oct 2025 09:54:40 +0530
Subject: Re: WAL replay is too slow on secondary server
To: Laurenz Albe
Cc: OMPRAKASH SAHU, pgsql-admin@lists.postgresql.org

Hi OM,
Hi Laurenz,

Thank you for your insights.

I apologize for my previous suggestion regarding network speed; upon further review, it was not the correct cause in this scenario.

Based on the current observations and system metrics, the accumulation of WAL on the standby server points to disk I/O limitations during replay, not network speed. CPU and RAM usage remain low, and WAL traffic is reaching the replica without delay, but replay/apply on disk is slow.

The root cause appears to be disk subsystem performance and the single-threaded nature of WAL replay in PostgreSQL recovery. Optimizing disk throughput or reconfiguring memory may help, but network latency does not seem to be affecting this scenario.

Regards,
Shubhang

On Thu, 30 Oct 2025 at 17:45, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

> On Thu, 2025-10-30 at 17:08 +0530, Shubhang Joshi wrote:
> > On Thu, 30 Oct, 2025, 10:07 am OMPRAKASH SAHU, <sahuop2121@gmail.com> wrote:
> > > We have a postgresql cluster setup using patroni.
> > > The DB is being used for heavy transactional application, now the problem is that on replica server WAL replay is too slow.
> > > We have increased the IOPS to 6k and Throughput to 600 on nvme EBS volume of wal directory and 10k & 800 on data directory.
> > >
> > > but the WAL is being accumulated on the replica as usual and applying wal is having no improvement.
> >
> > Please check the network speed - we faced a similar issue earlier, and it turned out to be related to network performance.
> > Kindly verify the network latency with your network team as well.
>
> If WAL is piling up on the standby, how can network speed be the problem?
>
> Yours,
> Laurenz Albe
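
The argument in this thread (WAL that is piling up on the standby has already crossed the network, so the delay must be in replay) can be checked numerically by comparing three WAL positions. Below is a minimal Python sketch of that comparison, not a definitive tool. The function names and the sample LSN values are hypothetical, and the "larger lag wins" rule is an illustrative assumption. On a real system the positions would typically come from pg_current_wal_lsn() on the primary and from pg_last_wal_receive_lsn() and pg_last_wal_replay_lsn() on the standby.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position.

    An LSN is printed as two hexadecimal numbers separated by a slash:
    the high 32 bits and the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_lag(primary_lsn: str, received_lsn: str, replayed_lsn: str) -> dict:
    """Split total standby lag into a transfer part and a replay part (bytes)."""
    primary = lsn_to_int(primary_lsn)
    received = lsn_to_int(received_lsn)
    replayed = lsn_to_int(replayed_lsn)
    return {
        # WAL generated on the primary but not yet received by the standby.
        "transfer_lag_bytes": primary - received,
        # WAL received by the standby but not yet applied.
        "replay_lag_bytes": received - replayed,
    }


def likely_bottleneck(lag: dict) -> str:
    """Illustrative rule only: whichever lag is larger is the suspect."""
    if lag["replay_lag_bytes"] > lag["transfer_lag_bytes"]:
        return "replay"
    return "transfer"


# Hypothetical sample values, not taken from the system discussed here.
sample = wal_lag("2A/10000000", "2A/0FF00000", "29/F0000000")
print(sample, likely_bottleneck(sample))
```

With these made-up values the transfer lag is about 1 MiB while the replay lag is about 511 MiB, which is the pattern described above: the WAL reaches the replica promptly and then waits to be applied.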
