Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vEQz5-00Bclc-AF for pgsql-admin@arkaria.postgresql.org; Thu, 30 Oct 2025 11:38:26 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vEQz4-007sE2-4Q for pgsql-admin@arkaria.postgresql.org; Thu, 30 Oct 2025 11:38:25 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vEQz3-007sDt-O2 for pgsql-admin@lists.postgresql.org; Thu, 30 Oct 2025 11:38:24 +0000 Received: from mail-yx1-xb12a.google.com ([2607:f8b0:4864:20::b12a]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vEQz0-0053ls-2E for pgsql-admin@lists.postgresql.org; Thu, 30 Oct 2025 11:38:24 +0000 Received: by mail-yx1-xb12a.google.com with SMTP id 956f58d0204a3-63e27e6ee78so85472d50.3 for ; Thu, 30 Oct 2025 04:38:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1761824301; x=1762429101; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=yCFYsKpywVxrZNA5iqWHo56RoalF5TEpF0eVPlflzNg=; b=bUgl+e6YhHQeNTnpZsLb/6vWCGw6js9DRnCKjNBAaI7GaMBqNqdkN+s8FPQ5gRcEoi 52VeQAAVwrHfMr89o3sLtkM4EJ3WdsiE4ZrjMF9hQbYCRYl6Nbc9Ui86WleEwgXdw4YE +7MTQxaODSre7P1MjAot35c3XITyC7x6WCoCl2DtBG+kav2XL6BIdIezvjmHQyvrXC8Q 7m8lm4B3uMAgJ1AO8QM4sUau3ZY0Q60zOmvb/nB+FXDJT5LmaQvJC1w3HB47AMHuA/wP P9jnjBd5bRoZVTcAABpPK8jmMlo+zZmWRl9u174xNKFn7LLJW8Qasm0yZ4vXacYn0YLE P1lg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761824301; x=1762429101; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=yCFYsKpywVxrZNA5iqWHo56RoalF5TEpF0eVPlflzNg=; b=WHgntDyw6KSWI6y5DDBuf5qX9MfWeE8SkHjzHLBCdP5iBfWK/jASjeuVJ5iv196REr VSAiVcx6AxLShjLmenOedU2/dR7pbXW2fzD6lV5OtsSSYww4aj48LT3czgcV14SoKh8s yeMdPbpdYlRY5Zp5plk0vQzRMQC89xSC+57MuJ5cw430RVXNu+5F7SClXMSVFnf3GrYq WTv72opeXjUVco2xZzS2qEl5nshCOxGEhDL1x698dMFiWpuOuQx8a3jsFhCpHHQ+aZNm y5WFtEA7c9498ElwJlqlPPxNglYQ+K430T1H5PgC8dzQxye38FErygg1yhWoI6gtfgrF 51qQ== X-Gm-Message-State: AOJu0YyXtGWxItR0nEZZqon+CQYBOIj9AvarYLmcw5kI/gWphUoiBVRt +EZgRzRESWbpSyiBh71miIQXgf44wtGksDLBlIuaISyoN2JQc9YEoEdblf48DgJExTHqTdwsrpZ EQCPfprhK9ehOCtITbv/Z6k+HZrLuGoAbEw== X-Gm-Gg: ASbGncsrTnXTYJKEuCTuGBnyn+grZQHdDZ3/t46xRdgoxnHHDbmPZmLi22MODmHO0GE 7hD0EdW2VgbYfP5Cnmxwy7CGCa99EvRE5nSTNwYxxBxkRYlMlROfHCRdI+Uu8TFicNsz1PSIGXK Ye+r8dDbLPIwzandEQL+zP7kekgUJuRM7hU65RpkixoJi7ndEUlRyfPMqnGMhNCJju7p8L8sJR3 ImBPsUa0b36vNqMo4Z5266F/NftIAJF+VNh06qYZkUOVHW4ghbKfBrZP2s4CovGAl63bOqpCthk uhplc7kwS5cLiKdLx1KEc/jzknQCGQ== X-Google-Smtp-Source: AGHT+IHovQAoxwCDCkP58pm0dY7pZuF0JDSb/aTg0A7BYXb/csrHd3osFuJbPPk9hnrP9gB+AbE4GC/NXaVTg7Zdddg= X-Received: by 2002:a05:690c:dc5:b0:73b:b57:5732 with SMTP id 00721157ae682-78628fb89c3mr34074017b3.8.1761824300388; Thu, 30 Oct 2025 04:38:20 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Shubhang Joshi Date: Thu, 30 Oct 2025 17:08:06 +0530 X-Gm-Features: AWmQ_bkN-J00RQol34ixLautJSIaZWdGw91bZghYP6SDH9w4utauXin5A0H88YA Message-ID: Subject: Re: WAL replay is too slow on secondary server To: OMPRAKASH SAHU Cc: pgsql-admin@lists.postgresql.org Content-Type: multipart/alternative; boundary="00000000000094d41a06425eaf7b" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000094d41a06425eaf7b Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi OM, Please check the network speed =E2=80=94 we faced a similar issue earlier, = and it turned out to be related to network performance. Kindly verify the network latency with your network team as well. Regards, Shubhang On Thu, 30 Oct, 2025, 10:07=E2=80=AFam OMPRAKASH SAHU, wrote: > Hi Team, > > Greetings!! > > We have a postgresql cluster setup using patroni. > The DB is being used for heavy transactional application, now the > problem is that on replica server WAL replay is too slow. > We have increased the IOPS to 6k and Throughput to 600 on nvme EBS volume > of wal directory and 10k &800 on data directory. > > but the WAL is being accumulated on the replica as usual and applying wal > is having no improvement. > changed the maintenance_io_concurrency on replica to 32. > CPU utilization max=3D20% , RAM utilization is also max 20. > > see the below postgres logs that shows around 2hrs lag > > > > > > > > *tail -f /var/log/postgresql/postgresql.log2025-10-30 09:02:08 IST > [27125]: user=3D,db=3D,app=3D,client=3DLOG: recovery restart point at > 5B65/F1DAFA202025-10-30 09:02:08 IST [27125]: user=3D,db=3D,app=3D,client= =3DDETAIL: > Last completed transaction was at log time 2025-10-30 > 07:16:40.115131+05:30.2025-10-30 09:08:23 IST [27125]: > user=3D,db=3D,app=3D,client=3DLOG: restartpoint starting: time2025-10-30= 09:12:53 > IST [27125]: user=3D,db=3D,app=3D,client=3DLOG: restartpoint complete: w= rote 44067 > buffers (2.1%); 1 WAL file(s) added, 73 removed, 0 recycled; write=3D269.= 362 > s, sync=3D0.042 s, total=3D269.633 s; sync files=3D142, longest=3D0.005 s= , > average=3D0.001 s; distance=3D1197052 kB, estimate=3D1587336 kB; > lsn=3D5B66/6C3082F8, redo lsn=3D5B66/3AEAEA602025-10-30 09:12:53 IST [271= 25]: > user=3D,db=3D,app=3D,client=3DLOG: recovery restart point at > 5B66/3AEAEA602025-10-30 09:12:53 IST [27125]: user=3D,db=3D,app=3D,client= =3DDETAIL: > Last completed transaction was at log time 2025-10-30 > 07:21:47.56674+05:30.* > > recovery_prefetch output: > > postgres=3D# select * from pg_stat_recovery_prefetch; > stats_reset | prefetch | hit | skip_init | > skip_new | skip_fpw | skip_rep | wal_distance | block_distance | io_dept= h > > ----------------------------------+----------+-----------+-----------+---= -------+----------+-----------+--------------+----------------+---------- > 2025-10-29 23:02:21.396179+05:30 | 182762 | 251000856 | 3841721 | > 1100099 | 3520777 | 137392573 | 8984 | 80 | = 0 > (1 row) > > I would request your thoughts and suggestions if we can get rid of this > slowness and get some speed. > > Regards, > OM > --00000000000094d41a06425eaf7b Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi OM,

Please check the network speed =E2=80=94 we faced a simi= lar issue earlier, and it turned out to be related to network performance.<= /div>
Kindly verify the network latency with your network = team as well.

Regards,
Shubhang

On Thu, 30 Oct, 2= 025, 10:07=E2=80=AFam OMPRAKASH SAHU, <sahuop2121@gmail.com> wrote:
Hi Team,

Greetings!!

We have a postgresql cluster setup using patroni.
=
The DB is being used for heavy transactional application, now the prob= lem=C2=A0is that on replica server WAL replay is too slow.
We hav= e increased the IOPS to 6k and Throughput to 600 on nvme EBS volume of wal = directory and 10k &800 on data directory.

but = the WAL is being accumulated on the replica as usual and applying wal is ha= ving no improvement.
changed=C2=A0the maintenance_io_concurrency = on replica to 32.
CPU utilization max=3D20% , RAM utilization is = also max 20.

see the below postgres logs that show= s around 2hrs lag

tail -f /var/log/postgresql/p= ostgresql.log
2025-10-30 09:02:08 IST [27125]: user=3D,db=3D,app=3D,clie= nt=3DLOG: =C2=A0recovery restart point at 5B65/F1DAFA20
2025-10-30 09:02= :08 IST [27125]: user=3D,db=3D,app=3D,client=3DDETAIL: =C2=A0Last completed= transaction was at log time 2025-10-30 07:16:40.115131+05:30.
2025-10-3= 0 09:08:23 IST [27125]: user=3D,db=3D,app=3D,client=3DLOG: =C2=A0restartpoi= nt starting: time
2025-10-30 09:12:53 IST [27125]: user=3D,db=3D,app=3D,= client=3DLOG: =C2=A0restartpoint complete: wrote 44067 buffers (2.1%); 1 WA= L file(s) added, 73 removed, 0 recycled; write=3D269.362 s, sync=3D0.042 s,= total=3D269.633 s; sync files=3D142, longest=3D0.005 s, average=3D0.001 s;= distance=3D1197052 kB, estimate=3D1587336 kB; lsn=3D5B66/6C3082F8, redo ls= n=3D5B66/3AEAEA60
2025-10-30 09:12:53 IST [27125]: user=3D,db=3D,app=3D,= client=3DLOG: =C2=A0recovery restart point at 5B66/3AEAEA60
2025-10-30 0= 9:12:53 IST [27125]: user=3D,db=3D,app=3D,client=3DDETAIL: =C2=A0Last compl= eted transaction was at log time 2025-10-30 07:21:47.56674+05:30.


recovery_prefetch output:

postgres=3D# select * from pg_stat_recovery_prefetch;
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0stats_reset =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0| prefetch | =C2=A0 =C2=A0hit =C2=A0 =C2=A0| skip_init | skip_new= | skip_fpw | skip_rep =C2=A0| wal_distance | block_distance | io_depth
= ----------------------------------+----------+-----------+-----------+-----= -----+----------+-----------+--------------+----------------+----------
= =C2=A02025-10-29 23:02:21.396179+05:30 | =C2=A0 182762 | 251000856 | =C2=A0= 3841721 | =C2=A01100099 | =C2=A03520777 | 137392573 | =C2=A0 =C2=A0 =C2=A0= =C2=A0 8984 | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 80 | =C2=A0 =C2=A0= =C2=A0 =C2=A00
(1 row)

I would request your th= oughts and suggestions if we can get rid of this slowness and get some spee= d.

Regards,
OM
--00000000000094d41a06425eaf7b--