Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vEKPY-00ALKe-U9 for pgsql-admin@arkaria.postgresql.org; Thu, 30 Oct 2025 04:37:20 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vEKPW-005x87-7u for pgsql-admin@arkaria.postgresql.org; Thu, 30 Oct 2025 04:37:17 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vEKPV-005x7z-Rk for pgsql-admin@lists.postgresql.org; Thu, 30 Oct 2025 04:37:16 +0000 Received: from mail-ed1-x529.google.com ([2a00:1450:4864:20::529]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vEKPS-0050Pt-1t for pgsql-admin@lists.postgresql.org; Thu, 30 Oct 2025 04:37:16 +0000 Received: by mail-ed1-x529.google.com with SMTP id 4fb4d7f45d1cf-63c12ff0c5eso1561635a12.0 for ; Wed, 29 Oct 2025 21:37:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1761799033; x=1762403833; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=9f0kUqSmZsh8bmM/tc+YWcinIVjddXwmXHKsAvXW0RI=; b=arRsg9zBJOXgysbuPdojgAenOm/9EM+wVC2/3R+8P3yAa+hyp8AzVBNc4zAs1NcKle 5fqPfSXuTWSnrbUmFyvq/a7Gcc8e3e5vF8H3Z4HA0BneuSEHb2FdOBEHNelRERh5V38X 2ylHMa+UwDZFUElCj7YV2JiFv6E4l+9MKi5XCyFU27bEZDCehXPm3lZ2Y+TDleO8HjN3 6/2EGjQK1sEcPG5yR5nvTwDVRu2j6cnMLGw/uJGPBv61SzoKGld9jtLIm7Nq09SjweTG ECH1TUu+yhUDqcYej3C8jqVato5VN623HL9szDhIettv/6W9FjxTHeYJJeOvEINJ+Tnz Cuow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761799033; x=1762403833; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=9f0kUqSmZsh8bmM/tc+YWcinIVjddXwmXHKsAvXW0RI=; b=MRo7f479H4H9VHAsZZ4RCsTp6sc3gSZ7nooa3YfOafKOFqu5gdsfOKQoDRFD2H5ncy Bfk7gXIGcUZUSHEWNapuv/Os/JL/E1tVOv12tXDWrtC6HmF/5DB4em/naviKaPAAaysH 0eLsuj+gBYk1q/Mpbm7TnT0tb8pJH8ix6mHuOSyXvHlDLdLc5tMtD/S8pODFaCVOLtL0 Uzfzcd2GMDAxH2VaPK9jlgUNEyfEKRnhu5lVsb7uEcFDkH8c6lvKN8Ahd8X/0IQSL5OZ q2Yr4T7ZldmA1rKUT8FUnVD4/2shoC6LG8RMOam33T8mernoG1DPyduFKZ/L/yAtb88w KZEA== X-Gm-Message-State: AOJu0YwIxRQLBAaiColR/Gwe3uxHyfdmZLfrwS9m+NQCAtyVPcckWccl C5FhlGoIKcj8YTZifx3A7JPxf5QtCtq7kF1pzmQzqOXOcm4H5eA4vYYJjZoSSh4khRFMcFwDMiN EFLIKom6R1JYEYmnmeHjdX1AiSLareiG3vq+T X-Gm-Gg: ASbGnctvomDiK3EISgO4CtxP4Qv8pJRpCDFbwGkksKiJs/PwXXxiO/QKxg2M7NDYIg0 FEEDQKFLJnCyZccrYAdwMA2HxMuI0LbWvyJwOpwfm2ZKOXPc6HFe+N/BV4hIah1b+3MztRGeb+K XwACr6m2gCKrkXSOVVmK+cHWer1otBzEeA/6g4KbK7h57TNqGeCdp9u8Wm0GH8qwL3yNDoGb1js 6blET8EIqu1jbC/eP2sFuy1TOVX503tu0VS3PUDBsnz1PUvTiaFlICYIx8= X-Google-Smtp-Source: AGHT+IHWxD8bOR8UamN/wy2qe3mjXWzHs+9arGlM6CxSX/PPSkizpKN6lUh0DNqT8yxQcQX5jxoVWJTc7kpCMPz942Q= X-Received: by 2002:a05:6402:27d1:b0:63b:ec3c:ee32 with SMTP id 4fb4d7f45d1cf-6405efb0cf0mr1531529a12.11.1761799032857; Wed, 29 Oct 2025 21:37:12 -0700 (PDT) MIME-Version: 1.0 From: OMPRAKASH SAHU Date: Thu, 30 Oct 2025 10:06:56 +0530 X-Gm-Features: AWmQ_bm8TQWh9AaoAV8DwciFwG9xiZPnFwIotG8QHCOOsqD8imVGxwAHfO0vSvI Message-ID: Subject: WAL replay is too slow on secondary server To: pgsql-admin@lists.postgresql.org Content-Type: multipart/alternative; boundary="00000000000084e8a6064258cd8a" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000084e8a6064258cd8a Content-Type: text/plain; charset="UTF-8" Hi Team, Greetings!! We have a postgresql cluster setup using patroni. The DB is being used for heavy transactional application, now the problem is that on replica server WAL replay is too slow. We have increased the IOPS to 6k and Throughput to 600 on nvme EBS volume of wal directory and 10k &800 on data directory. but the WAL is being accumulated on the replica as usual and applying wal is having no improvement. changed the maintenance_io_concurrency on replica to 32. CPU utilization max=20% , RAM utilization is also max 20. see the below postgres logs that shows around 2hrs lag *tail -f /var/log/postgresql/postgresql.log2025-10-30 09:02:08 IST [27125]: user=,db=,app=,client=LOG: recovery restart point at 5B65/F1DAFA202025-10-30 09:02:08 IST [27125]: user=,db=,app=,client=DETAIL: Last completed transaction was at log time 2025-10-30 07:16:40.115131+05:30.2025-10-30 09:08:23 IST [27125]: user=,db=,app=,client=LOG: restartpoint starting: time2025-10-30 09:12:53 IST [27125]: user=,db=,app=,client=LOG: restartpoint complete: wrote 44067 buffers (2.1%); 1 WAL file(s) added, 73 removed, 0 recycled; write=269.362 s, sync=0.042 s, total=269.633 s; sync files=142, longest=0.005 s, average=0.001 s; distance=1197052 kB, estimate=1587336 kB; lsn=5B66/6C3082F8, redo lsn=5B66/3AEAEA602025-10-30 09:12:53 IST [27125]: user=,db=,app=,client=LOG: recovery restart point at 5B66/3AEAEA602025-10-30 09:12:53 IST [27125]: user=,db=,app=,client=DETAIL: Last completed transaction was at log time 2025-10-30 07:21:47.56674+05:30.* recovery_prefetch output: postgres=# select * from pg_stat_recovery_prefetch; stats_reset | prefetch | hit | skip_init | skip_new | skip_fpw | skip_rep | wal_distance | block_distance | io_depth ----------------------------------+----------+-----------+-----------+----------+----------+-----------+--------------+----------------+---------- 2025-10-29 23:02:21.396179+05:30 | 182762 | 251000856 | 3841721 | 1100099 | 3520777 | 137392573 | 8984 | 80 | 0 (1 row) I would request your thoughts and suggestions if we can get rid of this slowness and get some speed. Regards, OM --00000000000084e8a6064258cd8a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Team,

Greetings!!

We have a postgresql cluster setup using patroni.
=
The DB is being used for heavy transactional application, now the prob= lem=C2=A0is that on replica server WAL replay is too slow.
We hav= e increased the IOPS to 6k and Throughput to 600 on nvme EBS volume of wal = directory and 10k &800 on data directory.

but = the WAL is being accumulated on the replica as usual and applying wal is ha= ving no improvement.
changed=C2=A0the maintenance_io_concurrency = on replica to 32.
CPU utilization max=3D20% , RAM utilization is = also max 20.

see the below postgres logs that show= s around 2hrs lag

tail -f /var/log/postgresql/p= ostgresql.log
2025-10-30 09:02:08 IST [27125]: user=3D,db=3D,app=3D,clie= nt=3DLOG: =C2=A0recovery restart point at 5B65/F1DAFA20
2025-10-30 09:02= :08 IST [27125]: user=3D,db=3D,app=3D,client=3DDETAIL: =C2=A0Last completed= transaction was at log time 2025-10-30 07:16:40.115131+05:30.
2025-10-3= 0 09:08:23 IST [27125]: user=3D,db=3D,app=3D,client=3DLOG: =C2=A0restartpoi= nt starting: time
2025-10-30 09:12:53 IST [27125]: user=3D,db=3D,app=3D,= client=3DLOG: =C2=A0restartpoint complete: wrote 44067 buffers (2.1%); 1 WA= L file(s) added, 73 removed, 0 recycled; write=3D269.362 s, sync=3D0.042 s,= total=3D269.633 s; sync files=3D142, longest=3D0.005 s, average=3D0.001 s;= distance=3D1197052 kB, estimate=3D1587336 kB; lsn=3D5B66/6C3082F8, redo ls= n=3D5B66/3AEAEA60
2025-10-30 09:12:53 IST [27125]: user=3D,db=3D,app=3D,= client=3DLOG: =C2=A0recovery restart point at 5B66/3AEAEA60
2025-10-30 0= 9:12:53 IST [27125]: user=3D,db=3D,app=3D,client=3DDETAIL: =C2=A0Last compl= eted transaction was at log time 2025-10-30 07:21:47.56674+05:30.


recovery_prefetch output:

postgres=3D# select * from pg_stat_recovery_prefetch;
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0stats_reset =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0| prefetch | =C2=A0 =C2=A0hit =C2=A0 =C2=A0| skip_init | skip_new= | skip_fpw | skip_rep =C2=A0| wal_distance | block_distance | io_depth
= ----------------------------------+----------+-----------+-----------+-----= -----+----------+-----------+--------------+----------------+----------
= =C2=A02025-10-29 23:02:21.396179+05:30 | =C2=A0 182762 | 251000856 | =C2=A0= 3841721 | =C2=A01100099 | =C2=A03520777 | 137392573 | =C2=A0 =C2=A0 =C2=A0= =C2=A0 8984 | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 80 | =C2=A0 =C2=A0= =C2=A0 =C2=A00
(1 row)

I would request your th= oughts and suggestions if we can get rid of this slowness and get some spee= d.

Regards,
OM
--00000000000084e8a6064258cd8a--