MIME-Version: 1.0
From: Ramakrishna m <ram.pgdb@gmail.com>
Date: Sun, 22 Sep 2024 00:38:02 +0530
Message-ID: <CAG-eXHKr5pFzCKNqVhjQAUM0oxxCe8YUfjoLT8WmZx54Sk48kg@mail.gmail.com>
Subject: Logical Replication Delay
To: pgsql-general@lists.postgresql.org
Cc: ravisql09@gmail.com
Content-Type: multipart/alternative; boundary="0000000000009a61090622a5e0c3"
Archived-At: <https://www.postgresql.org/message-id/CAG-eXHKr5pFzCKNqVhjQAUM0oxxCe8YUfjoLT8WmZx54Sk48kg%40mail.gmail.com>
Precedence: bulk

--0000000000009a61090622a5e0c3
Content-Type: text/plain; charset="UTF-8"

Hi Team,

We have configured bidirectional logical replication (although traffic can
only flow in one direction) between two data centers (distance: 1000 km,
maximum network latency: 100 ms), with a peak application load of 700 TPS
(transactions per second).

We can handle up to 500 TPS without observing any lag between the two data
centers. However, when TPS rises above that, we see a WAL lag of over 100 GB
(initially it was 1 TB; after tuning it dropped to 100 GB). During peak
times, WAL is generated at a rate of 4 GB per minute.
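
To put these figures in perspective, here is a rough back-of-the-envelope
sketch in Python. It assumes 1 GB = 1024 MB and that the 4 GB/min WAL peak
coincides with the 700 TPS peak; neither assumption is stated in this
message, so treat the results as approximate.

```python
# Rough arithmetic on the numbers quoted in this message.
# Assumptions (not confirmed): 1 GB = 1024 MB, and the 4 GB/min WAL peak
# happens at the same time as the 700 TPS peak.

WAL_GB_PER_MIN = 4   # peak WAL generation rate
BACKLOG_GB = 100     # WAL lag observed after tuning
PEAK_TPS = 700       # maximum application TPS
SEGMENT_GB = 1       # wal_segment_size (1073741824 B)

# Sustained WAL throughput the downstream side has to keep up with.
wal_mb_per_sec = WAL_GB_PER_MIN * 1024 / 60

# How many minutes of peak WAL generation the backlog represents.
backlog_minutes = BACKLOG_GB / WAL_GB_PER_MIN

# WAL segments filled per minute at peak.
segments_per_min = WAL_GB_PER_MIN / SEGMENT_GB

# Average WAL volume per transaction at peak.
wal_kb_per_tx = WAL_GB_PER_MIN * 1024 * 1024 / (PEAK_TPS * 60)

print(f"WAL rate at peak:     {wal_mb_per_sec:.1f} MB/s")
print(f"Backlog equals:       {backlog_minutes:.0f} min of peak WAL")
print(f"Segments per minute:  {segments_per_min:.0f}")
print(f"WAL per transaction:  {wal_kb_per_tx:.1f} kB")
```

Under these assumptions, the 100 GB lag corresponds to roughly 25 minutes of
peak WAL generation.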

Transactions (Tx) typically take less than 200 ms, occasionally up to
1 second (there are no long-running transactions).

*Here are the configured parameters and resources:*

   - *OS*: Ubuntu
   - *RAM*: 376 GB
   - *CPU*: 64 cores
   - *Swap*: 32 GB
   - *PostgreSQL Version*: 16.4 (each side has 3 nodes with Patroni and
   etcd configured)
   - *DB Size*: 15 TB

*Parameters configured on both sides:*
Name                                         Setting     Unit
-------------------------------------------  ----------  ----
log_replication_commands                     off
logical_decoding_work_mem                    524288      kB
max_logical_replication_workers              16
max_parallel_apply_workers_per_subscription  2
max_replication_slots                        20
max_sync_workers_per_subscription            2
max_wal_senders                              20
max_worker_processes                         40
wal_level                                    logical
wal_receiver_timeout                         600000      ms
wal_segment_size                             1073741824  B
wal_sender_timeout                           600000      ms
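
For easier reading, the raw values in the parameter list above can be
converted to larger units. This is a minimal sketch; the `humanize` helper
is hypothetical and assumes binary multiples (1 MB = 1024 kB).

```python
# Settings copied from the parameter list, with the units reported there.
settings = {
    "logical_decoding_work_mem": (524288, "kB"),
    "wal_receiver_timeout": (600000, "ms"),
    "wal_sender_timeout": (600000, "ms"),
    "wal_segment_size": (1073741824, "B"),
}

def humanize(value, unit):
    """Convert a raw setting to a friendlier unit (hypothetical helper)."""
    if unit == "kB":
        return f"{value // 1024} MB"
    if unit == "B":
        return f"{value // 1024**3} GB"
    if unit == "ms":
        return f"{value // 60000} min"
    return f"{value} {unit}"

readable = {name: humanize(v, u) for name, (v, u) in settings.items()}
for name, text in readable.items():
    print(f"{name:<28}{text}")
```

That is, logical_decoding_work_mem is 512 MB, both timeouts are 10 minutes,
and each WAL segment is 1 GB.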

*Optimizations applied:*

   1. Vacuum freeze is managed during off-hours; no aggressive vacuum is
   triggered during business hours.
   2. Converted a few tables to unlogged.
   3. Removed unwanted tables from publication.
   4. Partitioned all large tables.

*Pending:*

   1. Turning off autovacuum or tuning its parameters so that it does not
   trigger during business hours.

*Not possible:* We are running all tables in a single publication, and it
is not possible to separate them.

I would greatly appreciate any suggestions to help avoid logical
replication delays, whether through tuning database or operating system
parameters, or any other recommendations.

-- 
Thanks & Regards,
Ram.

--0000000000009a61090622a5e0c3--