MIME-Version: 1.0
References: <CAJ4rSwspyYBppQS701C=BVFX2yrP1JErgDCZXbC15BQh2zAS1Q@mail.gmail.com>
 <CAKAnmmK47O_rYDB4iPe=K3QiqzJcCnnxCoYs_40Q1YWb=x0nDg@mail.gmail.com>
 <CAJ4rSwvsRJhwfuem0vsws1WwRZPN8Ju2Nw0e0Oc2j7n=MXUhMg@mail.gmail.com>
 <CAKAnmmK9Fjb5YngH4BSb8Lmk3qh10nw=1M3xdcLatr9-U-f-Ug@mail.gmail.com>
 <CA+bJJbwSZg6fiQ78N-_2NwRBbp=6e5WjnSX9SPa_DOxnaU63Vg@mail.gmail.com>
 <CAJ4rSwstZoVgVjbHeDNVq+7eBWCVZSXjNMRpzB4QFjArZT0Hcg@mail.gmail.com> <942b979a-c5e8-40e7-bdec-0234a0e5a010@aklaver.com>
In-Reply-To: <942b979a-c5e8-40e7-bdec-0234a0e5a010@aklaver.com>
From: Bala M <krishna.pgdba@gmail.com>
Date: Thu, 6 Nov 2025 22:34:20 +0530
Message-ID: <CAJ4rSwuMDcsvXNfxBefWfDknoJMkdZoDmOJ_8pmo8ut_h_V57g@mail.gmail.com>
Subject: Re: Index corruption issue after migration from RHEL 7 to RHEL 9
 (PostgreSQL 11 streaming replication)
To: Adrian Klaver <adrian.klaver@aklaver.com>
Cc: Greg Sabino Mullane <htamfids@gmail.com>, Francisco Olarte <folarte@peoplecall.com>, chris+google@qwirx.com, 
	pgsql-general@lists.postgresql.org
Content-Type: multipart/alternative; boundary="000000000000024c9f0642f00f55"
Archived-At: <https://www.postgresql.org/message-id/CAJ4rSwuMDcsvXNfxBefWfDknoJMkdZoDmOJ_8pmo8ut_h_V57g%40mail.gmail.com>
Precedence: bulk

--000000000000024c9f0642f00f55
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Adrian,

Thank you for your response. Please find the requested details
below:

*PostgreSQL Version:*

Source: PostgreSQL 11.15

Target: PostgreSQL 16.9

*Operating System:*

Source: RHEL 7.9

Target: RHEL 9.6

*Network Distance:*

Both servers are in the same data center, connected through a high-speed
internal network (low latency).

Logical Replication Settings:

*Source - Postgres 11.15*

 -- ==== WAL & Replication Settings ====

 wal_level = 'logical'
 max_wal_senders = '30'
 max_replication_slots = '20'
 wal_keep_segments = '800'
 wal_sender_timeout = '300s'
 max_worker_processes = '32'
 max_logical_replication_workers = '16'
 max_sync_workers_per_subscription = '8'

 -- ==== WAL & Checkpoint ====

 max_wal_size = '40GB'
 min_wal_size = '4GB'
 checkpoint_timeout = '45min'
 checkpoint_completion_target = '0.9'

 -- ==== Memory ====

 shared_buffers = '18GB'
 work_mem = '128MB'
 maintenance_work_mem = '4GB'
 effective_cache_size = '275GB'


*Target DB Postgres 16.10*

 -- ==== Logical Replication Settings ====

 max_worker_processes = '32'
 max_logical_replication_workers = '16'
 max_sync_workers_per_subscription = '8'
 wal_receiver_timeout = '300s'

 -- ==== WAL & Checkpoint ====

 checkpoint_timeout = '45min'
 checkpoint_completion_target = '0.9'
 max_wal_size = '40GB'
 min_wal_size = '4GB'

 -- ==== Memory ====

 shared_buffers = '18GB'
 work_mem = '128MB'
 maintenance_work_mem = '4GB'
 effective_cache_size = '275GB'
 synchronous_commit = 'off'
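
On the target, table-sync workers and apply workers are drawn from the same
max_logical_replication_workers pool. The sketch below only does the
arithmetic for the two values listed above. The subscription counts are
hypothetical (the number of subscriptions is not stated here), and parallel
apply workers are ignored:

```python
# Values copied from the target settings listed above.
max_logical_replication_workers = 16
max_sync_workers_per_subscription = 8


def workers_wanted(subscriptions: int) -> int:
    """Workers wanted if every subscription runs one apply worker plus
    the maximum number of table-sync workers during the initial copy."""
    return subscriptions * (1 + max_sync_workers_per_subscription)


def fits_in_pool(subscriptions: int) -> bool:
    """True if that demand fits within max_logical_replication_workers."""
    return workers_wanted(subscriptions) <= max_logical_replication_workers


# Hypothetical subscription counts, for illustration only.
for n in (1, 2, 3):
    print(n, workers_wanted(n), fits_in_pool(n))
```

Under these assumptions a single subscription can use all 8 sync workers,
while two or more subscriptions copying at the same time would compete for
the 16-worker pool.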


> Since you have already started is that not already too late for this?

Yes. We are currently in the *testing phase* and validating with the above
parameters. However, the replication process has been *extremely slow: it
has been running for the past 5 days* with limited progress.

Any specific tuning recommendations or best practices to improve
performance at this stage would be greatly appreciated.
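
For scale, here is a rough sanity check of what 5 days would mean as a copy
rate. It is a sketch, not a measurement: it assumes the full 15 TB mentioned
earlier in the thread has to be copied, treats TB as TiB, and uses a
hypothetical helper name. Actual progress so far has not been measured here.

```python
TIB = 1024 ** 4            # bytes in one TiB (assumption: "TB" means TiB)
MIB = 1024 ** 2            # bytes in one MiB
SECONDS_PER_DAY = 86_400


def required_mib_per_s(size_tib: float, days: float) -> float:
    """Average rate in MiB/s needed to copy size_tib within the given days."""
    return size_tib * TIB / MIB / (days * SECONDS_PER_DAY)


# Whole database (about 15 TB) and the largest single table (about 1 TB).
total_rate = required_mib_per_s(15, 5)
largest_table_rate = required_mib_per_s(1, 5)

print(f"whole database: {total_rate:.1f} MiB/s")
print(f"largest table:  {largest_table_rate:.1f} MiB/s")
```

Comparing these averages with the rate actually observed on the subscriber
should show how far behind the initial copy is.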

Thanks & Regards
Krishna.


On Wed, 5 Nov 2025 at 21:07, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

> On 11/4/25 22:27, Bala M wrote:
> > Thank you all for your suggestions,
> >
> > Thanks for your quick response and for sharing the details.
> > After reviewing the options, the logical replication approach seems to
> > be the most feasible one with minimal downtime.
> >
> > However, we currently have 7 streaming replication setups running from
> > production, with a total database size of around 15 TB. Out of this,
> > there are about 10 large tables ranging from 1 TB (max) to 50 GB (min)
> > each, along with approximately 150+ sequences.
> >
> > Could you please confirm if there are any successful case studies or
> > benchmarks available for a similar setup?
>
> Since you have given minimal information in this post, I doubt there is
> really a way to compare to other situations. Collect the details you
> provided earlier in the thread for those folks getting to it just now.
>
> That would be:
>
> 1) Postgres versions on both ends
>
> 2) OS and versions on both ends.
>
> 3) Network distance between 'machines'.
>
> 4) The logical replication settings.
>
> > Additionally, please share any recommended parameter tuning or best
> > practices for handling logical replication at this scale.
>
> Since you have already started is that not already too late for this?
>
>
>
> >
> > Current server configuration:
> >
> > CPU: 144 cores
> >
> > RAM: 512 GB
> >
> >
> > Thanks & Regards
> > Krishna.
> >
>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>

--000000000000024c9f0642f00f55--