Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vHIAS-002hxR-Vw for pgsql-general@arkaria.postgresql.org; Fri, 07 Nov 2025 08:50:01 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vHIAR-00DBBf-MN for pgsql-general@arkaria.postgresql.org; Fri, 07 Nov 2025 08:49:59 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vH3PY-009NyP-8K for pgsql-general@lists.postgresql.org; Thu, 06 Nov 2025 17:04:36 +0000 Received: from mail-il1-x12f.google.com ([2607:f8b0:4864:20::12f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vH3PW-006LNo-0Y for pgsql-general@lists.postgresql.org; Thu, 06 Nov 2025 17:04:35 +0000 Received: by mail-il1-x12f.google.com with SMTP id e9e14a558f8ab-4330e912c51so4373485ab.3 for ; Thu, 06 Nov 2025 09:04:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1762448672; x=1763053472; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=O9S9EpmlbS+5SJ1Hdy/yo21bl3EKiVgIB7siskw1a6Q=; b=QjOY8PV6rg146gB79X15J6qcr/JoDl14a8SjxrLfLqXRQ9SjBEJeupZZdgyWi/Ahx6 P8+eMEl3w0qPiH1a+hvUx4PHbnATJTD+EDMLWGUe6VIgRL2Nj9mzhTWlZNbTrCVk+vTU VNJX3tp6JEzWcGht0wsmE8oaM/hg+KiYienKvaZY0QXERxg8tIwXuOcskkyGJzTZvH8S ieejlueQo/KTUnjEq+vH3igmyqODkvjkalTK91CFbWDWo8HmkNCL/TIJM5cKmrHFaISd gknnqVgSX9uPqm/SZ+x7I1JEkkJz7AAX1Q4hQpmFjVFKsrtVdxElJ+9eBNRTxgsdY4EB J6BQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1762448672; x=1763053472; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=O9S9EpmlbS+5SJ1Hdy/yo21bl3EKiVgIB7siskw1a6Q=; b=IS7m59FvlyZCTqbVfnbJzTnEVQnDthXybtADFq/z0+XBeig32bB/ByFFLCfzAlHgS4 QXKRqPmw2wnLZL1SrGTi3/kdL3izEATO1N8mTqXICroqSK1uXrDgIs6iwXMIevjUiA5Q poqp1qWXpJihF6idOgcjEw//2KkdJLgoEv5V0Dz32HgzQP+C3DG3tM9QmChhsnnh5iFs e2wiwdcyuLJwfgsFlweiyb6ksD1jrmIkUbASZORtBQfA5+uEDutl4d9KyNXhFO7BnOqj 0tiorjKSQJ8Vs+7o/jDHxYSp66Bh7rYqZ++DCIaBTXPwkJYCRuEIX4owr1bGnaV4RBfR N5IQ== X-Forwarded-Encrypted: i=1; AJvYcCXrjCGtea5BM3qgRAVskIEGlBpJNESFOFosnRCcr8C7xCnPA18hnFVOCngezIC74GAg52dXAO/vTIuQopfM@lists.postgresql.org X-Gm-Message-State: AOJu0YzQtj4pSBE9LlakPMKkRGo6Vu+NmXQ4CGts0f2Ly8CPd15dhEmV gxCvgbSL/jcF31JKHjmkk1qH29ZK/qRtjilWlUPSKpk9DR2Vm1cV0yxKf9QdibAdMAOQFmf+imq KWajl6+pHBuF8g/MdWrHCpLuaRkGLtmg= X-Gm-Gg: ASbGnct9Mc4sgUmiEOcmxXuWR25tJSx+NFRL5gWZCXkJpqjJjqjM8KzlxK4OqYao6of G5SJwpwdGkWP5dVA/NQW6F9OhNfLNdfnkma3T/eDCK9FHyaOzc3fYrr6ZkYQ7QSXGHoRYcRhi2R Gx5xFrcAvGRLwBw8l/o05OfYbxiIzLQESL59HKLzJ3doK68tY0rO4GS+aFnnXImaivrRJuR/z3/ pvq6FdYXaqYq2xovEpjdwjuXUhxRPoN8kc4uAVCMN1Ps5tIM5oBwx7vFtzKzC7O6uvICdLEkvy1 uG6g0dXWzTB722ENzy8HoRS/+mmX7A== X-Google-Smtp-Source: AGHT+IHePJmNOMWdMzhA+VJFbpYCq9rfHvCb4VsTCyOs7bJbDV/kBrEK7Z5wH8GyIkePd87MDmNOs1Q34QBqjElNy+k= X-Received: by 2002:a05:6e02:2384:b0:433:5a5c:5d75 with SMTP id e9e14a558f8ab-4335f460637mr1476695ab.18.1762448671655; Thu, 06 Nov 2025 09:04:31 -0800 (PST) MIME-Version: 1.0 References: <942b979a-c5e8-40e7-bdec-0234a0e5a010@aklaver.com> In-Reply-To: <942b979a-c5e8-40e7-bdec-0234a0e5a010@aklaver.com> From: Bala M Date: Thu, 6 Nov 2025 22:34:20 +0530 X-Gm-Features: AWmQ_bnQup5hx2kXiJC-zjR3hF0J8qzF3P72h7crmI_4A7oKiCGyVBSQWUU_vx4 Message-ID: Subject: Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication) To: Adrian Klaver Cc: Greg Sabino Mullane , Francisco Olarte , chris+google@qwirx.com, pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="000000000000024c9f0642f00f55" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000024c9f0642f00f55 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Adrian, Thank you for your response. Please find the requested details below: *PostgreSQL Version:* Source: PostgreSQL 11.15 Target: PostgreSQL 16.9 *Operating System:* Source: RHEL 7.9 Target: RHEL 9.6 *Network Distance:* Both servers are in the same data center, connected through a high-speed internal network (low latency). Logical Replication Settings: *Source - Postgres 11.15.* -- =3D=3D=3D=3D WAL & Replication Settings =3D=3D=3D=3D wal_level =3D 'logical' max_wal_senders =3D '30' max_replication_slots =3D '20' wal_keep_segments =3D '800' wal_sender_timeout =3D '300s' max_worker_processes =3D '32' max_logical_replication_workers =3D '16' max_sync_workers_per_subscription =3D '8' =3D=3D=3D=3D WAL & Checkpoint =3D=3D=3D=3D max_wal_size =3D '40GB' min_wal_size =3D '4GB' checkpoint_timeout =3D '45min' checkpoint_completion_target =3D '0.9' =3D=3D=3D=3D Memory =3D=3D=3D=3D shared_buffers =3D '18GB' work_mem =3D '128MB' maintenance_work_mem =3D 4GB' effective_cache_size =3D '275GB' *Target DB Postgres 16.10* =3D=3D=3D=3D Logical Replication Settings =3D=3D=3D=3D max_worker_processes =3D '32' max_logical_replication_workers =3D '16' max_sync_workers_per_subscription =3D '8' wal_receiver_timeout =3D '300s' =3D=3D=3D=3D WAL & Checkpoint =3D=3D=3D=3D checkpoint_timeout =3D '45min' checkpoint_completion_target =3D '0.9' max_wal_size =3D '40GB' min_wal_size =3D '4GB' =3D=3D=3D=3D Memory =3D=3D=3D=3D shared_buffers =3D '18GB' work_mem =3D '128MB' maintenance_work_mem =3D '4GB' effective_cache_size =3D '275GB' synchronous_commit =3D 'off' Since you have already started is that not already to late for this? Yes We are currently in the *testing phase* and validating with the above parameters. However, the replication process has been *extremely slow =E2= =80=94 it=E2=80=99s been running for the past 5 days* with limited progress. Any specific tuning recommendations or best practices to improve performance at this stage would be greatly appreciated. Thanks & Regards Krishna. On Wed, 5 Nov 2025 at 21:07, Adrian Klaver wrote: > On 11/4/25 22:27, Bala M wrote: > > Thank you all for your suggestions, > > > > Thanks for your quick response and for sharing the details. > > After reviewing the options, the logical replication approach seems to > > be the most feasible one with minimal downtime. > > > > However, we currently have 7 streaming replication setups running from > > production, with a total database size of around 15 TB. Out of this, > > there are about 10 large tables ranging from 1 TB (max) to 50 GB (min) > > each, along with approximately 150+ sequences. > > > > Could you please confirm if there are any successful case studies or > > benchmarks available for a similar setup? > > Since you have given minimal information in this post, I doubt there is > really a way to compare to other situations. Collect the details you > provided earlier in the thread for those folks getting to it just now. > > That would be: > > 1) Postgres versions on both ends > > 2) OS and versions on both ends. > > 3) Network distance between 'machines'. > > 4) The logical replication settings. > > > Additionally, please share any recommended parameter tuning or best > > practices for handling logical replication at this scale. > > Since you have already started is that not already to late for this? > > > > > > > Current server configuration: > > > > CPU: 144 cores > > > > RAM: 512 GB > > > > > > Thanks & Regards > > Krishna. > > > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com > --000000000000024c9f0642f00f55 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Hi=C2= =A0 Adrian, Thank you for your response. Please find the requested details = below:

PostgreSQL Version:

Source: PostgreSQL 11.15
Target: PostgreSQL 16.9

Operating System:

Source:= RHEL 7.9

Target: RHEL 9.6

Network Distance:

Bo= th servers are in the same data center, connected through a high-speed inte= rnal network (low latency).

Logical Replication Settings:

= Source - Postgres 11.15.

-- =3D=3D=3D=3D WAL & Replication S= ettings =3D=3D=3D=3D

=C2=A0wal_level =3D 'logical'
=C2=A0= max_wal_senders =3D '30'
=C2=A0max_replication_slots =3D '20= '
=C2=A0wal_keep_segments =3D '800' =C2=A0 =C2=A0
=C2=A0= wal_sender_timeout =3D '300s'
=C2=A0max_worker_processes =3D = 9;32'
=C2=A0max_logical_replication_workers =3D '16'
=C2= =A0max_sync_workers_per_subscription =3D '8'

=C2=A0=3D=3D=3D= =3D WAL & Checkpoint =C2=A0=3D=3D=3D=3D

=C2=A0max_wal_size =3D &= #39;40GB'
=C2=A0min_wal_size =3D '4GB'
=C2=A0checkpoint_t= imeout =3D '45min'
=C2=A0checkpoint_completion_target =3D '0= .9'

=C2=A0=3D=3D=3D=3D =C2=A0Memory =3D=3D=3D=3D
=C2=A0shared= _buffers =3D '18GB' =C2=A0 =C2=A0 =C2=A0 =C2=A0
=C2=A0work_mem = =3D '128MB'
=C2=A0maintenance_work_mem =3D 4GB'
=C2=A0eff= ective_cache_size =3D '275GB' =C2=A0 =C2=A0


Target D= B Postgres 16.10

=C2=A0=3D=3D=3D=3D Logical Replication Settings= =C2=A0=3D=3D=3D=3D
=C2=A0max_worker_processes =3D '32'
=C2= =A0max_logical_replication_workers =3D '16'
=C2=A0max_sync_worke= rs_per_subscription =3D '8'
=C2=A0wal_receiver_timeout =3D '= 300s'
=C2=A0
=C2=A0=3D=3D=3D=3D WAL & Checkpoint =C2=A0=3D=3D= =3D=3D

=C2=A0checkpoint_timeout =3D '45min'
=C2=A0checkpo= int_completion_target =3D '0.9'
=C2=A0max_wal_size =3D '40GB= '
=C2=A0min_wal_size =3D '4GB'

=C2=A0=3D=3D=3D=3D =C2= =A0Memory =C2=A0=3D=3D=3D=3D
=C2=A0shared_buffers =3D '18GB' =C2= =A0
=C2=A0work_mem =3D '128MB'
=C2=A0maintenance_work_mem = =3D '4GB'
=C2=A0effective_cache_size =3D '275GB'
=C2= =A0synchronous_commit =3D 'off'=C2=A0


Since you have already star= ted is that not already to late for this?

Yes We are currently in the = testing phase and validating with the above parameters. However, t= he replication process has been extremely slow =E2=80=94 it=E2=80= =99s been running for the past 5 days with limited progress.

Any specific tuning recommendations or = best practices to improve performance at this stage would be greatly apprec= iated.


Thanks & Regards
Krishna= .


On Wed, 5 Nov 2025 at 21= :07, Adrian Klaver <adrian.= klaver@aklaver.com> wrote:
On 11/4/25 22:27, Bala M wrote:
> Thank you all for your suggestions,
>
> Thanks for your quick response and for sharing the details.
> After reviewing the options, the logical replication approach seems to=
> be the most feasible one with minimal downtime.
>
> However, we currently have 7 streaming replication setups running from=
> production, with a total database size of around 15 TB. Out of this, <= br> > there are about 10 large tables ranging from 1 TB (max) to 50 GB (min)=
> each, along with approximately 150+ sequences.
>
> Could you please confirm if there are any successful case studies or <= br> > benchmarks available for a similar setup?

Since you have given minimal information in this post, I doubt there is really a way to compare to other situations. Collect the details you
provided earlier in the thread for those folks getting to it just now.

That would be:

1) Postgres versions on both ends

2) OS and versions on both ends.

3) Network distance between 'machines'.

4) The logical replication settings.

> Additionally, please share any recommended parameter tuning or best > practices for handling logical replication at this scale.

Since you have already started is that not already to late for this?



>
> Current server configuration:
>
> CPU: 144 cores
>
> RAM: 512 GB
>
>
> Thanks & Regards
> Krishna.
>



--
Adrian Klaver
adrian.klave= r@aklaver.com
--000000000000024c9f0642f00f55--