public inbox for [email protected]help / color / mirror / Atom feed
Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication) 6+ messages / 2 participants [nested] [flat]
* Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication) @ 2025-11-05 06:27 Bala M <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Bala M @ 2025-11-05 06:27 UTC (permalink / raw) To: Greg Sabino Mullane <[email protected]>; Francisco Olarte <[email protected]>; +Cc: [email protected] <[email protected]>; [email protected]; [email protected] Thank you all for your suggestions, Thanks for your quick response and for sharing the details. After reviewing the options, the logical replication approach seems to be the most feasible one with minimal downtime. However, we currently have 7 streaming replication setups running from production, with a total database size of around 15 TB. Out of this, there are about 10 large tables ranging from 1 TB (max) to 50 GB (min) each, along with approximately 150+ sequences. Could you please confirm if there are any successful case studies or benchmarks available for a similar setup? Additionally, please share any recommended parameter tuning or best practices for handling logical replication at this scale. Current server configuration: CPU: 144 cores RAM: 512 GB Thanks & Regards Krishna. On Fri, 24 Oct 2025 at 21:55, Francisco Olarte <[email protected]> wrote: > > On Thu, 23 Oct 2025 at 17:21, Greg Sabino Mullane <[email protected]> > wrote > > pg_dump is the most reliable, and the slowest. Keep in mind that only the >> actual data needs to move over (not the indexes, which get rebuilt after >> the data is loaded). You could also mix-n-match pg_logical and pg_dump if >> you have a few tables that are super large. Whether either approach fits in >> your 24 hour window is hard to say without you running some tests. >> > > Long time ago I had a similar problem and did a "running with scissors" > restore. This means: > > 1.- Prepare normal configuration, test, etc for the new version. > 2.- Prepare a restore configuration, with fsync=off, wallevel=minimal, > whatever option gives you any speed advantage. > > As the target was empty, if restore failed we could just clean and restart. > > 3.- Dump, boot with the restore configuration, restore, clean shutdown, > switch to production configuration, boot again and follow on. > > Time has passed and I lost my notes, but I remember the restore was much > faster than doing it with the normal production configuration. Given > current machine speeds, it maybe doable. > > > Francisco Olarte. > > ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication) @ 2025-11-05 15:36 Adrian Klaver <[email protected]> parent: Bala M <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Adrian Klaver @ 2025-11-05 15:36 UTC (permalink / raw) To: Bala M <[email protected]>; Greg Sabino Mullane <[email protected]>; Francisco Olarte <[email protected]>; +Cc: [email protected]; [email protected] On 11/4/25 22:27, Bala M wrote: > Thank you all for your suggestions, > > Thanks for your quick response and for sharing the details. > After reviewing the options, the logical replication approach seems to > be the most feasible one with minimal downtime. > > However, we currently have 7 streaming replication setups running from > production, with a total database size of around 15 TB. Out of this, > there are about 10 large tables ranging from 1 TB (max) to 50 GB (min) > each, along with approximately 150+ sequences. > > Could you please confirm if there are any successful case studies or > benchmarks available for a similar setup? Since you have given minimal information in this post, I doubt there is really a way to compare to other situations. Collect the details you provided earlier in the thread for those folks getting to it just now. That would be: 1) Postgres versions on both ends 2) OS and versions on both ends. 3) Network distance between 'machines'. 4) The logical replication settings. > Additionally, please share any recommended parameter tuning or best > practices for handling logical replication at this scale. Since you have already started is that not already to late for this? > > Current server configuration: > > CPU: 144 cores > > RAM: 512 GB > > > Thanks & Regards > Krishna. > -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication) @ 2025-11-06 17:04 Bala M <[email protected]> parent: Adrian Klaver <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Bala M @ 2025-11-06 17:04 UTC (permalink / raw) To: Adrian Klaver <[email protected]>; +Cc: Greg Sabino Mullane <[email protected]>; Francisco Olarte <[email protected]>; [email protected]; [email protected] Hi Adrian, Thank you for your response. Please find the requested details below: *PostgreSQL Version:* Source: PostgreSQL 11.15 Target: PostgreSQL 16.9 *Operating System:* Source: RHEL 7.9 Target: RHEL 9.6 *Network Distance:* Both servers are in the same data center, connected through a high-speed internal network (low latency). Logical Replication Settings: *Source - Postgres 11.15.* -- ==== WAL & Replication Settings ==== wal_level = 'logical' max_wal_senders = '30' max_replication_slots = '20' wal_keep_segments = '800' wal_sender_timeout = '300s' max_worker_processes = '32' max_logical_replication_workers = '16' max_sync_workers_per_subscription = '8' ==== WAL & Checkpoint ==== max_wal_size = '40GB' min_wal_size = '4GB' checkpoint_timeout = '45min' checkpoint_completion_target = '0.9' ==== Memory ==== shared_buffers = '18GB' work_mem = '128MB' maintenance_work_mem = 4GB' effective_cache_size = '275GB' *Target DB Postgres 16.10* ==== Logical Replication Settings ==== max_worker_processes = '32' max_logical_replication_workers = '16' max_sync_workers_per_subscription = '8' wal_receiver_timeout = '300s' ==== WAL & Checkpoint ==== checkpoint_timeout = '45min' checkpoint_completion_target = '0.9' max_wal_size = '40GB' min_wal_size = '4GB' ==== Memory ==== shared_buffers = '18GB' work_mem = '128MB' maintenance_work_mem = '4GB' effective_cache_size = '275GB' synchronous_commit = 'off' Since you have already started is that not already to late for this? Yes We are currently in the *testing phase* and validating with the above parameters. However, the replication process has been *extremely slow — it’s been running for the past 5 days* with limited progress. Any specific tuning recommendations or best practices to improve performance at this stage would be greatly appreciated. Thanks & Regards Krishna. On Wed, 5 Nov 2025 at 21:07, Adrian Klaver <[email protected]> wrote: > On 11/4/25 22:27, Bala M wrote: > > Thank you all for your suggestions, > > > > Thanks for your quick response and for sharing the details. > > After reviewing the options, the logical replication approach seems to > > be the most feasible one with minimal downtime. > > > > However, we currently have 7 streaming replication setups running from > > production, with a total database size of around 15 TB. Out of this, > > there are about 10 large tables ranging from 1 TB (max) to 50 GB (min) > > each, along with approximately 150+ sequences. > > > > Could you please confirm if there are any successful case studies or > > benchmarks available for a similar setup? > > Since you have given minimal information in this post, I doubt there is > really a way to compare to other situations. Collect the details you > provided earlier in the thread for those folks getting to it just now. > > That would be: > > 1) Postgres versions on both ends > > 2) OS and versions on both ends. > > 3) Network distance between 'machines'. > > 4) The logical replication settings. > > > Additionally, please share any recommended parameter tuning or best > > practices for handling logical replication at this scale. > > Since you have already started is that not already to late for this? > > > > > > > Current server configuration: > > > > CPU: 144 cores > > > > RAM: 512 GB > > > > > > Thanks & Regards > > Krishna. > > > > > > -- > Adrian Klaver > [email protected] > ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication) @ 2025-11-06 18:23 Adrian Klaver <[email protected]> parent: Bala M <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Adrian Klaver @ 2025-11-06 18:23 UTC (permalink / raw) To: Bala M <[email protected]>; +Cc: Greg Sabino Mullane <[email protected]>; Francisco Olarte <[email protected]>; [email protected]; [email protected] On 11/6/25 09:04, Bala M wrote: > > Hi Adrian, Thank you for your response. Please find the requested > details below: > > Since you have already started is that not already to late for this? > > Yes We are currently in the *testing phase* and validating with the > above parameters. However, the replication process has been *extremely > slow — it’s been running for the past 5 days* with limited progress. 1) Show how you have set up the publishers and subscribers. 2) Are you trying to replicate everything at once or a table or subset of tables at a time? 3) How active is the source database? > > Any specific tuning recommendations or best practices to improve > performance at this stage would be greatly appreciated. > > > Thanks & Regards > Krishna. > > -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication) @ 2025-11-07 00:30 Bala M <[email protected]> parent: Adrian Klaver <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Bala M @ 2025-11-07 00:30 UTC (permalink / raw) To: Adrian Klaver <[email protected]>; +Cc: Greg Sabino Mullane <[email protected]>; Francisco Olarte <[email protected]>; [email protected]; [email protected] *1) Show how you have set up the publishers and subscribers.* We have configured logical replication between PostgreSQL 11 (publisher) and PostgreSQL 16 (subscriber). Publisher: PostgreSQL 11 with wal_level=logical, max_replication_slots=20, and publication created for all tables (FOR ALL TABLES). Subscriber: PostgreSQL 16 with max_logical_replication_workers=16, max_sync_workers_per_subscription=8, connected through a subscription to replicate all tables. select * from pg_publication; pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate -----------+----------+--------------+-----------+-----------+-----------+------------- test_v_pub | 10 | t | t | t | t | t (1 row) select * from pg_replication_slots ; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn -------------------------------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+----------------+--------------------- vahan_sub | pgoutput | logical | 16468 | test_v | f | t | 3722767 | | 3146687395 | 20538/7D039948 | 20538/7D045790 pg_120870_sync_94301_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/71C4FD28 | 20538/71C4FD60 pg_120870_sync_93499_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/7D039948 | 20538/7D039980 pg_120870_sync_99121_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/6F290C88 | 20538/6F290CC0 pg_120870_sync_108854_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/7C6B5FD0 | 20538/7C6B6020 pg_120870_sync_93451_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/7D0359A8 | 20538/7D0359E0 pg_120870_sync_94589_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/4C0001E8 | 20538/4C000220 pg_120870_sync_98225_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/6F290DA0 | 20538/6F290DD8 pg_120870_sync_96438_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/75CE4120 | 20538/75CE4158 pg_120870_sync_93469_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/7D00E0B0 | 20538/7D00E0E8 pg_120870_sync_94277_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/4C000178 | 20538/4C0001B0 pg_120870_sync_109085_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/7B72CFE8 | 20538/7B72D020 pg_120870_sync_109110_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/7AE3B078 | 20538/7AE3B0B0 pg_120870_sync_99247_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/4C0001B0 | 20538/4C0001E8 pg_120870_sync_103454_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/7D0002F0 | 20538/7D000328 pg_120870_sync_96317_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/71C59D20 | 20538/71C59D58 pg_120870_sync_107790_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/7A4102F0 | 20538/7A410328 pg_120870_sync_98730_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/6F00CE40 | 20538/6F00CE78 pg_120870_sync_93488_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/7CF2C4F0 | 20538/7CF2C528 pg_120870_sync_103457_7567666078127625550 | pgoutput | logical | 16468 | test_v | f | f | | | 3146687395 | 20538/6F127188 | 20538/6F1271C0 (20 rows) postgres=# select * from pg_subscription; -[ RECORD 1 ]-------+--------------------------------------------------------------------- oid | 120870 subdbid | 91683 subskiplsn | 0/0 subname | test_v_sub subowner | 10 subenabled | t subbinary | f substream | f subtwophasestate | d subdisableonerr | f subpasswordrequired | t subrunasowner | f subconninfo | host=xyz port=5432 user=postgres password=test dbname=test_v subslotname | test_v_sub subsynccommit | off subpublications | {test_v_pub} suborigin | any *2) Are you trying to replicate everything at once or a table or subsetof tables at a time?* We are replicating all tables at once (full database-level replication), not table-by-table. *3) How active is the source database?* The source database is highly active, with continuous transactions and heavy write activity across large tables. Thanks & Regards Krishna. On Thu, 6 Nov 2025 at 23:53, Adrian Klaver <[email protected]> wrote: > On 11/6/25 09:04, Bala M wrote: > > > > Hi Adrian, Thank you for your response. Please find the requested > > details below: > > > > > Since you have already started is that not already to late for this? > > > > Yes We are currently in the *testing phase* and validating with the > > above parameters. However, the replication process has been *extremely > > slow — it’s been running for the past 5 days* with limited progress. > > 1) Show how you have set up the publishers and subscribers. > > 2) Are you trying to replicate everything at once or a table or subset > of tables at a time? > > 3) How active is the source database? > > > > > Any specific tuning recommendations or best practices to improve > > performance at this stage would be greatly appreciated. > > > > > > Thanks & Regards > > Krishna. > > > > > > -- > Adrian Klaver > [email protected] > ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication) @ 2025-11-08 16:34 Adrian Klaver <[email protected]> parent: Bala M <[email protected]> 0 siblings, 0 replies; 6+ messages in thread From: Adrian Klaver @ 2025-11-08 16:34 UTC (permalink / raw) To: Bala M <[email protected]>; +Cc: Greg Sabino Mullane <[email protected]>; Francisco Olarte <[email protected]>; [email protected]; [email protected] On 11/6/25 16:30, Bala M wrote: > * > * > *1) Show how you have set up the publishers and subscribers.* > > We have configured logical replication between PostgreSQL 11 (publisher) > and PostgreSQL 16 (subscriber). > > Publisher: PostgreSQL 11 with wal_level=logical, > max_replication_slots=20, and publication created for all tables (FOR > ALL TABLES). > I have not worked with a database of this size, still I would think FOR ALL TABLES is not the way to go. That it along with the existing activity on the publisher would result in a slow transfer of data. -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 6+ messages in thread
end of thread, other threads:[~2025-11-08 16:34 UTC | newest] Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2025-11-05 06:27 Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication) Bala M <[email protected]> 2025-11-05 15:36 ` Adrian Klaver <[email protected]> 2025-11-06 17:04 ` Bala M <[email protected]> 2025-11-06 18:23 ` Adrian Klaver <[email protected]> 2025-11-07 00:30 ` Bala M <[email protected]> 2025-11-08 16:34 ` Adrian Klaver <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox