public inbox for [email protected]  
help / color / mirror / Atom feed
From: Ron Johnson <[email protected]>
To: Pgsql-admin <[email protected]>
Subject: Re: Guidance Requested: Migrating Large-Scale DB2 Databases to PostgreSQL
Date: Wed, 15 Oct 2025 22:04:09 -0400
Message-ID: <CANzqJaDV559WS7oRupSBrRgLdArD0p53NX2EQA7B6R-Ke-yUog@mail.gmail.com> (raw)
In-Reply-To: <CAGc_7HnBQZva+cnnm_=_pWSmrS6v217FPW7b8DC43RKXi9jqvQ@mail.gmail.com>
References: <CAGc_7HnBQZva+cnnm_=_pWSmrS6v217FPW7b8DC43RKXi9jqvQ@mail.gmail.com>

On Wed, Oct 15, 2025 at 5:14 PM mahamood hussain <[email protected]>
wrote:

> Hi Team,
>
> We are in the process of migrating several DB2 databases to PostgreSQL,
> primarily to reduce the high licensing costs associated with DB2. These
> databases support retail applications (e.g., supermarkets and stores), and
> during peak hours, we anticipate over 100 concurrent connections.
> ------------------------------
> Current Database Profile:
>
>    -
>
>    Approximately 3,000 tables in total
>    -
>
>    Around 100 tables contain active data
>    -
>
>    Most tables have low data volume
>    -
>
>    A few large tables range from 10 GB to 2 TB
>    -
>
>    The largest table contains approximately 80 billion rows
>
> ------------------------------
> Migration Approach:
>
>    -
>
>    We are using Ispirer for code conversion (DB2 to PostgreSQL).
>    -
>
>    For data migration, we are evaluating Fivetran, but noted that it
>    relies on the COPY method for data loading.
>
> ------------------------------
> Questions & Areas Where We Need Guidance:
>
>    1.
>
>    Is Fivetran a suitable option for migrating very large datasets (e.g.,
>    tables with 80+ billion rows)?
>    2.
>
>    Are there any reliable open-source tools for DB2 to PostgreSQL data
>    migration that we can use internally, without needing to invest in a tool
>    like Fivetran?
>    3.
>
>    Are there more scalable or efficient alternatives for both the initial
>    load and ongoing/incremental sync to PostgreSQL?
>
> ------------------------------
> Additional Input Requested:
>
>    -
>
>    What are the key best practices (Do’s and Don’ts) to keep in mind
>    during a large-scale DB2 → PostgreSQL migration?
>
>
Since this is retail, I bet the only two statements that ever touch the
biggest tables are INSERT and SELECT.

In a similar situation (Oracle to PG, but that should not make a
difference), I created per-month views on the source side, and then
exported them to CSV files, which then got COPY-loaded into Postgresql.

For each table, I tracked in a spreadsheet which month-csv files had been
exported, compressed. scp'd, loaded.and indexed.  Iteratively, I developed
a pretty slick "parallel assembly line" process of cron jobs of continually
looping shell scripts that exported views to CSV files, compressed CSV
files who's export had completed, and then scp'd the files once the CSV
files had finished compressing.  On the VM running PG, a cron job running a
continuously-looping shell script would pick up successfully transferred
files then decompress and load them into the appropriate table and create
secondary indices as soon as a file finished being uploaded.  *It was quite
fast*,

On cutover day, I just had to move the current month's data from those
tables, along with the (relatively) small amount of data in tables that get
UPDATEd and then ran the CREATE INDEX and ALTER TABLE ... ADD FOREIGN KEY
statements.

Honestly, that 10GB table is small enough that I'd leave until cutover day.

Note that I had six hours to do the final data moves.


>    - Are there specific PostgreSQL settings or configurations we should
>    pay attention to for optimizing performance, especially for large datasets
>    and DB2-style workloads?
>
>
What's a DB2-style workload?


>
>
> ------------------------------
>
> We are keen to ensure performance
>
The default autovacuum settings are too conservative IMO, so I set them
thusly:
autovacuum_analyze_scale_factor = 0.03
autovacuum_vacuum_scale_factor = 0.03
autovacuum_vacuum_insert_scale_factor = 0.03
default_statistics_target = 5000

And the memory configs for a system with 128GB RAM
effective_cache_size = 108GB # RAM less room for Linux & Carbon Black
shared_buffers = 32GB       # 25% of RAM
work_mem = 164MB            # ECS/100 (expected max connections)
maintenance_work_mem = 8GB

Your mileage will vary.

> , data integrity,
>

Build the PG instance with checksums enabled.  It's 2025; the computational
overhead is minuscule.
If you use Foreign Key constraints, verify that supporting indices exist.
They probably already do exist, but can't hurt to check...

PgBackRest is my go-to backup and restore program.  Multi-threaded, with
automatic PITR, compression and encryption.

and scalability throughout this migration. Any insights—particularly from
> those with experience in similar large-scale PostgreSQL
> implementations—would be highly appreciated.
>

You'll probably want to partition those big tables. Note, though, that PG
requires that the partition field *be part of the PK*.

If your existing PKs have transaction_date, invoice_date, etc as part of
the PK, then hurray; your job is easy.  If not (which is likely the case if
the PK is synthetic), then I'd partition by some number PK values.  Every
10,000,000 PK values or something.  I'd study the distribution patterns of
the data.  *For example*, if there is lower volume in January through
March, and higher volume in summer and December, then maybe partition every
7,000,000 PK values in the typically slow months, and 14,000,000 PK values
in the busy months.  Gaps between PK values due to sequence caching will
also affect that number.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Guidance Requested: Migrating Large-Scale DB2 Databases to PostgreSQL
  In-Reply-To: <CANzqJaDV559WS7oRupSBrRgLdArD0p53NX2EQA7B6R-Ke-yUog@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox