Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v9DLv-00Cft0-8Z for pgsql-admin@arkaria.postgresql.org; Thu, 16 Oct 2025 02:04:26 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1v9DLt-009MdG-V0 for pgsql-admin@arkaria.postgresql.org; Thu, 16 Oct 2025 02:04:24 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v9DLt-009Md7-B1 for pgsql-admin@lists.postgresql.org; Thu, 16 Oct 2025 02:04:24 +0000 Received: from mail-oa1-x30.google.com ([2001:4860:4864:20::30]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1v9DLq-0020QQ-01 for pgsql-admin@lists.postgresql.org; Thu, 16 Oct 2025 02:04:23 +0000 Received: by mail-oa1-x30.google.com with SMTP id 586e51a60fabf-3c9660aaa13so223387fac.1 for ; Wed, 15 Oct 2025 19:04:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1760580261; x=1761185061; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=v6mbGa2n25wCOrfc/lMmN/ggvWUXBPunoOW0FlDISV4=; b=aiVK16qN7HcScSBmHcObLRruM0TI2DKTdW5bOzuG2Cd+xRpYo4twOQeno3RkgFd4ll UWFniQWyr1Yrrnw4Y6+VTT3Wmrl4igl9qKBXFGD24du0HB2boKKHI5YByQVMTK2QgP47 bwJBS83VXLuWHZOemmovEiLguS2IU/TT2vokh4graahFX8veq2VBSZdEXJjUs995zJcP OL3gAje28NRJuKILAp1HR7O0Yh0Yv6zhjc+h4MTOJ5Dd7ehQuyzY2SA/+6aHloRTgItU KWHThBoompFEiRV1wK5GQrSYGEOS9HCxyOYWGE/E340Gygoe9ZijmUPjDEmMwlq4NbGr 9dzg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1760580261; x=1761185061; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=v6mbGa2n25wCOrfc/lMmN/ggvWUXBPunoOW0FlDISV4=; b=n6rd9utnYilPBLc/lLXiGXbZLk9U/hWVyHilXaOXRvS/2nZiJ6lcdoTkW6EKdUDJqR 1BdLif2gPLwqrIuf8+LOUGBJXq1LbFduxwv72s1PGW8tR5NUzf+0JfWI26oCdsHI+O9k OdUhlAOYH5iyn01eSywSww2QCJVyE93j6AgwT7Sm05Hpk2dO+kuqRcMQE4I0JbPQ+i0F CgR1wY+cc5bB5aczKQJ1VA9/aOcoT25D1hfuka886JATr+vycsPAQBjaIamvVGOgAe0o n2GJMATOrQygI7A2Sl2lqd7PsTlX/RNxuBloJPy8jvMKvMW0DiH3U9mqrX5UbJnz4BdM zgGA== X-Gm-Message-State: AOJu0YyIXqBl+ihoJdgwUXfI983Mtm/blHOw3GJf3kQiQ5l2+OD36cI2 2J44PVdpHa07607rJNQsNqN7Tn5vuLi3A7+XKr5Ak0RdQIgpOGOJ/wPdI2/9Z/qdl2C0o3vryGe +zwbys7eNOeXSpiqC8bh9g3+s9hUZ6P12C/bH X-Gm-Gg: ASbGnctkjXaNDlTmgF/ZM/syNG70m3irphH4sFuEprGWezeJpGps0L6MR9me/fK0/dA ezNKmRfPUm5IQ5jLigldBKXPpdaUu25hjxl0XGsY8DCU5aVOQpgSFV+BHOOULraRIgIVd31PMef IQbb6kEPqK8KJBHR1/yw0YBDo3uBdWRzt71cJgmbxa54W53sLPIqAu0eNocf14PQcaW6jkSvd73 v+Vyw01YfJpMGFu0QVTXlh45pwehPiG3GrzUEmxFq+FdviCXvmURPUVHzxZDE8Nyso0i1CH X-Google-Smtp-Source: AGHT+IFOJT3DiM3EkTZzcW1W1PxJn08QB8EsKa4ccG5hNwpr+lbf/AWs4q3KRYfnpR5sCiN2yB4IDDWBmxr6dfhMcjI= X-Received: by 2002:a05:6870:ff84:b0:3c9:69fd:4515 with SMTP id 586e51a60fabf-3c969fd48d8mr974422fac.43.1760580260817; Wed, 15 Oct 2025 19:04:20 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ron Johnson Date: Wed, 15 Oct 2025 22:04:09 -0400 X-Gm-Features: AS18NWArucu-yVX7LRJ0HONbCBikdNLJbLD5eWZhpVGdrzW0XQf3tT9BOvc9uXU Message-ID: Subject: Re: Guidance Requested: Migrating Large-Scale DB2 Databases to PostgreSQL To: Pgsql-admin Content-Type: multipart/alternative; boundary="0000000000000b718406413d09b3" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000000b718406413d09b3 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Oct 15, 2025 at 5:14=E2=80=AFPM mahamood hussain wrote: > Hi Team, > > We are in the process of migrating several DB2 databases to PostgreSQL, > primarily to reduce the high licensing costs associated with DB2. These > databases support retail applications (e.g., supermarkets and stores), an= d > during peak hours, we anticipate over 100 concurrent connections. > ------------------------------ > Current Database Profile: > > - > > Approximately 3,000 tables in total > - > > Around 100 tables contain active data > - > > Most tables have low data volume > - > > A few large tables range from 10 GB to 2 TB > - > > The largest table contains approximately 80 billion rows > > ------------------------------ > Migration Approach: > > - > > We are using Ispirer for code conversion (DB2 to PostgreSQL). > - > > For data migration, we are evaluating Fivetran, but noted that it > relies on the COPY method for data loading. > > ------------------------------ > Questions & Areas Where We Need Guidance: > > 1. > > Is Fivetran a suitable option for migrating very large datasets (e.g., > tables with 80+ billion rows)? > 2. > > Are there any reliable open-source tools for DB2 to PostgreSQL data > migration that we can use internally, without needing to invest in a t= ool > like Fivetran? > 3. > > Are there more scalable or efficient alternatives for both the initial > load and ongoing/incremental sync to PostgreSQL? > > ------------------------------ > Additional Input Requested: > > - > > What are the key best practices (Do=E2=80=99s and Don=E2=80=99ts) to k= eep in mind > during a large-scale DB2 =E2=86=92 PostgreSQL migration? > > Since this is retail, I bet the only two statements that ever touch the biggest tables are INSERT and SELECT. In a similar situation (Oracle to PG, but that should not make a difference), I created per-month views on the source side, and then exported them to CSV files, which then got COPY-loaded into Postgresql. For each table, I tracked in a spreadsheet which month-csv files had been exported, compressed. scp'd, loaded.and indexed. Iteratively, I developed a pretty slick "parallel assembly line" process of cron jobs of continually looping shell scripts that exported views to CSV files, compressed CSV files who's export had completed, and then scp'd the files once the CSV files had finished compressing. On the VM running PG, a cron job running a continuously-looping shell script would pick up successfully transferred files then decompress and load them into the appropriate table and create secondary indices as soon as a file finished being uploaded. *It was quite fast*, On cutover day, I just had to move the current month's data from those tables, along with the (relatively) small amount of data in tables that get UPDATEd and then ran the CREATE INDEX and ALTER TABLE ... ADD FOREIGN KEY statements. Honestly, that 10GB table is small enough that I'd leave until cutover day. Note that I had six hours to do the final data moves. > - Are there specific PostgreSQL settings or configurations we should > pay attention to for optimizing performance, especially for large data= sets > and DB2-style workloads? > > What's a DB2-style workload? > > > ------------------------------ > > We are keen to ensure performance > The default autovacuum settings are too conservative IMO, so I set them thusly: autovacuum_analyze_scale_factor =3D 0.03 autovacuum_vacuum_scale_factor =3D 0.03 autovacuum_vacuum_insert_scale_factor =3D 0.03 default_statistics_target =3D 5000 And the memory configs for a system with 128GB RAM effective_cache_size =3D 108GB # RAM less room for Linux & Carbon Black shared_buffers =3D 32GB # 25% of RAM work_mem =3D 164MB # ECS/100 (expected max connections) maintenance_work_mem =3D 8GB Your mileage will vary. > , data integrity, > Build the PG instance with checksums enabled. It's 2025; the computational overhead is minuscule. If you use Foreign Key constraints, verify that supporting indices exist. They probably already do exist, but can't hurt to check... PgBackRest is my go-to backup and restore program. Multi-threaded, with automatic PITR, compression and encryption. and scalability throughout this migration. Any insights=E2=80=94particularl= y from > those with experience in similar large-scale PostgreSQL > implementations=E2=80=94would be highly appreciated. > You'll probably want to partition those big tables. Note, though, that PG requires that the partition field *be part of the PK*. If your existing PKs have transaction_date, invoice_date, etc as part of the PK, then hurray; your job is easy. If not (which is likely the case if the PK is synthetic), then I'd partition by some number PK values. Every 10,000,000 PK values or something. I'd study the distribution patterns of the data. *For example*, if there is lower volume in January through March, and higher volume in summer and December, then maybe partition every 7,000,000 PK values in the typically slow months, and 14,000,000 PK values in the busy months. Gaps between PK values due to sequence caching will also affect that number. --=20 Death to , and butter sauce. Don't boil me, I'm still alive. lobster! --0000000000000b718406413d09b3 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Wed, Oct 15, 2025 at 5:14=E2=80=AFPM m= ahamood hussain <hussain.ieg@gm= ail.com> wrote:

Hi Team,

We are in the process of migr= ating several DB2 databases to PostgreSQL, primarily to reduce the high lic= ensing costs associated with DB2. These databases support retail applicatio= ns (e.g., supermarkets and stores), and during peak hours, we anticipate ov= er 100 concurrent connections.


Current Database Profile:

  • Approximately 3,000 tables in total

  • Around 100 tables contain active data

  • Most tables have low data volume

  • A few large tables range from 10 GB to 2 TB

  • The largest table contains approximately 80 billion rows


Migration Approach:<= /h3>
  • We are using Ispirer for code conversion (DB2 to PostgreSQL).

  • For data migration, we are evaluating Fivetran, but noted that it relies= on the COPY method for data loading.


Questions & Areas Where= We Need Guidance:

  1. Is Fivetran a suitable option for migrating very large datasets (e.g., t= ables with 80+ billion rows)?

  2. Are there any reliable open-source tools for DB2 to PostgreSQL data migr= ation that we can use internally, without needing to invest in a tool like = Fivetran?

  3. Are there more scalable or efficient alternatives for both the initial l= oad and ongoing/incremental sync to PostgreSQL?


Additional Input Requested:=

  • What are the key best practices (Do=E2=80=99s and Don=E2=80=99ts) to kee= p in mind during a large-scale DB2 =E2=86=92 PostgreSQL migration?

  • =

Since this is retai= l, I bet the only two statements that ever touch the biggest tables are INS= ERT and SELECT.

In a similar situation (Oracle to = PG, but that should not make a difference), I created per-month views on th= e source side, and then exported them to CSV files, which then got COPY-loa= ded into Postgresql.

For each table, I tracked in = a spreadsheet which month-csv files had been exported, compressed. scp'= d, loaded.and indexed.=C2=A0 Iteratively, I developed a pretty slick "= parallel assembly line" process of cron jobs of continually looping sh= ell scripts that exported views to CSV files, compressed CSV files who'= s export had completed, and then scp'd the files once the CSV files had= finished compressing.=C2=A0 On the VM running PG, a cron job running a con= tinuously-looping shell script would pick up successfully transferred files= then decompress and load them into the appropriate table and create second= ary indices as soon as a file finished being uploaded.=C2=A0 It was quit= e fast,=C2=A0

On cutover day, I just had to mo= ve the current month's data from those tables, along with the (relative= ly) small amount of data in tables that get UPDATEd and then ran the CREATE= INDEX and ALTER TABLE ... ADD FOREIGN KEY statements.

=
Honestly, that 10GB table is small=C2=A0enough that I'd leave unti= l cutover day.

Note that I had six hours to do the= final data moves.

  • Ar= e there specific PostgreSQL settings or configurations we should pay attent= ion to for optimizing performance, especially for large datasets and DB2-st= yle workloads?

= What's a DB2-style workload?
=C2=A0

We are keen to ensure performance

The default autovacuum settings are too conservative IMO, so I se= t them thusly:
autovacuum_analyze_scale_= factor =3D 0.03
autovacuum_vacuum_scale_factor =3D 0.03
autovacuum_va= cuum_insert_scale_factor =3D 0.03
default_statistics_target =3D 5000
=

And the memory configs for a=C2=A0system w= ith 128GB RAM
effective_cache_size =3D 1= 08GB # RAM less room for Linux & Carbon Black
shared_buffers =3D 32G= B =C2=A0 =C2=A0 =C2=A0 # 25% of RAM
work_mem =3D 164MB =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0# ECS/100 (expected max connections)
maintenance= _work_mem =3D 8GB

Your mileage will vary.

, data integrity,


Build the PG instance with checksums enabled.= =C2=A0 It's 2025; the computational overhead is minuscule.
If= you use Foreign Key constraints, verify that supporting indices exist.=C2= =A0 They probably already do exist, but can't hurt to check...

PgBackRest is my go-to backup and restore program.=C2=A0 M= ulti-threaded, with automatic PITR, compression and encryption.
<= br>

and scalability throughout this migr= ation. Any insights=E2=80=94particularly from those with experience in simi= lar large-scale PostgreSQL implementations=E2=80=94would be highly apprecia= ted.


You'll proba= bly want to partition those big tables. Note, though, that PG requires that= the partition field be part of the PK.

If = your existing PKs have transaction_date, invoice_date, etc as part of the P= K, then hurray; your job is easy.=C2=A0 If not (which is likely the case if= the PK is synthetic), then I'd partition by some number PK values.=C2= =A0 Every 10,000,000 PK values or something.=C2=A0 I'd study the distri= bution patterns of the data.=C2=A0 For example,=C2=A0if there is low= er volume in January through March, and higher volume in summer and Decembe= r, then maybe partition every 7,000,000 PK values in the typically slow mon= ths, and 14,000,000 PK values in the busy months.=C2=A0 Gaps between PK val= ues due to sequence caching will also affect that number.
<= br>
--
Death to <Redacted>, a= nd butter sauce.
Don't boil me, I'm still alive.
&= lt;Redacted> lobster!
--0000000000000b718406413d09b3--