Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uL5Ro-000WHH-6j for pgsql-admin@arkaria.postgresql.org; Fri, 30 May 2025 19:31:20 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uL5Rl-005Jbu-Et for pgsql-admin@arkaria.postgresql.org; Fri, 30 May 2025 19:31:17 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uL5Rl-005JbL-0l for pgsql-admin@lists.postgresql.org; Fri, 30 May 2025 19:31:17 +0000 Received: from mail-oo1-xc29.google.com ([2607:f8b0:4864:20::c29]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uL5Ri-000vyt-1B for pgsql-admin@lists.postgresql.org; Fri, 30 May 2025 19:31:16 +0000 Received: by mail-oo1-xc29.google.com with SMTP id 006d021491bc7-6065796762fso684760eaf.3 for ; Fri, 30 May 2025 12:31:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1748633473; x=1749238273; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=P8udDkxaHKJKmbXSHVtFY8ebsThC1SMa5ikKFM0NBw0=; b=c0jknqrV1GGijrBSmL1Js3gtxcI8Cd16PBAtIa29CTOYTVvFysa9gcMeU+f7CCMp1C uvB815IFx6N0jFXKvpdfxquZv4rI/H+DweUnNswetm+qr/+UKagrKYGknnmj0ejxLpxy tYP6qgfp7HqrmEKV4Q9G4IrYeSjLbYFCSHa6tiXNv3OOKt4dodNpSV+iuWS+aSMnBFcn 8HnTDtDS0AtZM9U/8mtq7u5QfeCsMoO/Srnu7bF9xKh2JCUmifhAFJlHjvFB5nB1uOuz 7L52+EeGA4SHocbSRXGKT5tAFESw+hQqcsQIq+hkThE4zNNaoUvqaAS743xHh9RmJAhw zdfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1748633473; x=1749238273; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=P8udDkxaHKJKmbXSHVtFY8ebsThC1SMa5ikKFM0NBw0=; b=udO2MIMMfkiujQc45m4sTmxMvi0p8lMeKzPKoEu3UNiteBin36siNfaAh7CTMxtv6O iyoXUCAT3HZRPknVexWMP0O645cBl4JDyk4WuAqRtWb3I2lSct5jlz297j7ADf7TjYng uY7HH6uiq1PB4K0XNidXoFv71CtlN8PAgnpMplm/hoXynO9fGkGxdQ4Y7QICYUGdiWWB SAG8ZFZ1hnyX/pFuDd/GjJ6u/vkfqEdeCB0TauLLxQIUfObPFX1GyVJVyQFbGvXy3OQB J3T3xon2VMm1M5voHsL7RM455mG6lh6eC0uViIURka2Le3vfouDXNwDJwZPvehby9A4t ewdQ== X-Gm-Message-State: AOJu0YwpLhccRAJgBoRDOAdmw6D21xH6bjDNydsR2l89a2ZAgW8TGEMC pjoLQbAtQpADW3q1Igkreq0sDx3W049isuNUkLPPSkwqv1i1nBfxyOW1NR5RxKAi98wNrv7P8Kl 9A86aUFLLMW2Nph1YkMOMXc2sgI9dad8= X-Gm-Gg: ASbGncvg3xzgmJQCqMuNCWkDDQS6MJbvyWJggYYU3eHnFolSGbZRmiJGcdxOuykXdQD s8yR3AgR0sFBKGuRM9NwLK1ORcKZkeW4a9iCk7dMx54anYx7Myfr0QAz9yb/Tjzs/Sfe2L6OE1h ulOzRQNIhJCJzC4wqx0c+e+BIfrNCjgI8WCA== X-Google-Smtp-Source: AGHT+IE7lIhhKmCgvUe/LHmDFMD6PH+DbH/p0nKyb0FlBTD61QIIByip1OW0BVsNysLQYxGmevUzqxVgxPi8TyZSPHg= X-Received: by 2002:a05:6870:a922:b0:2d5:1725:f529 with SMTP id 586e51a60fabf-2e92a49a7a4mr2019260fac.27.1748633472769; Fri, 30 May 2025 12:31:12 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ron Johnson Date: Fri, 30 May 2025 15:31:01 -0400 X-Gm-Features: AX0GCFunbMY4BNbSe-3ZAH5NPXc_0ksPHviRpSbAADtyM_mmBJ2YYe2BafuyaPM Message-ID: Subject: Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL To: Andy Hartman Cc: Pgsql-admin Content-Type: multipart/alternative; boundary="000000000000fca5c906365f74d5" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000fca5c906365f74d5 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable That's an unanswerable question, as I would not use Windows. =F0=9F=98=81 Seriously though, since it's an image-heavy database full of PDF and TIFF files, I'd do what I did on Linux when needing to migrate/upgrade a 6TB (including indices) db from PG 9.6 to PG 14, and took four hours: pg_dump -Z1 --jobs=3D16 On Fri, May 30, 2025 at 2:39=E2=80=AFPM Andy Hartman wrote: > What would you use for backup if PG hosted on Windows > > On Fri, May 30, 2025 at 2:10=E2=80=AFPM Ron Johnson > wrote: > >> Hmm... that was a few years ago, back when v12 was new. It took about = a >> month (mainly because they didn't want me running exports during "office >> hours"). >> >> There were 120 INSERT & SELECT (no UPDATE or DELETE) tables, so I was >> able to add indices on date columns, create by-month views. (We migrate= d >> the dozen or so *relatively* small UPDATE tables on cut-over day. On >> that same day, I migrated the current month and the previous month's dat= a >> in those 120 tables. >> >> I made separate cron jobs to: >> - export views from Oracle into COPY-style tab-separated flat files, >> - lz4-compress views that had finished exporting, and >> - scp files that were finished compressing, to an AWS EC2 VM. >> >> These jobs pipelined, so there was always a job exporting, always a job >> ready to compress tsv files, and another job ready to scp the lz4 files. >> When there was nothing for a step to do, the job would sleep for a coupl= e >> of minutes, then check if there was more work to do. >> >> On the AWS EC2 VM, a different cron job waited for files to finish >> transferring, then loaded them into the correct table. Just like with th= e >> source host jobs, the "load" job would sleep a bit and then check for mo= re >> work. I manually applied Indices. >> >> The AWS RDS PG12 database was about 4TB. Snapshots were handled by AWS. >> If this had been one of my on-prem systems, I'd have used pgbackrest. >> (pgbackrest is impressively fast: takes good advantage of PG's 1GB file >> max, and globs "small" files into one big file.) >> >> On Fri, May 30, 2025 at 12:15=E2=80=AFPM Andy Hartman >> wrote: >> >>> what was the duration start to finish of the migration of the 6tb of >>> data. then what do you use for a quick backup after archived PG data >>> >>> Thanks. >>> >>> On Fri, May 30, 2025 at 11:29=E2=80=AFAM Ron Johnson >>> wrote: >>> >>>> On Fri, May 30, 2025 at 3:51=E2=80=AFAM Motog Plus wrote: >>>> >>>>> Hi Team, >>>>> >>>>> We are currently planning a data archival initiative for our >>>>> production PostgreSQL databases and would appreciate suggestions or >>>>> insights from the community regarding best practices and proven appro= aches. >>>>> >>>>> **Scenario:** >>>>> - We have a few large tables (several hundred million rows) where we >>>>> want to archive historical data (e.g., older than 1 year). >>>>> - The archived data should be moved to a separate PostgreSQL database >>>>> (on a same or different server). >>>>> - Our goals are: efficient data movement, minimal downtime, and safe >>>>> deletion from the source after successful archival. >>>>> >>>>> - PostgreSQL version: 15.12 >>>>> - Both source and target databases are PostgreSQL. >>>>> >>>>> We explored using `COPY TO` and `COPY FROM` with CSV files, uploaded >>>>> to a SharePoint or similar storage system. However, our infrastructur= e team >>>>> raised concerns around the computational load of large CSV processing= and >>>>> potential security implications with file transfers. >>>>> >>>>> We=E2=80=99d like to understand: >>>>> - What approaches have worked well for you in practice? >>>>> >>>> >>>> This is how I migrated 6TB of data from an Oracle database to >>>> Postgresql, and then implemented quarterly archiving of the PG databas= e: >>>> - COPY FROM (SELECT * FROM live_table WHERE date_fld in >>>> some_manageable_date_range) TO STDOUT. >>>> - Compress >>>> - scp >>>> - COPY TO archive_table. >>>> - Index >>>> - DELETE FROM live_table WHERE date_fld in some_manageable_date_range >>>> (This I only did in the PG archive process >>>> >>>> (Naturally, the Oracle migration used Oracle-specific commands.) >>>> >>>> - Are there specific tools or strategies you=E2=80=99d recommend for o= ngoing >>>>> archival? >>>>> >>>> >>>> I write generic bash loops to which you pass an array that contains th= e >>>> table name, PK, date column and date range. >>>> >>>> Given a list of tables, it did the COPY FROM, lz4 and scp. Once that >>>> finished successfully, another script dropped archive indices on the >>>> current table, COPY TO and CREATE INDEX statements. A third script di= d the >>>> deletes. >>>> >>>> This works even when the live database tables are all connected via >>>> FK. You just need to carefully order the tables in your script. >>>> >>>> >>>>> - Any performance or consistency issues we should watch out for? >>>>> >>>> >>>> My rules for scripting are "bite-sized pieces" and "check those return >>>> codes!". >>>> >>>> >>>>> Your insights or any relevant documentation/pointers would be >>>>> immensely helpful. >>>>> >>>> >>>> Index support uber alles. When deleting from a table which relies on = a >>>> foreign key link to a table which _does_ have a date field, don't hesi= tate >>>> to join on that table. >>>> >>>> And DELETE of bite-sized chunks is faster than people give it credit >>>> for. >>>> >>>> -- >>>> Death to , and butter sauce. >>>> Don't boil me, I'm still alive. >>>> lobster! >>>> >>> >> >> -- >> Death to , and butter sauce. >> Don't boil me, I'm still alive. >> lobster! >> > --=20 Death to , and butter sauce. Don't boil me, I'm still alive. lobster! --000000000000fca5c906365f74d5 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
That's an unanswerable question, as I would not use Wi= ndows.=C2=A0=C2=A0=F0=9F=98=81

Seriously though, since i= t's an image-heavy database full of PDF and TIFF files, I'd do what= I did on Linux when needing to migrate/upgrade a 6TB (including indices) d= b from PG 9.6 to PG 14, and took four hours:
pg_dump -Z1 --jobs= =3D16


On Fri, May 30, 2025 at 2:3= 9=E2=80=AFPM Andy Hartman <ha= rtman60home@gmail.com> wrote:
What would you use for backup if PG h= osted on Windows

On Fri, May 30, 2025 at 2:10=E2=80=AFPM Ron Johnson <ronljohnsonjr@gma= il.com> wrote:
Hmm... that was a few years ago, back when v12 = was new.=C2=A0 It took about=C2=A0 a month (mainly because they didn't = want=C2=A0me running exports during "office hours").
There were 120 INSERT & SELECT (no UPDATE or DELETE) table= s, so I was able to add indices on date columns, create by-month views.=C2= =A0 (We migrated the dozen or so relatively small UPDATE tables on c= ut-over day.=C2=A0 On that same day, I migrated the current month and the p= revious month's data in those 120 tables.

I ma= de separate cron jobs to:
- export views from Oracle into COPY-st= yle tab-separated flat files,=C2=A0
- lz4-compress views that had= finished exporting, and
- scp files that were finished compressi= ng, to an AWS EC2 VM.

These jobs pipelined, so the= re was always a job exporting, always a job ready to compress tsv files, an= d another job ready to scp the lz4 files.=C2=A0 When there was nothing for = a step to do, the job would sleep for a couple of minutes, then check if th= ere was more work to do.

On the AWS EC2 VM, a diff= erent cron job waited for files to finish transferring, then loaded them in= to the correct table. Just like with the source host jobs, the "load&q= uot; job would sleep a bit and then check for more work. I manually applied= Indices.

The AWS RDS PG12 database was about 4TB.= =C2=A0 Snapshots were handled by AWS.=C2=A0 If this had been one of my on-p= rem systems, I'd have used pgbackrest.=C2=A0 (pgbackrest is impressivel= y fast: takes good advantage of PG's 1GB file max, and globs "smal= l" files into one big file.)

On Fri, May 30, 2025 at 12:15=E2=80=AFPM A= ndy Hartman <hartman60home@gmail.com> wrote:
what was the duration start to fi= nish of the migration of the 6tb of data. then what do you use for a quick = backup after archived PG data=C2=A0

Thanks.

On Fri, May 30, 2025 at 1= 1:29=E2=80=AFAM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
= On Fri, May 30, 2025 at 3:51=E2=80=AFAM Motog Plus <mplus7535@gmail.com> wrote:
Hi Team,

We are currently planning a data archival initiative for our production Po= stgreSQL databases and would appreciate suggestions or insights from the co= mmunity regarding best practices and proven approaches.

**Scenario:**
- We h= ave a few large tables (several hundred million rows) where we want to arch= ive historical data (e.g., older than 1 year).
- The= archived data should be moved to a separate PostgreSQL database (on a same= or different server).
- Our goals are: efficient da= ta movement, minimal downtime, and safe deletion from the source after succ= essful archival.

- Postg= reSQL version: 15.12
- Both source and target databa= ses are PostgreSQL.

We e= xplored using `COPY TO` and `COPY FROM` with CSV files, uploaded to a Share= Point or similar storage system. However, our infrastructure team raised co= ncerns around the computational load of large CSV processing and potential = security implications with file transfers.

We=E2=80=99d like to understand:
= - What approaches have worked well for you in practice?

This is how I migrated 6TB of data from an Oracle = database to Postgresql, and then implemented quarterly archiving of the PG = database:
- COPY FROM (SELECT * FROM live_table WHERE date_fld in= some_manageable_date_range) TO STDOUT.
- Compress
- sc= p
- COPY TO archive_table.
- Index
- DELETE F= ROM live_table WHERE date_fld in some_manageable_date_range=C2=A0 (This I o= nly did in the PG archive process
=C2=A0
(Naturally, th= e Oracle migration used Oracle-specific commands.)

- Are there specific tools or strategies you=E2=80=99d recommend for= ongoing archival?

I write gene= ric bash loops to which you pass an array that contains the table name,=C2= =A0PK,=C2=A0date column and date range.

Given a li= st of tables, it did the COPY FROM, lz4 and scp.=C2=A0 Once that finished s= uccessfully, another script dropped=C2=A0archive indices on the current tab= le, COPY TO and CREATE INDEX statements.=C2=A0 A third script did the delet= es.

This works even when the live database tables = are all connected via FK.=C2=A0 You just need to carefully order the tables= in your script.
=C2=A0
- Any performance or cons= istency issues we should watch out for?

My rules=C2=A0for=C2=A0scripting are "bite-sized pieces"= and "check those return codes!".
=C2=A0
Your insights or any relevant documentation/pointers would be immensely h= elpful.
=C2=A0
Index support uber al= les.=C2=A0 When deleting from a table which relies on a foreign key link to= a table which _does_ have a date field, don't hesitate to join on that= table.

And DELETE of bite-sized chunks is faster = than people give it credit for.

--
Death to <Redacted>, and butter sauce.
Don&#= 39;t boil me, I'm still alive.
<Redacted> lobster!


--
Death to <Redacted>, and butter sauce.Don't boil me, I'm still alive.
<Redacted> lobs= ter!


--
Death to <Redacted>, and butter sauce.Don't boil me, I'm still alive.
<Redacted> lobs= ter!
--000000000000fca5c906365f74d5--