Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Ron Johnson <[email protected]>
To: Pgsql-admin <[email protected]>
Subject: Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL
Date: Fri, 30 May 2025 11:29:34 -0400
Message-ID: <CANzqJaA6B7XCyqxXFfdZMYTN5GNagHBdgEzbqwcti16N9wfcDA@mail.gmail.com> (raw)
In-Reply-To: <CAL5GnivMgBgRdY9YTLmAQKQa=TQVTRwghiGovK6Q6XxScdGOzg@mail.gmail.com>
References: <CAL5GnivMgBgRdY9YTLmAQKQa=TQVTRwghiGovK6Q6XxScdGOzg@mail.gmail.com>

On Fri, May 30, 2025 at 3:51 AM Motog Plus <[email protected]> wrote:

> Hi Team,
>
> We are currently planning a data archival initiative for our production
> PostgreSQL databases and would appreciate suggestions or insights from the
> community regarding best practices and proven approaches.
>
> **Scenario:**
> - We have a few large tables (several hundred million rows) where we want
> to archive historical data (e.g., older than 1 year).
> - The archived data should be moved to a separate PostgreSQL database (on
> a same or different server).
> - Our goals are: efficient data movement, minimal downtime, and safe
> deletion from the source after successful archival.
>
> - PostgreSQL version: 15.12
> - Both source and target databases are PostgreSQL.
>
> We explored using `COPY TO` and `COPY FROM` with CSV files, uploaded to a
> SharePoint or similar storage system. However, our infrastructure team
> raised concerns around the computational load of large CSV processing and
> potential security implications with file transfers.
>
> We’d like to understand:
> - What approaches have worked well for you in practice?
>

This is how I migrated 6TB of data from an Oracle database to Postgresql,
and then implemented quarterly archiving of the PG database:
- COPY FROM (SELECT * FROM live_table WHERE date_fld in
some_manageable_date_range) TO STDOUT.
- Compress
- scp
- COPY TO archive_table.
- Index
- DELETE FROM live_table WHERE date_fld in some_manageable_date_range
(This I only did in the PG archive process

(Naturally, the Oracle migration used Oracle-specific commands.)

- Are there specific tools or strategies you’d recommend for ongoing
> archival?
>

I write generic bash loops to which you pass an array that contains the
table name, PK, date column and date range.

Given a list of tables, it did the COPY FROM, lz4 and scp.  Once that
finished successfully, another script dropped archive indices on the
current table, COPY TO and CREATE INDEX statements.  A third script did the
deletes.

This works even when the live database tables are all connected via FK.
You just need to carefully order the tables in your script.


> - Any performance or consistency issues we should watch out for?
>

My rules for scripting are "bite-sized pieces" and "check those return
codes!".


> Your insights or any relevant documentation/pointers would be immensely
> helpful.
>

Index support uber alles.  When deleting from a table which relies on a
foreign key link to a table which _does_ have a date field, don't hesitate
to join on that table.

And DELETE of bite-sized chunks is faster than people give it credit for.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL
  In-Reply-To: <CANzqJaA6B7XCyqxXFfdZMYTN5GNagHBdgEzbqwcti16N9wfcDA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox