MIME-Version: 1.0
References: 
 <CAL5GnivMgBgRdY9YTLmAQKQa=TQVTRwghiGovK6Q6XxScdGOzg@mail.gmail.com>
In-Reply-To: 
 <CAL5GnivMgBgRdY9YTLmAQKQa=TQVTRwghiGovK6Q6XxScdGOzg@mail.gmail.com>
From: Ron Johnson <ronljohnsonjr@gmail.com>
Date: Fri, 30 May 2025 11:29:34 -0400
Message-ID: 
 <CANzqJaA6B7XCyqxXFfdZMYTN5GNagHBdgEzbqwcti16N9wfcDA@mail.gmail.com>
Subject: Re: Seeking Suggestions for Best Practices: Archiving and Migrating
 Historical Data in PostgreSQL
To: Pgsql-admin <pgsql-admin@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="00000000000078c80806365c1598"
Archived-At: 
 <https://www.postgresql.org/message-id/CANzqJaA6B7XCyqxXFfdZMYTN5GNagHBdgEzbqwcti16N9wfcDA%40mail.gmail.com>
Precedence: bulk

--00000000000078c80806365c1598
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Fri, May 30, 2025 at 3:51 AM Motog Plus <mplus7535@gmail.com> wrote:

> Hi Team,
>
> We are currently planning a data archival initiative for our production
> PostgreSQL databases and would appreciate suggestions or insights from
> the community regarding best practices and proven approaches.
>
> **Scenario:**
> - We have a few large tables (several hundred million rows) where we want
> to archive historical data (e.g., older than 1 year).
> - The archived data should be moved to a separate PostgreSQL database (on
> the same or a different server).
> - Our goals are: efficient data movement, minimal downtime, and safe
> deletion from the source after successful archival.
>
> - PostgreSQL version: 15.12
> - Both source and target databases are PostgreSQL.
>
> We explored using `COPY TO` and `COPY FROM` with CSV files, uploaded to a
> SharePoint or similar storage system. However, our infrastructure team
> raised concerns around the computational load of large CSV processing and
> potential security implications with file transfers.
>
> We'd like to understand:
> - What approaches have worked well for you in practice?
>

This is how I migrated 6TB of data from an Oracle database to PostgreSQL,
and then implemented quarterly archiving of the PG database:
- COPY (SELECT * FROM live_table WHERE date_fld in
some_manageable_date_range) TO STDOUT.
- Compress
- scp
- COPY archive_table FROM STDIN.
- Index
- DELETE FROM live_table WHERE date_fld in some_manageable_date_range
(This I only did in the PG archive process.)

(Naturally, the Oracle migration used Oracle-specific commands.)
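
A minimal sketch of the export, compress and load steps above, assuming
psql and lz4 are on the PATH.  The database names, the half-open date
range and the function names are placeholders made up for illustration;
they are not from the original scripts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical connection targets; override them from the environment.
LIVE_DB=${LIVE_DB:-livedb}
ARCHIVE_DB=${ARCHIVE_DB:-archivedb}

# Build the export statement for one date slice [from, to).
export_sql() {
    local table=$1 date_col=$2 from=$3 to=$4
    local fmt="COPY (SELECT * FROM %s WHERE %s >= DATE '%s'"
    fmt+=" AND %s < DATE '%s') TO STDOUT"
    printf "$fmt" "$table" "$date_col" "$from" "$date_col" "$to"
}

# Export one slice from the live database and compress it.
# pipefail makes a psql failure fail the whole pipeline.
export_slice() {
    local table=$1 date_col=$2 from=$3 to=$4 out=$5
    psql -X -v ON_ERROR_STOP=1 -d "$LIVE_DB" \
         -c "$(export_sql "$table" "$date_col" "$from" "$to")" \
        | lz4 -c > "$out"
}

# Decompress a slice and load it into the archive table.
load_slice() {
    local archive_table=$1 in=$2
    lz4 -dc "$in" \
        | psql -X -v ON_ERROR_STOP=1 -d "$ARCHIVE_DB" \
               -c "COPY $archive_table FROM STDIN"
}
```

Between the two functions the file would be copied across with scp, for
example: scp live_table_2024q1.lz4 archivehost:/some/dir/ (host and path
are again placeholders).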

> - Are there specific tools or strategies you'd recommend for ongoing
> archival?
>

I wrote generic bash loops to which you pass an array that contains the
table name, PK, date column and date range.

Given a list of tables, the first script did the COPY ... TO STDOUT, lz4
and scp.  Once that finished successfully, a second script dropped the
indices on the archive copy of the current table, then ran the
COPY ... FROM STDIN and CREATE INDEX statements.  A third script did the
deletes.
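
One possible shape for such a loop, as a sketch only.  The table specs,
the "|" separator and the function names are assumptions, and
archive_one is a stand-in that just reports what it was asked to do
where a real script would export, compress and transfer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# One entry per table: name|primary key|date column|range start|range end.
# Every value here is a made-up placeholder.
TABLES=(
    "orders|order_id|order_date|2024-01-01|2024-04-01"
    "invoices|invoice_id|invoice_date|2024-01-01|2024-04-01"
)

# Stand-in for the real per-table work; it only prints its arguments.
archive_one() {
    local table=$1 pk=$2 date_col=$3 from=$4 to=$5
    echo "archive $table (pk=$pk) where $date_col in [$from, $to)"
}

# Walk the list in the order given and stop at the first failure.
archive_all() {
    local spec table pk date_col from to
    for spec in "$@"; do
        IFS='|' read -r table pk date_col from to <<< "$spec"
        archive_one "$table" "$pk" "$date_col" "$from" "$to" || {
            echo "FAILED on $table" >&2
            return 1
        }
    done
}
```

It would be called as: archive_all "${TABLES[@]}"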

This works even when the live database tables are all connected via FK.
You just need to carefully order the tables in your script.
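
As an illustration of the ordering, assuming plain FK constraints with
no ON DELETE CASCADE: rows are loaded parent-first into the archive and
deleted child-first from the live database.  The table names are
hypothetical.

```shell
#!/usr/bin/env bash

# Parent-first order, as needed when loading into the archive.
LOAD_ORDER=(customers orders order_lines)

# Print the arguments in reverse, one per line: child-first order,
# as needed when deleting from the live database.
reverse_order() {
    local i
    for (( i = $#; i >= 1; i-- )); do
        printf '%s\n' "${!i}"
    done
}
```

Calling reverse_order "${LOAD_ORDER[@]}" gives the order in which the
delete script would visit the tables.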


> - Any performance or consistency issues we should watch out for?
>

My rules for scripting are "bite-sized pieces" and "check those return
codes!".
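
A small helper in that spirit, as a sketch: it runs one step, logs the
outcome and hands the step's exit status back to the caller.  The name
run_step and the log format are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run one step, log success or failure to stderr, and return the
# step's exit status unchanged so the caller can act on it.
run_step() {
    local label=$1 rc=0
    shift
    "$@" || rc=$?
    if (( rc != 0 )); then
        echo "$(date '+%F %T') FAILED ($rc): $label" >&2
    else
        echo "$(date '+%F %T') ok: $label" >&2
    fi
    return "$rc"
}
```

With set -e in effect, a line such as
run_step "export orders" some_export_command
stops the script as soon as the step fails, so a later delete never runs
after a failed export or load.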


> Your insights or any relevant documentation/pointers would be immensely
> helpful.
>

Index support uber alles.  When deleting from a table which has no date
field of its own but has a foreign key link to a table which _does_ have
one, don't hesitate to join on that table.
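
In PostgreSQL such a join can be written with DELETE ... USING.  The
function below only prints the statement; all table and column names are
placeholders.  It assumes an index on the child's FK column and on the
parent's date column, since PostgreSQL does not create an index on the
referencing column by itself.

```shell
#!/usr/bin/env bash

# Print a DELETE that removes child rows whose parent row falls in
# the date slice [from, to).  The child has no date column itself.
delete_child_sql() {
    local child=$1 fk_col=$2 parent=$3 parent_pk=$4
    local date_col=$5 from=$6 to=$7
    cat <<SQL
DELETE FROM $child AS c
USING $parent AS p
WHERE c.$fk_col = p.$parent_pk
  AND p.$date_col >= DATE '$from'
  AND p.$date_col <  DATE '$to';
SQL
}
```

For example, delete_child_sql order_lines order_id orders order_id
order_date 2024-01-01 2024-04-01 prints the statement for one quarter,
to be run before the matching delete on the parent table.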

And DELETE of bite-sized chunks is faster than people give it credit for.
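
A sketch of one way to cut a large delete into bite-sized pieces: pass
the slice boundaries explicitly and emit one DELETE per adjacent pair.
The function name and the half-open ranges are assumptions made for
illustration.

```shell
#!/usr/bin/env bash

# Print one DELETE per adjacent pair of boundaries, each covering
# the half-open slice [boundary_n, boundary_n+1).
slice_deletes() {
    local table=$1 date_col=$2
    shift 2
    while (( $# >= 2 )); do
        printf "DELETE FROM %s WHERE %s >= DATE '%s' AND %s < DATE '%s';\n" \
            "$table" "$date_col" "$1" "$date_col" "$2"
        shift
    done
}
```

Running each printed statement in its own psql -c call means every slice
commits separately, so a failure part-way through leaves the earlier
slices done and the later ones untouched.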

--=20
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


--00000000000078c80806365c1598--