MIME-Version: 1.0
References: 
 <CAL5GnivMgBgRdY9YTLmAQKQa=TQVTRwghiGovK6Q6XxScdGOzg@mail.gmail.com>
 <CANzqJaA6B7XCyqxXFfdZMYTN5GNagHBdgEzbqwcti16N9wfcDA@mail.gmail.com>
In-Reply-To: 
 <CANzqJaA6B7XCyqxXFfdZMYTN5GNagHBdgEzbqwcti16N9wfcDA@mail.gmail.com>
From: Andy Hartman <hartman60home@gmail.com>
Date: Fri, 30 May 2025 12:14:52 -0400
Message-ID: 
 <CAEZv3cpESEGDUu-W5WSDo=LqORjk122YR7UOEdui6ujpTU-eAQ@mail.gmail.com>
Subject: Re: Seeking Suggestions for Best Practices: Archiving and Migrating
 Historical Data in PostgreSQL
To: Ron Johnson <ronljohnsonjr@gmail.com>
Cc: Pgsql-admin <pgsql-admin@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000007876bb06365cb727"
Archived-At: 
 <https://www.postgresql.org/message-id/CAEZv3cpESEGDUu-W5WSDo%3DLqORjk122YR7UOEdui6ujpTU-eAQ%40mail.gmail.com>
Precedence: bulk

--0000000000007876bb06365cb727
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

What was the start-to-finish duration of the 6TB migration?
And what do you use for a quick backup once the PG data has been archived?

Thanks.

On Fri, May 30, 2025 at 11:29 AM Ron Johnson <ronljohnsonjr@gmail.com>
wrote:

> On Fri, May 30, 2025 at 3:51 AM Motog Plus <mplus7535@gmail.com> wrote:
>
>> Hi Team,
>>
>> We are currently planning a data archival initiative for our production
>> PostgreSQL databases and would appreciate suggestions or insights from
>> the community regarding best practices and proven approaches.
>>
>> **Scenario:**
>> - We have a few large tables (several hundred million rows) where we
>> want to archive historical data (e.g., older than 1 year).
>> - The archived data should be moved to a separate PostgreSQL database
>> (on the same or a different server).
>> - Our goals are: efficient data movement, minimal downtime, and safe
>> deletion from the source after successful archival.
>>
>> - PostgreSQL version: 15.12
>> - Both source and target databases are PostgreSQL.
>>
>> We explored using `COPY TO` and `COPY FROM` with CSV files, uploaded to
>> a SharePoint or similar storage system. However, our infrastructure team
>> raised concerns around the computational load of large CSV processing
>> and potential security implications with file transfers.
>>
>> We'd like to understand:
>> - What approaches have worked well for you in practice?
>>
>
> This is how I migrated 6TB of data from an Oracle database to PostgreSQL,
> and then implemented quarterly archiving of the PG database:
> - COPY (SELECT * FROM live_table WHERE date_fld in
> some_manageable_date_range) TO STDOUT.
> - Compress
> - scp
> - COPY archive_table FROM STDIN.
> - Index
> - DELETE FROM live_table WHERE date_fld in some_manageable_date_range
> (I did this step only in the PG archive process.)
>
> (Naturally, the Oracle migration used Oracle-specific commands.)
>
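
The export, compress, scp and load steps above could be sketched in bash
roughly as follows. This is an illustration, not Ron's script: the
database names, host name and /tmp path are placeholders invented here,
the date range is written as a half-open interval, and it assumes psql,
lz4, scp and ssh are installed and that the remote login shell is bash.

```shell
#!/usr/bin/env bash
set -o pipefail   # a failing psql must fail the whole pipeline

# Placeholders, not from the thread: adjust to your environment.
SRC_DB=${SRC_DB:-live}
ARCHIVE_DB=${ARCHIVE_DB:-archive}
ARCHIVE_HOST=${ARCHIVE_HOST:-archive-host}

# Build the export statement for one date range [from, to).
export_sql() {
  local table=$1 date_col=$2 from=$3 to=$4
  printf "COPY (SELECT * FROM %s WHERE %s >= '%s' AND %s < '%s') TO STDOUT" \
    "$table" "$date_col" "$from" "$date_col" "$to"
}

# Build the matching load statement for the archive side.
load_sql() {
  printf "COPY %s FROM STDIN" "$1"
}

# Export one chunk, compress it, ship it, then load it on the archive
# host. Every step returns non-zero on failure so the caller can stop.
archive_chunk() {
  local table=$1 date_col=$2 from=$3 to=$4
  local file="/tmp/${table}_${from}.copy.lz4"
  local psql_opts="-X -v ON_ERROR_STOP=1"

  psql $psql_opts -d "$SRC_DB" \
       -c "$(export_sql "$table" "$date_col" "$from" "$to")" \
    | lz4 -c > "$file" || return 1
  scp -q "$file" "$ARCHIVE_HOST:$file" || return 1
  ssh "$ARCHIVE_HOST" "set -o pipefail; lz4 -dc '$file' |
    psql $psql_opts -d '$ARCHIVE_DB' -c '$(load_sql "$table")'" || return 1
}
```

A call would look like
`archive_chunk live_table date_fld 2024-01-01 2024-04-01`. The DELETE on
the live side is deliberately left out so that it only runs after the
load has been verified.
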
>> - Are there specific tools or strategies you'd recommend for ongoing
>> archival?
>>
>
> I write generic bash loops to which you pass an array that contains the
> table name, PK, date column and date range.
>
> Given a list of tables, the first script did the COPY ... TO STDOUT, lz4
> and scp.  Once that finished successfully, a second script dropped the
> indexes on the current archive table, then ran the COPY ... FROM and
> CREATE INDEX statements.  A third script did the deletes.
>
> This works even when the live database tables are all connected via FK.
> You just need to carefully order the tables in your script.
>
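
One way to picture the array-driven loop and the FK ordering is the
sketch below. The "table|pk|date_column" format and the table names are
assumptions made for illustration, not the format Ron used. Parents are
listed first, so loads walk the array forwards and deletes walk it
backwards. The functions only print what would be run.

```shell
#!/usr/bin/env bash

# Hypothetical specs: "table|pk|date_column", parents before children.
TABLES=(
  "parent_tbl|id|created_at"
  "child_tbl|id|created_at"
  "grandchild_tbl|id|created_at"
)

# Print the table names in load order (parents first).
load_order() {
  local spec
  for spec in "${TABLES[@]}"; do
    printf '%s\n' "${spec%%|*}"
  done
}

# Print the table names in delete order (children first), so that no
# DELETE removes a row that another table still references.
delete_order() {
  local i
  for (( i = ${#TABLES[@]} - 1; i >= 0; i-- )); do
    printf '%s\n' "${TABLES[i]%%|*}"
  done
}

# Split one spec into its fields and print the DELETE for [from, to).
delete_sql() {
  local spec=$1 from=$2 to=$3 table pk date_col
  IFS='|' read -r table pk date_col <<< "$spec"
  printf "DELETE FROM %s WHERE %s >= '%s' AND %s < '%s';\n" \
    "$table" "$date_col" "$from" "$date_col" "$to"
}
```

Keeping the order in a single array means the load script and the delete
script cannot drift apart.
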
>
>> - Any performance or consistency issues we should watch out for?
>>
>
> My rules for scripting are "bite-sized pieces" and "check those return
> codes!".
>
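
As a sketch of "bite-sized pieces" plus "check those return codes", the
bash below splits a period into one-month ranges and stops at the first
step that fails. The monthly chunk size and the function names are
assumptions for illustration; the step passed in is whatever command
does the real work for one range.

```shell
#!/usr/bin/env bash

# Print "from to" pairs, one calendar month each, covering [start, end).
# Both arguments must be first-of-month dates in YYYY-MM-01 form.
month_ranges() {
  local y=${1:0:4} m=$((10#${1:5:2})) end=$2 from to
  while :; do
    printf -v from '%04d-%02d-01' "$y" "$m"
    [[ $from < $end ]] || break
    if (( m == 12 )); then y=$((y + 1)); m=1; else m=$((m + 1)); fi
    printf -v to '%04d-%02d-01' "$y" "$m"
    printf '%s %s\n' "$from" "$to"
  done
}

# Run "$step from to" for each range; stop at the first non-zero return
# code. The ranges arrive on fd 3 so a step that reads stdin (psql, ssh)
# cannot swallow the remaining ranges.
run_ranges() {
  local step=$1 start=$2 end=$3 from to
  while read -r from to <&3; do
    if ! "$step" "$from" "$to"; then
      echo "FAILED: $step $from $to" >&2
      return 1
    fi
  done 3< <(month_ranges "$start" "$end")
}
```

Stopping at the first failure leaves a known restart point: everything
before the reported range is done, nothing after it has been touched.
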
>
>> Your insights or any relevant documentation/pointers would be immensely
>> helpful.
>>
>
> Index support uber alles.  When deleting from a table which has no date
> field of its own but relies on a foreign key link to a table which
> _does_ have a date field, don't hesitate to join on that table.
>
> And DELETE of bite-sized chunks is faster than people give it credit for.
>
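
A sketch of the join-based delete, one date range at a time. The table
and column names (order_items, orders, order_id, ordered_at) and the
SRC_DB variable are hypothetical. The join uses PostgreSQL's
DELETE ... USING form, and it only performs well if the FK column on the
child and the date column on the parent are both indexed.

```shell
#!/usr/bin/env bash
set -o pipefail

# Print a DELETE that removes child rows whose parent falls in [from, to).
child_delete_sql() {
  local from=$1 to=$2
  cat <<SQL
DELETE FROM order_items AS c
USING orders AS p
WHERE c.order_id = p.order_id
  AND p.ordered_at >= '$from'
  AND p.ordered_at < '$to';
SQL
}

# Delete one chunk. ON_ERROR_STOP makes psql exit non-zero on an SQL
# error, so the caller can check the return code.
delete_chunk() {
  child_delete_sql "$1" "$2" | psql -X -v ON_ERROR_STOP=1 -d "$SRC_DB"
}
```

The child rows have to go before the matching parent rows, otherwise the
join finds nothing left to match and the FK blocks the parent delete.
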
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>

--0000000000007876bb06365cb727--