Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uL2Nw-00H3wi-TZ for pgsql-admin@arkaria.postgresql.org; Fri, 30 May 2025 16:15:09 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uL2Nv-003VNV-IU for pgsql-admin@arkaria.postgresql.org; Fri, 30 May 2025 16:15:07 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uL2Nv-003VNN-3h for pgsql-admin@lists.postgresql.org; Fri, 30 May 2025 16:15:07 +0000 Received: from mail-yb1-xb31.google.com ([2607:f8b0:4864:20::b31]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uL2Ns-000uaT-17 for pgsql-admin@lists.postgresql.org; Fri, 30 May 2025 16:15:06 +0000 Received: by mail-yb1-xb31.google.com with SMTP id 3f1490d57ef6-e694601f624so1607970276.1 for ; Fri, 30 May 2025 09:15:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1748621703; x=1749226503; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=GicyQwpl+ZtQqSc9t1j6tfRPAGofyX4qESsU+LzHMss=; b=QEqoAGHrZESjT8BzGLlJIzsLw3y1JZc3H5oKMZXHLx+s+4IQtzKiuy0iUgIsiqQTw+ 5yqpN/PhqAi8aE+EvEq0psJHpbuK8eTxt46ljwioSbO1/xn3RhqbtOvsVMbVVYzDbDwY +9zebsgROrHR+wczpCWNU+yuOpE0LDYgW+HZoKZe7ywt7Dr8BKefKhJqKKc0x7BY6rpp JG0w0m0+AzzqA8KKBt8u8tIImxOf/4MoEZ1fq6zsPduhbezzqcak82gjs/1c73Yc3QXh Yt5RYAhJ57ZsyTJGfruFksoHR8/pl8cG46Dkd7GtkLiCbDo+1dnXa0wEHAYRFDjkGlMJ nJqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1748621703; x=1749226503; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=GicyQwpl+ZtQqSc9t1j6tfRPAGofyX4qESsU+LzHMss=; b=lx8HdJyJXU2y19c8Qnf8wBDHM81kN09CWlR2HSaKykBY2VJlHIsPZKr7iX0Qi6prDv ZEnOhj40RirsXDSBvoTP8OTmQQxOGLz9IMeQAc/Fs4oN7LKOAO29WW1C/ReCucvSU2fI g6y/MUMFjjOflLCIzSVN6T9V8v9xp6WY4bnMSqh9CFOtOXP7L55FmRTubedbpwb6tcxR 9nSkPjjSkmt1FkYwniM0tljrBs1PADT/tiaFMdrp5pHPD8sJaTtkMbmTRavnFL8mPdpG 3PGelqwIiPECsoEglYA8hD7Oq+A8XBVWsITzzpzAgaJBwSlk1kfaKL3Cz5WJtPchNYcI mV3g== X-Gm-Message-State: AOJu0YwAMoiscH4QOGRR9RDuDh3bo74CMKBmFDyAUpuyxSraDuwv+0ZQ vURLBOym6vE9wsC/rvUPptEBrcvmBZ5zN5ekqSrqQjr1oJkKz1zL6JT4qQm05+oyqDWWCcuF12q U2TyPjrR4K++d0ICvg+q20mhQhLCWLaA= X-Gm-Gg: ASbGncuwZri7AqCyx2FmJZZMpu6a6EmQgH+Tve/fgGu4bq/t7h36EkvG2FLd8AjK7k6 GpiEYwPjgjTFEIIsLvHlfLqYfmZBOHQEIZSWYyYUvRZ03owfjpR9dubZzmQtggocdJEM5Qw/uJT OhOG1uuJ/36e8CFDjEESGcqC12eQUkGlvC X-Google-Smtp-Source: AGHT+IHNC5EBYafrvTRNp9rbGWdGIjzjAJv1YlgVYW8FJXg9xMRfpDFDg+gP9E3KJr7VQfgOGmTESQow3pbm37Y1cLA= X-Received: by 2002:a05:6902:2b89:b0:e7f:733e:5cc5 with SMTP id 3f1490d57ef6-e7f81e5f745mr5401119276.25.1748621703278; Fri, 30 May 2025 09:15:03 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Andy Hartman Date: Fri, 30 May 2025 12:14:52 -0400 X-Gm-Features: AX0GCFtvYpGLcCzXLkAOy22hB96-G1k4IiC8z-jdn4MWfVp20bFa0Y4BjKIbhk8 Message-ID: Subject: Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL To: Ron Johnson Cc: Pgsql-admin Content-Type: multipart/alternative; boundary="0000000000007876bb06365cb727" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000007876bb06365cb727 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable what was the duration start to finish of the migration of the 6tb of data. then what do you use for a quick backup after archived PG data Thanks. On Fri, May 30, 2025 at 11:29=E2=80=AFAM Ron Johnson wrote: > On Fri, May 30, 2025 at 3:51=E2=80=AFAM Motog Plus = wrote: > >> Hi Team, >> >> We are currently planning a data archival initiative for our production >> PostgreSQL databases and would appreciate suggestions or insights from t= he >> community regarding best practices and proven approaches. >> >> **Scenario:** >> - We have a few large tables (several hundred million rows) where we wan= t >> to archive historical data (e.g., older than 1 year). >> - The archived data should be moved to a separate PostgreSQL database (o= n >> a same or different server). >> - Our goals are: efficient data movement, minimal downtime, and safe >> deletion from the source after successful archival. >> >> - PostgreSQL version: 15.12 >> - Both source and target databases are PostgreSQL. >> >> We explored using `COPY TO` and `COPY FROM` with CSV files, uploaded to = a >> SharePoint or similar storage system. However, our infrastructure team >> raised concerns around the computational load of large CSV processing an= d >> potential security implications with file transfers. >> >> We=E2=80=99d like to understand: >> - What approaches have worked well for you in practice? >> > > This is how I migrated 6TB of data from an Oracle database to Postgresql, > and then implemented quarterly archiving of the PG database: > - COPY FROM (SELECT * FROM live_table WHERE date_fld in > some_manageable_date_range) TO STDOUT. > - Compress > - scp > - COPY TO archive_table. > - Index > - DELETE FROM live_table WHERE date_fld in some_manageable_date_range > (This I only did in the PG archive process > > (Naturally, the Oracle migration used Oracle-specific commands.) > > - Are there specific tools or strategies you=E2=80=99d recommend for ongo= ing >> archival? >> > > I write generic bash loops to which you pass an array that contains the > table name, PK, date column and date range. > > Given a list of tables, it did the COPY FROM, lz4 and scp. Once that > finished successfully, another script dropped archive indices on the > current table, COPY TO and CREATE INDEX statements. A third script did t= he > deletes. > > This works even when the live database tables are all connected via FK. > You just need to carefully order the tables in your script. > > >> - Any performance or consistency issues we should watch out for? >> > > My rules for scripting are "bite-sized pieces" and "check those return > codes!". > > >> Your insights or any relevant documentation/pointers would be immensely >> helpful. >> > > Index support uber alles. When deleting from a table which relies on a > foreign key link to a table which _does_ have a date field, don't hesitat= e > to join on that table. > > And DELETE of bite-sized chunks is faster than people give it credit for. > > -- > Death to , and butter sauce. > Don't boil me, I'm still alive. > lobster! > --0000000000007876bb06365cb727 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
what was the duration start to finish of the migration of = the 6tb of data. then what do you use for a quick backup after archived PG = data=C2=A0

Thanks.

On Fri, May 30, 2025 at 11:2= 9=E2=80=AFAM Ron Johnson <ron= ljohnsonjr@gmail.com> wrote:
On Fri, May 30, 2025 = at 3:51=E2=80=AFAM Motog Plus <mplus7535@gmail.com> wrote:
Hi Team,

We are currently pla= nning a data archival initiative for our production PostgreSQL databases an= d would appreciate suggestions or insights from the community regarding bes= t practices and proven approaches.

**Scenario:**
- We have a few large table= s (several hundred million rows) where we want to archive historical data (= e.g., older than 1 year).
- The archived data should= be moved to a separate PostgreSQL database (on a same or different server)= .
- Our goals are: efficient data movement, minimal = downtime, and safe deletion from the source after successful archival.

- PostgreSQL version: 15.12<= /div>
- Both source and target databases are PostgreSQL.

We explored using `COPY T= O` and `COPY FROM` with CSV files, uploaded to a SharePoint or similar stor= age system. However, our infrastructure team raised concerns around the com= putational load of large CSV processing and potential security implications= with file transfers.

We= =E2=80=99d like to understand:
- What approaches hav= e worked well for you in practice?

<= div>This is how I migrated 6TB of data from an Oracle database to Postgresq= l, and then implemented quarterly archiving of the PG database:
-= COPY FROM (SELECT * FROM live_table WHERE date_fld in some_manageable_date= _range) TO STDOUT.
- Compress
- scp
- COPY TO= archive_table.
- Index
- DELETE FROM live_table WHERE = date_fld in some_manageable_date_range=C2=A0 (This I only did in the PG arc= hive process
=C2=A0
(Naturally, the Oracle migration us= ed Oracle-specific commands.)

- Are there sp= ecific tools or strategies you=E2=80=99d recommend for ongoing archival?

I write generic bash loops to whi= ch you pass an array that contains the table name,=C2=A0PK,=C2=A0date colum= n and date range.

Given a list of tables, it did t= he COPY FROM, lz4 and scp.=C2=A0 Once that finished successfully, another s= cript dropped=C2=A0archive indices on the current table, COPY TO and CREATE= INDEX statements.=C2=A0 A third script did the deletes.

This works even when the live database tables are all connected via = FK.=C2=A0 You just need to carefully order the tables in your script.
=
=C2=A0
- Any performance or consistency issues we shou= ld watch out for?

My rules=C2= =A0for=C2=A0scripting are "bite-sized pieces" and "check tho= se return codes!".
=C2=A0
Your insights or a= ny relevant documentation/pointers would be immensely helpful.
<= /blockquote>
=C2=A0
Index support uber alles.=C2=A0 When dele= ting from a table which relies on a foreign key link to a table which _does= _ have a date field, don't hesitate to join on that table.
And DELETE of bite-sized chunks is faster than people give it = credit for.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'= m still alive.
<Redacted> lobster!
--0000000000007876bb06365cb727--