Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uL1g5-00Gn2C-Qb for pgsql-admin@arkaria.postgresql.org; Fri, 30 May 2025 15:29:49 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uL1g4-0037nv-Hy for pgsql-admin@arkaria.postgresql.org; Fri, 30 May 2025 15:29:48 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uL1g4-0037nk-2i for pgsql-admin@lists.postgresql.org; Fri, 30 May 2025 15:29:48 +0000 Received: from mail-oa1-x2d.google.com ([2001:4860:4864:20::2d]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uL1g2-000jIz-0m for pgsql-admin@lists.postgresql.org; Fri, 30 May 2025 15:29:47 +0000 Received: by mail-oa1-x2d.google.com with SMTP id 586e51a60fabf-2e3e58edab5so860069fac.3 for ; Fri, 30 May 2025 08:29:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1748618985; x=1749223785; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=TQjte8j9csk5+aT3YrSBMxQlGGedz7ME0nR6ykUxq/c=; b=gMXjzO77eRf4ss1/ZiQpAFuQGYTXdZeUTtYHp7mdk+GcKiFbAEhR57tyHi31QAJ1vH 4SNp+ZD2dHmML07FvYPdaQLJI7W7anUvVBmyZVTsc6lj+L0QTuplL8MiqZXTrV8imRfV kFdpUOnWVowjH5Kmb34+7n9PxR+uwId671KvNtA21wJoPEedA5lK/PoEI6lq0KXNulHC 1qMHtFjNn7gw4eWnalLV0viYomT+1h1RAeRyf7xoqia+bO30kLu8hBhNxC/65aXqIRnK ho1akQvCamvtIp37Ag1Hyz5tgsjZB3b8NIsrPc44ED7vGE9KUaK8o4hzM3HDPmpbNI4w pIvw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1748618985; x=1749223785; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TQjte8j9csk5+aT3YrSBMxQlGGedz7ME0nR6ykUxq/c=; b=je/F496APxTLjmCDU/a8lXg556Vz3AQDQFEOoI1XodlaNb1OnqN76DM4VLXcOC8H40 FKPoIdf0tAjT7Us4kvn/CzINrhGq5+jSSBTjlH23zFiGdDBqGtnt/V94RSRVb9tKZPbL cI1tiFtnJGdDSxRCOJHt7e/brUXXQGthaZhvvRDsPUpas3R+W49cZAuKVeUbmHNQQmPO PsFEB7wLu8CYDGIZJtsPGG9OeTeUcKudcKsZ5x6OfX4wIvqTMotwEXKwJqEp1mm6tpHQ sm3bso6LZoqBVijsiWXV5dPI9INa69bxtWOsrH0cBA1cIOx4Spf3ATRdHDWbAd9IhwZz HZzg== X-Gm-Message-State: AOJu0YxLG3r4hNFfecT/IItrROZxFVrOA5NZvkRTV+emU8euK0cnx0jy eEIJnDLR4hQ1nnOk7eAjX0KYOnWZUPbwAScnifY7+HDs+evEhf9hSoVbR7FPIbxxYRFPYBYk4c9 DfATyGDMGdPEhB0EAIq3iVF2ngNUmrLxvSTFc X-Gm-Gg: ASbGncvYV9oWSKoC5jywaXIlVYR8X2egMRT6YGtfQt9APQLp/+NXBIoQGvivcv1SZAI IZ/BgEvE5EyQtrB0DMO0mJojxm3FjZy4EZCxj4Uqf2a3BLdHm2XZIAAo95rMMWJ1ePC77QKWPOc IhI+lVUA0Uetkl+ZnMqvcRIgEia7ab9dE3QVBTLAabnLgQ X-Google-Smtp-Source: AGHT+IG8h4oYVbyybStyim/XZe+2WSstF/mIPTchidgjj/ivYsVGWGniTtPOJS1hmvnAsQFQ2GlTf4CbPfW9ZfujUmI= X-Received: by 2002:a05:6870:9b0a:b0:2db:a997:7a62 with SMTP id 586e51a60fabf-2e92a1da77dmr1387816fac.17.1748618985390; Fri, 30 May 2025 08:29:45 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ron Johnson Date: Fri, 30 May 2025 11:29:34 -0400 X-Gm-Features: AX0GCFuQAfWp9wQtUl9G4-5I1n3hynEug-uEviy-MFtgYH8HU_nPEK6L40Ai3Rs Message-ID: Subject: Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL To: Pgsql-admin Content-Type: multipart/alternative; boundary="00000000000078c80806365c1598" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000078c80806365c1598 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, May 30, 2025 at 3:51=E2=80=AFAM Motog Plus wr= ote: > Hi Team, > > We are currently planning a data archival initiative for our production > PostgreSQL databases and would appreciate suggestions or insights from th= e > community regarding best practices and proven approaches. > > **Scenario:** > - We have a few large tables (several hundred million rows) where we want > to archive historical data (e.g., older than 1 year). > - The archived data should be moved to a separate PostgreSQL database (on > a same or different server). > - Our goals are: efficient data movement, minimal downtime, and safe > deletion from the source after successful archival. > > - PostgreSQL version: 15.12 > - Both source and target databases are PostgreSQL. > > We explored using `COPY TO` and `COPY FROM` with CSV files, uploaded to a > SharePoint or similar storage system. However, our infrastructure team > raised concerns around the computational load of large CSV processing and > potential security implications with file transfers. > > We=E2=80=99d like to understand: > - What approaches have worked well for you in practice? > This is how I migrated 6TB of data from an Oracle database to Postgresql, and then implemented quarterly archiving of the PG database: - COPY FROM (SELECT * FROM live_table WHERE date_fld in some_manageable_date_range) TO STDOUT. - Compress - scp - COPY TO archive_table. - Index - DELETE FROM live_table WHERE date_fld in some_manageable_date_range (This I only did in the PG archive process (Naturally, the Oracle migration used Oracle-specific commands.) - Are there specific tools or strategies you=E2=80=99d recommend for ongoin= g > archival? > I write generic bash loops to which you pass an array that contains the table name, PK, date column and date range. Given a list of tables, it did the COPY FROM, lz4 and scp. Once that finished successfully, another script dropped archive indices on the current table, COPY TO and CREATE INDEX statements. A third script did the deletes. This works even when the live database tables are all connected via FK. You just need to carefully order the tables in your script. > - Any performance or consistency issues we should watch out for? > My rules for scripting are "bite-sized pieces" and "check those return codes!". > Your insights or any relevant documentation/pointers would be immensely > helpful. > Index support uber alles. When deleting from a table which relies on a foreign key link to a table which _does_ have a date field, don't hesitate to join on that table. And DELETE of bite-sized chunks is faster than people give it credit for. --=20 Death to , and butter sauce. Don't boil me, I'm still alive. lobster! --00000000000078c80806365c1598 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Fri, May 30, 2025 at 3:51=E2=80=AFAM M= otog Plus <mplus7535@gmail.com> wrote:
Hi Team,

We are currently planning a data arc= hival initiative for our production PostgreSQL databases and would apprecia= te suggestions or insights from the community regarding best practices and = proven approaches.

**Sce= nario:**
- We have a few large tables (several hundr= ed million rows) where we want to archive historical data (e.g., older than= 1 year).
- The archived data should be moved to a s= eparate PostgreSQL database (on a same or different server).
- Our goals are: efficient data movement, minimal downtime, and s= afe deletion from the source after successful archival.

- PostgreSQL version: 15.12
- Both source and target databases are PostgreSQL.

We explored using `COPY TO` and `COPY= FROM` with CSV files, uploaded to a SharePoint or similar storage system. = However, our infrastructure team raised concerns around the computational l= oad of large CSV processing and potential security implications with file t= ransfers.

We=E2=80=99d l= ike to understand:
- What approaches have worked wel= l for you in practice?

This is = how I migrated 6TB of data from an Oracle database to Postgresql, and then = implemented quarterly archiving of the PG database:
- COPY FROM (= SELECT * FROM live_table WHERE date_fld in some_manageable_date_range) TO S= TDOUT.
- Compress
- scp
- COPY TO archive_tab= le.
- Index
- DELETE FROM live_table WHERE date_fld in = some_manageable_date_range=C2=A0 (This I only did in the PG archive process=
=C2=A0
(Naturally, the Oracle migration used Oracle-sp= ecific commands.)

- Are there specific tools= or strategies you=E2=80=99d recommend for ongoing archival?

I write generic bash loops to which you pass = an array that contains the table name,=C2=A0PK,=C2=A0date column and date r= ange.

Given a list of tables, it did the COPY FROM= , lz4 and scp.=C2=A0 Once that finished successfully, another script droppe= d=C2=A0archive indices on the current table, COPY TO and CREATE INDEX state= ments.=C2=A0 A third script did the deletes.

This = works even when the live database tables are all connected via FK.=C2=A0 Yo= u just need to carefully order the tables in your script.
=C2=A0<= /div>
- Any performance or consistency issues we should watch out= for?

My rules=C2=A0for=C2=A0sc= ripting are "bite-sized pieces" and "check those return code= s!".
=C2=A0
Your insights or any relevant do= cumentation/pointers would be immensely helpful.
=C2=A0

--
Death to <= Redacted>, and butter sauce.
Don't boil me, I'm still alive.=
<Redacted> lobster!
--00000000000078c80806365c1598--