Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uL4e0-000EIp-9Z for pgsql-admin@arkaria.postgresql.org; Fri, 30 May 2025 18:39:52 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uL4dy-004uR4-Op for pgsql-admin@arkaria.postgresql.org; Fri, 30 May 2025 18:39:50 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uL4dy-004uQv-9Q for pgsql-admin@lists.postgresql.org; Fri, 30 May 2025 18:39:50 +0000 Received: from mail-yb1-xb29.google.com ([2607:f8b0:4864:20::b29]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uL4dv-000vfx-1I for pgsql-admin@lists.postgresql.org; Fri, 30 May 2025 18:39:49 +0000 Received: by mail-yb1-xb29.google.com with SMTP id 3f1490d57ef6-e7311e66a8eso2076791276.2 for ; Fri, 30 May 2025 11:39:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1748630386; x=1749235186; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Y+pcaOa9yDCmJg1NFme7Em4kJeg9yYWq5qCx95Fw9Ks=; b=TJlHKsOq6DJeIaTvYx5PYKRRSzc48wwe/ys8VmnM0ya3guINWN9Ldff8+h6/kz06Qx X6F3N1K2YvL0zTFoE7dBZcebyr1GwVfd7PMMIxylVfOEAkmwlq3c96kMU4Lq82aaPVjT 1AqMBGKYjeRDgE9BF+Op7RWGYX3buD+nNzK9RWM99rB+YfBkkKsvRb5URlVvyR26zWPR VRuGkPBwSErgY7edlgKgHDNZ4F/dMNG1U6tZxySyLLqFBYfK8hOYB26BUmmzW8euvpKS dNgMXhf4ukAKi+M1tYcemKqtUi725JQPU4UeOAtIKT80otJHnyLm20nW3U/NWoaCSeKU wCuA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1748630386; x=1749235186; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Y+pcaOa9yDCmJg1NFme7Em4kJeg9yYWq5qCx95Fw9Ks=; b=nakay2/Exc14loYspILF69Tzlku4TiaUQ3SFnNdAwY090U6ZhNnOJ8jZglrFqgsEpn /RkoKDeCl5T1wJUgU0L7MuxJ66g7mf6MM3BB5LOTRw3mP6iXnmWRSSiRU4OoXK5pZWge ZqhP7Jc/NfjPyCkzP4oU6knfuz5w9VGXfZ52/DMJgR4CT/u7CM8RKDhErkemNUbxD/zW g9adqyXc/ClXEmFzEdA9os+/cZxVbcE3eRPvcxgZ6h+3zm/qTA7vmrlhb2qefrCr5FxZ X9OQKnMgDJIzuuHGGPp0KooGC/WDSt3R8F+HUjCP+vHGRe8Hxf9C8SDX+7f12G+zTxkB sI+w== X-Gm-Message-State: AOJu0Yw7OtjDRrcond+6T+SMLHtbacCBhMcLYJI4wp6gbOD8bOlLAJYD 2cwqnGLdwa/wOU7x7KVdjpIH0ytrGWFNMY/V+AjkOfTZb/zCcctLMyEUlzdsehSEH7fg+qMV1Gj nnl9i4/KdHsicfZF12RBW3iwvJQd/sPdDtA== X-Gm-Gg: ASbGncsLgdDPHFJvUSDvktiWQKYxbEujTDuHgxDTtmhNJxxXex/LqLQj+swa/9goxMx ot9la2prip0e/VieAfBhiq7lloz8GmuIPDtS/s4JnkNiz/931DUBLde/IxfjFH3QG3xsxw3Pc9C mvfqqY9+9bJTa6puSuy74L1j8B3TOXs7c/VIlOLAowfTJndV9qddvm3Q== X-Google-Smtp-Source: AGHT+IHVXltMkEIoNzSliVFmEPnD6jZX6sqH8sEkrbhee+eEA1xu2Njv3Dzouib2qH2PEu2/sNeHM930Wujao7rkrgU= X-Received: by 2002:a05:6902:320d:b0:e7d:6779:6642 with SMTP id 3f1490d57ef6-e7f81e2c613mr6018842276.25.1748630386229; Fri, 30 May 2025 11:39:46 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Andy Hartman Date: Fri, 30 May 2025 14:39:35 -0400 X-Gm-Features: AX0GCFtvLQWiDi0P1LeNP4vd5Wfdrxm0L7w1OfQNDaVaoAxCh92YkVvB3P9f_Sc Message-ID: Subject: Re: Seeking Suggestions for Best Practices: Archiving and Migrating Historical Data in PostgreSQL To: Ron Johnson Cc: Pgsql-admin Content-Type: multipart/alternative; boundary="00000000000003c3fa06365ebdfe" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000003c3fa06365ebdfe Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable What would you use for backup if PG hosted on Windows On Fri, May 30, 2025 at 2:10=E2=80=AFPM Ron Johnson wrote: > Hmm... that was a few years ago, back when v12 was new. It took about a > month (mainly because they didn't want me running exports during "office > hours"). > > There were 120 INSERT & SELECT (no UPDATE or DELETE) tables, so I was abl= e > to add indices on date columns, create by-month views. (We migrated the > dozen or so *relatively* small UPDATE tables on cut-over day. On that > same day, I migrated the current month and the previous month's data in > those 120 tables. > > I made separate cron jobs to: > - export views from Oracle into COPY-style tab-separated flat files, > - lz4-compress views that had finished exporting, and > - scp files that were finished compressing, to an AWS EC2 VM. > > These jobs pipelined, so there was always a job exporting, always a job > ready to compress tsv files, and another job ready to scp the lz4 files. > When there was nothing for a step to do, the job would sleep for a couple > of minutes, then check if there was more work to do. > > On the AWS EC2 VM, a different cron job waited for files to finish > transferring, then loaded them into the correct table. Just like with the > source host jobs, the "load" job would sleep a bit and then check for mor= e > work. I manually applied Indices. > > The AWS RDS PG12 database was about 4TB. Snapshots were handled by AWS. > If this had been one of my on-prem systems, I'd have used pgbackrest. > (pgbackrest is impressively fast: takes good advantage of PG's 1GB file > max, and globs "small" files into one big file.) > > On Fri, May 30, 2025 at 12:15=E2=80=AFPM Andy Hartman > wrote: > >> what was the duration start to finish of the migration of the 6tb of >> data. then what do you use for a quick backup after archived PG data >> >> Thanks. >> >> On Fri, May 30, 2025 at 11:29=E2=80=AFAM Ron Johnson >> wrote: >> >>> On Fri, May 30, 2025 at 3:51=E2=80=AFAM Motog Plus wrote: >>> >>>> Hi Team, >>>> >>>> We are currently planning a data archival initiative for our productio= n >>>> PostgreSQL databases and would appreciate suggestions or insights from= the >>>> community regarding best practices and proven approaches. >>>> >>>> **Scenario:** >>>> - We have a few large tables (several hundred million rows) where we >>>> want to archive historical data (e.g., older than 1 year). >>>> - The archived data should be moved to a separate PostgreSQL database >>>> (on a same or different server). >>>> - Our goals are: efficient data movement, minimal downtime, and safe >>>> deletion from the source after successful archival. >>>> >>>> - PostgreSQL version: 15.12 >>>> - Both source and target databases are PostgreSQL. >>>> >>>> We explored using `COPY TO` and `COPY FROM` with CSV files, uploaded t= o >>>> a SharePoint or similar storage system. However, our infrastructure te= am >>>> raised concerns around the computational load of large CSV processing = and >>>> potential security implications with file transfers. >>>> >>>> We=E2=80=99d like to understand: >>>> - What approaches have worked well for you in practice? >>>> >>> >>> This is how I migrated 6TB of data from an Oracle database to >>> Postgresql, and then implemented quarterly archiving of the PG database= : >>> - COPY FROM (SELECT * FROM live_table WHERE date_fld in >>> some_manageable_date_range) TO STDOUT. >>> - Compress >>> - scp >>> - COPY TO archive_table. >>> - Index >>> - DELETE FROM live_table WHERE date_fld in some_manageable_date_range >>> (This I only did in the PG archive process >>> >>> (Naturally, the Oracle migration used Oracle-specific commands.) >>> >>> - Are there specific tools or strategies you=E2=80=99d recommend for on= going >>>> archival? >>>> >>> >>> I write generic bash loops to which you pass an array that contains the >>> table name, PK, date column and date range. >>> >>> Given a list of tables, it did the COPY FROM, lz4 and scp. Once that >>> finished successfully, another script dropped archive indices on the >>> current table, COPY TO and CREATE INDEX statements. A third script did= the >>> deletes. >>> >>> This works even when the live database tables are all connected via FK. >>> You just need to carefully order the tables in your script. >>> >>> >>>> - Any performance or consistency issues we should watch out for? >>>> >>> >>> My rules for scripting are "bite-sized pieces" and "check those return >>> codes!". >>> >>> >>>> Your insights or any relevant documentation/pointers would be immensel= y >>>> helpful. >>>> >>> >>> Index support uber alles. When deleting from a table which relies on a >>> foreign key link to a table which _does_ have a date field, don't hesit= ate >>> to join on that table. >>> >>> And DELETE of bite-sized chunks is faster than people give it credit fo= r. >>> >>> -- >>> Death to , and butter sauce. >>> Don't boil me, I'm still alive. >>> lobster! >>> >> > > -- > Death to , and butter sauce. > Don't boil me, I'm still alive. > lobster! > --00000000000003c3fa06365ebdfe Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
What would you use for backup if PG hosted on Windows
On Fri, May 30, 2025 at 2:10=E2=80=AFPM Ron Johnson <ronljohnsonjr@gmail.com> wr= ote:
Hmm... that was a few years ago, back when v12 was new.=C2=A0 It= took about=C2=A0 a month (mainly because they didn't want=C2=A0me runn= ing exports during "office hours").

Ther= e were 120 INSERT & SELECT (no UPDATE or DELETE) tables, so I was able = to add indices on date columns, create by-month views.=C2=A0 (We migrated t= he dozen or so relatively small UPDATE tables on cut-over day.=C2=A0= On that same day, I migrated the current month and the previous month'= s data in those 120 tables.

I made separate cron j= obs to:
- export views from Oracle into COPY-style tab-separated = flat files,=C2=A0
- lz4-compress views that had finished exportin= g, and
- scp files that were finished compressing, to an AWS EC2 = VM.

These jobs pipelined, so there was always a jo= b exporting, always a job ready to compress tsv files, and another job read= y to scp the lz4 files.=C2=A0 When there was nothing for a step to do, the = job would sleep for a couple of minutes, then check if there was more work = to do.

On the AWS EC2 VM, a different cron job wai= ted for files to finish transferring, then loaded them into the correct tab= le. Just like with the source host jobs, the "load" job would sle= ep a bit and then check for more work. I manually applied Indices.

The AWS RDS PG12 database was about 4TB.=C2=A0 Snapshots w= ere handled by AWS.=C2=A0 If this had been one of my on-prem systems, I'= ;d have used pgbackrest.=C2=A0 (pgbackrest is impressively fast: takes good= advantage of PG's 1GB file max, and globs "small" files into= one big file.)

On Fri, May 30, 2025 at 12:15=E2=80=AFPM Andy Hartman <<= a href=3D"mailto:hartman60home@gmail.com" target=3D"_blank">hartman60home@g= mail.com> wrote:
what was the duration start to finish of the migra= tion of the 6tb of data. then what do you use for a quick backup after arch= ived PG data=C2=A0

Thanks.

On Fri, May 30, 2025 at 11:29=E2=80=AFAM R= on Johnson <ronljohnsonjr@gmail.com> wrote:
On Fri, May 30, 2= 025 at 3:51=E2=80=AFAM Motog Plus <mplus7535@gmail.com> wrote:
Hi Team,

We are currently= planning a data archival initiative for our production PostgreSQL database= s and would appreciate suggestions or insights from the community regarding= best practices and proven approaches.

**Scenario:**
- We have a few large t= ables (several hundred million rows) where we want to archive historical da= ta (e.g., older than 1 year).
- The archived data sh= ould be moved to a separate PostgreSQL database (on a same or different ser= ver).
- Our goals are: efficient data movement, mini= mal downtime, and safe deletion from the source after successful archival.<= /div>

- PostgreSQL version: 15= .12
- Both source and target databases are PostgreSQ= L.

We explored using `CO= PY TO` and `COPY FROM` with CSV files, uploaded to a SharePoint or similar = storage system. However, our infrastructure team raised concerns around the= computational load of large CSV processing and potential security implicat= ions with file transfers.

We=E2=80=99d like to understand:
- What approaches= have worked well for you in practice?

This is how I migrated 6TB of data from an Oracle database to Postg= resql, and then implemented quarterly archiving of the PG database:
- COPY FROM (SELECT * FROM live_table WHERE date_fld in some_manageable_= date_range) TO STDOUT.
- Compress
- scp
- COP= Y TO archive_table.
- Index
- DELETE FROM live_table WH= ERE date_fld in some_manageable_date_range=C2=A0 (This I only did in the PG= archive process
=C2=A0
(Naturally, the Oracle migratio= n used Oracle-specific commands.)

- Are ther= e specific tools or strategies you=E2=80=99d recommend for ongoing archival= ?

I write generic bash loops to= which you pass an array that contains the table name,=C2=A0PK,=C2=A0date c= olumn and date range.

Given a list of tables, it d= id the COPY FROM, lz4 and scp.=C2=A0 Once that finished successfully, anoth= er script dropped=C2=A0archive indices on the current table, COPY TO and CR= EATE INDEX statements.=C2=A0 A third script did the deletes.

=
This works even when the live database tables are all connected = via FK.=C2=A0 You just need to carefully order the tables in your script.
=C2=A0
- Any performance or consistency issues we = should watch out for?

My rules= =C2=A0for=C2=A0scripting are "bite-sized pieces" and "check = those return codes!".
=C2=A0
Your insights o= r any relevant documentation/pointers would be immensely helpful.
=C2=A0
Index support uber alles.=C2=A0 When d= eleting from a table which relies on a foreign key link to a table which _d= oes_ have a date field, don't hesitate to join on that table.

And DELETE of bite-sized chunks is faster than people give = it credit for.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I&#= 39;m still alive.
<Redacted> lobster!
<= /div>


--
Death to <Redacted>, and butter sauce.Don't boil me, I'm still alive.
<Redacted> lobs= ter!
--00000000000003c3fa06365ebdfe--