public inbox for [email protected]
help / color / mirror / Atom feedIn-order pg_dump (or in-order COPY TO)
22+ messages / 7 participants
[nested] [flat]
* In-order pg_dump (or in-order COPY TO)
@ 2025-08-26 19:43 Dimitrios Apostolou <[email protected]>
2025-08-26 20:31 ` Re: In-order pg_dump (or in-order COPY TO) David G. Johnston <[email protected]>
2025-08-26 22:00 ` Re: In-order pg_dump (or in-order COPY TO) Ron Johnson <[email protected]>
2025-08-26 22:08 ` Re: In-order pg_dump (or in-order COPY TO) Tom Lane <[email protected]>
2025-08-26 22:54 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 14:06 ` Re: In-order pg_dump (or in-order COPY TO) Greg Sabino Mullane <[email protected]>
2025-09-04 12:02 ` Re: In-order pg_dump (or in-order COPY TO) Álvaro Herrera <[email protected]>
0 siblings, 6 replies; 22+ messages in thread
From: Dimitrios Apostolou @ 2025-08-26 19:43 UTC (permalink / raw)
To: [email protected]
Hello list,
I am storing dumps of a database (pg_dump custom format) in a
de-duplicating backup server. Each dump is many terabytes in size, so
deduplication is very important. And de-duplication itself is based on
rolling checksums which is pretty flexible, it can compensate for blocks
moving by some offset.
Unfortunately after I did pg_restore to a new server, I notice that the
dumps from the new server are not being de-duplicated, all blocks are
considered new.
This means that the data has been significantly altered. The new dumps
contain the same rows but probably in very different order. Could the
row-order have changed when doing COPY FROM with pg_restore? No idea,
but now that I think about it this can happen by many operations, like
CLUSTER, VACUUM FULL etc so the question still applies.
A *logical* dump of data shouldn't be affected by on-disk order.
Internal representation shouldn't affect the output.
This makes me wonder: Is there a way to COPY TO in primary-key order?
If that is possible, then pg_dump could make use of it.
Thanks in advance,
Dimitris
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
@ 2025-08-26 20:31 ` David G. Johnston <[email protected]>
2025-08-26 22:08 ` Re: In-order pg_dump (or in-order COPY TO) Ron Johnson <[email protected]>
2025-08-27 12:40 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
5 siblings, 2 replies; 22+ messages in thread
From: David G. Johnston @ 2025-08-26 20:31 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]
On Tue, Aug 26, 2025 at 12:43 PM Dimitrios Apostolou <[email protected]> wrote:
> Could the
> row-order have changed when doing COPY FROM with pg_restore?
There is no reliable, meaningful, row ordering when it comes to the
physical files. Sure, cluster does make an attempt, but it is quite
limited in practice.
> A *logical* dump of data shouldn't be affected by on-disk order.
> Internal representation shouldn't affect the output.
>
The logical dump has no ordering - it will come out however it comes out.
"COPY <table> TO ..." doesn't have an order by clause - there is no way to
make or communicate to it that ordering is important. For adhoc work you
can use "COPY <query> TO ..." and put and order by in the query.
David J.
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 20:31 ` Re: In-order pg_dump (or in-order COPY TO) David G. Johnston <[email protected]>
@ 2025-08-26 22:08 ` Ron Johnson <[email protected]>
2025-08-26 22:17 ` Re: In-order pg_dump (or in-order COPY TO) Tom Lane <[email protected]>
1 sibling, 1 reply; 22+ messages in thread
From: Ron Johnson @ 2025-08-26 22:08 UTC (permalink / raw)
To: pgsql-general
On Tue, Aug 26, 2025 at 4:31 PM David G. Johnston <
[email protected]> wrote:
> On Tue, Aug 26, 2025 at 12:43 PM Dimitrios Apostolou <[email protected]>
> wrote:
>
>> Could the
>> row-order have changed when doing COPY FROM with pg_restore?
>
>
> There is no reliable, meaningful, row ordering when it comes to the
> physical files. Sure, cluster does make an attempt, but it is quite
> limited in practice.
>
>
>> A *logical* dump of data shouldn't be affected by on-disk order.
>> Internal representation shouldn't affect the output.
>>
>
> The logical dump has no ordering - it will come out however it comes out.
> "COPY <table> TO ..." doesn't have an order by clause - there is no way to
> make or communicate to it that ordering is important.
>
Doesn't COPY TO copy out records in the order they appeared in the physical
files? That _seems_ to mean that the records laid down by COPY FROM should
be in the same order as they were in the old dump files.
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 20:31 ` Re: In-order pg_dump (or in-order COPY TO) David G. Johnston <[email protected]>
2025-08-26 22:08 ` Re: In-order pg_dump (or in-order COPY TO) Ron Johnson <[email protected]>
@ 2025-08-26 22:17 ` Tom Lane <[email protected]>
0 siblings, 0 replies; 22+ messages in thread
From: Tom Lane @ 2025-08-26 22:17 UTC (permalink / raw)
To: Ron Johnson <[email protected]>; +Cc: pgsql-general
Ron Johnson <[email protected]> writes:
> On Tue, Aug 26, 2025 at 4:31 PM David G. Johnston <
> [email protected]> wrote:
>> The logical dump has no ordering - it will come out however it comes out.
>> "COPY <table> TO ..." doesn't have an order by clause - there is no way to
>> make or communicate to it that ordering is important.
> Doesn't COPY TO copy out records in the order they appeared in the physical
> files?
It emits whatever a sequential-scan plan would emit. If you set
synchronize_seqscans = off (which pg_dump does), that will match
physical row order.
At least with our standard table AM. If you're using Aurora or
one of those other PG forks with proprietary storage layers,
you'd have to ask them.
I suspect the OP's problem is not row order per se, but differing
TIDs or XIDs, which are things pg_dump does not endeavor to
replicate. Or, given that he said something about blocks, maybe
he's actually sensitive to where the free space is.
regards, tom lane
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 20:31 ` Re: In-order pg_dump (or in-order COPY TO) David G. Johnston <[email protected]>
@ 2025-08-27 12:40 ` Dimitrios Apostolou <[email protected]>
1 sibling, 0 replies; 22+ messages in thread
From: Dimitrios Apostolou @ 2025-08-27 12:40 UTC (permalink / raw)
To: David G. Johnston <[email protected]>; +Cc: [email protected]
On Tuesday 2025-08-26 22:31, David G. Johnston wrote:
>The logical dump has no ordering - it will come out however it comes out. "COPY <table> TO ..." doesn't have an order by clause - there is no way to make or communicate to it that ordering is important. For adhoc work you can use "COPY <query> TO ..." and put and order by in the query.
Thank you, so it's not possible currently.
How would "COPY <query> TO" behave for copying very large tables?
Would it make sense to optionally have that in pg_dump?
Or would it make sense as a new feature, to optionally order "COPY
<table> TO ..." based on primary key where available, and use that in
pg_dump option?
Dimitris
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
@ 2025-08-26 22:00 ` Ron Johnson <[email protected]>
2025-08-27 12:52 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
5 siblings, 1 reply; 22+ messages in thread
From: Ron Johnson @ 2025-08-26 22:00 UTC (permalink / raw)
To: pgsql-general
On Tue, Aug 26, 2025 at 3:44 PM Dimitrios Apostolou <[email protected]> wrote:
> Hello list,
>
> I am storing dumps of a database (pg_dump custom format) in a
> de-duplicating backup server. Each dump is many terabytes in size, so
> deduplication is very important. And de-duplication itself is based on
> rolling checksums which is pretty flexible, it can compensate for blocks
> moving by some offset.
>
This might be a silly question, but why are you using -Fc to
create multi-TB dumps instead of -Fd? If nothing else, the parallelization
granted by -Fd will greatly speed up the dumps.
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:00 ` Re: In-order pg_dump (or in-order COPY TO) Ron Johnson <[email protected]>
@ 2025-08-27 12:52 ` Dimitrios Apostolou <[email protected]>
0 siblings, 0 replies; 22+ messages in thread
From: Dimitrios Apostolou @ 2025-08-27 12:52 UTC (permalink / raw)
To: Ron Johnson <[email protected]>; +Cc: pgsql-general
On Wednesday 2025-08-27 00:00, Ron Johnson wrote:
>
>This might be a silly question, but why are you using -Fc to create multi-TB dumps instead of -Fd? If nothing else, the parallelization granted by -Fd will greatly speed up the dumps.
Hi Ron,
the primary reason is space. With -Fc I don't have to store the 10TB
archive locally. And I don't have to wait for pg_dump to finish. I pipe
it directly to the "borg create" command which does compression and
deduplication and sends new chunks to a remote borgbackup server.
Regards,
Dimitris
P.S. please use "Reply-all" when replying, I've missed quite a few
replies that I found on the list archives. :-)
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
@ 2025-08-26 22:08 ` Tom Lane <[email protected]>
2025-08-26 22:12 ` Re: In-order pg_dump (or in-order COPY TO) Ron Johnson <[email protected]>
2025-08-27 12:34 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
5 siblings, 2 replies; 22+ messages in thread
From: Tom Lane @ 2025-08-26 22:08 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]
Dimitrios Apostolou <[email protected]> writes:
> Unfortunately after I did pg_restore to a new server, I notice that the
> dumps from the new server are not being de-duplicated, all blocks are
> considered new.
> This means that the data has been significantly altered. The new dumps
> contain the same rows but probably in very different order. Could the
> row-order have changed when doing COPY FROM with pg_restore?
I'd expect pg_dump/pg_restore to preserve the physical row ordering,
simply because it doesn't do anything that would change that.
However, restoring into an empty table would result in a table with
minimal free space, whereas the original table probably had a
meaningful amount of free space thanks to updates and deletes. Thus
for example TIDs would not be the same. If your "rolling checksum"
methodology is at all sensitive to page boundaries, the table would
look quite different to it.
regards, tom lane
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:08 ` Re: In-order pg_dump (or in-order COPY TO) Tom Lane <[email protected]>
@ 2025-08-26 22:12 ` Ron Johnson <[email protected]>
2025-08-26 22:23 ` Re: In-order pg_dump (or in-order COPY TO) Tom Lane <[email protected]>
1 sibling, 1 reply; 22+ messages in thread
From: Ron Johnson @ 2025-08-26 22:12 UTC (permalink / raw)
To: pgsql-generallists.postgresql.org <[email protected]>
On Tue, Aug 26, 2025 at 6:08 PM Tom Lane <[email protected]> wrote:
> Dimitrios Apostolou <[email protected]> writes:
> > Unfortunately after I did pg_restore to a new server, I notice that the
> > dumps from the new server are not being de-duplicated, all blocks are
> > considered new.
>
> > This means that the data has been significantly altered. The new dumps
> > contain the same rows but probably in very different order. Could the
> > row-order have changed when doing COPY FROM with pg_restore?
>
> I'd expect pg_dump/pg_restore to preserve the physical row ordering,
> simply because it doesn't do anything that would change that.
>
> However, restoring into an empty table would result in a table with
> minimal free space, whereas the original table probably had a
> meaningful amount of free space thanks to updates and deletes. Thus
> for example TIDs would not be the same. If your "rolling checksum"
> methodology is at all sensitive to page boundaries, the table would
> look quite different to it.
>
But the rolling checksums are against a pg_dump file, not a pg_basebackup
file.
What probably changed are table OIDs. Would that change the ordering of
COPY data in post-restore dump files?
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:08 ` Re: In-order pg_dump (or in-order COPY TO) Tom Lane <[email protected]>
2025-08-26 22:12 ` Re: In-order pg_dump (or in-order COPY TO) Ron Johnson <[email protected]>
@ 2025-08-26 22:23 ` Tom Lane <[email protected]>
0 siblings, 0 replies; 22+ messages in thread
From: Tom Lane @ 2025-08-26 22:23 UTC (permalink / raw)
To: Ron Johnson <[email protected]>; +Cc: pgsql-generallists.postgresql.org <[email protected]>
Ron Johnson <[email protected]> writes:
> On Tue, Aug 26, 2025 at 6:08 PM Tom Lane <[email protected]> wrote:
>> I'd expect pg_dump/pg_restore to preserve the physical row ordering,
>> simply because it doesn't do anything that would change that.
> But the rolling checksums are against a pg_dump file, not a pg_basebackup
> file.
Oh, that wasn't clear to me.
> What probably changed are table OIDs. Would that change the ordering of
> COPY data in post-restore dump files?
It would not change the order of data within any one table. There are
corner cases in which different OID assignments can cause pg_dump to
emit database objects in a different order, see this recent thread:
https://www.postgresql.org/message-id/flat/20250707192654.9e.nmisch%40google.com
regards, tom lane
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:08 ` Re: In-order pg_dump (or in-order COPY TO) Tom Lane <[email protected]>
@ 2025-08-27 12:34 ` Dimitrios Apostolou <[email protected]>
1 sibling, 0 replies; 22+ messages in thread
From: Dimitrios Apostolou @ 2025-08-27 12:34 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: [email protected]
On Wednesday 2025-08-27 00:08, Tom Lane wrote:
> However, restoring into an empty table would result in a table with
> minimal free space, whereas the original table probably had a
> meaningful amount of free space thanks to updates and deletes. Thus
> for example TIDs would not be the same. If your "rolling checksum"
> methodology is at all sensitive to page boundaries, the table would
> look quite different to it.
Thanks Tom. I'm not following how the empty space in the tables could
affect the custom format dumps. Where can I read more about these TIDs?
Are they stored in the pg_dump custom format archive?
The rolling checksum method should iron out shifting of data chunks that
are around a couple MB in size. Shorter shifts will not be caught, and
I assume that the "page boundaries" changes you mentioned would happen
every 8KB. So that is definitely too fine grained for the deduplicated
algorithm.
FYI something that I forgot to mention is that pg_restore is --data-only
and writes go through walwriter. The database with the tables has been
created from scratch so every table is empty before the pg_restore. Not
sure how this affects the above.
> I'd expect pg_dump/pg_restore to preserve the physical row ordering,
> simply because it doesn't do anything that would change that.
Regardless of my specific case, it's scary to think that doing VACUUM
FULL, CLUSTER, or who knows what other maintenance command, will modify
the logical dumps. Some implicit ordering could be enforced by pg_dump
if possible, for example when a primary key exists. Does it make sense?
Is it even possible?
Thanks,
Dimitris
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
@ 2025-08-26 22:54 ` Adrian Klaver <[email protected]>
2025-08-27 12:09 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
5 siblings, 1 reply; 22+ messages in thread
From: Adrian Klaver @ 2025-08-26 22:54 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; [email protected]
On 8/26/25 12:43, Dimitrios Apostolou wrote:
> Hello list,
>
> I am storing dumps of a database (pg_dump custom format) in a de-
> duplicating backup server. Each dump is many terabytes in size, so
> deduplication is very important. And de-duplication itself is based on
> rolling checksums which is pretty flexible, it can compensate for blocks
> moving by some offset.
>
> Unfortunately after I did pg_restore to a new server, I notice that the
> dumps from the new server are not being de-duplicated, all blocks are
> considered new.
What are the pg_dump/pg_restore commands?
What are the Postgres versions involved?
Are they community versions of Postgres or something else?
What is the depduplication program?
> Thanks in advance,
> Dimitris
>
>
--
Adrian Klaver
[email protected]
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:54 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
@ 2025-08-27 12:09 ` Dimitrios Apostolou <[email protected]>
2025-08-27 15:25 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Dimitrios Apostolou @ 2025-08-27 12:09 UTC (permalink / raw)
To: Adrian Klaver <[email protected]>; +Cc: [email protected]
On Wednesday 2025-08-27 00:54, Adrian Klaver wrote:
>Date: Wed, 27 Aug 2025 00:54:52
>From: Adrian Klaver <[email protected]>
>To: Dimitrios Apostolou <[email protected]>, [email protected]
>Subject: Re: In-order pg_dump (or in-order COPY TO)
>
> On 8/26/25 12:43, Dimitrios Apostolou wrote:
>> Hello list,
>>
>> I am storing dumps of a database (pg_dump custom format) in a de-
>> duplicating backup server. Each dump is many terabytes in size, so
>> deduplication is very important. And de-duplication itself is based on
>> rolling checksums which is pretty flexible, it can compensate for blocks
>> moving by some offset.
>>
>> Unfortunately after I did pg_restore to a new server, I notice that the
>> dumps from the new server are not being de-duplicated, all blocks are
>> considered new.
>
> What are the pg_dump/pg_restore commands?
>
> What are the Postgres versions involved?
>
> Are they community versions of Postgres or something else?
>
> What is the depduplication program?
>
>
Dump is from PostgreSQL 16, it's pg_dump writing to stdout:
pg_dump -v --format=custom --compress=none --no-toast-compression --serializable-deferrable db_name | borg create ...
As you can see the backup (and deduplicating) program is borgbackup.
Restore is in PostgreSQL 17:
I first create the empty tables by running the DDL commands in version
control to setup the database. And then I do pg_restore --data-only:
pg_restore -vvvv -j 8 -U db_owner -d db_name --schema=public --section=data dump_file
Worth noting is that the above pg_restore goes through the WAL, i.e. all
writes are done by walwriter, not the backend directly.
Postgres is standard open source running on own server. It has a couple
of custom patches that shouldn't matter in this codepath.
>> Thanks in advance,
>> Dimitris
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:54 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 12:09 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
@ 2025-08-27 15:25 ` Adrian Klaver <[email protected]>
2025-08-27 16:10 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Adrian Klaver @ 2025-08-27 15:25 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]
On 8/27/25 05:09, Dimitrios Apostolou wrote:
> On Wednesday 2025-08-27 00:54, Adrian Klaver wrote:
>
>> What are the pg_dump/pg_restore commands?
>>
>> What are the Postgres versions involved?
>>
>> Are they community versions of Postgres or something else?
>>
>> What is the depduplication program?
>>
>>
>
>
Comments in line below.
> Dump is from PostgreSQL 16, it's pg_dump writing to stdout:
>
> pg_dump -v --format=custom --compress=none --no-toast-compression --
> serializable-deferrable db_name | borg create ...
>
>
> As you can see the backup (and deduplicating) program is borgbackup.
Ok, I use BorgBackup and it is fairly forgiving of normal changes.
FYI, if you ever want to use compression check out gzip --rsyncable, I
have found it plays well with Borg. For more information see:
https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/
>
>
> Restore is in PostgreSQL 17:
>
> I first create the empty tables by running the DDL commands in version
> control to setup the database. And then I do pg_restore --data-only:
>
> pg_restore -vvvv -j 8 -U db_owner -d db_name --schema=public --
> section=data dump_file
If you are using only the --data section why not --data-only in the pg_dump?
Or is the pg_dump output used for other purposes?
>
>
> Worth noting is that the above pg_restore goes through the WAL, i.e. all
> writes are done by walwriter, not the backend directly.
Please explain the above further.
The problem occurs when you do the pg_dump after this restore, correct?
Is it the same pg_dump command as you show above?
>
> Postgres is standard open source running on own server. It has a couple
> of custom patches that shouldn't matter in this codepath.
For completeness and just in case they may affect the output what do the
patches do?
>
>
>>> Thanks in advance,
>>> Dimitris
--
Adrian Klaver
[email protected]
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:54 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 12:09 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-27 15:25 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
@ 2025-08-27 16:10 ` Dimitrios Apostolou <[email protected]>
2025-08-30 18:19 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Dimitrios Apostolou @ 2025-08-27 16:10 UTC (permalink / raw)
To: Adrian Klaver <[email protected]>; +Cc: [email protected]
On Wednesday 2025-08-27 17:25, Adrian Klaver wrote:
> Comments in line below.
>
>> Dump is from PostgreSQL 16, it's pg_dump writing to stdout:
>>
>> pg_dump -v --format=custom --compress=none --no-toast-compression --
>> serializable-deferrable db_name | borg create ...
>>
>>
>> As you can see the backup (and deduplicating) program is borgbackup.
>
> Ok, I use BorgBackup and it is fairly forgiving of normal changes.
>
> FYI, if you ever want to use compression check out gzip --rsyncable, I have
> found it plays well with Borg. For more information see:
>
> https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/
Yes, zstd has also --rsyncable. In this case I let borg do per-chunk
compression after deduplication, it has worked well so far.
>> Restore is in PostgreSQL 17:
>>
>> I first create the empty tables by running the DDL commands in version
>> control to setup the database. And then I do pg_restore --data-only:
>>
>> pg_restore -vvvv -j 8 -U db_owner -d db_name --schema=public --
>> section=data dump_file
>
> If you are using only the --data section why not --data-only in the pg_dump?
I want the dump to be as complete as possible. Didn't think it would
create issues.
>
> Or is the pg_dump output used for other purposes?
It has happened that I have selectively restored user schemas from that
dump.
>> Worth noting is that the above pg_restore goes through the WAL, i.e. all
>> writes are done by walwriter, not the backend directly.
>
> Please explain the above further.
The COPY FROM data is going through the WAL, as usual INSERTS do. The
writes to disk happen by the walwriter process.
OTOH, If you have configured the server with wal_level=minimal and
BEGIN a transaction, CREATE or TRUNCATE a table, and then COPY FROM into
that table, then the backend process writes directly to the table
without logging to the WAL.
This can be much faster, but most importantly it avoids situations of
WAL overflow that are very difficult to predict and can mess your server
up completely. [1]
[1] https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net
My patches are for activating that codepath in pg_restore, but they were
not used on purpose and I took notice that the writes went via WAL.
>
> The problem occurs when you do the pg_dump after this restore, correct?
Correct. The first pg_dump from the restored pg17 is not deduplicated at
all. Most of the tables have not changed (logically at least; apparently
they have changed physically).
>
> Is it the same pg_dump command as you show above?
Yes.
>
>>
>> Postgres is standard open source running on own server. It has a couple of
>> custom patches that shouldn't matter in this codepath.
>
> For completeness and just in case they may affect the output what do the
> patches do?
Two patches for speeding up scanning an archive without TOC, like the
one I'm having (because it is piped into borg, instead of written to
file). These were activated, but shouldn't matter. They only build the
TOC in pg_restore's memory.
https://commitfest.postgresql.org/patch/5809/
https://commitfest.postgresql.org/patch/5817/
And two patches for speeding up pg_restore like mentioned above, under
specific arguments that I didn't provide. (one speedup needs --clean,
and the other needs --freeze).
https://commitfest.postgresql.org/patch/5821/
https://commitfest.postgresql.org/patch/5826/
IIRC I did not activate them (via --clean) because TRUNCATE fails when
foreign keys exist. See the discussion threads.
Dimitris
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:54 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 12:09 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-27 15:25 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 16:10 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
@ 2025-08-30 18:19 ` Adrian Klaver <[email protected]>
2025-08-31 01:21 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Adrian Klaver @ 2025-08-30 18:19 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]
On 8/27/25 09:10, Dimitrios Apostolou wrote:
>
> On Wednesday 2025-08-27 17:25, Adrian Klaver wrote:
>>
>> For completeness and just in case they may affect the output what do
>> the patches do?
>
> Two patches for speeding up scanning an archive without TOC, like the
> one I'm having (because it is piped into borg, instead of written to
> file). These were activated, but shouldn't matter. They only build the
> TOC in pg_restore's memory.
Are you sure about that?
I just did:
pg_dump -Fc --compress=none --no-toast-compression -d test -U postgres |
borg create --stats --stdin-name pg_file --stdin-user aklaver
--stdin-group aklaver borg_test/::PgTest -
Then:
borg mount borg_test/ mnt_tmp/
cd mnt_tmp/PgTest/
and then:
pg_restore -l pg_file
and I got a TOC.
Or are you streaming the data out of the Borg archive?
>
> https://commitfest.postgresql.org/patch/5809/
> https://commitfest.postgresql.org/patch/5817/
>
> And two patches for speeding up pg_restore like mentioned above, under
> specific arguments that I didn't provide. (one speedup needs --clean,
> and the other needs --freeze).
>
> https://commitfest.postgresql.org/patch/5821/
> https://commitfest.postgresql.org/patch/5826/
>
> IIRC I did not activate them (via --clean) because TRUNCATE fails when
> foreign keys exist. See the discussion threads.
>
>
> Dimitris
--
Adrian Klaver
[email protected]
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:54 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 12:09 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-27 15:25 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 16:10 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-30 18:19 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
@ 2025-08-31 01:21 ` Dimitrios Apostolou <[email protected]>
2025-08-31 15:41 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Dimitrios Apostolou @ 2025-08-31 01:21 UTC (permalink / raw)
To: Adrian Klaver <[email protected]>; Dimitrios Apostolou <[email protected]>; +Cc: [email protected]
Sorry I was not remembering the details. Probably there is a TOC in your dump file, but it does not contain any positions for the data. The pg_restore command has to scan the whole file in advance, and fill in the TOC offsets in memory.
This scanning happens in a very inefficient way, with many seek calls and small block reads. Try strace to see them. This initial phase can take hours in a huge dump file, before even starting any actual restoration.
Thank you for testing.
Dimitris
On 30 August 2025 20:19:13 CEST, Adrian Klaver <[email protected]> wrote:
>On 8/27/25 09:10, Dimitrios Apostolou wrote:
>>
>> On Wednesday 2025-08-27 17:25, Adrian Klaver wrote:
>
>
>>>
>>> For completeness and just in case they may affect the output what do the patches do?
>>
>> Two patches for speeding up scanning an archive without TOC, like the one I'm having (because it is piped into borg, instead of written to file). These were activated, but shouldn't matter. They only build the TOC in pg_restore's memory.
>
>Are you sure about that?
>
>I just did:
>
>pg_dump -Fc --compress=none --no-toast-compression -d test -U postgres | borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest -
>
>Then:
>
>borg mount borg_test/ mnt_tmp/
>cd mnt_tmp/PgTest/
>
>and then:
>
>pg_restore -l pg_file
>
>and I got a TOC.
>
>Or are you streaming the data out of the Borg archive?
>
>>
>> https://commitfest.postgresql.org/patch/5809/
>> https://commitfest.postgresql.org/patch/5817/
>>
>> And two patches for speeding up pg_restore like mentioned above, under specific arguments that I didn't provide. (one speedup needs --clean, and the other needs --freeze).
>>
>> https://commitfest.postgresql.org/patch/5821/
>> https://commitfest.postgresql.org/patch/5826/
>>
>> IIRC I did not activate them (via --clean) because TRUNCATE fails when foreign keys exist. See the discussion threads.
>>
>>
>> Dimitris
>
>
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:54 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 12:09 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-27 15:25 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 16:10 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-30 18:19 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-31 01:21 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
@ 2025-08-31 15:41 ` Adrian Klaver <[email protected]>
2025-08-31 17:52 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Adrian Klaver @ 2025-08-31 15:41 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]
On 8/30/25 18:21, Dimitrios Apostolou wrote:
> Sorry I was not remembering the details. Probably there is a TOC in your dump file, but it does not contain any positions for the data. The pg_restore command has to scan the whole file in advance, and fill in the TOC offsets in memory.
>
> This scanning happens in a very inefficient way, with many seek calls and small block reads. Try strace to see them. This initial phase can take hours in a huge dump file, before even starting any actual restoration.
It may be that my coffee is not strong enough, but I don't understand
what you are trying to say.
Are you using, from previous post, the following?:
"Two patches for speeding up scanning an archive without TOC, like the
one I'm having (because it is piped into borg, instead of written to
file). These were activated, but shouldn't matter. They only build the
TOC in pg_restore's memory. "
The part I don't see is how you get a dump file without a TOC?
When I do the pg_dump and pipe it to Borg the resulting file has a TOC.
Can you show the rest of the | borg create ... command?
>
> Thank you for testing.
> Dimitris
>
> On 30 August 2025 20:19:13 CEST, Adrian Klaver <[email protected]> wrote:
--
Adrian Klaver
[email protected]
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:54 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 12:09 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-27 15:25 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 16:10 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-30 18:19 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-31 01:21 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-31 15:41 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
@ 2025-08-31 17:52 ` Dimitrios Apostolou <[email protected]>
2025-09-01 17:24 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
0 siblings, 1 reply; 22+ messages in thread
From: Dimitrios Apostolou @ 2025-08-31 17:52 UTC (permalink / raw)
To: Adrian Klaver <[email protected]>; Dimitrios Apostolou <[email protected]>; +Cc: [email protected]
When I first said "dump file without TOC" I actually meant "without offsets in the TOC".
The fact that you get a TOC printed does not prove that the dump file includes a TOC with offsets.
All pg_dump -Fc commands that write to stdout, produce a file without offsets in the TOC. It has nothing to do with borg. ToC offsets must be filled in right before streaming each table, but this is impossible when the whole TOC has already been written to stdout in the beginning.
Dimitris
On 31 August 2025 17:41:34 CEST, Adrian Klaver <[email protected]> wrote:
>On 8/30/25 18:21, Dimitrios Apostolou wrote:
>> Sorry I was not remembering the details. Probably there is a TOC in your dump file, but it does not contain any positions for the data. The pg_restore command has to scan the whole file in advance, and fill in the TOC offsets in memory.
>>
>> This scanning happens in a very inefficient way, with many seek calls and small block reads. Try strace to see them. This initial phase can take hours in a huge dump file, before even starting any actual restoration.
>
>It may be that my coffee is not strong enough, but I don't understand what you are trying to say.
>
>Are you using, from previous post, the following?:
>
>"Two patches for speeding up scanning an archive without TOC, like the one I'm having (because it is piped into borg, instead of written to file). These were activated, but shouldn't matter. They only build the TOC in pg_restore's memory. "
>
>The part I don't see is how you get a dump file without a TOC?
>
>When I do the pg_dump and pipe it to Borg the resulting file has a TOC.
>
>Can you show the rest of the | borg create ... command?
>
>>
>> Thank you for testing.
>> Dimitris
>>
>> On 30 August 2025 20:19:13 CEST, Adrian Klaver <[email protected]> wrote:
>
>
>
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 22:54 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 12:09 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-27 15:25 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-27 16:10 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-30 18:19 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-31 01:21 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-31 15:41 ` Re: In-order pg_dump (or in-order COPY TO) Adrian Klaver <[email protected]>
2025-08-31 17:52 ` Re: In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
@ 2025-09-01 17:24 ` Adrian Klaver <[email protected]>
0 siblings, 0 replies; 22+ messages in thread
From: Adrian Klaver @ 2025-09-01 17:24 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]
On 8/31/25 10:52, Dimitrios Apostolou wrote:
> When I first said "dump file without TOC" I actually meant "without offsets in the TOC".
>
> The fact that you get a TOC printed does not prove that the dump file includes a TOC with offsets.
I did some digging in the code and see that the TOC is more then that,
it stores a range of data. Still have not part where the offsets are
ignored for writes to stdout, but will keep on digging.
Getting back to your OP:
"Unfortunately after I did pg_restore to a new server, I notice that the
dumps from the new server are not being de-duplicated, all blocks are
considered new."
I ran a test on a much smaller database and I do not see the above. The
commands and the Borg info for the archive are in attached file.
>
> All pg_dump -Fc commands that write to stdout, produce a file without offsets in the TOC. It has nothing to do with borg. ToC offsets must be filled in right before streaming each table, but this is impossible when the whole TOC has already been written to stdout in the beginning.
>
> Dimitris
--
Adrian Klaver
[email protected]
--Dump from Postgres 17 instance and pipe to Borg create.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres |
borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_17 -
--Dump from 17 instance and then restore to 18b3 instance
pg_dump -Fc -d pp_archive -U postgres -f pp_archive_17.out
pg_restore -c -C -d postgres -U postgres -p 5434 pp_archive_17.out
--Initial Postgres 18 dump after restore from Postgres 17 dump
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_1 -
--Second Postgres 18 dump.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_2 -
--Update single field across 644129 rows of a table.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_3 -
borg info --last 4 borg_test/
Archive name: PgTest_17
Archive fingerprint: 14163251a84f93e992b8304bbeac15056daa45ce3c8c5f76655af9ee1ef641de
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:07:20
Time (end): Mon, 2025-09-01 10:07:23
Duration: 2.80 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_17 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 208.87 MB 42.00 MB 6.92 MB
All archives: 835.47 MB 168.09 MB 84.90 MB
Unique chunks Total chunks
Chunk index: 162 324
Archive name: PgTest_18_1
Archive fingerprint: 5fd88a99b5765d0af71d450b917157fa1d682691db33306e550ff7d0597fb05f
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:08:15
Time (end): Mon, 2025-09-01 10:08:17
Duration: 2.24 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_1 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 208.87 MB 42.00 MB 1.58 MB
All archives: 835.47 MB 168.09 MB 84.90 MB
Unique chunks Total chunks
Chunk index: 162 324
Archive name: PgTest_18_2
Archive fingerprint: fcd113299afd6673c10e44dcd621722a46e35871d7171b24a3d7d61e3f48d1da
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:08:43
Time (end): Mon, 2025-09-01 10:08:45
Duration: 2.17 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_2 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 208.87 MB 42.00 MB 1.58 MB
All archives: 835.47 MB 168.09 MB 84.90 MB
Unique chunks Total chunks
Chunk index: 162 324
Archive name: PgTest_18_3
Archive fingerprint: dbc0ddcfdad566d38ab3e210e376490e3e5cd1510c9cd45861b9ab550a04913f
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:09:33
Time (end): Mon, 2025-09-01 10:09:36
Duration: 2.65 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_3 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 208.87 MB 42.09 MB 34.40 MB
All archives: 835.47 MB 168.09 MB 84.90 MB
Unique chunks Total chunks
Chunk index: 162 324
Attachments:
[text/plain] pg_borg_test.txt (4.6K, 2-pg_borg_test.txt)
download | inline:
--Dump from Postgres 17 instance and pipe to Borg create.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres |
borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_17 -
--Dump from 17 instance and then restore to 18b3 instance
pg_dump -Fc -d pp_archive -U postgres -f pp_archive_17.out
pg_restore -c -C -d postgres -U postgres -p 5434 pp_archive_17.out
--Initial Postgres 18 dump after restore from Postgres 17 dump
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_1 -
--Second Postgres 18 dump.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_2 -
--Update single field across 644129 rows of a table.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_3 -
borg info --last 4 borg_test/
Archive name: PgTest_17
Archive fingerprint: 14163251a84f93e992b8304bbeac15056daa45ce3c8c5f76655af9ee1ef641de
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:07:20
Time (end): Mon, 2025-09-01 10:07:23
Duration: 2.80 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_17 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 208.87 MB 42.00 MB 6.92 MB
All archives: 835.47 MB 168.09 MB 84.90 MB
Unique chunks Total chunks
Chunk index: 162 324
Archive name: PgTest_18_1
Archive fingerprint: 5fd88a99b5765d0af71d450b917157fa1d682691db33306e550ff7d0597fb05f
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:08:15
Time (end): Mon, 2025-09-01 10:08:17
Duration: 2.24 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_1 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 208.87 MB 42.00 MB 1.58 MB
All archives: 835.47 MB 168.09 MB 84.90 MB
Unique chunks Total chunks
Chunk index: 162 324
Archive name: PgTest_18_2
Archive fingerprint: fcd113299afd6673c10e44dcd621722a46e35871d7171b24a3d7d61e3f48d1da
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:08:43
Time (end): Mon, 2025-09-01 10:08:45
Duration: 2.17 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_2 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 208.87 MB 42.00 MB 1.58 MB
All archives: 835.47 MB 168.09 MB 84.90 MB
Unique chunks Total chunks
Chunk index: 162 324
Archive name: PgTest_18_3
Archive fingerprint: dbc0ddcfdad566d38ab3e210e376490e3e5cd1510c9cd45861b9ab550a04913f
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:09:33
Time (end): Mon, 2025-09-01 10:09:36
Duration: 2.65 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_3 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 208.87 MB 42.09 MB 34.40 MB
All archives: 835.47 MB 168.09 MB 84.90 MB
Unique chunks Total chunks
Chunk index: 162 324
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
@ 2025-08-27 14:06 ` Greg Sabino Mullane <[email protected]>
5 siblings, 0 replies; 22+ messages in thread
From: Greg Sabino Mullane @ 2025-08-27 14:06 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]
On Tue, Aug 26, 2025 at 3:44 PM Dimitrios Apostolou <[email protected]> wrote:
> I am storing dumps of a database (pg_dump custom format) in a
> de-duplicating backup server. Each dump is many terabytes in size, so
> deduplication is very important. And de-duplication itself is based on
> rolling checksums which is pretty flexible, it can compensate for blocks
> moving by some offset.
>
I suggest looking into pgBackRest, and it's block incremental feature,
which sounds similar to what you are doing. But it also does it with
parallel processes, and can do things like grab backup files from your
replicas, plus a lot of other features.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
^ permalink raw reply [nested|flat] 22+ messages in thread
* Re: In-order pg_dump (or in-order COPY TO)
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
@ 2025-09-04 12:02 ` Álvaro Herrera <[email protected]>
5 siblings, 0 replies; 22+ messages in thread
From: Álvaro Herrera @ 2025-09-04 12:02 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]
On 2025-Aug-26, Dimitrios Apostolou wrote:
> I am storing dumps of a database (pg_dump custom format) in a de-duplicating
> backup server. Each dump is many terabytes in size, so deduplication is very
> important. And de-duplication itself is based on rolling checksums which is
> pretty flexible, it can compensate for blocks moving by some offset.
Hello,
It's generally considered nowadays that pg_dump is not the best option
to create backups of very large databases. You may be better served by
using a binary backup tool -- something like Barman. With current
Postgres releases you can create incremental backups, which would
probably be more effective at deduplicating than playing with pg_dump's
TOC, because it's based on what actually happens to the data. Barman
provides support for hook scripts, which perhaps can be used to transfer
the backup files to Borg. (I haven't actually tried to do this, but the
Barman developers talk about using them to transfer the backups to tape,
so I imagine getting them to play with Borg it's a Simple Matter of
Programming.)
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"On the other flipper, one wrong move and we're Fatal Exceptions"
(T.U.X.: Term Unit X - http://www.thelinuxreview.com/TUX/)
^ permalink raw reply [nested|flat] 22+ messages in thread
end of thread, other threads:[~2025-09-04 12:02 UTC | newest]
Thread overview: 22+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 20:31 ` David G. Johnston <[email protected]>
2025-08-26 22:08 ` Ron Johnson <[email protected]>
2025-08-26 22:17 ` Tom Lane <[email protected]>
2025-08-27 12:40 ` Dimitrios Apostolou <[email protected]>
2025-08-26 22:00 ` Ron Johnson <[email protected]>
2025-08-27 12:52 ` Dimitrios Apostolou <[email protected]>
2025-08-26 22:08 ` Tom Lane <[email protected]>
2025-08-26 22:12 ` Ron Johnson <[email protected]>
2025-08-26 22:23 ` Tom Lane <[email protected]>
2025-08-27 12:34 ` Dimitrios Apostolou <[email protected]>
2025-08-26 22:54 ` Adrian Klaver <[email protected]>
2025-08-27 12:09 ` Dimitrios Apostolou <[email protected]>
2025-08-27 15:25 ` Adrian Klaver <[email protected]>
2025-08-27 16:10 ` Dimitrios Apostolou <[email protected]>
2025-08-30 18:19 ` Adrian Klaver <[email protected]>
2025-08-31 01:21 ` Dimitrios Apostolou <[email protected]>
2025-08-31 15:41 ` Adrian Klaver <[email protected]>
2025-08-31 17:52 ` Dimitrios Apostolou <[email protected]>
2025-09-01 17:24 ` Adrian Klaver <[email protected]>
2025-08-27 14:06 ` Greg Sabino Mullane <[email protected]>
2025-09-04 12:02 ` Álvaro Herrera <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox