In-order pg_dump (or in-order COPY TO)

public inbox for [email protected]  
help / color / mirror / Atom feed

In-order pg_dump (or in-order COPY TO)
22+ messages / 7 participants
[nested] [flat]

* In-order pg_dump (or in-order COPY TO)
@ 2025-08-26 19:43  Dimitrios Apostolou <[email protected]>
  0 siblings, 6 replies; 22+ messages in thread

From: Dimitrios Apostolou @ 2025-08-26 19:43 UTC (permalink / raw)
  To: [email protected]

Hello list,

I am storing dumps of a database (pg_dump custom format) in a 
de-duplicating backup server. Each dump is many terabytes in size, so 
deduplication is very important. And de-duplication itself is based on 
rolling checksums which is pretty flexible, it can compensate for blocks 
moving by some offset.

Unfortunately after I did pg_restore to a new server, I notice that the
dumps from the new server are not being de-duplicated, all blocks are
considered new.

This means that the data has been significantly altered. The new dumps 
contain the same rows but probably in very different order. Could the 
row-order have changed when doing COPY FROM with pg_restore? No idea, 
but now that I think about it this can happen by many operations, like 
CLUSTER, VACUUM FULL etc so the question still applies.

A *logical* dump of data shouldn't be affected by on-disk order. 
Internal representation shouldn't affect the output.

This makes me wonder: Is there a way to COPY TO in primary-key order?

If that is possible, then pg_dump could make use of it.

Thanks in advance,
Dimitris

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-26 20:31  David G. Johnston <[email protected]>
  parent: Dimitrios Apostolou <[email protected]>
  5 siblings, 2 replies; 22+ messages in thread

From: David G. Johnston @ 2025-08-26 20:31 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

On Tue, Aug 26, 2025 at 12:43 PM Dimitrios Apostolou <[email protected]> wrote:

> Could the
> row-order have changed when doing COPY FROM with pg_restore?

There is no reliable, meaningful, row ordering when it comes to the
physical files.  Sure, cluster does make an attempt, but it is quite
limited in practice.

> A *logical* dump of data shouldn't be affected by on-disk order.
> Internal representation shouldn't affect the output.
>

The logical dump has no ordering - it will come out however it comes out.
"COPY <table> TO ..." doesn't have an order by clause - there is no way to
make or communicate to it that ordering is important.  For adhoc work you
can use "COPY <query> TO ..." and put and order by in the query.

David J.

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-26 22:00  Ron Johnson <[email protected]>
  parent: Dimitrios Apostolou <[email protected]>
  5 siblings, 1 reply; 22+ messages in thread

From: Ron Johnson @ 2025-08-26 22:00 UTC (permalink / raw)
  To: pgsql-general

On Tue, Aug 26, 2025 at 3:44 PM Dimitrios Apostolou <[email protected]> wrote:

> Hello list,
>
> I am storing dumps of a database (pg_dump custom format) in a
> de-duplicating backup server. Each dump is many terabytes in size, so
> deduplication is very important. And de-duplication itself is based on
> rolling checksums which is pretty flexible, it can compensate for blocks
> moving by some offset.
>

This might be a silly question, but why are you using -Fc to
create multi-TB dumps instead of -Fd?  If nothing else, the parallelization
granted by -Fd will greatly speed up the dumps.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-26 22:08  Tom Lane <[email protected]>
  parent: Dimitrios Apostolou <[email protected]>
  5 siblings, 2 replies; 22+ messages in thread

From: Tom Lane @ 2025-08-26 22:08 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

Dimitrios Apostolou <[email protected]> writes:
> Unfortunately after I did pg_restore to a new server, I notice that the
> dumps from the new server are not being de-duplicated, all blocks are
> considered new.

> This means that the data has been significantly altered. The new dumps 
> contain the same rows but probably in very different order. Could the 
> row-order have changed when doing COPY FROM with pg_restore?

I'd expect pg_dump/pg_restore to preserve the physical row ordering,
simply because it doesn't do anything that would change that.

However, restoring into an empty table would result in a table with
minimal free space, whereas the original table probably had a
meaningful amount of free space thanks to updates and deletes.  Thus
for example TIDs would not be the same.  If your "rolling checksum"
methodology is at all sensitive to page boundaries, the table would
look quite different to it.

			regards, tom lane

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-26 22:08  Ron Johnson <[email protected]>
  parent: David G. Johnston <[email protected]>
  1 sibling, 1 reply; 22+ messages in thread

From: Ron Johnson @ 2025-08-26 22:08 UTC (permalink / raw)
  To: pgsql-general

On Tue, Aug 26, 2025 at 4:31 PM David G. Johnston <
[email protected]> wrote:

> On Tue, Aug 26, 2025 at 12:43 PM Dimitrios Apostolou <[email protected]>
> wrote:
>
>> Could the
>> row-order have changed when doing COPY FROM with pg_restore?
>
>
> There is no reliable, meaningful, row ordering when it comes to the
> physical files.  Sure, cluster does make an attempt, but it is quite
> limited in practice.
>
>
>> A *logical* dump of data shouldn't be affected by on-disk order.
>> Internal representation shouldn't affect the output.
>>
>
> The logical dump has no ordering - it will come out however it comes out.
> "COPY <table> TO ..." doesn't have an order by clause - there is no way to
> make or communicate to it that ordering is important.
>

Doesn't COPY TO copy out records in the order they appeared in the physical
files?  That _seems_ to mean that the records laid down by COPY FROM should
be in the same order as they were in the old dump files.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-26 22:12  Ron Johnson <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 1 reply; 22+ messages in thread

From: Ron Johnson @ 2025-08-26 22:12 UTC (permalink / raw)
  To: pgsql-generallists.postgresql.org <[email protected]>

On Tue, Aug 26, 2025 at 6:08 PM Tom Lane <[email protected]> wrote:

> Dimitrios Apostolou <[email protected]> writes:
> > Unfortunately after I did pg_restore to a new server, I notice that the
> > dumps from the new server are not being de-duplicated, all blocks are
> > considered new.
>
> > This means that the data has been significantly altered. The new dumps
> > contain the same rows but probably in very different order. Could the
> > row-order have changed when doing COPY FROM with pg_restore?
>
> I'd expect pg_dump/pg_restore to preserve the physical row ordering,
> simply because it doesn't do anything that would change that.
>
> However, restoring into an empty table would result in a table with
> minimal free space, whereas the original table probably had a
> meaningful amount of free space thanks to updates and deletes.  Thus
> for example TIDs would not be the same.  If your "rolling checksum"
> methodology is at all sensitive to page boundaries, the table would
> look quite different to it.
>

But the rolling checksums are against a pg_dump file, not a pg_basebackup
file.

What probably changed are table OIDs.  Would that change the ordering of
COPY data in post-restore dump files?

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-26 22:17  Tom Lane <[email protected]>
  parent: Ron Johnson <[email protected]>
  0 siblings, 0 replies; 22+ messages in thread

From: Tom Lane @ 2025-08-26 22:17 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

Ron Johnson <[email protected]> writes:
> On Tue, Aug 26, 2025 at 4:31 PM David G. Johnston <
> [email protected]> wrote:
>> The logical dump has no ordering - it will come out however it comes out.
>> "COPY <table> TO ..." doesn't have an order by clause - there is no way to
>> make or communicate to it that ordering is important.

> Doesn't COPY TO copy out records in the order they appeared in the physical
> files?

It emits whatever a sequential-scan plan would emit.  If you set
synchronize_seqscans = off (which pg_dump does), that will match
physical row order.

At least with our standard table AM.  If you're using Aurora or
one of those other PG forks with proprietary storage layers,
you'd have to ask them.

I suspect the OP's problem is not row order per se, but differing
TIDs or XIDs, which are things pg_dump does not endeavor to
replicate.  Or, given that he said something about blocks, maybe
he's actually sensitive to where the free space is.

			regards, tom lane

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-26 22:23  Tom Lane <[email protected]>
  parent: Ron Johnson <[email protected]>
  0 siblings, 0 replies; 22+ messages in thread

From: Tom Lane @ 2025-08-26 22:23 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-generallists.postgresql.org <[email protected]>

Ron Johnson <[email protected]> writes:
> On Tue, Aug 26, 2025 at 6:08 PM Tom Lane <[email protected]> wrote:
>> I'd expect pg_dump/pg_restore to preserve the physical row ordering,
>> simply because it doesn't do anything that would change that.

> But the rolling checksums are against a pg_dump file, not a pg_basebackup
> file.

Oh, that wasn't clear to me.

> What probably changed are table OIDs.  Would that change the ordering of
> COPY data in post-restore dump files?

It would not change the order of data within any one table.  There are
corner cases in which different OID assignments can cause pg_dump to
emit database objects in a different order, see this recent thread:

https://www.postgresql.org/message-id/flat/20250707192654.9e.nmisch%40google.com

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-26 22:54  Adrian Klaver <[email protected]>
  parent: Dimitrios Apostolou <[email protected]>
  5 siblings, 1 reply; 22+ messages in thread

From: Adrian Klaver @ 2025-08-26 22:54 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; [email protected]

On 8/26/25 12:43, Dimitrios Apostolou wrote:
> Hello list,
> 
> I am storing dumps of a database (pg_dump custom format) in a de- 
> duplicating backup server. Each dump is many terabytes in size, so 
> deduplication is very important. And de-duplication itself is based on 
> rolling checksums which is pretty flexible, it can compensate for blocks 
> moving by some offset.
> 
> Unfortunately after I did pg_restore to a new server, I notice that the
> dumps from the new server are not being de-duplicated, all blocks are
> considered new.

What are the pg_dump/pg_restore commands?

What are the Postgres versions involved?

Are they community versions of Postgres or something else?

What is the depduplication program?


> Thanks in advance,
> Dimitris
> 
> 


-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-27 12:09  Dimitrios Apostolou <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Dimitrios Apostolou @ 2025-08-27 12:09 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: [email protected]

On Wednesday 2025-08-27 00:54, Adrian Klaver wrote:

>Date: Wed, 27 Aug 2025 00:54:52
>From: Adrian Klaver <[email protected]>
>To: Dimitrios Apostolou <[email protected]>, [email protected]
>Subject: Re: In-order pg_dump (or in-order COPY TO)
>
> On 8/26/25 12:43, Dimitrios Apostolou wrote:
>>  Hello list,
>>
>>  I am storing dumps of a database (pg_dump custom format) in a de-
>>  duplicating backup server. Each dump is many terabytes in size, so
>>  deduplication is very important. And de-duplication itself is based on
>>  rolling checksums which is pretty flexible, it can compensate for blocks
>>  moving by some offset.
>>
>>  Unfortunately after I did pg_restore to a new server, I notice that the
>>  dumps from the new server are not being de-duplicated, all blocks are
>>  considered new.
>

> What are the pg_dump/pg_restore commands?
>
> What are the Postgres versions involved?
>
> Are they community versions of Postgres or something else?
>
> What is the depduplication program?
>
>

Dump is from PostgreSQL 16, it's pg_dump writing to stdout:

pg_dump -v --format=custom --compress=none --no-toast-compression --serializable-deferrable db_name  |  borg create ...

As you can see the backup (and deduplicating) program is borgbackup.

Restore is in PostgreSQL 17:

I first create the empty tables by running the DDL commands in version 
control to setup the database. And then I do pg_restore --data-only:

pg_restore -vvvv -j 8 -U db_owner -d db_name --schema=public --section=data  dump_file

Worth noting is that the above pg_restore goes through the WAL, i.e. all 
writes are done by walwriter, not the backend directly.

Postgres is standard open source running on own server. It has a couple 
of custom patches that shouldn't matter in this codepath.

>>  Thanks in advance,
>>  Dimitris

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-27 12:34  Dimitrios Apostolou <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 0 replies; 22+ messages in thread

From: Dimitrios Apostolou @ 2025-08-27 12:34 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]

On Wednesday 2025-08-27 00:08, Tom Lane wrote:

> However, restoring into an empty table would result in a table with
> minimal free space, whereas the original table probably had a
> meaningful amount of free space thanks to updates and deletes.  Thus
> for example TIDs would not be the same.  If your "rolling checksum"
> methodology is at all sensitive to page boundaries, the table would
> look quite different to it.

Thanks Tom. I'm not following how the empty space in the tables could 
affect the custom format dumps. Where can I read more about these TIDs? 
Are they stored in the pg_dump custom format archive?

The rolling checksum method should iron out shifting of data chunks that 
are around a couple MB in size. Shorter shifts will not be caught, and 
I assume that the "page boundaries" changes you mentioned would happen 
every 8KB. So that is definitely too fine grained for the deduplicated 
algorithm.

FYI something that I forgot to mention is that pg_restore is --data-only 
and writes go through walwriter. The database with the tables has been 
created from scratch so every table is empty before the pg_restore. Not 
sure how this affects the above.

> I'd expect pg_dump/pg_restore to preserve the physical row ordering,
> simply because it doesn't do anything that would change that.

Regardless of my specific case, it's scary to think that doing VACUUM 
FULL, CLUSTER, or who knows what other maintenance command, will modify 
the logical dumps. Some implicit ordering could be enforced by pg_dump 
if possible, for example when a primary key exists. Does it make sense? 
Is it even possible?

Thanks,
Dimitris

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-27 12:40  Dimitrios Apostolou <[email protected]>
  parent: David G. Johnston <[email protected]>
  1 sibling, 0 replies; 22+ messages in thread

From: Dimitrios Apostolou @ 2025-08-27 12:40 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: [email protected]

On Tuesday 2025-08-26 22:31, David G. Johnston wrote:

>The logical dump has no ordering - it will come out however it comes out.  "COPY <table> TO ..." doesn't have an order by clause - there is no way to make or communicate to it that ordering is important.  For adhoc work you can use "COPY <query> TO ..." and put and order by in the query.

Thank you, so it's not possible currently.

How would "COPY <query> TO" behave for copying very large tables?
Would it make sense to optionally have that in pg_dump?

Or would it make sense as a new feature, to optionally order "COPY 
<table> TO ..." based on primary key where available, and use that in 
pg_dump option?

Dimitris

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-27 12:52  Dimitrios Apostolou <[email protected]>
  parent: Ron Johnson <[email protected]>
  0 siblings, 0 replies; 22+ messages in thread

From: Dimitrios Apostolou @ 2025-08-27 12:52 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

On Wednesday 2025-08-27 00:00, Ron Johnson wrote:
>
>This might be a silly question, but why are you using -Fc to create multi-TB dumps instead of -Fd?  If nothing else, the parallelization granted by -Fd will greatly speed up the dumps.

Hi Ron,

the primary reason is space. With -Fc I don't have to store the 10TB 
archive locally. And I don't have to wait for pg_dump to finish. I pipe 
it directly to the "borg create" command which does compression and 
deduplication and sends new chunks to a remote borgbackup server.

Regards,
Dimitris



P.S. please use "Reply-all" when replying, I've missed quite a few
      replies that I found on the list archives. :-)


^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-27 14:06  Greg Sabino Mullane <[email protected]>
  parent: Dimitrios Apostolou <[email protected]>
  5 siblings, 0 replies; 22+ messages in thread

From: Greg Sabino Mullane @ 2025-08-27 14:06 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

On Tue, Aug 26, 2025 at 3:44 PM Dimitrios Apostolou <[email protected]> wrote:

> I am storing dumps of a database (pg_dump custom format) in a
> de-duplicating backup server. Each dump is many terabytes in size, so
> deduplication is very important. And de-duplication itself is based on
> rolling checksums which is pretty flexible, it can compensate for blocks
> moving by some offset.
>

I suggest looking into pgBackRest, and it's block incremental feature,
which sounds similar to what you are doing. But it also does it with
parallel processes, and can do things like grab backup files from your
replicas, plus a lot of other features.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-27 15:25  Adrian Klaver <[email protected]>
  parent: Dimitrios Apostolou <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Adrian Klaver @ 2025-08-27 15:25 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

On 8/27/25 05:09, Dimitrios Apostolou wrote:
> On Wednesday 2025-08-27 00:54, Adrian Klaver wrote:
> 

>> What are the pg_dump/pg_restore commands?
>>
>> What are the Postgres versions involved?
>>
>> Are they community versions of Postgres or something else?
>>
>> What is the depduplication program?
>>
>>
> 
>

Comments in line below.

> Dump is from PostgreSQL 16, it's pg_dump writing to stdout:
> 
> pg_dump -v --format=custom --compress=none --no-toast-compression -- 
> serializable-deferrable db_name  |  borg create ...
> 
> 
> As you can see the backup (and deduplicating) program is borgbackup.

Ok, I use BorgBackup and it is fairly forgiving of normal changes.

FYI, if you ever want to use compression check out gzip --rsyncable, I 
have found it plays well with Borg. For more  information see:

https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/


> 
> 
> Restore is in PostgreSQL 17:
> 
> I first create the empty tables by running the DDL commands in version 
> control to setup the database. And then I do pg_restore --data-only:
> 
> pg_restore -vvvv -j 8 -U db_owner -d db_name --schema=public -- 
> section=data  dump_file

If you are using only the --data section why not --data-only in the pg_dump?

Or is the pg_dump output used for other purposes?

> 
> 
> Worth noting is that the above pg_restore goes through the WAL, i.e. all 
> writes are done by walwriter, not the backend directly.

Please explain the above further.

The problem occurs when you do the pg_dump after this restore, correct?

Is it the same pg_dump command as you show above?

> 
> Postgres is standard open source running on own server. It has a couple 
> of custom patches that shouldn't matter in this codepath.

For completeness and just in case they may affect the output what do the 
patches do?


> 
> 
>>>  Thanks in advance,
>>>  Dimitris


-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-27 16:10  Dimitrios Apostolou <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Dimitrios Apostolou @ 2025-08-27 16:10 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: [email protected]

On Wednesday 2025-08-27 17:25, Adrian Klaver wrote:

> Comments in line below.
>
>>  Dump is from PostgreSQL 16, it's pg_dump writing to stdout:
>>
>>  pg_dump -v --format=custom --compress=none --no-toast-compression --
>>  serializable-deferrable db_name  |  borg create ...
>>
>>
>>  As you can see the backup (and deduplicating) program is borgbackup.
>
> Ok, I use BorgBackup and it is fairly forgiving of normal changes.
>
> FYI, if you ever want to use compression check out gzip --rsyncable, I have 
> found it plays well with Borg. For more  information see:
>
> https://beeznest.wordpress.com/2005/02/03/rsyncable-gzip/

Yes, zstd has also --rsyncable. In this case I let borg do per-chunk 
compression after deduplication, it has worked well so far.

>>  Restore is in PostgreSQL 17:
>>
>>  I first create the empty tables by running the DDL commands in version
>>  control to setup the database. And then I do pg_restore --data-only:
>>
>>  pg_restore -vvvv -j 8 -U db_owner -d db_name --schema=public --
>>  section=data  dump_file
>
> If you are using only the --data section why not --data-only in the pg_dump?

I want the dump to be as complete as possible. Didn't think it would 
create issues.

>
> Or is the pg_dump output used for other purposes?

It has happened that I have selectively restored user schemas from that 
dump.

>>  Worth noting is that the above pg_restore goes through the WAL, i.e. all
>>  writes are done by walwriter, not the backend directly.
>
> Please explain the above further.

The COPY FROM data is going through the WAL, as usual INSERTS do. The 
writes to disk happen by the walwriter process.

OTOH, If you have configured the server with wal_level=minimal and 
BEGIN a transaction, CREATE or TRUNCATE a table, and then COPY FROM into 
that table, then the backend process writes directly to the table 
without logging to the WAL.

This can be much faster, but most importantly it avoids situations of 
WAL overflow that are very difficult to predict and can mess your server 
up completely. [1]

[1] https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net

My patches are for activating that codepath in pg_restore, but they were 
not used on purpose and I took notice that the writes went via WAL.

>
> The problem occurs when you do the pg_dump after this restore, correct?

Correct. The first pg_dump from the restored pg17 is not deduplicated at 
all. Most of the tables have not changed (logically at least; apparently 
they have changed physically).

>
> Is it the same pg_dump command as you show above?

Yes.

>
>>
>>  Postgres is standard open source running on own server. It has a couple of
>>  custom patches that shouldn't matter in this codepath.
>
> For completeness and just in case they may affect the output what do the 
> patches do?

Two patches for speeding up scanning an archive without TOC, like the 
one I'm having (because it is piped into borg, instead of written to 
file). These were activated, but shouldn't matter. They only build the 
TOC in pg_restore's memory.

https://commitfest.postgresql.org/patch/5809/
https://commitfest.postgresql.org/patch/5817/

And two patches for speeding up pg_restore like mentioned above, under 
specific arguments that I didn't provide. (one speedup needs --clean, 
and the other needs --freeze).

https://commitfest.postgresql.org/patch/5821/
https://commitfest.postgresql.org/patch/5826/

IIRC I did not activate them (via --clean) because TRUNCATE fails when 
foreign keys exist. See the discussion threads.

Dimitris

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-30 18:19  Adrian Klaver <[email protected]>
  parent: Dimitrios Apostolou <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Adrian Klaver @ 2025-08-30 18:19 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

On 8/27/25 09:10, Dimitrios Apostolou wrote:
> 
> On Wednesday 2025-08-27 17:25, Adrian Klaver wrote:


>>
>> For completeness and just in case they may affect the output what do 
>> the patches do?
> 
> Two patches for speeding up scanning an archive without TOC, like the 
> one I'm having (because it is piped into borg, instead of written to 
> file). These were activated, but shouldn't matter. They only build the 
> TOC in pg_restore's memory.

Are you sure about that?

I just did:

pg_dump -Fc --compress=none --no-toast-compression -d test -U postgres | 
borg create --stats --stdin-name pg_file  --stdin-user aklaver 
--stdin-group aklaver borg_test/::PgTest -

Then:

borg mount borg_test/ mnt_tmp/
cd mnt_tmp/PgTest/

and then:

pg_restore -l pg_file

and I got a TOC.

Or are you streaming the data out of the Borg archive?

> 
> https://commitfest.postgresql.org/patch/5809/
> https://commitfest.postgresql.org/patch/5817/
> 
> And two patches for speeding up pg_restore like mentioned above, under 
> specific arguments that I didn't provide. (one speedup needs --clean, 
> and the other needs --freeze).
> 
> https://commitfest.postgresql.org/patch/5821/
> https://commitfest.postgresql.org/patch/5826/
> 
> IIRC I did not activate them (via --clean) because TRUNCATE fails when 
> foreign keys exist. See the discussion threads.
> 
> 
> Dimitris


-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-31 01:21  Dimitrios Apostolou <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Dimitrios Apostolou @ 2025-08-31 01:21 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

Sorry I was not remembering the details. Probably there is a TOC in your dump file, but it does not contain any positions for the data. The pg_restore command has to scan the whole file in advance, and fill in the TOC offsets in memory.

This scanning happens in a very inefficient way, with many seek calls and small block reads. Try strace to see them. This initial phase can take hours in a huge dump file, before even starting any actual restoration.

Thank you for testing.
Dimitris 

On 30 August 2025 20:19:13 CEST, Adrian Klaver <[email protected]> wrote:
>On 8/27/25 09:10, Dimitrios Apostolou wrote:
>> 
>> On Wednesday 2025-08-27 17:25, Adrian Klaver wrote:
>
>
>>> 
>>> For completeness and just in case they may affect the output what do the patches do?
>> 
>> Two patches for speeding up scanning an archive without TOC, like the one I'm having (because it is piped into borg, instead of written to file). These were activated, but shouldn't matter. They only build the TOC in pg_restore's memory.
>
>Are you sure about that?
>
>I just did:
>
>pg_dump -Fc --compress=none --no-toast-compression -d test -U postgres | borg create --stats --stdin-name pg_file  --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest -
>
>Then:
>
>borg mount borg_test/ mnt_tmp/
>cd mnt_tmp/PgTest/
>
>and then:
>
>pg_restore -l pg_file
>
>and I got a TOC.
>
>Or are you streaming the data out of the Borg archive?
>
>> 
>> https://commitfest.postgresql.org/patch/5809/
>> https://commitfest.postgresql.org/patch/5817/
>> 
>> And two patches for speeding up pg_restore like mentioned above, under specific arguments that I didn't provide. (one speedup needs --clean, and the other needs --freeze).
>> 
>> https://commitfest.postgresql.org/patch/5821/
>> https://commitfest.postgresql.org/patch/5826/
>> 
>> IIRC I did not activate them (via --clean) because TRUNCATE fails when foreign keys exist. See the discussion threads.
>> 
>> 
>> Dimitris
>
>






^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-31 15:41  Adrian Klaver <[email protected]>
  parent: Dimitrios Apostolou <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Adrian Klaver @ 2025-08-31 15:41 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

On 8/30/25 18:21, Dimitrios Apostolou wrote:
> Sorry I was not remembering the details. Probably there is a TOC in your dump file, but it does not contain any positions for the data. The pg_restore command has to scan the whole file in advance, and fill in the TOC offsets in memory.
> 
> This scanning happens in a very inefficient way, with many seek calls and small block reads. Try strace to see them. This initial phase can take hours in a huge dump file, before even starting any actual restoration.

It may be that my coffee is not strong enough, but I don't understand 
what you are trying to say.

Are you using, from previous post, the following?:

"Two patches for speeding up scanning an archive without TOC, like the 
one I'm having (because it is piped into borg, instead of written to 
file). These were activated, but shouldn't matter. They only build the 
TOC in pg_restore's memory. "

The part I don't see is how you get a dump file without a TOC?

When I do the pg_dump and pipe it to Borg the resulting file has a TOC.

Can you show the rest of the  |  borg create ...  command?

> 
> Thank you for testing.
> Dimitris
> 
> On 30 August 2025 20:19:13 CEST, Adrian Klaver <[email protected]> wrote:

-- 
Adrian Klaver
[email protected]

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-08-31 17:52  Dimitrios Apostolou <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 22+ messages in thread

From: Dimitrios Apostolou @ 2025-08-31 17:52 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

When I first said "dump file without TOC" I actually meant "without offsets in the TOC".

The fact that you get a TOC printed does not prove that the dump file includes a TOC with offsets.

All pg_dump -Fc commands that write to stdout, produce a file without offsets in the TOC. It has nothing to do with borg. ToC offsets must be filled in right before streaming each table, but this is impossible when the whole TOC has already been written to stdout in the beginning.

Dimitris

On 31 August 2025 17:41:34 CEST, Adrian Klaver <[email protected]> wrote:
>On 8/30/25 18:21, Dimitrios Apostolou wrote:
>> Sorry I was not remembering the details. Probably there is a TOC in your dump file, but it does not contain any positions for the data. The pg_restore command has to scan the whole file in advance, and fill in the TOC offsets in memory.
>> 
>> This scanning happens in a very inefficient way, with many seek calls and small block reads. Try strace to see them. This initial phase can take hours in a huge dump file, before even starting any actual restoration.
>
>It may be that my coffee is not strong enough, but I don't understand what you are trying to say.
>
>Are you using, from previous post, the following?:
>
>"Two patches for speeding up scanning an archive without TOC, like the one I'm having (because it is piped into borg, instead of written to file). These were activated, but shouldn't matter. They only build the TOC in pg_restore's memory. "
>
>The part I don't see is how you get a dump file without a TOC?
>
>When I do the pg_dump and pipe it to Borg the resulting file has a TOC.
>
>Can you show the rest of the  |  borg create ...  command?
>
>> 
>> Thank you for testing.
>> Dimitris
>> 
>> On 30 August 2025 20:19:13 CEST, Adrian Klaver <[email protected]> wrote:
>
>
>






^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-09-01 17:24  Adrian Klaver <[email protected]>
  parent: Dimitrios Apostolou <[email protected]>
  0 siblings, 0 replies; 22+ messages in thread

From: Adrian Klaver @ 2025-09-01 17:24 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

On 8/31/25 10:52, Dimitrios Apostolou wrote:
> When I first said "dump file without TOC" I actually meant "without offsets in the TOC".
> 
> The fact that you get a TOC printed does not prove that the dump file includes a TOC with offsets.

I did some digging in the code and see that the TOC is more then that, 
it stores a range of data. Still have not part where the offsets are 
ignored for writes to stdout, but will keep on digging.

Getting back to your OP:

"Unfortunately after I did pg_restore to a new server, I notice that the
dumps from the new server are not being de-duplicated, all blocks are
considered new."

I ran a test on a much smaller database and I do not see the above. The 
commands and the Borg info for the archive are in attached file.
> 
> All pg_dump -Fc commands that write to stdout, produce a file without offsets in the TOC. It has nothing to do with borg. ToC offsets must be filled in right before streaming each table, but this is impossible when the whole TOC has already been written to stdout in the beginning.
> 
> Dimitris



-- 
Adrian Klaver
[email protected]
--Dump from Postgres 17 instance and pipe to Borg create.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres  |
borg create --stats --stdin-name pg_file  --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_17 -

--Dump from 17 instance and then restore to 18b3 instance
pg_dump -Fc  -d pp_archive -U postgres -f pp_archive_17.out
pg_restore -c -C -d postgres -U postgres -p 5434 pp_archive_17.out

--Initial Postgres 18 dump after restore from Postgres 17 dump
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file  --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_1 -

--Second Postgres 18 dump.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file  --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_2 -


--Update single field across 644129 rows of a table.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file  --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_3 -


borg info --last 4 borg_test/

Archive name: PgTest_17
Archive fingerprint: 14163251a84f93e992b8304bbeac15056daa45ce3c8c5f76655af9ee1ef641de
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:07:20
Time (end): Mon, 2025-09-01 10:07:23
Duration: 2.80 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_17 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              208.87 MB             42.00 MB              6.92 MB
All archives:              835.47 MB            168.09 MB             84.90 MB

                       Unique chunks         Total chunks
Chunk index:                     162                  324

Archive name: PgTest_18_1
Archive fingerprint: 5fd88a99b5765d0af71d450b917157fa1d682691db33306e550ff7d0597fb05f
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:08:15
Time (end): Mon, 2025-09-01 10:08:17
Duration: 2.24 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_1 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              208.87 MB             42.00 MB              1.58 MB
All archives:              835.47 MB            168.09 MB             84.90 MB

                       Unique chunks         Total chunks
Chunk index:                     162                  324

Archive name: PgTest_18_2
Archive fingerprint: fcd113299afd6673c10e44dcd621722a46e35871d7171b24a3d7d61e3f48d1da
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:08:43
Time (end): Mon, 2025-09-01 10:08:45
Duration: 2.17 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_2 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              208.87 MB             42.00 MB              1.58 MB
All archives:              835.47 MB            168.09 MB             84.90 MB

                       Unique chunks         Total chunks
Chunk index:                     162                  324

Archive name: PgTest_18_3
Archive fingerprint: dbc0ddcfdad566d38ab3e210e376490e3e5cd1510c9cd45861b9ab550a04913f
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:09:33
Time (end): Mon, 2025-09-01 10:09:36
Duration: 2.65 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_3 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              208.87 MB             42.09 MB             34.40 MB
All archives:              835.47 MB            168.09 MB             84.90 MB

                       Unique chunks         Total chunks
Chunk index:                     162                  324



Attachments:

  [text/plain] pg_borg_test.txt (4.6K, 2-pg_borg_test.txt)
  download | inline:
--Dump from Postgres 17 instance and pipe to Borg create.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres  |
borg create --stats --stdin-name pg_file  --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_17 -

--Dump from 17 instance and then restore to 18b3 instance
pg_dump -Fc  -d pp_archive -U postgres -f pp_archive_17.out
pg_restore -c -C -d postgres -U postgres -p 5434 pp_archive_17.out

--Initial Postgres 18 dump after restore from Postgres 17 dump
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file  --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_1 -

--Second Postgres 18 dump.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file  --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_2 -


--Update single field across 644129 rows of a table.
pg_dump -Fc --compress=none --no-toast-compression -d pp_archive -U postgres -p 5434 |
borg create --stats --stdin-name pg_file  --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_3 -


borg info --last 4 borg_test/

Archive name: PgTest_17
Archive fingerprint: 14163251a84f93e992b8304bbeac15056daa45ce3c8c5f76655af9ee1ef641de
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:07:20
Time (end): Mon, 2025-09-01 10:07:23
Duration: 2.80 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_17 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              208.87 MB             42.00 MB              6.92 MB
All archives:              835.47 MB            168.09 MB             84.90 MB

                       Unique chunks         Total chunks
Chunk index:                     162                  324

Archive name: PgTest_18_1
Archive fingerprint: 5fd88a99b5765d0af71d450b917157fa1d682691db33306e550ff7d0597fb05f
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:08:15
Time (end): Mon, 2025-09-01 10:08:17
Duration: 2.24 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_1 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              208.87 MB             42.00 MB              1.58 MB
All archives:              835.47 MB            168.09 MB             84.90 MB

                       Unique chunks         Total chunks
Chunk index:                     162                  324

Archive name: PgTest_18_2
Archive fingerprint: fcd113299afd6673c10e44dcd621722a46e35871d7171b24a3d7d61e3f48d1da
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:08:43
Time (end): Mon, 2025-09-01 10:08:45
Duration: 2.17 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_2 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              208.87 MB             42.00 MB              1.58 MB
All archives:              835.47 MB            168.09 MB             84.90 MB

                       Unique chunks         Total chunks
Chunk index:                     162                  324

Archive name: PgTest_18_3
Archive fingerprint: dbc0ddcfdad566d38ab3e210e376490e3e5cd1510c9cd45861b9ab550a04913f
Comment:
Hostname: mauradog
Username: aklaver
Time (start): Mon, 2025-09-01 10:09:33
Time (end): Mon, 2025-09-01 10:09:36
Duration: 2.65 seconds
Number of files: 1
Command line: borg create --stats --stdin-name pg_file --stdin-user aklaver --stdin-group aklaver borg_test/::PgTest_18_3 -
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              208.87 MB             42.09 MB             34.40 MB
All archives:              835.47 MB            168.09 MB             84.90 MB

                       Unique chunks         Total chunks
Chunk index:                     162                  324


^ permalink  raw  reply  [nested|flat] 22+ messages in thread

* Re: In-order pg_dump (or in-order COPY TO)
@ 2025-09-04 12:02  Álvaro Herrera <[email protected]>
  parent: Dimitrios Apostolou <[email protected]>
  5 siblings, 0 replies; 22+ messages in thread

From: Álvaro Herrera @ 2025-09-04 12:02 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

On 2025-Aug-26, Dimitrios Apostolou wrote:

> I am storing dumps of a database (pg_dump custom format) in a de-duplicating
> backup server. Each dump is many terabytes in size, so deduplication is very
> important. And de-duplication itself is based on rolling checksums which is
> pretty flexible, it can compensate for blocks moving by some offset.

Hello,

It's generally considered nowadays that pg_dump is not the best option
to create backups of very large databases.  You may be better served by
using a binary backup tool -- something like Barman.  With current
Postgres releases you can create incremental backups, which would
probably be more effective at deduplicating than playing with pg_dump's
TOC, because it's based on what actually happens to the data.  Barman
provides support for hook scripts, which perhaps can be used to transfer
the backup files to Borg.  (I haven't actually tried to do this, but the
Barman developers talk about using them to transfer the backups to tape,
so I imagine getting them to play with Borg it's a Simple Matter of
Programming.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"On the other flipper, one wrong move and we're Fatal Exceptions"
(T.U.X.: Term Unit X  - http://www.thelinuxreview.com/TUX/)

^ permalink  raw  reply  [nested|flat] 22+ messages in thread

end of thread, other threads:[~2025-09-04 12:02 UTC | newest]

Thread overview: 22+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-08-26 19:43 In-order pg_dump (or in-order COPY TO) Dimitrios Apostolou <[email protected]>
2025-08-26 20:31 ` David G. Johnston <[email protected]>
2025-08-26 22:08   ` Ron Johnson <[email protected]>
2025-08-26 22:17     ` Tom Lane <[email protected]>
2025-08-27 12:40   ` Dimitrios Apostolou <[email protected]>
2025-08-26 22:00 ` Ron Johnson <[email protected]>
2025-08-27 12:52   ` Dimitrios Apostolou <[email protected]>
2025-08-26 22:08 ` Tom Lane <[email protected]>
2025-08-26 22:12   ` Ron Johnson <[email protected]>
2025-08-26 22:23     ` Tom Lane <[email protected]>
2025-08-27 12:34   ` Dimitrios Apostolou <[email protected]>
2025-08-26 22:54 ` Adrian Klaver <[email protected]>
2025-08-27 12:09   ` Dimitrios Apostolou <[email protected]>
2025-08-27 15:25     ` Adrian Klaver <[email protected]>
2025-08-27 16:10       ` Dimitrios Apostolou <[email protected]>
2025-08-30 18:19         ` Adrian Klaver <[email protected]>
2025-08-31 01:21           ` Dimitrios Apostolou <[email protected]>
2025-08-31 15:41             ` Adrian Klaver <[email protected]>
2025-08-31 17:52               ` Dimitrios Apostolou <[email protected]>
2025-09-01 17:24                 ` Adrian Klaver <[email protected]>
2025-08-27 14:06 ` Greg Sabino Mullane <[email protected]>
2025-09-04 12:02 ` Álvaro Herrera <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox