public inbox for [email protected]  
From: Dimitrios Apostolou <[email protected]>
To: [email protected]
Subject: In-order pg_dump (or in-order COPY TO)
Date: Tue, 26 Aug 2025 21:43:44 +0200 (CEST)
Message-ID: <[email protected]>

Hello list,

I am storing dumps of a database (pg_dump custom format) on a
deduplicating backup server. Each dump is many terabytes in size, so
deduplication is very important. The deduplication itself is based on
rolling checksums, which are quite flexible: they can compensate for
blocks moving by some offset.
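
To illustrate why an offset shift is harmless, here is a minimal sketch of content-defined chunking with a rolling hash. The window size, mask, and polynomial hash are assumed values chosen for illustration, not the backup server's actual algorithm; real tools typically use their own hash functions and add minimum/maximum chunk sizes. A boundary depends only on the last few bytes, so after an insertion the same boundaries are found again.

```python
import random

WINDOW = 16                              # bytes covered by the rolling hash (assumed)
MASK = 0xFF                              # boundary when the low 8 bits are zero: ~1 in 256 positions
BASE = 257
MOD = (1 << 61) - 1
BASE_POW = pow(BASE, WINDOW - 1, MOD)    # weight of the byte leaving the window


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries found with a rolling hash."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Remove the contribution of the byte sliding out of the window.
            h = (h - data[i - WINDOW] * BASE_POW) % MOD
        h = (h * BASE + byte) % MOD
        # The boundary test sees only the last WINDOW bytes, never the absolute offset.
        if i >= WINDOW - 1 and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(original: bytes, modified: bytes) -> float:
    """Fraction of the original's chunks that also appear in the modified data."""
    seen = set(chunk(modified))
    orig_chunks = chunk(original)
    return sum(c in seen for c in orig_chunks) / len(orig_chunks)


rng = random.Random(42)
data = bytes(rng.randrange(256) for _ in range(20_000))
shifted = b"new header" + data           # same content, moved by a 10-byte offset
print(len(chunk(data)), shared_fraction(data, shifted))
```

Prepending bytes changes only the first chunk; every later boundary reappears at the same content. Reordering rows, by contrast, changes the bytes inside most windows, so typically very few chunks survive.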

Unfortunately, after I ran pg_restore onto a new server, I noticed that
the dumps taken from the new server are not being deduplicated: all
blocks are considered new.

This means that the dump output has changed significantly. The new dumps
contain the same rows, but probably in a very different order. Could the
row order have changed during the COPY FROM that pg_restore performs? I
have no idea, but now that I think about it, many operations can reorder
rows, such as CLUSTER and VACUUM FULL, so the question still applies.

A *logical* dump of data shouldn't be affected by on-disk order. 
Internal representation shouldn't affect the output.
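
A small sketch of this property, assuming a table whose first column is a unique key. `logical_dump` is a hypothetical helper standing in for a dump tool; it is not how pg_dump works. Serializing rows in storage order makes the output depend on physical layout, while sorting by the unique key first makes it depend only on the logical contents.

```python
import csv
import io
import random


def logical_dump(rows: list[tuple], key_index: int = 0, ordered: bool = True) -> bytes:
    """Serialize rows as CSV, optionally sorting by a unique key column first (hypothetical)."""
    if ordered:
        rows = sorted(rows, key=lambda r: r[key_index])
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")


# The same logical table stored in two different physical orders,
# e.g. before and after a restore, CLUSTER or VACUUM FULL.
table = [(i, f"name-{i}", i * 3 % 17) for i in range(1000)]
physical_a = table[:]
physical_b = table[:]
random.Random(7).shuffle(physical_b)

# Storage-order output differs; key-ordered output is byte-identical.
print(logical_dump(physical_a, ordered=False) == logical_dump(physical_b, ordered=False))
print(logical_dump(physical_a) == logical_dump(physical_b))
```

Sorting is only deterministic because the key is unique; with duplicate keys, ties would need a full ordering of the row to produce stable output.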

This makes me wonder: Is there a way to COPY TO in primary-key order?

If that is possible, then pg_dump could make use of it.


Thanks in advance,
Dimitris