public inbox for [email protected]  
help / color / mirror / Atom feed
From: Dimitrios Apostolou <[email protected]>
To: Tom Lane <[email protected]>
Cc: [email protected]
Subject: Re: In-order pg_dump (or in-order COPY TO)
Date: Wed, 27 Aug 2025 14:34:31 +0200 (CEST)
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

On Wednesday 2025-08-27 00:08, Tom Lane wrote:

> However, restoring into an empty table would result in a table with
> minimal free space, whereas the original table probably had a
> meaningful amount of free space thanks to updates and deletes.  Thus
> for example TIDs would not be the same.  If your "rolling checksum"
> methodology is at all sensitive to page boundaries, the table would
> look quite different to it.

Thanks Tom. I'm not following how the empty space in the tables could 
affect the custom format dumps. Where can I read more about these TIDs? 
Are they stored in the pg_dump custom format archive?

The rolling checksum method should iron out shifting of data chunks that 
are around a couple MB in size. Shorter shifts will not be caught, and 
I assume that the "page boundaries" changes you mentioned would happen 
every 8KB. So that is definitely too fine grained for the deduplicated 
algorithm.

FYI something that I forgot to mention is that pg_restore is --data-only 
and writes go through walwriter. The database with the tables has been 
created from scratch so every table is empty before the pg_restore. Not 
sure how this affects the above.

> I'd expect pg_dump/pg_restore to preserve the physical row ordering,
> simply because it doesn't do anything that would change that.

Regardless of my specific case, it's scary to think that doing VACUUM 
FULL, CLUSTER, or who knows what other maintenance command, will modify 
the logical dumps. Some implicit ordering could be enforced by pg_dump 
if possible, for example when a primary key exists. Does it make sense? 
Is it even possible?


Thanks,
Dimitris







view thread (22+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: In-order pg_dump (or in-order COPY TO)
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox