From: Hannu Krosing <[email protected]>
To: Michael Banck <[email protected]>
Cc: David Rowley <[email protected]>
Cc: Ashutosh Bapat <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Nathan Bossart <[email protected]>
Subject: Re: Patch: dumping tables data in multiple chunks in pg_dump
Date: Sat, 28 Mar 2026 16:32:23 +0100
Message-ID: <CAMT0RQRtLwi_CrOcD7KxYL0Gm1nGXb-HWmerVg=ajEs6JP7m+w@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAMT0RQT_0qVxcTT6ycM20QUN-pEQ6iMLbz6gLWgLpeF0NmNOUA@mail.gmail.com>
	<CAExHW5t54GPKFbW3KLzintJ6jMMRYwb-t2Fjm4JTxEcZbGDomA@mail.gmail.com>
	<CAMT0RQTHoL8S7OonFWC_aDSC-2oX7BGBBLAQ+OOBhRPcxV2eiw@mail.gmail.com>
	<CAMT0RQQAH1a8kY-mx7B07Uzn3T_zeaU9detqFFtW36_k67Su+A@mail.gmail.com>
	<CAMT0RQQr7KtPAY903+F42csiHc1EPHo70Xji-znkxEhwdoKa6w@mail.gmail.com>
	<CAMT0RQSNHFffbCmDNxQogVBD8H5gTDJNwhUR2btCVE+Lq1sGGw@mail.gmail.com>
	<CAMT0RQTEFGctCfgVx3u2XgVRCAj_QURV2tfdzL0HOQi=u0sV2A@mail.gmail.com>
	<CAApHDvr8ay+31Wd0TptDGp8cAg2-NOnWddx8csnUE3R03EbvZw@mail.gmail.com>
	<[email protected]>

The issue is that currently the value is given in "main table pages"
and it would be somewhat deceptive, or at least confusing, to try to
express this in any other unit.

As I explained in the commit message:

---------8<-------------------8<-------------------8<----------------
This --max-table-segment-pages number specifically applies to main table
pages which does not guarantee anything about output size.
The output could be empty if there are no live tuples in the page range.
Or it can be almost 200 GB if the page has just pointers to 1GB TOAST items.
---------8<-------------------8<-------------------8<----------------

And I can think of no cheap and reliable way to change that equation.
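
To put rough numbers on that worst case, here is a back-of-the-envelope
sketch in Python. The constants are assumptions for illustration and are
not taken from the patch: the default 8 kB block size, a 24-byte page
header, 4-byte line pointers, a 24-byte aligned tuple header, an 18-byte
on-disk TOAST pointer, 8-byte alignment, and one maximally sized (1 GB)
TOAST value per tuple. The function names are made up for this example.

```python
# Rough upper bound on data reachable from a range of main table pages
# when every tuple holds only a pointer to one large TOAST value.
# All constants below are assumptions (default PostgreSQL build).
PAGE_SIZE = 8192       # default block size in bytes
PAGE_HEADER = 24       # page header size
LINE_POINTER = 4       # one item identifier per tuple
TUPLE_HEADER = 24      # 23-byte heap tuple header, aligned up
TOAST_POINTER = 18     # on-disk pointer to an out-of-line value
MAXALIGN = 8           # typical alignment on 64-bit platforms


def maxalign(nbytes: int) -> int:
    """Round nbytes up to the next multiple of MAXALIGN."""
    return (nbytes + MAXALIGN - 1) // MAXALIGN * MAXALIGN


def max_toast_only_tuples_per_page() -> int:
    """Tuples per page when each tuple is just one TOAST pointer."""
    per_tuple = LINE_POINTER + maxalign(TUPLE_HEADER + TOAST_POINTER)
    return (PAGE_SIZE - PAGE_HEADER) // per_tuple


def worst_case_bytes(pages: int, toast_value_bytes: int = 1 << 30) -> int:
    """Upper bound on TOASTed data referenced from `pages` main pages."""
    return pages * max_toast_only_tuples_per_page() * toast_value_bytes


if __name__ == "__main__":
    per_page = max_toast_only_tuples_per_page()
    gib = worst_case_bytes(1) / (1 << 30)
    print(f"{per_page} tuples per page, up to {gib:.0f} GiB per page")
```

Under these assumptions a single main table page can reference well over
100 GB of data, while a page with no live tuples contributes nothing. The
size of the actual dump output would differ again, since it depends on the
output encoding and on compression.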

I'd be very happy to hear any good ideas for improving the flag name,
or even a proposal for better estimating the resulting dump file size,
so that we could give the chunk size in better units.

---
Hannu





On Sat, Mar 28, 2026 at 12:26 PM Michael Banck <[email protected]> wrote:
>
> Hi,
>
> On Tue, Jan 13, 2026 at 03:27:25PM +1300, David Rowley wrote:
> > Perhaps --max-table-segment-pages is a better name than
> > --huge-table-chunk-pages as it's quite subjective what the minimum
> > number of pages required to make a table "huge".
>
> I'm not sure that's better - without looking at the documentation,
> people might confuse segment here with the 1GB split of tables into
> segments. As pg_dump is a very common and basic user tool, I don't think
> implementation details like pages/page sizes and blocks should be part
> of its UX.
>
> Can't we just make it a storage size, like '10GB' and then rename it to
> --table-parallel-threshold or something? I agree it's bikeshedding, but
> I personally don't like either --max-table-segment-pages or
> --huge-table-chunk-pages.
>
>
> Michael




