From: Dilip Kumar <[email protected]>
To: Hannu Krosing <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Nathan Bossart <[email protected]>
Subject: Re: Patch: dumping tables data in multiple chunks in pg_dump
Date: Tue, 25 Nov 2025 10:20:01 +0530
Message-ID: <CAFiTN-scTeRAH0q2Ga3CLgkbcfcTi31cSw73ZVZntDQG7-fE+g@mail.gmail.com>
In-Reply-To: <CAMT0RQQPMj3=EZ-4z6qRs_TmBHoyv2VHAdMrfDuwa5ZUY6XtHQ@mail.gmail.com>
References: <CAMT0RQT_0qVxcTT6ycM20QUN-pEQ6iMLbz6gLWgLpeF0NmNOUA@mail.gmail.com>
	<CAFiTN-tV4jWKN75E5YLB-jSqb8j0E1PctiDjztv=ccfbe3YPmg@mail.gmail.com>
	<CAMT0RQQPMj3=EZ-4z6qRs_TmBHoyv2VHAdMrfDuwa5ZUY6XtHQ@mail.gmail.com>

On Tue, Nov 25, 2025 at 2:32 AM Hannu Krosing <[email protected]> wrote:
>
> The expectation was that, since chunking is useful mainly for really
> huge tables, ANALYZE would have been run "recently enough".
>
> Maybe we should use pg_relation_size() in case we have already
> determined that the table is large enough to warrant chunking? Maybe
> at least 1/2 of the requested chunk size?
>
> My reasoning was to not put too much extra load on pg_dump in case
> chunking is not required. But of course we can use the presence of a
> chunking request to decide to run pg_relation_size(), assuming the
> overhead won't be too large in this case.
>
>
> On Mon, Nov 17, 2025 at 5:15 AM Dilip Kumar <[email protected]> wrote:
> >
> > On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <[email protected]> wrote:
> > >
> > > Attached is a patch that adds the ability to dump table data in multiple chunks.
> > >
> > > Looking for feedback at this point:
> > >  1) what have I missed
> > >  2) should I implement something to avoid single-page chunks
> > >
> > > > The flag --huge-table-chunk-pages tells the directory format
> > > > dump to dump tables whose main fork has more pages than this in
> > > > multiple chunks of the given number of pages.
> > >
> > > The main use case is speeding up parallel dumps in case of one or a
> > > small number of HUGE tables so parts of these can be dumped in
> > > parallel.
> > >
> >
> > +1 for the idea. I haven't done a detailed review, but while going
> > through the patch I noticed that we use pg_class->relpages to
> > decide whether to chunk the table. That should be fine, but
> > don't you think that a direct size calculation function like
> > pg_relation_size() would give a better idea and would not depend on
> > whether the stats are up to date?  This would make the chunking
> > behavior more deterministic.

Yeah, that makes sense. We can use relpages for the initial
identification and then call pg_relation_size() only if relpages says
the table is large enough.
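
To make the two-step check concrete, here is a rough sketch in Python
(not the patch's C code). The function names, the fetch_relation_size
callback, and the 8192-byte block size are assumptions made for
illustration; the 1/2 pre-check threshold is the one floated above.

```python
# Illustrative sketch only; names and BLOCK_SIZE are assumptions, not the patch.
BLOCK_SIZE = 8192  # assumed default PostgreSQL block size (BLCKSZ)


def should_chunk(relpages, chunk_pages, fetch_relation_size,
                 precheck_fraction=0.5):
    """Decide whether a table should be dumped in chunks.

    relpages            -- pg_class.relpages (cheap, but possibly stale)
    chunk_pages         -- requested chunk size in pages
    fetch_relation_size -- hypothetical callback returning the main fork size
                           in bytes, e.g. by running pg_relation_size()
    Returns (chunk, pages), where pages is the figure the decision used.
    """
    # Step 1: cheap pre-check on the stored estimate. Small tables never
    # pay for the extra size lookup.
    if relpages < chunk_pages * precheck_fraction:
        return False, relpages

    # Step 2: the estimate says "maybe large", so get the real size and
    # round up to whole pages.
    actual_pages = -(-fetch_relation_size() // BLOCK_SIZE)
    return actual_pages > chunk_pages, actual_pages


def chunk_ranges(total_pages, chunk_pages):
    """Split pages 0..total_pages-1 into inclusive (first, last) ranges."""
    return [(start, min(start + chunk_pages, total_pages) - 1)
            for start in range(0, total_pages, chunk_pages)]
```

With this shape the extra query is only issued for tables whose
relpages is already at least half of the requested chunk size, so the
overhead for ordinary dumps should stay small.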

-- 
Regards,
Dilip Kumar
Google




