MIME-Version: 1.0
References: 
 <CAMT0RQT_0qVxcTT6ycM20QUN-pEQ6iMLbz6gLWgLpeF0NmNOUA@mail.gmail.com>
 <CAExHW5t54GPKFbW3KLzintJ6jMMRYwb-t2Fjm4JTxEcZbGDomA@mail.gmail.com>
 <CAMT0RQTHoL8S7OonFWC_aDSC-2oX7BGBBLAQ+OOBhRPcxV2eiw@mail.gmail.com>
 <CAMT0RQQAH1a8kY-mx7B07Uzn3T_zeaU9detqFFtW36_k67Su+A@mail.gmail.com>
 <CAMT0RQQr7KtPAY903+F42csiHc1EPHo70Xji-znkxEhwdoKa6w@mail.gmail.com>
 <CAMT0RQSNHFffbCmDNxQogVBD8H5gTDJNwhUR2btCVE+Lq1sGGw@mail.gmail.com>
In-Reply-To: 
 <CAMT0RQSNHFffbCmDNxQogVBD8H5gTDJNwhUR2btCVE+Lq1sGGw@mail.gmail.com>
From: Hannu Krosing <hannuk@google.com>
Date: Thu, 13 Nov 2025 21:34:23 +0100
Message-ID: 
 <CAMT0RQTEFGctCfgVx3u2XgVRCAj_QURV2tfdzL0HOQi=u0sV2A@mail.gmail.com>
Subject: Re: Patch: dumping tables data in multiple chunks in pg_dump
To: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>,
	Nathan Bossart <nathandbossart@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CAMT0RQTEFGctCfgVx3u2XgVRCAj_QURV2tfdzL0HOQi%3Du0sV2A%40mail.gmail.com>
Precedence: bulk

Added to https://commitfest.postgresql.org/patch/6219/

On Thu, Nov 13, 2025 at 9:26 PM Hannu Krosing <hannuk@google.com> wrote:
>
> The reason for the small chunk sizes is that they are determined by the
> main heap table, which was just over 1GB:
>
> largetoastdb=> SELECT format('%I.%I', t.schemaname, t.relname) AS table_name,
>        pg_table_size(t.relid) AS table_size,
>        sum(pg_relation_size(i.indexrelid)) AS total_index_size,
>        pg_relation_size(t.relid) AS main_table_size,
>        pg_relation_size(c.reltoastrelid) AS toast_table_size,
>        pg_relation_size(oi.indexrelid) AS toast_index_size,
>        t.n_live_tup AS row_count,
>        count(*) AS index_count,
>        array_to_json(array_agg(json_build_object(i.indexrelid::regclass,
>            pg_relation_size(i.indexrelid))), true) AS index_info
>   FROM pg_stat_user_tables t
>   JOIN pg_stat_user_indexes i ON i.relid = t.relid
>   JOIN pg_class c ON c.oid = t.relid
>   LEFT JOIN pg_stat_sys_indexes AS oi ON oi.relid = c.reltoastrelid
>  GROUP BY 1, 2, 4, 5, 6, 7
>  ORDER BY 2 DESC, 7 DESC
>  LIMIT 25;
> ┌─[ RECORD 1 ]─────┬─────────────────────────────────────┐
> │ table_name       │ public.just_toasted                 │
> │ table_size       │ 56718835712                         │
> │ total_index_size │ 230064128                           │
> │ main_table_size  │ 1191559168                          │
> │ toast_table_size │ 54613336064                         │
> │ toast_index_size │ 898465792                           │
> │ row_count        │ 5625234                             │
> │ index_count      │ 1                                   │
> │ index_info       │ [{"just_toasted_pkey" : 230064128}] │
> └──────────────────┴─────────────────────────────────────┘
>
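
To make the relationship between main heap size and chunk count concrete, here is a small sketch. It assumes the default 8 kB PostgreSQL block size (a build-time setting, not stated in the thread) and simply divides the main_table_size reported above by the two chunk sizes used in the tests. The function name is illustrative only.

```python
import math

BLOCK_SIZE = 8192  # assumed default PostgreSQL block size in bytes

main_table_size = 1191559168  # main_table_size from the query output above

# Number of pages in the main heap fork.
pages = main_table_size // BLOCK_SIZE


def chunk_count(total_pages: int, chunk_pages: int) -> int:
    """Number of chunks when total_pages is split into chunk_pages-sized pieces."""
    return math.ceil(total_pages / chunk_pages)


print(pages)                     # 145454
print(chunk_count(pages, 9095))  # 16
print(chunk_count(pages, 1000))  # 146
```

Under that assumption, 9095-page chunks split this table into 16 pieces, while 1000-page chunks give 146 much smaller ones, even though the table as a whole is about 53GB.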
> On Thu, Nov 13, 2025 at 9:24 PM Hannu Krosing <hannuk@google.com> wrote:
> >
> > Ran another test with a 53GB database where most of the data is in TOAST
> >
> > CREATE TABLE just_toasted(
> >   id serial primary key,
> >   toasted1 char(2200) STORAGE EXTERNAL,
> >   toasted2 char(2200) STORAGE EXTERNAL,
> >   toasted3 char(2200) STORAGE EXTERNAL,
> >   toasted4 char(2200) STORAGE EXTERNAL
> > );
> >
> > and the toast fields were added in somewhat randomised order.
> >
> > Here the results are as follows
> >
> > Parallelism | chunk size (pages) | time (sec)
> >           1 |                  - |        240
> >           2 |               1000 |        129
> >           4 |               1000 |         64
> >           8 |               1000 |         36
> >          16 |               1000 |         30
> >
> >           4 |               9095 |         78
> >           8 |               9095 |         42
> >          16 |               9095 |         42
> >
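
The numbers in the table can be turned into speedup and parallel efficiency figures with a few lines of Python. This is only arithmetic on the timings quoted above; the helper names are made up for the illustration.

```python
BASELINE_SEC = 240  # single-process dump, no chunking

# workers -> wall-clock seconds, copied from the results table
runs_1000_pages = {2: 129, 4: 64, 8: 36, 16: 30}
runs_9095_pages = {4: 78, 8: 42, 16: 42}


def speedup(baseline: float, elapsed: float) -> float:
    """How many times faster a run is than the baseline."""
    return baseline / elapsed


def efficiency(baseline: float, elapsed: float, workers: int) -> float:
    """Speedup divided by worker count; 1.0 would be perfectly linear scaling."""
    return speedup(baseline, elapsed) / workers


for label, runs in (("1000", runs_1000_pages), ("9095", runs_9095_pages)):
    for workers, elapsed in runs.items():
        print(f"chunk={label:>4} pages  -j {workers:>2}  "
              f"speedup={speedup(BASELINE_SEC, elapsed):.2f}  "
              f"efficiency={efficiency(BASELINE_SEC, elapsed, workers):.2f}")
```

With 1000-page chunks, scaling stays close to linear up to 4 workers (efficiency about 0.94) and flattens at 16 workers, where the speedup is 8x.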
> > Larger chunk sizes performed worse because they often left one or two
> > stragglers behind.
> >
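
The straggler effect can be illustrated with a toy scheduling model. This is not pg_dump's scheduler; it is a plain greedy simulation in which each chunk goes to whichever worker becomes free first, and the chunk costs are hypothetical numbers chosen so that two chunks are three times as expensive as the rest.

```python
import heapq


def makespan(chunk_costs, workers):
    """Greedy list scheduling: each chunk goes to the worker that frees up first.

    Returns the time at which the last worker finishes.
    """
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for cost in chunk_costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


# Hypothetical costs: 16 coarse chunks, two of them 3x as expensive.
coarse = [9] * 14 + [27] * 2
# The same total work, with every chunk cut into 9 smaller pieces.
fine = [1] * 126 + [3] * 18

print(makespan(coarse, 8))  # 36
print(makespan(fine, 8))    # 24
print(sum(coarse) / 8)      # 22.5, the ideal with perfect balancing
```

In this model the coarse split finishes at 36 time units because the two expensive chunks start late and run alone, while the fine split gets to 24, close to the ideal of 22.5.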
> > Detailed run results below:
> >
> > hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> > --format=directory -h 10.58.80.2 -U postgres -f
> > /tmp/ltoastdb-1-plain.dump largetoastdb
> > real    3m59.465s
> > user    3m43.304s
> > sys     0m15.844s
> >
> > hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> > --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=9095 -j 4 -f /tmp/ltoastdb-4.dump
> > largetoastdb
> > real    1m18.320s
> > user    3m49.236s
> > sys     0m19.422s
> >
> > hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> > --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=9095 -j 8 -f /tmp/ltoastdb-8.dump
> > largetoastdb
> > real    0m42.028s
> > user    3m55.299s
> > sys     0m24.657s
> >
> > hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> > --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=9095 -j 16 -f /tmp/ltoastdb-16.dump
> > largetoastdb
> > real    0m42.575s
> > user    4m11.011s
> > sys     0m26.110s
> >
> > hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> > --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=1000 -j 16 -f /tmp/ltoastdb-16-1kpages.dump
> > largetoastdb
> > real    0m29.641s
> > user    6m16.321s
> > sys     0m49.345s
> >
> > hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> > --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=1000 -j 8 -f /tmp/ltoastdb-8-1kpages.dump
> > largetoastdb
> > real    0m35.685s
> > user    3m58.528s
> > sys     0m26.729s
> >
> > hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> > --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=1000 -j 4 -f /tmp/ltoastdb-4-1kpages.dump
> > largetoastdb
> > real    1m3.737s
> > user    3m50.251s
> > sys     0m18.507s
> >
> > hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> > --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=1000 -j 2 -f /tmp/ltoastdb-2-1kpages.dump
> > largetoastdb
> > real    2m8.708s
> > user    3m57.018s
> > sys     0m18.499s
> >
> > On Thu, Nov 13, 2025 at 7:39 PM Hannu Krosing <hannuk@google.com> wrote:
> > >
> > > Going up to 16 workers did not improve performance, but this is
> > > expected, as the disk behind the database can only do 4TB/hour of
> > > reads, which is now the bottleneck. (408/352*3600 = 4172 GB/h)
> > >
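
The throughput figure in parentheses is just table size divided by elapsed time, scaled to an hour. A minimal sketch of that arithmetic follows; the 352 s figure is the one used in the message (it appears to correspond to the 8-worker run), and 344.9 s is the 16-worker run's 5m44.900s.

```python
def gb_per_hour(size_gb: float, seconds: float) -> float:
    """Average read throughput in GB/hour for a dump of size_gb taking `seconds`."""
    return size_gb / seconds * 3600


TABLE_GB = 408

print(int(gb_per_hour(TABLE_GB, 352)))      # 4172
print(round(gb_per_hour(TABLE_GB, 344.9)))  # about 4259, the 16-worker run
```

Both runs land a little above 4 TB/hour, which is consistent with the disk, rather than the worker count, being the limit.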
> > > $ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
> > > --huge-table-chunk-pages=131072 -j 16 -f /tmp/parallel16.dump largedb
> > > real    5m44.900s
> > > user    53m50.491s
> > > sys     5m47.602s
> > >
> > > And 4 workers showed near-linear speedup over a single worker:
> > >
> > > hannuk@pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> > > --format=directory -h 10.58.80.2 -U postgres
> > > --huge-table-chunk-pages=131072 -j 4 -f /tmp/parallel4.dump largedb
> > > real    10m32.074s
> > > user    38m54.436s
> > > sys     2m58.216s
> > >
> > > The database runs on a 64vCPU VM with 128GB RAM, so most of the table
> > > will be read in from the disk
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Nov 13, 2025 at 7:02 PM Hannu Krosing <hannuk@google.com> wrote:
> > > >
> > > > I just ran a test by generating a 408GB table and then dumping it both ways
> > > >
> > > > $ time pg_dump --format=directory -h 10.58.80.2 -U postgres -f
> > > > /tmp/plain.dump largedb
> > > >
> > > > real    39m54.968s
> > > > user    37m21.557s
> > > > sys     2m32.422s
> > > >
> > > > $ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
> > > > --huge-table-chunk-pages=131072 -j 8 -f /tmp/parallel8.dump largedb
> > > >
> > > > real    5m52.965s
> > > > user    40m27.284s
> > > > sys     3m53.339s
> > > >
> > > > So parallel dump with 8 workers using 1GB (128k pages) chunks runs
> > > > almost 7 times faster than the sequential dump.
> > > >
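
The "almost 7 times" figure can be checked directly from the `real` lines reported by `time`. The parser below is a small helper written for this illustration; it only handles the `NmS.SSSs` form shown in this thread.

```python
import re


def parse_real(value: str) -> float:
    """Convert a bash `time` value such as '39m54.968s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


sequential = parse_real("39m54.968s")  # plain directory-format dump
parallel_8 = parse_real("5m52.965s")   # -j 8 with 131072-page chunks

print(round(sequential / parallel_8, 2))  # about 6.79
```

The same helper applied to the 4-worker run (10m32.074s) gives a speedup of roughly 3.8x over the sequential dump.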
> > > > This was a table that had no TOAST part. I will run some more tests
> > > > with TOASTed tables next and expect similar or better improvements.
> > > >
> > > >
> > > >
> > > > On Wed, Nov 12, 2025 at 1:59 PM Ashutosh Bapat
> > > > <ashutosh.bapat.oss@gmail.com> wrote:
> > > > >
> > > > > Hi Hannu,
> > > > >
> > > > > On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <hannuk@google.com> wrote:
> > > > > >
> > > > > > Attached is a patch that adds the ability to dump table data in multiple chunks.
> > > > > >
> > > > > > Looking for feedback at this point:
> > > > > >  1) what have I missed
> > > > > >  2) should I implement something to avoid single-page chunks
> > > > > >
> > > > > > The flag --huge-table-chunk-pages tells the directory format
> > > > > > dump to dump tables whose main fork has more pages than this
> > > > > > value in multiple chunks of the given number of pages.
> > > > > >
> > > > > > The main use case is speeding up parallel dumps in case of one or a
> > > > > > small number of HUGE tables so parts of these can be dumped in
> > > > > > parallel.
> > > > >
> > > > > Have you measured speed up? Can you please share the numbers?
> > > > >
> > > > > --
> > > > > Best Wishes,
> > > > > Ashutosh Bapat