From: Adrian Klaver <[email protected]>
To: R Wahyudi <[email protected]>
Cc: Ron Johnson <[email protected]>
Cc: pgsql-generallists.postgresql.org <[email protected]>
Subject: Re: pg_restore scan
Date: Thu, 18 Sep 2025 14:45:17 -0700
Message-ID: <[email protected]> (raw)
In-Reply-To: <CALWQLzRhq14t_skYeJKmFV=1pp8dCjme=zHhDoMZkSJZAxDS5w@mail.gmail.com>
References: <CALWQLzRmzT7bo0c6CUX9=L_oLD3oUN8fZ5yyGLEwe7y5rWoxmQ@mail.gmail.com>
<[email protected]>
<CALWQLzTWUnA09cAfsuzuv5Kmb+S8gcC9uBypRYW0UtsQgMyPJg@mail.gmail.com>
<CANzqJaCQo6A+QLqJGeXGasoK3aSFL5Ehf5x-Gt0T9DOaRFoxKw@mail.gmail.com>
<CALWQLzSGPUeVDDp15inXiAFJe_rS_JObr_17Qq6Ns0s-p0YSvQ@mail.gmail.com>
<CANzqJaAkTPeiuadRfZ7S4L2N7H7ayjW7bHqsfZ5wRDDvAmu89w@mail.gmail.com>
<CALWQLzRVnfD+jfO1bjgr=tfbgzY36VdwTWdWGzgTC6bOjCN8Ow@mail.gmail.com>
<[email protected]>
<CALWQLzRhq14t_skYeJKmFV=1pp8dCjme=zHhDoMZkSJZAxDS5w@mail.gmail.com>
On 9/18/25 2:36 PM, R Wahyudi wrote:
> I've been given a database dump file daily and I've been asked to
> restore it.
> I tried everything I could to speed up the process, including using -j 40.
>
> I discovered that at the later stage of the restore process, the
> following behaviour repeated a few times :
> 40 x pg_restore process doing 100% CPU
> 40 x postgres process doing COPY but using 0% CPU
> ..... and zero disk write activity
>
> I don't see this behaviour when restoring the database that was dumped
> with -Fd.
> Also with an un-piped backup file, I can restore a specific table
> without having to wait for hours.
From the docs:
https://www.postgresql.org/docs/current/app-pgrestore.html
"
-j number-of-jobs
Only the custom and directory archive formats are supported with this
option. The input must be a regular file or directory (not, for example,
a pipe or standard input). Also, multiple jobs cannot be used together
with the option --single-transaction.
"
>
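To illustrate the restriction quoted above, here is a minimal sketch of a
pre-flight check, assuming a POSIX shell. The function names
can_parallel_restore and restore_parallel are hypothetical and not part of
PostgreSQL; only the pg_restore options -j and -d are real. Note that this
only catches pipe-like input at restore time. It cannot tell whether a
regular file was originally written through a pipe and so lacks data offsets
in its TOC.

```shell
# Hypothetical helper: succeed only if the archive is something
# "pg_restore -j" accepts per the docs (a regular file or a directory).
can_parallel_restore() {
    [ -f "$1" ] || [ -d "$1" ]
}

# Hypothetical wrapper: refuse to start a parallel restore from a pipe,
# FIFO, or other non-regular input. The job count defaults to 4 (assumption).
restore_parallel() {
    archive=$1
    db=$2
    jobs=${3:-4}
    if ! can_parallel_restore "$archive"; then
        echo "refusing: $archive is not a regular file or directory" >&2
        return 1
    fi
    pg_restore -j "$jobs" -d "$db" "$archive"
}
```

Usage would look like: restore_parallel /path/to/db.dump mydb 8
(the path and database name are placeholders).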
>
> --
>
>
>
>
>
> On Fri, 19 Sept 2025 at 01:54, Adrian Klaver <[email protected]> wrote:
>
> On 9/18/25 05:58, R Wahyudi wrote:
> > Hi All,
> >
> > Thanks for the quick and accurate response! I've never been so happy
> > seeing IOwait on my system!
>
> Because?
>
> What did you find?
>
> >
> > I might be blind as I can't find information about 'offset' in
> > pg_dump documentation.
> > Where can I find more info about this?
>
> It is not in the user documentation.
>
> From the thread Ron referred to, there is an explanation here:
>
> https://www.postgresql.org/message-id/366773.1756749256%40sss.pgh.pa.us
>
> I believe the actual code, for the -Fc format, is in pg_backup_custom.c
> here:
>
> https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_backup_custom.c#L723
>
> Per comment at line 755:
>
> "
> If possible, re-write the TOC in order to update the data offset
> information. This is not essential, as pg_restore can cope in most
> cases without it; but it can make pg_restore significantly faster
> in some situations (especially parallel restore). We can skip this
> step if we're not dumping any data; there are no offsets to update
> in that case.
> "
>
> >
> > Regards,
> > Rianto
> >
> > On Wed, 17 Sept 2025 at 13:48, Ron Johnson <[email protected]> wrote:
> >
> >
> > PG 17 has integrated zstd compression, while --format=directory lets
> > you do multi-threaded dumps. That's much faster than a single-threaded
> > pg_dump into a multi-threaded compression program.
> >
> > (If for _Reasons_ you require a single-file backup, then tar the
> > directory of compressed files using the --remove-files option.)
> >
> > On Tue, Sep 16, 2025 at 10:50 PM R Wahyudi <[email protected]> wrote:
> >
> > Sorry for not including the full command - yes, it's piping to a
> > compression command:
> > | lbzip2 -n <threadsforbzipgoeshere> --best > <filenamegoeshere>
> >
> >
> > I think we found the issue! I'll do further testing and see how
> > it goes!
> >
> >
> >
> >
> >
> > On Wed, 17 Sept 2025 at 11:02, Ron Johnson <[email protected]> wrote:
> >
> > So, piping or redirecting to a file? If so, then that's the
> > problem.
> >
> > pg_dump directly to a file puts file offsets in the TOC.
> >
> > This is how I do custom dumps:
> > cd $BackupDir
> > pg_dump -Fc --compress=zstd:long -v -d${db} -f ${db}.dump 2> ${db}.log
> >
> > On Tue, Sep 16, 2025 at 8:54 PM R Wahyudi <[email protected]> wrote:
> >
> > pg_dump was done using the following command :
> > pg_dump -Fc -Z 0 -h <host> -U <user> -w -d <database>
> >
> > On Wed, 17 Sept 2025 at 08:36, Adrian Klaver <[email protected]> wrote:
> >
> > On 9/16/25 15:25, R Wahyudi wrote:
> > >
> > > I'm trying to troubleshoot the slowness issue with pg_restore and
> > > stumbled across a recent post about pg_restore scanning the whole file:
> > >
> > > > "scanning happens in a very inefficient way, with many seek calls and
> > > > small block reads. Try strace to see them. This initial phase can take
> > > > hours in a huge dump file, before even starting any actual restoration."
> > > see: https://www.postgresql.org/message-id/E48B611D-7D61-4575-A820-B2C3EC2E0551%40gmx.net
> >
> > This was for pg_dump output that was streamed to a Borg archive and as
> > a result had no object offsets in the TOC.
> >
> > How are you doing your pg_dump?
> >
> >
> >
> > --
> > Adrian Klaver
> > [email protected]
> >
> >
> >
> > --
> > Death to <Redacted>, and butter sauce.
> > Don't boil me, I'm still alive.
> > <Redacted> lobster!
> >
> >
> >
> >
>
>
> --
> Adrian Klaver
> [email protected]
>
--
Adrian Klaver
[email protected]