From: R Wahyudi <[email protected]>
To: Adrian Klaver <[email protected]>
Cc: Ron Johnson <[email protected]>
Cc: pgsql-generallists.postgresql.org <[email protected]>
Subject: Re: pg_restore scan
Date: Fri, 19 Sep 2025 09:45:22 +1000
Message-ID: <CALWQLzRr34aZ+Dk_vhvz2VYtFjsChe1PQp3Nc_F9ENKzw3c7Tg@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <CALWQLzRmzT7bo0c6CUX9=L_oLD3oUN8fZ5yyGLEwe7y5rWoxmQ@mail.gmail.com>
	<[email protected]>
	<CALWQLzTWUnA09cAfsuzuv5Kmb+S8gcC9uBypRYW0UtsQgMyPJg@mail.gmail.com>
	<CANzqJaCQo6A+QLqJGeXGasoK3aSFL5Ehf5x-Gt0T9DOaRFoxKw@mail.gmail.com>
	<CALWQLzSGPUeVDDp15inXiAFJe_rS_JObr_17Qq6Ns0s-p0YSvQ@mail.gmail.com>
	<CANzqJaAkTPeiuadRfZ7S4L2N7H7ayjW7bHqsfZ5wRDDvAmu89w@mail.gmail.com>
	<CALWQLzRVnfD+jfO1bjgr=tfbgzY36VdwTWdWGzgTC6bOjCN8Ow@mail.gmail.com>
	<[email protected]>
	<CALWQLzRhq14t_skYeJKmFV=1pp8dCjme=zHhDoMZkSJZAxDS5w@mail.gmail.com>
	<[email protected]>

>> The input must be a regular file or directory (not, for example, a pipe
>> or standard input).

Thanks again for the pointer!

I successfully ran a parallel restore with no warnings.
I didn't really pay attention to how the dump was taken until I
accidentally stumbled upon your post.
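
For anyone scripting this, here is a minimal sketch of a guard based on the
documentation quoted above. It only checks that the archive path is a regular
file or a directory before asking for parallel jobs. It does not check whether
the TOC actually contains data offsets. The function names, "mydb" and the
archive path are hypothetical placeholders, not anything from PostgreSQL
itself, and the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): choose a job count for
# pg_restore. Per the docs, -j needs a regular file (custom format) or a
# directory (directory format) as input, not a pipe or standard input.
restore_jobs() {
    archive=$1
    wanted=$2
    if [ -f "$archive" ] || [ -d "$archive" ]; then
        # Regular file or directory: the requested job count can be used.
        echo "$wanted"
    else
        # FIFO, missing path, or anything else: fall back to one job.
        echo 1
    fi
}

# Print the pg_restore command rather than running it, so the sketch has no
# side effects. "mydb" is a placeholder database name.
print_restore_cmd() {
    archive=$1
    jobs=$(restore_jobs "$archive" "$2")
    echo "pg_restore -j $jobs -d mydb $archive"
}
```

Calling print_restore_cmd with a dump path and a job count shows the command
that would be run. A dump written with "pg_dump -Fc -f <file>" or "pg_dump -Fd"
passes this check, while a named pipe falls back to a single job.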


Regards,
Rianto




On Fri, 19 Sept 2025 at 07:45, Adrian Klaver <[email protected]>
wrote:

>
>
> On 9/18/25 2:36 PM, R Wahyudi wrote:
> > I've been given a database dump file daily and I've been asked to
> > restore it.
> > I tried everything I could to speed up the process, including using
> > -j 40.
> >
> > I discovered that in the later stages of the restore process, the
> > following behaviour repeated a few times:
> > 40 x pg_restore processes at 100% CPU
> > 40 x postgres processes doing COPY but using 0% CPU
> > ... and zero disk write activity
> >
> > I don't see this behaviour when restoring the database that was dumped
> > with -Fd.
> > Also with an un-piped backup file, I can restore a specific table
> > without having to wait for hours.
>
>  From the docs:
>
> https://www.postgresql.org/docs/current/app-pgrestore.html
>
> "
> -j number-of-jobs
>
> Only the custom and directory archive formats are supported with this
> option. The input must be a regular file or directory (not, for example,
> a pipe or standard input). Also, multiple jobs cannot be used together
> with the option --single-transaction.
> "
>
>
> >
> >
> > --
> >
> >
> >
> >
> >
> > On Fri, 19 Sept 2025 at 01:54, Adrian Klaver <[email protected]> wrote:
> >
> >     On 9/18/25 05:58, R Wahyudi wrote:
> >      > Hi All,
> >      >
> >      > Thanks for the quick and accurate response!  I've never been so
> >      > happy to see IOwait on my system!
> >
> >     Because?
> >
> >     What did you find?
> >
> >      >
> >      > I might be blind as I can't find information about 'offset' in
> >      > the pg_dump documentation.
> >      > Where can I find more info about this?
> >
> >     It is not in the user documentation.
> >
> >       From the thread Ron referred to, there is an explanation here:
> >
> >     https://www.postgresql.org/message-id/366773.1756749256%40sss.pgh.pa.us
> >
> >     I believe the actual code, for the -Fc format, is in
> >     pg_backup_custom.c here:
> >
> >     https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_backup_custom.c#L723
> >
> >     Per comment at line 755:
> >
> >     "
> >        If possible, re-write the TOC in order to update the data offset
> >     information.  This is not essential, as pg_restore can cope in most
> >     cases without it; but it can make pg_restore significantly faster
> >     in some situations (especially parallel restore).  We can skip this
> >     step if we're not dumping any data; there are no offsets to update
> >     in that case.
> >     "
> >
> >      >
> >      > Regards,
> >      > Rianto
> >      >
> >      > On Wed, 17 Sept 2025 at 13:48, Ron Johnson <[email protected]> wrote:
> >      >
> >      >
> >      >     PG 17 has integrated zstd compression, while --format=directory
> >      >     lets you do multi-threaded dumps.  That's much faster than a
> >      >     single-threaded pg_dump into a multi-threaded compression program.
> >      >
> >      >     (If for _Reasons_ you require a single-file backup, then tar the
> >      >     directory of compressed files using the --remove-files option.)
> >      >
> >      >     On Tue, Sep 16, 2025 at 10:50 PM R Wahyudi <[email protected]> wrote:
> >      >
> >      >         Sorry for not including the full command - yes, it's piping
> >      >         to a compression command:
> >      >           | lbzip2 -n <threadsforbzipgoeshere> --best > <filenamegoeshere>
> >      >
> >      >
> >      >         I think we found the issue! I'll do further testing and see
> >      >         how it goes!
> >      >
> >      >
> >      >
> >      >
> >      >
> >      >         On Wed, 17 Sept 2025 at 11:02, Ron Johnson <[email protected]> wrote:
> >      >
> >      >             So, piping or redirecting to a file?  If so, then that's
> >      >             the problem.
> >      >
> >      >             pg_dump directly to a file puts file offsets in the TOC.
> >      >
> >      >             This is how I do custom dumps:
> >      >             cd $BackupDir
> >      >             pg_dump -Fc --compress=zstd:long -v -d${db} -f ${db}.dump \
> >      >               2> ${db}.log
> >      >
> >      >             On Tue, Sep 16, 2025 at 8:54 PM R Wahyudi <[email protected]> wrote:
> >      >
> >      >                 pg_dump was done using the following command:
> >      >                 pg_dump -Fc -Z 0 -h <host> -U <user> -w -d <database>
> >      >
> >      >                 On Wed, 17 Sept 2025 at 08:36, Adrian Klaver <[email protected]> wrote:
> >      >
> >      >                     On 9/16/25 15:25, R Wahyudi wrote:
> >      >                      >
> >      >                      > I'm trying to troubleshoot the slowness issue with
> >      >                      > pg_restore and stumbled across a recent post about
> >      >                      > pg_restore scanning the whole file:
> >      >                      >
> >      >                      >  > "scanning happens in a very inefficient way, with
> >      >                      >  > many seek calls and small block reads. Try strace
> >      >                      >  > to see them. This initial phase can take hours in
> >      >                      >  > a huge dump file, before even starting any actual
> >      >                      >  > restoration."
> >      >                      > see: https://www.postgresql.org/message-id/E48B611D-7D61-4575-A820-B2C3EC2E0551%40gmx.net
> >      >
> >      >                     This was for pg_dump output that was streamed to a
> >      >                     Borg archive and as a result had no object offsets
> >      >                     in the TOC.
> >      >
> >      >                     How are you doing your pg_dump?
> >      >
> >      >
> >      >
> >      >                     --
> >      >                     Adrian Klaver
> >      >                     [email protected]
> >      >
> >      >
> >      >
> >      >             --
> >      >             Death to <Redacted>, and butter sauce.
> >      >             Don't boil me, I'm still alive.
> >      >             <Redacted> lobster!
> >      >
> >      >
> >      >
> >      >     --
> >      >     Death to <Redacted>, and butter sauce.
> >      >     Don't boil me, I'm still alive.
> >      >     <Redacted> lobster!
> >      >
> >
> >
> >     --
> >     Adrian Klaver
> >     [email protected]
> >
>
> --
> Adrian Klaver
> [email protected]
>
>

