public inbox for [email protected]
From: Koen De Groote <[email protected]>
To: Adrian Klaver <[email protected]>
Cc: PostgreSQL General <[email protected]>
Subject: Re: Basebackup fails without useful error message
Date: Sun, 20 Oct 2024 23:03:51 +0200
Message-ID: <CAGbX52ENsSHKoTyu5+XfN1o1bZ2w2CJaE1oQnxcm=fj2SyoZXg@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <CAGbX52Fg2jxqGVZzeQ_QcHSZ8fgDjnVZUJy5NUb5-PAf8fvxkw@mail.gmail.com>
<[email protected]>
<CAGbX52EyjjOQJUjA0m4+0azs_yH1by63p6hhJg7Y66=VCCqzpA@mail.gmail.com>
<[email protected]>
Hello Adrian, and everyone else.
It finally happened: the backup ran into an error again, and this time the
verbose output set me on the right path.
I'm getting this error message:
> pg_basebackup: could not receive data from WAL stream: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
Combined with this entry in the primary server's log:
> terminating walsender process due to replication timeout
Now, the server is set up with an archive_command which gzips the WAL files
and writes them to a network filesystem.
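For readers unfamiliar with this kind of setup, a gzip-to-network-share archive_command and its matching restore_command can be sketched as two shell functions. This is a hypothetical sketch, not the actual command used on this server. ARCHIVE_DIR stands in for the network filesystem mount, the /mnt/wal_archive path and the function names are made up, and in postgresql.conf PostgreSQL substitutes %p and %f where the functions take arguments.

```shell
#!/bin/sh
# Hypothetical sketch of a gzip-based WAL archive/restore pair.
# Roughly equivalent postgresql.conf settings (paths are assumptions):
#   archive_command = 'test ! -f /mnt/wal_archive/%f.gz && gzip -c %p > /mnt/wal_archive/%f.gz'
#   restore_command = 'gunzip -c /mnt/wal_archive/%f.gz > %p'

# ARCHIVE_DIR stands in for the network filesystem mount (assumption).
ARCHIVE_DIR=${ARCHIVE_DIR:-$(mktemp -d)}

# archive_wal <path-to-wal-file (%p)> <wal-file-name (%f)>
# Refuses to overwrite an existing archive file. A non-zero exit status
# tells PostgreSQL the segment was not archived and must be retried.
archive_wal() {
    test ! -f "$ARCHIVE_DIR/$2.gz" && gzip -c "$1" > "$ARCHIVE_DIR/$2.gz"
}

# restore_wal <wal-file-name (%f)> <destination-path (%p)>
restore_wal() {
    gunzip -c "$ARCHIVE_DIR/$1.gz" > "$2"
}

# Demo with a fake segment in a scratch directory instead of a real pg_wal.
WORK_DIR=$(mktemp -d)
SEGMENT=000000010000000000000001
printf 'fake WAL contents\n' > "$WORK_DIR/$SEGMENT"
archive_wal "$WORK_DIR/$SEGMENT" "$SEGMENT" && echo "archived $SEGMENT"
restore_wal "$SEGMENT" "$WORK_DIR/restored"
cmp -s "$WORK_DIR/$SEGMENT" "$WORK_DIR/restored" && echo "round trip OK"
```

Every archive_command invocation has to finish writing to ARCHIVE_DIR before the archiver moves on. A slow network filesystem therefore shows up directly as archiving delay.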
Looking at the machine metrics, my conclusion is this: at the time of the
error, the remote filesystem had a very long queue of pending writes.
So I'm assuming that, when an archive_command is set, writing a WAL file is
only considered finished once the archive copy has been written, not merely
when the file has been written to pg_wal.
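One way to test that assumption is to watch the archiver backlog during a backup. PostgreSQL marks each completed WAL segment that is still waiting for archive_command with a .ready file in pg_wal/archive_status and renames it to .done once the segment is archived. A growing number of .ready files therefore suggests the archive target is not keeping up. The sketch below assumes shell access to the primary's data directory as a user allowed to read it. The function name and the data directory path are hypothetical. On the SQL side, the pg_stat_archiver view (for example last_archived_time and failed_count) gives related information.

```shell
#!/bin/sh
# Count WAL segments still waiting to be archived on a primary.
# pending_archive_count <data-directory>   (function name is hypothetical)
pending_archive_count() {
    status_dir="$1/pg_wal/archive_status"
    # Each *.ready file marks a segment archive_command has not yet handled
    # successfully; *.done marks segments that were already archived.
    find "$status_dir" -maxdepth 1 -type f -name '*.ready' | wc -l | tr -d ' '
}

# Demo against a fake directory layout instead of a real data directory.
DEMO_PGDATA=$(mktemp -d)
mkdir -p "$DEMO_PGDATA/pg_wal/archive_status"
touch "$DEMO_PGDATA/pg_wal/archive_status/000000010000000000000001.done"
touch "$DEMO_PGDATA/pg_wal/archive_status/000000010000000000000002.ready"
touch "$DEMO_PGDATA/pg_wal/archive_status/000000010000000000000003.ready"
echo "segments waiting for archive_command: $(pending_archive_count "$DEMO_PGDATA")"

# On a real server one might poll it while the backup runs (path is an assumption):
#   while sleep 10; do date; pending_archive_count /var/lib/postgresql/data; done
```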
I'm also seeing in the documentation that the default WAL method for
pg_basebackup is "stream", which opens a second connection and streams the
WAL as it is produced while the backup runs.
I see two possible paths at this point:
1: increase wal_sender_timeout
2: run the basebackup with --wal-method=none, since my restore_command is
set up to fetch the archived WAL files from that very same network storage.
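The two options could look roughly like the sketch below. Some details are assumptions chosen for illustration: the superuser role name postgres and the 5min value (the default wal_sender_timeout is 60s), plus the run helper and its DRY_RUN switch, which make the script print the commands instead of executing them. wal_sender_timeout can be changed with a configuration reload; no restart is needed. With --wal-method=none, the backup is only restorable if every WAL segment from backup start to backup end can be fetched from the archive.

```shell
#!/bin/sh
# Sketch of the two options. DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

# run <command...>: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Option 1: raise wal_sender_timeout on the primary and reload the config.
# '5min' and the role name 'postgres' are assumptions for illustration.
run psql -h localhost -p 5432 -U postgres -c "ALTER SYSTEM SET wal_sender_timeout = '5min'"
run psql -h localhost -p 5432 -U postgres -c "SELECT pg_reload_conf()"

# Option 2: the same backup command as before, but without streaming WAL.
# Restoring then depends on restore_command finding all needed WAL in the archive.
run pg_basebackup -h localhost -p 5432 -U basebackup_user \
    -D /mnt/base_backup/dir -Ft -z -P --wal-method=none
```

Running it with DRY_RUN=0 would execute the commands for real, so it is worth reviewing the printed output first.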
I'm going to test this. Could someone confirm whether this is how writing
WAL files works, i.e. that a WAL file is only considered "done" once the
archive_command has finished? That would be great.
Regards,
Koen De Groote
On Sun, Sep 29, 2024 at 6:08 PM Adrian Klaver <[email protected]> wrote:
> On 9/29/24 08:57, Koen De Groote wrote:
> > > What is the complete command you are using?
> >
> > The full command is:
> >
> > pg_basebackup -h localhost -p 5432 -U basebackup_user -D /mnt/base_backup/dir -Ft -z -P
> >
> > So: tar output format, gzip-compressed, with progress reporting.
> >
> > > Have you looked at the Postgres log?
> >
> > > Is --verbose being used?
> >
> > This is straight from the logs; it's the only output besides the %
> > progress counter.
> >
> > Will have a look at --verbose.
>
> When you report on that, and if it does not show the error, please also include:
>
> Postgres version.
>
> OS and version.
>
> Anything special about the cluster like tablespaces, extensions,
> replication, etc.
>
>
> >
> > Regards,
> > Koen De Groote
> >
>
> --
> Adrian Klaver
> [email protected]
>
>