Re: Postgres restore sometimes restores to a point 2 days in the past


From: Koen De Groote <[email protected]>
To: PostgreSQL General <[email protected]>
Subject: Re: Postgres restore sometimes restores to a point 2 days in the past
Date: Thu, 19 Feb 2026 19:30:03 +0000
Message-ID: <CAGbX52H5S1zoqQryS_80oo69m5V7ueauJhU6RkD9GLLzPq8cBQ@mail.gmail.com>
In-Reply-To: <CAGbX52HkW6926c4tY781+iH01x_0qw6Sfo=kT+LHjo_mENqOfQ@mail.gmail.com>
References: <CAGbX52HkW6926c4tY781+iH01x_0qw6Sfo=kT+LHjo_mENqOfQ@mail.gmail.com>

Sorry for reviving this old thread.

I found the cause a few days after these messages: the WAL file download
was interrupted. The file comes from S3, and for some reason the
downloaded copy was not the same size as the object on S3. So some network
error occurred.

PG tries to restore via the restore_command, which runs "gzip -dc". The
restore fails because the file is incomplete, and PG concludes that it is
not receiving files anymore and thus that the restore is done.

This happens again the next day, because the system is set up not to
re-download files already found on the local mount, so the same truncated
file is picked up again.

Mystery solved.
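
For anyone who runs into the same thing, here is a minimal sketch of the
kind of check that would have caught it: compare the cached file's size
with the size reported by the object store before reusing it, and never
leave a partial file behind. This is not the actual script. The
restore_wal function and the download and remote_size callables are
hypothetical stand-ins for whatever S3 client the real script uses, and
the exit code 2 is an arbitrary choice. As far as I understand it,
PostgreSQL treats any ordinary nonzero exit as "file not available", so
the distinct code only helps the wrapper's own logging and monitoring.

```python
import gzip
import os
import shutil
import tempfile
import zlib


def restore_wal(name, dest, cache_dir, download, remote_size):
    """Unpack cache_dir/<name>.gz into dest and return an exit code.

    download(name, path) and remote_size(name) are hypothetical callables
    standing in for the real S3 client. remote_size returns None when the
    archive has not been uploaded yet.
    """
    expected = remote_size(name)
    if expected is None:
        return 1  # not archived yet: the normal end of recovery

    cached = os.path.join(cache_dir, name + ".gz")
    # Reuse the cached copy only if it is complete. A size mismatch means an
    # earlier download was cut short, so fetch the file again.
    if not os.path.exists(cached) or os.path.getsize(cached) != expected:
        download(name, cached)
        if os.path.getsize(cached) != expected:
            os.remove(cached)  # do not leave a truncated file for tomorrow
            return 2

    # Decompress into a temporary file so dest never holds a partial segment.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(dest)))
    try:
        with os.fdopen(fd, "wb") as out, gzip.open(cached, "rb") as src:
            shutil.copyfileobj(src, out)
    except (OSError, EOFError, zlib.error):
        # Truncated or corrupt gzip data: drop both files and report failure.
        os.remove(tmp)
        os.remove(cached)
        return 2
    os.replace(tmp, dest)
    return 0
```

A wrapper used as restore_command would call restore_wal with %f and %p
and pass the result to sys.exit, logging loudly whenever it returns 2.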



On Fri, Jan 31, 2025 at 9:47 AM Koen De Groote <[email protected]> wrote:

> I'm running postgres 16.6
>
> My backup strategy is: basebackup and WAL archive. These get uploaded to
> the cloud.
>
> The restore is on an isolated machine and is performed daily. It downloads
> the basebackup, unpacks it, sets a recovery.signal, and a script is
> provided as restore_command, to download the WAL archives %f and unpack
> them into %p
>
> In the script, the final unpacking is simply "gzip -dc %f > %p". The gz
> files are first checked with "gzip -t".
>
> If a WAL archive is requested that doesn't exist yet, the script naturally
> cannot find it, and exits with status code 1. This is the end of the
> recovery.
>
> There are a few tables that are known to receive new entries multiple
> times per day. However, the state of the recovery showed the latest item to
> be 2 days in the past. Checking the live DB, there is the expected number
> of items since that ID.
>
> I checked the logs: the last WAL archive that got downloaded is indeed the
> last one that was available. The one that failed to download on the restore
> machine was uploaded to the cloud 8 minutes later, according to the upload
> logs on the live DB.
>
> The postgres logs themselves seem perfectly normal. It logs all these WAL
> recoveries, switches the timeline, and becomes available.
>
> What could be going wrong? My main issue is that I don't know where to
> start looking, since nothing in the logs seems abnormal.
>
> Regards,
> Koen De Groote
>
