From: Koen De Groote
Date: Thu, 19 Feb 2026 19:30:03 +0000
Subject: Re: Postgres restore sometimes restores to a point 2 days in the past
To: PostgreSQL General

Sorry for reviving this old thread.

I found the cause for this a few days after these messages: The WAL file download was interrupted. It comes from S3, and for some reason the downloaded file was not the same size as the one on S3. So, some network error occurred.
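
A minimal sketch of how a download step could guard against this, assuming the expected object size is known from S3 before the transfer starts. The names `download_verified`, `fetch` and `expected_size` are hypothetical and not part of the original script; `fetch` stands in for whatever actually copies the object from S3 to a local path.

```python
import os
import tempfile
from typing import Callable


def download_verified(fetch: Callable[[str], None], dest: str, expected_size: int) -> bool:
    """Download into a temp file next to dest; publish it only if the size matches."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Create the temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    os.close(fd)
    try:
        fetch(tmp_path)  # hypothetical callable: writes the remote object to tmp_path
        if os.path.getsize(tmp_path) != expected_size:
            return False  # truncated or otherwise wrong size: never publish it
        os.replace(tmp_path, dest)  # atomic rename, dest is either absent or complete
        return True
    finally:
        # Clean up the temp file if it was not renamed into place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With this shape, an interrupted transfer never leaves a file under the final name, so a later run cannot mistake a partial download for a finished one.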

PG tries to restore via the restore_command, which is "gzip -dc". The restore fails because the file is incomplete, and PG concludes that no more files are coming and thus that the restore is done.
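
The failure mode above comes from a truncated archive and a missing archive both ending in an ordinary nonzero exit. A hedged sketch of an unpack step that tells the two apart follows. The function `unpack_wal` and the exit-code constants are hypothetical. The value 126 is an assumption: the PostgreSQL documentation says recovery aborts when restore_command is terminated by a signal or by a shell-level error such as "command not found", so check that behaviour against the version in use before relying on it.

```python
import gzip
import os
import shutil
import sys
import zlib

EXIT_OK = 0
EXIT_NOT_IN_ARCHIVE = 1  # ordinary failure: PostgreSQL treats this as "no such WAL file"
EXIT_CORRUPT = 126       # assumption: a shell-error style code, intended to abort recovery


def unpack_wal(gz_path: str, out_path: str) -> int:
    """Decompress gz_path to out_path, returning a distinct code for damaged archives."""
    if not os.path.exists(gz_path):
        return EXIT_NOT_IN_ARCHIVE
    tmp_path = out_path + ".tmp"
    try:
        with gzip.open(gz_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    except (EOFError, OSError, zlib.error) as exc:
        # EOFError: stream ended early (truncated file); OSError covers gzip.BadGzipFile
        # as well as I/O failures while writing the output.
        print(f"corrupt or truncated archive {gz_path}: {exc}", file=sys.stderr)
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return EXIT_CORRUPT
    os.replace(tmp_path, out_path)  # only a fully decompressed file reaches out_path
    return EXIT_OK
```

A wrapper script would end with something like `sys.exit(unpack_wal(local_gz, target))`, where both arguments are placeholders for the paths the script derives from %f and %p.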

This happens again the next day, because the system is set up to not download files already found on the local mount.
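
One way to keep the "skip files already present" optimisation without reusing a partial file is to compare the local size with the size S3 reports for the object. This is a sketch under that assumption; `should_download` and `remote_size` are hypothetical names, and how the remote size is obtained is left out.

```python
import os


def should_download(local_path: str, remote_size: int) -> bool:
    """Return True when the local copy is missing or does not match the remote size."""
    try:
        local_size = os.path.getsize(local_path)
    except FileNotFoundError:
        return True  # nothing cached yet
    if local_size == remote_size:
        return False  # cached copy has the expected size, safe to reuse
    os.remove(local_path)  # stale partial file: drop it so it cannot be reused
    return True
```

A size match does not prove the content is intact, so the existing "gzip -t" check is still worth keeping as a second line of defence.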

Mystery solved.



On Fri, Jan 31, 2025 at 9:47 AM Koen De Groote <kdg.dev@gmail.com> wrote:
I'm running postgres 16.6

My backup strategy is: basebackup and WAL archive. These get uploaded to the cloud.

The restore is on an isolated machine and is performed daily. It downloads the basebackup, unpacks it, sets a recovery.signal, and a script is provided as restore_command, to download the WAL archives %f and unpack them into %p

In the script, the final unpacking is simply "gzip -dc %f > %p". The gz files are first checked with "gzip -t".

If a WAL archive is requested that doesn't exist yet, the script naturally cannot find it, and exits with status code 1. This is the end of the recovery.

There are a few tables that are known to receive new entries multiple times per day. However, the state of the recovery showed the latest item to be 2 days in the past. Checking the live DB, there is the expected number of items since that ID.

I checked the logs, the last WAL archive that got downloaded is indeed the last one that was available. The one that failed to download on the restore machine was uploaded to the cloud 8 minutes later, according to the upload logs on the live DB.

The postgres logs themselves seem perfectly normal. It logs all these WAL recoveries, switches the timeline, and becomes available.

What could be going wrong? My main issue is that I don't know where to start looking, since nothing in the logs seems abnormal.

Regards,
Koen De Groote