From: Koen De Groote
Date: Tue, 22 Oct 2024 21:50:24 +0200
Subject: Re: Basebackup fails without useful error message
To: "David G. Johnston"
Cc: Adrian Klaver, PostgreSQL General

Hello David,

I saw the backup fail. The backup logged that it terminated the walsender, and correlating the moment it failed with my storage metrics shows that the storage was facing huge IOWAIT at that time. And this was network-mounted storage.

The backup process continued, but because WAL could not be streamed without error (due to a local issue), the entire backup was marked as failed. In this case, pg_basebackup deletes the backup at the end. There's no flag to control this final behavior.

I'll be testing a restore soon without streaming WAL, since the actual restore I perform doesn't use the pg_wal.tar.gz file; it gets the archived WAL instead. At least I think it doesn't need it, hence the need for testing.
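
For reference, a base backup taken without streamed WAL could be invoked roughly as sketched below. This is only a sketch under assumptions: the target directory is a hypothetical path, the helper name is made up for illustration, and the tar plus gzip format is inferred from the pg_wal.tar.gz file mentioned above. With --wal-method=none the backup contains no WAL at all, so the restore depends entirely on the archived WAL being complete.

```python
import shlex


def basebackup_command(target_dir, wal_method="none"):
    """Build a pg_basebackup argument list.

    target_dir is a hypothetical backup location; wal_method must be one of
    the values pg_basebackup accepts for --wal-method.
    """
    if wal_method not in ("none", "fetch", "stream"):
        raise ValueError(f"unsupported WAL method: {wal_method}")
    return [
        "pg_basebackup",
        f"--pgdata={target_dir}",       # where the backup is written
        "--format=tar",                 # tar output, as in the original setup
        "--gzip",                       # compress the tar files
        f"--wal-method={wal_method}",   # "none": do not include WAL in the backup
    ]


# Print the command line instead of running it, so it can be reviewed first.
cmd = basebackup_command("/mnt/backup/base")
print(shlex.join(cmd))
```

Connection options such as host and user are left out and would need to be added for a real run.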

Regards,
Koen De Groote

On Tue, Oct 22, 2024 at 12:34 AM David G. Johnston <david.g.johnston@gmail.com> wrote:

> On Sunday, October 20, 2024, Koen De Groote <kdg.dev@gmail.com> wrote:
>
>> I'm going to be testing this. If someone could confirm that this is how writing WAL files works, that being: that it is only considered "done" when the archive_command is done, that would be great.
>
> The archiving of WAL files by the primary does not involve a replication connection of any sort and thus the “WAL sender” settings are not relevant to it; or, here, whether or not you are archiving your WAL is immaterial since you are streaming it as it gets produced.
>
> If you are streaming WAL it seems highly unusual that you’d end up in a situation where the connection goes idle long enough that it gets killed, especially if the backup is still happening. I’d probably go with performing the backup under a disabled (or extremely large?) timeout though and move on to other things.
>
> That isn’t to say I fully understand what actually is happening here…
>
> David J.
