Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1um1A8-008cDW-Ow for pgsql-general@arkaria.postgresql.org; Wed, 13 Aug 2025 02:24:24 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1um1A7-00BGOU-8a for pgsql-general@arkaria.postgresql.org; Wed, 13 Aug 2025 02:24:23 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1um1A6-00BGOL-Pp for pgsql-general@lists.postgresql.org; Wed, 13 Aug 2025 02:24:23 +0000 Received: from mail-vs1-xe2f.google.com ([2607:f8b0:4864:20::e2f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1um1A4-000Hqu-1R for pgsql-general@lists.postgresql.org; Wed, 13 Aug 2025 02:24:22 +0000 Received: by mail-vs1-xe2f.google.com with SMTP id ada2fe7eead31-506374fbecbso3816028137.0 for ; Tue, 12 Aug 2025 19:24:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1755051859; x=1755656659; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=WlJwbnbpdpFFFZbyBueLv6dCFVM+7mXVvwAICEZyAsA=; b=SlLbS9/h9cTsfuSfwWWiTNZ3vhIVLLcUMa1/Oj1tvoMYxojmk77+WiH4KaezAhf2nX Omz/hdO/wxB0aHDpy1zpfXD+KtvA1SgxIeB3V786fCZdIgYKYWrRmjLM7Ieajs1FTe9M F9p2BE4MPVgyfybrKitegvyvT+eTio6o3xLt7Llv0dfwQuK9nAmICEy8g2/L9snHrKw1 HySi5jFrXEAxT5bXo4s1bGkalmE88BGY4BsaI8gmYra85V0uLi5iFI0aCtib+drMvHm1 GCH10IIAD6sId6iCrNLwbDtpdsHoykJHOU5xxppiB4GLFwM76/h7zp939nLaW3dvHbrt E1Yg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1755051859; x=1755656659; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=WlJwbnbpdpFFFZbyBueLv6dCFVM+7mXVvwAICEZyAsA=; b=WC7IPWeLTiJOiS+hZilXUh0iYjZ7svO0k+XqpJ+3DekdXBjvym+Wv2vLw3+puRRgmd DZ1NduiFdcHy5d9MctJY1h8/5wokNJ/lk3vU9j6cnSjs7ePkKT06MVU78HIFLnly/wsM xZC/BUuneHWbVwg90dUSsjwkRM22OxfOVAJovnXH7jcHTHFTGySnsuMDJHeMvqNhY6Q5 dW3nKl3Fnc7qISQqIjqbqETAMi2iGl4rB5hhu05oBSBlnyek8kNkAz+/nqKiXbKIVDdB +xgKFe0bIlxdunfjzTgWtVtZXYB6X61sLz5hHwstKqcEq5wylH1DKQQynDO8KBAzGJ/0 HRqA== X-Gm-Message-State: AOJu0YyTTy2XatKGRBH25MZTi27vOiy91zfJhtyozYhQR/iorYiER5Ar SRoPY6XK2UwvaMolN3OzsTz6wfFjAtLva/pukh3emIQk7uOTKeX6Ksubn6eWEFgiPJZM9JE8U23 DOo/X9DydNKzQyRA7tXLYc6+48Ttymlaa2rH6g2hWO7Cc X-Gm-Gg: ASbGncsjlgMMMf/M42k+r31gT94mNhlmwXPEGB1tdGcEn7JNBFMtBHXDR4Ly4E7ZVws d5YHjXLVpZVAbxwVqT9js5oDTfFAHUCDqkKKIwDpxZEvyDWt8Oy8W5re+U5xMy6+AZwXDqV76vC +Y9uuv/3B8/0Nkhin2nBEcTSGIWCh8pP7BXJPIY1UeniC2PMvv0X2YNkixKngIlHAkaXNfC3wDZ J82ub/HPIDOE4aL6w== X-Google-Smtp-Source: AGHT+IFo9B20SKfVQl8LOoEIUqxr3XUKiuK0WHZwde5lfmz4fwxgmaAn7uNYX9L7RHSiwGbZCe0x8bU3HgRlLp17xT4= X-Received: by 2002:a05:6102:32c4:b0:4e6:443d:9b1a with SMTP id ada2fe7eead31-50e512a3423mr587251137.24.1755051859438; Tue, 12 Aug 2025 19:24:19 -0700 (PDT) MIME-Version: 1.0 References: <73f3723b-f279-43c6-884d-d12b3151ec9e@aklaver.com> In-Reply-To: From: px shi Date: Wed, 13 Aug 2025 10:24:08 +0800 X-Gm-Features: Ac12FXzueiNYdIAc0J1L08e3NYoiHIPYb_qLK_ygRGn3asACyV2FYE54QsBJIIQ Message-ID: Subject: Re: Questions about the continuity of WAL archiving To: Ron Johnson Cc: "pgsql-generallists.postgresql.org" Content-Type: multipart/alternative; boundary="000000000000a4f0b5063c35dab9" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000a4f0b5063c35dab9 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable > > How often does your primary node crash, and then not recover due to WALs > corruption or WALs not existing? > > If it's _ever_ happened, you should _fix that_ instead of rolling your ow= n > WAL archival.process. > I once encountered a case where the recovery process failed to restore to the latest LSN due to missing WAL files in the archive. The root cause was multiple failovers between primary and standby. During one of the switchovers, the primary crashed before completing the archiving of all WAL files. When the standby was promoted to primary, it began archiving WAL files for the new timeline, resulting in a gap between the WAL files of the two timelines. Moreover, no base backup was taken during this period. Ron Johnson =E4=BA=8E2025=E5=B9=B48=E6=9C=8813=E6= =97=A5=E5=91=A8=E4=B8=89 10:11=E5=86=99=E9=81=93=EF=BC=9A > How often does your primary node crash, and then not recover due to WALs > corruption or WALs not existing? > > If it's _ever_ happened, you should _fix that_ instead of rolling your ow= n > WAL archival.process. > > On Tue, Aug 12, 2025 at 10:05=E2=80=AFPM px shi wro= te: > >> Hi, Adrian >> >> Given that you are using a less then capable storage solution(S3) why do >>> you think pushing the WAL from the standby to S3 would perform any >>> better then what is happening with the primary WAL? >>> >> >> I mean that archive_mode is set to on in primary and set to always in >> standby. >> This way, even if the primary crashes, the standby can still archive WAL >> files that the primary did not archive. >> >> The solution is to use a more capable storage platform. >>> >> >> However, I believe that even if we use a more capable storage platform, >> it is still impossible to archive WAL files in real time. As long as >> real-time archiving cannot be achieved, there will always be some WAL fi= les >> that are not archived if the primary node crashes. >> >> Adrian Klaver =E4=BA=8E2025=E5=B9=B48=E6=9C= =8813=E6=97=A5=E5=91=A8=E4=B8=89 00:14=E5=86=99=E9=81=93=EF=BC=9A >> >>> On 8/12/25 01:24, px shi wrote: >>> > >>> > 1) What is the current archiving setup on the primary and why is >>> > lagging? >>> > >>> > The archive command uses pgBackRest to archive to S3. Because it is >>> > uploaded to S3, the archiving speed is slow, which has caused lagging= . >>> > >>> > 2) Have you looked at archiving off the standby node while it is = in >>> > standby per: >>> > >>> > Yes, archiving on the standby node is disabled. Is it recommended to >>> > share the WAL archive between the primary and standby nodes to avoid >>> > interruptions in archiving? >>> >>> Given that you are using a less then capable storage solution(S3) why d= o >>> you think pushing the WAL from the standby to S3 would perform any >>> better then what is happening with the primary WAL? >>> >>> The solution is to use a more capable storage platform. >>> >>> > >>> > Adrian Klaver >> > > =E4=BA=8E2025=E5=B9=B48=E6=9C=888= =E6=97=A5=E5=91=A8=E4=BA=94 23:23=E5=86=99=E9=81=93=EF=BC=9A >>> > >>> >>> -- >>> Adrian Klaver >>> adrian.klaver@aklaver.com >>> >> > > -- > Death to , and butter sauce. > Don't boil me, I'm still alive. > lobster! > --000000000000a4f0b5063c35dab9 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
How often does your primary node crash, a= nd then not recover due to WALs corruption or WALs not existing?

If = it's _ever_ happened, you should _fix that_ instead of rolling your own= WAL archival.process.

=C2=A0I once enc= ountered a case where the recovery process failed to restore to the latest = LSN due to missing WAL files in the archive. The root cause was multiple fa= ilovers between primary and standby. During one of the switchovers, the pri= mary crashed before completing the archiving of all WAL files. When the sta= ndby was promoted to primary, it began archiving WAL files for the new time= line, resulting in a gap between the WAL files of the two timelines. Moreov= er, no base backup was taken during this period.


Ron Johnson <ronljohnson= jr@gmail.com> =E4=BA=8E2025=E5=B9=B48=E6=9C=8813=E6=97=A5=E5=91=A8= =E4=B8=89 10:11=E5=86=99=E9=81=93=EF=BC=9A
How often does your primary node crash,=C2=A0an= d then not recover due to WALs corruption or WALs not existing?

If it's _ever_ happened, you should _fix that= _ instead of rolling=C2=A0your own WAL archival.process.

On Tue, Aug 12, 202= 5 at 10:05=E2=80=AFPM px shi <spxlyy123@gmail.com> wrote:
Hi, Adrian

Given that you are us= ing a less then capable storage solution(S3) why do
you think pushing th= e WAL from the standby to S3 would perform any
better then what is happe= ning with the primary WAL?
=C2=A0
I me= an that archive_mode is set to on in primary and set to always in standby.= =C2=A0
This way, even if the primary crashes, the standby can sti= ll archive WAL files that the primary did not archive.

=
The solution is to use a more capable storage platform.

=C2=A0However, I believe that even if we = use a more capable storage platform, it is still impossible to archive WAL = files in real time. As long as real-time archiving cannot be achieved, ther= e will always be some WAL files that are not archived if the primary node c= rashes.

Adrian Klaver <adrian.klaver@aklaver.com> =E4=BA=8E2025=E5=B9=B4= 8=E6=9C=8813=E6=97=A5=E5=91=A8=E4=B8=89 00:14=E5=86=99=E9=81=93=EF=BC=9A
On 8/12/25 01:24, px shi wrote:
>
>=C2=A0 =C2=A0 =C2=A01) What is the current archiving setup on the prima= ry and why is
>=C2=A0 =C2=A0 =C2=A0lagging?
>
>=C2=A0 =C2=A0The archive command uses pgBackRest to archive to S3. Beca= use it is
> uploaded to S3, the archiving speed is slow, which has caused lagging.=
>
>=C2=A0 =C2=A0 =C2=A02) Have you looked at archiving off the standby nod= e while it is in
>=C2=A0 =C2=A0 =C2=A0standby per:
>
> Yes, archiving on the standby node is disabled. Is it recommended to <= br> > share the WAL archive between the primary and standby nodes to avoid <= br> > interruptions in archiving?

Given that you are using a less then capable storage solution(S3) why do you think pushing the WAL from the standby to S3 would perform any
better then what is happening with the primary WAL?

The solution is to use a more capable storage platform.

>
> Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> =E4=BA=8E2025=E5=B9=B48=E6=9C=888= =E6=97=A5=E5=91=A8=E4=BA=94 23:23=E5=86=99=E9=81=93=EF=BC=9A
>

--
Adrian Klaver
adrian.klave= r@aklaver.com


--
Death to <Redacted>, and butter sauce.Don't boil me, I'm still alive.
<Redacted> lobs= ter!
--000000000000a4f0b5063c35dab9--