MIME-Version: 1.0
References: <CAAccyYKpNsQMD+S-A7a8YtDevFN0uRXkzg4tYWWBOFsv_jASNg@mail.gmail.com>
 <fd5c6708-3791-4339-83a2-e5fc389cd9cb@aklaver.com> <CAAccyYLYZmwQiNMoJcQgo5t+E24rDtu1ZeBUrER7ZTKNAcZesw@mail.gmail.com>
 <d3622bf1-6a62-4703-b14e-295f47b5e348@aklaver.com> <CAAccyYJ-07SzCRAEkGJ2Qa8EAPCHQM4qcpB=OvD8P0zDbCJ0KQ@mail.gmail.com>
 <73f3723b-f279-43c6-884d-d12b3151ec9e@aklaver.com> <CAAccyYLdp7AkqS9or3cc+=HSBo3bWMojR20KkzWSC5BenEn=VQ@mail.gmail.com>
 <CANzqJaD-pOt=0CBFXUPEXMtD8AMi_y76z4=Rak405+Rhui4hKQ@mail.gmail.com>
In-Reply-To: <CANzqJaD-pOt=0CBFXUPEXMtD8AMi_y76z4=Rak405+Rhui4hKQ@mail.gmail.com>
From: px shi <spxlyy123@gmail.com>
Date: Wed, 13 Aug 2025 10:24:08 +0800
Message-ID: <CAAccyY+tnPFNuc33c6L0wdExSAEU0VGg4hSn=kuo7gNvroiRRA@mail.gmail.com>
Subject: Re: Questions about the continuity of WAL archiving
To: Ron Johnson <ronljohnsonjr@gmail.com>
Cc: "pgsql-generallists.postgresql.org" <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000a4f0b5063c35dab9"
Archived-At: <https://www.postgresql.org/message-id/CAAccyY%2BtnPFNuc33c6L0wdExSAEU0VGg4hSn%3Dkuo7gNvroiRRA%40mail.gmail.com>
Precedence: bulk

--000000000000a4f0b5063c35dab9
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

>
> How often does your primary node crash, and then not recover due to WAL
> corruption or WALs not existing?
>
> If it's _ever_ happened, you should _fix that_ instead of rolling your own
> WAL archival process.
>

I once encountered a case where recovery could not reach the latest LSN
because WAL files were missing from the archive. The root cause was
multiple failovers between the primary and the standby. During one of
them, the primary crashed before it had finished archiving all of its WAL
files. When the standby was promoted, it began archiving WAL files for
the new timeline, which left a gap between the WAL files of the two
timelines. Moreover, no base backup was taken during this period.
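
As an illustration only, a gap like this could be spotted from a listing of
archived file names. The sketch below is a rough Python example, not
something taken from the setup described here. It assumes the default 16 MB
wal_segment_size and that each timeline was forked from the next lower one
(the .history files are the authoritative source for that). The names
parse_wal_name and find_timeline_gaps are made up for this example.

```python
import re

# Assumption: default 16 MB wal_segment_size, so each "log" id in the
# file name covers 0x100000000 // 16 MB = 256 segments.
SEGMENTS_PER_LOG = 0x100000000 // (16 * 1024 * 1024)

# A plain WAL segment name is exactly 24 upper-case hex digits.
WAL_NAME = re.compile(r"[0-9A-F]{24}")


def parse_wal_name(name):
    """Split a WAL file name into (timeline, segment number)."""
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg_id = int(name[16:24], 16)
    return timeline, log_id * SEGMENTS_PER_LOG + seg_id


def find_timeline_gaps(archived_names):
    """Return (old_tli, new_tli, first_missing, last_missing) tuples.

    Hypothetical helper: a gap is reported when the lowest archived
    segment of a timeline is more than one past the highest archived
    segment of the timeline before it.
    """
    ranges = {}
    for name in archived_names:
        if not WAL_NAME.fullmatch(name):
            continue  # skip .history, .backup, .partial and other files
        tli, segno = parse_wal_name(name)
        lo, hi = ranges.get(tli, (segno, segno))
        ranges[tli] = (min(lo, segno), max(hi, segno))

    gaps = []
    timelines = sorted(ranges)
    for old_tli, new_tli in zip(timelines, timelines[1:]):
        old_hi = ranges[old_tli][1]
        new_lo = ranges[new_tli][0]
        if new_lo > old_hi + 1:
            gaps.append((old_tli, new_tli, old_hi + 1, new_lo - 1))
    return gaps
```

For example, find_timeline_gaps(["000000010000000000000003",
"000000020000000000000006"]) returns [(1, 2, 4, 5)], meaning segments 4 and
5 of timeline 1 never reached the archive. It only checks the boundary
between timelines, not holes inside a single timeline.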


On Wed, Aug 13, 2025 at 10:11, Ron Johnson <ronljohnsonjr@gmail.com> wrote:

> How often does your primary node crash, and then not recover due to WAL
> corruption or WALs not existing?
>
> If it's _ever_ happened, you should _fix that_ instead of rolling your own
> WAL archival process.
>
> On Tue, Aug 12, 2025 at 10:05 PM px shi <spxlyy123@gmail.com> wrote:
>
>> Hi, Adrian
>>
>>> Given that you are using a less than capable storage solution (S3),
>>> why do you think pushing the WAL from the standby to S3 would perform
>>> any better than what is happening with the primary WAL?
>>>
>>
>> I mean setting archive_mode to on on the primary and to always on the
>> standby. This way, even if the primary crashes, the standby can still
>> archive the WAL files that the primary did not archive.
>>
>>> The solution is to use a more capable storage platform.
>>>
>>
>> However, I believe that even with a more capable storage platform, it
>> is still impossible to archive WAL files in real time. As long as
>> archiving is not real-time, some WAL files will always be left
>> unarchived if the primary node crashes.
>>
>> On Wed, Aug 13, 2025 at 00:14, Adrian Klaver <adrian.klaver@aklaver.com>
>> wrote:
>>
>>> On 8/12/25 01:24, px shi wrote:
>>> >
>>> >     1) What is the current archiving setup on the primary and why is
>>> >     it lagging?
>>> >
>>> > The archive command uses pgBackRest to archive to S3. Because the
>>> > files are uploaded to S3, archiving is slow, which has caused the lag.
>>> >
>>> >     2) Have you looked at archiving off the standby node while it is
>>> >     in standby per:
>>> >
>>> > Yes, archiving on the standby node is disabled. Is it recommended to
>>> > share the WAL archive between the primary and standby nodes to avoid
>>> > interruptions in archiving?
>>>
>>> Given that you are using a less than capable storage solution (S3),
>>> why do you think pushing the WAL from the standby to S3 would perform
>>> any better than what is happening with the primary WAL?
>>>
>>> The solution is to use a more capable storage platform.
>>>
>>> >
>>> > On Fri, Aug 8, 2025 at 23:23, Adrian Klaver <adrian.klaver@aklaver.com
>>> > <mailto:adrian.klaver@aklaver.com>> wrote:
>>> >
>>>
>>> --
>>> Adrian Klaver
>>> adrian.klaver@aklaver.com
>>>
>>
>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>

--000000000000a4f0b5063c35dab9--