MIME-Version: 1.0
References: <CAFeSbqh0Mj3bm9+aCaz5g4NhKn8+t4aGF=p5vOPc5oVssveATQ@mail.gmail.com>
 <0ba329ef-62aa-4ab3-aefd-141baabced3b@aklaver.com> <CAFeSbqijFCW9xFOfapTzebbPcv2sWgpgrS1kVfFNJ+F7sA8R=A@mail.gmail.com>
 <CAN3jBgFEX-fhXuNkrMYwCeWjtYK2_zSrvEmefkUZciLPHK7Psw@mail.gmail.com>
 <CAFeSbqhs_5M-oz136PB_a1RgNQh5PBGSj-k_zL8hsmfV9ZTptw@mail.gmail.com>
 <CAN3jBgG8W+hiQqUGtJaVkE3wfTmmLO_XqQt=PPvGQbHMcgmndg@mail.gmail.com>
 <f1ad892c-43fb-4c4b-96f3-01f71ae4f4bf@aklaver.com> <CAN3jBgGPTuGKUFU=7gvdzjfRfzHYv74NuCsVAQD+J9PhoERcVA@mail.gmail.com>
In-Reply-To: <CAN3jBgGPTuGKUFU=7gvdzjfRfzHYv74NuCsVAQD+J9PhoERcVA@mail.gmail.com>
From: Paul Brindusa <paulbrindusa88@gmail.com>
Date: Fri, 24 Jan 2025 11:20:22 +0000
Message-ID: <CAFeSbqjV+ZDs96+sm57qw1QYw-RqWDfUWn5yWKahL2WH72TXAA@mail.gmail.com>
Subject: Re: Return of the pg_wal issue..
To: Saul Perdomo <saul.perdomo@gmail.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>, pgsql-general <pgsql-general@postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000449efa062c71ea5a"
Archived-At: <https://www.postgresql.org/message-id/CAFeSbqjV%2BZDs96%2Bsm57qw1QYw-RqWDfUWn5yWKahL2WH72TXAA%40mail.gmail.com>
Precedence: bulk

--000000000000449efa062c71ea5a
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Good morning everyone,

Following some troubleshooting last night we have managed to resolve the
issue.
After working through the entire thing, we came to the conclusion that
the cluster was running but not replicating.
So the number one lesson learned is to always check *replication* in the
cluster first, for the sake of data safety and to avoid having to go
through a million other things.
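
For what it's worth, a minimal Python sketch of that check. It assumes the
rows come from running
SELECT application_name, state FROM pg_stat_replication; on the master, and
that each replica's application_name is its node name (an assumption, worth
verifying on your own cluster). The node names and sample rows are made up,
and missing_replicas is a hypothetical helper.

```python
def missing_replicas(expected, rows):
    """Return expected replica names that are not in the 'streaming' state.

    rows: (application_name, state) tuples, e.g. fetched with
    SELECT application_name, state FROM pg_stat_replication;
    """
    streaming = {name for name, state in rows if state == "streaming"}
    return sorted(set(expected) - streaming)


# Hypothetical node names and query results, for illustration only.
expected = ["db02", "db03"]
rows = [("db02", "streaming")]

print(missing_replicas(expected, rows))  # prints ['db03']
```

An empty pg_stat_replication on the master would report every expected
replica as missing, which is the situation we were in.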

This cluster is set up without a VIP, therefore db01 will always be the
master. With it set up this way, we found that pg_hba.conf had this
line:

 host    all             postgres        db01/cidr          trust

From my understanding this means that the master only trusted connections
coming from its own address, and not from the other nodes?

To fix it we have put in the entire network range instead:

 host    all             postgres        network/cidr          trust
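
As a rough illustration of why the first rule only covered db01, here is a
small Python sketch of how a CIDR address field is compared against a client
address. The addresses are made up (the real ones are left out of this
email), and rule_matches is a hypothetical helper, not something from
PostgreSQL or Patroni. It only looks at the address column; real pg_hba.conf
matching also considers the connection type, database and user columns.

```python
import ipaddress


def rule_matches(rule_cidr: str, client_ip: str) -> bool:
    """Return True if client_ip falls inside the CIDR range of an hba rule."""
    # strict=False tolerates host bits being set in the rule address.
    network = ipaddress.ip_network(rule_cidr, strict=False)
    return ipaddress.ip_address(client_ip) in network


# Hypothetical addresses, for illustration only.
DB01 = "10.0.0.11"
REPLICA = "10.0.0.12"

old_rule = "10.0.0.11/32"  # stands in for db01/cidr
new_rule = "10.0.0.0/24"   # stands in for network/cidr

print(rule_matches(old_rule, DB01))     # True: db01 itself matches
print(rule_matches(old_rule, REPLICA))  # False: a replica does not
print(rule_matches(new_rule, REPLICA))  # True: the wider range covers it
```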

Following the config amendment we have reinitialised the replicas with
patronictl -c /path/patroni/config reinit <cluster> <host>

Happy to say that cleanup of the WAL files kicked in and we are now down
to 4% usage of the /var volume, from 96%.

Now then, there is still the bit with the actual postgres logs not rotating
properly, but I'll leave that for another email.

Massive thank you to all of you for the support.


On Thu, Jan 23, 2025 at 7:02 PM Saul Perdomo <saul.perdomo@gmail.com> wrote:

> Thanks for the correction Adrian - my oversimplification went too far,
> and into "plain wrong" territory.
>
> (The detail that I felt was too much for this explanation was: "and the
> way to simply get rid of them would be to set your archive command to
> '/bin/true', say".. but didn't want to make it seem like I was suggesting
> Paul do that)
>
> On Thu, Jan 23, 2025, 11:07 a.m. Adrian Klaver <adrian.klaver@aklaver.com>
> wrote:
>
>> On 1/23/25 06:51, Saul Perdomo wrote:
>>
>> > This is why everybody will tell you "don't just delete these files,
>> > archive them properly!" Again, for operational purposes, you could
>> > just delete them. But you really want to make a /copy /of them before
>> > you do... you know, /just in case /something bad happens to your DB
>> > that makes you want to roll it back in time.
>>
>> No you can't just delete them for operational purposes without knowledge
>> of whether they are still needed or not.
>>
>> Per:
>>
>> https://www.postgresql.org/docs/current/wal-intro.html
>>
>> and
>>
>> https://www.postgresql.org/docs/current/wal-configuration.html
>>
>> Short version, a WAL file must remain until a checkpoint is done that
>> makes its content no longer needed.
>>
>> > Cheers
>> > Saul
>> >
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com
>>
>>

--=20
Kind Regards,
Paul Brindusa
paulbrindusa88@gmail.com

--000000000000449efa062c71ea5a--