MIME-Version: 1.0
References: <CAFeSbqjySUzp_Q4XNR_ajnV84O=om0f5rzpjPsQaNitDkxXjHw@mail.gmail.com>
 <CAPnRvGuJ5oxEJ1xmJ2ULU63VNCKMcpbvkU-vO9wEdehA2KyZRw@mail.gmail.com> <CAKAnmmJwRQekE-4bGAmVEoP9KQdLtWz9+=CzBYGa_11KTbeQ2g@mail.gmail.com>
In-Reply-To: <CAKAnmmJwRQekE-4bGAmVEoP9KQdLtWz9+=CzBYGa_11KTbeQ2g@mail.gmail.com>
From: Koen De Groote <kdg.dev@gmail.com>
Date: Sun, 3 Nov 2024 14:59:55 +0100
Message-ID: <CAGbX52E+rimrOsVW4r4Jn_brwGGVfw3qs2hipAoffcCYGfmexg@mail.gmail.com>
Subject: Re: pg_wal folder high disk usage
To: Greg Sabino Mullane <htamfids@gmail.com>
Cc: Muhammad Usman Khan <usman.k@bitnine.net>, Paul Brindusa <paulbrindusa88@gmail.com>, 
	pgsql-general <pgsql-general@postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000000779a7062602967d"
Archived-At: <https://www.postgresql.org/message-id/CAGbX52E%2BrimrOsVW4r4Jn_brwGGVfw3qs2hipAoffcCYGfmexg%40mail.gmail.com>
Precedence: bulk

--0000000000000779a7062602967d
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

A possible reason for pg_wal buildup is that some form of replication is
set up (logical or physical) and the receiving side of the replication has
stopped somehow.

This means: a different server has a connection to your server and is
expecting to receive data, so your server in turn expects to have to send
data (this is the important bit). There could be several of these
connections.

If even one of these receiving servers is down, or the network is out, or
it has stopped requesting data from your server for some other reason, your
server will notice that it is no longer getting confirmation that the other
side has received the data. Your postgres server will then keep this data
locally, expecting the situation to be resolved in the future, at which
point it will send everything the other side hasn't gotten yet.

This is one possible cause. As long as your server is configured to expect
that other server to be there and to be receiving, the buildup will
continue. Taking the other server offline won't help; in fact, it is likely
the cause of the issue. The official documentation explains how to drop
replication slots that are no longer needed. Ideally your DBA should handle
this.
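
To make the "keeps this data locally" part concrete, here is a rough sketch
(not something from the documentation or the blog post) of how much WAL each
slot is holding back. It assumes you have already read slot_name, active and
restart_lsn from the pg_replication_slots view and the current position from
pg_current_wal_lsn(); the Slot type and the function names are made up for
illustration. On the server itself, pg_wal_lsn_diff() does the same
subtraction.

```python
from typing import NamedTuple, Optional


class Slot(NamedTuple):
    # Hypothetical container for one row of pg_replication_slots.
    slot_name: str
    active: bool
    restart_lsn: Optional[str]  # e.g. "0/3000000"; None if nothing is reserved


def lsn_to_int(lsn: str) -> int:
    # An LSN is printed as two hexadecimal numbers, "high/low",
    # which together form a 64-bit byte position in the WAL stream.
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def retained_wal(slots, current_lsn):
    # Return (slot_name, active, bytes_retained), largest first.
    now = lsn_to_int(current_lsn)
    report = []
    for s in slots:
        if s.restart_lsn is None:
            continue  # this slot is not holding back any WAL
        report.append((s.slot_name, s.active, now - lsn_to_int(s.restart_lsn)))
    return sorted(report, key=lambda row: row[2], reverse=True)
```

A slot that shows active as false and a large, growing byte count is the
usual suspect.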

Laurenz's blog post lays out all the options. For instance, it can also
happen that your system is generating data so fast that the writing of the
WAL files cannot keep up, or that your setup also does WAL archiving and
the compression on that is slow.
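
For the archiving case, one quick check (my own sketch, not taken from the
blog post) is to count the segments still waiting for the archiver. Postgres
puts a <segment>.ready marker in pg_wal/archive_status for each finished
segment and renames it to .done once archive_command has succeeded. The
function name is made up, and you have to pass in the path to your own data
directory's pg_wal/archive_status.

```python
from pathlib import Path


def pending_archive_segments(archive_status_dir):
    # Names of the WAL segments that still have a ".ready" marker,
    # i.e. that have not been archived successfully yet.
    suffix = ".ready"
    status = Path(archive_status_dir)
    return sorted(p.name[: -len(suffix)] for p in status.glob("*" + suffix))
```

If that list keeps growing, the archiver is failing or too slow; the
pg_stat_archiver view shows the failure count and the last failed segment.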

The post offers some ways to verify each of these; I suggest checking them
out.

And of course, if your DBA is back, have them look at it too.

Regards,
Koen De Groote


On Fri, Nov 1, 2024 at 2:10 PM Greg Sabino Mullane <htamfids@gmail.com>
wrote:

> On Fri, Nov 1, 2024 at 2:40 AM Muhammad Usman Khan <usman.k@bitnine.net>
> wrote:
>
>> For immediate space, move older files from pg_wal to another storage but
>> don't delete them.
>>
>
> No, do not do this! Figure out why WAL is not getting removed by Postgres
> and let it do its job once fixed. Please recall the original poster is
> trying to figure out what to do because they are not the database admin,
> so having them figure out which WAL are "older" and safe to move is not
> good advice.
>
> Resizing the disk is a better option. Could also see if there are other
> large files on that volume that can be removed or moved elsewhere, esp.
> large log files.
>
> Hopefully all of this is moot because their DBA is back from leave. :)
>
> Cheers,
> Greg
>
>
>

--0000000000000779a7062602967d--