MIME-Version: 1.0
References: <CAGRY4nz94+q_zVxj+dnk7zqm-McBz4mSza_wALKiw2==23MiGQ@mail.gmail.com>
In-Reply-To: <CAGRY4nz94+q_zVxj+dnk7zqm-McBz4mSza_wALKiw2==23MiGQ@mail.gmail.com>
From: Ron Johnson <ronljohnsonjr@gmail.com>
Date: Thu, 17 Jul 2025 23:34:09 -0400
Message-ID: <CANzqJaBHUaF5=5ebMxZrWJbDnrdCBD7LPW_azHNFQhS90UR4YQ@mail.gmail.com>
Subject: Re: Should we document the cost of pg_database_size()? Alternatives?
To: pgsql-general <pgsql-general@postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000002dae8e063a2bcd3d"
Archived-At: <https://www.postgresql.org/message-id/CANzqJaBHUaF5%3D5ebMxZrWJbDnrdCBD7LPW_azHNFQhS90UR4YQ%40mail.gmail.com>
Precedence: bulk

--0000000000002dae8e063a2bcd3d
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, Jul 17, 2025 at 8:55=E2=80=AFPM Craig Ringer <craig.ringer@enterpri=
sedb.com>
wrote:
[snip]

>
> FS-based sizing isn't really enough
> ----------------
>
> Asking users to monitor at the filesystem level works, kind-of, but
> it'll lead to confusion due to WAL and temp files in simple installs.
> To get decent results they will need to have a separate dedicated
> volume for pg_wal. And which temp files are counted will differ; IIRC
> pg_database_size() does not count extents created by an in-progress
> REINDEX etc, but DOES count temp table sizes, for example. FS-based
> monitoring will also include things like spilled pg_replslot spilled
> reorder buffers, which can be considerable and aren't reasonably
> considered part of the "database size" or included in
> pg_database_size(). And of course it can see only the sum of all
> database sizes on a multi-database postgres instance unless the user
> has one volume per database using distinct tablespaces. So
> filesystem-based monitoring is not really a proper replacement.
>

Whether the filesystem creeps above 90%, 95%, etc. because of WAL files,
temp files, REINDEX, or VACUUM FULL / CLUSTER / pg_repack is irrelevant.
It's the filesystem at 100% that will ruin your day.

Thus, we monitor filesystems, and don't monitor database size.

If the alarm does ever go off, *then* I check the cause.  (This isn't as
reactive as it sounds, because I regularly check replication backlog and
look for orphan slots, run REINDEX and CLUSTER one table at a time, and
don't let junk onto the cluster disk.)
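
A minimal sketch of the kind of threshold check I mean, in Python.  The
function names are made up for illustration, and the warning and critical
levels are whatever the caller passes in; only shutil.disk_usage() is a
real standard-library call.

```python
import shutil


def used_fraction(usage):
    # usage is the (total, used, free) tuple from shutil.disk_usage()
    return usage.used / usage.total


def severity(fraction, warn, crit):
    # Map a used fraction (0.0 to 1.0) onto an alert level.
    if fraction < warn:
        return "ok"
    if fraction < crit:
        return "warning"
    return "critical"


def check(path, warn, crit):
    # Hypothetical entry point: alert level for the filesystem
    # that holds path.
    return severity(used_fraction(shutil.disk_usage(path)), warn, crit)
```

Something like check("/var/lib/postgresql", 0.90, 0.95) run from cron
would cover it; that path is only an example.  Why the filesystem filled
up is deliberately not this script's concern.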

--=20
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

--0000000000002dae8e063a2bcd3d
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr">On Thu, Jul 17, 2025 at 8:55=E2=80=AFPM C=
raig Ringer &lt;<a href=3D"mailto:craig.ringer@enterprisedb.com">craig.ring=
er@enterprisedb.com</a>&gt; wrote:</div><div class=3D"gmail_quote gmail_quo=
te_container"><div>[snip]</div><blockquote class=3D"gmail_quote" style=3D"m=
argin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left=
:1ex">
<br>
FS-based sizing isn&#39;t really enough<br>
----------------<br>
<br>
Asking users to monitor at the filesystem level works, kind-of, but<br>
it&#39;ll lead to confusion due to WAL and temp files in simple installs.<b=
r>
To get decent results they will need to have a separate dedicated<br>
volume for pg_wal. And which temp files are counted will differ; IIRC<br>
pg_database_size() does not count extents created by an in-progress<br>
REINDEX etc, but DOES count temp table sizes, for example. FS-based<br>
monitoring will also include things like spilled pg_replslot spilled<br>
reorder buffers, which can be considerable and aren&#39;t reasonably<br>
considered part of the &quot;database size&quot; or included in<br>
pg_database_size(). And of course it can see only the sum of all<br>
database sizes on a multi-database postgres instance unless the user<br>
has one volume per database using distinct tablespaces. So<br>
filesystem-based monitoring is not really a proper replacement.<br></blockq=
uote><div><br></div><div>Whether the filesystem creeps above 90%, 95%, etc.=
 because of WAL files, temp files, REINDEX, or VACUUM FULL / CLUSTER / pg_r=
epack is irrelevant. It&#39;s the filesystem at 100% that will ruin your da=
y.</div><div><br></div><div>Thus, we monitor filesystems, and =
don&#39;t monitor database size.</div><div><div><br class=3D"gmail-Apple-in=
terchange-newline">If the alarm does ever go off,=C2=A0<i>then</i>=C2=A0I c=
heck the cause.=C2=A0 (This isn&#39;t as reactive as it sounds, because =
I regularly check replication backlog and look for orphan slots, run =
REINDEX and CLUSTER one table at a time, and don&#39;t let=C2=A0junk onto =
the cluster d=
isk.)</div><div><br></div></div></div><span class=3D"gmail_signature_prefix=
">-- </span><br><div dir=3D"ltr" class=3D"gmail_signature"><div dir=3D"ltr"=
>Death to &lt;Redacted&gt;, and butter sauce.<div>Don&#39;t boil me, I&#39;=
m still alive.<br><div><div>&lt;Redacted&gt; lobster!</div></div></div></di=
v></div></div>

--0000000000002dae8e063a2bcd3d--