public inbox for [email protected]  
From: Nathan Bossart <[email protected]>
To: Sami Imseih <[email protected]>
Cc: David Rowley <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Jeremy Schneider <[email protected]>
Cc: [email protected]
Subject: Re: another autovacuum scheduling thread
Date: Mon, 27 Oct 2025 16:15:23 -0500
Message-ID: <aP_g61kSkGAQOu3F@nathan>
In-Reply-To: <CAA5RZ0u2Mbks+O2DKBYen94AH3OMUcg+A7wvxrXYkmjTddBx4g@mail.gmail.com>
References: <CAA5RZ0vSPqd5vP4-17E6QELRgQzaoKChgp5TDPK9GhZEK=0Gjg@mail.gmail.com>
	<aPp4VyLo2Zqk7oCV@nathan>
	<CAA5RZ0sfQ-VSCSafsrvyJ7wsW1utLwtPVJ5N6hB0726BGRDrgQ@mail.gmail.com>
	<CAApHDvpxE8ci83d02dRE3-fMetb4Dc89-80FrjkGDz2q+ByJog@mail.gmail.com>
	<CAA5RZ0upTpKqgrdNfMSX7UJdjx=+=CsQ6Xct+vcCZPvUVhdZvw@mail.gmail.com>
	<CAApHDvp1=FOs6GneTzLSCHnCmC7z1_80=U3M=CKd82-pwS3YHg@mail.gmail.com>
	<aPuWev3D9M4iGCUt@nathan>
	<CAApHDvoM5MEHHBc0TNdrzkpq39WdEHSZhdWrtnx9zOWNXTSFGw@mail.gmail.com>
	<aP-YgrcPi0EhgR9x@nathan>
	<CAA5RZ0u2Mbks+O2DKBYen94AH3OMUcg+A7wvxrXYkmjTddBx4g@mail.gmail.com>

On Mon, Oct 27, 2025 at 12:47:15PM -0500, Sami Imseih wrote:
> 1/ Should we add documentation explaining this prioritization behavior in [0]?
> 
> I wrote a SQL query that returns the tables and their scores, which I found
> useful when I was testing this out, so having the actual rules spelled out
> in the docs would be super useful.

Can you elaborate on how it would be useful?  I'd be open to adding a short
note that autovacuum attempts to prioritize the tables in a smart way, but
I'm not sure I see the value of documenting every detail.  I also don't
want to add too much friction to future changes to the prioritization
logic.

> If we don't want to go that much in depth, at minimum the docs should say:
> 
> "Autovacuum prioritizes tables based on how far they exceed their thresholds
> or if they are approaching wraparound limits." so a DBA can understand
> this behavior.

Yeah, I would probably choose to keep it relatively vague like this.

> * The score is calculated as the maximum of the ratios of each of the table's
> * relevant values to its threshold. For example, if the number of inserted
> * tuples is 100, and the insert threshold for the table is 80, the insert
> * score is 1.25.
> 
> Should we consider clamping the score when reltuples = -1? Otherwise the
> scores for such tables (new tables with a large amount of ingested data)
> will be over-inflated. Perhaps, if reltuples = -1 (number of tuples not
> known), we could give a score of 0.5, so we are not over-prioritizing
> these tables but also not pushing them down to the bottom?

I'm not sure it's worth expending too much energy to deal with this.  In
the worst case, the table will be given an arbitrarily high priority the
first time it is vacuumed, but AFAICT that's it.  And that's already the
case today, as the thresholds will be artificially low before the first
VACUUM/ANALYZE.

-- 
nathan
