public inbox for [email protected]
help / color / mirror / Atom feedFrom: Greg Burd <[email protected]>
To: Nathan Bossart <[email protected]>
To: David Rowley <[email protected]>
Cc: Sami Imseih <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Robert Treat <[email protected]>
Cc: Jeremy Schneider <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: another autovacuum scheduling thread
Date: Thu, 19 Mar 2026 05:55:49 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <abrOO8GCgkhMXvcq@nathan>
References: <abAzhNqs5F8sJDdm@nathan>
<abBEiOICOdbbhsEI@nathan>
<CAA5RZ0seLxDmH9UftUA3vS8DrW+7T9HV5XK8yAuQxA8BaRkg2g@mail.gmail.com>
<CAA5RZ0vfhAnFBp4HrBQc+ALaJMx6vCvMtnBi39ST_4nH9PZEjA@mail.gmail.com>
<abGP87A3JPIXDG2I@nathan>
<CAA5RZ0tWonKY=E5U51i-J8s2EcRoLqtUQyWv+pQ4_rjFKSn+AA@mail.gmail.com>
<abGmNAVIPSqX124E@nathan>
<CAA5RZ0sZE-gfJ0c9HJkOk9XeFQwZL2wuJwrtOX+ZfUBLDcpFMA@mail.gmail.com>
<abMR6MQGQyyKWep4@nathan>
<CAApHDvqHXYCb=x0hHeR1e21ksqFNk4q3_zoPk6V-LLu0VL491Q@mail.gmail.com>
<abrOO8GCgkhMXvcq@nathan>
On Wed, Mar 18, 2026, at 12:09 PM, Nathan Bossart wrote:
> On Wed, Mar 18, 2026 at 12:06:34PM +1300, David Rowley wrote:
>> I think it would have been better to have done this about 3 months
>> ago, but I think it's probably still fine to do now. Feature freeze is
>> still quite a long way from release. I do expect that the most likely
>> time that someone would find issues with this change would be during
>> beta or RC, as that's when people would give PostgreSQL production
>> workloads to try out. During the dev cycle, I expect it's *mostly*
>> just hackers giving the database toy workloads in a very targeted way
>> to something specific that they're hacking on. Anyway, now since
>> you've added the GUCs to control the weights, there's a way for users
>> to have some influence, so at least there's an escape hatch.
>
> Thanks for chiming in.
Hey Nathan, et. al., I'll chime in too! Apologies in advance for the length of the message. I need to learn how to be more concise...
First off, thanks for looking into this! I think your work is a significant improvement over where we are today and should be in v19. Put simply, I think that the v12 patch is no worse than pg_class order we use today and in many cases much better. That said, I think we should tweak it a bit more while we're at it.
>> I think the GUCs are probably a good idea. I expect the most likely
>> change that people might want to make would be to raise the priority
>> of analyze over vacuum since that's often much faster to complete. We
>> know that some workloads are very sensitive to outdated statistics.
>>
>> On the other hand, we shouldn't be taking adding 5 new autovacuum GUCs
>> lightly as there are already so many. If we are going to come up with
>> something better than this in a few years then it could be better to
>> wait to reduce the pain of having to remove GUCs in the future. I
>> don't personally have any better ideas, so I'd rather see it go in
>> than not.
>
> Yeah, adding these GUCs feels a bit like etching in stone, but if folks
> want configurability, and nobody has better ideas, I'm not sure what else
> to do.
I'm late in the review process. I know David Rowley proposed the unified scoring approach that became the foundation of this patch, and I think that's a great direction. However, I'm concerned that the patch's default scoring weights don't give XID-age urgency sufficient priority over dead-tuple urgency. The weight GUCs (autovacuum_vacuum_score_weight, etc.) can address this, but they max at 1.0, meaning you can only reduce dead-tuple priority, not increase XID priority.
>> I didn't look at the patch in detail, but noticed that you might want
>> to add a "See Section N.N.N for more information." to the new GUCs to
>> link to the text you've added on how they're used.
>
> Good idea. I've added that.
>
>> Do you think it's worth making the call to
>> list_sort(tables_to_process, TableToProcessComparator) conditional on
>> a variable set like: sort_required |= (score != 0.0);? I recall that
>> someone had concerns that the actual sort itself could add overhead.
>
> I don't think it's necessary. I tested sorting 1M and 10M tables with
> randomly generated scores (on my MacBook, with assertions enabled). The
> former took ~150 milliseconds, and the latter took ~1770 milliseconds. I
> suspect there are far bigger problems to worry about if you have anywhere
> near that many tables.
>
> --
> nathan
>
> Attachments:
> * v12-0001-autovacuum-scheduling-improvements.patch
I decided to model before/after behavior using discrete-event simulation. With a bit of LLM help, we no have autovacuum_simulation.py and autovacuum_simulation_fixed.py to compare all three approaches (Before/v12/Proposed small fix to v12) across 20 runs with randomized OID assignments.
Results from the models are at the end of this email. They simulate the before/after/fixed behavior of the v12 autovacuum scheduling patch and a suggested fix (I'll get to why in a minute).
They test a contention scenario: 100 tables (5 already past freeze_max_age, 15 approaching it at staggered rates, and 80 high-churn tables constantly exceeding their dead-tuple vacuum thresholds) competing for 3 autovacuum workers over 7 days. I'm sure there are other scenarios that we could test, this felt representational enough to me based on what I've seen.
With the first test, the "Before" scheduler processes tables in OID order; the "After" scheduler uses the patch's urgency scoring with all default GUC values.
The model of the v12 patch produces a clear system-wide improvement: the number of tables simultaneously past freeze_max_age peaks at roughly 20 under score-based scheduling versus 80+ under OID order. This happens because high-churn tables earn high dead-tuple scores, get vacuumed frequently, and their relfrozenxid ages reset as a side effect - preventing them from ever reaching freeze_max_age. Under OID order, high-OID active tables are *starved* (for no other reason than they were created later than their counterparts!) and their XID ages grow needlessly into the danger zone. The v12 patch also correctly prioritizes the most dangerous force-vacuum tables first (those closest to failsafe age), whereas OID order's success with any particular table is coincidental.
But there's a downside in v12, the simulation reveals a scoring-scale concern under default weights. Active tables accumulate dead-tuple scores of 18–70+ within minutes of their last vacuum, while force-vacuum tables that have just crossed freeze_max_age carry XID scores of only 1.0–4.0 (age/freeze_max_age). The exponential boost doesn't activate until failsafe age (1.6B), which is 8× the freeze_max_age threshold. In the gap between 200M and 1.6B, force-vacuum tables are consistently outranked by ordinary dead-tuple work. In the tested scenario this meant only 2 of 20 at-risk tables were actually vacuumed by the score-based scheduler (versus 5 by OID due to luck), and average per-table wraparound exposure was 26% worse.
One possible remedy within the current design would be to either raise the default autovacuum_freeze_score_weight or apply a floor multiplier when force_vacuum is true. For example, ensuring any force-vacuum score is at least as large as the maximum non-force-vacuum score in the current candidate set. Alternatively, the weight GUC range could be expanded above 1.0 to allow DBAs to explicitly boost XID-age priority. The existing 0.0–1.0 range only allows reducing component priority, which makes it difficult to express "wraparound prevention is more important than bloat control.
Tiered sorting using the existing wraparound flag might be the simplest and safest fix. While the thread discussed using scoring with exponential boost to prioritize wraparound tables, I think a two-tier approach (wraparound tier vs routine tier) would be more robust. We already treat wraparound as non-negotiable (force_vacuum bypasses av_enabled, triggers emergency behavior). The scoring system should reflect this by making wraparound a hard priority tier, not a score component competing with bloat cleanup.
Instead of sorting solely by score, sort first by wraparound, then by score within each tier. The score computation stays exactly as-is; it's only used for relative ordering among force_vacuum tables and among non-force-vacuum tables.
Something like:
/*
* Comparison function for sorting TableToProcess candidates.
*
* Tables at risk of wraparound are always processed before routine
* maintenance work. Within each tier, tables are sorted by descending
* urgency score.
*/
static int
TableToProcessComparator(const ListCell *a, const ListCell *b)
{
TableToProcess *ta = (TableToProcess *) lfirst(a);
TableToProcess *tb = (TableToProcess *) lfirst(b);
/* Wraparound prevention always takes priority */
if (ta->wraparound && !tb->wraparound)
return -1;
if (!ta->wraparound && tb->wraparound)
return 1;
/* Within same tier, highest score first */
if (ta->score > tb->score)
return -1;
if (ta->score < tb->score)
return 1;
return 0;
}
The simulation code, workload generator, and visualizations are attached (in "foo_tgz" because my last attempt at this email was stuck in the moderation queue). I'd welcome feedback on whether the scenario is representative. The autovacuum_simulation_fixed.py includes the proposed fix and also runs a number of iteration where it randomized the table Oids so as to remove any dependency on ordering that may be implied in our current algorithm (the before, or v18 algorithm).
The proposed addition to your fix is in the v20260318 patch attached.
best.
-greg
$ ./autovacuum_simulation.py
======================================================================
AUTOVACUUM SCHEDULING SIMULATION (v12 patch)
Accurate model of score-based prioritization
======================================================================
PostgreSQL config:
autovacuum_freeze_max_age = 200,000,000
vacuum_failsafe_age = 1,600,000,000
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor= 0.2
autovacuum_max_workers = 3
Simulation: 7 days, 60s steps, seed=42
Generating tables...
5 critical — already past freeze_max_age
15 aging — approaching freeze_max_age
80 active — high dead-tuple rate
BEFORE simulation (catalog OID order):
[OID order ] 0%
[OID order ] 20%
[OID order ] 40%
[OID order ] 60%
[OID order ] 80%
[OID order ] 100%
AFTER simulation (score-based priority):
[Score-based] 0%
[Score-based] 20%
[Score-based] 40%
[Score-based] 60%
[Score-based] 80%
[Score-based] 100%
==============================================================================
RESULTS: Exposure time (minutes at risk before vacuum)
Table OID Crossed Before After Change
------------------------------------------------------------------------------
critical_0 16465 day 0.0 10080m 10080m +0%
critical_1 16398 day 0.0 14m 10080m -71900%
critical_2 16387 day 0.0 4m 10080m -251900%
critical_3 16478 day 0.0 10080m 4m +100%
critical_4 16419 day 0.0 10080m 4m +100%
aging_0 16415 day 0.3 9648m 9648m +0%
aging_1 16412 day 0.7 9114m 9114m +0%
aging_7 16459 day 0.7 9027m 9027m +0%
aging_6 16395 day 0.8 9m 8982m -99700%
aging_10 16481 day 0.9 8721m 8721m +0%
aging_11 16472 day 0.9 8721m 8721m +0%
aging_2 16401 day 1.0 415m 8579m -1967%
aging_3 16397 day 1.0 12m 8578m -71383%
aging_4 16470 day 1.1 8499m 8499m +0%
aging_12 16411 day 1.1 8483m 8483m +0%
aging_8 16438 day 1.2 8296m 8296m +0%
aging_5 16453 day 1.3 8178m 8178m +0%
aging_14 16448 day 2.1 7101m 7101m +0%
aging_9 16388 day 2.2 4m 6979m -174375%
aging_13 16413 day 2.2 6959m 6959m +0%
------------------------------------------------------------------------------
AVERAGE 6172m 7806m -26%
MAXIMUM 10080m 10080m +0%
==============================================================================
Generating visualization...
✓ output/autovacuum_scheduling_impact.png
Done.
-------------------------------------------------------
$ ./autovacuum_simulation_fixed.py
========================================================================
THREE-WAY AUTOVACUUM SCHEDULING COMPARISON
Before (OID) vs v12 Patch (Score) vs Proposed Fix (Tiered)
========================================================================
Config: 3 workers, 7-day sim, 60s steps, 20 runs
Tables: 5 critical + 15 aging + 80 active = 100
freeze_max_age = 200,000,000
Estimated runtime: 3-8 minutes
Run OID avg Score avg Tiered avg
-------------------------------------------
1 7222m 7892m 4m
2 7642m 7892m 4m
3 7961m 7892m 4m
4 6333m 7892m 4m
5 8110m 7892m 4m
6 6359m 7892m 4m
7 6629m 7892m 4m
8 8526m 7892m 4m
9 6385m 7892m 4m
10 6813m 7892m 4m
11 8588m 7892m 4m
12 7261m 7892m 4m
13 6682m 7892m 4m
14 8035m 7892m 4m
15 5667m 7892m 4m
16 7595m 7892m 4m
17 6394m 7892m 4m
18 6686m 7892m 4m
19 7819m 7892m 4m
20 7178m 7892m 4m
========================================================================
AGGREGATE RESULTS
========================================================================
Avg exposure per run (minutes):
OID : 7194 ± 816 (min=5667, max=8588)
Score : 7892 ± 0 (min=7892, max=7892)
Tiered : 4 ± 0 (min=4, max=4)
Peak concurrent force_vacuum tables:
OID : 82 ± 3 (min=79, max=88)
Score : 20 ± 0 (min=20, max=20)
Tiered : 5 ± 0 (min=5, max=5)
Pairwise wins (lower avg exposure = better):
Score beats OID: 5/20 loses: 15/20 ties: 0/20
Tiered beats OID: 20/20 loses: 0/20 ties: 0/20
Tiered beats Score: 20/20 loses: 0/20 ties: 0/20
Variance (std dev of avg exposure across runs):
OID : 816 min
Score : 0 min
Tiered : 0 min
Per-table mean exposure (minutes):
Table OID Score Tiered
--------------------------------------------------------------
critical_0 7564±4470 10080±0 7±0
critical_1 8577±3671 10080±0 7±0
critical_2 9073±3099 8938±0 3±0
critical_3 8581±3661 4±0 4±0
critical_4 9167±2826 3±0 3±0
aging_0 7253±4256 9648±0 4±0
aging_1 7324±3675 9114±0 4±0
aging_2 6865±3517 8579±0 5±0
aging_3 7253±3714 8851±0 4±0
aging_4 6018±4014 8579±0 4±0
aging_5 6856±2951 8064±0 4±0
aging_6 6799±3482 8496±0 3±0
aging_7 5765±4318 8853±0 4±0
aging_8 7091±3227 8644±0 4±0
aging_9 6358±2738 7479±0 4±0
aging_10 7106±3620 8870±0 4±0
aging_11 7175±3673 8965±0 4±0
aging_12 6499±3299 8106±0 4±0
aging_13 6080±3108 7595±0 4±0
aging_14 6479±3913 8891±0 4±0
========================================================================
Completed in 108 seconds (1.8 minutes)
========================================================================
Generating visualization...
✓ output/three_way_comparison.png
Attachments:
[application/octet-stream] v20260318-0001-autovacuum-scheduling-improvements.patch (29.9K, 2-v20260318-0001-autovacuum-scheduling-improvements.patch)
download | inline diff:
From b907e947e495f717832644f5c740ec75782dc5d9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <[email protected]>
Date: Fri, 10 Oct 2025 12:28:37 -0500
Subject: [PATCH v20260318] autovacuum scheduling improvements
---
doc/src/sgml/config.sgml | 90 ++++++
doc/src/sgml/maintenance.sgml | 92 ++++++
src/backend/postmaster/autovacuum.c | 263 ++++++++++++++----
src/backend/utils/misc/guc_parameters.dat | 40 +++
src/backend/utils/misc/postgresql.conf.sample | 5 +
src/include/postmaster/autovacuum.h | 6 +-
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 449 insertions(+), 48 deletions(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8cdd826fbd3..229f41353eb 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9395,6 +9395,96 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-autovacuum-freeze-score-weight" xreflabel="autovacuum_freeze_score_weight">
+ <term><varname>autovacuum_freeze_score_weight</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>autovacuum_freeze_score_weight</varname></primary>
+ <secondary>configuration parameter</secondary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the scaling factor of the transaction ID age component of
+ the score used by autovacuum for prioritization purposes. The default
+ is <literal>1.0</literal>. This parameter can only be set in the
+ <filename>postgresql.conf</filename> file or on the server command
+ line. See <xref linkend="autovacuum-priority"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-autovacuum-multixact-freeze-score-weight" xreflabel="autovacuum_multixact_freeze_score_weight">
+ <term><varname>autovacuum_multixact_freeze_score_weight</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>autovacuum_multixact_freeze_score_weight</varname></primary>
+ <secondary>configuration parameter</secondary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the scaling factor of the multixact ID age component of the
+ score used by autovacuum for prioritization purposes. The default is
+ <literal>1.0</literal>. This parameter can only be set in the
+ <filename>postgresql.conf</filename> file or on the server command
+ line. See <xref linkend="autovacuum-priority"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-autovacuum-vacuum-score-weight" xreflabel="autovacuum_vacuum_score_weight">
+ <term><varname>autovacuum_vacuum_score_weight</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>autovacuum_vacuum_score_weight</varname></primary>
+ <secondary>configuration parameter</secondary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the scaling factor of the vacuum threshold component of the
+ score used by autovacuum for prioritization purposes. The default is
+ <literal>1.0</literal>. This parameter can only be set in the
+ <filename>postgresql.conf</filename> file or on the server command
+ line. See <xref linkend="autovacuum-priority"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-autovacuum-vacuum-insert-score-weight" xreflabel="autovacuum_vacuum_insert_score_weight">
+ <term><varname>autovacuum_vacuum_insert_score_weight</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>autovacuum_vacuum_insert_score_weight</varname></primary>
+ <secondary>configuration parameter</secondary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the scaling factor of the vacuum insert threshold component
+ of the score used by autovacuum for prioritization purposes. The
+ default is <literal>1.0</literal>. This parameter can only be set in
+ the <filename>postgresql.conf</filename> file or on the server command
+ line. See <xref linkend="autovacuum-priority"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="guc-autovacuum-analyze-score-weight" xreflabel="autovacuum_analyze_score_weight">
+ <term><varname>autovacuum_analyze_score_weight</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>autovacuum_analyze_score_weight</varname></primary>
+ <secondary>configuration parameter</secondary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Specifies the scaling factor of the analyze threshold component of the
+ score used by autovacuum for prioritization purposes. The default is
+ <literal>1.0</literal>. This parameter can only be set in the
+ <filename>postgresql.conf</filename> file or on the server command
+ line. See <xref linkend="autovacuum-priority"/> for more information.
+ </para>
+ </listitem>
+ </varlistentry>
+
</variablelist>
</sect2>
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 7c958b06273..b609f05be07 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -1061,6 +1061,98 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
effectively prevent autovacuums from ever completing.
</para>
</warning>
+
+ <sect3 id="autovacuum-priority">
+ <title>Autovacuum Prioritization</title>
+
+ <para>
+ Autovacuum decides what to process in two steps: first it chooses a
+ database, then it chooses the tables within that database. The autovacuum
+ launcher process prioritizes databases at risk of transaction ID or
+ multixact ID wraparound, else it chooses the database processed least
+ recently. As an exception, it skips databases with no connections or no
+ activity since the last statistics reset, unless at risk of wraparound.
+ </para>
+
+ <para>
+ Within a database, the autovacuum worker process builds a list of tables
+ that require vacuum or analyze and sorts them using a scoring system. It
+ scores each table by taking the maximum value of several component scores
+ representing various criteria important to vacuum or analyze. Those
+ components are as follows:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ The <emphasis>transaction ID</emphasis> component measures the age in
+ transactions of the table's
+ <structname>pg_class</structname>.<structfield>relfrozenxid</structfield>
+ field as compared to <xref linkend="guc-autovacuum-freeze-max-age"/>.
+ Furthermore, this component increases greatly once the age surpasses
+ <xref linkend="guc-vacuum-failsafe-age"/>. The final value for this
+ component can be adjusted via
+ <xref linkend="guc-autovacuum-freeze-score-weight"/>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The <emphasis>multixact ID</emphasis> component measures the age in
+ multixacts of the table's
+ <structname>pg_class</structname>.<structfield>relminmxid</structfield>
+ field as compared to
+ <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>. Furthermore,
+ this component increases greatly once the age surpasses
+ <xref linkend="guc-vacuum-multixact-failsafe-age"/>. The final value
+ for this component can be adjusted via
+ <xref linkend="guc-autovacuum-multixact-freeze-score-weight"/>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The <emphasis>vacuum</emphasis> component measures the number of updated
+ or deleted tuples as compared to the threshold calculated with
+ <xref linkend="guc-autovacuum-vacuum-threshold"/>,
+ <xref linkend="guc-autovacuum-vacuum-scale-factor"/>, and
+ <xref linkend="guc-autovacuum-vacuum-max-threshold"/>. The final value
+ for this component can be adjusted via
+ <xref linkend="guc-autovacuum-vacuum-score-weight"/>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The <emphasis>vacuum insert</emphasis> component measures the number of
+ inserted tuples as compared to the threshold calculated with
+ <xref linkend="guc-autovacuum-vacuum-insert-threshold"/> and
+ <xref linkend="guc-autovacuum-vacuum-insert-scale-factor"/>. The final
+ value for this component can be adjusted via
+ <xref linkend="guc-autovacuum-vacuum-insert-score-weight"/>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The <emphasis>analyze</emphasis> component measures the number of
+ inserted, updated, or deleted tuples as compared to the threshold
+ calculated with
+ <xref linkend="guc-autovacuum-analyze-threshold"/> and
+ <xref linkend="guc-autovacuum-analyze-scale-factor"/>. The final value
+ for this component can be adjusted via
+ <xref linkend="guc-autovacuum-analyze-score-weight"/>.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ To revert to the prioritization strategy used before
+ <productname>PostgreSQL</productname> 19 (i.e., the order the tables are
+ listed in the <literal>pg_class</literal> system catalog), set all of the
+ aforementioned "weight" parameters to <literal>0.0</literal>.
+ </para>
+ </sect3>
</sect2>
</sect1>
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 219673db930..53712f80849 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -62,6 +62,7 @@
*/
#include "postgres.h"
+#include <math.h>
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>
@@ -130,7 +131,11 @@ int autovacuum_anl_thresh;
double autovacuum_anl_scale;
int autovacuum_freeze_max_age;
int autovacuum_multixact_freeze_max_age;
-
+double autovacuum_freeze_score_weight = 1.0;
+double autovacuum_multixact_freeze_score_weight = 1.0;
+double autovacuum_vacuum_score_weight = 1.0;
+double autovacuum_vacuum_insert_score_weight = 1.0;
+double autovacuum_analyze_score_weight = 1.0;
double autovacuum_vac_cost_delay;
int autovacuum_vac_cost_limit;
@@ -312,6 +317,13 @@ static AutoVacuumShmemStruct *AutoVacuumShmem;
static dlist_head DatabaseList = DLIST_STATIC_INIT(DatabaseList);
static MemoryContext DatabaseListCxt = NULL;
+typedef struct
+{
+ Oid oid;
+ double score;
+ bool wraparound;
+} TableToProcess;
+
/*
* Dummy pointer to persuade Valgrind that we've not leaked the array of
* avl_dbase structs. Make it global to ensure the compiler doesn't
@@ -350,7 +362,8 @@ static void relation_needs_vacanalyze(Oid relid, AutoVacOpts *relopts,
Form_pg_class classForm,
PgStat_StatTabEntry *tabentry,
int effective_multixact_freeze_max_age,
- bool *dovacuum, bool *doanalyze, bool *wraparound);
+ bool *dovacuum, bool *doanalyze, bool *wraparound,
+ double *score);
static void autovacuum_do_vac_analyze(autovac_table *tab,
BufferAccessStrategy bstrategy);
@@ -1867,6 +1880,33 @@ get_database_list(void)
return dblist;
}
+/*
+ * Comparison function for sorting autovac_table candidates.
+ *
+ * Tables at risk of wraparound (force_vacuum) are always processed
+ * before routine maintenance work. Within each tier, tables are
+ * sorted by descending urgency score.
+ */
+static int
+TableToProcessComparator(const ListCell *a, const ListCell *b)
+{
+ TableToProcess *ta = (TableToProcess *) lfirst(a);
+ TableToProcess *tb = (TableToProcess *) lfirst(b);
+
+ /* Wraparound prevention always takes priority */
+ if (ta->wraparound && !tb->wraparound)
+ return -1;
+ if (!ta->wraparound && tb->wraparound)
+ return 1;
+
+ /* Within same tier, highest score first */
+ if (ta->score > tb->score)
+ return -1;
+ if (ta->score < tb->score)
+ return 1;
+ return 0;
+}
+
/*
* Process a database table-by-table
*
@@ -1880,7 +1920,7 @@ do_autovacuum(void)
HeapTuple tuple;
TableScanDesc relScan;
Form_pg_database dbForm;
- List *table_oids = NIL;
+ List *tables_to_process = NIL;
List *orphan_oids = NIL;
HASHCTL ctl;
HTAB *table_toast_map;
@@ -1992,6 +2032,7 @@ do_autovacuum(void)
bool dovacuum;
bool doanalyze;
bool wraparound;
+ double score = 0.0;
if (classForm->relkind != RELKIND_RELATION &&
classForm->relkind != RELKIND_MATVIEW)
@@ -2032,11 +2073,19 @@ do_autovacuum(void)
/* Check if it needs vacuum or analyze */
relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
effective_multixact_freeze_max_age,
- &dovacuum, &doanalyze, &wraparound);
+ &dovacuum, &doanalyze, &wraparound,
+ &score);
- /* Relations that need work are added to table_oids */
+ /* Relations that need work are added to tables_to_process */
if (dovacuum || doanalyze)
- table_oids = lappend_oid(table_oids, relid);
+ {
+ TableToProcess *table = palloc_object(TableToProcess);
+
+ table->oid = relid;
+ table->score = score;
+ table->wraparound = wraparound;
+ tables_to_process = lappend(tables_to_process, table);
+ }
/*
* Remember TOAST associations for the second pass. Note: we must do
@@ -2092,6 +2141,7 @@ do_autovacuum(void)
bool dovacuum;
bool doanalyze;
bool wraparound;
+ double score = 0.0;
/*
* We cannot safely process other backends' temp tables, so skip 'em.
@@ -2124,11 +2174,18 @@ do_autovacuum(void)
relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
effective_multixact_freeze_max_age,
- &dovacuum, &doanalyze, &wraparound);
+ &dovacuum, &doanalyze, &wraparound,
+ &score);
/* ignore analyze for toast tables */
if (dovacuum)
- table_oids = lappend_oid(table_oids, relid);
+ {
+ TableToProcess *table = palloc_object(TableToProcess);
+
+ table->oid = relid;
+ table->score = score;
+ tables_to_process = lappend(tables_to_process, table);
+ }
/* Release stuff to avoid leakage */
if (free_relopts)
@@ -2252,6 +2309,8 @@ do_autovacuum(void)
MemoryContextSwitchTo(AutovacMemCxt);
}
+ list_sort(tables_to_process, TableToProcessComparator);
+
/*
* Optionally, create a buffer access strategy object for VACUUM to use.
* We use the same BufferAccessStrategy object for all tables VACUUMed by
@@ -2280,9 +2339,9 @@ do_autovacuum(void)
/*
* Perform operations on collected tables.
*/
- foreach(cell, table_oids)
+ foreach_ptr(TableToProcess, table, tables_to_process)
{
- Oid relid = lfirst_oid(cell);
+ Oid relid = table->oid;
HeapTuple classTup;
autovac_table *tab;
bool isshared;
@@ -2513,7 +2572,7 @@ deleted:
pg_atomic_test_set_flag(&MyWorkerInfo->wi_dobalance);
}
- list_free(table_oids);
+ list_free_deep(tables_to_process);
/*
* Perform additional work items, as requested by backends.
@@ -2915,6 +2974,7 @@ recheck_relation_needs_vacanalyze(Oid relid,
bool *wraparound)
{
PgStat_StatTabEntry *tabentry;
+ double score;
/* fetch the pgstat table entry */
tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
@@ -2922,15 +2982,12 @@ recheck_relation_needs_vacanalyze(Oid relid,
relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
effective_multixact_freeze_max_age,
- dovacuum, doanalyze, wraparound);
+ dovacuum, doanalyze, wraparound,
+ &score);
/* Release tabentry to avoid leakage */
if (tabentry)
pfree(tabentry);
-
- /* ignore ANALYZE for toast tables */
- if (classForm->relkind == RELKIND_TOASTVALUE)
- *doanalyze = false;
}
/*
@@ -2971,6 +3028,43 @@ recheck_relation_needs_vacanalyze(Oid relid,
* autovacuum_vacuum_threshold GUC variable. Similarly, a vac_scale_factor
* value < 0 is substituted with the value of
* autovacuum_vacuum_scale_factor GUC variable. Ditto for analyze.
+ *
+ * This function also returns a score that can be used to sort the list of
+ * tables to process. The idea is to have autovacuum prioritize tables that
+ * are furthest beyond their thresholds (e.g., a table nearing transaction ID
+ * wraparound should be vacuumed first). This prioritization scheme is
+ * certainly far from perfect; there are simply too many possibilities for any
+ * scoring technique to work across all workloads, and the situation might
+ * change significantly between the time we calculate the score and the time
+ * that autovacuum gets to processing it. However, we have attempted to
+ * develop something that is expected to work for a large portion of workloads
+ * with reasonable parameter settings.
+ *
+ * The score is calculated as the maximum of the ratios of each of the table's
+ * relevant values to its threshold. For example, if the number of inserted
+ * tuples is 100, and the insert threshold for the table is 80, the insert
+ * score is 1.25. If all other scores are below that value, the returned score
+ * will be 1.25. The other criteria considered for the score are the table
+ * ages (both relfrozenxid and relminmxid) compared to the corresponding
+ * freeze-max-age setting, the number of updated/deleted tuples compared to the
+ * vacuum threshold, and the number of inserted/updated/deleted tuples compared
+ * to the analyze threshold.
+ *
+ * One exception to the previous paragraph is for tables nearing wraparound,
+ * i.e., those that have surpassed the effective failsafe ages. In that case,
+ * the relfrozen/relminmxid-based score is scaled aggressively so that the
+ * table has a decent chance of sorting to the top of the list.
+ *
+ * To adjust how strongly each component contributes to the score, the
+ * following parameters can be adjusted from their default of 1.0 to anywhere
+ * between 0.0 and 1.0 (inclusive). Setting all of these to 0.0 restores
+ * pre-v19 prioritization behavior:
+ *
+ * autovacuum_freeze_score_weight
+ * autovacuum_multixact_freeze_score_weight
+ * autovacuum_vacuum_score_weight
+ * autovacuum_vacuum_insert_score_weight
+ * autovacuum_analyze_score_weight
*/
static void
relation_needs_vacanalyze(Oid relid,
@@ -2981,7 +3075,8 @@ relation_needs_vacanalyze(Oid relid,
/* output params below */
bool *dovacuum,
bool *doanalyze,
- bool *wraparound)
+ bool *wraparound,
+ double *score)
{
bool force_vacuum;
bool av_enabled;
@@ -3010,11 +3105,16 @@ relation_needs_vacanalyze(Oid relid,
int multixact_freeze_max_age;
TransactionId xidForceLimit;
TransactionId relfrozenxid;
+ MultiXactId relminmxid;
MultiXactId multiForceLimit;
Assert(classForm != NULL);
Assert(OidIsValid(relid));
+ *score = 0.0;
+ *dovacuum = false;
+ *doanalyze = false;
+
/*
* Determine vacuum/analyze equation parameters. We have two possible
* sources: the passed reloptions (which could be a main table or a toast
@@ -3062,17 +3162,17 @@ relation_needs_vacanalyze(Oid relid,
av_enabled = (relopts ? relopts->enabled : true);
+ relfrozenxid = classForm->relfrozenxid;
+ relminmxid = classForm->relminmxid;
+
/* Force vacuum if table is at risk of wraparound */
xidForceLimit = recentXid - freeze_max_age;
if (xidForceLimit < FirstNormalTransactionId)
xidForceLimit -= FirstNormalTransactionId;
- relfrozenxid = classForm->relfrozenxid;
force_vacuum = (TransactionIdIsNormal(relfrozenxid) &&
TransactionIdPrecedes(relfrozenxid, xidForceLimit));
if (!force_vacuum)
{
- MultiXactId relminmxid = classForm->relminmxid;
-
multiForceLimit = recentMulti - multixact_freeze_max_age;
if (multiForceLimit < FirstMultiXactId)
multiForceLimit -= FirstMultiXactId;
@@ -3081,13 +3181,58 @@ relation_needs_vacanalyze(Oid relid,
}
*wraparound = force_vacuum;
+ /* Update the score. */
+ if (force_vacuum)
+ {
+ uint32 xid_age;
+ uint32 mxid_age;
+ double xid_score;
+ double mxid_score;
+ int effective_xid_failsafe_age;
+ int effective_mxid_failsafe_age;
+
+ /*
+ * To calculate the (M)XID age portion of the score, divide the age by
+ * its respective *_freeze_max_age parameter.
+ */
+ xid_age = TransactionIdIsNormal(relfrozenxid) ? recentXid - relfrozenxid : 0;
+ mxid_age = MultiXactIdIsValid(relminmxid) ? recentMulti - relminmxid : 0;
+
+ xid_score = (double) xid_age / freeze_max_age;
+ mxid_score = (double) mxid_age / multixact_freeze_max_age;
+
+ /*
+ * To ensure tables are given increased priority once they begin
+ * approaching wraparound, we scale the score aggressively if the ages
+ * surpass vacuum_failsafe_age or vacuum_multixact_failsafe_age.
+ *
+ * As in vacuum_xid_failsafe_check(), the effective failsafe age is no
+ * less than 105% the value of the respective *_freeze_max_age
+ * parameter. Note that per-table settings could result in a low
+ * score even if the table surpasses the failsafe settings. However,
+ * this is a strange enough corner case that we don't bother trying to
+ * handle it.
+ */
+ effective_xid_failsafe_age = Max(vacuum_failsafe_age,
+ autovacuum_freeze_max_age * 1.05);
+ effective_mxid_failsafe_age = Max(vacuum_multixact_failsafe_age,
+ autovacuum_multixact_freeze_max_age * 1.05);
+
+ if (xid_age >= effective_xid_failsafe_age)
+ xid_score = pow(xid_score, Max(1.0, (double) xid_age / 100000000));
+ if (mxid_age >= effective_mxid_failsafe_age)
+ mxid_score = pow(mxid_score, Max(1.0, (double) mxid_age / 100000000));
+
+ xid_score *= autovacuum_freeze_score_weight;
+ mxid_score *= autovacuum_multixact_freeze_score_weight;
+
+ *score = Max(xid_score, mxid_score);
+ *dovacuum = true;
+ }
+
/* User disabled it in pg_class.reloptions? (But ignore if at risk) */
if (!av_enabled && !force_vacuum)
- {
- *doanalyze = false;
- *dovacuum = false;
return;
- }
/*
* If we found stats for the table, and autovacuum is currently enabled,
@@ -3136,34 +3281,58 @@ relation_needs_vacanalyze(Oid relid,
vac_ins_scale_factor * reltuples * pcnt_unfrozen;
anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
+ /*
+ * Determine if this table needs vacuum, and update the score if it
+ * does.
+ */
+ if (vactuples > vacthresh)
+ {
+ double vacthresh_score;
+
+ vacthresh_score = (double) vactuples / Max(vacthresh, 1);
+ vacthresh_score *= autovacuum_vacuum_score_weight;
+
+ *score = Max(*score, vacthresh_score);
+ *dovacuum = true;
+ }
+
+ if (vac_ins_base_thresh >= 0 && instuples > vacinsthresh)
+ {
+ double vacinsthresh_score;
+
+ vacinsthresh_score = (double) instuples / Max(vacinsthresh, 1);
+ vacinsthresh_score *= autovacuum_vacuum_insert_score_weight;
+
+ *score = Max(*score, vacinsthresh_score);
+ *dovacuum = true;
+ }
+
+ /*
+ * Determine if this table needs analyze, and update the score if it
+ * does. Note that we don't analyze TOAST tables and pg_statistic.
+ */
+ if (anltuples > anlthresh &&
+ relid != StatisticRelationId &&
+ classForm->relkind != RELKIND_TOASTVALUE)
+ {
+ double anlthresh_score;
+
+ anlthresh_score = (double) anltuples / Max(anlthresh, 1);
+ anlthresh_score *= autovacuum_analyze_score_weight;
+
+ *score = Max(*score, anlthresh_score);
+ *doanalyze = true;
+ }
+
if (vac_ins_base_thresh >= 0)
- elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: %.0f (threshold %.0f), anl: %.0f (threshold %.0f)",
+ elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: %.0f (threshold %.0f), anl: %.0f (threshold %.0f), score: %.3f",
NameStr(classForm->relname),
- vactuples, vacthresh, instuples, vacinsthresh, anltuples, anlthresh);
+ vactuples, vacthresh, instuples, vacinsthresh, anltuples, anlthresh, *score);
else
- elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: (disabled), anl: %.0f (threshold %.0f)",
+ elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), ins: (disabled), anl: %.0f (threshold %.0f), score %.3f",
NameStr(classForm->relname),
- vactuples, vacthresh, anltuples, anlthresh);
-
- /* Determine if this table needs vacuum or analyze. */
- *dovacuum = force_vacuum || (vactuples > vacthresh) ||
- (vac_ins_base_thresh >= 0 && instuples > vacinsthresh);
- *doanalyze = (anltuples > anlthresh);
+ vactuples, vacthresh, anltuples, anlthresh, *score);
}
- else
- {
- /*
- * Skip a table not found in stat hash, unless we have to force vacuum
- * for anti-wrap purposes. If it's not acted upon, there's no need to
- * vacuum it.
- */
- *dovacuum = force_vacuum;
- *doanalyze = false;
- }
-
- /* ANALYZE refuses to work with pg_statistic */
- if (relid == StatisticRelationId)
- *doanalyze = false;
}
/*
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 0c9854ad8fc..b47bc534ee1 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -136,6 +136,14 @@
max => '100.0',
},
+{ name => 'autovacuum_analyze_score_weight', type => 'real', context => 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
+ short_desc => 'Scaling factor of analyze score for autovacuum prioritization.',
+ variable => 'autovacuum_analyze_score_weight',
+ boot_val => '1.0',
+ min => '0.0',
+ max => '1.0',
+},
+
{ name => 'autovacuum_analyze_threshold', type => 'int', context => 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
short_desc => 'Minimum number of tuple inserts, updates, or deletes prior to analyze.',
variable => 'autovacuum_anl_thresh',
@@ -154,6 +162,14 @@
max => '2000000000',
},
+{ name => 'autovacuum_freeze_score_weight', type => 'real', context => 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
+ short_desc => 'Scaling factor of freeze score for autovacuum prioritization.',
+ variable => 'autovacuum_freeze_score_weight',
+ boot_val => '1.0',
+ min => '0.0',
+ max => '1.0',
+},
+
{ name => 'autovacuum_max_workers', type => 'int', context => 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
short_desc => 'Sets the maximum number of simultaneously running autovacuum worker processes.',
variable => 'autovacuum_max_workers',
@@ -171,6 +187,14 @@
max => '2000000000',
},
+{ name => 'autovacuum_multixact_freeze_score_weight', type => 'real', context => 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
+ short_desc => 'Scaling factor of multixact freeze score for autovacuum prioritization.',
+ variable => 'autovacuum_multixact_freeze_score_weight',
+ boot_val => '1.0',
+ min => '0.0',
+ max => '1.0',
+},
+
{ name => 'autovacuum_naptime', type => 'int', context => 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
short_desc => 'Time to sleep between autovacuum runs.',
flags => 'GUC_UNIT_S',
@@ -207,6 +231,14 @@
max => '100.0',
},
+{ name => 'autovacuum_vacuum_insert_score_weight', type => 'real', context => 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
+ short_desc => 'Scaling factor of vacuum insert score for autovacuum prioritization.',
+ variable => 'autovacuum_vacuum_insert_score_weight',
+ boot_val => '1.0',
+ min => '0.0',
+ max => '1.0',
+},
+
{ name => 'autovacuum_vacuum_insert_threshold', type => 'int', context => 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
short_desc => 'Minimum number of tuple inserts prior to vacuum.',
long_desc => '-1 disables insert vacuums.',
@@ -233,6 +265,14 @@
max => '100.0',
},
+{ name => 'autovacuum_vacuum_score_weight', type => 'real', context => 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
+ short_desc => 'Scaling factor of vacuum score for autovacuum prioritization.',
+ variable => 'autovacuum_vacuum_score_weight',
+ boot_val => '1.0',
+ min => '0.0',
+ max => '1.0',
+},
+
{ name => 'autovacuum_vacuum_threshold', type => 'int', context => 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
short_desc => 'Minimum number of tuple updates or deletes prior to vacuum.',
variable => 'autovacuum_vac_thresh',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e4abe6c0077..ab9cbebbc3f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -733,6 +733,11 @@
#autovacuum_multixact_freeze_max_age = 400000000 # maximum multixact age
# before forced vacuum
# (change requires restart)
+#autovacuum_freeze_score_weight = 1.0
+#autovacuum_multixact_freeze_score_weight = 1.0
+#autovacuum_vacuum_score_weight = 1.0
+#autovacuum_vacuum_insert_score_weight = 1.0
+#autovacuum_analyze_score_weight = 1.0
#autovacuum_vacuum_cost_delay = 2ms # default vacuum cost delay for
# autovacuum, in milliseconds;
# -1 means use vacuum_cost_delay
diff --git a/src/include/postmaster/autovacuum.h b/src/include/postmaster/autovacuum.h
index 5aa0f3a8ac1..b21d111d4d5 100644
--- a/src/include/postmaster/autovacuum.h
+++ b/src/include/postmaster/autovacuum.h
@@ -43,7 +43,11 @@ extern PGDLLIMPORT int autovacuum_freeze_max_age;
extern PGDLLIMPORT int autovacuum_multixact_freeze_max_age;
extern PGDLLIMPORT double autovacuum_vac_cost_delay;
extern PGDLLIMPORT int autovacuum_vac_cost_limit;
-
+extern PGDLLIMPORT double autovacuum_freeze_score_weight;
+extern PGDLLIMPORT double autovacuum_multixact_freeze_score_weight;
+extern PGDLLIMPORT double autovacuum_vacuum_score_weight;
+extern PGDLLIMPORT double autovacuum_vacuum_insert_score_weight;
+extern PGDLLIMPORT double autovacuum_analyze_score_weight;
extern PGDLLIMPORT int Log_autovacuum_min_duration;
extern PGDLLIMPORT int Log_autoanalyze_min_duration;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 174e2798443..bc4e6444a2d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -3088,6 +3088,7 @@ TableScanDesc
TableScanDescData
TableSpaceCacheEntry
TableSpaceOpts
+TableToProcess
TablespaceList
TablespaceListCell
TapeBlockTrailer
--
2.51.2
[application/octet-stream] foo_tgz (608.7K, 3-foo_tgz)
download
[image/png] three_way_comparison.png (377.2K, 4-three_way_comparison.png)
download | view image
view thread (143+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: another autovacuum scheduling thread
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox