MIME-Version: 1.0
References: 
 <CAApHDvp1=FOs6GneTzLSCHnCmC7z1_80=U3M=CKd82-pwS3YHg@mail.gmail.com>
 <aPuWev3D9M4iGCUt@nathan>
 <CAApHDvoM5MEHHBc0TNdrzkpq39WdEHSZhdWrtnx9zOWNXTSFGw@mail.gmail.com>
 <aP-YgrcPi0EhgR9x@nathan>
 <CAA5RZ0u2Mbks+O2DKBYen94AH3OMUcg+A7wvxrXYkmjTddBx4g@mail.gmail.com>
 <aP_g61kSkGAQOu3F@nathan>
 <CAA5RZ0sybfRyKp+DY+r=2U+-r7HfSF4GL1oVOOcVtEWmk2ewUw@mail.gmail.com>
 <CAApHDvpVE5F-_8rpPC+-L98mA0yK0S_jtQGqLn69fkRevf726g@mail.gmail.com>
 <aQEwRD5XW4wfJE6G@nathan>
 <CAA5RZ0uGbpaom3v+2K=bNuitnSGX_Lw4yPj3SbJdABYKgGur_A@mail.gmail.com>
 <aQI7tGEs8IOPxG64@nathan>
 <CAGjGUAJ-XqLN6VU+HeXn0iooiJrrnN4Pqp+xVU8VKKNsZcfR2w@mail.gmail.com>
 <CAApHDvrhEH8okuL+WVHxrC_W7JBSuvk6A5i3mzmrGTgp4RAvqg@mail.gmail.com>
In-Reply-To: 
 <CAApHDvrhEH8okuL+WVHxrC_W7JBSuvk6A5i3mzmrGTgp4RAvqg@mail.gmail.com>
From: wenhui qiu <qiuwenhuifx@gmail.com>
Date: Thu, 30 Oct 2025 14:48:18 +0800
Message-ID: 
 <CAGjGUA+ZJR0wxciBqXTpGs724k8oToseJ-n0AM-jo5cNRX56DQ@mail.gmail.com>
Subject: Re: another autovacuum scheduling thread
To: David Rowley <dgrowleyml@gmail.com>
Cc: Nathan Bossart <nathandbossart@gmail.com>,
 Sami Imseih <samimseih@gmail.com>,
	Robert Haas <robertmhaas@gmail.com>,
 Jeremy Schneider <schneider@ardentperf.com>,
	pgsql-hackers@postgresql.org
Content-Type: multipart/alternative; boundary="0000000000000d5f1406425aa325"
Archived-At: 
 <https://www.postgresql.org/message-id/CAGjGUA%2BZJR0wxciBqXTpGs724k8oToseJ-n0AM-jo5cNRX56DQ%40mail.gmail.com>
Precedence: bulk

--0000000000000d5f1406425aa325
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi,

I think there might be some misunderstanding. I'm only suggesting
changing

effective_xid_failsafe_age = Max(vacuum_failsafe_age,
                                 autovacuum_freeze_max_age * 1.05);

to

effective_xid_failsafe_age = (vacuum_failsafe_age +
                              autovacuum_freeze_max_age) / 2.0;

In the current logic, effective_xid_failsafe_age is almost always equal
to vacuum_failsafe_age. As a result, a table's vacuum priority is only
increased once its age reaches vacuum_failsafe_age, which is too late.
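
To make the difference concrete, here is a minimal sketch in C that
evaluates both expressions. It assumes the documented default settings
(vacuum_failsafe_age = 1600000000, autovacuum_freeze_max_age =
200000000). The function and macro names are hypothetical and are not
taken from the patch; only the two expressions come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults for the two GUCs (hypothetical macro names). */
#define DEFAULT_VACUUM_FAILSAFE_AGE       INT64_C(1600000000)
#define DEFAULT_AUTOVACUUM_FREEZE_MAX_AGE INT64_C(200000000)

/*
 * Current expression: Max(vacuum_failsafe_age,
 *                         autovacuum_freeze_max_age * 1.05).
 */
static double
failsafe_threshold_max(int64_t failsafe_age, int64_t freeze_max_age)
{
    double floor_age = (double) freeze_max_age * 1.05;

    /* Ages are XID counts, so negative inputs make no sense here. */
    assert(failsafe_age >= 0 && freeze_max_age >= 0);

    return ((double) failsafe_age > floor_age) ? (double) failsafe_age
                                               : floor_age;
}

/*
 * Suggested expression: the average of the two settings, which sits
 * halfway between autovacuum_freeze_max_age and vacuum_failsafe_age.
 */
static double
failsafe_threshold_avg(int64_t failsafe_age, int64_t freeze_max_age)
{
    assert(failsafe_age >= 0 && freeze_max_age >= 0);

    return ((double) failsafe_age + (double) freeze_max_age) / 2.0;
}
```

With the assumed defaults, the Max() form evaluates to 1.6 billion (the
same as vacuum_failsafe_age), while the average evaluates to 900
million, which is 4.5 times autovacuum_freeze_max_age.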


Thanks

On Thu, Oct 30, 2025 at 11:42 AM David Rowley <dgrowleyml@gmail.com> wrote:

> On Thu, 30 Oct 2025 at 15:58, wenhui qiu <qiuwenhuifx@gmail.com> wrote:
> > In fact, with the introduction of the
> > vacuum_max_eager_freeze_failure_rate feature, if a table's age still
> > exceeds more than 1.x times the autovacuum_freeze_max_age, it suggests
> > that the vacuum freeze process is not functioning properly. Once the age
> > surpasses vacuum_failsafe_age, wraparound issues are likely to occur
> > soon. Taking the average of vacuum_failsafe_age and
> > autovacuum_freeze_max_age is not a complex approach. Under the default
> > configuration, this average already exceeds four times the
> > autovacuum_freeze_max_age. At that stage, a DBA should have already
> > intervened to investigate and resolve why the table age is not decreasing.
>
> I don't think anyone would like to modify PostgreSQL in any way that
> increases the chances that a table gets as old as vacuum_failsafe_age.
> Regardless of the order in which tables are vacuumed, if a table gets
> as old as that then vacuum is configured to run too slowly, or there
> are not enough workers configured to cope with the given amount of
> work. I think we need to tackle prioritisation and rate limiting as
> two separate items. Nathan is proposing to improve the prioritisation
> in this thread and it seems to me that your concerns are with rate
> limiting. I've suggested an idea that might help with reducing the
> cost_delay based on the score of the table in this thread. I'd rather
> not introduce that as a topic for further discussion here (I imagine
> Nathan agrees). It's not as if the server is going to consume 1
> billion xids in 5 mins. It's at least going to take a day to days or
> longer for that to happen and if autovacuum has not managed to get on
> top of the workload in that time, then it's configured to run too
> slowly and the cost_limit or delay needs to be adjusted.
>
> My concern is that there are countless problems with autovacuum and if
> you try and lump them all into a single thread to fix them all at
> once, we'll get nowhere. Autovacuum was added to core in 8.1, 20 years
> ago and I don't believe we've done anything to change the ratelimiting
> aside from reducing the default cost_delay since then. It'd be good to
> fix that at some point, just not here, please.
>
> FWIW, I agree with Nathan about keeping the score calculation
> non-magical. The score should be simple and easy to document. We can
> introduce complexity to it as and when it's needed and when the
> supporting evidence arrives, rather than from people waving their
> hands.
>
> David
>


--0000000000000d5f1406425aa325--