Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vEMSb-00Amk9-58 for pgsql-hackers@arkaria.postgresql.org; Thu, 30 Oct 2025 06:48:36 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vEMSa-006Ke9-2Y for pgsql-hackers@arkaria.postgresql.org; Thu, 30 Oct 2025 06:48:35 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vEMSZ-006Ke1-Nz for pgsql-hackers@lists.postgresql.org; Thu, 30 Oct 2025 06:48:34 +0000 Received: from mail-ed1-x532.google.com ([2a00:1450:4864:20::532]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vEMSW-0051TH-1q for pgsql-hackers@postgresql.org; Thu, 30 Oct 2025 06:48:34 +0000 Received: by mail-ed1-x532.google.com with SMTP id 4fb4d7f45d1cf-63c45c11be7so1000590a12.3 for ; Wed, 29 Oct 2025 23:48:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1761806910; x=1762411710; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=94tMZ6V5COWISJh6hKXT7XCKOYKmQvFyxMkWWCYyLRw=; b=ndhfmUaGmL/Vxr/ohRJdDG6m8yEOAUj1lXeHc++tj1d690nPN0e+AMlGp+RMNud8LX 1ZivjGkxZNK3JtZcVPIPfviQhnX5WUteJVVh3gHRsX8P2X2PuTdZf3phutNU+vmGGPT1 5JbyLUhp8ORzOlXLgopFJfyuzcBDeTririieKw8C6ftqeBMSbRO4RBq/AhD94g/xhizW lg1bUoYYNvWUtRDADOtW0oPzpM2iCpK7pUtDVy8wcV33ckbvi+sOhMKzq+t4tqBPZRJY 1Sgvk9IruZPvGN0Ltuh6gKH/U1UTwuzWa7IHR+hz/xTjOb6KNPm/kEO9QrVulHKuqkq6 IHfw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761806910; x=1762411710; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=94tMZ6V5COWISJh6hKXT7XCKOYKmQvFyxMkWWCYyLRw=; b=T3EnGjdHZIJCpOk0TtyBEyl9TTwh+j0x0GgwuQmr5/Tuh+lVrOEFEuA1W7lJak/3i3 W1okLX85f+qkbU64Avdp4ByEMmV7pYhGBJASrodDy+9BnR6UYnmLKZxG2wVdqHFk0mPg pMPRp9iy3V7Y8eFAmMI1qYtJQTZB41Mco7KVpI4XYqTyB+qKCpoUVH5fcxyIuYxTMQ9M SFTJPEO5ETau37XWeY4hS0Sa48hTHaET3WFrk/5hXJWdPLEu1jDnQ/4RlFh5CE9D0Ig+ sKi9kTb8FUoF0gqvtzCbK//dJmuf51LtcUbXN9HiXtdHo99FxcH8cffLImBejnJDIzEp NLYg== X-Forwarded-Encrypted: i=1; AJvYcCVi8yCRVkYps93UFD1JkNS2Mdsb8p5XcTLBdj26yMvGB71hgMOw9pn9V2OqQcjg+0gT/lR7nvid6U4AjB6A@postgresql.org X-Gm-Message-State: AOJu0YyQIVe+iU9MG/822AwtqiYaaFxzlW/TKWZtWod++sQtlo2EcNLV BwRKjpEyU0ZwxnQsySXFIEXoBRDJuPCgMS0n+4iPDvWhobwL3MIvlHxtT8fB+EQWRhXz+kgLid1 g/aUou9pQV+rJKVqgduZOuEydbojW+YU= X-Gm-Gg: ASbGncv+cxJnK5z1vGbUZjJK544bfVsIy7PxyjSkXFYlafxrEIwOc8FuwSWO00Z5rsu ZlFAEUz7szWwPm+J8BYP5yMXIAYJzeyxN9YThe+06Ktw+mi2Xea22WRzdk8T6yUDU9nSnauJDgw fu6kD65qQUqvELX13Px1siGZW7CLChZ/Lu9HCC7ckpeJ2RATyY0CImhq6rBAfd53oTKm7rY0PbK F0ilZ9HX5QAFbkQAnpHPQujPWEwP+tvBo17HjEFWtVfqiM/YzQ3d0bCQ6UvqcQx8NGBxA== X-Google-Smtp-Source: AGHT+IHfDIqG0lC/STuou4irdgOEfQO1AD3THAX72XXoNdclkM59FXwsQNKFDuK9vgNOl/EAkj2D8OckOpoExIkgV7s= X-Received: by 2002:a05:6402:35cd:b0:639:4c9:9c9e with SMTP id 4fb4d7f45d1cf-640441a9304mr4575766a12.10.1761806910315; Wed, 29 Oct 2025 23:48:30 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: wenhui qiu Date: Thu, 30 Oct 2025 14:48:18 +0800 X-Gm-Features: AWmQ_bmrB6Wc0o0BFVS3vkF0NYXPWA5oQhCiAwFr7uklbUXxMgn_3xpKXef5Klg Message-ID: Subject: Re: another autovacuum scheduling thread To: David Rowley Cc: Nathan Bossart , Sami Imseih , Robert Haas , Jeremy Schneider , pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="0000000000000d5f1406425aa325" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000000d5f1406425aa325 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable HI I think there might be some misunderstanding =E2=80=94 I=E2=80=99m onl= y suggesting changing effective_xid_failsafe_age =3D Max(vacuum_failsafe_age, autovacuum_freeze_max_age * 1.05); to effective_xid_failsafe_age =3D (vacuum_failsafe_age + autovacuum_freeze_max_age) / 2.0; In the current logic, effective_xid_failsafe_age is almost always equal to vacuum_failsafe_age. As a result, increasing the vacuum priority only when a table=E2=80=99s age= reaches vacuum_failsafe_age is too late. Thanks On Thu, Oct 30, 2025 at 11:42=E2=80=AFAM David Rowley wrote: > On Thu, 30 Oct 2025 at 15:58, wenhui qiu wrote: > > In fact, with the introduction of the > vacuum_max_eager_freeze_failure_rate feature, if a table=E2=80=99s age st= ill > exceeds more than 1.x times the autovacuum_freeze_max_age, it suggests th= at > the vacuum freeze process is not functioning properly. Once the age > surpasses vacuum_failsafe_age, wraparound issues are likely to occur > soon.Taking the average of vacuum_failsafe_age and > autovacuum_freeze_max_age is not a complex approach. Under the default > configuration, this average already exceeds four times the > autovacuum_freeze_max_age. At that stage, a DBA should have already > intervened to investigate and resolve why the table age is not decreasing= . > > I don't think anyone would like to modify PostgreSQL in any way that > increases the chances that a table gets as old as vacuum_failsafe_age. > Regardless of the order in which tables are vacuumed, if a table gets > as old as that then vacuum is configured to run too slowly, or there > are not enough workers configured to cope with the given amount of > work. I think we need to tackle prioritisation and rate limiting as > two separate items. Nathan is proposing to improve the prioritisation > in this thread and it seems to me that your concerns are with rate > limiting. I've suggested an idea that might help with reducing the > cost_delay based on the score of the table in this thread. I'd rather > not introduce that as a topic for further discussion here (I imagine > Nathan agrees). It's not as if the server is going to consume 1 > billion xids in 5 mins. It's at least going to take a day to days or > longer for that to happen and if autovacuum has not managed to get on > top of the workload in that time, then it's configured to run too > slowly and the cost_limit or delay needs to be adjusted. > > My concern is that there are countless problems with autovacuum and if > you try and lump them all into a single thread to fix them all at > once, we'll get nowhere. Autovacuum was added to core in 8.1, 20 years > ago and I don't believe we've done anything to change the ratelimiting > aside from reducing the default cost_delay since then. It'd be good to > fix that at some point, just not here, please. > > FWIW, I agree with Nathan about keeping the score calculation > non-magical. The score should be simple and easy to document. We can > introduce complexity to it as and when it's needed and when the > supporting evidence arrives, rather than from people waving their > hands. > > David > --0000000000000d5f1406425aa325 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
HI=C2=A0
=C2=A0 =C2=A0 =C2=A0I think there might be so= me misunderstanding =E2=80=94 I=E2=80=99m only suggesting changing
effec= tive_xid_failsafe_age =3D Max(vacuum_failsafe_age,
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0autovacuum_freeze_max_age * 1.05);
to
effecti= ve_xid_failsafe_age =3D (vacuum_failsafe_age + autovacuum_freeze_max_age) /= 2.0;
In the current logic, effective_xid_failsafe_age is almost always = equal to vacuum_failsafe_age.
As a result, increasing the vacuum priorit= y only when a table=E2=80=99s age reaches vacuum_failsafe_age is too late.<= br>


Thanks

On Thu, Oct 30, 2025 at 11:42=E2=80=AFAM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 30 Oct 2025 at 15:58, wenhui qiu <qiuwenhuifx@gmail.com> wro= te:
> In fact, with the introduction of the vacuum_max_eager_freeze_failure_= rate feature, if a table=E2=80=99s age still exceeds more than 1.x times th= e autovacuum_freeze_max_age, it suggests that the vacuum freeze process is = not functioning properly. Once the age surpasses vacuum_failsafe_age, wrapa= round issues are likely to occur soon.Taking the average of vacuum_failsafe= _age and autovacuum_freeze_max_age is not a complex approach. Under the def= ault configuration, this average already exceeds four times the autovacuum_= freeze_max_age. At that stage, a DBA should have already intervened to inve= stigate and resolve why the table age is not decreasing.

I don't think anyone would like to modify PostgreSQL in any way that increases the chances that a table gets as old as vacuum_failsafe_age.
Regardless of the order in which tables are vacuumed, if a table gets
as old as that then vacuum is configured to run too slowly, or there
are not enough workers configured to cope with the given amount of
work. I think we need to tackle prioritisation and rate limiting as
two separate items. Nathan is proposing to improve the prioritisation
in this thread and it seems to me that your concerns are with rate
limiting. I've suggested an idea that might help with reducing the
cost_delay based on the score of the table in this thread. I'd rather not introduce that as a topic for further discussion here (I imagine
Nathan agrees). It's not as if the server is going to consume 1
billion xids in 5 mins. It's at least going to take a day to days or longer for that to happen and if autovacuum has not managed to get on
top of the workload in that time, then it's configured to run too
slowly and the cost_limit or delay needs to be adjusted.

My concern is that there are countless problems with autovacuum and if
you try and lump them all into a single thread to fix them all at
once, we'll get nowhere. Autovacuum was added to core in 8.1, 20 years<= br> ago and I don't believe we've done anything to change the ratelimit= ing
aside from reducing the default cost_delay since then. It'd be good to<= br> fix that at some point, just not here, please.

FWIW, I agree with Nathan about keeping the score calculation
non-magical. The score should be simple and easy to document. We can
introduce complexity to it as and when it's needed and when the
supporting evidence arrives, rather than from people waving their
hands.

David
--0000000000000d5f1406425aa325--