Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vDVoL-00Acz3-53 for pgsql-hackers@arkaria.postgresql.org; Mon, 27 Oct 2025 22:35:32 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vDVoI-009ayT-Ju for pgsql-hackers@arkaria.postgresql.org; Mon, 27 Oct 2025 22:35:29 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vDVoI-009ayL-A1 for pgsql-hackers@lists.postgresql.org; Mon, 27 Oct 2025 22:35:29 +0000 Received: from mail-ed1-x52f.google.com ([2a00:1450:4864:20::52f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vDVoF-0046LT-1t for pgsql-hackers@postgresql.org; Mon, 27 Oct 2025 22:35:28 +0000 Received: by mail-ed1-x52f.google.com with SMTP id 4fb4d7f45d1cf-63c12ff0c5eso10367975a12.0 for ; Mon, 27 Oct 2025 15:35:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1761604525; x=1762209325; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=1asep/YUEvXE0x7HBIVGtu9EKTe7qfOcWkNs6bbpd9s=; b=MFw4Qxx+iNMl70UhDuVcwXn+WQU2aj8AI0QCLiz/sQeSy/jRF6dZeBlwncD6BqszPb RnA7wxq1wtLwLm0C/NqJpMzb0glnrVlYfnYOd7HyoHb8MnQIs8ifV2skul2x14IZXZOD YzJlxdL86t1hVERLcAgAChEZrOvZIoI2e6j/GNze2AxKVx/C9/rtEdNDJaTm+bkX82c5 ZRTEhRQXXv/ZiZIAebMh4+NWqHIGKuCyvrAwCnFuthaWZ9rWEyjFyUyg/fbg/8bREi5E QmUAse/q2u+sT1djJsh4/pwsEsGLdjLiy2fArCeq9r5SngR7JR0kz2HSiL+GDrYK86+T zy/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1761604525; x=1762209325; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=1asep/YUEvXE0x7HBIVGtu9EKTe7qfOcWkNs6bbpd9s=; b=fHesEfX4uskxGvPW+4vwSJHYddHjNrb9QleS3RnxtEz7y0zgGQTs4CASdA0p88rn2d GO7YUeZLohX/TkpEPoXmcDBwTAFm4qunvFAS4k4lbFQyjyE+TWYfj36LqrfopliAgs3F MkUp5Xzne6xopMhuOL4i9PeEf5ZwgfrUBx/iE9aLvDNe5R9dk2ECK9AgdBKmpFEcgPN2 8qFqUY2boOopm0shuWgi7a/gwxynURuQMZEK2go4eFJv/qNhgi2z/3vNaodi1Og1Bv+D Qe9Ly0RnhfQUZzGjr17HUyD2rJ5t1I3WMTkTk9p1gNWnRvXCBePJy77MHwFJ14i22DRn iCxw== X-Forwarded-Encrypted: i=1; AJvYcCUWS+knuRZcSyoitJCGfR0Ex1JpsP23kjW2o4uYWBuQyG5K8Vhlp1DtVl4JmVYXdqz8iikMl/nWr+9OsmVY@postgresql.org X-Gm-Message-State: AOJu0YxHjBnh7CicyjJHm5gZ/zgwWmk6nlwz4izmzNCxlEuofjA0Y2TB K4CCuy2SydEsid5Tj5gG3PUVZNCuAC1sUzHjKblGJtW1O3xYU2lne/xNLErydASUJzRtwGk9tVU xAHV4S/nERGP+Z/73MHTzahuxqVkEhNc= X-Gm-Gg: ASbGncsEZo2QhTXBYt6ooMN1UgHdC81GsUyDOsXivQ9Mp7Q6jPjGvOwBtdjp5MuRx99 SocIZTYK+31JZmawLDhOP4KW4ticLM4F3ZLxoe5XIwlNWC9XjbBx3p9VDmKIBIRarnu7Vb9/Fgk /LcKAuT1U4Ve8medfcoEA4BEJ5ZdonIZksviqwJRd8HphtQT9dYiQ1y8XnF1whvIA8eMNIR/2Oy lrKO50vXSpibsx6AikpJkUXCzhvVs1rQOq835J4WMM8FfUrz23GwzYq6oFm X-Google-Smtp-Source: AGHT+IEBJHylqZ9F01iRysAj8nwXAEFpj7+LTglyIE0YC235VS2BtscrJl44ZL8b7zd1DaVP6sBLOqLXErzyUowuz4c= X-Received: by 2002:a05:6402:1144:b0:63c:274a:4f16 with SMTP id 4fb4d7f45d1cf-63f4bce4cdcmr716340a12.16.1761604524643; Mon, 27 Oct 2025 15:35:24 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Sami Imseih Date: Mon, 27 Oct 2025 17:35:12 -0500 X-Gm-Features: AWmQ_bmL-rGrFfFk3BI8u_8pDxnMnAUDON0DsLwsgOCCJ1DMBDdcUcFIT0VhhZs Message-ID: Subject: Re: another autovacuum scheduling thread To: Nathan Bossart Cc: David Rowley , Robert Haas , Jeremy Schneider , pgsql-hackers@postgresql.org Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > > I wrote a sql that returns the tables and scores, which I found was > > useful when I was testing this out, so having the actually rules spelled out > > in docs will actually be super useful. > > Can you elaborate on how it would be useful? I'd be open to adding a short > note that autovacuum attempts to prioritize the tables in a smart way, but > I'm not sure I see the value of documenting every detail. We discuss the threshold calculations in the documentation, and users can write scripts to monitor which tables are eligible. However, there is nothing that indicates which table autovacuum will work on next (I have been asked that question by users a few times, sometimes out of curiosity, or because they are monitoring vacuum activity and wondering when their important table will get a vacuum cycle, or if they should kick off a manual vacuum). With the scoring system, it will be much more difficult to explain, unless someone walks through the code. > I also don't > want to add too much friction to future changes to the prioritization > logic. Maybe future changes is a good reason to document the way autovacuum prioritizes, since this is a user-facing change. > > If we don't want to go that much in depth, at minimum the docs should say: > > > > "Autovacuum prioritizes tables based on how far they exceed their thresholds > > or if they are approaching wraparound limits." so a DBA can understand > > this behavior. > > Yeah, I would probably choose to keep it relatively vague like this. With all the above said, starting with something small is definitely better than nothing. > > * The score is calculated as the maximum of the ratios of each of the table's > > * relevant values to its threshold. For example, if the number of inserted > > * tuples is 100, and the insert threshold for the table is 80, the insert > > * score is 1.25. > > > > Should we consider clamping down on the score when > > reltuples = -1, otherwise the scores for such tables ( new tables > > with a large amount of ingested data ) will be over-inflated? Perhaps, > > if reltuples = -1 ( # of reltuples not known ), then give a score of .5, > > so we are not over-prioritizing but not pushing down to the bottom? > > I'm not sure it's worth expending too much energy to deal with this. In > the worst case, the table will be given an arbitrarily high priority the > first time it is vacuumed, but AFAICT that's it. But that's already the > case, as the thresholds will be artificially low before the first > VACUUM/ANALYZE. I can think of scenarios where they may be workloads that create/drops staging tables and load some data ( like batch processing ) where this may become an issue because we are now forcing such tables to the top of the list, potentially impacting other tables from getting vacuum cycles. It could happen now, but the difference with this change is we are forcing these tables to the top of the priority; based on an unknown value (pg_class.reltuples = -1). -- Sami Imseih Amazon Web Services (AWS)