Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0fAA-0026MT-3D for pgsql-bugs@arkaria.postgresql.org; Thu, 12 Mar 2026 12:29:15 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w0fA9-00EmHY-1I for pgsql-bugs@arkaria.postgresql.org; Thu, 12 Mar 2026 12:29:13 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0cHy-00E0B6-1E for pgsql-bugs@lists.postgresql.org; Thu, 12 Mar 2026 09:25:06 +0000 Received: from mahout.postgresql.org ([2001:4800:3e1:1::227]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1w0cHw-00000001kPB-2SnR for pgsql-bugs@lists.postgresql.org; Thu, 12 Mar 2026 09:25:05 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=postgresql.org; s=20171124; h=Message-ID:Date:Reply-To:Cc:From:To:Subject: Content-Transfer-Encoding:MIME-Version:Content-Type:Sender:Content-ID: Content-Description:In-Reply-To:References; bh=a9Yga3f8XljCodCe/VrB/g2ASOfPUgv07APFNBO45OI=; b=67/74l0cJWgSsecC+9xXnhiWUn XEhlAkekFO1iyvSp+DQzY1Zvw4sfypR3hGzcTGt0ihm1iPv3Zu8Mw7hYO5ghOQB5Q0e5VahEsWuCO NkPdsW0e0lVilyly5QU4h6OA80g4kAR+cI1EXWcCiip2L+ckDIaB1X1HTqxN5ld1mpL6I9bY1k0K0 +eUhPtBkRH0sm1EkzgKD6OW41Nt+O0x3YzhMOS+/sA+GdHqcED+TBLQTuQd5U4sW56E3+ehHTK2Xe DrVPNiCMRFPTRBDZ9+x+2KmGSWl74SK4HmbwLYYmU9Kfi4O088Ul/kSaz9Wy7S9EcGdE0gDyLpFey 7LenckWA==; Received: from wrigleys.postgresql.org ([2a02:16a8:dc51::60]) by mahout.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0cHw-004KTl-0q for pgsql-bugs@lists.postgresql.org; Thu, 12 Mar 2026 09:25:04 +0000 Received: from localhost ([127.0.0.1] helo=wrigleys.postgresql.org) by wrigleys.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w0cHt-008pTc-2V for pgsql-bugs@lists.postgresql.org; Thu, 12 Mar 2026 09:25:02 +0000 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Subject: BUG #19430: Autovacuums stopped working possible due to problem with vacuuming shared catalog pg_authid To: pgsql-bugs@lists.postgresql.org From: PG Bug reporting form Cc: alexius.work@gmail.com Reply-To: alexius.work@gmail.com, pgsql-bugs@lists.postgresql.org Date: Thu, 12 Mar 2026 09:24:21 +0000 Message-ID: <19430-db8d55f587ae3546@postgresql.org> X-Auto-Response-Suppress: All Auto-Submitted: auto-generated List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk The following bug has been logged on the website: Bug reference: 19430 Logged by: Alexey Ermakov Email address: alexius.work@gmail.com PostgreSQL version: 14.15 Operating system: Ubuntu 22.04 Description: =20 On a PostgreSQL 14.15 server with about 100 databases running on Ubuntu 22.04 I observed following problem: 1) According to the DB logs (log_autovacuum_min_duration =3D 1s) and monitoring graphs autovacuums almost completely stopped working across all DBs at once 2) This is also confirmed by checking pg_stat_user_tables.last_autovacuum in all databases, there were no autovacuums anywhere for several days 3) Counters in pg_stat_user_tables updated normally, no anomalies there 4) pg_stat_all_tables.last_autovacuum shows one anomaly with pg_authid catalog: only that catalog was autovacuumed recently (there are not many dead rows there, looks normal). There might be other anomalies in other databases as I only checked one but I doubt it. 5) I found nothing interesting in the logs around the time when problem started (I checked for ERROR/WARNING/FATAL messages and possible DDL queries but sadly log_statement =3D none there, log_min_duration_statement =3D 1s) 6) Autovacuum launcher process appears to be running normally, periodically wakes up and consume 1-3% CPU of single core, pg_stat_activity shows the wait event as AutoVacuumMain 7) Just in case, I checked dmesg entries and the time offset graph in monitoring (to verify that time wasn't moving backwards for some reason) - no anomalies there 8) Running vacuum on one of the bloated tables (which cause performance problems) worked without issues 9) There is no long running transactions or lagging standbys with hot_standby_feedback on 10) pg_authid catalog has `age(relfrozenxid)` around 290M but autovacuum_freeze_max_age is 200M and anti-wraparound autovacuum should've been triggered 11) After running `vacuum verbose pg_authid;` nothing changed (it wasn't an aggressive vacuum) but running `vacuum (freeze, verbose) pg_authid;` fixed problem and autovacuums started working properly. Output of these commands are at the end of the report. I assume that problem might have started when age(relfrozenxid) for pg_authid approached autovacuum_freeze_max_age. A rough estimate based on the current rate of age(relfrozenxid) confirms that we might indeed did 90M transactions during the autovacuum malfunction. An interesting observation: the query `ALTER ROLE xxx WITH ;` from some permission-granting business logic triggered several autovacuums on ~half of databases at same second for some time but didn't fix the problem completely. Perhaps some cache invalidation did that? Unfortunately, 14.15 is more than 1 year old. An update to 14.22 was discussed recently but haven't been made yet. There were several fixes related to vacuum/autovacuum but they didn't seem directly related to the issue but I might be wrong. Output of vacuum commands: vacuum verbose pg_authid; INFO: vacuuming "pg_catalog.pg_authid" INFO: table "pg_authid": found 0 removable, 7 nonremovable row versions in 1 out of 57 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 3828262030 Skipped 0 pages due to buffer pins, 55 frozen pages. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: vacuuming "pg_toast.pg_toast_1260" INFO: table "pg_toast_1260": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 3828262030 Skipped 0 pages due to buffer pins, 0 frozen pages. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. VACUUM vacuum (freeze, verbose) pg_authid; INFO: aggressively vacuuming "pg_catalog.pg_authid" INFO: table "pg_authid": found 0 removable, 196 nonremovable row versions in 4 out of 57 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 3828278683 Skipped 0 pages due to buffer pins, 53 frozen pages. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: aggressively vacuuming "pg_toast.pg_toast_1260" INFO: table "pg_toast_1260": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 3828278683 Skipped 0 pages due to buffer pins, 0 frozen pages. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. VACUUM Thanks, Alexey Ermakov