Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0m0x-002CF1-2O for pgsql-bugs@arkaria.postgresql.org; Thu, 12 Mar 2026 19:48:11 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w0m0w-00HD7E-0D for pgsql-bugs@arkaria.postgresql.org; Thu, 12 Mar 2026 19:48:10 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0m0v-00HD76-29 for pgsql-bugs@lists.postgresql.org; Thu, 12 Mar 2026 19:48:10 +0000 Received: from mail-ed1-x52f.google.com ([2a00:1450:4864:20::52f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w0m0u-00000001ovo-0BlQ for pgsql-bugs@lists.postgresql.org; Thu, 12 Mar 2026 19:48:09 +0000 Received: by mail-ed1-x52f.google.com with SMTP id 4fb4d7f45d1cf-660dcafc85aso2636967a12.0 for ; Thu, 12 Mar 2026 12:48:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cybertec.at; s=google; t=1773344886; x=1773949686; darn=lists.postgresql.org; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:to:from:subject:message-id:from:to:cc:subject:date :message-id:reply-to; bh=Fd/kTFaSLgSLts+1OQp989B4rzf/ZtVuM9C+Bd5935Q=; b=ZWGRKgowx2S8ZE6510upDwZoG8MhuRp0bx0ACdtSLkV+FASbrtmYQrP0XzG0fKIqev nUimKTKbVpCx0Lh8qpyjk0eT1/+Njlw91rfbVnfaAmCdGng2RvOhrBl6PcIkZgiuWRJU F4sHEPN35nNQs2N+uDOreVmmgozjU0Bu00MzveHnY4NSRXdiw62to3vNhb4u1PX/dWmH YkBs1IRzjbHRkeTSKtUiQiKmHG4h1Urt0hD1CKYjKWJmL3JyHU9c/nzd0hybmKyM4z1W neWF5ZpDP4nxk6F+2se3Qhe1P0SAu7AQYlP/SIvPpp9vA7e8x6f+uxaLRVfaCX1m51EE YsPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773344886; x=1773949686; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:to:from:subject:message-id:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Fd/kTFaSLgSLts+1OQp989B4rzf/ZtVuM9C+Bd5935Q=; b=qmk1pEJnp3Eh9nKJHsos6w7Br9AkoyGSk3jrLATnxgNqb+T0xkRKrWN5UY4cU+wl1s Vpypv0CVKOvuzWunivMlHIzq3kjT9/zDkc/EmYQxIigeNpGxrk4pwSUjqBIpO2tGmyUt Pt2/00qmIj0xXCJLcxUCVGvo+mXrx8aOMGria0T3adkr+DtXcp6/snZ3wiQLRafY2FwB jKhfi0XR2Rua+YvUp5wauUgWwjHimPKDSHNFRvrQA8/N2pRDK99aZmWuds4VfSVHu3Ig zfMJQMvevL54MqF0eTpXzAIVDaP/VTVRSjX/Iyph07yMtuLvhYjjdZmreLS5TaQX9W9v MKMg== X-Forwarded-Encrypted: i=1; AJvYcCXcpUS5UDUxShTUyYROsKjOTEbTVXX9hciKV5tLCOD/PusNHoTfe5ay/3/k3gtuXirpcBKeZQeZxN4J@lists.postgresql.org X-Gm-Message-State: AOJu0Yzhxvo9r1Xrq+eyQtV5s0zfA9OH5zQIAs8JsSsXLRip+Vg1re+X 3E60oGxiVQhf8L3j0OvJob/4Szftcr3LEfRj2IxWNYUwBfm7QiDkwLBTK42sdR4o/C0= X-Gm-Gg: ATEYQzxODkzqb+D/243ZVOICMgJDXvEW0L1+eeD6bsOdOeWiU41UGtTg7u7B7ssEeB+ 8GJtDTuIMQWw9ZfOJQ3ND+0yS74TSwfHs0VMLdjtFvhtuIpIEuDXRT5Y1g1yXD+PLejoV3q0IJ7 sXiTsbKSqlRU6d4LdFF4jUet97tAgUR5/QHdxbLdnA6M39vVeJ/pgrfIXIREyEgdfMwVUzIdAE5 J5Ppgad3HwR4viHtL4jcUeelPTuGjfzXyxqGiybgZx+NJT3yqluKoA4alBBJJwWO03F2yNHTYm3 PI5APTpk3DZ6Dl1zf1S2Nh8GtCbOVp2IK9f2VkX7dAW3DMjRU4WX0JY8HC0nVxKo+pAviJgyKmr X7Ehrh/BtKVjFZ1IoVNtz5pyT1OWn8zQsEVjGSbyvVhZWdSTDP09n9VBDUpmsVGKsZnzynfDF+G eiNXsCykuXW3K39dbOmGSX2K8ocuXe3KnG83ybg0FMzaW5S8GtqmwfdAUuPoZAiV8/b6KJLvzyN ZZ+5g9XApQyuX7YH8BnRXwGBAqXzTsrlSpg X-Received: by 2002:a17:907:1b1a:b0:b8f:a85c:95c5 with SMTP id a640c23a62f3a-b9765345681mr33563666b.37.1773344886258; Thu, 12 Mar 2026 12:48:06 -0700 (PDT) Received: from laurenz.albe-K4N0CV00F97414D (dynamic-046-114-168-089.46.114.pool.telefonica.de. [46.114.168.89]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-b9764b9d335sm15140666b.58.2026.03.12.12.48.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 12 Mar 2026 12:48:05 -0700 (PDT) Message-ID: <6cfc9d223ea826028d5f0c9f8ee4c4c30a825254.camel@cybertec.at> Subject: Re: BUG #19430: Autovacuums stopped working possible due to problem with vacuuming shared catalog pg_authid From: Laurenz Albe To: alexius.work@gmail.com, pgsql-bugs@lists.postgresql.org Date: Thu, 12 Mar 2026 20:47:53 +0100 In-Reply-To: <19430-db8d55f587ae3546@postgresql.org> References: <19430-db8d55f587ae3546@postgresql.org> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.58.3 (3.58.3-1.fc43) MIME-Version: 1.0 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, 2026-03-12 at 09:24 +0000, PG Bug reporting form wrote: > PostgreSQL version: 14.15 > Operating system: Ubuntu 22.04 >=20 > On a PostgreSQL 14.15 server with about 100 databases running on Ubuntu > 22.04 I observed following problem: >=20 > 1) According to the DB logs (log_autovacuum_min_duration =3D 1s) and > monitoring graphs autovacuums almost completely stopped working across al= l > DBs at once > 2) This is also confirmed by checking pg_stat_user_tables.last_autovacuum= in > all databases, there were no autovacuums anywhere for several days > 3) Counters in pg_stat_user_tables updated normally, no anomalies there > 4) pg_stat_all_tables.last_autovacuum shows one anomaly with pg_authid > catalog: only that catalog was autovacuumed recently (there are not many > dead rows there, looks normal). There might be other anomalies in other > databases as I only checked one but I doubt it. > 5) I found nothing interesting in the logs around the time when problem > started (I checked for ERROR/WARNING/FATAL messages and possible DDL quer= ies > but sadly log_statement =3D none there, log_min_duration_statement =3D 1s= ) > 6) Autovacuum launcher process appears to be running normally, periodical= ly > wakes up and consume 1-3% CPU of single core, pg_stat_activity shows the > wait event as AutoVacuumMain > 7) Just in case, I checked dmesg entries and the time offset graph in > monitoring (to verify that time wasn't moving backwards for some reason) = - > no anomalies there > 8) Running vacuum on one of the bloated tables (which cause performance > problems) worked without issues > 9) There is no long running transactions or lagging standbys with > hot_standby_feedback on > 10) pg_authid catalog has `age(relfrozenxid)` around 290M but > autovacuum_freeze_max_age is 200M and anti-wraparound autovacuum should'v= e > been triggered > 11) After running `vacuum verbose pg_authid;` nothing changed (it wasn't = an > aggressive vacuum) but running `vacuum (freeze, verbose) pg_authid;` fixe= d > problem and autovacuums started working properly. Output of these command= s > are at the end of the report. >=20 > I assume that problem might have started when age(relfrozenxid) for > pg_authid approached autovacuum_freeze_max_age. A rough estimate based on > the current rate of age(relfrozenxid) confirms that we might indeed did 9= 0M > transactions during the autovacuum malfunction. >=20 > An interesting observation: the query `ALTER ROLE xxx WITH ;` from some > permission-granting business logic triggered several autovacuums on ~half= of > databases at same second for some time but didn't fix the problem > completely. Perhaps some cache invalidation did that? >=20 > Unfortunately, 14.15 is more than 1 year old. An update to 14.22 was > discussed recently but haven't been made yet. There were several fixes > related to vacuum/autovacuum but they didn't seem directly related to the > issue but I might be wrong. >=20 > Output of vacuum commands: >=20 > vacuum verbose pg_authid; > INFO: vacuuming "pg_catalog.pg_authid" > INFO: table "pg_authid": found 0 removable, 7 nonremovable row versions = in > 1 out of 57 pages > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 38282620= 30 > Skipped 0 pages due to buffer pins, 55 frozen pages. > CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > INFO: vacuuming "pg_toast.pg_toast_1260" > INFO: table "pg_toast_1260": found 0 removable, 0 nonremovable row versi= ons > in 0 out of 0 pages > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 38282620= 30 > Skipped 0 pages due to buffer pins, 0 frozen pages. > CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > VACUUM >=20 > vacuum (freeze, verbose) pg_authid; > INFO: aggressively vacuuming "pg_catalog.pg_authid" > INFO: table "pg_authid": found 0 removable, 196 nonremovable row version= s > in 4 out of 57 pages > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 38282786= 83 > Skipped 0 pages due to buffer pins, 53 frozen pages. > CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > INFO: aggressively vacuuming "pg_toast.pg_toast_1260" > INFO: table "pg_toast_1260": found 0 removable, 0 nonremovable row versi= ons > in 0 out of 0 pages > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 38282786= 83 > Skipped 0 pages due to buffer pins, 0 frozen pages. > CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > VACUUM This change in 14.21 looks relevant: - Fix issues around in-place catalog updates (Noah Misch) Send a nontransactional invalidation message for an in-place update, sinc= e such an update will survive transaction rollback. Also ensure that the update is WAL-logged before other sessions can see it. These fixes primarily prevent scenarios in which relations' frozen-XID attributes become inconsistent, possibly allowing premature CLOG truncation and subsequent =E2=80=9Ccould not access status of transaction=E2=80=9D error= s. Yours, Laurenz Albe