Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uuCfY-004c8q-Ex for pgsql-hackers@arkaria.postgresql.org; Thu, 04 Sep 2025 16:18:41 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uuCfX-000J6e-Dy for pgsql-hackers@arkaria.postgresql.org; Thu, 04 Sep 2025 16:18:39 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uuCfW-000J6W-Uv for pgsql-hackers@lists.postgresql.org; Thu, 04 Sep 2025 16:18:39 +0000 Received: from mail.postgrespro.ru ([93.174.132.70]) by makus.postgresql.org with smtp (Exim 4.96) (envelope-from ) id 1uuCfT-000XA6-2Y for pgsql-hackers@postgresql.org; Thu, 04 Sep 2025 16:18:38 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=postgrespro.ru; s=mx2023; t=1757002711; bh=FUvcvFKCC3P+CO47KK8lGsvyk6byVoDGT+5o6KGHFb8=; h=Message-ID:Date:User-Agent:Subject:From:To:Cc:References: In-Reply-To:From; b=xZFN4ElHuXRDgqsNBZ8v4dy3udE5+JJIprSmBuSfaj5c7SpRv+VYcYn72TLyN/Apy FWitIrR4rltnV9jNfonxc2I+t9tXW6C5J+oS+jBBk4aUDBlVEislW1a8hS9cI9vN+V gVSc5hKEcN/UUi2tAgqn5bD+Wpd9zFgPlf39Ec9IYhndvtCSvLPWyVrl0mMMLOPChG EMEocR/Q/vYcfqUmzSBRRZ5U1ZOFhJOg83KssZwDkK7yul04huRqqu4YRhpFaau+Si WRMcp8tsm46FKON5Zh5c16AxTv2+IinN8719Lwdba0lpCouhjAxQBUmYAzUnQHoQxS by9BddYgogq5w== Received: from [10.0.85.1] (unknown [151.236.12.133]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (Client did not present a certificate) (Authenticated sender: a.rybakina@postgrespro.ru) by mail.postgrespro.ru (Postfix/465) with ESMTPSA id C229460841; Thu, 4 Sep 2025 19:18:26 +0300 (MSK) Message-ID: <79490c8c-2c35-464d-bb3e-3d2ea5aca9ab@postgrespro.ru> Date: Thu, 4 Sep 2025 19:18:21 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Vacuum statistics From: Alena Rybakina To: Alexander Korotkov Cc: Amit Kapila , pgsql-hackers , Jim Nasby , Bertrand Drouvot , Ilia Evdokimov , Kirill Reshke , Andrei Zubkov , Masahiko Sawada , Melanie Plageman , jian he , a.lepikhov@postgrespro.ru, Sami Imseih , vignesh C References: <86f76aa5-1ab5-4e2e-9b15-405051852a2a@postgrespro.ru> <333c2306-c401-4959-9f0c-a44c670a11a9@postgrespro.ru> <513f0188-b093-4cc8-98cf-4c324570d525@postgrespro.ru> <47a7b784-5218-43f2-96e3-65f9a729c5a5@tantorlabs.com> <5583261b-eede-4341-b3b1-91650fefc1cf@postgrespro.ru> <2a04ad18-5572-4633-848b-eb57209e7ac0@postgrespro.ru> <2f48399f-2959-4483-938f-64edb863ca76@postgrespro.ru> <9a94f2a0-4698-40e3-9010-2567d3b8ba55@postgrespro.ru> <026722b5-6e61-44d8-816d-482ad24423a9@postgrespro.ru> Content-Language: en-US In-Reply-To: <026722b5-6e61-44d8-816d-482ad24423a9@postgrespro.ru> Content-Type: text/plain; charset=UTF-8; format=flowed X-KSMG-AntiPhishing: NotDetected, bases: 2025/09/04 15:27:00 X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 2.1.0.7854, bases: 2025/09/04 15:26:00 #27763105 X-KSMG-AntiVirus-Status: NotDetected, skipped X-KSMG-LinksScanning: not scanned, disabled by settings X-KSMG-Message-Action: skipped X-KSMG-Rule-ID: 1 Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk To be honest, I haven=E2=80=99t provided extensions for the PostgreSQL [0= ] to=20 hackers yet, nor have I encountered this situation in general. Just in=20 case, I created an open repository on GitHub with the code and added a=20 description in the README. [0] https://github.com/Alena0704/vacuum_statistics# On 04.09.2025 18:49, Alena Rybakina wrote: > Hi, all! > > On 02.06.2025 19:50, Alena Rybakina wrote: >> >> On 02.06.2025 19:25, Alexander Korotkov wrote: >>> On Tue, May 13, 2025 at 12:49=E2=80=AFPM Alena Rybakina >>> wrote: >>>> On 12.05.2025 08:30, Amit Kapila wrote: >>>>> On Fri, May 9, 2025 at 5:34=E2=80=AFPM Alena Rybakina=20 >>>>> wrote: >>>>>> I did a rebase and finished the part with storing statistics=20 >>>>>> separately from the relation statistics - now it is possible to=20 >>>>>> disable the collection of statistics for relationsh using gucs and >>>>>> this allows us to solve the problem with the memory consumed. >>>>>> >>>>> I think this patch is trying to collect data similar to what we do=20 >>>>> for >>>>> pg_stat_statements for SQL statements. So, can't we follow a simila= r >>>>> idea such that these additional statistics will be collected once=20 >>>>> some >>>>> external module like pg_stat_statements is enabled? That module=20 >>>>> should >>>>> be responsible for accumulating and resetting the data, so we won't >>>>> have this memory consumption issue. >>>> The idea is good, it will require one hook for the=20 >>>> pgstat_report_vacuum >>>> function, the extvac_stats_start and extvac_stats_end functions can = be >>>> run if the extension is loaded, so as not to add more hooks. >>> +1 >>> Nice idea of a hook.=C2=A0 Given the volume of the patch, it might be= a >>> good idea to keep this as an extension. >> Okay, I'll realize it and apply the patch) >>> >>>> But I see a problem here with tracking deleted objects for which >>>> statistics are no longer needed. There are two solutions to this and= I >>>> don't like both of them, to be honest. >>>> The first way is to add a background process that will go through th= e >>>> table with saved statistics and check whether the relation or the >>>> database are relevant now or not and if not, then >>>> delete the vacuum statistics information for it. This may be >>>> resource-intensive. The second way is to add hooks for deleting the >>>> database and relationships (functions dropdb, index_drop, >>>> heap_drop_with_catalog). >>> Can we workaround this with object_access_hook? >> >> I think this could fix the problem. For the OAT-DROP access type, we=20 >> can call a function to reset the vacuum statistics for relations that=20 >> are about to be dropped. >> >> At the moment, I don=E2=80=99t see any limitations to using this appro= ach. >> > I=E2=80=99ve prepared the first working version of the extension. > > I haven=E2=80=99t yet implemented writing the statistics to a file and=20 > reloading them into a hash table and shared memory at instance=20 > startup, and I also haven=E2=80=99t implemented a proper output for=20 > database-level statistics yet. > > I structured the extension as follows: statistics are stored in a hash=20 > table keyed by a composite key - database OID, relation OID, and=20 > object type (index, table, or database). When VACUUM or a worker=20 > processes a table or index, an exclusive lock is taken to update the=20 > corresponding record; a shared lock is taken when reading the=20 > statistics. For database-level output, I plan to compute the totals by=20 > summing table and index statistics on demand. > > To optimize that, I plan to keep entries in the hash table ordered by=20 > database OID. When accessing the first element by the partial key=20 > (database OID), I=E2=80=99ll scan forward and aggregate until the parti= tial=20 > database key changes. > > Right now this requires adding the extension to=20 > `shared_preload_libraries`. I haven=E2=80=99t found a way to avoid that= =20 > because of shared-memory setup, and I=E2=80=99m not sure it=E2=80=99s e= ven possible. > > I=E2=80=99m also unsure whether it=E2=80=99s better to store the statis= tics in the=20 > cumulative statistics system (as done here) or entirely inside the=20 > extension. Note that the code added to the core to support the=20 > extension executes regardless of whether the extension is enabled.