public inbox for [email protected]  
help / color / mirror / Atom feed
From: Alena Rybakina <[email protected]>
To: Alexander Korotkov <[email protected]>
Cc: Amit Kapila <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Jim Nasby <[email protected]>
Cc: Bertrand Drouvot <[email protected]>
Cc: Ilia Evdokimov <[email protected]>
Cc: Kirill Reshke <[email protected]>
Cc: Andrei Zubkov <[email protected]>
Cc: Masahiko Sawada <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: jian he <[email protected]>
Cc: [email protected]
Cc: Sami Imseih <[email protected]>
Cc: vignesh C <[email protected]>
Subject: Re: Vacuum statistics
Date: Thu, 4 Sep 2025 19:18:21 +0300
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAPpHfduoJEuoixPTTg2tjhnXqrdobuMaQGxriqxJ9TjN1uxOuA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAMFBP2oXkhX_k9FTqtW-LdTBepVq0PDuBEGO8-LpNGbyHTBrNw@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAA4eK1JOUn+EqWSfRgKfgBZOXT7Q2dw2enmSZZgOhoMOFwopPA@mail.gmail.com>
	<[email protected]>
	<CAPpHfdtQd29O15Cmp1qeqTCerQF0Y+BGh63qtX3RkA7k=0TZ1Q@mail.gmail.com>
	<[email protected]>
	<[email protected]>

To be honest, I haven’t provided extensions for the PostgreSQL [0] to 
hackers yet, nor have I encountered this situation in general. Just in 
case, I created an open repository on GitHub with the code and added a 
description in the README.

[0] https://github.com/Alena0704/vacuum_statistics#

On 04.09.2025 18:49, Alena Rybakina wrote:
> Hi, all!
>
> On 02.06.2025 19:50, Alena Rybakina wrote:
>>
>> On 02.06.2025 19:25, Alexander Korotkov wrote:
>>> On Tue, May 13, 2025 at 12:49 PM Alena Rybakina
>>> <[email protected]> wrote:
>>>> On 12.05.2025 08:30, Amit Kapila wrote:
>>>>> On Fri, May 9, 2025 at 5:34 PM Alena Rybakina 
>>>>> <[email protected]> wrote:
>>>>>> I did a rebase and finished the part with storing statistics 
>>>>>> separately from the relation statistics - now it is possible to 
>>>>>> disable the collection of statistics for relationsh using gucs and
>>>>>> this allows us to solve the problem with the memory consumed.
>>>>>>
>>>>> I think this patch is trying to collect data similar to what we do 
>>>>> for
>>>>> pg_stat_statements for SQL statements. So, can't we follow a similar
>>>>> idea such that these additional statistics will be collected once 
>>>>> some
>>>>> external module like pg_stat_statements is enabled? That module 
>>>>> should
>>>>> be responsible for accumulating and resetting the data, so we won't
>>>>> have this memory consumption issue.
>>>> The idea is good, it will require one hook for the 
>>>> pgstat_report_vacuum
>>>> function, the extvac_stats_start and extvac_stats_end functions can be
>>>> run if the extension is loaded, so as not to add more hooks.
>>> +1
>>> Nice idea of a hook.  Given the volume of the patch, it might be a
>>> good idea to keep this as an extension.
>> Okay, I'll realize it and apply the patch)
>>>
>>>> But I see a problem here with tracking deleted objects for which
>>>> statistics are no longer needed. There are two solutions to this and I
>>>> don't like both of them, to be honest.
>>>> The first way is to add a background process that will go through the
>>>> table with saved statistics and check whether the relation or the
>>>> database are relevant now or not and if not, then
>>>> delete the vacuum statistics information for it. This may be
>>>> resource-intensive. The second way is to add hooks for deleting the
>>>> database and relationships (functions dropdb, index_drop,
>>>> heap_drop_with_catalog).
>>> Can we workaround this with object_access_hook?
>>
>> I think this could fix the problem. For the OAT-DROP access type, we 
>> can call a function to reset the vacuum statistics for relations that 
>> are about to be dropped.
>>
>> At the moment, I don’t see any limitations to using this approach.
>>
> I’ve prepared the first working version of the extension.
>
> I haven’t yet implemented writing the statistics to a file and 
> reloading them into a hash table and shared memory at instance 
> startup, and I also haven’t implemented a proper output for 
> database-level statistics yet.
>
> I structured the extension as follows: statistics are stored in a hash 
> table keyed by a composite key - database OID, relation OID, and 
> object type (index, table, or database). When VACUUM or a worker 
> processes a table or index, an exclusive lock is taken to update the 
> corresponding record; a shared lock is taken when reading the 
> statistics. For database-level output, I plan to compute the totals by 
> summing table and index statistics on demand.
>
> To optimize that, I plan to keep entries in the hash table ordered by 
> database OID. When accessing the first element by the partial key 
> (database OID), I’ll scan forward and aggregate until the partitial 
> database key changes.
>
> Right now this requires adding the extension to 
> `shared_preload_libraries`. I haven’t found a way to avoid that 
> because of shared-memory setup, and I’m not sure it’s even possible.
>
> I’m also unsure whether it’s better to store the statistics in the 
> cumulative statistics system (as done here) or entirely inside the 
> extension. Note that the code added to the core to support the 
> extension executes regardless of whether the extension is enabled.





view thread (53+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Vacuum statistics
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox