Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tviKP-007M0d-El for pgsql-hackers@arkaria.postgresql.org; Fri, 21 Mar 2025 19:46:49 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tviKO-00EnQv-5U for pgsql-hackers@arkaria.postgresql.org; Fri, 21 Mar 2025 19:46:48 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tviKN-00EnQ1-QL for pgsql-hackers@lists.postgresql.org; Fri, 21 Mar 2025 19:46:47 +0000 Received: from mail.postgrespro.ru ([93.174.132.70]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1tviKL-000NOA-1S for pgsql-hackers@postgresql.org; Fri, 21 Mar 2025 19:46:47 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=postgrespro.ru; s=mx2023; t=1742586404; bh=PY8IeMdcCPlAHztExx/GKIgqfYXCvs1DYNyBc9JwYPI=; h=Message-ID:Date:User-Agent:Subject:From:To:Cc:References: In-Reply-To:From; b=ZjnKq+50+Al03xqGC9t/+hvNq/E+R1WYhvQbEKsDK7BcquDnEI5czo4+kC73tox9p MNEKogH7oi64Ar9OkUNK63RpWRP2rkNY5TIarwa9yBlna7JyXMM1EVohnpqSQDMGbG P8bS7iEHpIrPbf2vQtYRnOovqscc/hHbK7+Q/rVWF8AgYUl2RoOHou8wyaqXhzxEXK PA8ureKcQa2ZjtE6mNWUi+K7rSFNxx/wwvyk1irkVIvnpfn2zwcTYFmHbgYpMpttzU foacZ7K+lzt+Bdg4sSsggv5KZSXsCRDTjDUzS+KKKU0JO282MxcnSvpvNzLrzEV+hg +n3zKD3sAyJsA== Received: from [172.20.10.4] (unknown [89.113.149.127]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (Client did not present a certificate) (Authenticated sender: a.rybakina@postgrespro.ru) by mail.postgrespro.ru (Postfix/587) with ESMTPSA id A10DA60B98; Fri, 21 Mar 2025 22:46:43 +0300 (MSK) Content-Type: multipart/alternative; boundary="------------xNqggfCdQ2yZdgR8RvEF9OBK" Message-ID: Date: Fri, 21 Mar 2025 22:46:41 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Vacuum statistics From: Alena Rybakina To: Bertrand Drouvot , Jim Nasby , vignesh C Cc: Ilia Evdokimov , pgsql-hackers , Alexander Korotkov , Kirill Reshke , Andrei Zubkov , Masahiko Sawada , Melanie Plageman , jian he , a.lepikhov@postgrespro.ru, Sami Imseih References: <86f76aa5-1ab5-4e2e-9b15-405051852a2a@postgrespro.ru> <1e81a0a1-a63b-48fb-905a-d6495f89ab73@postgrespro.ru> <0b4eefc7-4c38-4caa-b2ca-a4c75dd7dd12@postgrespro.ru> <333c2306-c401-4959-9f0c-a44c670a11a9@postgrespro.ru> <513f0188-b093-4cc8-98cf-4c324570d525@postgrespro.ru> <47a7b784-5218-43f2-96e3-65f9a729c5a5@tantorlabs.com> Content-Language: en-US In-Reply-To: X-KSMG-AntiPhishing: NotDetected, bases: 2025/03/21 18:26:00 X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 2.1.0.7854, bases: 2025/03/21 17:59:00 #27810514 X-KSMG-AntiVirus-Status: NotDetected, skipped X-KSMG-LinksScanning: not scanned, disabled by settings X-KSMG-Message-Action: skipped X-KSMG-Rule-ID: 1 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk This is a multi-part message in MIME format. --------------xNqggfCdQ2yZdgR8RvEF9OBK Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Sorry, I forgot to provide a link to the problem [0], actually. So I provided it below. [0] https://www.postgresql.org/message-id/CAPpHfduoJEuoixPTTg2tjhnXqrdobuMaQGxriqxJ9TjN1uxOuA%40mail.gmail.com On 21.03.2025 22:42, Alena Rybakina wrote: > On 13.03.2025 09:42, Bertrand Drouvot wrote: >> Hi, >> >> On Wed, Mar 12, 2025 at 05:15:53PM -0500, Jim Nasby wrote: >>> The usecase I can see here is that we don't want autovac creating so much >>> WAL traffic that it starts forcing other backends to have to write WAL out. >>> But tracking how many times autovac writes WAL buffers won't help with that >> Right, because the one that increments the wal_buffers_full metric could "just" >> be a victim (i.e the one that happens to trigger the WAL buffers disk flush, >> even though other backends contributed most of the buffer usage). >> >>> (though we also don't want any WAL buffers written by autovac to be counted >>> in the system-wide wal_buffers_full: >> why? Or do you mean that it would be good to have 2 kinds of metrics: one >> generated by "maintenance" activity and one by "regular" backends? >> >>> What would be helpful would be a way to determine if autovac was causing >>> enough traffic to force other backends to write WAL. Offhand I'm not sure >>> how practical that actually is though. >> a051e71e28a could help to see how much WAL has by written by the autovac workers. >> >>> BTW, there's also an argument to be made that autovac should throttle >>> itself if we're close to running out of available WAL buffers... >> hmm, yeah I think that's an interesting idea OTOH that would mean to "delegate" >> the WAL buffers flush to another backend. >> >> Regards, >> > > I will add it and fix the tests but later and I'll explain why. > > I'm working on this issue [0] and try have already created new > statistics in Statistics Collector to store database and relation > vacuum statistics: PGSTAT_KIND_VACUUM_DB and PGSTAT_KIND_VACUUM_RELATION. > > Vacuum statistics are saved there instead of relation's and database's > statistic structure, but for some reason it is not possible to find > them in the hash table when building a snapshot and display them > accordingly. > I have not yet figured out where the error is. > > Without solving this problem, committing vacuum statistics is not yet > possible. An alternative way for us was to refuse some statistics for > now for relations, > but we could not agree on which statistics should not be displayed yet > and for now we are only adding them :). > > I understand why this is important to display more vacuum information > about vacuum statistics - it will allow us to better understand the > problems of incorrect vacuum settings or, for example, notice a bug in > its operation. > > In order to reduce the memory consumption for storing them for those > who are not going to use them, I just realized that we need to create > a separate space for storing the statistics > I mentioned above (PGSTAT_KIND_VACUUM_DB and > PGSTAT_KIND_VACUUM_RELATION), there is no other way to do this and I > am still trying to complete this functionality. > > I doubt that I will have time for this by code freeze date and even if > I do, I will hardly have time for a normal review. There's really a > lot more to learn related to the stat collector, so > I'm postponing it to the next commitfest. > > Sorry. I'll fix the tests as soon as I finish this part, since they'll > most likely either break the same way or in some new way. > > Tomorrow or the day after tomorrow I will send a diff patch with what > I have already managed to demonstrate the problem, since I need to > bring the code to a normal form. > Maybe someone who worked with the stat collector will suddenly tell me > where and what I have implemented incorrectly. > -- Regards, Alena Rybakina Postgres Professional --------------xNqggfCdQ2yZdgR8RvEF9OBK Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit

Sorry, I forgot to provide a link to the problem [0], actually. So I provided it below.

[0] https://www.postgresql.org/message-id/CAPpHfduoJEuoixPTTg2tjhnXqrdobuMaQGxriqxJ9TjN1uxOuA%40mail.gmail.com


On 21.03.2025 22:42, Alena Rybakina wrote:
On 13.03.2025 09:42, Bertrand Drouvot wrote:
Hi,

On Wed, Mar 12, 2025 at 05:15:53PM -0500, Jim Nasby wrote:
The usecase I can see here is that we don't want autovac creating so much
WAL traffic that it starts forcing other backends to have to write WAL out.
But tracking how many times autovac writes WAL buffers won't help with that
Right, because the one that increments the wal_buffers_full metric could "just"
be a victim (i.e the one that happens to trigger the WAL buffers disk flush,
even though other backends contributed most of the buffer usage).

(though we also don't want any WAL buffers written by autovac to be counted
in the system-wide wal_buffers_full:
why? Or do you mean that it would be good to have 2 kinds of metrics: one
generated by "maintenance" activity and one by "regular" backends?

What would be helpful would be a way to determine if autovac was causing
enough traffic to force other backends to write WAL. Offhand I'm not sure
how practical that actually is though.
a051e71e28a could help to see how much WAL has by written by the autovac workers.

BTW, there's also an argument to be made that autovac should throttle
itself if we're close to running out of available WAL buffers...
hmm, yeah I think that's an interesting idea OTOH that would mean to "delegate"
the WAL buffers flush to another backend.

Regards,


I will add it and fix the tests but later and I'll explain why. 

I'm working on this issue [0] and try have already created new statistics in Statistics Collector to store database and relation vacuum statistics: PGSTAT_KIND_VACUUM_DB and PGSTAT_KIND_VACUUM_RELATION.

Vacuum statistics are saved there instead of relation's and database's statistic structure, but for some reason it is not possible to find them in the hash table when building a snapshot and display them accordingly.
I have not yet figured out where the error is. 

Without solving this problem, committing vacuum statistics is not yet possible. An alternative way for us was to refuse some statistics for now for relations,
but we could not agree on which statistics should not be displayed yet and for now we are only adding them :).

I understand why this is important to display more vacuum information about vacuum statistics - it will allow us to better understand the problems of incorrect vacuum settings or, for example, notice a bug in its operation. 

In order to reduce the memory consumption for storing them for those who are not going to use them, I just realized that we need to create a separate space for storing the statistics
I mentioned above (PGSTAT_KIND_VACUUM_DB and PGSTAT_KIND_VACUUM_RELATION), there is no other way to do this and I am still trying to complete this functionality. 

I doubt that I will have time for this by code freeze date and even if I do, I will hardly have time for a normal review. There's really a lot more to learn related to the stat collector, so
I'm postponing it to the next commitfest. 

Sorry. I'll fix the tests as soon as I finish this part, since they'll most likely either break the same way or in some new way.

Tomorrow or the day after tomorrow I will send a diff patch with what I have already managed to demonstrate the problem, since I need to bring the code to a normal form.
Maybe someone who worked with the stat collector will suddenly tell me where and what I have implemented incorrectly.


    

-- 
Regards,
Alena Rybakina
Postgres Professional
--------------xNqggfCdQ2yZdgR8RvEF9OBK--