public inbox for [email protected]  
help / color / mirror / Atom feed
From: Sami Imseih <[email protected]>
To: Lukas Fittl <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: pg_stat_statements: Add gc_count and query_file_size to pgss_info
Date: Sun, 1 Mar 2026 09:43:54 -0600
Message-ID: <CAA5RZ0tu-Si5MaaFMiVq+SONBv+fPPPwhtb65dHoO75Pfs8BEg@mail.gmail.com> (raw)
In-Reply-To: <CAP53PkzYZ8YxH0o+Garw9fWdFRoEtmQKT09-q=2RVMW8uVS5Nw@mail.gmail.com>
References: <CAP53PkzYZ8YxH0o+Garw9fWdFRoEtmQKT09-q=2RVMW8uVS5Nw@mail.gmail.com>

> Hi,
>
> Over the last weeks we've been fighting again with pg_stat_statements
> issues, specifically with the issue of having too many unique entries,
> and the corresponding problem of large query text files and LWLock
> wait events related to pg_stat_statements.

Yeah, LWLock:pg_stat_statements related to garbage collection can be
really bad when large files must be recreated with live query texts,
or if this GC must occur often (in the case of high query entry churn).
A high dealloc count is already a good indicator of the need for GC,
but most people, in my experience, do not realize there is a GC
process, since it is not documented or exposed.

A high dealloc is already a good indicator of the need for gc, but most
people, in my experience, don't realize there is a gc process, since it's
not documented and not exposed.

> I think we can improve debugging for such situations by adding two
> more columns to pg_stat_statements_info that expose information
> already tracked:
>
> 1) "gc_count", showing when the pg_stat_statements query garbage
> collection cycles occur (which can correlate with
> LWLock:pg_stat_statements)
>
> 2) "query_file_size" which tells us the extent of the query text file,
> so we can fine-tune when we query the texts from pg_stat_statements in
> monitoring scripts (i.e. query it less frequently if the query text
> file is very large).

I do agree that having such additional information, with proper
documentation, is a good idea. However, I do wonder if we should hold
off on adding any of this info in 19 because of the point you make
below, which could completely change the information we need to
expose. Adding this information in 19 and then removing it for 20 may
not be worthwhile.

> I've had a patch to improve this prepared for a previous cycle, but
> wasn't sure it was still needed because of the discussion re: keeping
> query texts in shared memory. But since it looks like that won't
> change for 19 (though I'm hoping to contribute more to improving that
> in the PG 20 cycle), see attached for consideration.

19 has 4ba012a8ed, which allows us to serialize and deserialize query
texts stored in, for example, DSA, with a dsa_pointer tracked by the
entry of a custom stats kind. I was also planning on continuing this
work for 20, and getting 4ba012a8ed was an important prerequisite for
this.

What do you think?

[1] [https://www.postgresql.org/message-id/[email protected]...]

--
Sami Imseih
Amazon Web Services (AWS)





view thread (3+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: pg_stat_statements: Add gc_count and query_file_size to pgss_info
  In-Reply-To: <CAA5RZ0tu-Si5MaaFMiVq+SONBv+fPPPwhtb65dHoO75Pfs8BEg@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox