public inbox for [email protected]
help / color / mirror / Atom feedpg_stat_statements: Add gc_count and query_file_size to pgss_info
3+ messages / 2 participants
[nested] [flat]
* pg_stat_statements: Add gc_count and query_file_size to pgss_info
@ 2026-02-28 23:46 Lukas Fittl <[email protected]>
0 siblings, 1 reply; 3+ messages in thread
From: Lukas Fittl @ 2026-02-28 23:46 UTC (permalink / raw)
To: PostgreSQL Hackers <[email protected]>
Hi,
Over the last weeks we've been fighting again with pg_stat_statements
issues, specifically with the issue of having too many unique entries,
and the corresponding problem of large query text files and LWLock
wait events related to pg_stat_statements.
I think we can improve debugging for such situations by adding two
more columns to pg_stat_statements_info that expose information
already tracked:
1) "gc_count", showing when the pg_stat_statements query garbage
collection cycles occur (which can correlate with
LWLock:pg_stat_statements)
2) "query_file_size" which tells us the extent of the query text file,
so we can fine-tune when we query the texts from pg_stat_statements in
monitoring scripts (i.e. query it less frequently if the query text
file is very large).
I've had a patch to improve this prepared for a previous cycle, but
wasn't sure it was still needed because of the discussion re: keeping
query texts in shared memory. But since it looks like that won't
change for 19 (though I'm hoping to contribute more to improving that
in the PG 20 cycle), see attached for consideration.
Thanks,
Lukas
--
Lukas Fittl
Attachments:
[application/octet-stream] v1-0001-pg_stat_statements-Add-gc_count-and-query_file_si.patch (9.7K, 2-v1-0001-pg_stat_statements-Add-gc_count-and-query_file_si.patch)
download | inline diff:
From 966ee151cab2cd35bf9f70ab799311010c6a3305 Mon Sep 17 00:00:00 2001
From: Lukas Fittl <[email protected]>
Date: Sat, 28 Feb 2026 14:56:28 -0800
Subject: [PATCH v1] pg_stat_statements: Add gc_count and query_file_size to
pgss_info
This surfaces two internal metrics to end-users and monitoring scripts
that allow using pg_stat_statements more effectively:
The "gc_count" field complements the existing deallocation counter
("dealloc"), but counts up when the actual query text garbage collection
occurrs, which is when the pg_stat_statements LWLock is typically held
the longest, and significant I/O can occur for large query text files.
The "query_file_size" field returns the current size of the query text
file, as tracked by pg_stat_statements already, without requiring a stat()
call. This can be useful for automated collection of query texts, to
tweak how often to collect them, based on the query file size. It also
helps correlate the amount of I/O performed with the actual file size,
which can be up to 2x of the sum of the query texts actually returned.
Bump pg_stat_statements version to 1.14 due to the addition of the two
new columns to pg_stat_statements_info.
Author: Lukas Fittl <[email protected]>
Reviewed by:
Discussion:
---
contrib/pg_stat_statements/Makefile | 1 +
.../expected/oldextversions.out | 13 +++++++++
contrib/pg_stat_statements/meson.build | 1 +
.../pg_stat_statements--1.13--1.14.sql | 28 +++++++++++++++++++
.../pg_stat_statements/pg_stat_statements.c | 10 +++++--
.../pg_stat_statements.control | 2 +-
.../pg_stat_statements/sql/oldextversions.sql | 4 +++
doc/src/sgml/pgstatstatements.sgml | 20 +++++++++++++
8 files changed, 76 insertions(+), 3 deletions(-)
create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.13--1.14.sql
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index c27e9529bb6..d7142f71cf7 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -7,6 +7,7 @@ OBJS = \
EXTENSION = pg_stat_statements
DATA = pg_stat_statements--1.4.sql \
+ pg_stat_statements--1.13--1.14.sql \
pg_stat_statements--1.12--1.13.sql \
pg_stat_statements--1.11--1.12.sql pg_stat_statements--1.10--1.11.sql \
pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
diff --git a/contrib/pg_stat_statements/expected/oldextversions.out b/contrib/pg_stat_statements/expected/oldextversions.out
index 726383a99d7..b9c7df4570a 100644
--- a/contrib/pg_stat_statements/expected/oldextversions.out
+++ b/contrib/pg_stat_statements/expected/oldextversions.out
@@ -474,4 +474,17 @@ SELECT count(*) > 0 AS has_data FROM pg_stat_statements;
t
(1 row)
+-- New functions and views for pg_stat_statements_info in 1.14
+AlTER EXTENSION pg_stat_statements UPDATE TO '1.14';
+SELECT pg_get_functiondef('pg_stat_statements_info'::regproc);
+ pg_get_functiondef
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ CREATE OR REPLACE FUNCTION public.pg_stat_statements_info(OUT dealloc bigint, OUT gc_count bigint, OUT query_file_size bigint, OUT stats_reset timestamp with time zone)+
+ RETURNS record +
+ LANGUAGE c +
+ PARALLEL SAFE STRICT +
+ AS '$libdir/pg_stat_statements', $function$pg_stat_statements_info$function$ +
+
+(1 row)
+
DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/meson.build b/contrib/pg_stat_statements/meson.build
index 9d78cb88b7d..77148949c0d 100644
--- a/contrib/pg_stat_statements/meson.build
+++ b/contrib/pg_stat_statements/meson.build
@@ -21,6 +21,7 @@ contrib_targets += pg_stat_statements
install_data(
'pg_stat_statements.control',
'pg_stat_statements--1.4.sql',
+ 'pg_stat_statements--1.13--1.14.sql',
'pg_stat_statements--1.12--1.13.sql',
'pg_stat_statements--1.11--1.12.sql',
'pg_stat_statements--1.10--1.11.sql',
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.13--1.14.sql b/contrib/pg_stat_statements/pg_stat_statements--1.13--1.14.sql
new file mode 100644
index 00000000000..6687ae4e1c3
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.13--1.14.sql
@@ -0,0 +1,28 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.13--1.14.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.14'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements_info;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements_info();
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements_info;
+DROP FUNCTION pg_stat_statements_info();
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements_info(
+ OUT dealloc bigint,
+ OUT gc_count bigint,
+ OUT query_file_size bigint,
+ OUT stats_reset timestamp with time zone
+)
+RETURNS record
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements_info AS
+ SELECT * FROM pg_stat_statements_info();
+
+GRANT SELECT ON pg_stat_statements_info TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 4a427533bd8..fcda847b4de 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2049,7 +2049,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
}
/* Number of output arguments (columns) for pg_stat_statements_info */
-#define PG_STAT_STATEMENTS_INFO_COLS 2
+#define PG_STAT_STATEMENTS_INFO_COLS 4
/*
* Return statistics of pg_stat_statements.
@@ -2061,6 +2061,8 @@ pg_stat_statements_info(PG_FUNCTION_ARGS)
TupleDesc tupdesc;
Datum values[PG_STAT_STATEMENTS_INFO_COLS] = {0};
bool nulls[PG_STAT_STATEMENTS_INFO_COLS] = {0};
+ int64 gc_count,
+ extent;
if (!pgss || !pgss_hash)
ereport(ERROR,
@@ -2074,10 +2076,14 @@ pg_stat_statements_info(PG_FUNCTION_ARGS)
/* Read global statistics for pg_stat_statements */
SpinLockAcquire(&pgss->mutex);
stats = pgss->stats;
+ gc_count = pgss->gc_count;
+ extent = pgss->extent;
SpinLockRelease(&pgss->mutex);
values[0] = Int64GetDatum(stats.dealloc);
- values[1] = TimestampTzGetDatum(stats.stats_reset);
+ values[1] = Int64GetDatum(gc_count);
+ values[2] = Int64GetDatum(extent);
+ values[3] = TimestampTzGetDatum(stats.stats_reset);
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 2eee0ceffa8..61ae41efc14 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
# pg_stat_statements extension
comment = 'track planning and execution statistics of all SQL statements executed'
-default_version = '1.13'
+default_version = '1.14'
module_pathname = '$libdir/pg_stat_statements'
relocatable = true
diff --git a/contrib/pg_stat_statements/sql/oldextversions.sql b/contrib/pg_stat_statements/sql/oldextversions.sql
index e416efe9ffb..0f050173249 100644
--- a/contrib/pg_stat_statements/sql/oldextversions.sql
+++ b/contrib/pg_stat_statements/sql/oldextversions.sql
@@ -68,4 +68,8 @@ AlTER EXTENSION pg_stat_statements UPDATE TO '1.13';
\d pg_stat_statements
SELECT count(*) > 0 AS has_data FROM pg_stat_statements;
+-- New functions and views for pg_stat_statements_info in 1.14
+AlTER EXTENSION pg_stat_statements UPDATE TO '1.14';
+SELECT pg_get_functiondef('pg_stat_statements_info'::regproc);
+
DROP EXTENSION pg_stat_statements;
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index d753de5836e..11256ef9e3d 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -800,6 +800,26 @@ calls | 2
<varname>pg_stat_statements.max</varname> were observed
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>gc_count</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total number of times the <structname>pg_stat_statements</structname> query text
+ file was rewritten for garbage collection (GC) purposes. This may occur after an
+ explicit call to <function>pg_stat_statements_reset</function>, or when sufficient
+ stale query texts have accumulated due to deallocations.
+ </para></entry>
+ </row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>query_file_size</structfield> <type>bigint</type>
+ </para>
+ <para>
+ The cumulative size (in bytes) of all query texts stored in the
+ <structname>pg_stat_statements</structname> query text file.
+ </para></entry>
+ </row>
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
--
2.47.1
^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: pg_stat_statements: Add gc_count and query_file_size to pgss_info
@ 2026-03-01 15:43 Sami Imseih <[email protected]>
parent: Lukas Fittl <[email protected]>
0 siblings, 1 reply; 3+ messages in thread
From: Sami Imseih @ 2026-03-01 15:43 UTC (permalink / raw)
To: Lukas Fittl <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>
> Hi,
>
> Over the last weeks we've been fighting again with pg_stat_statements
> issues, specifically with the issue of having too many unique entries,
> and the corresponding problem of large query text files and LWLock
> wait events related to pg_stat_statements.
Yeah, LWLock:pg_stat_statements related to garbage collection can be
really bad when large files must be recreated with live query texts,
or if this GC must occur often (in the case of high query entry churn).
A high dealloc count is already a good indicator of the need for GC,
but most people, in my experience, do not realize there is a GC
process, since it is not documented or exposed.
A high dealloc is already a good indicator of the need for gc, but most
people, in my experience, don't realize there is a gc process, since it's
not documented and not exposed.
> I think we can improve debugging for such situations by adding two
> more columns to pg_stat_statements_info that expose information
> already tracked:
>
> 1) "gc_count", showing when the pg_stat_statements query garbage
> collection cycles occur (which can correlate with
> LWLock:pg_stat_statements)
>
> 2) "query_file_size" which tells us the extent of the query text file,
> so we can fine-tune when we query the texts from pg_stat_statements in
> monitoring scripts (i.e. query it less frequently if the query text
> file is very large).
I do agree that having such additional information, with proper
documentation, is a good idea. However, I do wonder if we should hold
off on adding any of this info in 19 because of the point you make
below, which could completely change the information we need to
expose. Adding this information in 19 and then removing it for 20 may
not be worthwhile.
> I've had a patch to improve this prepared for a previous cycle, but
> wasn't sure it was still needed because of the discussion re: keeping
> query texts in shared memory. But since it looks like that won't
> change for 19 (though I'm hoping to contribute more to improving that
> in the PG 20 cycle), see attached for consideration.
19 has 4ba012a8ed, which allows us to serialize and deserialize query
texts stored in, for example, DSA, with a dsa_pointer tracked by the
entry of a custom stats kind. I was also planning on continuing this
work for 20, and getting 4ba012a8ed was an important prerequisite for
this.
What do you think?
[1] [https://www.postgresql.org/message-id/[email protected]...]
--
Sami Imseih
Amazon Web Services (AWS)
^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: pg_stat_statements: Add gc_count and query_file_size to pgss_info
@ 2026-03-25 18:49 Lukas Fittl <[email protected]>
parent: Sami Imseih <[email protected]>
0 siblings, 0 replies; 3+ messages in thread
From: Lukas Fittl @ 2026-03-25 18:49 UTC (permalink / raw)
To: Sami Imseih <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>
Hi Sami,
On Sun, Mar 1, 2026 at 7:44 AM Sami Imseih <[email protected]> wrote:
> I do agree that having such additional information, with proper
> documentation, is a good idea. However, I do wonder if we should hold
> off on adding any of this info in 19 because of the point you make
> below, which could completely change the information we need to
> expose. Adding this information in 19 and then removing it for 20 may
> not be worthwhile.
Yeah, I suspect you're right - I'll mark this as returned with feedback for now.
> > I've had a patch to improve this prepared for a previous cycle, but
> > wasn't sure it was still needed because of the discussion re: keeping
> > query texts in shared memory. But since it looks like that won't
> > change for 19 (though I'm hoping to contribute more to improving that
> > in the PG 20 cycle), see attached for consideration.
>
> 19 has 4ba012a8ed, which allows us to serialize and deserialize query
> texts stored in, for example, DSA, with a dsa_pointer tracked by the
> entry of a custom stats kind. I was also planning on continuing this
> work for 20, and getting 4ba012a8ed was an important prerequisite for
> this.
Yup, makes sense - I think 4ba012a8ed is a foundational piece to make
progress on this in 20.
I've also been wondering if we should prototype a new
pg_stat_statements module separately (e.g. in a GitHub repo,
"pg_stat_statements_next" or something), just to allow quicker
iteration and allow easier testing of a pure in-memory approach,
before bringing it to the discussion table for -hackers later in the
PG20 cycle.
Thanks,
Lukas
--
Lukas Fittl
^ permalink raw reply [nested|flat] 3+ messages in thread
end of thread, other threads:[~2026-03-25 18:49 UTC | newest]
Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-28 23:46 pg_stat_statements: Add gc_count and query_file_size to pgss_info Lukas Fittl <[email protected]>
2026-03-01 15:43 ` Sami Imseih <[email protected]>
2026-03-25 18:49 ` Lukas Fittl <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox