Re: query_id: jumble names of temp tables for better pg_stat

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: query_id: jumble names of temp tables for better pg_stat_statement UX
19+ messages / 6 participants
[nested] [flat]

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 02:38  Sami Imseih <[email protected]>
  0 siblings, 2 replies; 19+ messages in thread

From: Sami Imseih @ 2025-03-25 02:38 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

> select * from foo s1;
> select * from foo s2;

ah, thanks for pointing this out. Not as good of a workaround as
a comment since one must change aliases, but at least there is
a workaround...

> As against this, there is probably also a set of people who would
> *like* identical queries on identically-rowtyped tables in different
> schemas to be merged.  Right now they have no way to make that happen.

I agree that some may want stats to merge for the same tables,
and others may not. I will suggest this with some hesitation, but why not
make this behavior configurable via a GUC?
We recently introduced query_id_squash_values for controlling
the merge of an IN list, maybe this is another queryId behavior we should
provide a configuration for?

--
Sami Imseih
Amazon Web Services (AWS)

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 02:56  Tom Lane <[email protected]>
  parent: Sami Imseih <[email protected]>
  1 sibling, 1 reply; 19+ messages in thread

From: Tom Lane @ 2025-03-25 02:56 UTC (permalink / raw)
  To: Sami Imseih <[email protected]>; +Cc: Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Sami Imseih <[email protected]> writes:
> I agree that some may want stats to merge for the same tables,
> and others may not. I will suggest this with some hesitation, but why not
> make this behavior configurable via a GUC?
> We recently introduced query_id_squash_values for controlling
> the merge of an IN list, maybe this is another queryId behavior we should
> provide a configuration for?

I don't like that GUC and I would not like this one either.  We
learned years ago that GUCs that change query semantics are a bad
idea, but apparently now we have hackers who need to relearn that
lesson the hard way.  (Admittedly, this isn't quite *query* semantics,
which perhaps lessens the blast radius.  But I think we're still going
to regret query_id_squash_values.)

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 03:30  Sami Imseih <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 2 replies; 19+ messages in thread

From: Sami Imseih @ 2025-03-25 03:30 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

> Sami Imseih <[email protected]> writes:
> > I agree that some may want stats to merge for the same tables,
> > and others may not. I will suggest this with some hesitation, but why not
> > make this behavior configurable via a GUC?
> > We recently introduced query_id_squash_values for controlling
> > the merge of an IN list, maybe this is another queryId behavior we should
> > provide a configuration for?
>
> I don't like that GUC and I would not like this one either.  We
> learned years ago that GUCs that change query semantics are a bad
> idea, but apparently now we have hackers who need to relearn that
> lesson the hard way.  (Admittedly, this isn't quite *query* semantics,
> which perhaps lessens the blast radius.  But I think we're still going
> to regret query_id_squash_values.)

query_id_squash_values has a much weaker argument to exist than a
guc to control the use of alias vs OID. Why would anyone not want
to squash the IN list? maybe we should revisit this decision in that thread.

With that said, if everyone is OK with the behavior change of pg_s_s
with this patch, I will concede. FWIW, I do think this for most cases, the
proposed behavior is desirable. I just worry about the users who rely on the
OID being jumbled behavior, and have no way to revery.

--
Sami Imseih
Amazon Web Services (AWS)






^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 03:51  Michael Paquier <[email protected]>
  parent: Sami Imseih <[email protected]>
  1 sibling, 1 reply; 19+ messages in thread

From: Michael Paquier @ 2025-03-25 03:51 UTC (permalink / raw)
  To: Sami Imseih <[email protected]>; +Cc: Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Mon, Mar 24, 2025 at 09:38:35PM -0500, Sami Imseih wrote:
> > select * from foo s1;
> > select * from foo s2;
> 
> ah, thanks for pointing this out. Not as good of a workaround as
> a comment since one must change aliases, but at least there is
> a workaround...

Exactly.  Like Tom I'm not really worried about the proposal, but of
course I could prove to be wrong.  I am ready to assume that bloat in
pgss entries caused by temp tables is a more common case.

So let's move on with this proposal, before the gong.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 03:53  Michael Paquier <[email protected]>
  parent: Sami Imseih <[email protected]>
  1 sibling, 1 reply; 19+ messages in thread

From: Michael Paquier @ 2025-03-25 03:53 UTC (permalink / raw)
  To: Sami Imseih <[email protected]>; +Cc: Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Mon, Mar 24, 2025 at 10:30:59PM -0500, Sami Imseih wrote:
>> Sami Imseih <[email protected]> writes:
>>> I agree that some may want stats to merge for the same tables,
>>> and others may not. I will suggest this with some hesitation, but why not
>>> make this behavior configurable via a GUC?
>>> We recently introduced query_id_squash_values for controlling
>>> the merge of an IN list, maybe this is another queryId behavior we should
>>> provide a configuration for?
>>
>> I don't like that GUC and I would not like this one either.  We
>> learned years ago that GUCs that change query semantics are a bad
>> idea, but apparently now we have hackers who need to relearn that
>> lesson the hard way.  (Admittedly, this isn't quite *query* semantics,
>> which perhaps lessens the blast radius.  But I think we're still going
>> to regret query_id_squash_values.)
> 
> query_id_squash_values has a much weaker argument to exist than a
> guc to control the use of alias vs OID. Why would anyone not want
> to squash the IN list? maybe we should revisit this decision in that thread.

This part of the thread is digressing, but I'd on the side of removing
entirely the GUC and make the grouping of IN values the default.  We
still have time to discuss that during the beta cycle, so let's do so
on its related thread.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 04:30  Sami Imseih <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 0 replies; 19+ messages in thread

From: Sami Imseih @ 2025-03-25 04:30 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

> Exactly.  Like Tom I'm not really worried about the proposal, but of
> course I could prove to be wrong.  I am ready to assume that bloat in
> pgss entries caused by temp tables is a more common case.

I ran installcheck-parallel with the patch and without the patch
3x, and the reduction in pg_s_s bloat is visible. The same entries
are reused even when the regression database is recreated, as
expected.

## without patch run installcheck-parallel 3 times
postgres=# select count(distinct queryid), count(*) from pg_stat_statements;
 count | count
-------+-------
 41630 | 74776
(1 row)


## with patch run installcheck-parallel 3 times
postgres=# select count(distinct queryid), count(*) from pg_stat_statements;
 count | count
-------+-------
 26301 | 73030
(1 row)

>This part of the thread is digressing, but I'd on the side of removing
> entirely the GUC and make the grouping of IN values the default.  We
> still have time to discuss that during the beta cycle, so let's do so
> on its related thread.

I will do that tomorrow in that thread. It would not make sense to introduce
a GUC for that behavior and not for this. So, I am glad we discussed this
here.

Looking at the patches, I could not find anything else besides what
was pointed out by Christoph earlier.

--
Sami Imseih
Amazon Web Services (AWS)






^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 04:32  Tom Lane <[email protected]>
  parent: Sami Imseih <[email protected]>
  1 sibling, 1 reply; 19+ messages in thread

From: Tom Lane @ 2025-03-25 04:32 UTC (permalink / raw)
  To: Sami Imseih <[email protected]>; +Cc: Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Sami Imseih <[email protected]> writes:
>> I don't like that GUC and I would not like this one either.  We
>> learned years ago that GUCs that change query semantics are a bad
>> idea, but apparently now we have hackers who need to relearn that
>> lesson the hard way.

> query_id_squash_values has a much weaker argument to exist than a
> guc to control the use of alias vs OID. Why would anyone not want
> to squash the IN list? maybe we should revisit this decision in that thread.

I'm not opining one way or the other on whether squashing an IN list
is desirable.  My point is that a GUC is a poor way to make --- or
really, avoid making --- such decisions.  The reasons we took away
from previous experiments with semantics-determing GUCs were:

1. The scope of effect of a GUC is seldom what you want for such
things.  There are going to be some queries in which you want behavior
A, and some in which you want behavior B, and in the worst case you
want different behaviors in different parts of the same query.  It's
really painful to make that happen.

2. Tools that are not entitled to set the value of the GUC are forced
to be prepared to cope with any setting.  That can be anywhere from
painful to impossible.

For the specific context of controlling how query jumbling happens,
there's still another problem: the actual statement-merging behavior of
pg_stat_statements will depend on the totality of settings of the GUC
across the entire system.  It's not very clear to me what will happen
if different sessions use different settings, much less if people
change the setting intra-session; but I bet a lot of people will find
it surprising.  62d712ecf did no one any favors by marking that GUC
USERSET rather than something that would prevail system-wide.

All of these remarks apply with equal force to anything that changes
the behavior of query-jumbling, no matter the specific details.

			regards, tom lane

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 05:09  Julien Rouhaud <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Julien Rouhaud @ 2025-03-25 05:09 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Sami Imseih <[email protected]>; Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Hi,

On Tue, Mar 25, 2025 at 12:32:05AM -0400, Tom Lane wrote:
>
> I'm not opining one way or the other on whether squashing an IN list
> is desirable.  My point is that a GUC is a poor way to make --- or
> really, avoid making --- such decisions.  The reasons we took away
> from previous experiments with semantics-determing GUCs were:
>
> 1. The scope of effect of a GUC is seldom what you want for such
> things.  There are going to be some queries in which you want behavior
> A, and some in which you want behavior B, and in the worst case you
> want different behaviors in different parts of the same query.  It's
> really painful to make that happen.
>
> 2. Tools that are not entitled to set the value of the GUC are forced
> to be prepared to cope with any setting.  That can be anywhere from
> painful to impossible.

Didn't that ship already sailed in pg14 when we allowed generating custom
query_id?

Now:

> For the specific context of controlling how query jumbling happens,
> there's still another problem: the actual statement-merging behavior of
> pg_stat_statements will depend on the totality of settings of the GUC
> across the entire system.  It's not very clear to me what will happen
> if different sessions use different settings, much less if people
> change the setting intra-session; but I bet a lot of people will find
> it surprising.  62d712ecf did no one any favors by marking that GUC
> USERSET rather than something that would prevail system-wide.

One of the requirement for the custom query_id was that you shouldn't be
allowed to change the algorithm dynamically, at least not unless a superuser
agrees to maybe break everything, which is why compute_query_id is marked as
PGC_SUSET.  I think that the same reasoning should apply here and if the GUC is
kept it should be at least at that level.






^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 05:17  Tom Lane <[email protected]>
  parent: Julien Rouhaud <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Tom Lane @ 2025-03-25 05:17 UTC (permalink / raw)
  To: Julien Rouhaud <[email protected]>; +Cc: Sami Imseih <[email protected]>; Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Julien Rouhaud <[email protected]> writes:
> On Tue, Mar 25, 2025 at 12:32:05AM -0400, Tom Lane wrote:
>> 2. Tools that are not entitled to set the value of the GUC are forced
>> to be prepared to cope with any setting.  That can be anywhere from
>> painful to impossible.

> Didn't that ship already sailed in pg14 when we allowed generating custom
> query_id?

Up to a point, perhaps.  If I'm writing some kind of tool that digests
pg_stat_statements results, I think I'm entitled to disregard the
possibility that somebody is using a custom query_id that behaves in
ways I'm not expecting --- or at least, fixing my code for that is
their problem not mine.  But it's much harder to take that attitude
for things that are built into core PG.

>> For the specific context of controlling how query jumbling happens,
>> there's still another problem: the actual statement-merging behavior of
>> pg_stat_statements will depend on the totality of settings of the GUC
>> across the entire system.

> One of the requirement for the custom query_id was that you shouldn't be
> allowed to change the algorithm dynamically, at least not unless a superuser
> agrees to maybe break everything, which is why compute_query_id is marked as
> PGC_SUSET.  I think that the same reasoning should apply here and if the GUC is
> kept it should be at least at that level.

Fully agreed.  (Even SUSET is debatable in this situation, but I'm okay
with it on the principle that superusers are expected to know what
they're doing and accept the consequences.)

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 05:28  Julien Rouhaud <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 0 replies; 19+ messages in thread

From: Julien Rouhaud @ 2025-03-25 05:28 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Sami Imseih <[email protected]>; Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Tue, Mar 25, 2025 at 01:17:22AM -0400, Tom Lane wrote:
> Julien Rouhaud <[email protected]> writes:
> > On Tue, Mar 25, 2025 at 12:32:05AM -0400, Tom Lane wrote:
> >> 2. Tools that are not entitled to set the value of the GUC are forced
> >> to be prepared to cope with any setting.  That can be anywhere from
> >> painful to impossible.
> 
> > Didn't that ship already sailed in pg14 when we allowed generating custom
> > query_id?
> 
> Up to a point, perhaps.  If I'm writing some kind of tool that digests
> pg_stat_statements results, I think I'm entitled to disregard the
> possibility that somebody is using a custom query_id that behaves in
> ways I'm not expecting --- or at least, fixing my code for that is
> their problem not mine.  But it's much harder to take that attitude
> for things that are built into core PG.

I see, that's fair.






^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 06:09  Lukas Fittl <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 2 replies; 19+ messages in thread

From: Lukas Fittl @ 2025-03-25 06:09 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Mon, Mar 24, 2025 at 8:51 PM Michael Paquier <[email protected]> wrote:

> On Mon, Mar 24, 2025 at 09:38:35PM -0500, Sami Imseih wrote:
> > > select * from foo s1;
> > > select * from foo s2;
> >
> > ah, thanks for pointing this out. Not as good of a workaround as
> > a comment since one must change aliases, but at least there is
> > a workaround...
>
> Exactly.  Like Tom I'm not really worried about the proposal, but of
> course I could prove to be wrong.  I am ready to assume that bloat in
> pgss entries caused by temp tables is a more common case.
>

For what its worth, +1 on the current proposal in this thread (and doing it
without a GUC), i.e. merging a query that references the same table alias,
ignoring different schemas.

In the context of the earlier referenced one-schema-per-customer workloads:

In my experience these often not work well with pg_stat_statements today
because of their own bloat problem, just like with temp tables. You quickly
have way too many unique entries, and your query text file accumulates a
lot of duplicative entries (since the same query text gets repeated in the
text file, since its queryid is different), to the point that you can't
monitor your workload at all anymore.

Thanks,
Lukas

-- 
Lukas Fittl

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 07:02  Michael Paquier <[email protected]>
  parent: Lukas Fittl <[email protected]>
  1 sibling, 1 reply; 19+ messages in thread

From: Michael Paquier @ 2025-03-25 07:02 UTC (permalink / raw)
  To: Lukas Fittl <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Mon, Mar 24, 2025 at 11:09:06PM -0700, Lukas Fittl wrote:
> For what its worth, +1 on the current proposal in this thread (and doing it
> without a GUC), i.e. merging a query that references the same table alias,
> ignoring different schemas.

Thanks for the feedback.  I have looked again at the first patch to
add custom_query_jumble as a node field attribute, adjusted some
comments, and applied it as 5ac462e2b7ac.

Attached is the second one, with more tests coverage with attribute
aliases (these being ignored exists in stable branches, but why not
while on it) and table aliases, and the fixes for the issues pointed
out by Christoph.  I'll double-check all that again tomorrow. Please
find an updated version attached for now.
--
Michael


Attachments:

  [text/x-diff] v5-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch (11.0K, 2-v5-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch)
  download | inline diff:
From 68d363fbee484b40308a00a85329364ff0901e9b Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Tue, 25 Mar 2025 15:40:10 +0900
Subject: [PATCH v5] Add custom query jumble function for RangeTblEntry.eref

---
 src/include/nodes/parsenodes.h                |  11 +-
 src/backend/nodes/queryjumblefuncs.c          |  19 ++
 .../pg_stat_statements/expected/select.out    | 236 ++++++++++++++++++
 contrib/pg_stat_statements/sql/select.sql     |  69 +++++
 4 files changed, 332 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf2..a87f949b389e 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1050,8 +1050,13 @@ typedef struct RangeTblEntry
 	 */
 	/* user-written alias clause, if any */
 	Alias	   *alias pg_node_attr(query_jumble_ignore);
-	/* expanded reference names */
-	Alias	   *eref pg_node_attr(query_jumble_ignore);
+
+	/*
+	 * Expanded reference names.  This uses a custom query jumble function so
+	 * as the table name is included in the computation, not its list of
+	 * columns.
+	 */
+	Alias	   *eref pg_node_attr(custom_query_jumble);
 
 	RTEKind		rtekind;		/* see above */
 
@@ -1094,7 +1099,7 @@ typedef struct RangeTblEntry
 	 * tables to be invalidated if the underlying table is altered.
 	 */
 	/* OID of the relation */
-	Oid			relid;
+	Oid			relid pg_node_attr(query_jumble_ignore);
 	/* inheritance requested? */
 	bool		inh;
 	/* relation kind (see pg_class.relkind) */
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index f8b0f91704ba..62d6cfb7ac15 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -67,6 +67,9 @@ static void _jumbleElements(JumbleState *jstate, List *elements);
 static void _jumbleA_Const(JumbleState *jstate, Node *node);
 static void _jumbleList(JumbleState *jstate, Node *node);
 static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
+static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
+									  RangeTblEntry *rte,
+									  Alias *expr);
 
 /*
  * Given a possibly multi-statement source string, confine our attention to the
@@ -516,3 +519,19 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node)
 	JUMBLE_FIELD(is_local);
 	JUMBLE_LOCATION(location);
 }
+
+/*
+ * Custom query jumble function for RangeTblEntry.eref.
+ */
+static void
+_jumbleRangeTblEntry_eref(JumbleState *jstate,
+						  RangeTblEntry *rte,
+						  Alias *expr)
+{
+	JUMBLE_FIELD(type);
+
+	/*
+	 * This includes only the table name, the list of column names is ignored.
+	 */
+	JUMBLE_STRING(aliasname);
+}
diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 37a30af034a6..bf05d521e866 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -413,3 +413,239 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t
 (1 row)
 
+-- Temporary tables, grouped together.
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+ id 
+----
+(0 rows)
+
+COMMIT;
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+ id 
+----
+(0 rows)
+
+COMMIT;
+SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+ calls |                                 query                                  
+-------+------------------------------------------------------------------------
+     2 | SELECT * FROM temp_t
+     0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"
+     1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
+(3 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+-- search_path with various schemas and temporary tables
+CREATE SCHEMA pgss_schema_1;
+CREATE SCHEMA pgss_schema_2;
+-- Same attributes.
+CREATE TABLE pgss_schema_1.tab_search_same (a int, b int);
+CREATE TABLE pgss_schema_2.tab_search_same (a int, b int);
+CREATE TEMP TABLE tab_search_same (a int, b int);
+-- Different number of attributes, mapping types
+CREATE TABLE pgss_schema_1.tab_search_diff_1 (a int);
+CREATE TABLE pgss_schema_2.tab_search_diff_1 (a int, b int);
+CREATE TEMP TABLE tab_search_diff_1 (a int, b int, c int);
+-- Same number of attributes, different types
+CREATE TABLE pgss_schema_1.tab_search_diff_2 (a int);
+CREATE TABLE pgss_schema_2.tab_search_diff_2 (a text);
+CREATE TEMP TABLE tab_search_diff_2 (a bigint);
+-- First permanent schema
+SET search_path = 'pgss_schema_1';
+SELECT count(*) FROM tab_search_same;
+ count 
+-------
+     0
+(1 row)
+
+SELECT a, b FROM tab_search_same;
+ a | b 
+---+---
+(0 rows)
+
+SELECT count(*) FROM tab_search_diff_1;
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*) FROM tab_search_diff_2;
+ count 
+-------
+     0
+(1 row)
+
+SELECT a FROM tab_search_diff_2 AS t1;
+ a 
+---
+(0 rows)
+
+SELECT a FROM tab_search_diff_2;
+ a 
+---
+(0 rows)
+
+SELECT a AS a1 FROM tab_search_diff_2;
+ a1 
+----
+(0 rows)
+
+-- Second permanent schema
+SET search_path = 'pgss_schema_2';
+SELECT count(*) FROM tab_search_same;
+ count 
+-------
+     0
+(1 row)
+
+SELECT a, b FROM tab_search_same;
+ a | b 
+---+---
+(0 rows)
+
+SELECT count(*) FROM tab_search_diff_1;
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*) FROM tab_search_diff_2;
+ count 
+-------
+     0
+(1 row)
+
+SELECT a FROM tab_search_diff_2 AS t1;
+ a 
+---
+(0 rows)
+
+SELECT a FROM tab_search_diff_2;
+ a 
+---
+(0 rows)
+
+SELECT a AS a1 FROM tab_search_diff_2;
+ a1 
+----
+(0 rows)
+
+-- Temporary schema
+SET search_path = 'pg_temp';
+SELECT count(*) FROM tab_search_same;
+ count 
+-------
+     0
+(1 row)
+
+SELECT a, b FROM tab_search_same;
+ a | b 
+---+---
+(0 rows)
+
+SELECT count(*) FROM tab_search_diff_1;
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*) FROM tab_search_diff_2;
+ count 
+-------
+     0
+(1 row)
+
+SELECT a FROM tab_search_diff_2 AS t1;
+ a 
+---
+(0 rows)
+
+SELECT a FROM tab_search_diff_2;
+ a 
+---
+(0 rows)
+
+SELECT a AS a1 FROM tab_search_diff_2;
+ a1 
+----
+(0 rows)
+
+RESET search_path;
+-- Schema qualifications
+SELECT count(*) FROM pgss_schema_1.tab_search_same;
+ count 
+-------
+     0
+(1 row)
+
+SELECT a, b FROM pgss_schema_1.tab_search_same;
+ a | b 
+---+---
+(0 rows)
+
+SELECT count(*) FROM pgss_schema_2.tab_search_diff_1;
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*) FROM pg_temp.tab_search_diff_2;
+ count 
+-------
+     0
+(1 row)
+
+SELECT a FROM pgss_schema_2.tab_search_diff_2 AS t1;
+ a 
+---
+(0 rows)
+
+SELECT a FROM pgss_schema_2.tab_search_diff_2;
+ a 
+---
+(0 rows)
+
+SELECT a AS a1 FROM pgss_schema_2.tab_search_diff_2;
+ a1 
+----
+(0 rows)
+
+SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+ calls |                                 query                                  
+-------+------------------------------------------------------------------------
+     8 | SELECT a FROM tab_search_diff_2
+     4 | SELECT a FROM tab_search_diff_2 AS t1
+     4 | SELECT a, b FROM tab_search_same
+     0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"
+     4 | SELECT count(*) FROM tab_search_diff_1
+     4 | SELECT count(*) FROM tab_search_diff_2
+     4 | SELECT count(*) FROM tab_search_same
+     1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
+(8 rows)
+
+DROP SCHEMA pgss_schema_1 CASCADE;
+NOTICE:  drop cascades to 3 other objects
+DETAIL:  drop cascades to table pgss_schema_1.tab_search_same
+drop cascades to table pgss_schema_1.tab_search_diff_1
+drop cascades to table pgss_schema_1.tab_search_diff_2
+DROP SCHEMA pgss_schema_2 CASCADE;
+NOTICE:  drop cascades to 3 other objects
+DETAIL:  drop cascades to table pgss_schema_2.tab_search_same
+drop cascades to table pgss_schema_2.tab_search_diff_1
+drop cascades to table pgss_schema_2.tab_search_diff_2
+DROP TABLE tab_search_same, tab_search_diff_1, tab_search_diff_2;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index e0be58d5e24b..fbed557ec369 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -148,3 +148,72 @@ SELECT (
 
 SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%';
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
+-- Temporary tables, grouped together.
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+COMMIT;
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+COMMIT;
+SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
+-- search_path with various schemas and temporary tables
+CREATE SCHEMA pgss_schema_1;
+CREATE SCHEMA pgss_schema_2;
+-- Same attributes.
+CREATE TABLE pgss_schema_1.tab_search_same (a int, b int);
+CREATE TABLE pgss_schema_2.tab_search_same (a int, b int);
+CREATE TEMP TABLE tab_search_same (a int, b int);
+-- Different number of attributes, mapping types
+CREATE TABLE pgss_schema_1.tab_search_diff_1 (a int);
+CREATE TABLE pgss_schema_2.tab_search_diff_1 (a int, b int);
+CREATE TEMP TABLE tab_search_diff_1 (a int, b int, c int);
+-- Same number of attributes, different types
+CREATE TABLE pgss_schema_1.tab_search_diff_2 (a int);
+CREATE TABLE pgss_schema_2.tab_search_diff_2 (a text);
+CREATE TEMP TABLE tab_search_diff_2 (a bigint);
+-- First permanent schema
+SET search_path = 'pgss_schema_1';
+SELECT count(*) FROM tab_search_same;
+SELECT a, b FROM tab_search_same;
+SELECT count(*) FROM tab_search_diff_1;
+SELECT count(*) FROM tab_search_diff_2;
+SELECT a FROM tab_search_diff_2 AS t1;
+SELECT a FROM tab_search_diff_2;
+SELECT a AS a1 FROM tab_search_diff_2;
+-- Second permanent schema
+SET search_path = 'pgss_schema_2';
+SELECT count(*) FROM tab_search_same;
+SELECT a, b FROM tab_search_same;
+SELECT count(*) FROM tab_search_diff_1;
+SELECT count(*) FROM tab_search_diff_2;
+SELECT a FROM tab_search_diff_2 AS t1;
+SELECT a FROM tab_search_diff_2;
+SELECT a AS a1 FROM tab_search_diff_2;
+-- Temporary schema
+SET search_path = 'pg_temp';
+SELECT count(*) FROM tab_search_same;
+SELECT a, b FROM tab_search_same;
+SELECT count(*) FROM tab_search_diff_1;
+SELECT count(*) FROM tab_search_diff_2;
+SELECT a FROM tab_search_diff_2 AS t1;
+SELECT a FROM tab_search_diff_2;
+SELECT a AS a1 FROM tab_search_diff_2;
+RESET search_path;
+-- Schema qualifications
+SELECT count(*) FROM pgss_schema_1.tab_search_same;
+SELECT a, b FROM pgss_schema_1.tab_search_same;
+SELECT count(*) FROM pgss_schema_2.tab_search_diff_1;
+SELECT count(*) FROM pg_temp.tab_search_diff_2;
+SELECT a FROM pgss_schema_2.tab_search_diff_2 AS t1;
+SELECT a FROM pgss_schema_2.tab_search_diff_2;
+SELECT a AS a1 FROM pgss_schema_2.tab_search_diff_2;
+SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+DROP SCHEMA pgss_schema_1 CASCADE;
+DROP SCHEMA pgss_schema_2 CASCADE;
+DROP TABLE tab_search_same, tab_search_diff_1, tab_search_diff_2;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
-- 
2.49.0



  [application/pgp-signature] signature.asc (833B, 3-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 14:59  Sami Imseih <[email protected]>
  parent: Lukas Fittl <[email protected]>
  1 sibling, 0 replies; 19+ messages in thread

From: Sami Imseih @ 2025-03-25 14:59 UTC (permalink / raw)
  To: Lukas Fittl <[email protected]>; +Cc: Michael Paquier <[email protected]>; Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

> In my experience these often not work well with pg_stat_statements today because
> of their own bloat problem, just like with temp tables. You quickly have way too many
> unique entries, and your query text file accumulates a lot of duplicative entries
> (since the same query text gets repeated in the text file, since its queryid is different),
> to the point that you can't monitor your workload at all anymore.

That is true in terms of pg_stat_statements, and I have seen users
have to increase
pg_stat_statements.max to something much higher to avoid the bloat and constant
deallocs/GC.

But, besides pg_stat_statements, queryId is used to group queries for
database load
monitoring ( pg_stat_activity sampling). As of now, different schemas
are tracked
separately, but with this change they will be merged. This may come as
a surprise to
use-cases that rely on the existing behavior.

But I do agree that pg_s_s bloat is a big pain point, so this change
should be positive
overall. Let's see if there are enough complaints to force us to reconsider.

--
Sami Imseih
Amazon Web Services (AwS)

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 15:41  Christoph Berg <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Christoph Berg @ 2025-03-25 15:41 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Lukas Fittl <[email protected]>; Sami Imseih <[email protected]>; Tom Lane <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Re: Michael Paquier
> Attached is the second one, with more tests coverage with attribute
> aliases (these being ignored exists in stable branches, but why not
> while on it) and table aliases, and the fixes for the issues pointed
> out by Christoph.  I'll double-check all that again tomorrow. Please
> find an updated version attached for now.

Looks good to me.

Christoph






^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 16:58  Sami Imseih <[email protected]>
  parent: Christoph Berg <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Sami Imseih @ 2025-03-25 16:58 UTC (permalink / raw)
  To: Christoph Berg <[email protected]>; +Cc: Michael Paquier <[email protected]>; Lukas Fittl <[email protected]>; Tom Lane <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

> > Attached is the second one, with more tests coverage with attribute
> > aliases (these being ignored exists in stable branches, but why not
> > while on it) and table aliases, and the fixes for the issues pointed
> > out by Christoph.  I'll double-check all that again tomorrow. Please
> > find an updated version attached for now.

There are several parts of the doc that may no longer hold true.

1/
"Since the queryid hash value is computed on the post-parse-analysis
representation of the queries, the opposite is also possible: queries with
identical texts might appear as separate entries, if they have different
meanings as a result of factors such as different search_path settings."

I think this text could remain as is, because search_path still
matters for things like functions, etc.

"""
postgres=# SET SEARCH_PATH=a;
SET
postgres=# explain verbose select * from test();
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 1
 Query Identifier: -1813735303617154554
(3 rows)

postgres=# SET SEARCH_PATH=b;
SET
postgres=# explain verbose select * from test();
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 1
 Query Identifier: -3896107319863686763
(3 rows)
"""

2/
"For example, pg_stat_statements will consider two apparently-identical
queries to be distinct, if they reference a table that was dropped and
recreated between the executions of the two queries."

This is no longer true for relations, but is still true for functions. I think
we should mention the caveats in a bit more detail as this change
will have impact on the most common case. What about something
like this?

"For example, pg_stat_statements will consider two apparently-identical
queries to be distinct, if they reference a function that was dropped and
recreated between the executions of the two queries. Conversely, if a table is
dropped and recreated between the executions of queries, two
apparently-identical
queries will be considered the same. However, if the alias for a table is
different for semantically similar queries, these queries will be
considered distinct"

--
Sami Imseih
Amazon Web Services (AWS)

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 23:10  Michael Paquier <[email protected]>
  parent: Sami Imseih <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Michael Paquier @ 2025-03-25 23:10 UTC (permalink / raw)
  To: Sami Imseih <[email protected]>; +Cc: Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; Tom Lane <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Tue, Mar 25, 2025 at 11:58:21AM -0500, Sami Imseih wrote:
> "Since the queryid hash value is computed on the post-parse-analysis
> representation of the queries, the opposite is also possible: queries with
> identical texts might appear as separate entries, if they have different
> meanings as a result of factors such as different search_path settings."
> 
> I think this text could remain as is, because search_path still
> matters for things like functions, etc.

Yeah, I think that's OK as-is.  I'm open to more improvements,
including more tests for these function patterns.  It's one of these
areas where we should be able to tweak RangeTblFunction and apply a
custom function to its funcexpr, and please note that I have no idea
how complex it could become as this is a Node expression.  :D

Functions in a temporary schema is not something as common as temp
tables, I guess, so these matter less, but they would still be a cause
of bloat for monitoring in very specific workloads.

> 2/
> "For example, pg_stat_statements will consider two apparently-identical
> queries to be distinct, if they reference a table that was dropped and
> recreated between the executions of the two queries."
> 
> This is no longer true for relations, but is still true for functions. I think
> we should mention the caveats in a bit more detail as this change
> will have impact on the most common case. What about something
> like this?
> 
> "For example, pg_stat_statements will consider two apparently-identical
> queries to be distinct, if they reference a function that was dropped and
> recreated between the executions of the two queries.

That's a bit larger than functions, but we could remain a bit more
evasive, with "if they reference *for example* a function that was
dropped and recreated between the executions of the two queries".

Note that for DDLs, like CREATE TABLE, we also group entries with
identical relation names, so we are kind of in line with the patch,
not with the current docs.

> Conversely, if a table is dropped and recreated between the
> executions of queries, two apparently-identical queries may be
> considered the same. However, if the alias for a table is different
> for semantically similar queries, these queries will be considered distinct"

This addition sounds like an improvement here.

As this thread has proved, we had little coverage these cases in pgss,
so I've applied the tests as an independent change.  It is also useful
to track how things change in the commit history depending on how the
computation is tweaked.  I've also included your doc suggestions.  I
feel that we could do better here, but that's a common statement
anyway when it comes to the documentation.
--
Michael


Attachments:

  [text/x-diff] v6-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch (5.9K, 2-v6-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch)
  download | inline diff:
From 8ea61bb0d7d6c4dbbb40dbaedb5e751027163cfe Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Wed, 26 Mar 2025 08:07:59 +0900
Subject: [PATCH v6] Add custom query jumble function for RangeTblEntry.eref

---
 src/include/nodes/parsenodes.h                | 11 +++++++---
 src/backend/nodes/queryjumblefuncs.c          | 19 ++++++++++++++++++
 doc/src/sgml/pgstatstatements.sgml            |  9 +++++++--
 .../pg_stat_statements/expected/select.out    | 20 ++++++++-----------
 4 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf2..a87f949b389e 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1050,8 +1050,13 @@ typedef struct RangeTblEntry
 	 */
 	/* user-written alias clause, if any */
 	Alias	   *alias pg_node_attr(query_jumble_ignore);
-	/* expanded reference names */
-	Alias	   *eref pg_node_attr(query_jumble_ignore);
+
+	/*
+	 * Expanded reference names.  This uses a custom query jumble function so
+	 * as the table name is included in the computation, not its list of
+	 * columns.
+	 */
+	Alias	   *eref pg_node_attr(custom_query_jumble);
 
 	RTEKind		rtekind;		/* see above */
 
@@ -1094,7 +1099,7 @@ typedef struct RangeTblEntry
 	 * tables to be invalidated if the underlying table is altered.
 	 */
 	/* OID of the relation */
-	Oid			relid;
+	Oid			relid pg_node_attr(query_jumble_ignore);
 	/* inheritance requested? */
 	bool		inh;
 	/* relation kind (see pg_class.relkind) */
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index f8b0f91704ba..62d6cfb7ac15 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -67,6 +67,9 @@ static void _jumbleElements(JumbleState *jstate, List *elements);
 static void _jumbleA_Const(JumbleState *jstate, Node *node);
 static void _jumbleList(JumbleState *jstate, Node *node);
 static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
+static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
+									  RangeTblEntry *rte,
+									  Alias *expr);
 
 /*
  * Given a possibly multi-statement source string, confine our attention to the
@@ -516,3 +519,19 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node)
 	JUMBLE_FIELD(is_local);
 	JUMBLE_LOCATION(location);
 }
+
+/*
+ * Custom query jumble function for RangeTblEntry.eref.
+ */
+static void
+_jumbleRangeTblEntry_eref(JumbleState *jstate,
+						  RangeTblEntry *rte,
+						  Alias *expr)
+{
+	JUMBLE_FIELD(type);
+
+	/*
+	 * This includes only the table name, the list of column names is ignored.
+	 */
+	JUMBLE_STRING(aliasname);
+}
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index f4e384e95aea..679381f00607 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -675,8 +675,13 @@ calls | 2
    things, the internal object identifiers appearing in this representation.
    This has some counterintuitive implications.  For example,
    <filename>pg_stat_statements</filename> will consider two apparently-identical
-   queries to be distinct, if they reference a table that was dropped
-   and recreated between the executions of the two queries.
+   queries to be distinct, if they reference for example a function that was
+   dropped and recreated between the executions of the two queries.
+   Conversely, if a table is dropped and recreated between the
+   executions of queries, two apparently-identical queries may be
+   considered the same. However, if the alias for a table is different
+   for semantically similar queries, these queries will be considered
+   distinct.
    The hashing process is also sensitive to differences in
    machine architecture and other facets of the platform.
    Furthermore, it is not safe to assume that <structfield>queryid</structfield>
diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 708c6b0e9c41..1eebc2898ab9 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -433,11 +433,10 @@ COMMIT;
 SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
  calls |                                 query                                  
 -------+------------------------------------------------------------------------
-     1 | SELECT * FROM temp_t
-     1 | SELECT * FROM temp_t
+     2 | SELECT * FROM temp_t
      0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"
      1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
-(4 rows)
+(3 rows)
 
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t 
@@ -623,18 +622,15 @@ SELECT a AS a1 FROM pgss_schema_2.tab_search_diff_2;
 SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
  calls |                                 query                                  
 -------+------------------------------------------------------------------------
-     3 | SELECT a FROM pgss_schema_2.tab_search_diff_2 AS t1
-     9 | SELECT a FROM tab_search_diff_2 AS t1
-     1 | SELECT a, b FROM pgss_schema_1.tab_search_same
-     3 | SELECT a, b FROM tab_search_same
+     8 | SELECT a FROM tab_search_diff_2
+     4 | SELECT a FROM tab_search_diff_2 AS t1
+     4 | SELECT a, b FROM tab_search_same
      0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"
-     1 | SELECT count(*) FROM pgss_schema_1.tab_search_same
-     1 | SELECT count(*) FROM pgss_schema_2.tab_search_diff_1
-     3 | SELECT count(*) FROM tab_search_diff_1
+     4 | SELECT count(*) FROM tab_search_diff_1
      4 | SELECT count(*) FROM tab_search_diff_2
-     3 | SELECT count(*) FROM tab_search_same
+     4 | SELECT count(*) FROM tab_search_same
      1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
-(11 rows)
+(8 rows)
 
 DROP SCHEMA pgss_schema_1 CASCADE;
 NOTICE:  drop cascades to 3 other objects
-- 
2.49.0



  [application/pgp-signature] signature.asc (833B, 3-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-25 23:24  Tom Lane <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Tom Lane @ 2025-03-25 23:24 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Sami Imseih <[email protected]>; Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Michael Paquier <[email protected]> writes:
> [ v6-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch ]

Couple of minor review comments ...

* In 5ac462e2b, this bit:

        # node type
-       if (($t =~ /^(\w+)\*$/ or $t =~ /^struct\s+(\w+)\*$/)
+       if ($query_jumble_custom)
+       {
+           # Custom function that applies to one field of a node.
+           print $jff "\tJUMBLE_CUSTOM($n, $f);\n";
+       }
+       elsif (($t =~ /^(\w+)\*$/ or $t =~ /^struct\s+(\w+)\*$/)

fails to honor $query_jumble_ignore as the other if-branches do.
Perhaps it's unlikely that a node would have both query_jumble_custom
and query_jumble_ignore annotations, but still, the script would emit
completely incorrect code if it did.  Also, the "# node type" comment
should probably be moved down to within the first "elsif" block.

* In the v6 patch, this doesn't read very smoothly:

+     * Expanded reference names.  This uses a custom query jumble function so
+     * as the table name is included in the computation, not its list of
+     * columns.

Perhaps better

+     * Expanded reference names.  This uses a custom query jumble function so
+     * that the table name is included in the computation, but not its list of
+     * columns.

* Also, here:

+   considered the same. However, if the alias for a table is different
+   for semantically similar queries, these queries will be considered
+   distinct.

I'd change "semantically similar queries" to "otherwise-similar
queries"; I think writing "semantically" will just confuse people.

Otherwise LGTM.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-26 00:34  Michael Paquier <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 19+ messages in thread

From: Michael Paquier @ 2025-03-26 00:34 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Sami Imseih <[email protected]>; Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Tue, Mar 25, 2025 at 07:24:20PM -0400, Tom Lane wrote:
> fails to honor $query_jumble_ignore as the other if-branches do.
> Perhaps it's unlikely that a node would have both query_jumble_custom
> and query_jumble_ignore annotations, but still, the script would emit
> completely incorrect code if it did.  Also, the "# node type" comment
> should probably be moved down to within the first "elsif" block.

Oops, sorry about that.  Fixed both of these in 27ee6ede6bc9.

> I'd change "semantically similar queries" to "otherwise-similar
> queries"; I think writing "semantically" will just confuse people.

If I get the difference right, semantically would apply to concepts
related to linguistics, but that's not what we have here, so you are
using a more general term.

Thanks for the suggestions.
--
Michael


Attachments:

  [text/x-diff] v7-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch (5.9K, 2-v7-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch)
  download | inline diff:
From 30f8066989eac6f8c72034d3fb5150368c2821a9 Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Wed, 26 Mar 2025 09:17:41 +0900
Subject: [PATCH v7] Add custom query jumble function for RangeTblEntry.eref

---
 src/include/nodes/parsenodes.h                | 11 +++++++---
 src/backend/nodes/queryjumblefuncs.c          | 19 ++++++++++++++++++
 doc/src/sgml/pgstatstatements.sgml            |  9 +++++++--
 .../pg_stat_statements/expected/select.out    | 20 ++++++++-----------
 4 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf2..df331b1c0d99 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1050,8 +1050,13 @@ typedef struct RangeTblEntry
 	 */
 	/* user-written alias clause, if any */
 	Alias	   *alias pg_node_attr(query_jumble_ignore);
-	/* expanded reference names */
-	Alias	   *eref pg_node_attr(query_jumble_ignore);
+
+	/*
+	 * Expanded reference names.  This uses a custom query jumble function so
+	 * that the table name is included in the computation, but not its list of
+	 * columns.
+	 */
+	Alias	   *eref pg_node_attr(custom_query_jumble);
 
 	RTEKind		rtekind;		/* see above */
 
@@ -1094,7 +1099,7 @@ typedef struct RangeTblEntry
 	 * tables to be invalidated if the underlying table is altered.
 	 */
 	/* OID of the relation */
-	Oid			relid;
+	Oid			relid pg_node_attr(query_jumble_ignore);
 	/* inheritance requested? */
 	bool		inh;
 	/* relation kind (see pg_class.relkind) */
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index f8b0f91704ba..62d6cfb7ac15 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -67,6 +67,9 @@ static void _jumbleElements(JumbleState *jstate, List *elements);
 static void _jumbleA_Const(JumbleState *jstate, Node *node);
 static void _jumbleList(JumbleState *jstate, Node *node);
 static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
+static void _jumbleRangeTblEntry_eref(JumbleState *jstate,
+									  RangeTblEntry *rte,
+									  Alias *expr);
 
 /*
  * Given a possibly multi-statement source string, confine our attention to the
@@ -516,3 +519,19 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node)
 	JUMBLE_FIELD(is_local);
 	JUMBLE_LOCATION(location);
 }
+
+/*
+ * Custom query jumble function for RangeTblEntry.eref.
+ */
+static void
+_jumbleRangeTblEntry_eref(JumbleState *jstate,
+						  RangeTblEntry *rte,
+						  Alias *expr)
+{
+	JUMBLE_FIELD(type);
+
+	/*
+	 * This includes only the table name, the list of column names is ignored.
+	 */
+	JUMBLE_STRING(aliasname);
+}
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index f4e384e95aea..625b84ebfefd 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -675,8 +675,13 @@ calls | 2
    things, the internal object identifiers appearing in this representation.
    This has some counterintuitive implications.  For example,
    <filename>pg_stat_statements</filename> will consider two apparently-identical
-   queries to be distinct, if they reference a table that was dropped
-   and recreated between the executions of the two queries.
+   queries to be distinct, if they reference for example a function that was
+   dropped and recreated between the executions of the two queries.
+   Conversely, if a table is dropped and recreated between the
+   executions of queries, two apparently-identical queries may be
+   considered the same. However, if the alias for a table is different
+   for otherwise-similar queries, these queries will be considered
+   distinct.
    The hashing process is also sensitive to differences in
    machine architecture and other facets of the platform.
    Furthermore, it is not safe to assume that <structfield>queryid</structfield>
diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 708c6b0e9c41..1eebc2898ab9 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -433,11 +433,10 @@ COMMIT;
 SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
  calls |                                 query                                  
 -------+------------------------------------------------------------------------
-     1 | SELECT * FROM temp_t
-     1 | SELECT * FROM temp_t
+     2 | SELECT * FROM temp_t
      0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"
      1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
-(4 rows)
+(3 rows)
 
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t 
@@ -623,18 +622,15 @@ SELECT a AS a1 FROM pgss_schema_2.tab_search_diff_2;
 SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
  calls |                                 query                                  
 -------+------------------------------------------------------------------------
-     3 | SELECT a FROM pgss_schema_2.tab_search_diff_2 AS t1
-     9 | SELECT a FROM tab_search_diff_2 AS t1
-     1 | SELECT a, b FROM pgss_schema_1.tab_search_same
-     3 | SELECT a, b FROM tab_search_same
+     8 | SELECT a FROM tab_search_diff_2
+     4 | SELECT a FROM tab_search_diff_2 AS t1
+     4 | SELECT a, b FROM tab_search_same
      0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"
-     1 | SELECT count(*) FROM pgss_schema_1.tab_search_same
-     1 | SELECT count(*) FROM pgss_schema_2.tab_search_diff_1
-     3 | SELECT count(*) FROM tab_search_diff_1
+     4 | SELECT count(*) FROM tab_search_diff_1
      4 | SELECT count(*) FROM tab_search_diff_2
-     3 | SELECT count(*) FROM tab_search_same
+     4 | SELECT count(*) FROM tab_search_same
      1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
-(11 rows)
+(8 rows)
 
 DROP SCHEMA pgss_schema_1 CASCADE;
 NOTICE:  drop cascades to 3 other objects
-- 
2.49.0



  [application/pgp-signature] signature.asc (833B, 3-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-26 00:56  Sami Imseih <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 0 replies; 19+ messages in thread

From: Sami Imseih @ 2025-03-26 00:56 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Tom Lane <[email protected]>; Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>


> If I get the difference right, semantically would apply to concepts
> related to linguistics, but that's not what we have here, so you are
> using a more general term.

FWIW, the pg_stat_statements docs in a few places refer to 
queries that may look different but have the same meaning 
as “semantically equivalent”, this is why I used the same 
terminology here.  But, I have no issue with the simplified
rewrite either.

The patch LGTM as well. 

Regards,


Sami Imseih 
Amazon Web Services (AWS)





^ permalink  raw  reply  [nested|flat] 19+ messages in thread

end of thread, other threads:[~2025-03-26 00:56 UTC | newest]

Thread overview: 19+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-03-25 02:38 Re: query_id: jumble names of temp tables for better pg_stat_statement UX Sami Imseih <[email protected]>
2025-03-25 02:56 ` Tom Lane <[email protected]>
2025-03-25 03:30   ` Sami Imseih <[email protected]>
2025-03-25 03:53     ` Michael Paquier <[email protected]>
2025-03-25 04:30       ` Sami Imseih <[email protected]>
2025-03-25 04:32     ` Tom Lane <[email protected]>
2025-03-25 05:09       ` Julien Rouhaud <[email protected]>
2025-03-25 05:17         ` Tom Lane <[email protected]>
2025-03-25 05:28           ` Julien Rouhaud <[email protected]>
2025-03-25 03:51 ` Michael Paquier <[email protected]>
2025-03-25 06:09   ` Lukas Fittl <[email protected]>
2025-03-25 07:02     ` Michael Paquier <[email protected]>
2025-03-25 15:41       ` Christoph Berg <[email protected]>
2025-03-25 16:58         ` Sami Imseih <[email protected]>
2025-03-25 23:10           ` Michael Paquier <[email protected]>
2025-03-25 23:24             ` Tom Lane <[email protected]>
2025-03-26 00:34               ` Michael Paquier <[email protected]>
2025-03-26 00:56                 ` Sami Imseih <[email protected]>
2025-03-25 14:59     ` Sami Imseih <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox