public inbox for [email protected]help / color / mirror / Atom feed
Re: query_id: jumble names of temp tables for better pg_stat_statement UX 19+ messages / 6 participants [nested] [flat]
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 02:38 Sami Imseih <[email protected]> 0 siblings, 2 replies; 19+ messages in thread From: Sami Imseih @ 2025-03-25 02:38 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> > select * from foo s1; > select * from foo s2; ah, thanks for pointing this out. Not as good of a workaround as a comment since one must change aliases, but at least there is a workaround... > As against this, there is probably also a set of people who would > *like* identical queries on identically-rowtyped tables in different > schemas to be merged. Right now they have no way to make that happen. I agree that some may want stats to merge for the same tables, and others may not. I will suggest this with some hesitation, but why not make this behavior configurable via a GUC? We recently introduced query_id_squash_values for controlling the merge of an IN list, maybe this is another queryId behavior we should provide a configuration for? -- Sami Imseih Amazon Web Services (AWS) ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 02:56 Tom Lane <[email protected]> parent: Sami Imseih <[email protected]> 1 sibling, 1 reply; 19+ messages in thread From: Tom Lane @ 2025-03-25 02:56 UTC (permalink / raw) To: Sami Imseih <[email protected]>; +Cc: Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> Sami Imseih <[email protected]> writes: > I agree that some may want stats to merge for the same tables, > and others may not. I will suggest this with some hesitation, but why not > make this behavior configurable via a GUC? > We recently introduced query_id_squash_values for controlling > the merge of an IN list, maybe this is another queryId behavior we should > provide a configuration for? I don't like that GUC and I would not like this one either. We learned years ago that GUCs that change query semantics are a bad idea, but apparently now we have hackers who need to relearn that lesson the hard way. (Admittedly, this isn't quite *query* semantics, which perhaps lessens the blast radius. But I think we're still going to regret query_id_squash_values.) regards, tom lane ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 03:30 Sami Imseih <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 2 replies; 19+ messages in thread From: Sami Imseih @ 2025-03-25 03:30 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> > Sami Imseih <[email protected]> writes: > > I agree that some may want stats to merge for the same tables, > > and others may not. I will suggest this with some hesitation, but why not > > make this behavior configurable via a GUC? > > We recently introduced query_id_squash_values for controlling > > the merge of an IN list, maybe this is another queryId behavior we should > > provide a configuration for? > > I don't like that GUC and I would not like this one either. We > learned years ago that GUCs that change query semantics are a bad > idea, but apparently now we have hackers who need to relearn that > lesson the hard way. (Admittedly, this isn't quite *query* semantics, > which perhaps lessens the blast radius. But I think we're still going > to regret query_id_squash_values.) query_id_squash_values has a much weaker argument to exist than a guc to control the use of alias vs OID. Why would anyone not want to squash the IN list? maybe we should revisit this decision in that thread. With that said, if everyone is OK with the behavior change of pg_s_s with this patch, I will concede. FWIW, I do think this for most cases, the proposed behavior is desirable. I just worry about the users who rely on the OID being jumbled behavior, and have no way to revery. -- Sami Imseih Amazon Web Services (AWS) ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 03:51 Michael Paquier <[email protected]> parent: Sami Imseih <[email protected]> 1 sibling, 1 reply; 19+ messages in thread From: Michael Paquier @ 2025-03-25 03:51 UTC (permalink / raw) To: Sami Imseih <[email protected]>; +Cc: Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> On Mon, Mar 24, 2025 at 09:38:35PM -0500, Sami Imseih wrote: > > select * from foo s1; > > select * from foo s2; > > ah, thanks for pointing this out. Not as good of a workaround as > a comment since one must change aliases, but at least there is > a workaround... Exactly. Like Tom I'm not really worried about the proposal, but of course I could prove to be wrong. I am ready to assume that bloat in pgss entries caused by temp tables is a more common case. So let's move on with this proposal, before the gong. -- Michael Attachments: [application/pgp-signature] signature.asc (833B, 2-signature.asc) download ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 03:53 Michael Paquier <[email protected]> parent: Sami Imseih <[email protected]> 1 sibling, 1 reply; 19+ messages in thread From: Michael Paquier @ 2025-03-25 03:53 UTC (permalink / raw) To: Sami Imseih <[email protected]>; +Cc: Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> On Mon, Mar 24, 2025 at 10:30:59PM -0500, Sami Imseih wrote: >> Sami Imseih <[email protected]> writes: >>> I agree that some may want stats to merge for the same tables, >>> and others may not. I will suggest this with some hesitation, but why not >>> make this behavior configurable via a GUC? >>> We recently introduced query_id_squash_values for controlling >>> the merge of an IN list, maybe this is another queryId behavior we should >>> provide a configuration for? >> >> I don't like that GUC and I would not like this one either. We >> learned years ago that GUCs that change query semantics are a bad >> idea, but apparently now we have hackers who need to relearn that >> lesson the hard way. (Admittedly, this isn't quite *query* semantics, >> which perhaps lessens the blast radius. But I think we're still going >> to regret query_id_squash_values.) > > query_id_squash_values has a much weaker argument to exist than a > guc to control the use of alias vs OID. Why would anyone not want > to squash the IN list? maybe we should revisit this decision in that thread. This part of the thread is digressing, but I'd on the side of removing entirely the GUC and make the grouping of IN values the default. We still have time to discuss that during the beta cycle, so let's do so on its related thread. -- Michael Attachments: [application/pgp-signature] signature.asc (833B, 2-signature.asc) download ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 04:30 Sami Imseih <[email protected]> parent: Michael Paquier <[email protected]> 0 siblings, 0 replies; 19+ messages in thread From: Sami Imseih @ 2025-03-25 04:30 UTC (permalink / raw) To: Michael Paquier <[email protected]>; +Cc: Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> > Exactly. Like Tom I'm not really worried about the proposal, but of > course I could prove to be wrong. I am ready to assume that bloat in > pgss entries caused by temp tables is a more common case. I ran installcheck-parallel with the patch and without the patch 3x, and the reduction in pg_s_s bloat is visible. The same entries are reused even when the regression database is recreated, as expected. ## without patch run installcheck-parallel 3 times postgres=# select count(distinct queryid), count(*) from pg_stat_statements; count | count -------+------- 41630 | 74776 (1 row) ## with patch run installcheck-parallel 3 times postgres=# select count(distinct queryid), count(*) from pg_stat_statements; count | count -------+------- 26301 | 73030 (1 row) >This part of the thread is digressing, but I'd on the side of removing > entirely the GUC and make the grouping of IN values the default. We > still have time to discuss that during the beta cycle, so let's do so > on its related thread. I will do that tomorrow in that thread. It would not make sense to introduce a GUC for that behavior and not for this. So, I am glad we discussed this here. Looking at the patches, I could not find anything else besides what was pointed out by Christoph earlier. -- Sami Imseih Amazon Web Services (AWS) ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 04:32 Tom Lane <[email protected]> parent: Sami Imseih <[email protected]> 1 sibling, 1 reply; 19+ messages in thread From: Tom Lane @ 2025-03-25 04:32 UTC (permalink / raw) To: Sami Imseih <[email protected]>; +Cc: Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> Sami Imseih <[email protected]> writes: >> I don't like that GUC and I would not like this one either. We >> learned years ago that GUCs that change query semantics are a bad >> idea, but apparently now we have hackers who need to relearn that >> lesson the hard way. > query_id_squash_values has a much weaker argument to exist than a > guc to control the use of alias vs OID. Why would anyone not want > to squash the IN list? maybe we should revisit this decision in that thread. I'm not opining one way or the other on whether squashing an IN list is desirable. My point is that a GUC is a poor way to make --- or really, avoid making --- such decisions. The reasons we took away from previous experiments with semantics-determing GUCs were: 1. The scope of effect of a GUC is seldom what you want for such things. There are going to be some queries in which you want behavior A, and some in which you want behavior B, and in the worst case you want different behaviors in different parts of the same query. It's really painful to make that happen. 2. Tools that are not entitled to set the value of the GUC are forced to be prepared to cope with any setting. That can be anywhere from painful to impossible. For the specific context of controlling how query jumbling happens, there's still another problem: the actual statement-merging behavior of pg_stat_statements will depend on the totality of settings of the GUC across the entire system. It's not very clear to me what will happen if different sessions use different settings, much less if people change the setting intra-session; but I bet a lot of people will find it surprising. 62d712ecf did no one any favors by marking that GUC USERSET rather than something that would prevail system-wide. All of these remarks apply with equal force to anything that changes the behavior of query-jumbling, no matter the specific details. regards, tom lane ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 05:09 Julien Rouhaud <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 19+ messages in thread From: Julien Rouhaud @ 2025-03-25 05:09 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Sami Imseih <[email protected]>; Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> Hi, On Tue, Mar 25, 2025 at 12:32:05AM -0400, Tom Lane wrote: > > I'm not opining one way or the other on whether squashing an IN list > is desirable. My point is that a GUC is a poor way to make --- or > really, avoid making --- such decisions. The reasons we took away > from previous experiments with semantics-determing GUCs were: > > 1. The scope of effect of a GUC is seldom what you want for such > things. There are going to be some queries in which you want behavior > A, and some in which you want behavior B, and in the worst case you > want different behaviors in different parts of the same query. It's > really painful to make that happen. > > 2. Tools that are not entitled to set the value of the GUC are forced > to be prepared to cope with any setting. That can be anywhere from > painful to impossible. Didn't that ship already sailed in pg14 when we allowed generating custom query_id? Now: > For the specific context of controlling how query jumbling happens, > there's still another problem: the actual statement-merging behavior of > pg_stat_statements will depend on the totality of settings of the GUC > across the entire system. It's not very clear to me what will happen > if different sessions use different settings, much less if people > change the setting intra-session; but I bet a lot of people will find > it surprising. 62d712ecf did no one any favors by marking that GUC > USERSET rather than something that would prevail system-wide. One of the requirement for the custom query_id was that you shouldn't be allowed to change the algorithm dynamically, at least not unless a superuser agrees to maybe break everything, which is why compute_query_id is marked as PGC_SUSET. I think that the same reasoning should apply here and if the GUC is kept it should be at least at that level. ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 05:17 Tom Lane <[email protected]> parent: Julien Rouhaud <[email protected]> 0 siblings, 1 reply; 19+ messages in thread From: Tom Lane @ 2025-03-25 05:17 UTC (permalink / raw) To: Julien Rouhaud <[email protected]>; +Cc: Sami Imseih <[email protected]>; Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> Julien Rouhaud <[email protected]> writes: > On Tue, Mar 25, 2025 at 12:32:05AM -0400, Tom Lane wrote: >> 2. Tools that are not entitled to set the value of the GUC are forced >> to be prepared to cope with any setting. That can be anywhere from >> painful to impossible. > Didn't that ship already sailed in pg14 when we allowed generating custom > query_id? Up to a point, perhaps. If I'm writing some kind of tool that digests pg_stat_statements results, I think I'm entitled to disregard the possibility that somebody is using a custom query_id that behaves in ways I'm not expecting --- or at least, fixing my code for that is their problem not mine. But it's much harder to take that attitude for things that are built into core PG. >> For the specific context of controlling how query jumbling happens, >> there's still another problem: the actual statement-merging behavior of >> pg_stat_statements will depend on the totality of settings of the GUC >> across the entire system. > One of the requirement for the custom query_id was that you shouldn't be > allowed to change the algorithm dynamically, at least not unless a superuser > agrees to maybe break everything, which is why compute_query_id is marked as > PGC_SUSET. I think that the same reasoning should apply here and if the GUC is > kept it should be at least at that level. Fully agreed. (Even SUSET is debatable in this situation, but I'm okay with it on the principle that superusers are expected to know what they're doing and accept the consequences.) regards, tom lane ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 05:28 Julien Rouhaud <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 0 replies; 19+ messages in thread From: Julien Rouhaud @ 2025-03-25 05:28 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Sami Imseih <[email protected]>; Michael Paquier <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> On Tue, Mar 25, 2025 at 01:17:22AM -0400, Tom Lane wrote: > Julien Rouhaud <[email protected]> writes: > > On Tue, Mar 25, 2025 at 12:32:05AM -0400, Tom Lane wrote: > >> 2. Tools that are not entitled to set the value of the GUC are forced > >> to be prepared to cope with any setting. That can be anywhere from > >> painful to impossible. > > > Didn't that ship already sailed in pg14 when we allowed generating custom > > query_id? > > Up to a point, perhaps. If I'm writing some kind of tool that digests > pg_stat_statements results, I think I'm entitled to disregard the > possibility that somebody is using a custom query_id that behaves in > ways I'm not expecting --- or at least, fixing my code for that is > their problem not mine. But it's much harder to take that attitude > for things that are built into core PG. I see, that's fair. ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 06:09 Lukas Fittl <[email protected]> parent: Michael Paquier <[email protected]> 0 siblings, 2 replies; 19+ messages in thread From: Lukas Fittl @ 2025-03-25 06:09 UTC (permalink / raw) To: Michael Paquier <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> On Mon, Mar 24, 2025 at 8:51 PM Michael Paquier <[email protected]> wrote: > On Mon, Mar 24, 2025 at 09:38:35PM -0500, Sami Imseih wrote: > > > select * from foo s1; > > > select * from foo s2; > > > > ah, thanks for pointing this out. Not as good of a workaround as > > a comment since one must change aliases, but at least there is > > a workaround... > > Exactly. Like Tom I'm not really worried about the proposal, but of > course I could prove to be wrong. I am ready to assume that bloat in > pgss entries caused by temp tables is a more common case. > For what its worth, +1 on the current proposal in this thread (and doing it without a GUC), i.e. merging a query that references the same table alias, ignoring different schemas. In the context of the earlier referenced one-schema-per-customer workloads: In my experience these often not work well with pg_stat_statements today because of their own bloat problem, just like with temp tables. You quickly have way too many unique entries, and your query text file accumulates a lot of duplicative entries (since the same query text gets repeated in the text file, since its queryid is different), to the point that you can't monitor your workload at all anymore. Thanks, Lukas -- Lukas Fittl ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 07:02 Michael Paquier <[email protected]> parent: Lukas Fittl <[email protected]> 1 sibling, 1 reply; 19+ messages in thread From: Michael Paquier @ 2025-03-25 07:02 UTC (permalink / raw) To: Lukas Fittl <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> On Mon, Mar 24, 2025 at 11:09:06PM -0700, Lukas Fittl wrote: > For what its worth, +1 on the current proposal in this thread (and doing it > without a GUC), i.e. merging a query that references the same table alias, > ignoring different schemas. Thanks for the feedback. I have looked again at the first patch to add custom_query_jumble as a node field attribute, adjusted some comments, and applied it as 5ac462e2b7ac. Attached is the second one, with more tests coverage with attribute aliases (these being ignored exists in stable branches, but why not while on it) and table aliases, and the fixes for the issues pointed out by Christoph. I'll double-check all that again tomorrow. Please find an updated version attached for now. -- Michael Attachments: [text/x-diff] v5-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch (11.0K, 2-v5-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch) download | inline diff: From 68d363fbee484b40308a00a85329364ff0901e9b Mon Sep 17 00:00:00 2001 From: Michael Paquier <[email protected]> Date: Tue, 25 Mar 2025 15:40:10 +0900 Subject: [PATCH v5] Add custom query jumble function for RangeTblEntry.eref --- src/include/nodes/parsenodes.h | 11 +- src/backend/nodes/queryjumblefuncs.c | 19 ++ .../pg_stat_statements/expected/select.out | 236 ++++++++++++++++++ contrib/pg_stat_statements/sql/select.sql | 69 +++++ 4 files changed, 332 insertions(+), 3 deletions(-) diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 23c9e3c5abf2..a87f949b389e 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -1050,8 +1050,13 @@ typedef struct RangeTblEntry */ /* user-written alias clause, if any */ Alias *alias pg_node_attr(query_jumble_ignore); - /* expanded reference names */ - Alias *eref pg_node_attr(query_jumble_ignore); + + /* + * Expanded reference names. This uses a custom query jumble function so + * as the table name is included in the computation, not its list of + * columns. + */ + Alias *eref pg_node_attr(custom_query_jumble); RTEKind rtekind; /* see above */ @@ -1094,7 +1099,7 @@ typedef struct RangeTblEntry * tables to be invalidated if the underlying table is altered. */ /* OID of the relation */ - Oid relid; + Oid relid pg_node_attr(query_jumble_ignore); /* inheritance requested? */ bool inh; /* relation kind (see pg_class.relkind) */ diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c index f8b0f91704ba..62d6cfb7ac15 100644 --- a/src/backend/nodes/queryjumblefuncs.c +++ b/src/backend/nodes/queryjumblefuncs.c @@ -67,6 +67,9 @@ static void _jumbleElements(JumbleState *jstate, List *elements); static void _jumbleA_Const(JumbleState *jstate, Node *node); static void _jumbleList(JumbleState *jstate, Node *node); static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node); +static void _jumbleRangeTblEntry_eref(JumbleState *jstate, + RangeTblEntry *rte, + Alias *expr); /* * Given a possibly multi-statement source string, confine our attention to the @@ -516,3 +519,19 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node) JUMBLE_FIELD(is_local); JUMBLE_LOCATION(location); } + +/* + * Custom query jumble function for RangeTblEntry.eref. + */ +static void +_jumbleRangeTblEntry_eref(JumbleState *jstate, + RangeTblEntry *rte, + Alias *expr) +{ + JUMBLE_FIELD(type); + + /* + * This includes only the table name, the list of column names is ignored. + */ + JUMBLE_STRING(aliasname); +} diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out index 37a30af034a6..bf05d521e866 100644 --- a/contrib/pg_stat_statements/expected/select.out +++ b/contrib/pg_stat_statements/expected/select.out @@ -413,3 +413,239 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t; t (1 row) +-- Temporary tables, grouped together. +BEGIN; + CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP; + SELECT * FROM temp_t; + id +---- +(0 rows) + +COMMIT; +BEGIN; + CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP; + SELECT * FROM temp_t; + id +---- +(0 rows) + +COMMIT; +SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; + calls | query +-------+------------------------------------------------------------------------ + 2 | SELECT * FROM temp_t + 0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C" + 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t +(3 rows) + +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +-- search_path with various schemas and temporary tables +CREATE SCHEMA pgss_schema_1; +CREATE SCHEMA pgss_schema_2; +-- Same attributes. +CREATE TABLE pgss_schema_1.tab_search_same (a int, b int); +CREATE TABLE pgss_schema_2.tab_search_same (a int, b int); +CREATE TEMP TABLE tab_search_same (a int, b int); +-- Different number of attributes, mapping types +CREATE TABLE pgss_schema_1.tab_search_diff_1 (a int); +CREATE TABLE pgss_schema_2.tab_search_diff_1 (a int, b int); +CREATE TEMP TABLE tab_search_diff_1 (a int, b int, c int); +-- Same number of attributes, different types +CREATE TABLE pgss_schema_1.tab_search_diff_2 (a int); +CREATE TABLE pgss_schema_2.tab_search_diff_2 (a text); +CREATE TEMP TABLE tab_search_diff_2 (a bigint); +-- First permanent schema +SET search_path = 'pgss_schema_1'; +SELECT count(*) FROM tab_search_same; + count +------- + 0 +(1 row) + +SELECT a, b FROM tab_search_same; + a | b +---+--- +(0 rows) + +SELECT count(*) FROM tab_search_diff_1; + count +------- + 0 +(1 row) + +SELECT count(*) FROM tab_search_diff_2; + count +------- + 0 +(1 row) + +SELECT a FROM tab_search_diff_2 AS t1; + a +--- +(0 rows) + +SELECT a FROM tab_search_diff_2; + a +--- +(0 rows) + +SELECT a AS a1 FROM tab_search_diff_2; + a1 +---- +(0 rows) + +-- Second permanent schema +SET search_path = 'pgss_schema_2'; +SELECT count(*) FROM tab_search_same; + count +------- + 0 +(1 row) + +SELECT a, b FROM tab_search_same; + a | b +---+--- +(0 rows) + +SELECT count(*) FROM tab_search_diff_1; + count +------- + 0 +(1 row) + +SELECT count(*) FROM tab_search_diff_2; + count +------- + 0 +(1 row) + +SELECT a FROM tab_search_diff_2 AS t1; + a +--- +(0 rows) + +SELECT a FROM tab_search_diff_2; + a +--- +(0 rows) + +SELECT a AS a1 FROM tab_search_diff_2; + a1 +---- +(0 rows) + +-- Temporary schema +SET search_path = 'pg_temp'; +SELECT count(*) FROM tab_search_same; + count +------- + 0 +(1 row) + +SELECT a, b FROM tab_search_same; + a | b +---+--- +(0 rows) + +SELECT count(*) FROM tab_search_diff_1; + count +------- + 0 +(1 row) + +SELECT count(*) FROM tab_search_diff_2; + count +------- + 0 +(1 row) + +SELECT a FROM tab_search_diff_2 AS t1; + a +--- +(0 rows) + +SELECT a FROM tab_search_diff_2; + a +--- +(0 rows) + +SELECT a AS a1 FROM tab_search_diff_2; + a1 +---- +(0 rows) + +RESET search_path; +-- Schema qualifications +SELECT count(*) FROM pgss_schema_1.tab_search_same; + count +------- + 0 +(1 row) + +SELECT a, b FROM pgss_schema_1.tab_search_same; + a | b +---+--- +(0 rows) + +SELECT count(*) FROM pgss_schema_2.tab_search_diff_1; + count +------- + 0 +(1 row) + +SELECT count(*) FROM pg_temp.tab_search_diff_2; + count +------- + 0 +(1 row) + +SELECT a FROM pgss_schema_2.tab_search_diff_2 AS t1; + a +--- +(0 rows) + +SELECT a FROM pgss_schema_2.tab_search_diff_2; + a +--- +(0 rows) + +SELECT a AS a1 FROM pgss_schema_2.tab_search_diff_2; + a1 +---- +(0 rows) + +SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; + calls | query +-------+------------------------------------------------------------------------ + 8 | SELECT a FROM tab_search_diff_2 + 4 | SELECT a FROM tab_search_diff_2 AS t1 + 4 | SELECT a, b FROM tab_search_same + 0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C" + 4 | SELECT count(*) FROM tab_search_diff_1 + 4 | SELECT count(*) FROM tab_search_diff_2 + 4 | SELECT count(*) FROM tab_search_same + 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t +(8 rows) + +DROP SCHEMA pgss_schema_1 CASCADE; +NOTICE: drop cascades to 3 other objects +DETAIL: drop cascades to table pgss_schema_1.tab_search_same +drop cascades to table pgss_schema_1.tab_search_diff_1 +drop cascades to table pgss_schema_1.tab_search_diff_2 +DROP SCHEMA pgss_schema_2 CASCADE; +NOTICE: drop cascades to 3 other objects +DETAIL: drop cascades to table pgss_schema_2.tab_search_same +drop cascades to table pgss_schema_2.tab_search_diff_1 +drop cascades to table pgss_schema_2.tab_search_diff_2 +DROP TABLE tab_search_same, tab_search_diff_1, tab_search_diff_2; +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql index e0be58d5e24b..fbed557ec369 100644 --- a/contrib/pg_stat_statements/sql/select.sql +++ b/contrib/pg_stat_statements/sql/select.sql @@ -148,3 +148,72 @@ SELECT ( SELECT COUNT(*) FROM pg_stat_statements WHERE query LIKE '%SELECT GROUPING%'; SELECT pg_stat_statements_reset() IS NOT NULL AS t; + +-- Temporary tables, grouped together. +BEGIN; + CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP; + SELECT * FROM temp_t; +COMMIT; +BEGIN; + CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP; + SELECT * FROM temp_t; +COMMIT; +SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + +-- search_path with various schemas and temporary tables +CREATE SCHEMA pgss_schema_1; +CREATE SCHEMA pgss_schema_2; +-- Same attributes. +CREATE TABLE pgss_schema_1.tab_search_same (a int, b int); +CREATE TABLE pgss_schema_2.tab_search_same (a int, b int); +CREATE TEMP TABLE tab_search_same (a int, b int); +-- Different number of attributes, mapping types +CREATE TABLE pgss_schema_1.tab_search_diff_1 (a int); +CREATE TABLE pgss_schema_2.tab_search_diff_1 (a int, b int); +CREATE TEMP TABLE tab_search_diff_1 (a int, b int, c int); +-- Same number of attributes, different types +CREATE TABLE pgss_schema_1.tab_search_diff_2 (a int); +CREATE TABLE pgss_schema_2.tab_search_diff_2 (a text); +CREATE TEMP TABLE tab_search_diff_2 (a bigint); +-- First permanent schema +SET search_path = 'pgss_schema_1'; +SELECT count(*) FROM tab_search_same; +SELECT a, b FROM tab_search_same; +SELECT count(*) FROM tab_search_diff_1; +SELECT count(*) FROM tab_search_diff_2; +SELECT a FROM tab_search_diff_2 AS t1; +SELECT a FROM tab_search_diff_2; +SELECT a AS a1 FROM tab_search_diff_2; +-- Second permanent schema +SET search_path = 'pgss_schema_2'; +SELECT count(*) FROM tab_search_same; +SELECT a, b FROM tab_search_same; +SELECT count(*) FROM tab_search_diff_1; +SELECT count(*) FROM tab_search_diff_2; +SELECT a FROM tab_search_diff_2 AS t1; +SELECT a FROM tab_search_diff_2; +SELECT a AS a1 FROM tab_search_diff_2; +-- Temporary schema +SET search_path = 'pg_temp'; +SELECT count(*) FROM tab_search_same; +SELECT a, b FROM tab_search_same; +SELECT count(*) FROM tab_search_diff_1; +SELECT count(*) FROM tab_search_diff_2; +SELECT a FROM tab_search_diff_2 AS t1; +SELECT a FROM tab_search_diff_2; +SELECT a AS a1 FROM tab_search_diff_2; +RESET search_path; +-- Schema qualifications +SELECT count(*) FROM pgss_schema_1.tab_search_same; +SELECT a, b FROM pgss_schema_1.tab_search_same; +SELECT count(*) FROM pgss_schema_2.tab_search_diff_1; +SELECT count(*) FROM pg_temp.tab_search_diff_2; +SELECT a FROM pgss_schema_2.tab_search_diff_2 AS t1; +SELECT a FROM pgss_schema_2.tab_search_diff_2; +SELECT a AS a1 FROM pgss_schema_2.tab_search_diff_2; +SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; +DROP SCHEMA pgss_schema_1 CASCADE; +DROP SCHEMA pgss_schema_2 CASCADE; +DROP TABLE tab_search_same, tab_search_diff_1, tab_search_diff_2; +SELECT pg_stat_statements_reset() IS NOT NULL AS t; -- 2.49.0 [application/pgp-signature] signature.asc (833B, 3-signature.asc) download ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 14:59 Sami Imseih <[email protected]> parent: Lukas Fittl <[email protected]> 1 sibling, 0 replies; 19+ messages in thread From: Sami Imseih @ 2025-03-25 14:59 UTC (permalink / raw) To: Lukas Fittl <[email protected]>; +Cc: Michael Paquier <[email protected]>; Tom Lane <[email protected]>; Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> > In my experience these often not work well with pg_stat_statements today because > of their own bloat problem, just like with temp tables. You quickly have way too many > unique entries, and your query text file accumulates a lot of duplicative entries > (since the same query text gets repeated in the text file, since its queryid is different), > to the point that you can't monitor your workload at all anymore. That is true in terms of pg_stat_statements, and I have seen users have to increase pg_stat_statements.max to something much higher to avoid the bloat and constant deallocs/GC. But, besides pg_stat_statements, queryId is used to group queries for database load monitoring ( pg_stat_activity sampling). As of now, different schemas are tracked separately, but with this change they will be merged. This may come as a surprise to use-cases that rely on the existing behavior. But I do agree that pg_s_s bloat is a big pain point, so this change should be positive overall. Let's see if there are enough complaints to force us to reconsider. -- Sami Imseih Amazon Web Services (AwS) ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 15:41 Christoph Berg <[email protected]> parent: Michael Paquier <[email protected]> 0 siblings, 1 reply; 19+ messages in thread From: Christoph Berg @ 2025-03-25 15:41 UTC (permalink / raw) To: Michael Paquier <[email protected]>; +Cc: Lukas Fittl <[email protected]>; Sami Imseih <[email protected]>; Tom Lane <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> Re: Michael Paquier > Attached is the second one, with more tests coverage with attribute > aliases (these being ignored exists in stable branches, but why not > while on it) and table aliases, and the fixes for the issues pointed > out by Christoph. I'll double-check all that again tomorrow. Please > find an updated version attached for now. Looks good to me. Christoph ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 16:58 Sami Imseih <[email protected]> parent: Christoph Berg <[email protected]> 0 siblings, 1 reply; 19+ messages in thread From: Sami Imseih @ 2025-03-25 16:58 UTC (permalink / raw) To: Christoph Berg <[email protected]>; +Cc: Michael Paquier <[email protected]>; Lukas Fittl <[email protected]>; Tom Lane <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> > > Attached is the second one, with more tests coverage with attribute > > aliases (these being ignored exists in stable branches, but why not > > while on it) and table aliases, and the fixes for the issues pointed > > out by Christoph. I'll double-check all that again tomorrow. Please > > find an updated version attached for now. There are several parts of the doc that may no longer hold true. 1/ "Since the queryid hash value is computed on the post-parse-analysis representation of the queries, the opposite is also possible: queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings." I think this text could remain as is, because search_path still matters for things like functions, etc. """ postgres=# SET SEARCH_PATH=a; SET postgres=# explain verbose select * from test(); QUERY PLAN ------------------------------------------ Result (cost=0.00..0.01 rows=1 width=4) Output: 1 Query Identifier: -1813735303617154554 (3 rows) postgres=# SET SEARCH_PATH=b; SET postgres=# explain verbose select * from test(); QUERY PLAN ------------------------------------------ Result (cost=0.00..0.01 rows=1 width=4) Output: 1 Query Identifier: -3896107319863686763 (3 rows) """ 2/ "For example, pg_stat_statements will consider two apparently-identical queries to be distinct, if they reference a table that was dropped and recreated between the executions of the two queries." This is no longer true for relations, but is still true for functions. I think we should mention the caveats in a bit more detail as this change will have impact on the most common case. What about something like this? "For example, pg_stat_statements will consider two apparently-identical queries to be distinct, if they reference a function that was dropped and recreated between the executions of the two queries. Conversely, if a table is dropped and recreated between the executions of queries, two apparently-identical queries will be considered the same. However, if the alias for a table is different for semantically similar queries, these queries will be considered distinct" -- Sami Imseih Amazon Web Services (AWS) ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 23:10 Michael Paquier <[email protected]> parent: Sami Imseih <[email protected]> 0 siblings, 1 reply; 19+ messages in thread From: Michael Paquier @ 2025-03-25 23:10 UTC (permalink / raw) To: Sami Imseih <[email protected]>; +Cc: Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; Tom Lane <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> On Tue, Mar 25, 2025 at 11:58:21AM -0500, Sami Imseih wrote: > "Since the queryid hash value is computed on the post-parse-analysis > representation of the queries, the opposite is also possible: queries with > identical texts might appear as separate entries, if they have different > meanings as a result of factors such as different search_path settings." > > I think this text could remain as is, because search_path still > matters for things like functions, etc. Yeah, I think that's OK as-is. I'm open to more improvements, including more tests for these function patterns. It's one of these areas where we should be able to tweak RangeTblFunction and apply a custom function to its funcexpr, and please note that I have no idea how complex it could become as this is a Node expression. :D Functions in a temporary schema is not something as common as temp tables, I guess, so these matter less, but they would still be a cause of bloat for monitoring in very specific workloads. > 2/ > "For example, pg_stat_statements will consider two apparently-identical > queries to be distinct, if they reference a table that was dropped and > recreated between the executions of the two queries." > > This is no longer true for relations, but is still true for functions. I think > we should mention the caveats in a bit more detail as this change > will have impact on the most common case. What about something > like this? > > "For example, pg_stat_statements will consider two apparently-identical > queries to be distinct, if they reference a function that was dropped and > recreated between the executions of the two queries. That's a bit larger than functions, but we could remain a bit more evasive, with "if they reference *for example* a function that was dropped and recreated between the executions of the two queries". Note that for DDLs, like CREATE TABLE, we also group entries with identical relation names, so we are kind of in line with the patch, not with the current docs. > Conversely, if a table is dropped and recreated between the > executions of queries, two apparently-identical queries may be > considered the same. However, if the alias for a table is different > for semantically similar queries, these queries will be considered distinct" This addition sounds like an improvement here. As this thread has proved, we had little coverage these cases in pgss, so I've applied the tests as an independent change. It is also useful to track how things change in the commit history depending on how the computation is tweaked. I've also included your doc suggestions. I feel that we could do better here, but that's a common statement anyway when it comes to the documentation. -- Michael Attachments: [text/x-diff] v6-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch (5.9K, 2-v6-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch) download | inline diff: From 8ea61bb0d7d6c4dbbb40dbaedb5e751027163cfe Mon Sep 17 00:00:00 2001 From: Michael Paquier <[email protected]> Date: Wed, 26 Mar 2025 08:07:59 +0900 Subject: [PATCH v6] Add custom query jumble function for RangeTblEntry.eref --- src/include/nodes/parsenodes.h | 11 +++++++--- src/backend/nodes/queryjumblefuncs.c | 19 ++++++++++++++++++ doc/src/sgml/pgstatstatements.sgml | 9 +++++++-- .../pg_stat_statements/expected/select.out | 20 ++++++++----------- 4 files changed, 42 insertions(+), 17 deletions(-) diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 23c9e3c5abf2..a87f949b389e 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -1050,8 +1050,13 @@ typedef struct RangeTblEntry */ /* user-written alias clause, if any */ Alias *alias pg_node_attr(query_jumble_ignore); - /* expanded reference names */ - Alias *eref pg_node_attr(query_jumble_ignore); + + /* + * Expanded reference names. This uses a custom query jumble function so + * as the table name is included in the computation, not its list of + * columns. + */ + Alias *eref pg_node_attr(custom_query_jumble); RTEKind rtekind; /* see above */ @@ -1094,7 +1099,7 @@ typedef struct RangeTblEntry * tables to be invalidated if the underlying table is altered. */ /* OID of the relation */ - Oid relid; + Oid relid pg_node_attr(query_jumble_ignore); /* inheritance requested? */ bool inh; /* relation kind (see pg_class.relkind) */ diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c index f8b0f91704ba..62d6cfb7ac15 100644 --- a/src/backend/nodes/queryjumblefuncs.c +++ b/src/backend/nodes/queryjumblefuncs.c @@ -67,6 +67,9 @@ static void _jumbleElements(JumbleState *jstate, List *elements); static void _jumbleA_Const(JumbleState *jstate, Node *node); static void _jumbleList(JumbleState *jstate, Node *node); static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node); +static void _jumbleRangeTblEntry_eref(JumbleState *jstate, + RangeTblEntry *rte, + Alias *expr); /* * Given a possibly multi-statement source string, confine our attention to the @@ -516,3 +519,19 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node) JUMBLE_FIELD(is_local); JUMBLE_LOCATION(location); } + +/* + * Custom query jumble function for RangeTblEntry.eref. + */ +static void +_jumbleRangeTblEntry_eref(JumbleState *jstate, + RangeTblEntry *rte, + Alias *expr) +{ + JUMBLE_FIELD(type); + + /* + * This includes only the table name, the list of column names is ignored. + */ + JUMBLE_STRING(aliasname); +} diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml index f4e384e95aea..679381f00607 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -675,8 +675,13 @@ calls | 2 things, the internal object identifiers appearing in this representation. This has some counterintuitive implications. For example, <filename>pg_stat_statements</filename> will consider two apparently-identical - queries to be distinct, if they reference a table that was dropped - and recreated between the executions of the two queries. + queries to be distinct, if they reference for example a function that was + dropped and recreated between the executions of the two queries. + Conversely, if a table is dropped and recreated between the + executions of queries, two apparently-identical queries may be + considered the same. However, if the alias for a table is different + for semantically similar queries, these queries will be considered + distinct. The hashing process is also sensitive to differences in machine architecture and other facets of the platform. Furthermore, it is not safe to assume that <structfield>queryid</structfield> diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out index 708c6b0e9c41..1eebc2898ab9 100644 --- a/contrib/pg_stat_statements/expected/select.out +++ b/contrib/pg_stat_statements/expected/select.out @@ -433,11 +433,10 @@ COMMIT; SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; calls | query -------+------------------------------------------------------------------------ - 1 | SELECT * FROM temp_t - 1 | SELECT * FROM temp_t + 2 | SELECT * FROM temp_t 0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C" 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t -(4 rows) +(3 rows) SELECT pg_stat_statements_reset() IS NOT NULL AS t; t @@ -623,18 +622,15 @@ SELECT a AS a1 FROM pgss_schema_2.tab_search_diff_2; SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; calls | query -------+------------------------------------------------------------------------ - 3 | SELECT a FROM pgss_schema_2.tab_search_diff_2 AS t1 - 9 | SELECT a FROM tab_search_diff_2 AS t1 - 1 | SELECT a, b FROM pgss_schema_1.tab_search_same - 3 | SELECT a, b FROM tab_search_same + 8 | SELECT a FROM tab_search_diff_2 + 4 | SELECT a FROM tab_search_diff_2 AS t1 + 4 | SELECT a, b FROM tab_search_same 0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C" - 1 | SELECT count(*) FROM pgss_schema_1.tab_search_same - 1 | SELECT count(*) FROM pgss_schema_2.tab_search_diff_1 - 3 | SELECT count(*) FROM tab_search_diff_1 + 4 | SELECT count(*) FROM tab_search_diff_1 4 | SELECT count(*) FROM tab_search_diff_2 - 3 | SELECT count(*) FROM tab_search_same + 4 | SELECT count(*) FROM tab_search_same 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t -(11 rows) +(8 rows) DROP SCHEMA pgss_schema_1 CASCADE; NOTICE: drop cascades to 3 other objects -- 2.49.0 [application/pgp-signature] signature.asc (833B, 3-signature.asc) download ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-25 23:24 Tom Lane <[email protected]> parent: Michael Paquier <[email protected]> 0 siblings, 1 reply; 19+ messages in thread From: Tom Lane @ 2025-03-25 23:24 UTC (permalink / raw) To: Michael Paquier <[email protected]>; +Cc: Sami Imseih <[email protected]>; Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> Michael Paquier <[email protected]> writes: > [ v6-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch ] Couple of minor review comments ... * In 5ac462e2b, this bit: # node type - if (($t =~ /^(\w+)\*$/ or $t =~ /^struct\s+(\w+)\*$/) + if ($query_jumble_custom) + { + # Custom function that applies to one field of a node. + print $jff "\tJUMBLE_CUSTOM($n, $f);\n"; + } + elsif (($t =~ /^(\w+)\*$/ or $t =~ /^struct\s+(\w+)\*$/) fails to honor $query_jumble_ignore as the other if-branches do. Perhaps it's unlikely that a node would have both query_jumble_custom and query_jumble_ignore annotations, but still, the script would emit completely incorrect code if it did. Also, the "# node type" comment should probably be moved down to within the first "elsif" block. * In the v6 patch, this doesn't read very smoothly: + * Expanded reference names. This uses a custom query jumble function so + * as the table name is included in the computation, not its list of + * columns. Perhaps better + * Expanded reference names. This uses a custom query jumble function so + * that the table name is included in the computation, but not its list of + * columns. * Also, here: + considered the same. However, if the alias for a table is different + for semantically similar queries, these queries will be considered + distinct. I'd change "semantically similar queries" to "otherwise-similar queries"; I think writing "semantically" will just confuse people. Otherwise LGTM. regards, tom lane ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-26 00:34 Michael Paquier <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 19+ messages in thread From: Michael Paquier @ 2025-03-26 00:34 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Sami Imseih <[email protected]>; Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> On Tue, Mar 25, 2025 at 07:24:20PM -0400, Tom Lane wrote: > fails to honor $query_jumble_ignore as the other if-branches do. > Perhaps it's unlikely that a node would have both query_jumble_custom > and query_jumble_ignore annotations, but still, the script would emit > completely incorrect code if it did. Also, the "# node type" comment > should probably be moved down to within the first "elsif" block. Oops, sorry about that. Fixed both of these in 27ee6ede6bc9. > I'd change "semantically similar queries" to "otherwise-similar > queries"; I think writing "semantically" will just confuse people. If I get the difference right, semantically would apply to concepts related to linguistics, but that's not what we have here, so you are using a more general term. Thanks for the suggestions. -- Michael Attachments: [text/x-diff] v7-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch (5.9K, 2-v7-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch) download | inline diff: From 30f8066989eac6f8c72034d3fb5150368c2821a9 Mon Sep 17 00:00:00 2001 From: Michael Paquier <[email protected]> Date: Wed, 26 Mar 2025 09:17:41 +0900 Subject: [PATCH v7] Add custom query jumble function for RangeTblEntry.eref --- src/include/nodes/parsenodes.h | 11 +++++++--- src/backend/nodes/queryjumblefuncs.c | 19 ++++++++++++++++++ doc/src/sgml/pgstatstatements.sgml | 9 +++++++-- .../pg_stat_statements/expected/select.out | 20 ++++++++----------- 4 files changed, 42 insertions(+), 17 deletions(-) diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 23c9e3c5abf2..df331b1c0d99 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -1050,8 +1050,13 @@ typedef struct RangeTblEntry */ /* user-written alias clause, if any */ Alias *alias pg_node_attr(query_jumble_ignore); - /* expanded reference names */ - Alias *eref pg_node_attr(query_jumble_ignore); + + /* + * Expanded reference names. This uses a custom query jumble function so + * that the table name is included in the computation, but not its list of + * columns. + */ + Alias *eref pg_node_attr(custom_query_jumble); RTEKind rtekind; /* see above */ @@ -1094,7 +1099,7 @@ typedef struct RangeTblEntry * tables to be invalidated if the underlying table is altered. */ /* OID of the relation */ - Oid relid; + Oid relid pg_node_attr(query_jumble_ignore); /* inheritance requested? */ bool inh; /* relation kind (see pg_class.relkind) */ diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c index f8b0f91704ba..62d6cfb7ac15 100644 --- a/src/backend/nodes/queryjumblefuncs.c +++ b/src/backend/nodes/queryjumblefuncs.c @@ -67,6 +67,9 @@ static void _jumbleElements(JumbleState *jstate, List *elements); static void _jumbleA_Const(JumbleState *jstate, Node *node); static void _jumbleList(JumbleState *jstate, Node *node); static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node); +static void _jumbleRangeTblEntry_eref(JumbleState *jstate, + RangeTblEntry *rte, + Alias *expr); /* * Given a possibly multi-statement source string, confine our attention to the @@ -516,3 +519,19 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node) JUMBLE_FIELD(is_local); JUMBLE_LOCATION(location); } + +/* + * Custom query jumble function for RangeTblEntry.eref. + */ +static void +_jumbleRangeTblEntry_eref(JumbleState *jstate, + RangeTblEntry *rte, + Alias *expr) +{ + JUMBLE_FIELD(type); + + /* + * This includes only the table name, the list of column names is ignored. + */ + JUMBLE_STRING(aliasname); +} diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml index f4e384e95aea..625b84ebfefd 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -675,8 +675,13 @@ calls | 2 things, the internal object identifiers appearing in this representation. This has some counterintuitive implications. For example, <filename>pg_stat_statements</filename> will consider two apparently-identical - queries to be distinct, if they reference a table that was dropped - and recreated between the executions of the two queries. + queries to be distinct, if they reference for example a function that was + dropped and recreated between the executions of the two queries. + Conversely, if a table is dropped and recreated between the + executions of queries, two apparently-identical queries may be + considered the same. However, if the alias for a table is different + for otherwise-similar queries, these queries will be considered + distinct. The hashing process is also sensitive to differences in machine architecture and other facets of the platform. Furthermore, it is not safe to assume that <structfield>queryid</structfield> diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out index 708c6b0e9c41..1eebc2898ab9 100644 --- a/contrib/pg_stat_statements/expected/select.out +++ b/contrib/pg_stat_statements/expected/select.out @@ -433,11 +433,10 @@ COMMIT; SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; calls | query -------+------------------------------------------------------------------------ - 1 | SELECT * FROM temp_t - 1 | SELECT * FROM temp_t + 2 | SELECT * FROM temp_t 0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C" 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t -(4 rows) +(3 rows) SELECT pg_stat_statements_reset() IS NOT NULL AS t; t @@ -623,18 +622,15 @@ SELECT a AS a1 FROM pgss_schema_2.tab_search_diff_2; SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; calls | query -------+------------------------------------------------------------------------ - 3 | SELECT a FROM pgss_schema_2.tab_search_diff_2 AS t1 - 9 | SELECT a FROM tab_search_diff_2 AS t1 - 1 | SELECT a, b FROM pgss_schema_1.tab_search_same - 3 | SELECT a, b FROM tab_search_same + 8 | SELECT a FROM tab_search_diff_2 + 4 | SELECT a FROM tab_search_diff_2 AS t1 + 4 | SELECT a, b FROM tab_search_same 0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C" - 1 | SELECT count(*) FROM pgss_schema_1.tab_search_same - 1 | SELECT count(*) FROM pgss_schema_2.tab_search_diff_1 - 3 | SELECT count(*) FROM tab_search_diff_1 + 4 | SELECT count(*) FROM tab_search_diff_1 4 | SELECT count(*) FROM tab_search_diff_2 - 3 | SELECT count(*) FROM tab_search_same + 4 | SELECT count(*) FROM tab_search_same 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t -(11 rows) +(8 rows) DROP SCHEMA pgss_schema_1 CASCADE; NOTICE: drop cascades to 3 other objects -- 2.49.0 [application/pgp-signature] signature.asc (833B, 3-signature.asc) download ^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: query_id: jumble names of temp tables for better pg_stat_statement UX @ 2025-03-26 00:56 Sami Imseih <[email protected]> parent: Michael Paquier <[email protected]> 0 siblings, 0 replies; 19+ messages in thread From: Sami Imseih @ 2025-03-26 00:56 UTC (permalink / raw) To: Michael Paquier <[email protected]>; +Cc: Tom Lane <[email protected]>; Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]> > If I get the difference right, semantically would apply to concepts > related to linguistics, but that's not what we have here, so you are > using a more general term. FWIW, the pg_stat_statements docs in a few places refer to queries that may look different but have the same meaning as “semantically equivalent”, this is why I used the same terminology here. But, I have no issue with the simplified rewrite either. The patch LGTM as well. Regards, Sami Imseih Amazon Web Services (AWS) ^ permalink raw reply [nested|flat] 19+ messages in thread
end of thread, other threads:[~2025-03-26 00:56 UTC | newest] Thread overview: 19+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2025-03-25 02:38 Re: query_id: jumble names of temp tables for better pg_stat_statement UX Sami Imseih <[email protected]> 2025-03-25 02:56 ` Tom Lane <[email protected]> 2025-03-25 03:30 ` Sami Imseih <[email protected]> 2025-03-25 03:53 ` Michael Paquier <[email protected]> 2025-03-25 04:30 ` Sami Imseih <[email protected]> 2025-03-25 04:32 ` Tom Lane <[email protected]> 2025-03-25 05:09 ` Julien Rouhaud <[email protected]> 2025-03-25 05:17 ` Tom Lane <[email protected]> 2025-03-25 05:28 ` Julien Rouhaud <[email protected]> 2025-03-25 03:51 ` Michael Paquier <[email protected]> 2025-03-25 06:09 ` Lukas Fittl <[email protected]> 2025-03-25 07:02 ` Michael Paquier <[email protected]> 2025-03-25 15:41 ` Christoph Berg <[email protected]> 2025-03-25 16:58 ` Sami Imseih <[email protected]> 2025-03-25 23:10 ` Michael Paquier <[email protected]> 2025-03-25 23:24 ` Tom Lane <[email protected]> 2025-03-26 00:34 ` Michael Paquier <[email protected]> 2025-03-26 00:56 ` Sami Imseih <[email protected]> 2025-03-25 14:59 ` Sami Imseih <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox