public inbox for [email protected]  
help / color / mirror / Atom feed
Re: query_id: jumble names of temp tables for better pg_stat_statement UX
10+ messages / 3 participants
[nested] [flat]

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-18 23:18  Michael Paquier <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Michael Paquier @ 2025-03-18 23:18 UTC (permalink / raw)
  To: Christoph Berg <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Tue, Mar 18, 2025 at 05:51:54PM +0100, Christoph Berg wrote:
> I had thought about it, but figured that integers and strings are
> already separate namespaces, so hashing them shouldn't have any
> conflicts. But it's more clear to do that, so added in the new
> version:
> 
>        AppendJumble(jstate, (const unsigned char *)"pg_temp", sizeof("pg_temp"));
>        AppendJumble(jstate, (const unsigned char *)rel_name, strlen(rel_name));

Yes, I know that it's not a big deal, but it could be confusing if
somebody makes an argument about jumbling more object names for
relations.  The problem is not only limited to relations, though, as
there are other object types that can use a temp namespace like
functions, but the case of table entries should cover most of the
complaints I can imagine.

> >  typedef struct RangeTblEntry
> >  {
> > -    pg_node_attr(custom_read_write)
> > +    pg_node_attr(custom_read_write, custom_query_jumble)
> > 
> > This structure still includes some query_jumble_ignore, which are not
> > required once custom_query_jumble is added.
> 
> I would tend to keep them for documentation purposes. (The other
> custom_query_jumble functions have a much more explicit structure so
> there it is clear which fields are supposed to be jumbled.)

Fine by me as well to keep a dependency based on the fact that the
structure is rather complicated, but I'd rather document that as well
in parsenodes.h with a slightly fatter comment.  What do you think?
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-19 12:02  Christoph Berg <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Christoph Berg @ 2025-03-19 12:02 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Re: Michael Paquier
> Fine by me as well to keep a dependency based on the fact that the
> structure is rather complicated, but I'd rather document that as well
> in parsenodes.h with a slightly fatter comment.  What do you think?

You are of course right, that one-line comment was just snakeoil :D.
Now there are proper ones, thanks.

Christoph


Attachments:

  [text/x-diff] v3-0001-Jumble-temp-tables-by-name.patch (6.6K, 2-v3-0001-Jumble-temp-tables-by-name.patch)
  download | inline diff:
From 725b6c65160a44678b2c43b4af0a3dc9c281830d Mon Sep 17 00:00:00 2001
From: Christoph Berg <[email protected]>
Date: Mon, 17 Mar 2025 17:17:17 +0100
Subject: [PATCH v3] Jumble temp tables by name

Query jumbling considers everything by OID, which is fine for regular
objects. But for temp tables, which have to be recreated in each session
(or even transaction), this means that the same temp table query run
again in the next session will never be aggregated in
pg_stat_statements. Instead, the statistics are polluted with a large
number of 1-call entries.

Fix by using the temp table name instead. This has the risk of
aggregating structurally different temp tables together if they share
the same name, but practically, the queries will likely already differ
in other details anyway. And even if not, aggregating the entries in
pg_stat_statements instead of polluting the stats seems the better
choice. (The user has still the option to simply change the name of the
temp table to have the queries separated. In the old scheme, the user
does not have any chance to change behavior.)
---
 .../pg_stat_statements/expected/select.out    | 31 +++++++++++++++
 contrib/pg_stat_statements/sql/select.sql     | 12 ++++++
 src/backend/nodes/queryjumblefuncs.c          | 39 +++++++++++++++++++
 src/include/nodes/parsenodes.h                |  8 +++-
 4 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out
index 37a30af034a..8a7e237298c 100644
--- a/contrib/pg_stat_statements/expected/select.out
+++ b/contrib/pg_stat_statements/expected/select.out
@@ -241,6 +241,37 @@ SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
 (12 rows)
 
 DROP TABLE pgss_a, pgss_b CASCADE;
+-- temp tables
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+ id 
+----
+(0 rows)
+
+COMMIT;
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+ id 
+----
+(0 rows)
+
+COMMIT;
+SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; -- one query with two calls
+ calls |                                 query                                  
+-------+------------------------------------------------------------------------
+     2 | SELECT * FROM temp_t
+     0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"
+     1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
+(3 rows)
+
 --
 -- access to pg_stat_statements_info view
 --
diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql
index e0be58d5e24..81b9d50ecec 100644
--- a/contrib/pg_stat_statements/sql/select.sql
+++ b/contrib/pg_stat_statements/sql/select.sql
@@ -90,6 +90,18 @@ SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
 
 DROP TABLE pgss_a, pgss_b CASCADE;
 
+-- temp tables
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+COMMIT;
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+COMMIT;
+SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; -- one query with two calls
+
 --
 -- access to pg_stat_statements_info view
 --
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 189bfda610a..d5b3134cc2f 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -33,6 +33,7 @@
 #include "postgres.h"
 
 #include "access/transam.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_proc.h"
 #include "common/hashfn.h"
 #include "miscadmin.h"
@@ -40,6 +41,7 @@
 #include "nodes/queryjumble.h"
 #include "utils/lsyscache.h"
 #include "parser/scansup.h"
+#include "utils/lsyscache.h"
 
 #define JUMBLE_SIZE				1024	/* query serialization buffer size */
 
@@ -67,6 +69,7 @@ static void _jumbleElements(JumbleState *jstate, List *elements);
 static void _jumbleA_Const(JumbleState *jstate, Node *node);
 static void _jumbleList(JumbleState *jstate, Node *node);
 static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
+static void _jumbleRangeTblEntry(JumbleState *jstate, Node *node);
 
 /*
  * Given a possibly multi-statement source string, confine our attention to the
@@ -513,3 +516,39 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node)
 	JUMBLE_FIELD(is_local);
 	JUMBLE_LOCATION(location);
 }
+
+/*
+ * RangeTblEntry is jumbled exactly like the automatic jumbling would do, but
+ * with using the name of temp tables instead of OID. Since temp tables have to
+ * be recreated for each session (or even transaction), their OID is useless
+ * for fingerprinting.
+ */
+static void
+_jumbleRangeTblEntry(JumbleState *jstate, Node *node)
+{
+	RangeTblEntry *expr = (RangeTblEntry *) node;
+	char	   *rel_name;
+
+	JUMBLE_FIELD(rtekind);
+
+	if (expr->rtekind == RTE_RELATION && isAnyTempNamespace(get_rel_namespace(expr->relid)))
+	{
+		rel_name = get_rel_name(expr->relid);
+		AppendJumble(jstate, (const unsigned char *)"pg_temp", sizeof("pg_temp"));
+		AppendJumble(jstate, (const unsigned char *)rel_name, strlen(rel_name));
+	}
+	else
+		JUMBLE_FIELD(relid);
+
+	JUMBLE_FIELD(inh);
+	JUMBLE_NODE(tablesample);
+	JUMBLE_NODE(subquery);
+	JUMBLE_FIELD(jointype);
+	JUMBLE_NODE(functions);
+	JUMBLE_FIELD(funcordinality);
+	JUMBLE_NODE(tablefunc);
+	JUMBLE_NODE(values_lists);
+	JUMBLE_STRING(ctename);
+	JUMBLE_FIELD(ctelevelsup);
+	JUMBLE_STRING(enrname);
+}
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf..4fbe1520548 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1019,6 +1019,12 @@ typedef struct PartitionCmd
  *	  rewriter to implement security-barrier views and/or row-level security.
  *	  Note that the planner turns each boolean expression into an implicitly
  *	  AND'ed sublist, as is its usual habit with qualification expressions.
+ *
+ *	  RangeTblEntry uses a custom query jumble function to hash temp tables by
+ *	  name instead of by OID. The query_jumble_ignore markers on struct members
+ *	  are still kept for documentation; if the custom_query_jumble attribute is
+ *	  dropped, the automatically generated _jumbleRangeTblEntry function should
+ *	  be identical except for the relid.
  *--------------------
  */
 typedef enum RTEKind
@@ -1039,7 +1045,7 @@ typedef enum RTEKind
 
 typedef struct RangeTblEntry
 {
-	pg_node_attr(custom_read_write)
+	pg_node_attr(custom_read_write, custom_query_jumble)
 
 	NodeTag		type;
 
-- 
2.47.2



^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-21 00:54  Michael Paquier <[email protected]>
  parent: Christoph Berg <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Michael Paquier @ 2025-03-21 00:54 UTC (permalink / raw)
  To: Christoph Berg <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Wed, Mar 19, 2025 at 01:02:54PM +0100, Christoph Berg wrote:
> You are of course right, that one-line comment was just snakeoil :D.
> Now there are proper ones, thanks.

I have been thinking about this patch for a couple of days.  What
makes me unhappy in this proposal is that RangeTblEntry is a large and
complicated Node, and we only want to force a custom computation for
the "relid" portion of the node, while being able to also take some
decisions for what the parent node has.  Leaving the existing
per-field query_jumble_ignore with the custom function is prone to
errors, as well, because we need to maintain some level of consistency
between parsenodes.h and src/backend/nodes/.

Hence here is a counter-proposal, that can bring the best of both
worlds: let's add support for custom_query_jumble at field level.
I've spent some time on that, and some concatenation in a macro used
by gen_node_support.pl to force a policy for the custom function name
and its arguments is proving to be rather straight-forward.  This
approach addresses the problem of this thread, while also tackling my 
concerns about complex node structures.

The custom functions are named _jumble${node}_${field}, with the field
and the parent node given as arguments.  I agree that giving the field
is kind of pointless if you have the parent node, but I think that
this is better so as this forces developers to think about how to use
the field value with the node.

Please see the attached.  What do you think? 
--
Michael


Attachments:

  [text/x-diff] v3-0001-Add-support-for-custom_query_jumble-at-field-leve.patch (3.8K, 2-v3-0001-Add-support-for-custom_query_jumble-at-field-leve.patch)
  download | inline diff:
From 11e3154cabdc24feb14e91d35c0cfee5f6c0ca2c Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Fri, 21 Mar 2025 09:35:38 +0900
Subject: [PATCH v3 1/2] Add support for custom_query_jumble at field-level

This option gives the possibility for query jumbling to force custom
jumbling policies specific to a field of a node.  When dealing with
complex node structures, this is simpler than having to enforce a custom
function across the full node.

Custom functions need to be defined as _jumble${node}_${field}, are
given in input the parent node and the field value.  The field value is
not really required if we have the parent Node, but it makes custom
implementations easier to follow with field-specific definitions and
values.  The code generated by gen_node_support.pl uses a macro called
JUMBLE_CUSTOM(), that hides the objects specific to queryjumblefuncs.c.
---
 src/include/nodes/nodes.h             |  4 ++++
 src/backend/nodes/gen_node_support.pl | 11 +++++++++++
 src/backend/nodes/queryjumblefuncs.c  |  6 ++++++
 3 files changed, 21 insertions(+)

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index d18044b4e650..adec68a45ab8 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -54,6 +54,7 @@ typedef enum NodeTag
  *   readfuncs.c.
  *
  * - custom_query_jumble: Has custom implementation in queryjumblefuncs.c.
+ *   Available as well as a per-field attribute.
  *
  * - no_copy: Does not support copyObject() at all.
  *
@@ -101,6 +102,9 @@ typedef enum NodeTag
  * - equal_ignore_if_zero: Ignore the field for equality if it is zero.
  *   (Otherwise, compare normally.)
  *
+ * - custom_query_jumble: Has custom implementation in queryjumblefuncs.c
+ *   applied for the field of a node.  Available as well as a node attribute.
+ *
  * - query_jumble_ignore: Ignore the field for the query jumbling.  Note
  *   that typmod and collation information are usually irrelevant for the
  *   query jumbling.
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 7e3f335ac09d..bcae70cea7d4 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -471,6 +471,7 @@ foreach my $infile (@ARGV)
 								&& $attr !~ /^read_as\(\w+\)$/
 								&& !elem $attr,
 								qw(copy_as_scalar
+								custom_query_jumble
 								equal_as_scalar
 								equal_ignore
 								equal_ignore_if_zero
@@ -1283,12 +1284,17 @@ _jumble${n}(JumbleState *jstate, Node *node)
 		my $t = $node_type_info{$n}->{field_types}{$f};
 		my @a = @{ $node_type_info{$n}->{field_attrs}{$f} };
 		my $query_jumble_ignore = $struct_no_query_jumble;
+		my $query_jumble_custom = 0;
 		my $query_jumble_location = 0;
 		my $query_jumble_squash = 0;
 
 		# extract per-field attributes
 		foreach my $a (@a)
 		{
+			if ($a eq 'custom_query_jumble')
+			{
+				$query_jumble_custom = 1;
+			}
 			if ($a eq 'query_jumble_ignore')
 			{
 				$query_jumble_ignore = 1;
@@ -1328,6 +1334,11 @@ _jumble${n}(JumbleState *jstate, Node *node)
 				  unless $query_jumble_ignore;
 			}
 		}
+		elsif ($query_jumble_custom)
+		{
+			# Custom function that applies to one field of a node.
+			print $jff "\tJUMBLE_CUSTOM($n, $f);\n";
+		}
 		elsif ($t eq 'char*')
 		{
 			print $jff "\tJUMBLE_STRING($f);\n"
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 189bfda610aa..9b9cf6f34381 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -333,6 +333,12 @@ do { \
 	if (expr->str) \
 		AppendJumble(jstate, (const unsigned char *) (expr->str), strlen(expr->str) + 1); \
 } while(0)
+/*
+ * Note that the argument types are enforced for the per-field custom
+ * functions.
+ */
+#define JUMBLE_CUSTOM(nodetype, item) \
+	_jumble##nodetype##_##item(jstate, expr, expr->item)
 
 #include "queryjumblefuncs.funcs.c"
 
-- 
2.49.0



  [text/x-diff] v3-0002-Add-custom-query-jumble-function-for-RangeTblEntr.patch (5.0K, 3-v3-0002-Add-custom-query-jumble-function-for-RangeTblEntr.patch)
  download | inline diff:
From 1c5b1a99aee4d0872d42b6edfdaab1266d13a522 Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Fri, 21 Mar 2025 09:43:48 +0900
Subject: [PATCH v3 2/2] Add custom query jumble function for
 RangeTblEntry.relid

---
 src/include/nodes/parsenodes.h                |  9 +++--
 src/backend/nodes/queryjumblefuncs.c          | 27 +++++++++++++++
 .../pg_stat_statements/expected/utility.out   | 33 +++++++++++++++++++
 contrib/pg_stat_statements/sql/utility.sql    | 12 +++++++
 4 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf2..3eb16846e0e1 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1093,8 +1093,13 @@ typedef struct RangeTblEntry
 	 * relation.  This allows plans referencing AFTER trigger transition
 	 * tables to be invalidated if the underlying table is altered.
 	 */
-	/* OID of the relation */
-	Oid			relid;
+
+	/*
+	 * OID of the relation.  A custom query jumble function is used here for
+	 * temporary tables, where the computation uses the relation name instead
+	 * of the OID.
+	 */
+	Oid			relid pg_node_attr(custom_query_jumble);
 	/* inheritance requested? */
 	bool		inh;
 	/* relation kind (see pg_class.relkind) */
diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c
index 9b9cf6f34381..fbb05eab1bbe 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -33,6 +33,7 @@
 #include "postgres.h"
 
 #include "access/transam.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_proc.h"
 #include "common/hashfn.h"
 #include "miscadmin.h"
@@ -67,6 +68,9 @@ static void _jumbleElements(JumbleState *jstate, List *elements);
 static void _jumbleA_Const(JumbleState *jstate, Node *node);
 static void _jumbleList(JumbleState *jstate, Node *node);
 static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node);
+static void _jumbleRangeTblEntry_relid(JumbleState *jstate,
+									   RangeTblEntry *expr,
+									   Oid relid);
 
 /*
  * Given a possibly multi-statement source string, confine our attention to the
@@ -519,3 +523,26 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node)
 	JUMBLE_FIELD(is_local);
 	JUMBLE_LOCATION(location);
 }
+
+/*
+ * Custom query jumble function for RangeTblEntry.relid.
+ */
+static void
+_jumbleRangeTblEntry_relid(JumbleState *jstate,
+						   RangeTblEntry *expr,
+						   Oid relid)
+{
+	/*
+	 * If this is a temporary table, jumble its name instead of the table OID.
+	 */
+	if (expr->rtekind == RTE_RELATION &&
+		isAnyTempNamespace(get_rel_namespace(expr->relid)))
+	{
+		char	   *relname = get_rel_name(expr->relid);
+
+		AppendJumble(jstate, (const unsigned char *) "pg_temp", sizeof("pg_temp"));
+		AppendJumble(jstate, (const unsigned char *) relname, strlen(relname));
+	}
+	else
+		JUMBLE_FIELD(relid);
+}
diff --git a/contrib/pg_stat_statements/expected/utility.out b/contrib/pg_stat_statements/expected/utility.out
index aa4f0f7e6280..941ba0e66deb 100644
--- a/contrib/pg_stat_statements/expected/utility.out
+++ b/contrib/pg_stat_statements/expected/utility.out
@@ -9,6 +9,39 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t;
  t
 (1 row)
 
+-- Temporary tables, grouped together.
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+ id 
+----
+(0 rows)
+
+COMMIT;
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+ id 
+----
+(0 rows)
+
+COMMIT;
+SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+ calls |                       query                        
+-------+----------------------------------------------------
+     2 | BEGIN
+     2 | COMMIT
+     2 | CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP
+     2 | SELECT * FROM temp_t
+     1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
+(5 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
 -- Tables, indexes, triggers
 CREATE TEMP TABLE tab_stats (a int, b char(20));
 CREATE INDEX index_stats ON tab_stats(b, (b || 'data1'), (b || 'data2')) WHERE a > 0;
diff --git a/contrib/pg_stat_statements/sql/utility.sql b/contrib/pg_stat_statements/sql/utility.sql
index dd97203c2102..e21b656d44a8 100644
--- a/contrib/pg_stat_statements/sql/utility.sql
+++ b/contrib/pg_stat_statements/sql/utility.sql
@@ -6,6 +6,18 @@
 SET pg_stat_statements.track_utility = TRUE;
 SELECT pg_stat_statements_reset() IS NOT NULL AS t;
 
+-- Temporary tables, grouped together.
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+COMMIT;
+BEGIN;
+  CREATE TEMP TABLE temp_t (id int) ON COMMIT DROP;
+  SELECT * FROM temp_t;
+COMMIT;
+SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
 -- Tables, indexes, triggers
 CREATE TEMP TABLE tab_stats (a int, b char(20));
 CREATE INDEX index_stats ON tab_stats(b, (b || 'data1'), (b || 'data2')) WHERE a > 0;
-- 
2.49.0



  [application/pgp-signature] signature.asc (833B, 4-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-21 16:26  Christoph Berg <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Christoph Berg @ 2025-03-21 16:26 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Re: Michael Paquier
> I have been thinking about this patch for a couple of days.  What
> makes me unhappy in this proposal is that RangeTblEntry is a large and
> complicated Node, and we only want to force a custom computation for
> the "relid" portion of the node, while being able to also take some
> decisions for what the parent node has.  Leaving the existing
> per-field query_jumble_ignore with the custom function is prone to
> errors, as well, because we need to maintain some level of consistency
> between parsenodes.h and src/backend/nodes/.

Ack, that was also bothering me, but I didn't think it was so easy to
do it on a per-field level. Thanks!

> The custom functions are named _jumble${node}_${field}, with the field
> and the parent node given as arguments.  I agree that giving the field
> is kind of pointless if you have the parent node, but I think that
> this is better so as this forces developers to think about how to use
> the field value with the node.

Makes sense.

> Please see the attached.  What do you think?

Just one minor thing, I don't understand what you are trying to say in
this comment:

> +/*
> + * Note that the argument types are enforced for the per-field custom
> + * functions.
> + */
> +#define JUMBLE_CUSTOM(nodetype, item) \
> +	_jumble##nodetype##_##item(jstate, expr, expr->item)

Thanks,
Christoph






^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-21 22:45  Michael Paquier <[email protected]>
  parent: Christoph Berg <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Michael Paquier @ 2025-03-21 22:45 UTC (permalink / raw)
  To: Christoph Berg <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Fri, Mar 21, 2025 at 05:26:20PM +0100, Christoph Berg wrote:
> Just one minor thing, I don't understand what you are trying to say in
> this comment:
> 
>> +/*
>> + * Note that the argument types are enforced for the per-field custom
>> + * functions.
>> + */
>> +#define JUMBLE_CUSTOM(nodetype, item) \
>> +	_jumble##nodetype##_##item(jstate, expr, expr->item)

In this one, I want to mean that we require a custom per-field
function to look like that:
_jumbleNodefoo_field(JumbleState *jstate, NodeFoo *expr, FieldType field);

Rather than having more generic shape like that:
_jumbleNodefoo_field(JumbleState *jstate, Node *exp,
                     const unsigned char *item, Size size);

So a custom function is defined so as the node type and field type are
arguments.  Perhaps this comment would be better if reworded like
that:
"The arguments of this function use the node type and the field type,
rather than a generic argument like AppendJumble() and the other
_jumble() functions."

If you have a better idea, please feel free..
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-22 09:43  Christoph Berg <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 2 replies; 10+ messages in thread

From: Christoph Berg @ 2025-03-22 09:43 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Re: Michael Paquier
> >> + * Note that the argument types are enforced for the per-field custom
> >> + * functions.
> >> + */
> >> +#define JUMBLE_CUSTOM(nodetype, item) \
> >> +	_jumble##nodetype##_##item(jstate, expr, expr->item)
> 
> In this one, I want to mean that we require a custom per-field
> function to look like that:
> _jumbleNodefoo_field(JumbleState *jstate, NodeFoo *expr, FieldType field);
> 
> Rather than having more generic shape like that:
> _jumbleNodefoo_field(JumbleState *jstate, Node *exp,
>                      const unsigned char *item, Size size);
> 
> So a custom function is defined so as the node type and field type are
> arguments.  Perhaps this comment would be better if reworded like
> that:
> "The arguments of this function use the node type and the field type,
> rather than a generic argument like AppendJumble() and the other
> _jumble() functions."

Perhaps this:

/*
 * The per-field custom jumble functions get jstate, the node, and the
 * field as arguments.
 */

They are not actually different from _jumbleList and _jumbleA_Const
which also get the node (and just not the field). AppendJumble is a
different layer, the output, so it's not surprising its signature is
different.

Are we at the point where the patch is already Ready for Committer?

Thanks,
Christoph






^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-22 10:12  Christoph Berg <[email protected]>
  parent: Christoph Berg <[email protected]>
  1 sibling, 0 replies; 10+ messages in thread

From: Christoph Berg @ 2025-03-22 10:12 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Re: To Michael Paquier
> > >> +#define JUMBLE_CUSTOM(nodetype, item) \
> > >> +	_jumble##nodetype##_##item(jstate, expr, expr->item)
> > 
> > In this one, I want to mean that we require a custom per-field
> > function to look like that:
> > _jumbleNodefoo_field(JumbleState *jstate, NodeFoo *expr, FieldType field);
> 
> Perhaps this:

Or actually more explicit:

/*
 * Per-field custom jumble functions have this signature:
 * _jumbleNodefoo_field(JumbleState *jstate, NodeFoo *expr, FieldType field);
 */

Christoph






^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-22 10:32  Michael Paquier <[email protected]>
  parent: Christoph Berg <[email protected]>
  1 sibling, 1 reply; 10+ messages in thread

From: Michael Paquier @ 2025-03-22 10:32 UTC (permalink / raw)
  To: Christoph Berg <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Sat, Mar 22, 2025 at 10:43:00AM +0100, Christoph Berg wrote:
> Are we at the point where the patch is already Ready for Committer?

I'll think a bit more about how to document all that.  Anyway, yes,
I'm OK with the per-field custom_query_jumble, so let's move on with
that, so I will do something about that.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-22 15:12  Tom Lane <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Tom Lane @ 2025-03-22 15:12 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Michael Paquier <[email protected]> writes:
> On Sat, Mar 22, 2025 at 10:43:00AM +0100, Christoph Berg wrote:
>> Are we at the point where the patch is already Ready for Committer?

> I'll think a bit more about how to document all that.  Anyway, yes,
> I'm OK with the per-field custom_query_jumble, so let's move on with
> that, so I will do something about that.

I'm not terribly happy with the entire proposal.

(1) I know it was asserted upthread that there was no performance
impact, but I find that hard to believe.

(2) This patch inserts catalog lookups into query ID computation,
which AFAIK there never were before.  This means you can't compute a
query ID outside a transaction or in an already-failed transaction.
Surely that's going to bite us eventually.

(3) I think having query jumbling work differently for temp tables
than other tables is a fundamentally bad idea.

So my feeling is: if we think this is the behavior we want, let's do
it across the board.  I suggest that we simply drop the relid from the
jumble and use the table alias (that is, eref->aliasname) instead.
ISTM this fits well with the general trend in pg_stat_statements
to merge statements together more aggressively than the original
concept envisioned.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 10+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-22 16:24  Tom Lane <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 0 replies; 10+ messages in thread

From: Tom Lane @ 2025-03-22 16:24 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Christoph Berg <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

I wrote:
> So my feeling is: if we think this is the behavior we want, let's do
> it across the board.  I suggest that we simply drop the relid from the
> jumble and use the table alias (that is, eref->aliasname) instead.

I experimented with this trivial fix (shown in-line to keep the cfbot
from thinking this is the patch-of-record):

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 23c9e3c5abf..a54bbdc18b7 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1051,7 +1051,7 @@ typedef struct RangeTblEntry
 	/* user-written alias clause, if any */
 	Alias	   *alias pg_node_attr(query_jumble_ignore);
 	/* expanded reference names */
-	Alias	   *eref pg_node_attr(query_jumble_ignore);
+	Alias	   *eref;
 
 	RTEKind		rtekind;		/* see above */
 
@@ -1094,7 +1094,7 @@ typedef struct RangeTblEntry
 	 * tables to be invalidated if the underlying table is altered.
 	 */
 	/* OID of the relation */
-	Oid			relid;
+	Oid			relid pg_node_attr(query_jumble_ignore);
 	/* inheritance requested? */
 	bool		inh;
 	/* relation kind (see pg_class.relkind) */

This caused just one diff in existing regression test cases:

diff --git a/contrib/pg_stat_statements/expected/planning.out b/contrib/pg_stat_statements/expected/planning.out
index 3ee1928cbe9..c25b8b946fd 100644
--- a/contrib/pg_stat_statements/expected/planning.out
+++ b/contrib/pg_stat_statements/expected/planning.out
@@ -75,8 +75,9 @@ SELECT plans >= 2 AND plans <= calls AS plans_ok, calls, rows, query FROM pg_sta
   WHERE query LIKE 'SELECT COUNT%' ORDER BY query COLLATE "C";
  plans_ok | calls | rows |                query                 
 ----------+-------+------+--------------------------------------
- t        |     4 |    4 | SELECT COUNT(*) FROM stats_plan_test
-(1 row)
+ f        |     1 |    1 | SELECT COUNT(*) FROM stats_plan_test
+ f        |     3 |    3 | SELECT COUNT(*) FROM stats_plan_test
+(2 rows)
 
 -- Cleanup
 DROP TABLE stats_plan_test;

What's happening there is that there's an ALTER TABLE ADD COLUMN in
the test, so the executions after the first one see more entries
in eref->colnames and come up with a different jumble.  I think
we probably don't want that behavior; we only want to jumble the
table name.  So we'd still need the v3-0001 patch in some form to
allow annotating RangeTblEntry.eref with a custom jumble method
that'd only jumble the aliasname.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 10+ messages in thread


end of thread, other threads:[~2025-03-22 16:24 UTC | newest]

Thread overview: 10+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-03-18 23:18 Re: query_id: jumble names of temp tables for better pg_stat_statement UX Michael Paquier <[email protected]>
2025-03-19 12:02 ` Christoph Berg <[email protected]>
2025-03-21 00:54   ` Michael Paquier <[email protected]>
2025-03-21 16:26     ` Christoph Berg <[email protected]>
2025-03-21 22:45       ` Michael Paquier <[email protected]>
2025-03-22 09:43         ` Christoph Berg <[email protected]>
2025-03-22 10:12           ` Christoph Berg <[email protected]>
2025-03-22 10:32           ` Michael Paquier <[email protected]>
2025-03-22 15:12             ` Tom Lane <[email protected]>
2025-03-22 16:24               ` Tom Lane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox