Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1txDPq-004slN-6D for pgsql-hackers@arkaria.postgresql.org; Tue, 25 Mar 2025 23:10:38 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1txDPo-00BRZt-GH for pgsql-hackers@arkaria.postgresql.org; Tue, 25 Mar 2025 23:10:36 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1txDPo-00BRZk-51 for pgsql-hackers@lists.postgresql.org; Tue, 25 Mar 2025 23:10:36 +0000 Received: from fhigh-a1-smtp.messagingengine.com ([103.168.172.152]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1txDPl-001ARq-2i for pgsql-hackers@lists.postgresql.org; Tue, 25 Mar 2025 23:10:35 +0000 Received: from phl-compute-06.internal (phl-compute-06.phl.internal [10.202.2.46]) by mailfhigh.phl.internal (Postfix) with ESMTP id 1E2D31140152; Tue, 25 Mar 2025 19:10:31 -0400 (EDT) Received: from phl-mailfrontend-02 ([10.202.2.163]) by phl-compute-06.internal (MEProxy); Tue, 25 Mar 2025 19:10:31 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=paquier.xyz; h= cc:cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm2; t=1742944231; x=1743030631; bh=TMPJXdeZe4 bS7qwLlXizIlVhCJ0UsDibiGVnhVKJFJc=; b=hBNou9Dxjssbwidp4DC1wH01sq jctVbeIkguQSphQPJ5Cq6eA8jQNa+7Mphs9zuns3dfU/XTe2k19gyyc96wIaARao NY7hF2wHUFm5UE/kPKf86an7B54oT6GB4QnajVT56S6Lc3HMT+G3C+JpJJx104+f dSN2arqg3pJDewrc1NI9W1trrWU0KKJaWSChCCFfaRTaZV0FE+uhnS6OJaTMWZls ngfit8XWNX5nLe8yPmFq+Ohog2WKR09wq6u3FL5RNXAcEqnwnG2wc0qX1qZ00Qsg DaG47rvARsw8y3lDDCXFrIJ8hjsuOfKLnPsnWKDmsZ5nhC+YnNSoAIAKb6ww== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm2; t= 1742944231; x=1743030631; bh=TMPJXdeZe4bS7qwLlXizIlVhCJ0UsDibiGV nhVKJFJc=; b=uvlZ9f0Q0R4O6GW8jL6bRWlkvmWb0SjQ1ZeURSE9aEb6XPNuqU7 uMqeKizE4nd+gCMBBxgruMS97ToKdrM2N9cPonJFzQy7eafurEPaDijbmhI1LgbP LIuovYNqGjqVegtQzrZNnu08cR4QGD6Rz9b9vz7OSzFd0A7RXmBccv99yuf18KL3 sChY7XuQrI/JhWaX9WPOrUiG61iUea43xIZGuPiQ8t4JDPJOEKE1Yos9d3UgdOrA LtBAHHZXvaXTcJPqiPM+12HbmN0PL70vXnNVOOFYWcTxUmIv7o1TlNg8yWeVFENw o7UuH0O0LfA8X28gw8tAB4M+YCkH6fiheMg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgdduieefleegucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnegfrhhlucfvnfffucdljedtmdenucfjughrpeffhffvvefu kfhfgggtuggjsehgtderredttddvnecuhfhrohhmpefoihgthhgrvghlucfrrghquhhivg hruceomhhitghhrggvlhesphgrqhhuihgvrhdrgiihiieqnecuggftrfgrthhtvghrnhep teelieefudffhffhtdetleeggeegfffhkeeuveetiefgudduvedutefggeeivdejnecuve hluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepmhhitghhrggv lhesphgrqhhuihgvrhdrgiihiidpnhgspghrtghpthhtohepiedpmhhouggvpehsmhhtph houhhtpdhrtghpthhtohepshgrmhhimhhsvghihhesghhmrghilhdrtghomhdprhgtphht thhopehmhihonhesuggvsghirghnrdhorhhgpdhrtghpthhtoheplhhukhgrshesfhhith htlhdrtghomhdprhgtphhtthhopehtghhlsehsshhsrdhpghhhrdhprgdruhhspdhrtghp thhtohepphhgshhqlhdqhhgrtghkvghrsheslhhishhtshdrphhoshhtghhrvghsqhhlrd horhhgpdhrtghpthhtohepmhgruddttdeshhhothhmrghilhdrtghomh X-ME-Proxy: Feedback-ID: i0fe9450f:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 25 Mar 2025 19:10:28 -0400 (EDT) Date: Wed, 26 Mar 2025 08:10:16 +0900 From: Michael Paquier To: Sami Imseih Cc: Christoph Berg , Lukas Fittl , Tom Lane , PostgreSQL Hackers , ma lz Subject: Re: query_id: jumble names of temp tables for better pg_stat_statement UX Message-ID: References: <461405.1742691859@sss.pgh.pa.us> <1189112.1742869660@sss.pgh.pa.us> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="s8ia3hVjg8lWZQf/" Content-Disposition: inline In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --s8ia3hVjg8lWZQf/ Content-Type: multipart/mixed; boundary="rRt3L36MHAlbwzfM" Content-Disposition: inline --rRt3L36MHAlbwzfM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Mar 25, 2025 at 11:58:21AM -0500, Sami Imseih wrote: > "Since the queryid hash value is computed on the post-parse-analysis > representation of the queries, the opposite is also possible: queries with > identical texts might appear as separate entries, if they have different > meanings as a result of factors such as different search_path settings." >=20 > I think this text could remain as is, because search_path still > matters for things like functions, etc. Yeah, I think that's OK as-is. I'm open to more improvements, including more tests for these function patterns. It's one of these areas where we should be able to tweak RangeTblFunction and apply a custom function to its funcexpr, and please note that I have no idea how complex it could become as this is a Node expression. :D Functions in a temporary schema is not something as common as temp tables, I guess, so these matter less, but they would still be a cause of bloat for monitoring in very specific workloads. > 2/ > "For example, pg_stat_statements will consider two apparently-identical > queries to be distinct, if they reference a table that was dropped and > recreated between the executions of the two queries." >=20 > This is no longer true for relations, but is still true for functions. I = think > we should mention the caveats in a bit more detail as this change > will have impact on the most common case. What about something > like this? >=20 > "For example, pg_stat_statements will consider two apparently-identical > queries to be distinct, if they reference a function that was dropped and > recreated between the executions of the two queries. That's a bit larger than functions, but we could remain a bit more evasive, with "if they reference *for example* a function that was dropped and recreated between the executions of the two queries". Note that for DDLs, like CREATE TABLE, we also group entries with identical relation names, so we are kind of in line with the patch, not with the current docs. > Conversely, if a table is dropped and recreated between the > executions of queries, two apparently-identical queries may be > considered the same. However, if the alias for a table is different > for semantically similar queries, these queries will be considered distin= ct" This addition sounds like an improvement here. As this thread has proved, we had little coverage these cases in pgss, so I've applied the tests as an independent change. It is also useful to track how things change in the commit history depending on how the computation is tweaked. I've also included your doc suggestions. I feel that we could do better here, but that's a common statement anyway when it comes to the documentation. -- Michael --rRt3L36MHAlbwzfM Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="v6-0001-Add-custom-query-jumble-function-for-RangeTblEntr.patch" Content-Transfer-Encoding: quoted-printable =46rom 8ea61bb0d7d6c4dbbb40dbaedb5e751027163cfe Mon Sep 17 00:00:00 2001 =46rom: Michael Paquier Date: Wed, 26 Mar 2025 08:07:59 +0900 Subject: [PATCH v6] Add custom query jumble function for RangeTblEntry.eref --- src/include/nodes/parsenodes.h | 11 +++++++--- src/backend/nodes/queryjumblefuncs.c | 19 ++++++++++++++++++ doc/src/sgml/pgstatstatements.sgml | 9 +++++++-- .../pg_stat_statements/expected/select.out | 20 ++++++++----------- 4 files changed, 42 insertions(+), 17 deletions(-) diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 23c9e3c5abf2..a87f949b389e 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -1050,8 +1050,13 @@ typedef struct RangeTblEntry */ /* user-written alias clause, if any */ Alias *alias pg_node_attr(query_jumble_ignore); - /* expanded reference names */ - Alias *eref pg_node_attr(query_jumble_ignore); + + /* + * Expanded reference names. This uses a custom query jumble function so + * as the table name is included in the computation, not its list of + * columns. + */ + Alias *eref pg_node_attr(custom_query_jumble); =20 RTEKind rtekind; /* see above */ =20 @@ -1094,7 +1099,7 @@ typedef struct RangeTblEntry * tables to be invalidated if the underlying table is altered. */ /* OID of the relation */ - Oid relid; + Oid relid pg_node_attr(query_jumble_ignore); /* inheritance requested? */ bool inh; /* relation kind (see pg_class.relkind) */ diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/query= jumblefuncs.c index f8b0f91704ba..62d6cfb7ac15 100644 --- a/src/backend/nodes/queryjumblefuncs.c +++ b/src/backend/nodes/queryjumblefuncs.c @@ -67,6 +67,9 @@ static void _jumbleElements(JumbleState *jstate, List *el= ements); static void _jumbleA_Const(JumbleState *jstate, Node *node); static void _jumbleList(JumbleState *jstate, Node *node); static void _jumbleVariableSetStmt(JumbleState *jstate, Node *node); +static void _jumbleRangeTblEntry_eref(JumbleState *jstate, + RangeTblEntry *rte, + Alias *expr); =20 /* * Given a possibly multi-statement source string, confine our attention t= o the @@ -516,3 +519,19 @@ _jumbleVariableSetStmt(JumbleState *jstate, Node *node) JUMBLE_FIELD(is_local); JUMBLE_LOCATION(location); } + +/* + * Custom query jumble function for RangeTblEntry.eref. + */ +static void +_jumbleRangeTblEntry_eref(JumbleState *jstate, + RangeTblEntry *rte, + Alias *expr) +{ + JUMBLE_FIELD(type); + + /* + * This includes only the table name, the list of column names is ignored. + */ + JUMBLE_STRING(aliasname); +} diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatem= ents.sgml index f4e384e95aea..679381f00607 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -675,8 +675,13 @@ calls | 2 things, the internal object identifiers appearing in this representatio= n. This has some counterintuitive implications. For example, pg_stat_statements will consider two apparently-id= entical - queries to be distinct, if they reference a table that was dropped - and recreated between the executions of the two queries. + queries to be distinct, if they reference for example a function that w= as + dropped and recreated between the executions of the two queries. + Conversely, if a table is dropped and recreated between the + executions of queries, two apparently-identical queries may be + considered the same. However, if the alias for a table is different + for semantically similar queries, these queries will be considered + distinct. The hashing process is also sensitive to differences in machine architecture and other facets of the platform. Furthermore, it is not safe to assume that queryid diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_st= at_statements/expected/select.out index 708c6b0e9c41..1eebc2898ab9 100644 --- a/contrib/pg_stat_statements/expected/select.out +++ b/contrib/pg_stat_statements/expected/select.out @@ -433,11 +433,10 @@ COMMIT; SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; calls | query = =20 -------+------------------------------------------------------------------= ------ - 1 | SELECT * FROM temp_t - 1 | SELECT * FROM temp_t + 2 | SELECT * FROM temp_t 0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLAT= E "C" 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t -(4 rows) +(3 rows) =20 SELECT pg_stat_statements_reset() IS NOT NULL AS t; t=20 @@ -623,18 +622,15 @@ SELECT a AS a1 FROM pgss_schema_2.tab_search_diff_2; SELECT calls, query FROM pg_stat_statements ORDER BY query COLLATE "C"; calls | query = =20 -------+------------------------------------------------------------------= ------ - 3 | SELECT a FROM pgss_schema_2.tab_search_diff_2 AS t1 - 9 | SELECT a FROM tab_search_diff_2 AS t1 - 1 | SELECT a, b FROM pgss_schema_1.tab_search_same - 3 | SELECT a, b FROM tab_search_same + 8 | SELECT a FROM tab_search_diff_2 + 4 | SELECT a FROM tab_search_diff_2 AS t1 + 4 | SELECT a, b FROM tab_search_same 0 | SELECT calls, query FROM pg_stat_statements ORDER BY query COLLAT= E "C" - 1 | SELECT count(*) FROM pgss_schema_1.tab_search_same - 1 | SELECT count(*) FROM pgss_schema_2.tab_search_diff_1 - 3 | SELECT count(*) FROM tab_search_diff_1 + 4 | SELECT count(*) FROM tab_search_diff_1 4 | SELECT count(*) FROM tab_search_diff_2 - 3 | SELECT count(*) FROM tab_search_same + 4 | SELECT count(*) FROM tab_search_same 1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t -(11 rows) +(8 rows) =20 DROP SCHEMA pgss_schema_1 CASCADE; NOTICE: drop cascades to 3 other objects --=20 2.49.0 --rRt3L36MHAlbwzfM-- --s8ia3hVjg8lWZQf/ Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEG72nH6vTowiyblFKnvQgOdbyQH0FAmfjN9gACgkQnvQgOdby QH1WVQ/9EyLUAhS+boIbF+OdqdxB7WliGPUVcR4t2Ltyj0l9I4qP3rzQYax+IASa anFI/SKdmGchuaJBlrkB2ip/plvVH80HLg/WaV0uO73mI7uRjVpLiZEoycFXcKFq 7wBGkfaf3S05N24aIc4+7CqURfWWhoNh3t+TgvJ5H5y0q6pFkM+D4Ga8lKzwR3Gu BZafCMOcLYIMhPhfVlW7tA5zPYUQnY0wBwNGA5Uw2PAKBEomOmiK4HWPBpCHTR48 kQ33IKudF03IonaTIv3OoQ25r3fL44Rb59Nh4rtcHTLVV7Vta+9APHePhKDg1hbA CfOXuNNKKc8RbaYG/JDDsL3rGr6XJ2Lm5ghBwqVYYUDvDb5dOPh3fcBnkn24gj/K sH8Ug7CHBQaaHD0A4IHwupaC3o61+SAj/TGArnyvckYPEZ3TQ6qtvsQHQMHgGd1L ris27l0vKJlL28xzZyxzx1LhUHiVtCF/+DzNMdjDXOj9YcXytK+ISm6a7HnA6PYP 5m7TA3kGiD9nnypAdcfqGo9TIvHtSYbmhWzZjigLHoR1NgtgtbLIBUXFBOB9qwQX 80RQYDbXHoEXNxCxYHhJRrssRjP/RZdin12gg/hdNSdEpP5uXQ2E9y/CJweTrQsy i7GZgptuGTgyFZfIA1N1yvpqm/FYQdU/VIytIaLI5l+vhuWvRUY= =0gjP -----END PGP SIGNATURE----- --s8ia3hVjg8lWZQf/--