public inbox for [email protected]  
help / color / mirror / Atom feed
Re: query_id: jumble names of temp tables for better pg_stat_statement UX
4+ messages / 2 participants
[nested] [flat]

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
@ 2025-03-26 06:24 Michael Paquier <[email protected]>
  2025-07-15 14:48 ` Re: query_id: jumble names of temp tables for better pg_stat_statement UX Alexander Kukushkin <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Michael Paquier @ 2025-03-26 06:24 UTC (permalink / raw)
  To: Sami Imseih <[email protected]>; +Cc: Tom Lane <[email protected]>; Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Tue, Mar 25, 2025 at 07:56:29PM -0500, Sami Imseih wrote:
> FWIW, the pg_stat_statements docs in a few places refer to 
> queries that may look different but have the same meaning 
> as “semantically equivalent”, this is why I used the same 
> terminology here.  But, I have no issue with the simplified
> rewrite either.
> 
> The patch LGTM as well. 

If any adjustments are required, it would be always possible to do
these later.  Anyway, workloads with a lot of temporary tables are
going to benefit from this change, so let's see how it goes.

Applied, after eyeing much more the PGSS dumps from installcheck
before and after the patch, to make sure that I'm not missing
something..
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
  2025-03-26 06:24 Re: query_id: jumble names of temp tables for better pg_stat_statement UX Michael Paquier <[email protected]>
@ 2025-07-15 14:48 ` Alexander Kukushkin <[email protected]>
  2025-07-15 23:39   ` Re: query_id: jumble names of temp tables for better pg_stat_statement UX Michael Paquier <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Alexander Kukushkin @ 2025-07-15 14:48 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

Hi,

I totally understand the wish to make pg_stat_statements useful for
workloads that create/drop a ton of temporary tables.
However, when pursuing this goal we impacted other types of totally valid
workloads when tables with the same name exist in different schemas.
Example:
create schema s1;
create table s1.t as select id from generate_series(1, 10) as id;
create schema s2;
create table s1.t as select id from generate_series(1, 1000000) as id;
select count(id) from s1.t;
select count(id) from s2.t;

select * from pg_stat_statements;
userid                     | 10
dbid                       | 5
toplevel                   | t
queryid                    | -8317141500049987426
query                      | select count(id) from s1.t
plans                      | 0
total_plan_time            | 0
min_plan_time              | 0
max_plan_time              | 0
mean_plan_time             | 0
stddev_plan_time           | 0
calls                      | 2
total_exec_time            | 22.577107
min_exec_time              | 0.325021
max_exec_time              | 22.252086000000002
mean_exec_time             | 11.2885535
stddev_exec_time           | 10.963532500000001
rows                       | 2
shared_blks_hit            | 4425

That is, two different queries, accessing two absolutely different tables
(one of them has 100000 times more rows!) were merged together.

Regards,
--
Alexander Kukushkin


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
  2025-03-26 06:24 Re: query_id: jumble names of temp tables for better pg_stat_statement UX Michael Paquier <[email protected]>
  2025-07-15 14:48 ` Re: query_id: jumble names of temp tables for better pg_stat_statement UX Alexander Kukushkin <[email protected]>
@ 2025-07-15 23:39   ` Michael Paquier <[email protected]>
  2025-07-16 06:20     ` Re: query_id: jumble names of temp tables for better pg_stat_statement UX Alexander Kukushkin <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Michael Paquier @ 2025-07-15 23:39 UTC (permalink / raw)
  To: Alexander Kukushkin <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Tue, Jul 15, 2025 at 04:48:05PM +0200, Alexander Kukushkin wrote:
> I totally understand the wish to make pg_stat_statements useful for
> workloads that create/drop a ton of temporary tables.
> However, when pursuing this goal we impacted other types of totally valid
> workloads when tables with the same name exist in different schemas.
> Example:
> create schema s1;
> create table s1.t as select id from generate_series(1, 10) as id;
> create schema s2;
> create table s1.t as select id from generate_series(1, 1000000) as id;

I suspect that you mean s2.t and not s1.t here.

> select count(id) from s1.t;
> select count(id) from s2.t;
>
> That is, two different queries, accessing two absolutely different tables
> (one of them has 100000 times more rows!) were merged together.

Yes, we had this argument upthread, and it is still possible to
differentiate both cases by using a different alias in the FROM
clause, as of:
select count(id) from s1.t as t1;
select count(id) from s2.t as t2;

The new behavior where we do not need to worry about temporary tables,
which is not that uncommon because some workloads like using these for
JOIN patterns as a "temporary" anchor in a session, has more benefits
IMO, particularly more if the connections have a rather higher
turnover.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: query_id: jumble names of temp tables for better pg_stat_statement UX
  2025-03-26 06:24 Re: query_id: jumble names of temp tables for better pg_stat_statement UX Michael Paquier <[email protected]>
  2025-07-15 14:48 ` Re: query_id: jumble names of temp tables for better pg_stat_statement UX Alexander Kukushkin <[email protected]>
  2025-07-15 23:39   ` Re: query_id: jumble names of temp tables for better pg_stat_statement UX Michael Paquier <[email protected]>
@ 2025-07-16 06:20     ` Alexander Kukushkin <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Alexander Kukushkin @ 2025-07-16 06:20 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Christoph Berg <[email protected]>; Lukas Fittl <[email protected]>; PostgreSQL Hackers <[email protected]>; ma lz <[email protected]>

On Wed, 16 Jul 2025 at 01:39, Michael Paquier <[email protected]> wrote:

>
> > create schema s1;
> > create table s1.t as select id from generate_series(1, 10) as id;
> > create schema s2;
> > create table s1.t as select id from generate_series(1, 1000000) as id;
>
> I suspect that you mean s2.t and not s1.t here.
>

Yes.


> Yes, we had this argument upthread, and it is still possible to
> differentiate both cases by using a different alias in the FROM
> clause, as of:
> select count(id) from s1.t as t1;
> select count(id) from s2.t as t2;
>
> The new behavior where we do not need to worry about temporary tables,
> which is not that uncommon because some workloads like using these for
> JOIN patterns as a "temporary" anchor in a session, has more benefits
> IMO, particularly more if the connections have a rather higher
> turnover.


Yes, I've seen this argument and know that aliases will make these queries
look different.
However, we regularly hear from many different customers that they *don't
control queries* sent by application or *can't modify these queries*.
Such kinds of workloads are also not that uncommon and this change makes it
impossible to monitor them.

I would somewhat understand when a table in the query is used without
specifying schema and such queries are merged together:
s1: SET search_path s1; select count(*) from t;
s2: SET search_path s2; select count(*) from t;

But, even this case doesn't feel right, because these tables are still
different and therefore queries.

Regards,
--
Alexander Kukushkin


^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2025-07-16 06:20 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-03-26 06:24 Re: query_id: jumble names of temp tables for better pg_stat_statement UX Michael Paquier <[email protected]>
2025-07-15 14:48 ` Alexander Kukushkin <[email protected]>
2025-07-15 23:39   ` Michael Paquier <[email protected]>
2025-07-16 06:20     ` Alexander Kukushkin <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox