Re: [PATCH] GROUP BY ALL

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: [PATCH] GROUP BY ALL
6+ messages / 6 participants
[nested] [flat]

* Re: [PATCH] GROUP BY ALL
@ 2024-07-22 21:33 David G. Johnston <[email protected]>
  2024-07-22 21:40 ` Re: [PATCH] GROUP BY ALL Isaac Morland <[email protected]>
  2024-07-23 13:22 ` Re: [PATCH] GROUP BY ALL David Christensen <[email protected]>
  0 siblings, 2 replies; 6+ messages in thread

From: David G. Johnston @ 2024-07-22 21:33 UTC (permalink / raw)
  To: David Christensen <[email protected]>; +Cc: pgsql-hackers

On Mon, Jul 22, 2024 at 1:55 PM David Christensen <[email protected]> wrote:

> I see that there'd been some chatter but not a lot of discussion about
> a GROUP BY ALL feature/functionality.  There certainly is utility in
> such a construct IMHO.
>
> Still need some docs; just throwing this out there and getting some
> feedback.
>
>
I strongly dislike adding this feature.  I'd only consider supporting it if
it was part of the SQL standard.

Code is written once and read many times.  This feature caters to
the writer, not the reader.  And furthermore usage of this is prone to be
to the writer's detriment as well.

David J.


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: [PATCH] GROUP BY ALL
  2024-07-22 21:33 Re: [PATCH] GROUP BY ALL David G. Johnston <[email protected]>
@ 2024-07-22 21:40 ` Isaac Morland <[email protected]>
  2024-07-22 22:43   ` Re: [PATCH] GROUP BY ALL Tom Lane <[email protected]>
  1 sibling, 1 reply; 6+ messages in thread

From: Isaac Morland @ 2024-07-22 21:40 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: David Christensen <[email protected]>; pgsql-hackers

On Mon, 22 Jul 2024 at 17:34, David G. Johnston <[email protected]>
wrote:

> On Mon, Jul 22, 2024 at 1:55 PM David Christensen <[email protected]>
> wrote:
>
>> I see that there'd been some chatter but not a lot of discussion about
>> a GROUP BY ALL feature/functionality.  There certainly is utility in
>> such a construct IMHO.
>>
>> Still need some docs; just throwing this out there and getting some
>> feedback.
>>
>>
> I strongly dislike adding this feature.  I'd only consider supporting it
> if it was part of the SQL standard.
>
> Code is written once and read many times.  This feature caters to
> the writer, not the reader.  And furthermore usage of this is prone to be
> to the writer's detriment as well.
>

And for when this might be useful, the syntax for it already exists,
although a spurious error message is generated:

odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
ERROR:  column "uw_term.term_id" must appear in the GROUP BY clause or be
used in an aggregate function
LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
                ^

I'm not sure exactly what's going on here — it's like it's still seeing the
table name in the field list as only a table name and not the value
corresponding to the whole table as a row value (But in general I'm not
happy with the system's ability to figure out that a column's value has
only one possibility given the grouping columns). You can work around:

odyssey=> with t as (select uw_term, count(*) from uw_term group by
uw_term) select (uw_term).*, count from t;

This query works.

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: [PATCH] GROUP BY ALL
  2024-07-22 21:33 Re: [PATCH] GROUP BY ALL David G. Johnston <[email protected]>
  2024-07-22 21:40 ` Re: [PATCH] GROUP BY ALL Isaac Morland <[email protected]>
@ 2024-07-22 22:43   ` Tom Lane <[email protected]>
  2024-07-23 17:02     ` Re: [PATCH] GROUP BY ALL Paul Jungwirth <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Tom Lane @ 2024-07-22 22:43 UTC (permalink / raw)
  To: Isaac Morland <[email protected]>; +Cc: David G. Johnston <[email protected]>; David Christensen <[email protected]>; pgsql-hackers

Isaac Morland <[email protected]> writes:
> And for when this might be useful, the syntax for it already exists,
> although a spurious error message is generated:

> odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
> ERROR:  column "uw_term.term_id" must appear in the GROUP BY clause or be
> used in an aggregate function
> LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
>                 ^

> I'm not sure exactly what's going on here

The SELECT entry is expanded into "uw_term.col1, uw_term.col2,
uw_term.col3, ...", and those single-column Vars don't match the
whole-row Var appearing in the GROUP BY list.  I guess if we
think this is important, we could add a proof rule saying that
a per-column Var is functionally dependent on a whole-row Var
of the same relation.  Odd that the point hasn't come up before
(though I guess that suggests that few people try this).

			regards, tom lane

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: [PATCH] GROUP BY ALL
  2024-07-22 21:33 Re: [PATCH] GROUP BY ALL David G. Johnston <[email protected]>
  2024-07-22 21:40 ` Re: [PATCH] GROUP BY ALL Isaac Morland <[email protected]>
  2024-07-22 22:43   ` Re: [PATCH] GROUP BY ALL Tom Lane <[email protected]>
@ 2024-07-23 17:02     ` Paul Jungwirth <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Paul Jungwirth @ 2024-07-23 17:02 UTC (permalink / raw)
  To: [email protected]

On 7/22/24 15:43, Tom Lane wrote:
> Isaac Morland <[email protected]> writes:
>> And for when this might be useful, the syntax for it already exists,
>> although a spurious error message is generated:
> 
>> odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
>> ERROR:  column "uw_term.term_id" must appear in the GROUP BY clause or be
>> used in an aggregate function
>> LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
>>                  ^
> 
>> I'm not sure exactly what's going on here
> 
> The SELECT entry is expanded into "uw_term.col1, uw_term.col2,
> uw_term.col3, ...", and those single-column Vars don't match the
> whole-row Var appearing in the GROUP BY list.  I guess if we
> think this is important, we could add a proof rule saying that
> a per-column Var is functionally dependent on a whole-row Var
> of the same relation.  Odd that the point hasn't come up before
> (though I guess that suggests that few people try this).

I was just using this group-by-row feature last week to implement a temporal outer join in a way 
that would work for arbitrary tables. Here is some example SQL:

https://github.com/pjungwir/temporal_ops/blob/b10d65323749faa6c47956db2e8f95441e508fce/sql/outer_joi...

That does `GROUP BY a` then `SELECT (x.a).*`.[1]

It is very useful for writing queries that don't want to know about the structure of the row.

I noticed the same error as Isaac. I worked around the problem by wrapping it in a subquery and 
decomposing the row outside. It's already an obscure feature, and an easy workaround might be why 
you haven't heard complaints before. I wouldn't mind writing a patch for that rule when I get a 
chance (if no one else gets to it first.)

[1] Actually I see it does `GROUP BY a, a.valid_at`, but that is surely more than I need. I think 
that `a.valid_at` is leftover from a previous version of the query.

Yours,

-- 
Paul              ~{:-)
[email protected]

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: [PATCH] GROUP BY ALL
  2024-07-22 21:33 Re: [PATCH] GROUP BY ALL David G. Johnston <[email protected]>
@ 2024-07-23 13:22 ` David Christensen <[email protected]>
  2024-07-24 14:12   ` Re: [PATCH] GROUP BY ALL Ashutosh Bapat <[email protected]>
  1 sibling, 1 reply; 6+ messages in thread

From: David Christensen @ 2024-07-23 13:22 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: pgsql-hackers

On Mon, Jul 22, 2024 at 4:34 PM David G. Johnston
<[email protected]> wrote:
>
> On Mon, Jul 22, 2024 at 1:55 PM David Christensen <[email protected]> wrote:
>>
>> I see that there'd been some chatter but not a lot of discussion about
>> a GROUP BY ALL feature/functionality.  There certainly is utility in
>> such a construct IMHO.
>>
>> Still need some docs; just throwing this out there and getting some feedback.
>>
>
> I strongly dislike adding this feature.  I'd only consider supporting it if it was part of the SQL standard.
>
> Code is written once and read many times.  This feature caters to the writer, not the reader.  And furthermore usage of this is prone to be to the writer's detriment as well.

I'd say this feature (at least for me) caters to the investigator;
someone who is interactively looking at data hence why it would cater
to the writer.  Consider acquainting yourself with a large table that
has a large number of annoying-named fields where you want to look at
how different data is correlated or broken-down.  Something along the
lines of:

SELECT last_name, substring(first_name,1,1) as first_initial,
income_range, count(*) FROM census_data GROUP BY ALL;

If you are iteratively looking at things, adding or removing fields
from your breakdown, you only need to change it in a single place, the
tlist.  Additionally, expressions can be used transparently without
needing to repeat them.  (Yes, in practice, I'd often use GROUP BY 1,
2, say, but if you add more fields to this you need to edit in
multiple places.)

David

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: [PATCH] GROUP BY ALL
  2024-07-22 21:33 Re: [PATCH] GROUP BY ALL David G. Johnston <[email protected]>
  2024-07-23 13:22 ` Re: [PATCH] GROUP BY ALL David Christensen <[email protected]>
@ 2024-07-24 14:12   ` Ashutosh Bapat <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Ashutosh Bapat @ 2024-07-24 14:12 UTC (permalink / raw)
  To: David Christensen <[email protected]>; +Cc: David G. Johnston <[email protected]>; pgsql-hackers

On Tue, Jul 23, 2024 at 6:53 PM David Christensen <[email protected]> wrote:
>
> On Mon, Jul 22, 2024 at 4:34 PM David G. Johnston
> <[email protected]> wrote:
> >
> > On Mon, Jul 22, 2024 at 1:55 PM David Christensen <[email protected]> wrote:
> >>
> >> I see that there'd been some chatter but not a lot of discussion about
> >> a GROUP BY ALL feature/functionality.  There certainly is utility in
> >> such a construct IMHO.
> >>
> >> Still need some docs; just throwing this out there and getting some feedback.
> >>
> >
> > I strongly dislike adding this feature.  I'd only consider supporting it if it was part of the SQL standard.
> >
> > Code is written once and read many times.  This feature caters to the writer, not the reader.  And furthermore usage of this is prone to be to the writer's detriment as well.
>
> I'd say this feature (at least for me) caters to the investigator;
> someone who is interactively looking at data hence why it would cater
> to the writer.  Consider acquainting yourself with a large table that
> has a large number of annoying-named fields where you want to look at
> how different data is correlated or broken-down.  Something along the
> lines of:

To me this looks like a feature that a data exploration tool may
implement instead of being part of the server. It would then provide
more statistics about each correlation/column set etc.

-- 
Best Wishes,
Ashutosh Bapat






^ permalink  raw  reply  [nested|flat] 6+ messages in thread

end of thread, other threads:[~2024-07-24 14:12 UTC | newest]

Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-07-22 21:33 Re: [PATCH] GROUP BY ALL David G. Johnston <[email protected]>
2024-07-22 21:40 ` Isaac Morland <[email protected]>
2024-07-22 22:43   ` Tom Lane <[email protected]>
2024-07-23 17:02     ` Paul Jungwirth <[email protected]>
2024-07-23 13:22 ` David Christensen <[email protected]>
2024-07-24 14:12   ` Ashutosh Bapat <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox