public inbox for [email protected]  
Re: Functionally dependent columns in SELECT DISTINCT

From: [email protected] @ 2024-09-13 06:13 UTC (permalink / raw)
  To: [email protected]

Willow Chargin wrote on 2024-09-13 at 07:20:
> Hello! Postgres lets us omit columns from a GROUP BY clause if they are
> functionally dependent on a grouped key, which is a nice quality-of-life
> feature. I'm wondering if a similar relaxation could be permitted for
> the SELECT DISTINCT list?
>
> I have a query where I want to find the most recent few items from a
> table that match some complex condition, where the condition involves
> joining other tables. Here's an example, with two approaches:


What about using DISTINCT ON () ?
    SELECT DISTINCT ON (items.id) items.*
    FROM items
      JOIN parts ON items.id = parts.item_id
    WHERE part_id % 3 = 0
    ORDER BY items.id,items.create_time DESC
    LIMIT 5;

This gives me the following plan on 16.2 (Windows, i7-1260P): https://explain.depesz.com/s/QHr6

From: Willow Chargin @ 2024-09-13 15:26 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

On Thu, Sep 12, 2024 at 11:13 PM <[email protected]> wrote:
>
> What about using DISTINCT ON () ?
>     SELECT DISTINCT ON (items.id) items.*
>     FROM items
>       JOIN parts ON items.id = parts.item_id
>     WHERE part_id % 3 = 0
>     ORDER BY items.id,items.create_time DESC
>     LIMIT 5;
>
> This gives me this plan: https://explain.depesz.com/s/QHr6 on 16.2  (Windows, i7-1260P)

Ordering by items.id changes the answer, though. In the example I gave,
items.id and items.create_time happened to be in the same order, but
that needn't hold. In reality I really do want the ID columns of the
*most recent* items.

You can see the difference if you build the test dataset a bit
differently:

    INSERT INTO items(id, create_time)
        SELECT i, now() - make_interval(secs => random() * 1e6)
        FROM generate_series(1, 1000000) s(i);

We want the returned create_times to be all recent, and the IDs now
should look roughly random.
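
The ORDER BY in the DISTINCT ON query starts with items.id because
PostgreSQL requires the DISTINCT ON expressions to match the leftmost
ORDER BY expressions. A minimal sketch of one way around that, assuming
the items and parts tables from this thread, is to deduplicate in a
subquery and sort by recency outside it. The alias "deduped" is made up
for illustration, and this form may have to deduplicate every matching
row before the LIMIT applies, so it is not necessarily fast:

```sql
-- Sketch only: one row per matching item first, recency ordering second.
SELECT *
FROM (
    SELECT DISTINCT ON (items.id) items.*
    FROM items
      JOIN parts ON items.id = parts.item_id
    WHERE parts.part_id % 3 = 0
    ORDER BY items.id          -- must lead with the DISTINCT ON expression
) deduped                      -- hypothetical alias
ORDER BY create_time DESC      -- the ordering we actually care about
LIMIT 5;
```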

From: David G. Johnston @ 2024-09-13 15:43 UTC (permalink / raw)
  To: Willow Chargin <[email protected]>; +Cc: [email protected] <[email protected]>; [email protected] <[email protected]>

On Friday, September 13, 2024, Willow Chargin <[email protected]> wrote:

> In reality I really do want the ID columns of the
> *most recent* items.
>

Use a window function to rank them and pull out rank=1, or use a lateral
subquery to surgically (fetch first 1) retrieve the first row when sorted
by recency descending.
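
Both suggestions might look roughly like the following. This is a
sketch under assumptions, not a tested answer: it reuses the items and
parts tables and the part_id % 3 = 0 condition from earlier in the
thread, and the names "ranked", "rn" and "first_match" are made up for
illustration. In both variants the per-item step only removes the
duplicates that the join produces; the outer ORDER BY is what selects
the most recent items.

```sql
-- Window-function variant: number the joined rows within each item
-- and keep a single row per item.
SELECT id, create_time
FROM (
    SELECT items.id, items.create_time,
           row_number() OVER (PARTITION BY items.id) AS rn
    FROM items
      JOIN parts ON items.id = parts.item_id
    WHERE parts.part_id % 3 = 0
) ranked
WHERE rn = 1
ORDER BY create_time DESC
LIMIT 5;

-- Lateral variant: probe parts once per item and stop at the first
-- matching row, so each item appears at most once.
SELECT items.id, items.create_time
FROM items
  CROSS JOIN LATERAL (
    SELECT 1
    FROM parts
    WHERE parts.item_id = items.id
      AND parts.part_id % 3 = 0
    FETCH FIRST 1 ROW ONLY
  ) AS first_match
ORDER BY items.create_time DESC
LIMIT 5;
```

Which of the two performs better depends on the data and the available
indexes, so comparing the EXPLAIN ANALYZE output for both on the real
tables is the safer way to choose.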

David J.