public inbox for [email protected]  
help / color / mirror / Atom feed
Re: nth_value out of more than n values returns null
6+ messages / 4 participants
[nested] [flat]

* Re: nth_value out of more than n values returns null
@ 2024-11-04 23:36 Erik Wienhold <[email protected]>
  2024-11-04 23:38 ` Re: nth_value out of more than n values returns null Guyren Howe <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Erik Wienhold @ 2024-11-04 23:36 UTC (permalink / raw)
  To: Guyren Howe <[email protected]>; +Cc: pgsql-general

On 2024-11-05 00:17 +0100, Guyren Howe wrote:
> This query:
> 
> SELECT NTH_VALUE(id, 5000000) OVER (ORDER BY created_at, id ASC) FROM table
> 
> in a table where SELECT COUNT(*) returns a value a few thousand over 5
> million, where id is the primary key, returns null.
> 
> The inclusion of the primary key should make the order by a total
> order. So there should be a 5 millionth row.
> 
> How can this happen?

https://www.postgresql.org/docs/current/functions-window.html explains
it:

    Note that first_value, last_value, and nth_value consider only the rows
    within the "window frame", which by default contains the rows from the
    start of the partition through the last peer of the current row. This is
    likely to give unhelpful results for last_value and sometimes also
    nth_value. You can redefine the frame by adding a suitable frame
    specification (RANGE, ROWS or GROUPS) to the OVER clause. See Section
    4.2.8 for more information about frame specifications.

You probably want to extend the window frame with this:

    SELECT NTH_VALUE(id, 5000000) OVER (
        ORDER BY created_at, id ASC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) FROM table

-- 
Erik






^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: nth_value out of more than n values returns null
  2024-11-04 23:36 Re: nth_value out of more than n values returns null Erik Wienhold <[email protected]>
@ 2024-11-04 23:38 ` Guyren Howe <[email protected]>
  2024-11-04 23:44   ` Re: nth_value out of more than n values returns null David G. Johnston <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Guyren Howe @ 2024-11-04 23:38 UTC (permalink / raw)
  To: Erik Wienhold <[email protected]>; +Cc: pgsql-general

I’m trying to get the id of the 5,000,000th record, so I can join against it to get a name. I didn’t fully understand what the docs say there. What am I missing?
On 4 Nov 2024 at 15:36 -0800, Erik Wienhold <[email protected]>, wrote:
> On 2024-11-05 00:17 +0100, Guyren Howe wrote:
> > This query:
> >
> > SELECT NTH_VALUE(id, 5000000) OVER (ORDER BY created_at, id ASC) FROM table
> >
> > in a table where SELECT COUNT(*) returns a value a few thousand over 5
> > million, where id is the primary key, returns null.
> >
> > The inclusion of the primary key should make the order by a total
> > order. So there should be a 5 millionth row.
> >
> > How can this happen?
>
> https://www.postgresql.org/docs/current/functions-window.html explains
> it:
>
> Note that first_value, last_value, and nth_value consider only the rows
> within the "window frame", which by default contains the rows from the
> start of the partition through the last peer of the current row. This is
> likely to give unhelpful results for last_value and sometimes also
> nth_value. You can redefine the frame by adding a suitable frame
> specification (RANGE, ROWS or GROUPS) to the OVER clause. See Section
> 4.2.8 for more information about frame specifications.
>
> You probably want to extend the window frame with this:
>
> SELECT NTH_VALUE(id, 5000000) OVER (
> ORDER BY created_at, id ASC
> ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> ) FROM table
>
> --
> Erik


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: nth_value out of more than n values returns null
  2024-11-04 23:36 Re: nth_value out of more than n values returns null Erik Wienhold <[email protected]>
  2024-11-04 23:38 ` Re: nth_value out of more than n values returns null Guyren Howe <[email protected]>
@ 2024-11-04 23:44   ` David G. Johnston <[email protected]>
  2024-11-04 23:46     ` Re: nth_value out of more than n values returns null Guyren Howe <[email protected]>
  2024-11-05 00:49     ` Re: nth_value out of more than n values returns null Tom Lane <[email protected]>
  0 siblings, 2 replies; 6+ messages in thread

From: David G. Johnston @ 2024-11-04 23:44 UTC (permalink / raw)
  To: Guyren Howe <[email protected]>; +Cc: Erik Wienhold <[email protected]>; pgsql-general

On Monday, November 4, 2024, Guyren Howe <[email protected]> wrote:

> I’m trying to get the id of the 5,000,000th record, so I can join against
> it to get a name. I didn’t fully understand what the docs say there. What
> am I missing?
>

So just use “offset 5_000_000 limit 1”.  Bringing in a window function here
seems unhelpful.

David J.


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: nth_value out of more than n values returns null
  2024-11-04 23:36 Re: nth_value out of more than n values returns null Erik Wienhold <[email protected]>
  2024-11-04 23:38 ` Re: nth_value out of more than n values returns null Guyren Howe <[email protected]>
  2024-11-04 23:44   ` Re: nth_value out of more than n values returns null David G. Johnston <[email protected]>
@ 2024-11-04 23:46     ` Guyren Howe <[email protected]>
  2024-11-04 23:53       ` Re: nth_value out of more than n values returns null David G. Johnston <[email protected]>
  1 sibling, 1 reply; 6+ messages in thread

From: Guyren Howe @ 2024-11-04 23:46 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: Erik Wienhold <[email protected]>; pgsql-general

Wouldn’t it be offset 4_999_999?

I’d still like to understand why nth_value doesn’t work.
On 4 Nov 2024 at 15:44 -0800, David G. Johnston <[email protected]>, wrote:
> On Monday, November 4, 2024, Guyren Howe <[email protected]> wrote:
> > I’m trying to get the id of the 5,000,000th record, so I can join against it to get a name. I didn’t fully understand what the docs say there. What am I missing?
>
> So just use “offset 5_000_000 limit 1”.  Bringing in a window function here seems unhelpful.
>
> David J.
>


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: nth_value out of more than n values returns null
  2024-11-04 23:36 Re: nth_value out of more than n values returns null Erik Wienhold <[email protected]>
  2024-11-04 23:38 ` Re: nth_value out of more than n values returns null Guyren Howe <[email protected]>
  2024-11-04 23:44   ` Re: nth_value out of more than n values returns null David G. Johnston <[email protected]>
  2024-11-04 23:46     ` Re: nth_value out of more than n values returns null Guyren Howe <[email protected]>
@ 2024-11-04 23:53       ` David G. Johnston <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: David G. Johnston @ 2024-11-04 23:53 UTC (permalink / raw)
  To: Guyren Howe <[email protected]>; +Cc: Erik Wienhold <[email protected]>; pgsql-general

On Monday, November 4, 2024, Guyren Howe <[email protected]> wrote:

> Wouldn’t it be offset 4_999_999?
>

Probably.  I tend to expect off-by-one for these kinds of things and test
my way out.


>
> I’d still like to understand why nth_value doesn’t work.
>

When you perform an order by in a window clause the frame you get by
default ends at the current row.  Consider “count(*) over ()” versus
“count(*) over (order by id)”.

You need to not use defaults for the window frame if this doesn’t suit you.

David J.


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: nth_value out of more than n values returns null
  2024-11-04 23:36 Re: nth_value out of more than n values returns null Erik Wienhold <[email protected]>
  2024-11-04 23:38 ` Re: nth_value out of more than n values returns null Guyren Howe <[email protected]>
  2024-11-04 23:44   ` Re: nth_value out of more than n values returns null David G. Johnston <[email protected]>
@ 2024-11-05 00:49     ` Tom Lane <[email protected]>
  1 sibling, 0 replies; 6+ messages in thread

From: Tom Lane @ 2024-11-05 00:49 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: Guyren Howe <[email protected]>; Erik Wienhold <[email protected]>; pgsql-general

"David G. Johnston" <[email protected]> writes:
> So just use “offset 5_000_000 limit 1”.  Bringing in a window function here
> seems unhelpful.

Yeah, that.  A bite-size example might help clarify what the window
function is doing:

regression=# create table zed(f1 int) ;
CREATE TABLE
regression=# insert into zed select generate_series(1, 10);
INSERT 0 10
regression=# select f1, nth_value(f1, 5) over (order by f1) from zed;
 f1 | nth_value 
----+-----------
  1 |          
  2 |          
  3 |          
  4 |          
  5 |         5
  6 |         5
  7 |         5
  8 |         5
  9 |         5
 10 |         5
(10 rows)

For the first four rows, the window frame doesn't include the row
you want, so you get NULL.  You can fix that with a non-default
window frame:

regression=# select f1, nth_value(f1, 5) over (order by f1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) from zed;
 f1 | nth_value 
----+-----------
  1 |         5
  2 |         5
  3 |         5
  4 |         5
  5 |         5
  6 |         5
  7 |         5
  8 |         5
  9 |         5
 10 |         5
(10 rows)

So yeah, you can get the fifth (or five million'th) row this way, but
you'll get N copies of it, which I assume is not what you want.
Better

regression=# select f1 from zed order by f1 offset 4 limit 1;
 f1 
----
  5
(1 row)

which gets you just the one row and is a lot cheaper too.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 6+ messages in thread


end of thread, other threads:[~2024-11-05 00:49 UTC | newest]

Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-11-04 23:36 Re: nth_value out of more than n values returns null Erik Wienhold <[email protected]>
2024-11-04 23:38 ` Guyren Howe <[email protected]>
2024-11-04 23:44   ` David G. Johnston <[email protected]>
2024-11-04 23:46     ` Guyren Howe <[email protected]>
2024-11-04 23:53       ` David G. Johnston <[email protected]>
2024-11-05 00:49     ` Tom Lane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox