public inbox for [email protected]  
help / color / mirror / Atom feed
Loading the latest N rows into the cache seems way too fast.
6+ messages / 3 participants
[nested] [flat]

* Loading the latest N rows into the cache seems way too fast.
@ 2025-02-17 20:32 Ron Johnson <[email protected]>
  2025-02-17 21:36 ` Re: Loading the latest N rows into the cache seems way too fast. Tom Lane <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Ron Johnson @ 2025-02-17 20:32 UTC (permalink / raw)
  To: pgsql-general

PG 9.6.24 and PG 14.15, if it matters.
(Yes, 9.6 is really EOL.  I don't control that.)

(I could use pg_prewarm, but the table is much bigger than RAM, and
last_block value only has the newest record if data has never been
deleted.  The oldest records regularly get deleted, and then the table is
vacuumed; thus, new records can be anywhere in the table.)

Thus, roll my own cache-loading statement.

The bigint "id" column in "mytbl" is populated from a sequence, and so is
monotonically increasing: the newest records will have the biggest id
values.
The table also has a bytea column that averages about 100KB.

Loading 200K rows is more than 200MB.  I expected this "prewarm" statement
to take much longer than 1/2 second.  Am I still in the dark ages of
computer speed, or is this statement not doing what I hope it's doing?

$ time psql -h foo bar -Xc "DO \$\$ BEGIN PERFORM * FROM mytbl ORDER BY id
DESC LIMIT 200000 ; END \$\$;"
DO

real    0m0.457s
user    0m0.005s
sys     0m0.004s

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Loading the latest N rows into the cache seems way too fast.
  2025-02-17 20:32 Loading the latest N rows into the cache seems way too fast. Ron Johnson <[email protected]>
@ 2025-02-17 21:36 ` Tom Lane <[email protected]>
  2025-02-17 21:40   ` Re: Loading the latest N rows into the cache seems way too fast. Ron Johnson <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Tom Lane @ 2025-02-17 21:36 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

Ron Johnson <[email protected]> writes:
> The bigint "id" column in "mytbl" is populated from a sequence, and so is
> monotonically increasing: the newest records will have the biggest id
> values.
> The table also has a bytea column that averages about 100KB.

> Loading 200K rows is more than 200MB.  I expected this "prewarm" statement
> to take much longer than 1/2 second.  Am I still in the dark ages of
> computer speed, or is this statement not doing what I hope it's doing?

It's not pulling in the TOAST storage where the bytea column lives.
(pg_prewarm wouldn't have either, without special pushups.)

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Loading the latest N rows into the cache seems way too fast.
  2025-02-17 20:32 Loading the latest N rows into the cache seems way too fast. Ron Johnson <[email protected]>
  2025-02-17 21:36 ` Re: Loading the latest N rows into the cache seems way too fast. Tom Lane <[email protected]>
@ 2025-02-17 21:40   ` Ron Johnson <[email protected]>
  2025-02-17 21:51     ` Re: Loading the latest N rows into the cache seems way too fast. Tom Lane <[email protected]>
  2025-02-17 21:58     ` Re: Loading the latest N rows into the cache seems way too fast. David G. Johnston <[email protected]>
  0 siblings, 2 replies; 6+ messages in thread

From: Ron Johnson @ 2025-02-17 21:40 UTC (permalink / raw)
  To: pgsql-general

On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <[email protected]> wrote:

> Ron Johnson <[email protected]> writes:
> > The bigint "id" column in "mytbl" is populated from a sequence, and so is
> > monotonically increasing: the newest records will have the biggest id
> > values.
> > The table also has a bytea column that averages about 100KB.
>
> > Loading 200K rows is more than 200MB.  I expected this "prewarm"
> statement
> > to take much longer than 1/2 second.  Am I still in the dark ages of
> > computer speed, or is this statement not doing what I hope it's doing?
>
> It's not pulling in the TOAST storage where the bytea column lives.
> (pg_prewarm wouldn't have either, without special pushups.)
>

Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the
bytea column's name?

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Loading the latest N rows into the cache seems way too fast.
  2025-02-17 20:32 Loading the latest N rows into the cache seems way too fast. Ron Johnson <[email protected]>
  2025-02-17 21:36 ` Re: Loading the latest N rows into the cache seems way too fast. Tom Lane <[email protected]>
  2025-02-17 21:40   ` Re: Loading the latest N rows into the cache seems way too fast. Ron Johnson <[email protected]>
@ 2025-02-17 21:51     ` Tom Lane <[email protected]>
  2025-02-17 23:03       ` Re: Loading the latest N rows into the cache seems way too fast. Ron Johnson <[email protected]>
  1 sibling, 1 reply; 6+ messages in thread

From: Tom Lane @ 2025-02-17 21:51 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

Ron Johnson <[email protected]> writes:
> On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <[email protected]> wrote:
>> It's not pulling in the TOAST storage where the bytea column lives.
>> (pg_prewarm wouldn't have either, without special pushups.)

> Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the
> bytea column's name?

You'd have to do something that actually used the column's value,
perhaps "md5(byteacol)" or such.  (The obvious candidate would be
length(), but I think that is optimized to not fetch or decompress
the whole value.)

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Loading the latest N rows into the cache seems way too fast.
  2025-02-17 20:32 Loading the latest N rows into the cache seems way too fast. Ron Johnson <[email protected]>
  2025-02-17 21:36 ` Re: Loading the latest N rows into the cache seems way too fast. Tom Lane <[email protected]>
  2025-02-17 21:40   ` Re: Loading the latest N rows into the cache seems way too fast. Ron Johnson <[email protected]>
  2025-02-17 21:51     ` Re: Loading the latest N rows into the cache seems way too fast. Tom Lane <[email protected]>
@ 2025-02-17 23:03       ` Ron Johnson <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Ron Johnson @ 2025-02-17 23:03 UTC (permalink / raw)
  To: pgsql-general

On Mon, Feb 17, 2025 at 4:51 PM Tom Lane <[email protected]> wrote:

> Ron Johnson <[email protected]> writes:
> > On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <[email protected]> wrote:
> >> It's not pulling in the TOAST storage where the bytea column lives.
> >> (pg_prewarm wouldn't have either, without special pushups.)
>
> > Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the
> > bytea column's name?
>
> You'd have to do something that actually used the column's value,
> perhaps "md5(byteacol)" or such.  (The obvious candidate would be
> length(), but I think that is optimized to not fetch or decompress
> the whole value.)
>

That's definitely taking a LOT longer...

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Loading the latest N rows into the cache seems way too fast.
  2025-02-17 20:32 Loading the latest N rows into the cache seems way too fast. Ron Johnson <[email protected]>
  2025-02-17 21:36 ` Re: Loading the latest N rows into the cache seems way too fast. Tom Lane <[email protected]>
  2025-02-17 21:40   ` Re: Loading the latest N rows into the cache seems way too fast. Ron Johnson <[email protected]>
@ 2025-02-17 21:58     ` David G. Johnston <[email protected]>
  1 sibling, 0 replies; 6+ messages in thread

From: David G. Johnston @ 2025-02-17 21:58 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

On Mon, Feb 17, 2025 at 2:41 PM Ron Johnson <[email protected]> wrote:

> On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <[email protected]> wrote:
>
>> Ron Johnson <[email protected]> writes:
>> > The bigint "id" column in "mytbl" is populated from a sequence, and so
>> is
>> > monotonically increasing: the newest records will have the biggest id
>> > values.
>> > The table also has a bytea column that averages about 100KB.
>>
>> > Loading 200K rows is more than 200MB.  I expected this "prewarm"
>> statement
>> > to take much longer than 1/2 second.  Am I still in the dark ages of
>> > computer speed, or is this statement not doing what I hope it's doing?
>>
>> It's not pulling in the TOAST storage where the bytea column lives.
>> (pg_prewarm wouldn't have either, without special pushups.)
>>
>
> Puzzling, since I ran "PERFORM *".  What if I explicitly mentioned the
> bytea column's name?
>
>
It's more about the system optimizing away data retrieval because you've
indicated you don't care about the contents due to using PERFORM.  All it
needs is a pointer to represent the future data, not the data itself.  And
PERFORM will never resolve that pointer by itself - so as Tom said your
query would need to force pointer resolution by computing on the data.

David J.


^ permalink  raw  reply  [nested|flat] 6+ messages in thread


end of thread, other threads:[~2025-02-17 23:03 UTC | newest]

Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-02-17 20:32 Loading the latest N rows into the cache seems way too fast. Ron Johnson <[email protected]>
2025-02-17 21:36 ` Tom Lane <[email protected]>
2025-02-17 21:40   ` Ron Johnson <[email protected]>
2025-02-17 21:51     ` Tom Lane <[email protected]>
2025-02-17 23:03       ` Ron Johnson <[email protected]>
2025-02-17 21:58     ` David G. Johnston <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox