Why analyze reports 30000 pages and rows scanned. Why not just rows?

public inbox for [email protected]  


* Why analyze reports 30000 pages and rows scanned. Why not just rows?
@ 2025-08-19 10:17 David Mullineux <[email protected]>
  2025-08-19 14:40 ` Re: Why analyze reports 30000 pages and rows scanned. Why not just rows? Tom Lane <[email protected]>
  0 siblings, 1 reply; 2+ messages in thread

From: David Mullineux @ 2025-08-19 10:17 UTC (permalink / raw)
  To: pgsql-general

According to the docs, ANALYZE will, by default, try to sample 30000 rows
from a table.
(I've read the note in analyze.c about the Haas and Stokes IBM Research
report.)

But my question is, why does 'analyze verbose' report that it has scanned
'30000 of NNNN pages, containing NNNN live rows and 0 dead rows; 30000 rows
in sample,....'

As most tables would store more than 1 row per page, I expected that 30000
rows would require a lot fewer than 30000 *pages* to be scanned.  Why is it
saying it's scanned 30000 pages instead of only 30000 rows ?

Confused. thanks.
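
For context on the 30000 figure in the question: it follows from the default statistics target. A minimal sketch of the arithmetic, assuming the 300x multiplier that analyze.c applies to the statistics target and the stock default_statistics_target of 100. The function and constant names here are hypothetical, chosen only for illustration.

```python
# Sketch of how ANALYZE's default sample size comes about.
# Assumptions: default_statistics_target = 100 (the stock default) and the
# 300x multiplier applied in analyze.c. Names below are illustrative only.
DEFAULT_STATISTICS_TARGET = 100
ROWS_PER_TARGET_UNIT = 300


def analyze_sample_rows(statistics_target: int = DEFAULT_STATISTICS_TARGET) -> int:
    """Return the number of rows ANALYZE aims to sample for a given target."""
    return ROWS_PER_TARGET_UNIT * statistics_target


print(analyze_sample_rows())    # 30000
print(analyze_sample_rows(10))  # 3000
```

Raising or lowering the statistics target therefore changes both the row count and, on a large table, the page count that 'analyze verbose' reports.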


* Re: Why analyze reports 30000 pages and rows scanned. Why not just rows?
  2025-08-19 10:17 Why analyze reports 30000 pages and rows scanned. Why not just rows? David Mullineux <[email protected]>
@ 2025-08-19 14:40 ` Tom Lane <[email protected]>
  0 siblings, 0 replies; 2+ messages in thread

From: Tom Lane @ 2025-08-19 14:40 UTC (permalink / raw)
  To: David Mullineux <[email protected]>; +Cc: pgsql-general

David Mullineux <[email protected]> writes:
> But my question is, why does 'analyze verbose' report that it has scanned
> '30000 of NNNN pages, containing NNNN live rows and 0 dead rows; 30000 rows
> in sample,....'

> As most tables would store more than 1 row per page, I expected that 30000
> rows would require a lot fewer than 30000 *pages* to be scanned.  Why is it
> saying it's scanned 30000 pages instead of only 30000 rows ?

If the table is sufficiently large, taking a sample of a single row
from each of 30000 different pages is the correct behavior.  Taking
more than one row from each of a smaller set of pages would give a
nonrandom (because clumped) sample.

			regards, tom lane
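
The behavior described in the reply can be illustrated with a simplified two-stage sampler: first choose up to N distinct pages at random, then draw N rows from the rows found on those pages. This is a sketch under stated assumptions, not PostgreSQL's implementation. It uses random.sample for the page stage and the basic reservoir algorithm (Algorithm R) for the row stage, whereas analyze.c uses its own block sampler and a more efficient reservoir variant. The function name and the table shape are hypothetical.

```python
import random


def two_stage_sample(rows_per_page, target_rows, rng):
    """Simplified two-stage sample: random pages first, then random rows.

    rows_per_page: list with the number of rows stored on each page.
    Returns (pages_read, sample), where sample holds (page, row) pairs.
    """
    total_pages = len(rows_per_page)

    # Stage 1: pick distinct pages uniformly at random. Never more pages
    # than target_rows, and never more than the table actually has.
    pages_read = min(target_rows, total_pages)
    chosen_pages = sorted(rng.sample(range(total_pages), pages_read))

    # Stage 2: reservoir sampling (Algorithm R) over every row on the
    # chosen pages, so each such row is equally likely to be kept.
    reservoir = []
    seen = 0
    for page in chosen_pages:
        for row in range(rows_per_page[page]):
            seen += 1
            if len(reservoir) < target_rows:
                reservoir.append((page, row))
            else:
                slot = rng.randrange(seen)
                if slot < target_rows:
                    reservoir[slot] = (page, row)
    return pages_read, reservoir


# Scaled-down demo (target of 300 rather than 30000 to keep it fast):
# a hypothetical table of 10000 pages holding 50 rows each.
rng = random.Random(42)
pages_read, sample = two_stage_sample([50] * 10000, 300, rng)
print(pages_read, len(sample))  # 300 300
print(len({page for page, _ in sample}), "distinct pages in the sample")
```

With a large table the sampler reads as many pages as the row target, which matches the "30000 of NNNN pages ... 30000 rows in sample" message. The sampled rows end up spread across many different pages instead of being clumped on a few.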

