public inbox for [email protected]  
help / color / mirror / Atom feed
pg_stats.correlation rule of thumb for re-clustering a table?
2+ messages / 2 participants
[nested] [flat]

* pg_stats.correlation rule of thumb for re-clustering a table?
@ 2025-09-12 14:46 Ron Johnson <[email protected]>
  2025-09-12 20:59 ` Re: pg_stats.correlation rule of thumb for re-clustering a table? Laurenz Albe <[email protected]>
  0 siblings, 1 reply; 2+ messages in thread

From: Ron Johnson @ 2025-09-12 14:46 UTC (permalink / raw)
  To: pgsql-admin

(By re-cluster, I of course mean pg_repack.)

Purely OLTP tables (that are only accessed randomly) can of course live
with 0% correlation, but lots of tables are mixed-use, and so benefit from
physical ordering on a carefully chosen field..

SELECT abs(correlation)::numeric(3,2) as correlation
FROM pg_stats
WHERE schemaname = 'foo' AND tablename = 'bar'
  AND attname = 'blarge';
 correlation
-------------
        0.84
(1 row)

Obviously 84% is no need to worry, but what about 60% or 40%? Currently, I
use 60%, but would like to do better.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 2+ messages in thread

* Re: pg_stats.correlation rule of thumb for re-clustering a table?
  2025-09-12 14:46 pg_stats.correlation rule of thumb for re-clustering a table? Ron Johnson <[email protected]>
@ 2025-09-12 20:59 ` Laurenz Albe <[email protected]>
  0 siblings, 0 replies; 2+ messages in thread

From: Laurenz Albe @ 2025-09-12 20:59 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; pgsql-admin

On Fri, 2025-09-12 at 10:46 -0400, Ron Johnson wrote:
> Purely OLTP tables (that are only accessed randomly) can of course live with 0% correlation,
> but lots of tables are mixed-use, and so benefit from physical ordering on a carefully chosen field..
> 
> SELECT abs(correlation)::numeric(3,2) as correlation
> FROM pg_stats 
> WHERE schemaname = 'foo' AND tablename = 'bar'
>   AND attname = 'blarge';
>  correlation
> -------------
>         0.84
> (1 row)
> 
> Obviously 84% is no need to worry, but what about 60% or 40%? Currently, I use 60%, but would like to do better.

Either the difference is gradual, so that it there is no real cut-off point,
or there is a sudden plan change at some point that depends on the query the
data and the parameter settings.  I don't think it is possible to give reliable
numbers that cover all cases.

I suggest that you run a series of benchmarks with a copy of the table with
different correlation values and come up with numbers that are meaningful
for your individual case.

Yours,
Laurenz Albe






^ permalink  raw  reply  [nested|flat] 2+ messages in thread


end of thread, other threads:[~2025-09-12 20:59 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-09-12 14:46 pg_stats.correlation rule of thumb for re-clustering a table? Ron Johnson <[email protected]>
2025-09-12 20:59 ` Laurenz Albe <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox