public inbox for [email protected]  
Re: CLUSTER vs. VACUUM FULL
20+ messages / 6 participants

* Re: CLUSTER vs. VACUUM FULL
@ 2024-04-21 22:45 Tom Lane <[email protected]>
  2024-04-22 00:06 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  0 siblings, 2 replies; 20+ messages in thread

From: Tom Lane @ 2024-04-21 22:45 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

Ron Johnson <[email protected]> writes:
> Why is VACUUM FULL recommended for compressing a table, when CLUSTER does
> the same thing (similarly doubling disk space), and apparently runs just as
> fast?

CLUSTER makes the additional effort to sort the data per the ordering
of the specified index.  I'm surprised that's not noticeable in your
test case.

			regards, tom lane

^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
@ 2024-04-22 00:06 ` Ron Johnson <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread

From: Ron Johnson @ 2024-04-22 00:06 UTC (permalink / raw)
  To: pgsql-general

On Sun, Apr 21, 2024 at 6:45 PM Tom Lane <[email protected]> wrote:

> Ron Johnson <[email protected]> writes:
> > Why is VACUUM FULL recommended for compressing a table, when CLUSTER does
> > the same thing (similarly doubling disk space), and apparently runs just
> > as fast?
>
> CLUSTER makes the additional effort to sort the data per the ordering
> of the specified index.  I'm surprised that's not noticeable in your
> test case.
>

It's in a freshly restored database.  However, regular deletions of old
records, and normal vacuums would have led me to expect that the "fresh"
public.log would have been in relatively random order.



* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
@ 2024-04-22 00:15 ` Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  1 sibling, 1 reply; 20+ messages in thread

From: Ron Johnson @ 2024-04-22 00:15 UTC (permalink / raw)
  To: pgsql-general

On Sun, Apr 21, 2024 at 6:45 PM Tom Lane <[email protected]> wrote:

> Ron Johnson <[email protected]> writes:
> > Why is VACUUM FULL recommended for compressing a table, when CLUSTER does
> > the same thing (similarly doubling disk space), and apparently runs just
> > as fast?
>
> CLUSTER makes the additional effort to sort the data per the ordering
> of the specified index.  I'm surprised that's not noticeable in your
> test case.
>

Clustering on a completely different index was also 44 seconds.



* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
@ 2024-04-22 01:34   ` David Rowley <[email protected]>
  2024-04-22 02:50     ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  0 siblings, 2 replies; 20+ messages in thread

From: David Rowley @ 2024-04-22 01:34 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

On Mon, 22 Apr 2024 at 12:16, Ron Johnson <[email protected]> wrote:
>
> On Sun, Apr 21, 2024 at 6:45 PM Tom Lane <[email protected]> wrote:
>>
>> Ron Johnson <[email protected]> writes:
>> > Why is VACUUM FULL recommended for compressing a table, when CLUSTER does
>> > the same thing (similarly doubling disk space), and apparently runs just as
>> > fast?
>>
>> CLUSTER makes the additional effort to sort the data per the ordering
>> of the specified index.  I'm surprised that's not noticeable in your
>> test case.
>
> Clustering on a completely different index  was also 44 seconds.

Both VACUUM FULL and CLUSTER go through a very similar code path. Both
use cluster_rel().  VACUUM FULL just won't make use of an existing
index to provide presorted input or perform a sort, whereas CLUSTER
will attempt to choose the cheaper of these two to get sorted
results.

If the timing for each is similar, it just means that using an index
scan or sorting isn't very expensive compared to the other work that's
being done.  Both CLUSTER and VACUUM FULL require reading every heap
page, writing out new pages into a new heap, and maintaining all
indexes on the new heap. That's quite an effort.
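
As a rough sketch of how to see which of the two strategies was picked,
the VERBOSE option of each command should report what it is doing while
it rewrites the table. The table name public.log is taken from this
thread; the index name log_pkey is only an assumption for illustration.

```sql
-- Assumed names: public.log (table from this thread), log_pkey (hypothetical index).
-- Both commands take an ACCESS EXCLUSIVE lock and rewrite the whole table.

-- Rewrite the table without any attempt to order the rows.
VACUUM (FULL, VERBOSE) public.log;

-- Rewrite the table in the order of the given index; the INFO output should
-- indicate whether an index scan or a sequential scan plus sort was used.
CLUSTER VERBOSE public.log USING log_pkey;
```

In psql, running \timing on beforehand gives the elapsed time of each
command for comparison.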

To satisfy your curiosity, you could always run some EXPLAIN ANALYZE
SELECT queries to measure how much time was spent sorting the entire
table. You'd have to set work_mem to the value of
maintenance_work_mem.
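
A minimal sketch of that measurement, assuming the table is public.log
(from this thread) and that created_on is the leading column of the
index you would cluster on (a hypothetical column name):

```sql
-- Assumed names: public.log (table from this thread), created_on (hypothetical column).
BEGIN;

-- Give the sort the same memory budget CLUSTER would have; the third
-- argument (true) makes the setting local to this transaction.
SELECT set_config('work_mem', current_setting('maintenance_work_mem'), true);

-- If the plan contains a Sort node, its actual time and sort method show
-- how expensive sorting the entire table is.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM public.log ORDER BY created_on;

ROLLBACK;
```

If the planner picks an index scan instead of a sort, the plan shows the
cost of the other strategy CLUSTER can choose.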

David

* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
@ 2024-04-22 02:50     ` Ron Johnson <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread

From: Ron Johnson @ 2024-04-22 02:50 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: pgsql-general

On Sun, Apr 21, 2024 at 9:35 PM David Rowley <[email protected]> wrote:

> On Mon, 22 Apr 2024 at 12:16, Ron Johnson <[email protected]> wrote:
> >
> > On Sun, Apr 21, 2024 at 6:45 PM Tom Lane <[email protected]> wrote:
> >>
> >> Ron Johnson <[email protected]> writes:
> >> > Why is VACUUM FULL recommended for compressing a table, when CLUSTER
> >> > does the same thing (similarly doubling disk space), and apparently
> >> > runs just as fast?
> >>
> >> CLUSTER makes the additional effort to sort the data per the ordering
> >> of the specified index.  I'm surprised that's not noticeable in your
> >> test case.
> >
> > Clustering on a completely different index  was also 44 seconds.
>
> Both VACUUM FULL and CLUSTER go through a very similar code path. Both
> use cluster_rel().  VACUUM FULL just won't make use of an existing
> index to provide presorted input or perform a sort, whereas CLUSTER
> will attempt to choose the cheapest out of these two to get sorted
> results.
>
> If the timing for each is similar, it just means that using an index
> scan or sorting isn't very expensive compared to the other work that's
> being done.  Both CLUSTER and VACUUM FULL require reading every heap
> page and writing out new pages into a new heap and maintaining  all
> indexes on the new heap. That's quite an effort.
>

My original CLUSTER command didn't have to change the order of the data
very much, so the sort didn't have to do much work.

CLUSTER on a different index was indeed much slower than VACUUM FULL.



* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
@ 2024-04-22 11:42     ` Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  1 sibling, 1 reply; 20+ messages in thread

From: Marcos Pegoraro @ 2024-04-22 11:42 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Ron Johnson <[email protected]>; pgsql-general

On Sun, Apr 21, 2024 at 22:35, David Rowley <[email protected]> wrote:

>
> Both VACUUM FULL and CLUSTER go through a very similar code path. Both
> use cluster_rel().  VACUUM FULL just won't make use of an existing
> index to provide presorted input or perform a sort, whereas CLUSTER
> will attempt to choose the cheapest out of these two to get sorted
> results.


But wouldn't it be good if VACUUM FULL used the index defined by CLUSTER,
if one exists? Maybe as an additional option for VACUUM FULL?
As it stands, if I periodically reorganize all tables, I have to run
CLUSTER once, which processes the clustered tables, and then VACUUM FULL
on every table that is not clustered, because running VACUUM FULL on the
entire database simply ignores the cluster index defined for each table.
So INDISCLUSTERED is used when running CLUSTER but ignored when running
VACUUM FULL.
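
To make the two-pass routine concrete, here is a sketch of how the
remembered clustering index can be inspected and used. The names
public.log and log_pkey are hypothetical; pg_index.indisclustered is the
catalog flag mentioned above.

```sql
-- List the tables that have a remembered clustering index.
SELECT i.indrelid::regclass   AS table_name,
       i.indexrelid::regclass AS clustered_index
FROM pg_index AS i
WHERE i.indisclustered;

-- Mark an index as the clustering index without rewriting the table
-- (public.log and log_pkey are hypothetical names).
ALTER TABLE public.log CLUSTER ON log_pkey;

-- With no arguments, CLUSTER re-clusters every previously clustered table
-- in the current database that the caller has privileges on.
CLUSTER;
```

Tables that do not appear in the first query are the ones that would
still need a separate VACUUM FULL.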

regards
Marcos



* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
@ 2024-04-22 14:25       ` Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 17:50         ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  0 siblings, 2 replies; 20+ messages in thread

From: Tom Lane @ 2024-04-22 14:25 UTC (permalink / raw)
  To: Marcos Pegoraro <[email protected]>; +Cc: David Rowley <[email protected]>; Ron Johnson <[email protected]>; pgsql-general

Marcos Pegoraro <[email protected]> writes:
> But wouldn't it be good that VACUUM FULL uses that index defined by
> Cluster, if it exists ?

No ... what would be the difference then?

			regards, tom lane

* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
@ 2024-04-22 15:37         ` Ron Johnson <[email protected]>
  2024-04-22 15:51           ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 16:29           ` Re: CLUSTER vs. VACUUM FULL David G. Johnston <[email protected]>
  1 sibling, 2 replies; 20+ messages in thread

From: Ron Johnson @ 2024-04-22 15:37 UTC (permalink / raw)
  To: pgsql-general

On Mon, Apr 22, 2024 at 10:25 AM Tom Lane <[email protected]> wrote:

> Marcos Pegoraro <[email protected]> writes:
> > But wouldn't it be good that VACUUM FULL uses that index defined by
> > Cluster, if it exists ?
>
> No ... what would be the difference then?
>

What the VACUUM docs "should" do, it seems, is suggest CLUSTER on the PK,
if the PK is a sequence (whether that be an actual sequence, or a timestamp
or something else that grows monotonically).

That's because the data is already roughly in PK order.



* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
@ 2024-04-22 15:51           ` Adrian Klaver <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread

From: Adrian Klaver @ 2024-04-22 15:51 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; pgsql-general

On 4/22/24 08:37, Ron Johnson wrote:
> On Mon, Apr 22, 2024 at 10:25 AM Tom Lane <[email protected]> wrote:
> 
>     Marcos Pegoraro <[email protected]> writes:
>      > But wouldn't it be good that VACUUM FULL uses that index defined by
>      > Cluster, if it exists ?
> 
>     No ... what would be the difference then?
> 
> What the VACUUM docs "should" do, it seems, is suggest CLUSTER on the 
> PK, if the PK is a sequence (whether that be an actual sequence, or a 
> timestamp or something else that grows monotonically).

Why?

That would, per David Rowley's comments, impose a sort cost on top of 
the cost of hitting every heap page and rewriting it. Granted, you end up 
with a sorted table, but only until you start making changes to it. 
If you are at the point of running VACUUM FULL, that indicates to me the 
table has seen a heavy load of changes that you want to clean out. Given 
the temporary nature of the effects of a CLUSTER under a change load, I 
don't see why it would be the way to go to clean up a changing table.

> 
> That's because the data is already roughly in PK order.
> 

-- 
Adrian Klaver
[email protected]

* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
@ 2024-04-22 16:29           ` David G. Johnston <[email protected]>
  2024-04-22 18:45             ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  1 sibling, 1 reply; 20+ messages in thread

From: David G. Johnston @ 2024-04-22 16:29 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

On Mon, Apr 22, 2024, 08:37 Ron Johnson <[email protected]> wrote:

> On Mon, Apr 22, 2024 at 10:25 AM Tom Lane <[email protected]> wrote:
>
>> Marcos Pegoraro <[email protected]> writes:
>> > But wouldn't it be good that VACUUM FULL uses that index defined by
>> > Cluster, if it exists ?
>>
>> No ... what would be the difference then?
>>
>
> What the VACUUM docs "should" do, it seems, is suggest CLUSTER on the PK,
> if the PK is a sequence (whether that be an actual sequence, or a timestamp
> or something else that grows monotonically).
>
> That's because the data is already roughly in PK order.
>

If things are bad enough to require a vacuum full that doesn't seem like a
good assumption.  It might hold for an insert-only table, or one with a
reduced fill-factor.

David J



* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 16:29           ` Re: CLUSTER vs. VACUUM FULL David G. Johnston <[email protected]>
@ 2024-04-22 18:45             ` Ron Johnson <[email protected]>
  2024-04-22 19:14               ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Ron Johnson @ 2024-04-22 18:45 UTC (permalink / raw)
  To: pgsql-general

On Mon, Apr 22, 2024 at 12:29 PM David G. Johnston <[email protected]> wrote:

>
>
> On Mon, Apr 22, 2024, 08:37 Ron Johnson <[email protected]> wrote:
>
>> On Mon, Apr 22, 2024 at 10:25 AM Tom Lane <[email protected]> wrote:
>>
>>> Marcos Pegoraro <[email protected]> writes:
>>> > But wouldn't it be good that VACUUM FULL uses that index defined by
>>> > Cluster, if it exists ?
>>>
>>> No ... what would be the difference then?
>>>
>>
>> What the VACUUM docs "should" do, it seems, is suggest CLUSTER on the PK,
>> if the PK is a sequence (whether that be an actual sequence, or a timestamp
>> or something else that grows monotonically).
>>
>> That's because the data is already roughly in PK order.
>>
>
> If things are bad enough to require a vacuum full that doesn't seem like a
> good assumption.
>

Sure it does.

For example, I just deleted the oldest half of the records in 30 tables
whose CREATED_ON timestamp values strongly correlate with the synthetic
PK sequence values.

Thus, the remaining records were still mostly in PK order.  CLUSTERs on the
PK values would have taken just about as much time as the VACUUM FULL
statements which I *did* run.



* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 16:29           ` Re: CLUSTER vs. VACUUM FULL David G. Johnston <[email protected]>
  2024-04-22 18:45             ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
@ 2024-04-22 19:14               ` Adrian Klaver <[email protected]>
  2024-04-22 19:51                 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Adrian Klaver @ 2024-04-22 19:14 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; pgsql-general



On 4/22/24 11:45 AM, Ron Johnson wrote:
> On Mon, Apr 22, 2024 at 12:29 PM David G. Johnston <[email protected]> wrote:
> 
>     On Mon, Apr 22, 2024, 08:37 Ron Johnson <[email protected]> wrote:
> 
>         On Mon, Apr 22, 2024 at 10:25 AM Tom Lane <[email protected]> wrote:
> 
>             Marcos Pegoraro <[email protected]> writes:
>              > But wouldn't it be good that VACUUM FULL uses that index
>              > defined by Cluster, if it exists ?
> 
>             No ... what would be the difference then?
> 
>         What the VACUUM docs "should" do, it seems, is suggest CLUSTER
>         on the PK, if the PK is a sequence (whether that be an actual
>         sequence, or a timestamp or something else that grows
>         monotonically).
> 
>         That's because the data is already roughly in PK order.
> 
> 
>     If things are bad enough to require a vacuum full that doesn't seem
>     like a good assumption.
> 
> 
> Sure it does.
> 
> For example, I just deleted the oldest half of the records in 30 
> tables.  Tables who's CREATED_ON timestamp value strongly correlates to 
> the synthetic PK sequence values.
> 
> Thus, the remaining records were still mostly in PK order.  CLUSTERs on 
> the PK values would have taken just about as much time as the VACUUM 
> FULL statements which I /did/ run.

1) If they are already in enough of a PK order that the CLUSTER time vs 
VACUUM FULL time would not be material, as there is not much or any 
sorting to do, then what does the CLUSTER gain you? Unless this table 
then became read-only, whatever small gain arose from the CLUSTER would 
fade away as UPDATEs and DELETEs were done.

2) What evidence is there that the records were still in PK order just 
because you deleted based on CREATED_ON? I understand the correlation 
between CREATED_ON and the PK; I'm just not sure why that would 
necessarily translate to an on-disk order by PK.
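
One way to look for such evidence is the planner's own estimate of how
well the physical row order tracks a column's order. This is only a
sketch: public.log is the table named earlier in the thread, while the
column names id and created_on are assumptions, and the value is a
sample-based estimate rather than an exact measurement.

```sql
-- Assumed names: public.log, with hypothetical columns id (the PK) and created_on.
ANALYZE public.log;  -- refresh the planner statistics first

-- correlation is near 1 or -1 when the physical row order closely tracks
-- the column's order, and near 0 when the rows are physically scattered.
SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'log'
  AND attname IN ('id', 'created_on');
```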

-- 
Adrian Klaver
[email protected]

* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 16:29           ` Re: CLUSTER vs. VACUUM FULL David G. Johnston <[email protected]>
  2024-04-22 18:45             ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 19:14               ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
@ 2024-04-22 19:51                 ` Ron Johnson <[email protected]>
  2024-04-22 20:21                   ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Ron Johnson @ 2024-04-22 19:51 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: pgsql-general

On Mon, Apr 22, 2024 at 3:14 PM Adrian Klaver <[email protected]> wrote:

>
>
> On 4/22/24 11:45 AM, Ron Johnson wrote:
> > On Mon, Apr 22, 2024 at 12:29 PM David G. Johnston <[email protected]> wrote:
> >
> >     On Mon, Apr 22, 2024, 08:37 Ron Johnson <[email protected]> wrote:
> >
> >         On Mon, Apr 22, 2024 at 10:25 AM Tom Lane <[email protected]> wrote:
> >
> >             Marcos Pegoraro <[email protected]> writes:
> >              > But wouldn't it be good that VACUUM FULL uses that index
> >              > defined by Cluster, if it exists ?
> >
> >             No ... what would be the difference then?
> >
> >         What the VACUUM docs "should" do, it seems, is suggest CLUSTER
> >         on the PK, if the PK is a sequence (whether that be an actual
> >         sequence, or a timestamp or something else that grows
> >         monotonically).
> >
> >         That's because the data is already roughly in PK order.
> >
> >
> >     If things are bad enough to require a vacuum full that doesn't seem
> >     like a good assumption.
> >
> >
> > Sure it does.
> >
> > For example, I just deleted the oldest half of the records in 30
> > tables.  Tables who's CREATED_ON timestamp value strongly correlates to
> > the synthetic PK sequence values.
> >
> > Thus, the remaining records were still mostly in PK order.  CLUSTERs on
> > the PK values would have taken just about as much time as the VACUUM
> > FULL statements which I /did/ run.
>
> 1) If they are already in enough of a PK order that the CLUSTER time vs
> VACUUM FULL time would not be material as there is not much or any
> sorting to do then what does the CLUSTER gain you?


Not much.  Now they're just "slightly more ordered" instead of "slightly
less ordered" for little if any extra effort.


> 2) What evidence is there that the records where still in PK order just
> because you deleted based on CREATED_ON? I understand the correlation
> between CREATED_ON and the PK just not sure why that would necessarily
> translate to an on disk order by PK?
>

1. Records are appended to tables in INSERT order, and INSERT order is
highly correlated to synthetic PK, by the nature of sequences.
2. My original email showed that CLUSTER took just as long as VACUUM FULL.
That means not many records had to be sorted, because... the on-disk order
was strongly correlated to PK and CREATED_ON.

Will that happen *every time* in *every circumstance* in *every database*?
No, and I never said it would.  But it does in *my* database in *this*
application.



* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 16:29           ` Re: CLUSTER vs. VACUUM FULL David G. Johnston <[email protected]>
  2024-04-22 18:45             ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 19:14               ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 19:51                 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
@ 2024-04-22 20:21                   ` Adrian Klaver <[email protected]>
  2024-04-22 20:59                     ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Adrian Klaver @ 2024-04-22 20:21 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

On 4/22/24 12:51, Ron Johnson wrote:
> On Mon, Apr 22, 2024 at 3:14 PM Adrian Klaver <[email protected]> wrote:
> 
> 
> 

> 
>     1) If they are already in enough of a PK order that the CLUSTER time vs
>     VACUUM FULL time would not be material as there is not much or any
>     sorting to do then what does the CLUSTER gain you? 
> 
> 
> Not much.  Now they're just "slightly more ordered" instead of "slightly 
> less ordered" for little if any extra effort.
> 
>     2) What evidence is there that the records where still in PK order just
>     because you deleted based on CREATED_ON? I understand the correlation
>     between CREATED_ON and the PK just not sure why that would necessarily
>     translate to an on disk order by PK?
> 
> 
> 1. Records are appended to tables in INSERT order, and INSERT order is 
> highly correlated to synthetic PK, by the nature of sequences.

Not something I would count on; see the Notes section of:

https://www.postgresql.org/docs/current/sql-createsequence.html

for how that may not always be the case.

Also, any UPDATE or DELETE is going to change that. There is no guarantee 
of order for the data in the table. If there were, you would not need to 
run CLUSTER.

> 2. My original email showed that CLUSTER took just as long as VACUUM 
> FULL.  That means not many records had to be sorted, because... the 
> on-disk order was strongly correlated to PK and CREATED_ON.
> 
> Will that happen *every time* in *every circumstance* in *every 
> database*?  No, and I never said it would.  But it does in *my* database 
> in *this* application.
> 

Which gets us back to your comment upstream:

"What the VACUUM docs "should" do, it seems, is suggest CLUSTER on the 
PK, if the PK is a sequence (whether that be an actual sequence, or a 
timestamp or something else that grows monotonically)."

This is a case specific to you and this particular circumstance, not a 
general rule for VACUUM FULL. If for no other reason than that it might 
make more sense for the application that the CLUSTER be done on some 
index other than the PK.



-- 
Adrian Klaver
[email protected]

* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 16:29           ` Re: CLUSTER vs. VACUUM FULL David G. Johnston <[email protected]>
  2024-04-22 18:45             ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 19:14               ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 19:51                 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 20:21                   ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
@ 2024-04-22 20:59                     ` Ron Johnson <[email protected]>
  2024-04-22 21:03                       ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Ron Johnson @ 2024-04-22 20:59 UTC (permalink / raw)
  To: pgsql-general

On Mon, Apr 22, 2024 at 4:21 PM Adrian Klaver <[email protected]> wrote:
[snip]

> Which gets us back to your comment upstream:
>
> "What the VACUUM docs "should" do, it seems, is suggest CLUSTER on the
> PK, if the PK is a sequence (whether that be an actual sequence, or a
> timestamp or something else that grows monotonically)."
>
> This is a case specific to you and this particular circumstance, not a
> general rule for VACUUM FULL. If for no other reason then it might make
> more sense for the application that the CLUSTER be done on some other
> index then the PK.
>

On Stack Exchange, I've got a question on how to determine when to run
CLUSTER.  It ties in strongly with this thread.



* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 16:29           ` Re: CLUSTER vs. VACUUM FULL David G. Johnston <[email protected]>
  2024-04-22 18:45             ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 19:14               ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 19:51                 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 20:21                   ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 20:59                     ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
@ 2024-04-22 21:03                       ` Adrian Klaver <[email protected]>
  2024-04-22 21:35                         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Adrian Klaver @ 2024-04-22 21:03 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; pgsql-general

On 4/22/24 13:59, Ron Johnson wrote:
> On Mon, Apr 22, 2024 at 4:21 PM Adrian Klaver <[email protected]> wrote:
> [snip]
> 
>     Which gets us back to your comment upstream:
> 
>     "What the VACUUM docs "should" do, it seems, is suggest CLUSTER on the
>     PK, if the PK is a sequence (whether that be an actual sequence, or a
>     timestamp or something else that grows monotonically)."
> 
>     This is a case specific to you and this particular circumstance, not a
>     general rule for VACUUM FULL. If for no other reason then it might make
>     more sense for the application that the CLUSTER be done on some other
>     index then the PK.
> 
> 
> On Stack Exchange, I've got a question on how to determine when to run 
> CLUSTER.  It ties in strongly with this thread..
> 

And the link is?

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 16:29           ` Re: CLUSTER vs. VACUUM FULL David G. Johnston <[email protected]>
  2024-04-22 18:45             ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 19:14               ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 19:51                 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 20:21                   ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 20:59                     ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 21:03                       ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
@ 2024-04-22 21:35                         ` Ron Johnson <[email protected]>
  2024-04-22 21:56                           ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Ron Johnson @ 2024-04-22 21:35 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: pgsql-general

On Mon, Apr 22, 2024 at 5:03 PM Adrian Klaver <[email protected]>
wrote:

> On 4/22/24 13:59, Ron Johnson wrote:
> > On Mon, Apr 22, 2024 at 4:21 PM Adrian Klaver <[email protected]> wrote:
> > [snip]
> >
> >     Which gets us back to your comment upstream:
> >
> >     "What the VACUUM docs "should" do, it seems, is suggest CLUSTER on the
> >     PK, if the PK is a sequence (whether that be an actual sequence, or a
> >     timestamp or something else that grows monotonically)."
> >
> >     This is a case specific to you and this particular circumstance, not a
> >     general rule for VACUUM FULL. If for no other reason than it might make
> >     more sense for the application that the CLUSTER be done on some other
> >     index than the PK.
> >
> >
> > On Stack Exchange, I've got a question on how to determine when to run
> > CLUSTER.  It ties in strongly with this thread.
> >
>
> And the link is?
>

Sorry.  Got distracted by the answer.

https://dba.stackexchange.com/questions/338870/when-to-rerun-cluster


^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 16:29           ` Re: CLUSTER vs. VACUUM FULL David G. Johnston <[email protected]>
  2024-04-22 18:45             ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 19:14               ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 19:51                 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 20:21                   ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 20:59                     ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 21:03                       ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 21:35                         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
@ 2024-04-22 21:56                           ` Adrian Klaver <[email protected]>
  2024-04-22 22:06                             ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  0 siblings, 1 reply; 20+ messages in thread

From: Adrian Klaver @ 2024-04-22 21:56 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-general

On 4/22/24 14:35, Ron Johnson wrote:

>     > On Stack Exchange, I've got a question on how to determine when to run
>     > CLUSTER.  It ties in strongly with this thread.
>
>     And the link is?

It should have been the initial question of this thread and it explains 
what you are really after.

> 
> 
> Sorry.  Got distracted by the answer.
> 
> https://dba.stackexchange.com/questions/338870/when-to-rerun-cluster



-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 15:37         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 16:29           ` Re: CLUSTER vs. VACUUM FULL David G. Johnston <[email protected]>
  2024-04-22 18:45             ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 19:14               ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 19:51                 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 20:21                   ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 20:59                     ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 21:03                       ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
  2024-04-22 21:35                         ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 21:56                           ` Re: CLUSTER vs. VACUUM FULL Adrian Klaver <[email protected]>
@ 2024-04-22 22:06                             ` Ron Johnson <[email protected]>
  0 siblings, 0 replies; 20+ messages in thread

From: Ron Johnson @ 2024-04-22 22:06 UTC (permalink / raw)
  To: pgsql-general

On Mon, Apr 22, 2024 at 5:56 PM Adrian Klaver <[email protected]>
wrote:

> On 4/22/24 14:35, Ron Johnson wrote:
>
> >     > On Stack Exchange, I've got a question on how to determine when to run
> >     > CLUSTER.  It ties in strongly with this thread.
> >
> >     And the link is?
>
> It should have been the initial question of this thread and it explains
> what you are really after.
>

It was already a long email.


^ permalink  raw  reply  [nested|flat] 20+ messages in thread

* Re: CLUSTER vs. VACUUM FULL
  2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
  2024-04-22 00:15 ` Re: CLUSTER vs. VACUUM FULL Ron Johnson <[email protected]>
  2024-04-22 01:34   ` Re: CLUSTER vs. VACUUM FULL David Rowley <[email protected]>
  2024-04-22 11:42     ` Re: CLUSTER vs. VACUUM FULL Marcos Pegoraro <[email protected]>
  2024-04-22 14:25       ` Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
@ 2024-04-22 17:50         ` Marcos Pegoraro <[email protected]>
  1 sibling, 0 replies; 20+ messages in thread

From: Marcos Pegoraro @ 2024-04-22 17:50 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: David Rowley <[email protected]>; Ron Johnson <[email protected]>; pgsql-general

On Mon, Apr 22, 2024 at 11:25, Tom Lane <[email protected]> wrote:

> No ... what would be the difference then


Well, I think that once a cluster index has been defined on a table, it should
be respected by subsequent commands, including VACUUM FULL.
If I wanted to go back to the PK or any other index, I would use CLUSTER ...
USING PK_INDEX.

regards
Marcos


^ permalink  raw  reply  [nested|flat] 20+ messages in thread


end of thread, other threads:[~2024-04-22 22:06 UTC | newest]

Thread overview: 20+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-04-21 22:45 Re: CLUSTER vs. VACUUM FULL Tom Lane <[email protected]>
2024-04-22 00:06 ` Ron Johnson <[email protected]>
2024-04-22 00:15 ` Ron Johnson <[email protected]>
2024-04-22 01:34   ` David Rowley <[email protected]>
2024-04-22 02:50     ` Ron Johnson <[email protected]>
2024-04-22 11:42     ` Marcos Pegoraro <[email protected]>
2024-04-22 14:25       ` Tom Lane <[email protected]>
2024-04-22 15:37         ` Ron Johnson <[email protected]>
2024-04-22 15:51           ` Adrian Klaver <[email protected]>
2024-04-22 16:29           ` David G. Johnston <[email protected]>
2024-04-22 18:45             ` Ron Johnson <[email protected]>
2024-04-22 19:14               ` Adrian Klaver <[email protected]>
2024-04-22 19:51                 ` Ron Johnson <[email protected]>
2024-04-22 20:21                   ` Adrian Klaver <[email protected]>
2024-04-22 20:59                     ` Ron Johnson <[email protected]>
2024-04-22 21:03                       ` Adrian Klaver <[email protected]>
2024-04-22 21:35                         ` Ron Johnson <[email protected]>
2024-04-22 21:56                           ` Adrian Klaver <[email protected]>
2024-04-22 22:06                             ` Ron Johnson <[email protected]>
2024-04-22 17:50         ` Marcos Pegoraro <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox