From: Adrian Klaver <[email protected]>
To: Ron Johnson <[email protected]>
To: pgsql-general <[email protected]>
Subject: Re: CLUSTER vs. VACUUM FULL
Date: Mon, 22 Apr 2024 12:14:38 -0700
Message-ID: <[email protected]> (raw)
In-Reply-To: <CANzqJaConAabzE942KSQV2BtZdBaEjvY9Cz_A8xvwrgauUX=zA@mail.gmail.com>
References: <CANzqJaD8o0UuK42F==BXZA=L380n=aXt+VJ93PN3waFbNhZHBg@mail.gmail.com>
	<[email protected]>
	<CANzqJaBvcGKdhqaZz+b0jNc8WLK74UCDW+zk=Ja78duOjWUBRA@mail.gmail.com>
	<CAApHDvraS7EkYZsgZnAWc-L4_jbie5UkPAtedKSqN6s1j+PJiQ@mail.gmail.com>
	<CAB-JLwYPrJpYGqsQFDpiMf6UFKeT1zwagabdb-2V3x0G8DqNPQ@mail.gmail.com>
	<[email protected]>
	<CANzqJaDocQZNhD8CpGS7ae82tVeuL5-reoKy0_Y6=RKuTh6qwQ@mail.gmail.com>
	<CAKFQuwYFHCf+cd=wxJmzRqezQKy8rAKv2_QyvipzE1qboBXXvw@mail.gmail.com>
	<CANzqJaConAabzE942KSQV2BtZdBaEjvY9Cz_A8xvwrgauUX=zA@mail.gmail.com>



On 4/22/24 11:45 AM, Ron Johnson wrote:
> On Mon, Apr 22, 2024 at 12:29 PM David G. Johnston <[email protected]> wrote:
>
>     On Mon, Apr 22, 2024, 08:37 Ron Johnson <[email protected]> wrote:
>
>         On Mon, Apr 22, 2024 at 10:25 AM Tom Lane <[email protected]> wrote:
>
>             Marcos Pegoraro <[email protected]> writes:
>              > But wouldn't it be good that VACUUM FULL uses that index
>              > defined by Cluster, if it exists ?
> 
>             No ... what would be the difference then?
> 
>         What the VACUUM docs "should" do, it seems, is suggest CLUSTER
>         on the PK, if the PK is a sequence (whether that be an actual
>         sequence, or a timestamp or something else that grows
>         monotonically).
> 
>         That's because the data is already roughly in PK order.
> 
> 
>     If things are bad enough to require a vacuum full that doesn't seem
>     like a good assumption.
> 
> 
> Sure it does.
> 
> For example, I just deleted the oldest half of the records in 30 
> tables.  Tables whose CREATED_ON timestamp value strongly correlates to 
> the synthetic PK sequence values.
> 
> Thus, the remaining records were still mostly in PK order.  CLUSTERs on 
> the PK values would have taken just about as much time as the VACUUM 
> FULL statements which I /did/ run.

1) If the rows are already close enough to PK order that CLUSTER would 
take about as long as VACUUM FULL, because there is little or no 
sorting to do, then what does the CLUSTER gain you? Unless the table 
then became read-only, whatever small gain arose from the CLUSTER would 
fade away as UPDATEs and DELETEs were done.

2) What evidence is there that the records were still in PK order just 
because you deleted based on CREATED_ON? I understand the correlation 
between CREATED_ON and the PK; I am just not sure why that would 
necessarily translate to an on-disk order by PK.
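
To make question 2 concrete, here is a rough sketch of why deleting by 
CREATED_ON does not by itself say anything about on-disk order. It is a 
toy model, not PostgreSQL's heap: rows are inserted in PK order, an 
"UPDATE" is assumed to write the new row version at the end of the 
table, and the oldest half is then deleted. The function names and the 
update_fraction parameter are made up for illustration. On a real table 
the comparable number is the correlation column that pg_stats reports 
for the PK column after an ANALYZE.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_rows=10_000, update_fraction=0.0, seed=0):
    """Correlation between physical position and PK after deleting the oldest half.

    Assumption (toy model): an updated row's new version lands at the
    end of the table, and the old version is simply gone.
    """
    rng = random.Random(seed)
    pks = list(range(1, n_rows + 1))  # rows inserted in PK order

    # Pick the rows that were updated at some point in the table's life.
    updated = rng.sample(pks, int(n_rows * update_fraction))
    updated_set = set(updated)

    # Physical order: untouched rows stay put, updated rows are appended.
    heap = [pk for pk in pks if pk not in updated_set] + updated

    # Delete the oldest half (lowest PKs, i.e. oldest CREATED_ON values).
    cutoff = n_rows // 2
    heap = [pk for pk in heap if pk > cutoff]

    return pearson(range(len(heap)), heap)


if __name__ == "__main__":
    for frac in (0.0, 0.1, 0.3):
        print(f"update_fraction={frac}: correlation={simulate(update_fraction=frac):.3f}")
```

In this model an insert-only table keeps a correlation of 1 after the 
delete, while any meaningful amount of UPDATE traffic before the delete 
pulls it down. So whether the remaining rows are "still mostly in PK 
order" depends on the table's update history, not on the delete.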

-- 
Adrian Klaver
[email protected]





