From: Ron Johnson
Date: Sun, 21 Apr 2024 22:50:46 -0400
Subject: Re: CLUSTER vs. VACUUM FULL
To: David Rowley
Cc: pgsql-general

On Sun, Apr 21, 2024 at 9:35 PM David Rowley wrote:

> On Mon, 22 Apr 2024 at 12:16, Ron Johnson wrote:
> >
> > On Sun, Apr 21, 2024 at 6:45 PM Tom Lane wrote:
> >>
> >> Ron Johnson writes:
> >> > Why is VACUUM FULL recommended for compressing a table, when CLUSTER does
> >> > the same thing (similarly doubling disk space), and apparently runs just as
> >> > fast?
> >>
> >> CLUSTER makes the additional effort to sort the data per the ordering
> >> of the specified index.  I'm surprised that's not noticeable in your
> >> test case.
> >
> > Clustering on a completely different index was also 44 seconds.
>
> Both VACUUM FULL and CLUSTER go through a very similar code path. Both
> use cluster_rel().  VACUUM FULL just won't make use of an existing
> index to provide presorted input or perform a sort, whereas CLUSTER
> will attempt to choose the cheapest out of these two to get sorted
> results.
>
> If the timing for each is similar, it just means that using an index
> scan or sorting isn't very expensive compared to the other work that's
> being done.  Both CLUSTER and VACUUM FULL require reading every heap
> page and writing out new pages into a new heap and maintaining all
> indexes on the new heap. That's quite an effort.

My original CLUSTER command didn't have to change the order of the data
very much, so the sort didn't have to do much work.

CLUSTER on a different index was indeed much slower than VACUUM FULL.
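
For illustration only, here is a toy Python model (not PostgreSQL code) of why reading a heap in index order is cheap when the index already matches the physical row order, and expensive when it does not. Everything in it is an assumption made for the sketch: the page capacity, the two-column table, and the `page_switches` helper are hypothetical, and the count is only a rough proxy for heap I/O. It ignores buffering and says nothing about whether CLUSTER would actually pick an index scan or a sequential scan plus sort.

```python
import random

ROWS_PER_PAGE = 100  # hypothetical page capacity, chosen only for illustration


def page_switches(heap, key):
    """Count heap page changes when rows are visited in index (key) order.

    `heap` is a list of rows in physical order; row i lives on page
    i // ROWS_PER_PAGE. A toy proxy for heap I/O, not PostgreSQL's cost model.
    """
    # Physical positions ordered by the index key, as an index scan returns them.
    order = sorted(range(len(heap)), key=lambda pos: key(heap[pos]))
    switches = 0
    last_page = None
    for pos in order:
        page = pos // ROWS_PER_PAGE
        if page != last_page:
            switches += 1
            last_page = page
    return switches


rng = random.Random(0)  # fixed seed so the run is repeatable
n_rows = 10_000
# Column 0 follows the physical order; column 1 is unrelated to it.
heap = [(a, rng.random()) for a in range(n_rows)]

correlated = page_switches(heap, key=lambda row: row[0])
uncorrelated = page_switches(heap, key=lambda row: row[1])

print("pages in heap:          ", n_rows // ROWS_PER_PAGE)
print("switches, correlated:   ", correlated)
print("switches, uncorrelated: ", uncorrelated)
```

With the correlated key, each page is visited exactly once, so the count equals the number of pages. With the unrelated key, almost every row lands on a different page from the previous one, which is the kind of gap that could explain a CLUSTER on a different index running much slower.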