Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ryiaX-0006tS-4e for pgsql-general@arkaria.postgresql.org; Mon, 22 Apr 2024 01:35:21 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1ryiaT-00GsJM-TR for pgsql-general@arkaria.postgresql.org; Mon, 22 Apr 2024 01:35:17 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ryiaT-00GsJE-GX for pgsql-general@lists.postgresql.org; Mon, 22 Apr 2024 01:35:17 +0000 Received: from mail-lj1-x22f.google.com ([2a00:1450:4864:20::22f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1ryiaO-003ztG-8u for pgsql-general@postgresql.org; Mon, 22 Apr 2024 01:35:16 +0000 Received: by mail-lj1-x22f.google.com with SMTP id 38308e7fff4ca-2db101c11beso44704611fa.0 for ; Sun, 21 Apr 2024 18:35:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1713749710; x=1714354510; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=OKLcGhe7t/evi2c7Zw/gaI/iaWMKDRCEdMD5ckqf0+A=; b=N8YKmk2aoHcUt+VJYaXzWuhyYzAhybFxWemNNFPv7K+XdZDsFvFOT1tfHj2m421N3k 2w5yir/ccqTsVvTeYmFW1enXMQUJZTDsfuKiYTjVNqFvyztZOrqN1n/xNGWAugdQf5oz ZjVzFgv5zl2fDKMGhngipduEkXqiqBvF9lCwndPA282Sf7c5lGrlJSoLIyfCtmedVnWF 7aRoqEZw/sOf2aklTwhX9F6MsYdtQT3CNXlsLtiPlWmCmxRvwJFc5aeCm14gJeOXE3pu e7AttTaAP1JT99v9ju7sUmkqMc7FghDbjSuc93FB4eZNMkl8tFzQZEyRNKFDy2SEp4ql EEuA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713749710; x=1714354510; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=OKLcGhe7t/evi2c7Zw/gaI/iaWMKDRCEdMD5ckqf0+A=; b=BLfpbrUHIWB/vvUDLiNSrnZ9jqivr51a2xxq4lUk5pnw5RfefQckD8uUQFSMeZWj01 ZpBBEVlxcJWr2b6NZGFbmsaW8e4VdjpXEe9jYMEwD0jUdI9orouKccCpV2kqjkU5ufKY ciWiUWm3jOS2RgpL79HV6qiY1CPhUTJ9ZC4gm27bH8DjICCTw6GAq7ezK1kvGlytcJWH /yhWNtYBtI+8ZRNcjz9MJ9GvEO3aYT5A3Jh5Ptd+Fyp23v/+XehUi7AaRoL0vIpJOaUL TRMwPfLXl/U7fiifydrM9baLicnI853eFJ22BvJ6Vy0D9wuhYVKSCmxP02svBMcMM/FR U2ng== X-Gm-Message-State: AOJu0Yx5IUxme0aOT03KdZUjJX/Mp+jEfBUBe6Z7/cAM26YYGUuEMe+X lpTRyHbTiJy8tzlNmJD39rjTsTU+keyWOsE90FkkJOy5r2lw0PUTN75ceRfDpY7IjOR5aziyQFj q4F2LP5TbPy4/XjltWhtHlEuQHXw= X-Google-Smtp-Source: AGHT+IFTiqI84dZjblKW26HpEwREBLueJ/HTRFix3NOaUnzgFbxk86GVtsOCsk6H+It3GxmrUtyUQu2XmFzAKNIWnIc= X-Received: by 2002:a05:651c:60e:b0:2d8:3dc7:e302 with SMTP id k14-20020a05651c060e00b002d83dc7e302mr3925307lje.2.1713749710110; Sun, 21 Apr 2024 18:35:10 -0700 (PDT) MIME-Version: 1.0 References: <2870091.1713739514@sss.pgh.pa.us> In-Reply-To: From: David Rowley Date: Mon, 22 Apr 2024 13:34:58 +1200 Message-ID: Subject: Re: CLUSTER vs. VACUUM FULL To: Ron Johnson Cc: pgsql-general Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon, 22 Apr 2024 at 12:16, Ron Johnson wrote: > > On Sun, Apr 21, 2024 at 6:45=E2=80=AFPM Tom Lane wrot= e: >> >> Ron Johnson writes: >> > Why is VACUUM FULL recommended for compressing a table, when CLUSTER d= oes >> > the same thing (similarly doubling disk space), and apparently runs ju= st as >> > fast? >> >> CLUSTER makes the additional effort to sort the data per the ordering >> of the specified index. I'm surprised that's not noticeable in your >> test case. > > Clustering on a completely different index was also 44 seconds. Both VACUUM FULL and CLUSTER go through a very similar code path. Both use cluster_rel(). VACUUM FULL just won't make use of an existing index to provide presorted input or perform a sort, whereas CLUSTER will attempt to choose the cheapest out of these two to get sorted results. If the timing for each is similar, it just means that using an index scan or sorting isn't very expensive compared to the other work that's being done. Both CLUSTER and VACUUM FULL require reading every heap page and writing out new pages into a new heap and maintaining all indexes on the new heap. That's quite an effort. To satisfy your curiosity, you could always run some EXPLAIN ANALYZE SELECT queries to measure how much time was spent sorting the entire table. You'd have to set work_mem to the value of maintenance_work_mem. David