Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ufqgg-00Ca03-Fn for pgsql-hackers@arkaria.postgresql.org; Sun, 27 Jul 2025 02:00:31 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1ufqff-00Dd45-V8 for pgsql-hackers@arkaria.postgresql.org; Sun, 27 Jul 2025 01:59:28 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ufqff-00Dd3x-I9 for pgsql-hackers@lists.postgresql.org; Sun, 27 Jul 2025 01:59:27 +0000 Received: from mail-vk1-xa2b.google.com ([2607:f8b0:4864:20::a2b]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1ufqfd-000xrx-2D for pgsql-hackers@lists.postgresql.org; Sun, 27 Jul 2025 01:59:26 +0000 Received: by mail-vk1-xa2b.google.com with SMTP id 71dfb90a1353d-53167fb5690so1072090e0c.3 for ; Sat, 26 Jul 2025 18:59:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=xzilla-net.20230601.gappssmtp.com; s=20230601; t=1753581564; x=1754186364; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ED69CvZnZ1O3vOVu0q4UOM803eDL1FkyQtfMb5pfXb8=; b=aCmIKzZEpDazfYu1vOW/Rn7M3AQGI7mN3GNaVAWS39AxtCRRaNEW84VYes0TXHnLZa Ngwi55BNhb2Lu3iuk7yqj2aJu+glRB5KCf+ggPDQCqMkr4iBCq08So6mtPiWQx8zD7QA nI8Mq0w0yt2B2ACOkchdhG5Y7pu7QlaRLf/fNA/fENXeZyiwjD3kaEtV7da+0EFKiISE OwiRG/g/aUzePPmB7z3cMp0Y0MkCqmi2DXwcXf7QA0qIMVOWk46JuloUfsorplDCy9qO CmfCGlfoVqZiNqO3M9uKuHcea4eFc3pWGbT4fTTz34fjNav+/CuR0CuZ1e9LbbnqXNp8 aSdw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1753581564; x=1754186364; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ED69CvZnZ1O3vOVu0q4UOM803eDL1FkyQtfMb5pfXb8=; b=FkXmA2P+La5u2zjy6MeL+NT5B52AbrTCUrhJtW0lHATTY+ESdYy8hZtxol6RNu1yTg PGDL8zNHs9AVBaVJjBfK/O4oTJGVbIP9wXAdUht9hFb096dNwq+UZL6mTOxB0dNpoGWq GpsX0+cpAS/nnk8zqVgMeXzsJgGN27r0UcnCFtfnGlRxitfQddfHsnXn/U40a7I2plGN iYUUgAl8AFBFMis6L4XIOb5TxkEzQF/kHaS7N/B5fWd+zR238HpBFMjsqllXjqVHU8Oa D9SdJczlPOmcbNwDYkGddGfLLcnl+Mq+djYu9a0uVO5OFsEspvFCjqwOl9T5RsML+25U 0OFg== X-Gm-Message-State: AOJu0YyPCIeCNLUMRCRcf5gTy6czSIITnD+fHF6g7I6aWytV0E0lQ0Kk kmKs0z7vkdvUy/eq/QDLW9LvfriyhoQMfuzTRVG0efEiGrnWmvV7x0/E5cb8PbXx4y35NtD3set uyWlOczykefCY9ZANStDi4pr7Z+FmnWerZoAxvL8Ez991RunA/XpOsw4= X-Gm-Gg: ASbGnctuCGt0fS2pDnhixFwe2o+U+c8UJ54PGPzzp4wwXIp6h9dbDxUJTaHZmzLTNTj bxwbCAvtbsg09SDK7MAuUxQ8QktiBoyKOypm0NgBprm4bcvEfy7iawPDdT/NUj2BHBCGbb+L+5n 16Jf212aoImlid+yFcr9I6XSrpxpfP+4AEj38pB4byYLWwc2DfgQ5nqS+wuI95nfRzwsyZbSOq6 QTdKvfWX8hJNkYqZQ== X-Google-Smtp-Source: AGHT+IGCsjdnLFe6+4ygBGBTV0HhE54bd2ByMPejnePZ5IYXv3da6tTDLLJmk4VT7ZWqA6Ctpw6WOXr9SavAullJ2as= X-Received: by 2002:a05:6122:3090:b0:530:720b:abe9 with SMTP id 71dfb90a1353d-538db5d0bacmr2428411e0c.7.1753581564112; Sat, 26 Jul 2025 18:59:24 -0700 (PDT) MIME-Version: 1.0 References: <202507262156.sb455angijk6@alvherre.pgsql> In-Reply-To: <202507262156.sb455angijk6@alvherre.pgsql> From: Robert Treat Date: Sat, 26 Jul 2025 21:59:10 -0400 X-Gm-Features: Ac12FXxrjh2uLG0u3OmDtOq-ugCXa0o2KwCFXevzhxpEsK5tzAFcE4NY397PbTg Message-ID: Subject: Re: Adding REPACK [concurrently] To: Alvaro Herrera Cc: Pg Hackers , Antonin Houska Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sat, Jul 26, 2025 at 5:56=E2=80=AFPM Alvaro Herrera wrote: > > Hello, > > Here's a patch to add REPACK and eventually the CONCURRENTLY flag to it. > This is coming from [1]. The ultimate goal is to have an in-core tool > to allow concurrent table rewrite to get rid of bloat; right now, VACUUM > FULL does that, but it's not concurrent. Users have resorted to using > the pg_repack third-party tool, which is ancient and uses a weird > internal implementation, as well as pg_squeeze, which uses logical > decoding to capture changes that occur during the table rewrite. The > patch submitted here, largely by Antonin Houska with some changes by me, > is based on the the pg_squeeze code which he authored, and first > introduces a new command called REPACK to absorb both VACUUM FULL and > CLUSTER, followed by addition of a CONCURRENTLY flag to allow some forms > of REPACK to operate online using logical decoding. > > Essentially, this first patch just reshuffles the CLUSTER code to create > the REPACK command. > Thanks for keeping this ball rolling. > > My other change to Antonin's patch is that I made REPACK USING INDEX set > the 'indisclustered' flag to the index being used, so REPACK behaves > identically to CLUSTER. We can discuss whether we really want this. > For instance we could add an option so that by default REPACK omits > persisting the clustered index, and instead it only does that when you > give it some special option, say something like > "REPACK (persist_clustered_index=3Dtrue) tab USING INDEX idx" > Overall I'm not sure this is terribly interesting, since clustered > indexes are not very useful for most users anyway. > I think I would lean towards having it work like CLUSTER (preserve the index), since that helps people making the transition, and it doesn't feel terribly useful to invent new syntax for a feature that I would agree isn't very useful for most people. > I made a few other minor changes not worthy of individual mention, and > there are a few others pending, such as updates to the > pg_stat_progress_repack view infrastructure, as well as phasing out > pg_stat_progress_cluster (maybe the latter would offer a subset of the > former; not yet sure about this.) Also, I'd like to work on adding a > `repackdb` command for completeness. > > On repackdb: I think is going to be very similar to vacuumdb, mostly in > that it is going to need to be able to run tasks in parallel; but there > are things it doesn't have to deal with, such as analyze-in-stages, > which I think is a large burden. I estimate about 1k LOC there, > extremely similar to vacuumdb. Maybe it makes sense to share the source > code and make the new executable a symlink instead, with some additional > code to support the two different modes. Again, I'm not sure about > this -- I like the idea, but I'd have to see the implementation. > > I'll be rebasing the rest of Antonin's patch series afterwards, > including the logical decoding changes necessary for CONCURRENTLY. In > the meantime, if people want to review those, which would be very > valuable, they can go back to branch master from around the time he > submitted it and apply the old patches there. > For clarity, are you intending to commit this patch before having the other parts ready? (If that sounds like an objection, it isn't) After a first pass, I think there's some confusing bits in the new docs that could use straightening out, but there likely going to overlap changes once concurrently is brought in, so it might make sense to hold off on those. Either way I definitely want to dive into this a bit deeper with some fresh eyes, there's a lot to digest... speaking of, for this bit in src/backend/commands/cluster.c + switch (cmd) + { + case REPACK_COMMAND_REPACK: + return "REPACK"; + case REPACK_COMMAND_VACUUMFULL: + return "VACUUM"; + case REPACK_COMMAND_CLUSTER: + return "VACUUM"; + } + return "???"; The last one should return "CLUSTER" no? Robert Treat https://xzilla.net