public inbox for [email protected]
From: Mihail Nikalayeu <[email protected]>
To: Antonin Houska <[email protected]>
Cc: Alvaro Herrera <[email protected]>
Cc: Pg Hackers <[email protected]>
Cc: Robert Treat <[email protected]>
Cc: Fujii Masao <[email protected]>
Subject: Re: Adding REPACK [concurrently]
Date: Tue, 2 Sep 2025 12:44:27 +0200
Message-ID: <CADzfLwUgPMLiFkXRnk97ugPqkDfsNJ3TRdw9gjJM=8WB4_nXwQ@mail.gmail.com>
In-Reply-To: <51329.1756740618@localhost>
References: <[email protected]>
<[email protected]>
<CADzfLwWJqoG6uPt+HywKOFjXhqSbfCr+VXpfio9YQ6yqQaihPA@mail.gmail.com>
<4607.1756703531@localhost>
<CADzfLwU4kuHSPd9ty8DQpRNWRjX6rJJVVzzWvT4+MEoTyyGtDg@mail.gmail.com>
<51329.1756740618@localhost>
Hello!
Antonin Houska <[email protected]>:
> I'll apply it to the next version of the "Add CONCURRENTLY option to REPACK
> command" patch.
I have added it to the v21 patchset.
Also, I’ve updated the MVCC-safe patch:
* it uses the "XactLockTableWait before replay + SnapshotSelf" approach from [0]
* it includes a TAP test to ensure MVCC safety - not intended to be
committed in its current form (too heavy)
* documentation has been updated.
It's now much simpler and does not negatively impact performance. It
is less aggressive in tuple freezing, but can be updated to match the
non-MVCC-safe version if needed.
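To illustrate why the MVCC-safe variant keeps the original XID on copied tuples, here is a minimal toy model in C. It is only a sketch under simplifying assumptions: ToyTuple, ToySnapshot, toy_visible() and toy_copy() are hypothetical names, not PostgreSQL APIs, and the visibility rule ignores in-progress lists, aborted transactions, xmax and XID wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TransactionId; 0 plays the role of "invalid". */
typedef uint32_t ToyXid;

/* Toy tuple header: only the inserting transaction (xmin) is modelled. */
typedef struct
{
    ToyXid xmin;
    int    value;
} ToyTuple;

/* Toy snapshot: every xid >= xmax had not committed when it was taken. */
typedef struct
{
    ToyXid xmax;
} ToySnapshot;

/* Simplified visibility rule (assumption: all xids below xmax committed). */
static inline bool
toy_visible(const ToyTuple *tup, const ToySnapshot *snap)
{
    return tup->xmin < snap->xmax;
}

/*
 * Copy a tuple into the "new heap", stamping it with the given xid.
 * The assertion loosely mirrors the validity check the patch adds to
 * heap_insert(); the rest is purely illustrative.
 */
static inline ToyTuple
toy_copy(const ToyTuple *src, ToyXid stamp)
{
    ToyTuple dst = *src;

    assert(stamp != 0);
    dst.xmin = stamp;
    return dst;
}
```

With a snapshot whose xmax is 150, a tuple with xmin 100 is visible. A copy stamped with a later REPACK xid such as 200 is invisible to that snapshot, while a copy stamped with the original xmin 100 stays visible.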
While testing the MVCC-safe version with the stress test
007_repack_concurrently_mvcc.pl, I encountered random crashes with
logs like these:
25-09-02 12:24:40.039 CEST client backend[261907]
007_repack_concurrently_mvcc.pl ERROR: relcache reference
0x7715b9f394a8 is not owned by resource owner TopTransaction
2025-09-02 12:24:40.039 CEST client backend[261907]
007_repack_concurrently_mvcc.pl STATEMENT: REPACK (CONCURRENTLY) tbl1
USING INDEX tbl1_pkey;
TRAP: failed Assert("rel->rd_refcnt > 0"), File:
"../src/backend/utils/cache/relcache.c", Line: 6992, PID: 261907
postgres: CIC_test: nkey postgres [local]
REPACK(ExceptionalCondition+0xbe)[0x5b7ac41d79f9]
postgres: CIC_test: nkey postgres [local] REPACK(+0x852d2e)[0x5b7ac41cbd2e]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aa4a6)[0x5b7ac42234a6]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aad3b)[0x5b7ac4223d3b]
postgres: CIC_test: nkey postgres [local] REPACK(+0x8aac69)[0x5b7ac4223c69]
postgres: CIC_test: nkey postgres [local]
REPACK(ResourceOwnerRelease+0x32)[0x5b7ac4223c26]
postgres: CIC_test: nkey postgres [local] REPACK(+0x1f43bf)[0x5b7ac3b6d3bf]
postgres: CIC_test: nkey postgres [local] REPACK(+0x1f4dfa)[0x5b7ac3b6ddfa]
postgres: CIC_test: nkey postgres [local]
REPACK(AbortCurrentTransaction+0xe)[0x5b7ac3b6dd6b]
postgres: CIC_test: nkey postgres [local]
REPACK(PostgresMain+0x57d)[0x5b7ac3fd7238]
postgres: CIC_test: nkey postgres [local] REPACK(+0x654102)[0x5b7ac3fcd102]
postgres: CIC_test: nkey postgres [local]
REPACK(postmaster_child_launch+0x191)[0x5b7ac3eceb7a]
postgres: CIC_test: nkey postgres [local] REPACK(+0x55c8c1)[0x5b7ac3ed58c1]
postgres: CIC_test: nkey postgres [local] REPACK(+0x559d1e)[0x5b7ac3ed2d1e]
postgres: CIC_test: nkey postgres [local]
REPACK(PostmasterMain+0x168a)[0x5b7ac3ed25f8]
postgres: CIC_test: nkey postgres [local] REPACK(main+0x3a1)[0x5b7ac3da2bd6]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7715b962a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7715b962a28b]
This time I first tried to reproduce the issue on the non-MVCC-safe
version, and it is reproducible there as well.
Just comment out the \if :p_t1 != :p_t2 block (including its body, because
it catches non-MVCC behaviour, which is expected without the 0006 patch)
and set
'--no-vacuum --client=30 --jobs=4 --exit-on-abort --transactions=25000'
It takes about a minute on my PC to get the crash.
[0]: https://www.postgresql.org/message-id/flat/CADzfLwXCTXNdxK-XGTKmObvT%3D_QnaCviwgrcGtG9chsj5sYzrg%40m...
Best regards,
Mikhail.
Attachments:
[application/octet-stream] v21-0006-Preserve-visibility-information-of-the-concurren.patch (30.5K, 2-v21-0006-Preserve-visibility-information-of-the-concurren.patch)
From 946862e2a4dbfd91ac6802c2e8da104dce81c43a Mon Sep 17 00:00:00 2001
From: Mikhail Nikalayeu <[email protected]>
Date: Tue, 2 Sep 2025 11:30:55 +0200
Subject: [PATCH v21 6/6] Preserve visibility information of the concurrent
data changes.
As explained in the commit message of the preceding patch of the series, the
data changes done by applications while REPACK CONCURRENTLY is copying the
table contents to a new file are decoded from WAL and eventually also applied
to the new file. To reduce the complexity a little bit, the preceding patch
uses the current transaction (i.e. transaction opened by the REPACK command)
to execute those INSERT, UPDATE and DELETE commands.
However, REPACK is not expected to change visibility of tuples. Therefore,
this patch fixes the handling of the "concurrent data changes". It ensures
that tuples written into the new table have the same XID and command ID (CID)
as they had in the old table.
To "replay" an UPDATE or DELETE command on the new table, we use SnapshotSelf
to find the last live version of the tuple and stamp the change with the XID of
the original transaction. This is safe because:
* all transactions being replayed are committed
* the apply worker runs without any concurrent modifiers of the table
As long as we preserve the tuple visibility information (which includes XID),
it's important to avoid logical decoding of the WAL generated by DMLs on the
new table: the logical decoding subsystem probably does not expect that the
incoming WAL records contain XIDs of an already decoded transactions. (And of
course, repeated decoding would be wasted effort.)
Author: Antonin Houska <[email protected]> with changes from Mikhail Nikalayeu <[email protected]>
---
contrib/amcheck/meson.build | 1 +
.../amcheck/t/007_repack_concurrently_mvcc.pl | 113 ++++++++++++++++++
doc/src/sgml/mvcc.sgml | 12 +-
doc/src/sgml/ref/repack.sgml | 9 --
src/backend/access/common/toast_internals.c | 3 +-
src/backend/access/heap/heapam.c | 46 ++++---
src/backend/access/heap/heapam_handler.c | 24 ++--
src/backend/commands/cluster.c | 85 +++++++++----
.../pgoutput_repack/pgoutput_repack.c | 18 +--
src/include/access/heapam.h | 12 +-
src/include/commands/cluster.h | 2 +
.../injection_points/specs/repack.spec | 4 -
12 files changed, 249 insertions(+), 80 deletions(-)
create mode 100644 contrib/amcheck/t/007_repack_concurrently_mvcc.pl
diff --git a/contrib/amcheck/meson.build b/contrib/amcheck/meson.build
index 1f0c347ed54..d07d6ed3f0c 100644
--- a/contrib/amcheck/meson.build
+++ b/contrib/amcheck/meson.build
@@ -50,6 +50,7 @@ tests += {
't/004_verify_nbtree_unique.pl',
't/005_pitr.pl',
't/006_verify_gin.pl',
+ 't/007_repack_concurrently_mvcc.pl',
],
},
}
diff --git a/contrib/amcheck/t/007_repack_concurrently_mvcc.pl b/contrib/amcheck/t/007_repack_concurrently_mvcc.pl
new file mode 100644
index 00000000000..a83fd5b8141
--- /dev/null
+++ b/contrib/amcheck/t/007_repack_concurrently_mvcc.pl
@@ -0,0 +1,113 @@
+
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+# Test REPACK CONCURRENTLY with concurrent modifications
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+my $node;
+
+#
+# Test set-up
+#
+$node = PostgreSQL::Test::Cluster->new('CIC_test');
+$node->init;
+$node->append_conf('postgresql.conf',
+ 'lock_timeout = ' . (1000 * $PostgreSQL::Test::Utils::timeout_default));
+$node->append_conf(
+ 'postgresql.conf', qq(
+wal_level = logical
+));
+$node->start;
+$node->safe_psql('postgres', q(CREATE TABLE tbl1(i int PRIMARY KEY, j int)));
+$node->safe_psql('postgres', q(CREATE TABLE tbl2(i int PRIMARY KEY, j int)));
+
+
+# Insert 100 rows into tbl1
+$node->safe_psql('postgres', q(
+ INSERT INTO tbl1 SELECT i, i % 100 FROM generate_series(1,100) i
+));
+
+# Insert 100 rows into tbl2
+$node->safe_psql('postgres', q(
+ INSERT INTO tbl2 SELECT i, i % 100 FROM generate_series(1,100) i
+));
+
+
+# Create a helper function that reports a mismatch via NOTICE
+$node->safe_psql('postgres', q(
+ CREATE OR REPLACE FUNCTION log_raise(i int, j1 int, j2 int) RETURNS VOID AS $$
+ BEGIN
+ RAISE NOTICE 'ERROR i=% j1=% j2=%', i, j1, j2;
+ END;$$ LANGUAGE plpgsql;
+));
+
+$node->safe_psql('postgres', q(CREATE UNLOGGED SEQUENCE in_row_rebuild START 1 INCREMENT 1;));
+$node->safe_psql('postgres', q(SELECT nextval('in_row_rebuild');));
+
+
+$node->pgbench(
+'--no-vacuum --client=10 --jobs=4 --exit-on-abort --transactions=2500',
+0,
+[qr{actually processed}],
+[qr{^$}],
+'concurrent operations with REPACK (CONCURRENTLY)',
+{
+ 'concurrent_ops' => q(
+ SELECT pg_try_advisory_lock(42)::integer AS gotlock \gset
+ \if :gotlock
+ SELECT nextval('in_row_rebuild') AS last_value \gset
+ \if :last_value = 2
+ REPACK (CONCURRENTLY) tbl1 USING INDEX tbl1_pkey;
+ \sleep 10 ms
+ REPACK (CONCURRENTLY) tbl2 USING INDEX tbl2_pkey;
+ \sleep 10 ms
+ \endif
+ SELECT pg_advisory_unlock(42);
+ \else
+ \set num random(1, 100)
+ BEGIN;
+ UPDATE tbl1 SET j = j + 1 WHERE i = :num;
+ \sleep 1 ms
+ UPDATE tbl1 SET j = j + 2 WHERE i = :num;
+ \sleep 1 ms
+ UPDATE tbl1 SET j = j + 3 WHERE i = :num;
+ \sleep 1 ms
+ UPDATE tbl1 SET j = j + 4 WHERE i = :num;
+ \sleep 1 ms
+
+ UPDATE tbl2 SET j = j + 1 WHERE i = :num;
+ \sleep 1 ms
+ UPDATE tbl2 SET j = j + 2 WHERE i = :num;
+ \sleep 1 ms
+ UPDATE tbl2 SET j = j + 3 WHERE i = :num;
+ \sleep 1 ms
+ UPDATE tbl2 SET j = j + 4 WHERE i = :num;
+
+ COMMIT;
+ SELECT setval('in_row_rebuild', 1);
+
+ BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
+ SELECT COALESCE(SUM(j), 0) AS t1 FROM tbl1 WHERE i = :num \gset p_
+ \sleep 10 ms
+ SELECT COALESCE(SUM(j), 0) AS t2 FROM tbl2 WHERE i = :num \gset p_
+ \if :p_t1 != :p_t2
+ COMMIT;
+ SELECT log_raise(tbl1.i, tbl1.j, tbl2.j) FROM tbl1 LEFT OUTER JOIN tbl2 ON tbl1.i = tbl2.i WHERE tbl1.j != tbl2.j;
+ \sleep 10 ms
+ SELECT log_raise(tbl1.i, tbl1.j, tbl2.j) FROM tbl1 LEFT OUTER JOIN tbl2 ON tbl1.i = tbl2.i WHERE tbl1.j != tbl2.j;
+ SELECT (:p_t1 + :p_t2) / 0;
+ \endif
+
+ COMMIT;
+ \endif
+ )
+});
+
+$node->stop;
+done_testing();
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 0f5c34af542..049ee75a4ba 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,17 +1833,15 @@ SELECT pg_advisory_lock(q.id) FROM
<title>Caveats</title>
<para>
- Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
- table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
- TABLE</command></link> and <command>REPACK</command> with
- the <literal>CONCURRENTLY</literal> option, are not
+ Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
+ table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
MVCC-safe. This means that after the truncation or rewrite commits, the
table will appear empty to concurrent transactions, if they are using a
- snapshot taken before the command committed. This will only be an
+ snapshot taken before the DDL command committed. This will only be an
issue for a transaction that did not access the table in question
- before the command started — any transaction that has done so
+ before the DDL command started — any transaction that has done so
would hold at least an <literal>ACCESS SHARE</literal> table lock,
- which would block the truncating or rewriting command until that transaction completes.
+ which would block the DDL command until that transaction completes.
So these commands will not cause any apparent inconsistency in the
table contents for successive queries on the target table, but they
could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index ff5ce48de55..271923a5a60 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -292,15 +292,6 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
</listitem>
</itemizedlist>
</para>
-
- <warning>
- <para>
- <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
- option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
- details.
- </para>
- </warning>
-
</listitem>
</varlistentry>
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index a1d0eed8953..586eb42a137 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -320,7 +320,8 @@ toast_save_datum(Relation rel, Datum value,
memcpy(VARDATA(&chunk_data), data_p, chunk_size);
toasttup = heap_form_tuple(toasttupDesc, t_values, t_isnull);
- heap_insert(toastrel, toasttup, mycid, options, NULL);
+ heap_insert(toastrel, toasttup, GetCurrentTransactionId(), mycid,
+ options, NULL);
/*
* Create the index entry. We cheat a little here by not using
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f9a4fe3faed..45da5902de0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2070,7 +2070,7 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
/*
* heap_insert - insert tuple into a heap
*
- * The new tuple is stamped with current transaction ID and the specified
+ * The new tuple is stamped with the specified transaction ID and the specified
* command ID.
*
* See table_tuple_insert for comments about most of the input flags, except
@@ -2086,15 +2086,16 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
* reflected into *tup.
*/
void
-heap_insert(Relation relation, HeapTuple tup, CommandId cid,
- int options, BulkInsertState bistate)
+heap_insert(Relation relation, HeapTuple tup, TransactionId xid,
+ CommandId cid, int options, BulkInsertState bistate)
{
- TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
Buffer vmbuffer = InvalidBuffer;
bool all_visible_cleared = false;
+ Assert(TransactionIdIsValid(xid));
+
/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
Assert(HeapTupleHeaderGetNatts(tup->t_data) <=
RelationGetNumberOfAttributes(relation));
@@ -2176,8 +2177,15 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
/*
* If this is a catalog, we need to transmit combo CIDs to properly
* decode, so log that as well.
+ *
+ * HEAP_INSERT_NO_LOGICAL should be set when applying data changes
+ * done by other transactions during REPACK CONCURRENTLY. In such a
+ * case, the insertion should not be decoded at all - see
+ * heap_decode(). (It's also set by raw_heap_insert() for TOAST, but
+ * TOAST does not pass this test anyway.)
*/
- if (RelationIsAccessibleInLogicalDecoding(relation))
+ if ((options & HEAP_INSERT_NO_LOGICAL) == 0 &&
+ RelationIsAccessibleInLogicalDecoding(relation))
log_heap_new_cid(relation, heaptup);
/*
@@ -2723,7 +2731,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
void
simple_heap_insert(Relation relation, HeapTuple tup)
{
- heap_insert(relation, tup, GetCurrentCommandId(true), 0, NULL);
+ heap_insert(relation, tup, GetCurrentTransactionId(),
+ GetCurrentCommandId(true), 0, NULL);
}
/*
@@ -2780,11 +2789,11 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
*/
TM_Result
heap_delete(Relation relation, ItemPointer tid,
- CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, bool changingPart, bool wal_logical)
+ TransactionId xid, CommandId cid, Snapshot crosscheck, bool wait,
+ TM_FailureData *tmfd, bool changingPart,
+ bool wal_logical)
{
TM_Result result;
- TransactionId xid = GetCurrentTransactionId();
ItemId lp;
HeapTupleData tp;
Page page;
@@ -2801,6 +2810,7 @@ heap_delete(Relation relation, ItemPointer tid,
bool old_key_copied = false;
Assert(ItemPointerIsValid(tid));
+ Assert(TransactionIdIsValid(xid));
AssertHasSnapshotForToast(relation);
@@ -3217,7 +3227,7 @@ simple_heap_delete(Relation relation, ItemPointer tid)
TM_Result result;
TM_FailureData tmfd;
- result = heap_delete(relation, tid,
+ result = heap_delete(relation, tid, GetCurrentTransactionId(),
GetCurrentCommandId(true), InvalidSnapshot,
true /* wait for commit */ ,
&tmfd, false, /* changingPart */
@@ -3260,12 +3270,11 @@ simple_heap_delete(Relation relation, ItemPointer tid)
*/
TM_Result
heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
- CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
+ TransactionId xid, CommandId cid, Snapshot crosscheck,
+ bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
TU_UpdateIndexes *update_indexes, bool wal_logical)
{
TM_Result result;
- TransactionId xid = GetCurrentTransactionId();
Bitmapset *hot_attrs;
Bitmapset *sum_attrs;
Bitmapset *key_attrs;
@@ -3305,6 +3314,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
infomask2_new_tuple;
Assert(ItemPointerIsValid(otid));
+ Assert(TransactionIdIsValid(xid));
/* Cheap, simplistic check that the tuple matches the rel's rowtype. */
Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
@@ -4144,8 +4154,12 @@ l2:
/*
* For logical decoding we need combo CIDs to properly decode the
* catalog.
+ *
+ * Like in heap_insert(), visibility is unchanged when called from
+ * VACUUM FULL / CLUSTER.
*/
- if (RelationIsAccessibleInLogicalDecoding(relation))
+ if (wal_logical &&
+ RelationIsAccessibleInLogicalDecoding(relation))
{
log_heap_new_cid(relation, &oldtup);
log_heap_new_cid(relation, heaptup);
@@ -4511,7 +4525,7 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
TM_FailureData tmfd;
LockTupleMode lockmode;
- result = heap_update(relation, otid, tup,
+ result = heap_update(relation, otid, tup, GetCurrentTransactionId(),
GetCurrentCommandId(true), InvalidSnapshot,
true /* wait for commit */ ,
&tmfd, &lockmode, update_indexes,
@@ -5351,8 +5365,6 @@ compute_new_xmax_infomask(TransactionId xmax, uint16 old_infomask,
uint16 new_infomask,
new_infomask2;
- Assert(TransactionIdIsCurrentTransactionId(add_to_xmax));
-
l5:
new_infomask = 0;
new_infomask2 = 0;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d03084768e0..6733e5fdda6 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -253,7 +253,8 @@ heapam_tuple_insert(Relation relation, TupleTableSlot *slot, CommandId cid,
tuple->t_tableOid = slot->tts_tableOid;
/* Perform the insertion, and copy the resulting ItemPointer */
- heap_insert(relation, tuple, cid, options, bistate);
+ heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+ bistate);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
if (shouldFree)
@@ -276,7 +277,8 @@ heapam_tuple_insert_speculative(Relation relation, TupleTableSlot *slot,
options |= HEAP_INSERT_SPECULATIVE;
/* Perform the insertion, and copy the resulting ItemPointer */
- heap_insert(relation, tuple, cid, options, bistate);
+ heap_insert(relation, tuple, GetCurrentTransactionId(), cid, options,
+ bistate);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
if (shouldFree)
@@ -310,8 +312,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
* the storage itself is cleaning the dead tuples by itself, it is the
* time to call the index tuple deletion also.
*/
- return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
- true);
+ return heap_delete(relation, tid, GetCurrentTransactionId(), cid,
+ crosscheck, wait, tmfd, changingPart, true);
}
@@ -329,7 +331,8 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
slot->tts_tableOid = RelationGetRelid(relation);
tuple->t_tableOid = slot->tts_tableOid;
- result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
+ result = heap_update(relation, otid, tuple, GetCurrentTransactionId(),
+ cid, crosscheck, wait,
tmfd, lockmode, update_indexes, true);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
@@ -2477,9 +2480,16 @@ reform_and_rewrite_tuple(HeapTuple tuple,
* flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
* the relation files, it drops this relation, so no logical
* replication subscription should need the data.
+ *
+ * It is also crucial to stamp the new record with the exact same xid
+ * and cid, because the tuple must be visible to the snapshots of the
+ * concurrent transactions later.
*/
- heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
- HEAP_INSERT_NO_LOGICAL, NULL);
+ // TODO: looks like cid is not required
+ CommandId cid = HeapTupleHeaderGetRawCommandId(tuple->t_data);
+ TransactionId xid = HeapTupleHeaderGetXmin(tuple->t_data);
+
+ heap_insert(NewHeap, copiedTuple, xid, cid, HEAP_INSERT_NO_LOGICAL, NULL);
}
heap_freetuple(copiedTuple);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 61224a3adf2..936cb0ae429 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -55,6 +55,7 @@
#include "storage/ipc.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
+#include "storage/procarray.h"
#include "utils/acl.h"
#include "utils/fmgroids.h"
#include "utils/guc.h"
@@ -146,6 +147,7 @@ static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
ConcurrentChange *change);
static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
HeapTuple tup_key,
+ Snapshot snapshot,
IndexInsertState *iistate,
TupleTableSlot *ident_slot,
IndexScanDesc *scan_p);
@@ -1008,7 +1010,14 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
/* The historic snapshot won't be needed anymore. */
if (snapshot)
+ {
+ TransactionId xmin = snapshot->xmin;
PopActiveSnapshot();
+ Assert(concurrent);
+ // TODO: seems like it not required: need to check SnapBuildInitialSnapshotForRepack
+ WaitForOlderSnapshots(xmin, false);
+ }
+
if (concurrent)
{
@@ -1299,30 +1308,35 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
* not to be aggressive about this.
*/
memset(&params, 0, sizeof(VacuumParams));
- vacuum_get_cutoffs(OldHeap, params, &cutoffs);
-
- /*
- * FreezeXid will become the table's new relfrozenxid, and that mustn't go
- * backwards, so take the max.
- */
+ if (!concurrent)
{
TransactionId relfrozenxid = OldHeap->rd_rel->relfrozenxid;
+ MultiXactId relminmxid = OldHeap->rd_rel->relminmxid;
+ vacuum_get_cutoffs(OldHeap, params, &cutoffs);
+ /*
+ * FreezeXid will become the table's new relfrozenxid, and that mustn't go
+ * backwards, so take the max.
+ */
if (TransactionIdIsValid(relfrozenxid) &&
TransactionIdPrecedes(cutoffs.FreezeLimit, relfrozenxid))
cutoffs.FreezeLimit = relfrozenxid;
- }
-
- /*
- * MultiXactCutoff, similarly, shouldn't go backwards either.
- */
- {
- MultiXactId relminmxid = OldHeap->rd_rel->relminmxid;
-
+ /*
+ * MultiXactCutoff, similarly, shouldn't go backwards either.
+ */
if (MultiXactIdIsValid(relminmxid) &&
MultiXactIdPrecedes(cutoffs.MultiXactCutoff, relminmxid))
cutoffs.MultiXactCutoff = relminmxid;
}
+ else
+ {
+ /*
+ * In concurrent mode we reuse all the xmin/xmax,
+ * so just use current values for simplicity.
+ */
+ cutoffs.FreezeLimit = OldHeap->rd_rel->relfrozenxid;
+ cutoffs.MultiXactCutoff = OldHeap->rd_rel->relminmxid;
+ }
/*
* Decide whether to use an indexscan or seqscan-and-optional-sort to scan
@@ -2675,6 +2689,16 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
continue;
}
+ if (TransactionIdIsInProgress(change.xid))
+ {
+ /* xid is certainly committed because we got this change from the reorderbuffer,
+ * but the procarray may not be updated yet, so the current backend may still see it as
+ * in-progress. Wait for the procarray to be updated. */
+ XactLockTableWait(change.xid, NULL, NULL, XLTW_None);
+ Assert(!TransactionIdIsInProgress(change.xid));
+ Assert(TransactionIdDidCommit(change.xid));
+ }
+
/*
* Extract the tuple from the change. The tuple is copied here because
* it might be assigned to 'tup_old', in which case it needs to
@@ -2712,9 +2736,13 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
}
/*
- * Find the tuple to be updated or deleted.
+ * Find the tuple to be updated or deleted using SnapshotSelf.
+ * That way we get the last live version in case of a HOT chain.
+ * It is guaranteed that no updated but not-yet-committed version exists,
+ * because here we replay only committed transactions, without any
+ * concurrency involved.
*/
- tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+ tup_exist = find_target_tuple(rel, key, nkeys, tup_key, SnapshotSelf,
iistate, ident_slot, &ind_scan);
if (tup_exist == NULL)
elog(ERROR, "Failed to find target tuple");
@@ -2743,6 +2771,7 @@ apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
*/
if (change.kind != CHANGE_UPDATE_OLD)
{
+ // TODO: not sure it is required at all: we are replaying committed transactions stamping them with committed XID
CommandCounterIncrement();
UpdateActiveSnapshotCommandId();
}
@@ -2771,9 +2800,11 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
* Like simple_heap_insert(), but make sure that the INSERT is not
* logically decoded - see reform_and_rewrite_tuple() for more
* information.
+ *
+ * Use already committed xid to stamp the tuple.
*/
- heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
- NULL);
+ heap_insert(rel, tup, change->xid, GetCurrentCommandId(true),
+ HEAP_INSERT_NO_LOGICAL, NULL);
/*
* Update indexes.
@@ -2781,6 +2812,7 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
* In case functions in the index need the active snapshot and caller
* hasn't set one.
*/
+ PushActiveSnapshot(GetLatestSnapshot());
ExecStoreHeapTuple(tup, index_slot, false);
recheck = ExecInsertIndexTuples(iistate->rri,
index_slot,
@@ -2791,6 +2823,7 @@ apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
NIL, /* arbiterIndexes */
false /* onlySummarizing */
);
+ PopActiveSnapshot();
/*
* If recheck is required, it must have been preformed on the source
@@ -2819,9 +2852,11 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
*
* Do it like in simple_heap_update(), except for 'wal_logical' (and
* except for 'wait').
+ *
+ * Use already committed xid to stamp the tuple.
*/
res = heap_update(rel, &tup_target->t_self, tup,
- GetCurrentCommandId(true),
+ change->xid, GetCurrentCommandId(true),
InvalidSnapshot,
false, /* no wait - only we are doing changes */
&tmfd, &lockmode, &update_indexes,
@@ -2833,6 +2868,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
if (update_indexes != TU_None)
{
+ PushActiveSnapshot(GetLatestSnapshot());
recheck = ExecInsertIndexTuples(iistate->rri,
index_slot,
iistate->estate,
@@ -2842,6 +2878,7 @@ apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
NIL, /* arbiterIndexes */
/* onlySummarizing */
update_indexes == TU_Summarizing);
+ PopActiveSnapshot();
list_free(recheck);
}
@@ -2860,9 +2897,11 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
*
* Do it like in simple_heap_delete(), except for 'wal_logical' (and
* except for 'wait').
+ *
+ * Use already committed xid to stamp the tuple.
*/
- res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
- InvalidSnapshot, false,
+ res = heap_delete(rel, &tup_target->t_self, change->xid,
+ GetCurrentCommandId(true), InvalidSnapshot, false,
&tmfd,
false, /* no wait - only we are doing changes */
false /* wal_logical */ );
@@ -2886,7 +2925,7 @@ apply_concurrent_delete(Relation rel, HeapTuple tup_target,
*/
static HeapTuple
find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
- IndexInsertState *iistate,
+ Snapshot snapshot, IndexInsertState *iistate,
TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
{
IndexScanDesc scan;
@@ -2895,7 +2934,7 @@ find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
HeapTuple result = NULL;
/* XXX no instrumentation for now */
- scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+ scan = index_beginscan(rel, iistate->ident_index, snapshot,
NULL, nkeys, 0);
*scan_p = scan;
index_rescan(scan, key, nkeys, NULL, 0);
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
index 687fbbc59bb..020ff7b7c80 100644
--- a/src/backend/replication/pgoutput_repack/pgoutput_repack.c
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -32,7 +32,8 @@ static void plugin_truncate(struct LogicalDecodingContext *ctx,
Relation relations[],
ReorderBufferChange *change);
static void store_change(LogicalDecodingContext *ctx,
- ConcurrentChangeKind kind, HeapTuple tuple);
+ ConcurrentChangeKind kind, HeapTuple tuple,
+ TransactionId xid);
void
_PG_output_plugin_init(OutputPluginCallbacks *cb)
@@ -124,7 +125,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (newtuple == NULL)
elog(ERROR, "Incomplete insert info.");
- store_change(ctx, CHANGE_INSERT, newtuple);
+ store_change(ctx, CHANGE_INSERT, newtuple, change->txn->xid);
}
break;
case REORDER_BUFFER_CHANGE_UPDATE:
@@ -141,9 +142,11 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
elog(ERROR, "Incomplete update info.");
if (oldtuple != NULL)
- store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+ store_change(ctx, CHANGE_UPDATE_OLD, oldtuple,
+ change->txn->xid);
- store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+ store_change(ctx, CHANGE_UPDATE_NEW, newtuple,
+ change->txn->xid);
}
break;
case REORDER_BUFFER_CHANGE_DELETE:
@@ -156,7 +159,7 @@ plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (oldtuple == NULL)
elog(ERROR, "Incomplete delete info.");
- store_change(ctx, CHANGE_DELETE, oldtuple);
+ store_change(ctx, CHANGE_DELETE, oldtuple, change->txn->xid);
}
break;
default:
@@ -190,13 +193,13 @@ plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
if (i == nrelations)
return;
- store_change(ctx, CHANGE_TRUNCATE, NULL);
+ store_change(ctx, CHANGE_TRUNCATE, NULL, InvalidTransactionId);
}
/* Store concurrent data change. */
static void
store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
- HeapTuple tuple)
+ HeapTuple tuple, TransactionId xid)
{
RepackDecodingState *dstate;
char *change_raw;
@@ -266,6 +269,7 @@ store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
dst = dst_start + SizeOfConcurrentChange;
memcpy(dst, tuple->t_data, tuple->t_len);
+ change.xid = xid;
/* The data has been copied. */
if (flattened)
pfree(tuple);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b82dd17a966..981425f23b6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -316,22 +316,24 @@ extern BulkInsertState GetBulkInsertState(void);
extern void FreeBulkInsertState(BulkInsertState);
extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
-extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
- int options, BulkInsertState bistate);
+extern void heap_insert(Relation relation, HeapTuple tup, TransactionId xid,
+ CommandId cid, int options, BulkInsertState bistate);
extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
int ntuples, CommandId cid, int options,
BulkInsertState bistate);
extern TM_Result heap_delete(Relation relation, ItemPointer tid,
- CommandId cid, Snapshot crosscheck, bool wait,
+ TransactionId xid, CommandId cid,
+ Snapshot crosscheck, bool wait,
struct TM_FailureData *tmfd, bool changingPart,
bool wal_logical);
extern void heap_finish_speculative(Relation relation, ItemPointer tid);
extern void heap_abort_speculative(Relation relation, ItemPointer tid);
extern TM_Result heap_update(Relation relation, ItemPointer otid,
- HeapTuple newtup,
+ HeapTuple newtup, TransactionId xid,
CommandId cid, Snapshot crosscheck, bool wait,
struct TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes, bool wal_logical);
+ TU_UpdateIndexes *update_indexes,
+ bool wal_logical);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
bool follow_updates,
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 4a508c57a50..242f8da770a 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -61,6 +61,8 @@ typedef struct ConcurrentChange
/* See the enum above. */
ConcurrentChangeKind kind;
+ /* Transaction that changes the data. */
+ TransactionId xid;
/*
* The actual tuple.
*
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
index 75850334986..3711a7c92b9 100644
--- a/src/test/modules/injection_points/specs/repack.spec
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -86,9 +86,6 @@ step change_new
# When applying concurrent data changes, we should see the effects of an
# in-progress subtransaction.
#
-# XXX Not sure this test is useful now - it was designed for the patch that
-# preserves tuple visibility and which therefore modifies
-# TransactionIdIsCurrentTransactionId().
step change_subxact1
{
BEGIN;
@@ -103,7 +100,6 @@ step change_subxact1
# When applying concurrent data changes, we should not see the effects of a
# rolled back subtransaction.
#
-# XXX Is this test useful? See above.
step change_subxact2
{
BEGIN;
--
2.43.0
[application/octet-stream] v21-0003-Refactor-index_concurrently_create_copy-for-use-.patch (4.1K, 3-v21-0003-Refactor-index_concurrently_create_copy-for-use-.patch)
From 896f4fc90d128f0a8625f47b82b08eb0da145be7 Mon Sep 17 00:00:00 2001
From: Antonin Houska <[email protected]>
Date: Mon, 11 Aug 2025 15:31:34 +0200
Subject: [PATCH v21 3/6] Refactor index_concurrently_create_copy() for use
with REPACK (CONCURRENTLY).
This patch moves the code to index_create_copy() and adds a "concurrently"
parameter so it can be used by REPACK (CONCURRENTLY).
With the CONCURRENTLY option, REPACK cannot simply swap the heap file and
rebuild its indexes. Instead, it needs to build a separate set of indexes
(including system catalog entries) *before* the actual swap, to reduce the
time AccessExclusiveLock needs to be held for.
---
src/backend/catalog/index.c | 36 ++++++++++++++++++++++++++++--------
src/include/catalog/index.h | 3 +++
2 files changed, 31 insertions(+), 8 deletions(-)
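The refactoring described in the commit message can be illustrated with a small standalone sketch. It is not code from the patch: the `sketch_*` names and the `SKETCH_*` flag values are hypothetical stand-ins for index_create_copy(), index_concurrently_create_copy(), INDEX_CREATE_SKIP_BUILD and INDEX_CREATE_CONCURRENT, and only the flag selection and the exclusion-constraint check are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real INDEX_CREATE_* flag bits. */
#define SKETCH_SKIP_BUILD	(1 << 0)
#define SKETCH_CONCURRENT	(1 << 1)

/*
 * Flags the generalized function would pass on when creating the copy:
 * nothing special unless the caller asked for a concurrent build.
 */
static int
sketch_create_copy_flags(bool concurrently)
{
	int			flags = 0;

	if (concurrently)
		flags = SKETCH_SKIP_BUILD | SKETCH_CONCURRENT;

	return flags;
}

/* Exclusion constraints are rejected only on the concurrent path. */
static bool
sketch_copy_is_rejected(bool has_exclusion_ops, bool concurrently)
{
	return has_exclusion_ops && concurrently;
}

/* The old entry point becomes a thin wrapper with concurrently = true. */
static int
sketch_concurrently_create_copy_flags(void)
{
	return sketch_create_copy_flags(true);
}
```

Keeping the old function as a wrapper means that existing callers on the concurrent reindex path see the same behaviour as before, while a caller passing concurrently = false gets an index copy without the concurrent-build flags.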
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3063abff9a5..0dee1b1a9d8 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1290,15 +1290,31 @@ index_create(Relation heapRelation,
/*
* index_concurrently_create_copy
*
- * Create concurrently an index based on the definition of the one provided by
- * caller. The index is inserted into catalogs and needs to be built later
- * on. This is called during concurrent reindex processing.
- *
- * "tablespaceOid" is the tablespace to use for this index.
+ * Variant of index_create_copy(), called during concurrent reindex
+ * processing.
*/
Oid
index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
Oid tablespaceOid, const char *newName)
+{
+ return index_create_copy(heapRelation, oldIndexId, tablespaceOid, newName,
+ true);
+}
+
+/*
+ * index_create_copy
+ *
+ * Create an index based on the definition of the one provided by caller. The
+ * index is inserted into catalogs and needs to be built later on.
+ *
+ * "tablespaceOid" is the tablespace to use for this index.
+ *
+ * The actual implementation of index_concurrently_create_copy(), reusable for
+ * other purposes.
+ */
+Oid
+index_create_copy(Relation heapRelation, Oid oldIndexId, Oid tablespaceOid,
+ const char *newName, bool concurrently)
{
Relation indexRelation;
IndexInfo *oldInfo,
@@ -1317,6 +1333,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
List *indexColNames = NIL;
List *indexExprs = NIL;
List *indexPreds = NIL;
+ int flags = 0;
indexRelation = index_open(oldIndexId, RowExclusiveLock);
@@ -1325,9 +1342,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
/*
* Concurrent build of an index with exclusion constraints is not
- * supported.
+ * supported. If !concurrently, ii_ExclusionOps is currently not needed.
*/
- if (oldInfo->ii_ExclusionOps != NULL)
+ if (oldInfo->ii_ExclusionOps != NULL && concurrently)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation for exclusion constraints is not supported")));
@@ -1435,6 +1452,9 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
stattargets[i].isnull = isnull;
}
+ if (concurrently)
+ flags = INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT;
+
/*
* Now create the new index.
*
@@ -1458,7 +1478,7 @@ index_concurrently_create_copy(Relation heapRelation, Oid oldIndexId,
indcoloptions->values,
stattargets,
reloptionsDatum,
- INDEX_CREATE_SKIP_BUILD | INDEX_CREATE_CONCURRENT,
+ flags,
0,
true, /* allow table to be a system catalog? */
false, /* is_internal? */
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 4daa8bef5ee..063a891351a 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -99,6 +99,9 @@ extern Oid index_concurrently_create_copy(Relation heapRelation,
Oid oldIndexId,
Oid tablespaceOid,
const char *newName);
+extern Oid index_create_copy(Relation heapRelation, Oid oldIndexId,
+ Oid tablespaceOid, const char *newName,
+ bool concurrently);
extern void index_concurrently_build(Oid heapRelationId,
Oid indexRelationId);
--
2.43.0
[application/octet-stream] v21-0005-Add-CONCURRENTLY-option-to-REPACK-command.patch (147.2K, 4-v21-0005-Add-CONCURRENTLY-option-to-REPACK-command.patch)
From a9411b077bc121215b230556be5a114d5effd847 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <[email protected]>
Date: Sat, 30 Aug 2025 19:13:38 +0200
Subject: [PATCH v21 5/6] Add CONCURRENTLY option to REPACK command.
The REPACK command copies the relation data into a new file, creates new
indexes and eventually swaps the files. To make sure that the old file does
not change during the copying, the relation is locked in an exclusive mode,
which prevents applications from both reading and writing. (To keep the data
consistent, we'd only need to prevent the applications from writing, but even
reading needs to be blocked before we can swap the files - otherwise some
applications could continue using the old file. Since we should not request a
stronger lock without releasing the weaker one first, we acquire the exclusive
lock in the beginning and keep it till the end of the processing.)
This patch introduces an alternative workflow, which only requires the
exclusive lock when the relation (and index) files are being swapped.
(Supposedly, the swapping should be pretty fast.) On the other hand, when we
copy the data to the new file, we allow applications to read from the relation
and even to write to it.
First, we scan the relation using a "historic snapshot", and insert all the
tuples satisfying this snapshot into the new file.
Second, logical decoding is used to capture the data changes done by
applications during the copying (i.e. changes that do not satisfy the historic
snapshot mentioned above), and those are applied to the new file before we
acquire the exclusive lock that we need to swap the files. (Of course, more
data changes can take place while we are waiting for the lock - these will be
applied to the new file after we have acquired the lock, before we swap the
files.)
Since the logical decoding system, during its startup, waits until all the
transactions which already have XID assigned have finished, there is a risk of
deadlock if a transaction that already changed anything in the database tries
to acquire a conflicting lock on the table REPACK CONCURRENTLY is working
on. As an example, consider a transaction running a CREATE INDEX command on the
table that is being REPACKed CONCURRENTLY. On the other hand, DML commands
(INSERT, UPDATE, DELETE) are not a problem as their lock does not conflict
with REPACK CONCURRENTLY.
The current approach is that we accept the risk. If we tried to avoid it, it'd
be necessary to unlock the table before logical decoding is set up and lock
it again afterwards. Such temporary unlocking would imply re-checking if the
table still meets all the requirements for REPACK CONCURRENTLY.
Like the existing implementation of REPACK, the variant with the CONCURRENTLY
option also requires extra space for the new relation and index files
(which coexist with the old files for some time). In addition, the
CONCURRENTLY option might introduce a lag in releasing WAL segments for
archiving / recycling. This is due to the decoding of the data changes done by
applications concurrently. When copying the table contents into the new file,
we check the lag periodically. If it exceeds the size of a WAL segment, we
decode all the available WAL before resuming the copying. (Of course, the
changes are not applied until the whole table contents have been copied.) A
background worker might be a better approach for the decoding - let's consider
implementing it in the future.
The WAL records produced by running DML commands on the new relation do not
contain enough information to be processed by the logical decoding system. All
we need from the new relation is the file (relfilenode), while the actual
relation is eventually dropped. Thus there is no point in replaying the DMLs
anywhere.
Author: Antonin Houska <[email protected]>
---
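The workflow from the commit message (initial copy under a snapshot, then replay of decoded changes before and after the exclusive lock is taken) can be modelled with a toy sketch. Everything here is an assumption made for illustration: a "table" is just a set of small integer keys, changes are plain inserts and deletes, and none of the `Sketch*` names exist in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_NKEYS 8

typedef enum
{
	SKETCH_INSERT,
	SKETCH_DELETE
} SketchChangeKind;

typedef struct
{
	SketchChangeKind kind;
	int			key;
} SketchChange;

typedef struct
{
	bool		present[SKETCH_NKEYS];
} SketchTable;

/* Step 1: copy the rows visible to the snapshot into the new file. */
static void
sketch_initial_copy(const SketchTable *snapshot, SketchTable *new_heap)
{
	memcpy(new_heap, snapshot, sizeof(SketchTable));
}

/* Replay a batch of changes, in order, on the given table. */
static void
sketch_apply_changes(SketchTable *heap, const SketchChange *changes,
					 int nchanges)
{
	for (int i = 0; i < nchanges; i++)
		heap->present[changes[i].key] = (changes[i].kind == SKETCH_INSERT);
}

/* Returns true if the new file matches the old one at swap time. */
static bool
sketch_repack_demo(void)
{
	SketchTable old_heap = {{true, true, false, false, false, false, false, false}};
	SketchTable snapshot = old_heap;	/* state seen by the snapshot */
	SketchTable new_heap;
	SketchChange before_lock[] = {{SKETCH_INSERT, 2}, {SKETCH_DELETE, 0}};
	SketchChange during_wait[] = {{SKETCH_INSERT, 5}};

	sketch_initial_copy(&snapshot, &new_heap);

	/* Applications keep changing the old file while the copy runs ... */
	sketch_apply_changes(&old_heap, before_lock, 2);
	/* ... and the decoded changes are replayed before the lock request. */
	sketch_apply_changes(&new_heap, before_lock, 2);

	/* More changes arrive while we wait for the exclusive lock ... */
	sketch_apply_changes(&old_heap, during_wait, 1);
	/* ... and are replayed with the lock held, just before the swap. */
	sketch_apply_changes(&new_heap, during_wait, 1);

	return memcmp(old_heap.present, new_heap.present,
				  sizeof(old_heap.present)) == 0;
}
```

The second replay is the only part that runs under the exclusive lock, which is why the lock duration depends on how many changes accumulate while the lock request is waiting.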
doc/src/sgml/monitoring.sgml | 37 +-
doc/src/sgml/mvcc.sgml | 12 +-
doc/src/sgml/ref/repack.sgml | 129 +-
src/Makefile | 1 +
src/backend/access/heap/heapam.c | 34 +-
src/backend/access/heap/heapam_handler.c | 219 ++-
src/backend/access/heap/rewriteheap.c | 6 +-
src/backend/access/transam/xact.c | 11 +-
src/backend/catalog/system_views.sql | 30 +-
src/backend/commands/cluster.c | 1677 +++++++++++++++--
src/backend/commands/matview.c | 2 +-
src/backend/commands/tablecmds.c | 1 +
src/backend/commands/vacuum.c | 12 +-
src/backend/meson.build | 1 +
src/backend/replication/logical/decode.c | 83 +
src/backend/replication/logical/snapbuild.c | 21 +
.../replication/pgoutput_repack/Makefile | 32 +
.../replication/pgoutput_repack/meson.build | 18 +
.../pgoutput_repack/pgoutput_repack.c | 288 +++
src/backend/storage/ipc/ipci.c | 1 +
.../storage/lmgr/generate-lwlocknames.pl | 2 +-
src/backend/utils/cache/relcache.c | 1 +
src/backend/utils/time/snapmgr.c | 3 +-
src/bin/psql/tab-complete.in.c | 25 +-
src/include/access/heapam.h | 9 +-
src/include/access/heapam_xlog.h | 2 +
src/include/access/tableam.h | 10 +
src/include/commands/cluster.h | 91 +-
src/include/commands/progress.h | 23 +-
src/include/replication/snapbuild.h | 1 +
src/include/storage/lockdefs.h | 4 +-
src/include/utils/snapmgr.h | 2 +
src/test/modules/injection_points/Makefile | 5 +-
.../injection_points/expected/repack.out | 113 ++
.../modules/injection_points/logical.conf | 1 +
src/test/modules/injection_points/meson.build | 4 +
.../injection_points/specs/repack.spec | 143 ++
src/test/regress/expected/rules.out | 29 +-
src/tools/pgindent/typedefs.list | 4 +
39 files changed, 2816 insertions(+), 271 deletions(-)
create mode 100644 src/backend/replication/pgoutput_repack/Makefile
create mode 100644 src/backend/replication/pgoutput_repack/meson.build
create mode 100644 src/backend/replication/pgoutput_repack/pgoutput_repack.c
create mode 100644 src/test/modules/injection_points/expected/repack.out
create mode 100644 src/test/modules/injection_points/logical.conf
create mode 100644 src/test/modules/injection_points/specs/repack.spec
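The lag check mentioned in the commit message (decode the available WAL once the lag exceeds one WAL segment) might look roughly like the sketch below. It is an illustration under stated assumptions, not the patch's code: `SketchLsn` stands in for XLogRecPtr, the `sketch_*` functions are hypothetical, and the "decoding" is reduced to counting how often it would be triggered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t SketchLsn;		/* stand-in for XLogRecPtr */

/* Has more than one segment's worth of WAL accumulated since last time? */
static bool
sketch_decoding_due(SketchLsn end_of_wal, SketchLsn end_of_wal_prev,
					uint64_t wal_segment_size)
{
	return end_of_wal - end_of_wal_prev > wal_segment_size;
}

/*
 * Called periodically during the copy.  Returns the position to remember
 * as "previous end of WAL" for the next check, and counts the decoding
 * rounds in *ndecodes.
 */
static SketchLsn
sketch_maybe_decode(SketchLsn end_of_wal, SketchLsn end_of_wal_prev,
					uint64_t wal_segment_size, int *ndecodes)
{
	if (sketch_decoding_due(end_of_wal, end_of_wal_prev, wal_segment_size))
	{
		/* Decode (but do not yet apply) everything up to end_of_wal. */
		(*ndecodes)++;
		return end_of_wal;
	}

	return end_of_wal_prev;
}
```

Using the segment size as the threshold bounds the extra WAL retained to roughly one segment while keeping the decoding overhead from being paid on every copied tuple.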
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 12e103d319d..61c0197555f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6074,14 +6074,35 @@ FROM pg_stat_get_backend_idset() AS backendid;
<row>
<entry role="catalog_table_entry"><para role="column_definition">
- <structfield>heap_tuples_written</structfield> <type>bigint</type>
+ <structfield>heap_tuples_inserted</structfield> <type>bigint</type>
</para>
<para>
- Number of heap tuples written.
+ Number of heap tuples inserted.
This counter only advances when the phase is
<literal>seq scanning heap</literal>,
- <literal>index scanning heap</literal>
- or <literal>writing new heap</literal>.
+ <literal>index scanning heap</literal>,
+ <literal>writing new heap</literal>
+ or <literal>catch-up</literal>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>heap_tuples_updated</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of heap tuples updated.
+ This counter only advances when the phase is <literal>catch-up</literal>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>heap_tuples_deleted</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of heap tuples deleted.
+ This counter only advances when the phase is <literal>catch-up</literal>.
</para></entry>
</row>
@@ -6162,6 +6183,14 @@ FROM pg_stat_get_backend_idset() AS backendid;
<command>REPACK</command> is currently writing the new heap.
</entry>
</row>
+ <row>
+ <entry><literal>catch-up</literal></entry>
+ <entry>
+ <command>REPACK CONCURRENTLY</command> is currently processing the DML
+ commands that other transactions executed during any of the preceding
+ phases.
+ </entry>
+ </row>
<row>
<entry><literal>swapping relation files</literal></entry>
<entry>
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 049ee75a4ba..0f5c34af542 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -1833,15 +1833,17 @@ SELECT pg_advisory_lock(q.id) FROM
<title>Caveats</title>
<para>
- Some DDL commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link> and the
- table-rewriting forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link>, are not
+ Some commands, currently only <link linkend="sql-truncate"><command>TRUNCATE</command></link>, the
+ table-rewriting forms of <link linkend="sql-altertable"><command>ALTER
+ TABLE</command></link> and <command>REPACK</command> with
+ the <literal>CONCURRENTLY</literal> option, are not
MVCC-safe. This means that after the truncation or rewrite commits, the
table will appear empty to concurrent transactions, if they are using a
- snapshot taken before the DDL command committed. This will only be an
+ snapshot taken before the command committed. This will only be an
issue for a transaction that did not access the table in question
- before the DDL command started — any transaction that has done so
+ before the command started — any transaction that has done so
would hold at least an <literal>ACCESS SHARE</literal> table lock,
- which would block the DDL command until that transaction completes.
+ which would block the truncating or rewriting command until that transaction completes.
So these commands will not cause any apparent inconsistency in the
table contents for successive queries on the target table, but they
could cause visible inconsistency between the contents of the target
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
index fd9d89f8aaa..ff5ce48de55 100644
--- a/doc/src/sgml/ref/repack.sgml
+++ b/doc/src/sgml/ref/repack.sgml
@@ -27,6 +27,7 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
ANALYSE | ANALYZE
+ CONCURRENTLY
</synopsis>
</refsynopsisdiv>
@@ -49,7 +50,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
processes every table and materialized view in the current database that
the current user has the <literal>MAINTAIN</literal> privilege on. This
form of <command>REPACK</command> cannot be executed inside a transaction
- block.
+ block. Also, this form is not allowed if
+ the <literal>CONCURRENTLY</literal> option is used.
</para>
<para>
@@ -62,7 +64,8 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
is acquired on it. This prevents any other database operations (both reads
and writes) from operating on the table until the <command>REPACK</command>
- is finished.
+ is finished. If you want to keep the table accessible during the repacking,
+ consider using the <literal>CONCURRENTLY</literal> option.
</para>
<refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
@@ -179,6 +182,128 @@ REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ Allow other transactions to use the table while it is being repacked.
+ </para>
+
+ <para>
+ Internally, <command>REPACK</command> copies the contents of the table
+ (ignoring dead tuples) into a new file, sorted by the specified index,
+ and also creates a new file for each index. Then it swaps the old and
+ new files for the table and all the indexes, and deletes the old
+ files. The <literal>ACCESS EXCLUSIVE</literal> lock is needed to make
+ sure that the old files do not change during the processing because the
+ changes would get lost due to the swap.
+ </para>
+
+ <para>
+ With the <literal>CONCURRENTLY</literal> option, the <literal>ACCESS
+ EXCLUSIVE</literal> lock is only acquired to swap the table and index
+ files. The data changes that took place during the creation of the new
+ table and index files are captured using logical decoding
+ (<xref linkend="logicaldecoding"/>) and applied before
+ the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+ is typically held only for the time needed to swap the files, which
+ should be pretty short. However, the time might still be noticeable if
+ too many data changes have been done to the table while
+ <command>REPACK</command> was waiting for the lock: those changes must
+ be processed just before the files are swapped, while the
+ <literal>ACCESS EXCLUSIVE</literal> lock is being held.
+ </para>
+
+ <para>
+ Note that <command>REPACK</command> with the
+ <literal>CONCURRENTLY</literal> option does not try to order the
+ rows inserted into the table after the repacking started. Also
+ note that <command>REPACK</command> might fail to complete due to DDL
+ commands executed on the table by other transactions during the
+ repacking.
+ </para>
+
+ <note>
+ <para>
+ In addition to the temporary space requirements explained in
+ <xref linkend="sql-repack-notes-on-resources"/>,
+ the <literal>CONCURRENTLY</literal> option can further increase the
+ usage of temporary space. The reason is that other transactions can
+ perform DML operations which cannot be applied to the new file until
+ <command>REPACK</command> has copied all the tuples from the old
+ file. Thus the tuples inserted into the old file during the copying are
+ also stored separately in a temporary file, so they can eventually be
+ applied to the new file.
+ </para>
+
+ <para>
+ Furthermore, the data changes performed during the copying are
+ extracted from the <link linkend="wal">write-ahead log</link> (WAL), and
+ this extraction (decoding) only takes place when a certain amount of WAL
+ has been written. Therefore, WAL removal can be delayed by this
+ threshold. Currently the threshold is equal to the value of
+ the <link linkend="guc-wal-segment-size"><varname>wal_segment_size</varname></link>
+ configuration parameter.
+ </para>
+ </note>
+
+ <para>
+ The <literal>CONCURRENTLY</literal> option cannot be used in the
+ following cases:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ The table is <literal>UNLOGGED</literal>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The table is partitioned.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The table is a system catalog or a <acronym>TOAST</acronym> table.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <command>REPACK</command> is executed inside a transaction block.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The <link linkend="guc-wal-level"><varname>wal_level</varname></link>
+ configuration parameter is less than <literal>logical</literal>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ The <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+ configuration parameter does not allow for creation of an additional
+ replication slot.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <warning>
+ <para>
+ <command>REPACK</command> with the <literal>CONCURRENTLY</literal>
+ option is not MVCC-safe, see <xref linkend="mvcc-caveats"/> for
+ details.
+ </para>
+ </warning>
+
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><literal>VERBOSE</literal></term>
<listitem>
diff --git a/src/Makefile b/src/Makefile
index 2f31a2f20a7..b18c9a14ffa 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -23,6 +23,7 @@ SUBDIRS = \
interfaces \
backend/replication/libpqwalreceiver \
backend/replication/pgoutput \
+ backend/replication/pgoutput_repack \
fe_utils \
bin \
pl \
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e3e7307ef5f..f9a4fe3faed 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -60,7 +60,8 @@ static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup,
HeapTuple newtup, HeapTuple old_key_tuple,
- bool all_visible_cleared, bool new_all_visible_cleared);
+ bool all_visible_cleared, bool new_all_visible_cleared,
+ bool wal_logical);
#ifdef USE_ASSERT_CHECKING
static void check_lock_if_inplace_updateable_rel(Relation relation,
ItemPointer otid,
@@ -2780,7 +2781,7 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
TM_Result
heap_delete(Relation relation, ItemPointer tid,
CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, bool changingPart)
+ TM_FailureData *tmfd, bool changingPart, bool wal_logical)
{
TM_Result result;
TransactionId xid = GetCurrentTransactionId();
@@ -3027,7 +3028,8 @@ l1:
* Compute replica identity tuple before entering the critical section so
* we don't PANIC upon a memory allocation failure.
*/
- old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);
+ old_key_tuple = wal_logical ?
+ ExtractReplicaIdentity(relation, &tp, true, &old_key_copied) : NULL;
/*
* If this is the first possibly-multixact-able operation in the current
@@ -3117,6 +3119,15 @@ l1:
xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
}
+ /*
+ * Unlike UPDATE, DELETE is decoded even if there is no old key, so it
+ * does not help to clear both XLH_DELETE_CONTAINS_OLD_TUPLE and
+ * XLH_DELETE_CONTAINS_OLD_KEY. Thus we need an extra flag. TODO
+ * Consider not decoding tuples w/o the old tuple/key instead.
+ */
+ if (!wal_logical)
+ xlrec.flags |= XLH_DELETE_NO_LOGICAL;
+
XLogBeginInsert();
XLogRegisterData(&xlrec, SizeOfHeapDelete);
@@ -3209,7 +3220,8 @@ simple_heap_delete(Relation relation, ItemPointer tid)
result = heap_delete(relation, tid,
GetCurrentCommandId(true), InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, false /* changingPart */ );
+ &tmfd, false, /* changingPart */
+ true /* wal_logical */ );
switch (result)
{
case TM_SelfModified:
@@ -3250,7 +3262,7 @@ TM_Result
heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ TU_UpdateIndexes *update_indexes, bool wal_logical)
{
TM_Result result;
TransactionId xid = GetCurrentTransactionId();
@@ -4143,7 +4155,8 @@ l2:
newbuf, &oldtup, heaptup,
old_key_tuple,
all_visible_cleared,
- all_visible_cleared_new);
+ all_visible_cleared_new,
+ wal_logical);
if (newbuf != buffer)
{
PageSetLSN(BufferGetPage(newbuf), recptr);
@@ -4501,7 +4514,8 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
result = heap_update(relation, otid, tup,
GetCurrentCommandId(true), InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode, update_indexes,
+ true /* wal_logical */ );
switch (result)
{
case TM_SelfModified:
@@ -8842,7 +8856,8 @@ static XLogRecPtr
log_heap_update(Relation reln, Buffer oldbuf,
Buffer newbuf, HeapTuple oldtup, HeapTuple newtup,
HeapTuple old_key_tuple,
- bool all_visible_cleared, bool new_all_visible_cleared)
+ bool all_visible_cleared, bool new_all_visible_cleared,
+ bool wal_logical)
{
xl_heap_update xlrec;
xl_heap_header xlhdr;
@@ -8853,7 +8868,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
suffixlen = 0;
XLogRecPtr recptr;
Page page = BufferGetPage(newbuf);
- bool need_tuple_data = RelationIsLogicallyLogged(reln);
+ bool need_tuple_data = RelationIsLogicallyLogged(reln) &&
+ wal_logical;
bool init;
int bufflags;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 79f9de5d760..d03084768e0 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -33,6 +33,7 @@
#include "catalog/index.h"
#include "catalog/storage.h"
#include "catalog/storage_xlog.h"
+#include "commands/cluster.h"
#include "commands/progress.h"
#include "executor/executor.h"
#include "miscadmin.h"
@@ -309,7 +310,8 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
* the storage itself is cleaning the dead tuples by itself, it is the
* time to call the index tuple deletion also.
*/
- return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
+ return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart,
+ true);
}
@@ -328,7 +330,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
tuple->t_tableOid = slot->tts_tableOid;
result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, update_indexes);
+ tmfd, lockmode, update_indexes, true);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
@@ -685,13 +687,15 @@ static void
heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
Relation OldIndex, bool use_sort,
TransactionId OldestXmin,
+ Snapshot snapshot,
+ LogicalDecodingContext *decoding_ctx,
TransactionId *xid_cutoff,
MultiXactId *multi_cutoff,
double *num_tuples,
double *tups_vacuumed,
double *tups_recently_dead)
{
- RewriteState rwstate;
+ RewriteState rwstate = NULL;
IndexScanDesc indexScan;
TableScanDesc tableScan;
HeapScanDesc heapScan;
@@ -705,6 +709,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
bool *isnull;
BufferHeapTupleTableSlot *hslot;
BlockNumber prev_cblock = InvalidBlockNumber;
+ bool concurrent = snapshot != NULL;
+ XLogRecPtr end_of_wal_prev = GetFlushRecPtr(NULL);
/* Remember if it's a system catalog */
is_system_catalog = IsSystemRelation(OldHeap);
@@ -720,9 +726,12 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
values = (Datum *) palloc(natts * sizeof(Datum));
isnull = (bool *) palloc(natts * sizeof(bool));
- /* Initialize the rewrite operation */
- rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin, *xid_cutoff,
- *multi_cutoff);
+ /*
+ * Initialize the rewrite operation.
+ */
+ if (!concurrent)
+ rwstate = begin_heap_rewrite(OldHeap, NewHeap, OldestXmin,
+ *xid_cutoff, *multi_cutoff);
/* Set up sorting if wanted */
@@ -737,6 +746,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
* Prepare to scan the OldHeap. To ensure we see recently-dead tuples
* that still need to be copied, we scan with SnapshotAny and use
* HeapTupleSatisfiesVacuum for the visibility test.
+ *
+ * In the CONCURRENTLY case, we do regular MVCC visibility tests, using
+ * the snapshot passed by the caller.
*/
if (OldIndex != NULL && !use_sort)
{
@@ -753,7 +765,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tableScan = NULL;
heapScan = NULL;
- indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, NULL, 0, 0);
+ indexScan = index_beginscan(OldHeap, OldIndex,
+ snapshot ? snapshot : SnapshotAny,
+ NULL, 0, 0);
index_rescan(indexScan, NULL, 0, NULL, 0);
}
else
@@ -762,7 +776,9 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
- tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
+ tableScan = table_beginscan(OldHeap,
+ snapshot ? snapshot : SnapshotAny,
+ 0, (ScanKey) NULL);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
@@ -785,6 +801,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
HeapTuple tuple;
Buffer buf;
bool isdead;
+ HTSV_Result vis;
CHECK_FOR_INTERRUPTS();
@@ -837,70 +854,84 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
tuple = ExecFetchSlotHeapTuple(slot, false, NULL);
buf = hslot->buffer;
- LockBuffer(buf, BUFFER_LOCK_SHARE);
-
- switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
+ /*
+ * Regarding CONCURRENTLY, see the comments on MVCC snapshot above.
+ */
+ if (!concurrent)
{
- case HEAPTUPLE_DEAD:
- /* Definitely dead */
- isdead = true;
- break;
- case HEAPTUPLE_RECENTLY_DEAD:
- *tups_recently_dead += 1;
- /* fall through */
- case HEAPTUPLE_LIVE:
- /* Live or recently dead, must copy it */
- isdead = false;
- break;
- case HEAPTUPLE_INSERT_IN_PROGRESS:
+ LockBuffer(buf, BUFFER_LOCK_SHARE);
- /*
- * Since we hold exclusive lock on the relation, normally the
- * only way to see this is if it was inserted earlier in our
- * own transaction. However, it can happen in system
- * catalogs, since we tend to release write lock before commit
- * there. Give a warning if neither case applies; but in any
- * case we had better copy it.
- */
- if (!is_system_catalog &&
- !TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
- elog(WARNING, "concurrent insert in progress within table \"%s\"",
- RelationGetRelationName(OldHeap));
- /* treat as live */
- isdead = false;
- break;
- case HEAPTUPLE_DELETE_IN_PROGRESS:
+ switch ((vis = HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf)))
+ {
+ case HEAPTUPLE_DEAD:
+ /* Definitely dead */
+ isdead = true;
+ break;
+ case HEAPTUPLE_RECENTLY_DEAD:
+ *tups_recently_dead += 1;
+ /* fall through */
+ case HEAPTUPLE_LIVE:
+ /* Live or recently dead, must copy it */
+ isdead = false;
+ break;
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
- /*
- * Similar situation to INSERT_IN_PROGRESS case.
- */
- if (!is_system_catalog &&
- !TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
- elog(WARNING, "concurrent delete in progress within table \"%s\"",
- RelationGetRelationName(OldHeap));
- /* treat as recently dead */
- *tups_recently_dead += 1;
- isdead = false;
- break;
- default:
- elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
- isdead = false; /* keep compiler quiet */
- break;
- }
+ /*
+ * As long as we hold exclusive lock on the relation,
+ * normally the only way to see this is if it was inserted
+ * earlier in our own transaction. However, it can happen
+ * in system catalogs, since we tend to release write lock
+ * before commit there. Also, there's no exclusive lock
+ * during concurrent processing. Give a warning if neither
+ * case applies; but in any case we had better copy it.
+ */
+ if (!is_system_catalog && !concurrent &&
+ !TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tuple->t_data)))
+ elog(WARNING, "concurrent insert in progress within table \"%s\"",
+ RelationGetRelationName(OldHeap));
+ /* treat as live */
+ isdead = false;
+ break;
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
- LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ /*
+ * Similar situation to INSERT_IN_PROGRESS case.
+ */
+ if (!is_system_catalog && !concurrent &&
+ !TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tuple->t_data)))
+ elog(WARNING, "concurrent delete in progress within table \"%s\"",
+ RelationGetRelationName(OldHeap));
+ /* treat as recently dead */
+ *tups_recently_dead += 1;
+ isdead = false;
+ break;
+ default:
+ elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+ isdead = false; /* keep compiler quiet */
+ break;
+ }
- if (isdead)
- {
- *tups_vacuumed += 1;
- /* heap rewrite module still needs to see it... */
- if (rewrite_heap_dead_tuple(rwstate, tuple))
+ if (isdead)
{
- /* A previous recently-dead tuple is now known dead */
*tups_vacuumed += 1;
- *tups_recently_dead -= 1;
+ /* heap rewrite module still needs to see it... */
+ if (rewrite_heap_dead_tuple(rwstate, tuple))
+ {
+ /* A previous recently-dead tuple is now known dead */
+ *tups_vacuumed += 1;
+ *tups_recently_dead -= 1;
+ }
+
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ continue;
}
- continue;
+
+ /*
+ * In the concurrent case, we have a copy of the tuple, so we
+ * don't worry whether the source tuple will be deleted / updated
+ * after we release the lock.
+ */
+ LockBuffer(buf, BUFFER_LOCK_UNLOCK);
}
*num_tuples += 1;
@@ -919,7 +950,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
{
const int ct_index[] = {
PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
- PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
+ PROGRESS_REPACK_HEAP_TUPLES_INSERTED
};
int64 ct_val[2];
@@ -934,6 +965,31 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
ct_val[1] = *num_tuples;
pgstat_progress_update_multi_param(2, ct_index, ct_val);
}
+
+ /*
+ * Process the WAL produced by the load, as well as by other
+ * transactions, so that the replication slot can advance and WAL does
+ * not pile up. Use wal_segment_size as a threshold so that we do not
+ * introduce the decoding overhead too often.
+ *
+ * Of course, we must not apply the changes until the initial load has
+ * completed.
+ *
+ * Note that our insertions into the new table should not be decoded
+ * as we (intentionally) do not write the logical decoding specific
+ * information to WAL.
+ */
+ if (concurrent)
+ {
+ XLogRecPtr end_of_wal;
+
+ end_of_wal = GetFlushRecPtr(NULL);
+ if ((end_of_wal - end_of_wal_prev) > wal_segment_size)
+ {
+ repack_decode_concurrent_changes(decoding_ctx, end_of_wal);
+ end_of_wal_prev = end_of_wal;
+ }
+ }
}
if (indexScan != NULL)
@@ -977,7 +1033,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
values, isnull,
rwstate);
/* Report n_tuples */
- pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
+ pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED,
n_tuples);
}
@@ -985,7 +1041,8 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
}
/* Write out any remaining tuples, and fsync if needed */
- end_heap_rewrite(rwstate);
+ if (rwstate)
+ end_heap_rewrite(rwstate);
/* Clean up */
pfree(values);
@@ -2376,6 +2433,10 @@ heapam_scan_sample_next_tuple(TableScanDesc scan, SampleScanState *scanstate,
* SET WITHOUT OIDS.
*
* So, we must reconstruct the tuple from component Datums.
+ *
+ * If rwstate=NULL, use simple_heap_insert() instead of rewriting - in that
+ * case we still need to deform/form the tuple. TODO Shouldn't we rename the
+ * function, as it might not do any rewrite?
*/
static void
reform_and_rewrite_tuple(HeapTuple tuple,
@@ -2398,8 +2459,28 @@ reform_and_rewrite_tuple(HeapTuple tuple,
copiedTuple = heap_form_tuple(newTupDesc, values, isnull);
- /* The heap rewrite module does the rest */
- rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+ if (rwstate)
+ /* The heap rewrite module does the rest */
+ rewrite_heap_tuple(rwstate, tuple, copiedTuple);
+ else
+ {
+ /*
+ * Insert tuple when processing REPACK CONCURRENTLY.
+ *
+ * rewriteheap.c is not used in the CONCURRENTLY case because it'd be
+ * difficult to do the same in the catch-up phase (as the logical
+ * decoding does not provide us with sufficient visibility
+ * information). Thus we must use heap_insert() both during the
+ * catch-up and here.
+ *
+ * The following is like simple_heap_insert() except that we pass the
+ * flag to skip logical decoding: as soon as REPACK CONCURRENTLY swaps
+ * the relation files, it drops this relation, so no logical
+ * replication subscription should need the data.
+ */
+ heap_insert(NewHeap, copiedTuple, GetCurrentCommandId(true),
+ HEAP_INSERT_NO_LOGICAL, NULL);
+ }
heap_freetuple(copiedTuple);
}
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index e6d2b5fced1..6aa2ed214f2 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -617,9 +617,9 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
int options = HEAP_INSERT_SKIP_FSM;
/*
- * While rewriting the heap for VACUUM FULL / CLUSTER, make sure data
- * for the TOAST table are not logically decoded. The main heap is
- * WAL-logged as XLOG FPI records, which are not logically decoded.
+ * While rewriting the heap for REPACK, make sure data for the TOAST
+ * table are not logically decoded. The main heap is WAL-logged as
+ * XLOG FPI records, which are not logically decoded.
*/
options |= HEAP_INSERT_NO_LOGICAL;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b46e7e9c2a6..5670f2bfbde 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -215,6 +215,7 @@ typedef struct TransactionStateData
bool parallelChildXact; /* is any parent transaction parallel? */
bool chain; /* start a new block after this one */
bool topXidLogged; /* for a subxact: is top-level XID logged? */
+ bool internal; /* for a subxact: launched internally? */
struct TransactionStateData *parent; /* back link to parent */
} TransactionStateData;
@@ -4735,6 +4736,7 @@ BeginInternalSubTransaction(const char *name)
/* Normal subtransaction start */
PushTransaction();
s = CurrentTransactionState; /* changed by push */
+ s->internal = true;
/*
* Savepoint names, like the TransactionState block itself, live
@@ -5251,7 +5253,13 @@ AbortSubTransaction(void)
LWLockReleaseAll();
pgstat_report_wait_end();
- pgstat_progress_end_command();
+
+ /*
+ * Internal subtransaction might be used by a user command, in which case
+ * the command outlives the subtransaction.
+ */
+ if (!s->internal)
+ pgstat_progress_end_command();
pgaio_error_cleanup();
@@ -5468,6 +5476,7 @@ PushTransaction(void)
s->parallelModeLevel = 0;
s->parallelChildXact = (p->parallelModeLevel != 0 || p->parallelChildXact);
s->topXidLogged = false;
+ s->internal = false;
CurrentTransactionState = s;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index b2b7b10c2be..a92ac78ad9e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1266,16 +1266,17 @@ CREATE VIEW pg_stat_progress_cluster AS
WHEN 2 THEN 'index scanning heap'
WHEN 3 THEN 'sorting tuples'
WHEN 4 THEN 'writing new heap'
- WHEN 5 THEN 'swapping relation files'
- WHEN 6 THEN 'rebuilding index'
- WHEN 7 THEN 'performing final cleanup'
+ -- 5 is 'catch-up', but that should not appear here.
+ WHEN 6 THEN 'swapping relation files'
+ WHEN 7 THEN 'rebuilding index'
+ WHEN 8 THEN 'performing final cleanup'
END AS phase,
CAST(S.param3 AS oid) AS cluster_index_relid,
S.param4 AS heap_tuples_scanned,
S.param5 AS heap_tuples_written,
- S.param6 AS heap_blks_total,
- S.param7 AS heap_blks_scanned,
- S.param8 AS index_rebuild_count
+ S.param8 AS heap_blks_total,
+ S.param9 AS heap_blks_scanned,
+ S.param10 AS index_rebuild_count
FROM pg_stat_get_progress_info('CLUSTER') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
@@ -1291,16 +1292,19 @@ CREATE VIEW pg_stat_progress_repack AS
WHEN 2 THEN 'index scanning heap'
WHEN 3 THEN 'sorting tuples'
WHEN 4 THEN 'writing new heap'
- WHEN 5 THEN 'swapping relation files'
- WHEN 6 THEN 'rebuilding index'
- WHEN 7 THEN 'performing final cleanup'
+ WHEN 5 THEN 'catch-up'
+ WHEN 6 THEN 'swapping relation files'
+ WHEN 7 THEN 'rebuilding index'
+ WHEN 8 THEN 'performing final cleanup'
END AS phase,
CAST(S.param3 AS oid) AS repack_index_relid,
S.param4 AS heap_tuples_scanned,
- S.param5 AS heap_tuples_written,
- S.param6 AS heap_blks_total,
- S.param7 AS heap_blks_scanned,
- S.param8 AS index_rebuild_count
+ S.param5 AS heap_tuples_inserted,
+ S.param6 AS heap_tuples_updated,
+ S.param7 AS heap_tuples_deleted,
+ S.param8 AS heap_blks_total,
+ S.param9 AS heap_blks_scanned,
+ S.param10 AS index_rebuild_count
FROM pg_stat_get_progress_info('REPACK') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8b64f9e6795..61224a3adf2 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -25,6 +25,10 @@
#include "access/toast_internals.h"
#include "access/transam.h"
#include "access/xact.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xloginsert.h"
+#include "access/xlogutils.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
#include "catalog/heap.h"
@@ -32,6 +36,7 @@
#include "catalog/namespace.h"
#include "catalog/objectaccess.h"
#include "catalog/pg_am.h"
+#include "catalog/pg_control.h"
#include "catalog/pg_inherits.h"
#include "catalog/toasting.h"
#include "commands/cluster.h"
@@ -39,15 +44,21 @@
#include "commands/progress.h"
#include "commands/tablecmds.h"
#include "commands/vacuum.h"
+#include "executor/executor.h"
#include "miscadmin.h"
#include "optimizer/optimizer.h"
#include "pgstat.h"
+#include "replication/decode.h"
+#include "replication/logical.h"
+#include "replication/snapbuild.h"
#include "storage/bufmgr.h"
+#include "storage/ipc.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "utils/acl.h"
#include "utils/fmgroids.h"
#include "utils/guc.h"
+#include "utils/injection_point.h"
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
@@ -67,13 +78,45 @@ typedef struct
Oid indexOid;
} RelToCluster;
+/*
+ * The following definitions are used for concurrent processing.
+ */
+
+/*
+ * The locators are used to avoid logical decoding of data that we do not need
+ * for our table.
+ */
+RelFileLocator repacked_rel_locator = {.relNumber = InvalidOid};
+RelFileLocator repacked_rel_toast_locator = {.relNumber = InvalidOid};
+
+/*
+ * Everything we need to call ExecInsertIndexTuples().
+ */
+typedef struct IndexInsertState
+{
+ ResultRelInfo *rri;
+ EState *estate;
+
+ Relation ident_index;
+} IndexInsertState;
+
+/* The WAL segment being decoded. */
+static XLogSegNo repack_current_segment = 0;
+
+
static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
- Oid indexOid, Oid userid, int options);
+ Oid indexOid, Oid userid, LOCKMODE lmode,
+ int options);
+static void check_repack_concurrently_requirements(Relation rel);
static void rebuild_relation(RepackCommand cmd, bool usingindex,
- Relation OldHeap, Relation index, bool verbose);
+ Relation OldHeap, Relation index, Oid userid,
+ bool verbose, bool concurrent);
static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
- bool verbose, bool *pSwapToastByContent,
- TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+ Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+ bool verbose,
+ bool *pSwapToastByContent,
+ TransactionId *pFreezeXid,
+ MultiXactId *pCutoffMulti);
static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
MemoryContext permcxt);
static List *get_tables_to_repack_partitioned(RepackCommand cmd,
@@ -81,12 +124,61 @@ static List *get_tables_to_repack_partitioned(RepackCommand cmd,
Oid relid, bool rel_is_index);
static bool cluster_is_permitted_for_relation(RepackCommand cmd,
Oid relid, Oid userid);
+
+static void begin_concurrent_repack(Relation rel);
+static void end_concurrent_repack(void);
+static LogicalDecodingContext *setup_logical_decoding(Oid relid,
+ const char *slotname,
+ TupleDesc tupdesc);
+static HeapTuple get_changed_tuple(char *change);
+static void apply_concurrent_changes(RepackDecodingState *dstate,
+ Relation rel, ScanKey key, int nkeys,
+ IndexInsertState *iistate);
+static void apply_concurrent_insert(Relation rel, ConcurrentChange *change,
+ HeapTuple tup, IndexInsertState *iistate,
+ TupleTableSlot *index_slot);
+static void apply_concurrent_update(Relation rel, HeapTuple tup,
+ HeapTuple tup_target,
+ ConcurrentChange *change,
+ IndexInsertState *iistate,
+ TupleTableSlot *index_slot);
+static void apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+ ConcurrentChange *change);
+static HeapTuple find_target_tuple(Relation rel, ScanKey key, int nkeys,
+ HeapTuple tup_key,
+ IndexInsertState *iistate,
+ TupleTableSlot *ident_slot,
+ IndexScanDesc *scan_p);
+static void process_concurrent_changes(LogicalDecodingContext *ctx,
+ XLogRecPtr end_of_wal,
+ Relation rel_dst,
+ Relation rel_src,
+ ScanKey ident_key,
+ int ident_key_nentries,
+ IndexInsertState *iistate);
+static IndexInsertState *get_index_insert_state(Relation relation,
+ Oid ident_index_id);
+static ScanKey build_identity_key(Oid ident_idx_oid, Relation rel_src,
+ int *nentries);
+static void free_index_insert_state(IndexInsertState *iistate);
+static void cleanup_logical_decoding(LogicalDecodingContext *ctx);
+static void rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+ Relation cl_index,
+ LogicalDecodingContext *ctx,
+ bool swap_toast_by_content,
+ TransactionId frozenXid,
+ MultiXactId cutoffMulti);
+static List *build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes);
static Relation process_single_relation(RepackStmt *stmt,
+ LOCKMODE lockmode,
+ bool isTopLevel,
ClusterParams *params);
static Oid determine_clustered_index(Relation rel, bool usingindex,
const char *indexname);
+#define REPL_PLUGIN_NAME "pgoutput_repack"
+
static const char *
RepackCommandAsString(RepackCommand cmd)
{
@@ -95,7 +187,7 @@ RepackCommandAsString(RepackCommand cmd)
case REPACK_COMMAND_REPACK:
return "REPACK";
case REPACK_COMMAND_VACUUMFULL:
- return "VACUUM";
+ return "VACUUM (FULL)";
case REPACK_COMMAND_CLUSTER:
return "CLUSTER";
}
@@ -132,6 +224,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
ClusterParams params = {0};
Relation rel = NULL;
MemoryContext repack_context;
+ LOCKMODE lockmode;
List *rtcs;
/* Parse option list */
@@ -142,6 +235,16 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
else if (strcmp(opt->defname, "analyze") == 0 ||
strcmp(opt->defname, "analyse") == 0)
params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
+ else if (strcmp(opt->defname, "concurrently") == 0 &&
+ defGetBoolean(opt))
+ {
+ if (stmt->command != REPACK_COMMAND_REPACK)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("CONCURRENTLY option not supported for %s",
+ RepackCommandAsString(stmt->command)));
+ params.options |= CLUOPT_CONCURRENT;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -151,13 +254,25 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
parser_errposition(pstate, opt->location)));
}
+ /*
+ * Determine the lock mode expected by cluster_rel().
+ *
+ * In the exclusive case, we obtain AccessExclusiveLock right away to
+ * avoid lock-upgrade hazard in the single-transaction case. In the
+ * CONCURRENTLY case, the AccessExclusiveLock will only be used at the end
+ * of processing, presumably for a very short time. Until then, we'll have
+ * to unlock the relation temporarily, so there's no lock-upgrade hazard.
+ */
+ lockmode = (params.options & CLUOPT_CONCURRENT) == 0 ?
+ AccessExclusiveLock : ShareUpdateExclusiveLock;
+
/*
* If a single relation is specified, process it and we're done ... unless
* the relation is a partitioned table, in which case we fall through.
*/
if (stmt->relation != NULL)
{
- rel = process_single_relation(stmt, ¶ms);
+ rel = process_single_relation(stmt, lockmode, isTopLevel, ¶ms);
if (rel == NULL)
return;
}
@@ -169,10 +284,29 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
errmsg("cannot ANALYZE multiple tables"));
/*
- * By here, we know we are in a multi-table situation. In order to avoid
- * holding locks for too long, we want to process each table in its own
- * transaction. This forces us to disallow running inside a user
- * transaction block.
+ * By here, we know we are in a multi-table situation.
+ *
+ * Concurrent processing is currently considered rather special (e.g. in
+ * terms of resources consumed) so it is not performed in bulk.
+ */
+ if (params.options & CLUOPT_CONCURRENT)
+ {
+ if (rel != NULL)
+ {
+ Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+ ereport(ERROR,
+ errmsg("REPACK CONCURRENTLY not supported for partitioned tables"),
+ errhint("Consider running the command for individual partitions."));
+ }
+ else
+ ereport(ERROR,
+ errmsg("REPACK CONCURRENTLY requires explicit table name"));
+ }
+
+ /*
+ * In order to avoid holding locks for too long, we want to process each
+ * table in its own transaction. This forces us to disallow running
+ * inside a user transaction block.
*/
PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
@@ -252,7 +386,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
* Open the target table, coping with the case where it has been
* dropped.
*/
- rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+ rel = try_table_open(rtc->tableOid, lockmode);
if (rel == NULL)
{
CommitTransactionCommand();
@@ -264,7 +398,7 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
/* Process this table */
cluster_rel(stmt->command, stmt->usingindex,
- rel, rtc->indexOid, ¶ms);
+ rel, rtc->indexOid, ¶ms, isTopLevel);
/* cluster_rel closes the relation, but keeps lock */
PopActiveSnapshot();
@@ -293,22 +427,55 @@ ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
* If indexOid is InvalidOid, the table will be rewritten in physical order
* instead of index order.
*
+ * Note that, in the concurrent case, the function releases the lock at some
+ * point, in order to get AccessExclusiveLock for the final steps (i.e. to
+ * swap the relation files). To make things simpler, the caller should expect
+ * OldHeap to be closed on return, regardless of CLUOPT_CONCURRENT. (The
+ * AccessExclusiveLock is kept till the end of the transaction.)
+ *
* 'cmd' indicates which command is being executed, to be used for error
* messages.
*/
void
cluster_rel(RepackCommand cmd, bool usingindex,
- Relation OldHeap, Oid indexOid, ClusterParams *params)
+ Relation OldHeap, Oid indexOid, ClusterParams *params,
+ bool isTopLevel)
{
Oid tableOid = RelationGetRelid(OldHeap);
+ Relation index;
+ LOCKMODE lmode;
Oid save_userid;
int save_sec_context;
int save_nestlevel;
bool verbose = ((params->options & CLUOPT_VERBOSE) != 0);
bool recheck = ((params->options & CLUOPT_RECHECK) != 0);
- Relation index;
+ bool concurrent = ((params->options & CLUOPT_CONCURRENT) != 0);
- Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false));
+ /*
+ * Check that the correct lock is held. The lock mode is
+ * AccessExclusiveLock for normal processing and ShareUpdateExclusiveLock
+ * for concurrent processing (so that SELECT, INSERT, UPDATE and DELETE
+ * commands work, but cluster_rel() cannot be called concurrently for the
+ * same relation).
+ */
+ lmode = !concurrent ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+
+ /* There are specific requirements on concurrent processing. */
+ if (concurrent)
+ {
+ /*
+ * Make sure we have no XID assigned, otherwise the call to
+ * setup_logical_decoding() can cause a deadlock.
+ *
+ * The existence of a transaction block does not actually imply that an
+ * XID was already assigned, but it very likely was. We might want to check
+ * the result of GetCurrentTransactionIdIfAny() instead, but that
+ * would be less clear from the user's perspective.
+ */
+ PreventInTransactionBlock(isTopLevel, "REPACK (CONCURRENTLY)");
+
+ check_repack_concurrently_requirements(OldHeap);
+ }
/* Check for user-requested abort. */
CHECK_FOR_INTERRUPTS();
@@ -351,11 +518,13 @@ cluster_rel(RepackCommand cmd, bool usingindex,
* If this is a single-transaction CLUSTER, we can skip these tests. We
* *must* skip the one on indisclustered since it would reject an attempt
* to cluster a not-previously-clustered index.
+ *
+ * XXX move [some of] these comments to where the RECHECK flag is
+ * determined?
*/
- if (recheck)
- if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
- params->options))
- goto out;
+ if (recheck && !cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+ lmode, params->options))
+ goto out;
/*
* We allow repacking shared catalogs only when not using an index. It
@@ -369,6 +538,12 @@ cluster_rel(RepackCommand cmd, bool usingindex,
errmsg("cannot run \"%s\" on a shared catalog",
RepackCommandAsString(cmd))));
+ /*
+ * The CONCURRENTLY case should have been rejected earlier because it does
+ * not support system catalogs.
+ */
+ Assert(!(OldHeap->rd_rel->relisshared && concurrent));
+
/*
* Don't process temp tables of other backends ... their local buffer
* manager is not going to cope.
@@ -404,7 +579,7 @@ cluster_rel(RepackCommand cmd, bool usingindex,
if (OidIsValid(indexOid))
{
/* verify the index is good and lock it */
- check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
+ check_index_is_clusterable(OldHeap, indexOid, lmode);
/* also open it */
index = index_open(indexOid, NoLock);
}
@@ -421,7 +596,9 @@ cluster_rel(RepackCommand cmd, bool usingindex,
if (OldHeap->rd_rel->relkind == RELKIND_MATVIEW &&
!RelationIsPopulated(OldHeap))
{
- relation_close(OldHeap, AccessExclusiveLock);
+ if (index)
+ index_close(index, lmode);
+ relation_close(OldHeap, lmode);
goto out;
}
@@ -434,11 +611,35 @@ cluster_rel(RepackCommand cmd, bool usingindex,
* invalid, because we move tuples around. Promote them to relation
* locks. Predicate locks on indexes will be promoted when they are
* reindexed.
+ *
+ * During concurrent processing, the heap as well as its indexes stay in
+ * operation, so we postpone this step until they are locked using
+ * AccessExclusiveLock near the end of the processing.
*/
- TransferPredicateLocksToHeapRelation(OldHeap);
+ if (!concurrent)
+ TransferPredicateLocksToHeapRelation(OldHeap);
/* rebuild_relation does all the dirty work */
- rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
+ PG_TRY();
+ {
+ /*
+ * For concurrent processing, make sure that our logical decoding
+ * ignores data changes of other tables than the one we are
+ * processing.
+ */
+ if (concurrent)
+ begin_concurrent_repack(OldHeap);
+
+ rebuild_relation(cmd, usingindex, OldHeap, index, save_userid,
+ verbose, concurrent);
+ }
+ PG_FINALLY();
+ {
+ if (concurrent)
+ end_concurrent_repack();
+ }
+ PG_END_TRY();
+
/* rebuild_relation closes OldHeap, and index if valid */
out:
@@ -457,14 +658,14 @@ out:
*/
static bool
cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
- Oid userid, int options)
+ Oid userid, LOCKMODE lmode, int options)
{
Oid tableOid = RelationGetRelid(OldHeap);
/* Check that the user still has privileges for the relation */
if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
{
- relation_close(OldHeap, AccessExclusiveLock);
+ relation_close(OldHeap, lmode);
return false;
}
@@ -478,7 +679,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
*/
if (RELATION_IS_OTHER_TEMP(OldHeap))
{
- relation_close(OldHeap, AccessExclusiveLock);
+ relation_close(OldHeap, lmode);
return false;
}
@@ -489,7 +690,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
*/
if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
{
- relation_close(OldHeap, AccessExclusiveLock);
+ relation_close(OldHeap, lmode);
return false;
}
@@ -500,7 +701,7 @@ cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
!get_index_isclustered(indexOid))
{
- relation_close(OldHeap, AccessExclusiveLock);
+ relation_close(OldHeap, lmode);
return false;
}
}
@@ -641,19 +842,89 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
table_close(pg_index, RowExclusiveLock);
}
+/*
+ * Check if the CONCURRENTLY option is legal for the relation.
+ */
+static void
+check_repack_concurrently_requirements(Relation rel)
+{
+ char relpersistence,
+ replident;
+ Oid ident_idx;
+
+ /* Data changes in system relations are not logically decoded. */
+ if (IsCatalogRelation(rel))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot repack relation \"%s\"",
+ RelationGetRelationName(rel)),
+ errhint("REPACK CONCURRENTLY is not supported for catalog relations.")));
+
+ /*
+ * reorderbuffer.c does not seem to handle processing of a TOAST relation
+ * alone.
+ */
+ if (IsToastRelation(rel))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot repack relation \"%s\"",
+ RelationGetRelationName(rel)),
+ errhint("REPACK CONCURRENTLY is not supported for TOAST relations, unless the main relation is repacked too.")));
+
+ relpersistence = rel->rd_rel->relpersistence;
+ if (relpersistence != RELPERSISTENCE_PERMANENT)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot repack relation \"%s\"",
+ RelationGetRelationName(rel)),
+ errhint("REPACK CONCURRENTLY is only allowed for permanent relations.")));
+
+ /* With NOTHING, WAL does not contain the old tuple. */
+ replident = rel->rd_rel->relreplident;
+ if (replident == REPLICA_IDENTITY_NOTHING)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot repack relation \"%s\"",
+ RelationGetRelationName(rel)),
+ errhint("Relation \"%s\" has insufficient replica identity.",
+ RelationGetRelationName(rel))));
+
+ /*
+ * Identity index is not set if the replica identity is FULL, but PK might
+ * exist in such a case.
+ */
+ ident_idx = RelationGetReplicaIndex(rel);
+ if (!OidIsValid(ident_idx) && OidIsValid(rel->rd_pkindex))
+ ident_idx = rel->rd_pkindex;
+ if (!OidIsValid(ident_idx))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot process relation \"%s\"",
+ RelationGetRelationName(rel)),
+ errhint("Relation \"%s\" has no identity index.",
+ RelationGetRelationName(rel))));
+}
+
+
/*
* rebuild_relation: rebuild an existing relation in index or physical order
*
- * OldHeap: table to rebuild.
+ * OldHeap: table to rebuild. See cluster_rel() for comments on the required
+ * lock strength.
+ *
* index: index to cluster by, or NULL to rewrite in physical order.
*
- * On entry, heap and index (if one is given) must be open, and
- * AccessExclusiveLock held on them.
- * On exit, they are closed, but locks on them are not released.
+ * On entry, heap and index (if one is given) must be open, and the
+ * appropriate lock held on them -- AccessExclusiveLock for exclusive
+ * processing and ShareUpdateExclusiveLock for concurrent processing.
+ *
+ * On exit, they are closed, but still locked with AccessExclusiveLock. (The
+ * function handles the lock upgrade if 'concurrent' is true.)
*/
static void
rebuild_relation(RepackCommand cmd, bool usingindex,
- Relation OldHeap, Relation index, bool verbose)
+ Relation OldHeap, Relation index, Oid userid,
+ bool verbose, bool concurrent)
{
Oid tableOid = RelationGetRelid(OldHeap);
Oid accessMethod = OldHeap->rd_rel->relam;
@@ -661,13 +932,55 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
Oid OIDNewHeap;
Relation NewHeap;
char relpersistence;
- bool is_system_catalog;
bool swap_toast_by_content;
TransactionId frozenXid;
MultiXactId cutoffMulti;
+ NameData slotname;
+ LogicalDecodingContext *ctx = NULL;
+ Snapshot snapshot = NULL;
+#if USE_ASSERT_CHECKING
+ LOCKMODE lmode;
- Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
- (index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
+ lmode = concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock;
+
+ Assert(CheckRelationLockedByMe(OldHeap, lmode, false));
+ Assert(!usingindex || CheckRelationLockedByMe(index, lmode, false));
+#endif
+
+ if (concurrent)
+ {
+ TupleDesc tupdesc;
+
+ /*
+ * REPACK CONCURRENTLY is not allowed in a transaction block, so this
+ * should never fire.
+ */
+ Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+
+ /*
+ * A single backend should not execute multiple REPACK commands at a
+ * time, so use PID to make the slot unique.
+ */
+ snprintf(NameStr(slotname), NAMEDATALEN, "repack_%d", MyProcPid);
+
+ tupdesc = CreateTupleDescCopy(RelationGetDescr(OldHeap));
+
+ /*
+ * Prepare to capture the concurrent data changes.
+ *
+ * Note that this call waits for all transactions with XID already
+ * assigned to finish. If one of those transactions is waiting for a
+ * lock conflicting with ShareUpdateExclusiveLock on our table (e.g.
+ * it runs CREATE INDEX), we can end up in a deadlock. Not sure this
+ * risk is worth unlocking/locking the table (and its clustering
+ * index) and checking again if it's still eligible for REPACK
+ * CONCURRENTLY.
+ */
+ ctx = setup_logical_decoding(tableOid, NameStr(slotname), tupdesc);
+
+ snapshot = SnapBuildInitialSnapshotForRepack(ctx->snapshot_builder);
+ PushActiveSnapshot(snapshot);
+ }
/* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
if (usingindex)
@@ -675,7 +988,6 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
/* Remember info about rel before closing OldHeap */
relpersistence = OldHeap->rd_rel->relpersistence;
- is_system_catalog = IsSystemRelation(OldHeap);
/*
* Create the transient table that will receive the re-ordered data.
@@ -691,30 +1003,67 @@ rebuild_relation(RepackCommand cmd, bool usingindex,
NewHeap = table_open(OIDNewHeap, NoLock);
/* Copy the heap data into the new table in the desired order */
- copy_table_data(NewHeap, OldHeap, index, verbose,
+ copy_table_data(NewHeap, OldHeap, index, snapshot, ctx, verbose,
&swap_toast_by_content, &frozenXid, &cutoffMulti);
+ /* The historic snapshot won't be needed anymore. */
+ if (snapshot)
+ PopActiveSnapshot();
- /* Close relcache entries, but keep lock until transaction commit */
- table_close(OldHeap, NoLock);
- if (index)
- index_close(index, NoLock);
-
- /*
- * Close the new relation so it can be dropped as soon as the storage is
- * swapped. The relation is not visible to others, so no need to unlock it
- * explicitly.
- */
- table_close(NewHeap, NoLock);
-
- /*
- * Swap the physical files of the target and transient tables, then
- * rebuild the target's indexes and throw away the transient table.
- */
- finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
- swap_toast_by_content, false, true,
- frozenXid, cutoffMulti,
- relpersistence);
+ if (concurrent)
+ {
+ /*
+ * Push a snapshot that we will use to find old versions of rows when
+ * processing concurrent UPDATE and DELETE commands. (That snapshot
+ * should also be used by index expressions.)
+ */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * Make sure we can find the tuples just inserted when applying DML
+ * commands on top of those.
+ */
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+
+ rebuild_relation_finish_concurrent(NewHeap, OldHeap, index,
+ ctx, swap_toast_by_content,
+ frozenXid, cutoffMulti);
+ PopActiveSnapshot();
+
+ pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
+
+ /* Done with decoding. */
+ cleanup_logical_decoding(ctx);
+ ReplicationSlotRelease();
+ ReplicationSlotDrop(NameStr(slotname), false);
+ }
+ else
+ {
+ bool is_system_catalog = IsSystemRelation(OldHeap);
+
+ /* Close relcache entries, but keep lock until transaction commit */
+ table_close(OldHeap, NoLock);
+ if (index)
+ index_close(index, NoLock);
+
+ /*
+ * Close the new relation so it can be dropped as soon as the storage
+ * is swapped. The relation is not visible to others, so no need to
+ * unlock it explicitly.
+ */
+ table_close(NewHeap, NoLock);
+
+ /*
+ * Swap the physical files of the target and transient tables, then
+ * rebuild the target's indexes and throw away the transient table.
+ */
+ finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
+ swap_toast_by_content, false, true, true,
+ frozenXid, cutoffMulti,
+ relpersistence);
+ }
}
@@ -849,15 +1198,19 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
/*
* Do the physical copying of table data.
*
+ * 'snapshot' and 'decoding_ctx': see table_relation_copy_for_cluster(). Pass
+ * iff concurrent processing is required.
+ *
* There are three output parameters:
* *pSwapToastByContent is set true if toast tables must be swapped by content.
* *pFreezeXid receives the TransactionId used as freeze cutoff point.
* *pCutoffMulti receives the MultiXactId used as a cutoff point.
*/
static void
-copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verbose,
- bool *pSwapToastByContent, TransactionId *pFreezeXid,
- MultiXactId *pCutoffMulti)
+copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
+ Snapshot snapshot, LogicalDecodingContext *decoding_ctx,
+ bool verbose, bool *pSwapToastByContent,
+ TransactionId *pFreezeXid, MultiXactId *pCutoffMulti)
{
Relation relRelation;
HeapTuple reltup;
@@ -875,6 +1228,8 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
PGRUsage ru0;
char *nspname;
+ bool concurrent = snapshot != NULL;
+
pg_rusage_init(&ru0);
/* Store a copy of the namespace name for logging purposes */
@@ -977,8 +1332,48 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
* provided, else plain seqscan.
*/
if (OldIndex != NULL && OldIndex->rd_rel->relam == BTREE_AM_OID)
+ {
+ ResourceOwner oldowner = NULL;
+ ResourceOwner resowner = NULL;
+
+ /*
+ * In the CONCURRENT case, use a dedicated resource owner so we don't
+ * leave any additional locks behind us that we cannot release easily.
+ */
+ if (concurrent)
+ {
+ Assert(CheckRelationLockedByMe(OldHeap, ShareUpdateExclusiveLock,
+ false));
+ Assert(CheckRelationLockedByMe(OldIndex, ShareUpdateExclusiveLock,
+ false));
+
+ resowner = ResourceOwnerCreate(CurrentResourceOwner,
+ "plan_cluster_use_sort");
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = resowner;
+ }
+
use_sort = plan_cluster_use_sort(RelationGetRelid(OldHeap),
RelationGetRelid(OldIndex));
+
+ if (concurrent)
+ {
+ CurrentResourceOwner = oldowner;
+
+ /*
+ * We are primarily concerned about locks, but if the planner
+ * happened to allocate any other resources, we should release
+ * them too because we're going to delete the whole resowner.
+ */
+ ResourceOwnerRelease(resowner, RESOURCE_RELEASE_BEFORE_LOCKS,
+ false, false);
+ ResourceOwnerRelease(resowner, RESOURCE_RELEASE_LOCKS,
+ false, false);
+ ResourceOwnerRelease(resowner, RESOURCE_RELEASE_AFTER_LOCKS,
+ false, false);
+ ResourceOwnerDelete(resowner);
+ }
+ }
else
use_sort = false;
@@ -1007,7 +1402,9 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
* values (e.g. because the AM doesn't use freezing).
*/
table_relation_copy_for_cluster(OldHeap, NewHeap, OldIndex, use_sort,
- cutoffs.OldestXmin, &cutoffs.FreezeLimit,
+ cutoffs.OldestXmin, snapshot,
+ decoding_ctx,
+ &cutoffs.FreezeLimit,
&cutoffs.MultiXactCutoff,
&num_tuples, &tups_vacuumed,
&tups_recently_dead);
@@ -1016,7 +1413,11 @@ copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex, bool verb
*pFreezeXid = cutoffs.FreezeLimit;
*pCutoffMulti = cutoffs.MultiXactCutoff;
- /* Reset rd_toastoid just to be tidy --- it shouldn't be looked at again */
+ /*
+ * Reset rd_toastoid just to be tidy --- it shouldn't be looked at again.
+ * In the CONCURRENTLY case, we need to set it again before applying the
+ * concurrent changes.
+ */
NewHeap->rd_toastoid = InvalidOid;
num_pages = RelationGetNumberOfBlocks(NewHeap);
@@ -1474,14 +1875,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
bool swap_toast_by_content,
bool check_constraints,
bool is_internal,
+ bool reindex,
TransactionId frozenXid,
MultiXactId cutoffMulti,
char newrelpersistence)
{
ObjectAddress object;
Oid mapped_tables[4];
- int reindex_flags;
- ReindexParams reindex_params = {0};
int i;
/* Report that we are now swapping relation files */
@@ -1507,39 +1907,47 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (is_system_catalog)
CacheInvalidateCatalog(OIDOldHeap);
- /*
- * Rebuild each index on the relation (but not the toast table, which is
- * all-new at this point). It is important to do this before the DROP
- * step because if we are processing a system catalog that will be used
- * during DROP, we want to have its indexes available. There is no
- * advantage to the other order anyway because this is all transactional,
- * so no chance to reclaim disk space before commit. We do not need a
- * final CommandCounterIncrement() because reindex_relation does it.
- *
- * Note: because index_build is called via reindex_relation, it will never
- * set indcheckxmin true for the indexes. This is OK even though in some
- * sense we are building new indexes rather than rebuilding existing ones,
- * because the new heap won't contain any HOT chains at all, let alone
- * broken ones, so it can't be necessary to set indcheckxmin.
- */
- reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
- if (check_constraints)
- reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+ if (reindex)
+ {
+ int reindex_flags;
+ ReindexParams reindex_params = {0};
- /*
- * Ensure that the indexes have the same persistence as the parent
- * relation.
- */
- if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
- reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
- else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
- reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
+ /*
+ * Rebuild each index on the relation (but not the toast table, which
+ * is all-new at this point). It is important to do this before the
+ * DROP step because if we are processing a system catalog that will
+ * be used during DROP, we want to have its indexes available. There
+ * is no advantage to the other order anyway because this is all
+ * transactional, so no chance to reclaim disk space before commit. We
+ * do not need a final CommandCounterIncrement() because
+ * reindex_relation does it.
+ *
+ * Note: because index_build is called via reindex_relation, it will
+ * never set indcheckxmin true for the indexes. This is OK even
+ * though in some sense we are building new indexes rather than
+ * rebuilding existing ones, because the new heap won't contain any
+ * HOT chains at all, let alone broken ones, so it can't be necessary
+ * to set indcheckxmin.
+ */
+ reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
+ if (check_constraints)
+ reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
- /* Report that we are now reindexing relations */
- pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
- PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+ /*
+ * Ensure that the indexes have the same persistence as the parent
+ * relation.
+ */
+ if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+ reindex_flags |= REINDEX_REL_FORCE_INDEXES_UNLOGGED;
+ else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+ reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
- reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+ /* Report that we are now reindexing relations */
+ pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+ reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
+ }
/* Report that we are now doing clean up */
pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
@@ -1881,7 +2289,8 @@ cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
* resolve in this case.
*/
static Relation
-process_single_relation(RepackStmt *stmt, ClusterParams *params)
+process_single_relation(RepackStmt *stmt, LOCKMODE lockmode, bool isTopLevel,
+ ClusterParams *params)
{
Relation rel;
Oid tableOid;
@@ -1890,13 +2299,9 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
stmt->command == REPACK_COMMAND_REPACK);
- /*
- * Find, lock, and check permissions on the table. We obtain
- * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
- * single-transaction case.
- */
+ /* Find, lock, and check permissions on the table. */
tableOid = RangeVarGetRelidExtended(stmt->relation,
- AccessExclusiveLock,
+ lockmode,
0,
RangeVarCallbackMaintainsTable,
NULL);
@@ -1922,26 +2327,17 @@ process_single_relation(RepackStmt *stmt, ClusterParams *params)
return rel;
else
{
- Oid indexOid;
+ Oid indexOid = InvalidOid;
- indexOid = determine_clustered_index(rel, stmt->usingindex,
- stmt->indexname);
- if (OidIsValid(indexOid))
- check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
- cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
-
- /* Do an analyze, if requested */
- if (params->options & CLUOPT_ANALYZE)
+ if (stmt->usingindex)
{
- VacuumParams vac_params = {0};
-
- vac_params.options |= VACOPT_ANALYZE;
- if (params->options & CLUOPT_VERBOSE)
- vac_params.options |= VACOPT_VERBOSE;
- analyze_rel(RelationGetRelid(rel), NULL, vac_params, NIL, true,
- NULL);
+ indexOid = determine_clustered_index(rel, stmt->usingindex,
+ stmt->indexname);
+ check_index_is_clusterable(rel, indexOid, lockmode);
}
+ cluster_rel(stmt->command, stmt->usingindex, rel, indexOid,
+ params, isTopLevel);
return NULL;
}
}
@@ -1998,3 +2394,1048 @@ determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
return indexOid;
}
+
+
+/*
+ * Call this function before REPACK CONCURRENTLY starts to set up logical
+ * decoding. It makes sure that other users of the table put enough
+ * information into WAL.
+ *
+ * The point is that at various places we expect that the table we're
+ * processing is treated like a system catalog. For example, we need to be
+ * able to scan it using a "historic snapshot" anytime during the processing
+ * (as opposed to scanning only at the start point of the decoding, as logical
+ * replication does during initial table synchronization), in order to apply
+ * concurrent UPDATE / DELETE commands.
+ *
+ * Note that TOAST table needs no attention here as it's not scanned using
+ * historic snapshot.
+ */
+static void
+begin_concurrent_repack(Relation rel)
+{
+ Oid toastrelid;
+
+ /* Avoid logical decoding of other relations by this backend. */
+ repacked_rel_locator = rel->rd_locator;
+ toastrelid = rel->rd_rel->reltoastrelid;
+ if (OidIsValid(toastrelid))
+ {
+ Relation toastrel;
+
+ /* Avoid logical decoding of other TOAST relations. */
+ toastrel = table_open(toastrelid, AccessShareLock);
+ repacked_rel_toast_locator = toastrel->rd_locator;
+ table_close(toastrel, AccessShareLock);
+ }
+}
+
+/*
+ * Call this when done with REPACK CONCURRENTLY.
+ */
+static void
+end_concurrent_repack(void)
+{
+ /*
+ * Restore normal function of (future) logical decoding for this backend.
+ */
+ repacked_rel_locator.relNumber = InvalidOid;
+ repacked_rel_toast_locator.relNumber = InvalidOid;
+}
+
+/*
+ * This function is much like pg_create_logical_replication_slot() except that
+ * the new slot is neither released (if anyone else could read changes from
+ * our slot, we could miss changes other backends do while we copy the
+ * existing data into temporary table), nor persisted (it's easier to handle
+ * crash by restarting all the work from scratch).
+ */
+static LogicalDecodingContext *
+setup_logical_decoding(Oid relid, const char *slotname, TupleDesc tupdesc)
+{
+ LogicalDecodingContext *ctx;
+ RepackDecodingState *dstate;
+
+ /*
+ * Check if we can use logical decoding.
+ */
+ CheckSlotPermissions();
+ CheckLogicalDecodingRequirements();
+
+ /* RS_TEMPORARY so that the slot gets cleaned up on ERROR. */
+ ReplicationSlotCreate(slotname, true, RS_TEMPORARY, false, false, false);
+
+ /*
+ * Neither prepare_write nor do_write callback nor update_progress is
+ * useful for us.
+ */
+ ctx = CreateInitDecodingContext(REPL_PLUGIN_NAME,
+ NIL,
+ true,
+ InvalidXLogRecPtr,
+ XL_ROUTINE(.page_read = read_local_xlog_page,
+ .segment_open = wal_segment_open,
+ .segment_close = wal_segment_close),
+ NULL, NULL, NULL);
+
+ /*
+ * We don't have control over setting fast_forward, so at least check it.
+ */
+ Assert(!ctx->fast_forward);
+
+ DecodingContextFindStartpoint(ctx);
+
+ /* Some WAL records should have been read. */
+ Assert(ctx->reader->EndRecPtr != InvalidXLogRecPtr);
+
+ XLByteToSeg(ctx->reader->EndRecPtr, repack_current_segment,
+ wal_segment_size);
+
+ /*
+ * Setup structures to store decoded changes.
+ */
+ dstate = palloc0(sizeof(RepackDecodingState));
+ dstate->relid = relid;
+ dstate->tstore = tuplestore_begin_heap(false, false,
+ maintenance_work_mem);
+
+ dstate->tupdesc = tupdesc;
+
+ /* Initialize the descriptor to store the changes ... */
+ dstate->tupdesc_change = CreateTemplateTupleDesc(1);
+
+ TupleDescInitEntry(dstate->tupdesc_change, 1, NULL, BYTEAOID, -1, 0);
+ /* ... as well as the corresponding slot. */
+ dstate->tsslot = MakeSingleTupleTableSlot(dstate->tupdesc_change,
+ &TTSOpsMinimalTuple);
+
+ dstate->resowner = ResourceOwnerCreate(CurrentResourceOwner,
+ "logical decoding");
+
+ ctx->output_writer_private = dstate;
+ return ctx;
+}
+
+/*
+ * Retrieve tuple from ConcurrentChange structure.
+ *
+ * The input data starts with the structure but it might not be appropriately
+ * aligned.
+ */
+static HeapTuple
+get_changed_tuple(char *change)
+{
+ HeapTupleData tup_data;
+ HeapTuple result;
+ char *src;
+
+ /*
+ * Ensure alignment before accessing the fields. (This is why we can't use
+ * heap_copytuple() instead of this function.)
+ */
+ src = change + offsetof(ConcurrentChange, tup_data);
+ memcpy(&tup_data, src, sizeof(HeapTupleData));
+
+ result = (HeapTuple) palloc(HEAPTUPLESIZE + tup_data.t_len);
+ memcpy(result, &tup_data, sizeof(HeapTupleData));
+ result->t_data = (HeapTupleHeader) ((char *) result + HEAPTUPLESIZE);
+ src = change + SizeOfConcurrentChange;
+ memcpy(result->t_data, src, result->t_len);
+
+ return result;
+}
+
+/*
+ * Decode logical changes from the WAL sequence up to end_of_wal.
+ */
+void
+repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+ XLogRecPtr end_of_wal)
+{
+ RepackDecodingState *dstate;
+ ResourceOwner resowner_old;
+
+ /*
+ * Invalidate the "present" cache before moving to "(recent) history".
+ */
+ InvalidateSystemCaches();
+
+ dstate = (RepackDecodingState *) ctx->output_writer_private;
+ resowner_old = CurrentResourceOwner;
+ CurrentResourceOwner = dstate->resowner;
+
+ PG_TRY();
+ {
+ while (ctx->reader->EndRecPtr < end_of_wal)
+ {
+ XLogRecord *record;
+ XLogSegNo segno_new;
+ char *errm = NULL;
+ XLogRecPtr end_lsn;
+
+ record = XLogReadRecord(ctx->reader, &errm);
+ if (errm)
+ elog(ERROR, "%s", errm);
+
+ if (record != NULL)
+ LogicalDecodingProcessRecord(ctx, ctx->reader);
+
+ /*
+ * If WAL segment boundary has been crossed, inform the decoding
+ * system that the catalog_xmin can advance. (We can confirm more
+ * often, but filling a single WAL segment should not take much
+ * time.)
+ */
+ end_lsn = ctx->reader->EndRecPtr;
+ XLByteToSeg(end_lsn, segno_new, wal_segment_size);
+ if (segno_new != repack_current_segment)
+ {
+ LogicalConfirmReceivedLocation(end_lsn);
+ elog(DEBUG1, "REPACK: confirmed receive location %X/%X",
+ (uint32) (end_lsn >> 32), (uint32) end_lsn);
+ repack_current_segment = segno_new;
+ }
+
+ CHECK_FOR_INTERRUPTS();
+ }
+ InvalidateSystemCaches();
+ CurrentResourceOwner = resowner_old;
+ }
+ PG_CATCH();
+ {
+ /* clear all timetravel entries */
+ InvalidateSystemCaches();
+ CurrentResourceOwner = resowner_old;
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+}
+
+/*
+ * Apply changes that happened during the initial load.
+ *
+ * Scan key is passed by caller, so it does not have to be constructed
+ * multiple times. Key entries have all fields initialized, except for
+ * sk_argument.
+ */
+static void
+apply_concurrent_changes(RepackDecodingState *dstate, Relation rel,
+ ScanKey key, int nkeys, IndexInsertState *iistate)
+{
+ TupleTableSlot *index_slot,
+ *ident_slot;
+ HeapTuple tup_old = NULL;
+
+ if (dstate->nchanges == 0)
+ return;
+
+ /* TupleTableSlot is needed to pass the tuple to ExecInsertIndexTuples(). */
+ index_slot = MakeSingleTupleTableSlot(dstate->tupdesc, &TTSOpsHeapTuple);
+
+ /* A slot to fetch tuples from identity index. */
+ ident_slot = table_slot_create(rel, NULL);
+
+ while (tuplestore_gettupleslot(dstate->tstore, true, false,
+ dstate->tsslot))
+ {
+ bool shouldFree;
+ HeapTuple tup_change,
+ tup,
+ tup_exist;
+ char *change_raw,
+ *src;
+ ConcurrentChange change;
+ bool isnull[1];
+ Datum values[1];
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* Get the change from the single-column tuple. */
+ tup_change = ExecFetchSlotHeapTuple(dstate->tsslot, false, &shouldFree);
+ heap_deform_tuple(tup_change, dstate->tupdesc_change, values, isnull);
+ Assert(!isnull[0]);
+
+ /* Make sure we access aligned data. */
+ change_raw = (char *) DatumGetByteaP(values[0]);
+ src = (char *) VARDATA(change_raw);
+ memcpy(&change, src, SizeOfConcurrentChange);
+
+ /* TRUNCATE change contains no tuple, so process it separately. */
+ if (change.kind == CHANGE_TRUNCATE)
+ {
+ /*
+ * All the things that ExecuteTruncateGuts() does (such as firing
+ * triggers or handling the DROP_CASCADE behavior) should have
+ * taken place on the source relation. Thus we only do the actual
+ * truncation of the new relation (and its indexes).
+ */
+ heap_truncate_one_rel(rel);
+
+ pfree(tup_change);
+ continue;
+ }
+
+ /*
+ * Extract the tuple from the change. The tuple is copied here because
+ * it might be assigned to 'tup_old', in which case it needs to
+ * survive into the next iteration.
+ */
+ tup = get_changed_tuple(src);
+
+ if (change.kind == CHANGE_UPDATE_OLD)
+ {
+ Assert(tup_old == NULL);
+ tup_old = tup;
+ }
+ else if (change.kind == CHANGE_INSERT)
+ {
+ Assert(tup_old == NULL);
+
+ apply_concurrent_insert(rel, &change, tup, iistate, index_slot);
+
+ pfree(tup);
+ }
+ else if (change.kind == CHANGE_UPDATE_NEW ||
+ change.kind == CHANGE_DELETE)
+ {
+ IndexScanDesc ind_scan = NULL;
+ HeapTuple tup_key;
+
+ if (change.kind == CHANGE_UPDATE_NEW)
+ {
+ tup_key = tup_old != NULL ? tup_old : tup;
+ }
+ else
+ {
+ Assert(tup_old == NULL);
+ tup_key = tup;
+ }
+
+ /*
+ * Find the tuple to be updated or deleted.
+ */
+ tup_exist = find_target_tuple(rel, key, nkeys, tup_key,
+ iistate, ident_slot, &ind_scan);
+ if (tup_exist == NULL)
+ elog(ERROR, "Failed to find target tuple");
+
+ if (change.kind == CHANGE_UPDATE_NEW)
+ apply_concurrent_update(rel, tup, tup_exist, &change, iistate,
+ index_slot);
+ else
+ apply_concurrent_delete(rel, tup_exist, &change);
+
+ if (tup_old != NULL)
+ {
+ pfree(tup_old);
+ tup_old = NULL;
+ }
+
+ pfree(tup);
+ index_endscan(ind_scan);
+ }
+ else
+ elog(ERROR, "Unrecognized kind of change: %d", change.kind);
+
+ /*
+ * If a change was applied now, increment CID for next writes and
+ * update the snapshot so it sees the changes we've applied so far.
+ */
+ if (change.kind != CHANGE_UPDATE_OLD)
+ {
+ CommandCounterIncrement();
+ UpdateActiveSnapshotCommandId();
+ }
+
+ /* TTSOpsMinimalTuple has .get_heap_tuple==NULL. */
+ Assert(shouldFree);
+ pfree(tup_change);
+ }
+
+ tuplestore_clear(dstate->tstore);
+ dstate->nchanges = 0;
+
+ /* Cleanup. */
+ ExecDropSingleTupleTableSlot(index_slot);
+ ExecDropSingleTupleTableSlot(ident_slot);
+}
+
+static void
+apply_concurrent_insert(Relation rel, ConcurrentChange *change, HeapTuple tup,
+ IndexInsertState *iistate, TupleTableSlot *index_slot)
+{
+ List *recheck;
+
+
+ /*
+ * Like simple_heap_insert(), but make sure that the INSERT is not
+ * logically decoded - see reform_and_rewrite_tuple() for more
+ * information.
+ */
+ heap_insert(rel, tup, GetCurrentCommandId(true), HEAP_INSERT_NO_LOGICAL,
+ NULL);
+
+ /*
+ * Update indexes.
+ *
+ * In case functions in the index need the active snapshot and caller
+ * hasn't set one.
+ */
+ ExecStoreHeapTuple(tup, index_slot, false);
+ recheck = ExecInsertIndexTuples(iistate->rri,
+ index_slot,
+ iistate->estate,
+ false, /* update */
+ false, /* noDupErr */
+ NULL, /* specConflict */
+ NIL, /* arbiterIndexes */
+ false /* onlySummarizing */
+ );
+
+ /*
+ * If recheck is required, it must have been performed on the source
+ * relation by now. (All the logical changes we process here are already
+ * committed.)
+ */
+ list_free(recheck);
+
+ pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_INSERTED, 1);
+}
+
+static void
+apply_concurrent_update(Relation rel, HeapTuple tup, HeapTuple tup_target,
+ ConcurrentChange *change, IndexInsertState *iistate,
+ TupleTableSlot *index_slot)
+{
+ LockTupleMode lockmode;
+ TM_FailureData tmfd;
+ TU_UpdateIndexes update_indexes;
+ TM_Result res;
+ List *recheck;
+
+ /*
+ * Write the new tuple into the new heap. ('tup' gets the TID assigned
+ * here.)
+ *
+ * Do it like in simple_heap_update(), except for 'wal_logical' (and
+ * except for 'wait').
+ */
+ res = heap_update(rel, &tup_target->t_self, tup,
+ GetCurrentCommandId(true),
+ InvalidSnapshot,
+ false, /* no wait - only we are doing changes */
+ &tmfd, &lockmode, &update_indexes,
+ false /* wal_logical */ );
+ if (res != TM_Ok)
+ ereport(ERROR, (errmsg("failed to apply concurrent UPDATE")));
+
+ ExecStoreHeapTuple(tup, index_slot, false);
+
+ if (update_indexes != TU_None)
+ {
+ recheck = ExecInsertIndexTuples(iistate->rri,
+ index_slot,
+ iistate->estate,
+ true, /* update */
+ false, /* noDupErr */
+ NULL, /* specConflict */
+ NIL, /* arbiterIndexes */
+ /* onlySummarizing */
+ update_indexes == TU_Summarizing);
+ list_free(recheck);
+ }
+
+ pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_UPDATED, 1);
+}
+
+static void
+apply_concurrent_delete(Relation rel, HeapTuple tup_target,
+ ConcurrentChange *change)
+{
+ TM_Result res;
+ TM_FailureData tmfd;
+
+ /*
+ * Delete tuple from the new heap.
+ *
+ * Do it like in simple_heap_delete(), except for 'wal_logical' (and
+ * except for 'wait').
+ */
+ res = heap_delete(rel, &tup_target->t_self, GetCurrentCommandId(true),
+ InvalidSnapshot, false,
+ &tmfd,
+ false, /* no wait - only we are doing changes */
+ false /* wal_logical */ );
+
+ if (res != TM_Ok)
+ ereport(ERROR, (errmsg("failed to apply concurrent DELETE")));
+
+ pgstat_progress_incr_param(PROGRESS_REPACK_HEAP_TUPLES_DELETED, 1);
+}
+
+/*
+ * Find the tuple to be updated or deleted.
+ *
+ * 'key' is a pre-initialized scan key, into which the function will put the
+ * key values.
+ *
+ * 'tup_key' is a tuple containing the key values for the scan.
+ *
+ * On exit, '*scan_p' contains the scan descriptor used. The caller must close
+ * it when it no longer needs the tuple returned.
+ */
+static HeapTuple
+find_target_tuple(Relation rel, ScanKey key, int nkeys, HeapTuple tup_key,
+ IndexInsertState *iistate,
+ TupleTableSlot *ident_slot, IndexScanDesc *scan_p)
+{
+ IndexScanDesc scan;
+ Form_pg_index ident_form;
+ int2vector *ident_indkey;
+ HeapTuple result = NULL;
+
+ /* XXX no instrumentation for now */
+ scan = index_beginscan(rel, iistate->ident_index, GetActiveSnapshot(),
+ NULL, nkeys, 0);
+ *scan_p = scan;
+ index_rescan(scan, key, nkeys, NULL, 0);
+
+ /* Info needed to retrieve key values from heap tuple. */
+ ident_form = iistate->ident_index->rd_index;
+ ident_indkey = &ident_form->indkey;
+
+ /* Use the incoming tuple to finalize the scan key. */
+ for (int i = 0; i < scan->numberOfKeys; i++)
+ {
+ ScanKey entry;
+ bool isnull;
+ int16 attno_heap;
+
+ entry = &scan->keyData[i];
+ attno_heap = ident_indkey->values[i];
+ entry->sk_argument = heap_getattr(tup_key,
+ attno_heap,
+ rel->rd_att,
+ &isnull);
+ Assert(!isnull);
+ }
+ if (index_getnext_slot(scan, ForwardScanDirection, ident_slot))
+ {
+ bool shouldFree;
+
+ result = ExecFetchSlotHeapTuple(ident_slot, false, &shouldFree);
+ /* TTSOpsBufferHeapTuple has .get_heap_tuple != NULL. */
+ Assert(!shouldFree);
+ }
+
+ return result;
+}
+
+/*
+ * Decode and apply concurrent changes.
+ *
+ * Pass rel_src iff its reltoastrelid is needed.
+ */
+static void
+process_concurrent_changes(LogicalDecodingContext *ctx, XLogRecPtr end_of_wal,
+ Relation rel_dst, Relation rel_src, ScanKey ident_key,
+ int ident_key_nentries, IndexInsertState *iistate)
+{
+ RepackDecodingState *dstate;
+
+ pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_PHASE_CATCH_UP);
+
+ dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+ repack_decode_concurrent_changes(ctx, end_of_wal);
+
+ if (dstate->nchanges == 0)
+ return;
+
+ PG_TRY();
+ {
+ /*
+ * Make sure that TOAST values can eventually be accessed via the old
+ * relation - see comment in copy_table_data().
+ */
+ if (rel_src)
+ rel_dst->rd_toastoid = rel_src->rd_rel->reltoastrelid;
+
+ apply_concurrent_changes(dstate, rel_dst, ident_key,
+ ident_key_nentries, iistate);
+ }
+ PG_FINALLY();
+ {
+ if (rel_src)
+ rel_dst->rd_toastoid = InvalidOid;
+ }
+ PG_END_TRY();
+}
+
+static IndexInsertState *
+get_index_insert_state(Relation relation, Oid ident_index_id)
+{
+ EState *estate;
+ int i;
+ IndexInsertState *result;
+
+ result = (IndexInsertState *) palloc0(sizeof(IndexInsertState));
+ estate = CreateExecutorState();
+
+ result->rri = (ResultRelInfo *) palloc(sizeof(ResultRelInfo));
+ InitResultRelInfo(result->rri, relation, 0, 0, 0);
+ ExecOpenIndices(result->rri, false);
+
+ /*
+ * Find the relcache entry of the identity index so that we spend no extra
+ * effort to open / close it.
+ */
+ for (i = 0; i < result->rri->ri_NumIndices; i++)
+ {
+ Relation ind_rel;
+
+ ind_rel = result->rri->ri_IndexRelationDescs[i];
+ if (ind_rel->rd_id == ident_index_id)
+ result->ident_index = ind_rel;
+ }
+ if (result->ident_index == NULL)
+ elog(ERROR, "Failed to open identity index");
+
+ /* Only initialize fields needed by ExecInsertIndexTuples(). */
+ result->estate = estate;
+
+ return result;
+}
+
+/*
+ * Build scan key to process logical changes.
+ */
+static ScanKey
+build_identity_key(Oid ident_idx_oid, Relation rel_src, int *nentries)
+{
+ Relation ident_idx_rel;
+ Form_pg_index ident_idx;
+ int n,
+ i;
+ ScanKey result;
+
+ Assert(OidIsValid(ident_idx_oid));
+ ident_idx_rel = index_open(ident_idx_oid, AccessShareLock);
+ ident_idx = ident_idx_rel->rd_index;
+ n = ident_idx->indnatts;
+ result = (ScanKey) palloc(sizeof(ScanKeyData) * n);
+ for (i = 0; i < n; i++)
+ {
+ ScanKey entry;
+ int16 relattno;
+ Form_pg_attribute att;
+ Oid opfamily,
+ opcintype,
+ opno,
+ opcode;
+
+ entry = &result[i];
+ relattno = ident_idx->indkey.values[i];
+ if (relattno >= 1)
+ {
+ TupleDesc desc;
+
+ desc = rel_src->rd_att;
+ att = TupleDescAttr(desc, relattno - 1);
+ }
+ else
+ elog(ERROR, "Unexpected attribute number %d in index", relattno);
+
+ opfamily = ident_idx_rel->rd_opfamily[i];
+ opcintype = ident_idx_rel->rd_opcintype[i];
+ opno = get_opfamily_member(opfamily, opcintype, opcintype,
+ BTEqualStrategyNumber);
+
+ if (!OidIsValid(opno))
+ elog(ERROR, "Failed to find = operator for type %u", opcintype);
+
+ opcode = get_opcode(opno);
+ if (!OidIsValid(opcode))
+ elog(ERROR, "Failed to find = operator for operator %u", opno);
+
+ /* Initialize everything but argument. */
+ ScanKeyInit(entry,
+ i + 1,
+ BTEqualStrategyNumber, opcode,
+ (Datum) NULL);
+ entry->sk_collation = att->attcollation;
+ }
+ index_close(ident_idx_rel, AccessShareLock);
+
+ *nentries = n;
+ return result;
+}
+
+static void
+free_index_insert_state(IndexInsertState *iistate)
+{
+ ExecCloseIndices(iistate->rri);
+ FreeExecutorState(iistate->estate);
+ pfree(iistate->rri);
+ pfree(iistate);
+}
+
+static void
+cleanup_logical_decoding(LogicalDecodingContext *ctx)
+{
+ RepackDecodingState *dstate;
+
+ dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+ ExecDropSingleTupleTableSlot(dstate->tsslot);
+ FreeTupleDesc(dstate->tupdesc_change);
+ FreeTupleDesc(dstate->tupdesc);
+ tuplestore_end(dstate->tstore);
+
+ FreeDecodingContext(ctx);
+}
+
+/*
+ * The final steps of rebuild_relation() for concurrent processing.
+ *
+ * On entry, NewHeap is locked in AccessExclusiveLock mode. OldHeap and its
+ * clustering index (if one is passed) are still locked in a mode that allows
+ * concurrent data changes. On exit, both tables and their indexes are closed,
+ * but locked in AccessExclusiveLock mode.
+ */
+static void
+rebuild_relation_finish_concurrent(Relation NewHeap, Relation OldHeap,
+ Relation cl_index,
+ LogicalDecodingContext *ctx,
+ bool swap_toast_by_content,
+ TransactionId frozenXid,
+ MultiXactId cutoffMulti)
+{
+ LOCKMODE lockmode_old PG_USED_FOR_ASSERTS_ONLY;
+ List *ind_oids_new;
+ Oid old_table_oid = RelationGetRelid(OldHeap);
+ Oid new_table_oid = RelationGetRelid(NewHeap);
+ List *ind_oids_old = RelationGetIndexList(OldHeap);
+ ListCell *lc,
+ *lc2;
+ char relpersistence;
+ bool is_system_catalog;
+ Oid ident_idx_old,
+ ident_idx_new;
+ IndexInsertState *iistate;
+ ScanKey ident_key;
+ int ident_key_nentries;
+ XLogRecPtr wal_insert_ptr,
+ end_of_wal;
+ char dummy_rec_data = '\0';
+ Relation *ind_refs,
+ *ind_refs_p;
+ int nind;
+
+ /* Like in cluster_rel(). */
+ lockmode_old = ShareUpdateExclusiveLock;
+ Assert(CheckRelationLockedByMe(OldHeap, lockmode_old, false));
+ Assert(cl_index == NULL ||
+ CheckRelationLockedByMe(cl_index, lockmode_old, false));
+ /* This is expected from the caller. */
+ Assert(CheckRelationLockedByMe(NewHeap, AccessExclusiveLock, false));
+
+ ident_idx_old = RelationGetReplicaIndex(OldHeap);
+
+ /*
+ * Unlike the exclusive case, we build new indexes for the new relation
+ * rather than swapping the storage and reindexing the old relation. The
+ * point is that the index build can take some time, so we do it before we
+ * get AccessExclusiveLock on the old heap and therefore we cannot swap
+ * the heap storage yet.
+ *
+ * index_create() will lock the new indexes using AccessExclusiveLock - no
+ * need to change that.
+ *
+ * We assume that ShareUpdateExclusiveLock on the table prevents anyone
+ * from dropping the existing indexes or adding new ones, so the lists of
+ * old and new indexes should match at the swap time. On the other hand we
+ * do not block ALTER INDEX commands that do not require table lock (e.g.
+ * ALTER INDEX ... SET ...).
+ *
+ * XXX Should we check at the end of our work if another transaction
+ * executed such a command and issue a NOTICE that we might have discarded
+ * its effects? (For example, someone changes storage parameter after we
+ * have created the new index, the new value of that parameter is lost.)
+ * Alternatively, we can lock all the indexes now in a mode that blocks
+ * all the ALTER INDEX commands (ShareUpdateExclusiveLock ?), and keep
+ * them locked till the end of the transaction. That might increase the
+ * risk of deadlock during the lock upgrade below, however SELECT / DML
+ * queries should not be involved in such a deadlock.
+ */
+ ind_oids_new = build_new_indexes(NewHeap, OldHeap, ind_oids_old);
+
+ /*
+ * Processing shouldn't start without a valid identity index.
+ */
+ Assert(OidIsValid(ident_idx_old));
+
+ /* Find "identity index" on the new relation. */
+ ident_idx_new = InvalidOid;
+ forboth(lc, ind_oids_old, lc2, ind_oids_new)
+ {
+ Oid ind_old = lfirst_oid(lc);
+ Oid ind_new = lfirst_oid(lc2);
+
+ if (ident_idx_old == ind_old)
+ {
+ ident_idx_new = ind_new;
+ break;
+ }
+ }
+ if (!OidIsValid(ident_idx_new))
+
+ /*
+ * Should not happen, given our lock on the old relation.
+ */
+ ereport(ERROR,
+ (errmsg("Identity index missing on the new relation")));
+
+ /* Executor state to update indexes. */
+ iistate = get_index_insert_state(NewHeap, ident_idx_new);
+
+ /*
+ * Build scan key that we'll use to look for rows to be updated / deleted
+ * during logical decoding.
+ */
+ ident_key = build_identity_key(ident_idx_new, OldHeap, &ident_key_nentries);
+
+ /*
+ * During testing, wait for another backend to perform concurrent data
+ * changes which we will process below.
+ */
+ INJECTION_POINT("repack-concurrently-before-lock", NULL);
+
+ /*
+ * Flush all WAL records inserted so far (possibly except for the last
+ * incomplete page, see GetInsertRecPtr), to minimize the amount of data
+ * we need to flush while holding exclusive lock on the source table.
+ */
+ wal_insert_ptr = GetInsertRecPtr();
+ XLogFlush(wal_insert_ptr);
+ end_of_wal = GetFlushRecPtr(NULL);
+
+ /*
+ * Apply concurrent changes first time, to minimize the time we need to
+ * hold AccessExclusiveLock. (Quite some amount of WAL could have been
+ * written during the data copying and index creation.)
+ */
+ process_concurrent_changes(ctx, end_of_wal, NewHeap,
+ swap_toast_by_content ? OldHeap : NULL,
+ ident_key, ident_key_nentries, iistate);
+
+ /*
+ * Acquire AccessExclusiveLock on the table, its TOAST relation (if there
+ * is one), all its indexes, so that we can swap the files.
+ *
+ * Before that, unlock the index temporarily to avoid deadlock in case
+ * another transaction is trying to lock it while holding the lock on the
+ * table.
+ */
+ if (cl_index)
+ {
+ index_close(cl_index, ShareUpdateExclusiveLock);
+ cl_index = NULL;
+ }
+ /* For the same reason, unlock TOAST relation. */
+ if (OldHeap->rd_rel->reltoastrelid)
+ LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+ /* Finally lock the table */
+ LockRelationOid(old_table_oid, AccessExclusiveLock);
+
+ /*
+ * Lock all indexes now, not only the clustering one: all indexes need to
+ * have their files swapped. While doing that, store their relation
+ * references in an array, to handle predicate locks below.
+ */
+ ind_refs_p = ind_refs = palloc_array(Relation, list_length(ind_oids_old));
+ nind = 0;
+ foreach(lc, ind_oids_old)
+ {
+ Oid ind_oid;
+ Relation index;
+
+ ind_oid = lfirst_oid(lc);
+ index = index_open(ind_oid, AccessExclusiveLock);
+
+ /*
+ * TODO 1) Do we need to check if ALTER INDEX was executed since the
+ * new index was created in build_new_indexes()? 2) Specifically for
+ * the clustering index, should check_index_is_clusterable() be called
+ * here? (Not sure about the latter: ShareUpdateExclusiveLock on the
+ * table probably blocks all commands that affect the result of
+ * check_index_is_clusterable().)
+ */
+ *ind_refs_p = index;
+ ind_refs_p++;
+ nind++;
+ }
+
+ /*
+ * In addition, lock the OldHeap's TOAST relation exclusively - again, the
+ * lock is needed to swap the files.
+ */
+ if (OidIsValid(OldHeap->rd_rel->reltoastrelid))
+ LockRelationOid(OldHeap->rd_rel->reltoastrelid, AccessExclusiveLock);
+
+ /*
+ * Tuples and pages of the old heap will be gone, but the heap will stay.
+ */
+ TransferPredicateLocksToHeapRelation(OldHeap);
+ /* The same for indexes. */
+ for (int i = 0; i < nind; i++)
+ {
+ Relation index = ind_refs[i];
+
+ TransferPredicateLocksToHeapRelation(index);
+
+ /*
+ * References to indexes on the old relation are not needed anymore,
+ * however locks stay till the end of the transaction.
+ */
+ index_close(index, NoLock);
+ }
+ pfree(ind_refs);
+
+ /*
+ * Flush anything we see in WAL, to make sure that all changes committed
+ * while we were waiting for the exclusive lock are available for
+ * decoding. This should not be necessary if all backends had
+ * synchronous_commit set, but we can't rely on this setting.
+ *
+ * Unfortunately, GetInsertRecPtr() may lag behind the actual insert
+ * position, and GetLastImportantRecPtr() points at the start of the last
+ * record rather than at the end. Thus the simplest way to determine the
+ * insert position is to insert a dummy record and use its LSN.
+ *
+ * XXX Consider using GetLastImportantRecPtr() and adding the size of the
+ * last record (plus the total size of all the page headers the record
+ * spans)?
+ */
+ XLogBeginInsert();
+ XLogRegisterData(&dummy_rec_data, 1);
+ wal_insert_ptr = XLogInsert(RM_XLOG_ID, XLOG_NOOP);
+ XLogFlush(wal_insert_ptr);
+ end_of_wal = GetFlushRecPtr(NULL);
+
+ /* Apply the concurrent changes again. */
+ process_concurrent_changes(ctx, end_of_wal, NewHeap,
+ swap_toast_by_content ? OldHeap : NULL,
+ ident_key, ident_key_nentries, iistate);
+
+ /* Remember info about rel before closing OldHeap */
+ relpersistence = OldHeap->rd_rel->relpersistence;
+ is_system_catalog = IsSystemRelation(OldHeap);
+
+ pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
+
+ /*
+ * Even ShareUpdateExclusiveLock should have prevented others from
+ * creating / dropping indexes (even using the CONCURRENTLY option), so we
+ * do not need to check whether the lists match.
+ */
+ forboth(lc, ind_oids_old, lc2, ind_oids_new)
+ {
+ Oid ind_old = lfirst_oid(lc);
+ Oid ind_new = lfirst_oid(lc2);
+ Oid mapped_tables[4];
+
+ /* Zero out possible results from swap_relation_files() */
+ memset(mapped_tables, 0, sizeof(mapped_tables));
+
+ swap_relation_files(ind_old, ind_new,
+ (old_table_oid == RelationRelationId),
+ swap_toast_by_content,
+ true,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+
+#ifdef USE_ASSERT_CHECKING
+
+ /*
+ * Concurrent processing is not supported for system relations, so
+ * there should be no mapped tables.
+ */
+ for (int i = 0; i < 4; i++)
+ Assert(mapped_tables[i] == 0);
+#endif
+ }
+
+ /* The new indexes must be visible for deletion. */
+ CommandCounterIncrement();
+
+ /* Close the old heap but keep lock until transaction commit. */
+ table_close(OldHeap, NoLock);
+ /* Close the new heap. (We didn't have to open its indexes). */
+ table_close(NewHeap, NoLock);
+
+ /* Cleanup what we don't need anymore. (And close the identity index.) */
+ pfree(ident_key);
+ free_index_insert_state(iistate);
+
+ /*
+ * Swap the relations and their TOAST relations and TOAST indexes. This
+ * also drops the new relation and its indexes.
+ *
+ * (System catalogs are currently not supported.)
+ */
+ Assert(!is_system_catalog);
+ finish_heap_swap(old_table_oid, new_table_oid,
+ is_system_catalog,
+ swap_toast_by_content,
+ false, true, false,
+ frozenXid, cutoffMulti,
+ relpersistence);
+}
+
+/*
+ * Build indexes on NewHeap according to those on OldHeap.
+ *
+ * OldIndexes is the list of index OIDs on OldHeap.
+ *
+ * A list of OIDs of the corresponding indexes created on NewHeap is
+ * returned. The order of items matches, so we can use these lists to swap
+ * index storage.
+ */
+static List *
+build_new_indexes(Relation NewHeap, Relation OldHeap, List *OldIndexes)
+{
+ ListCell *lc;
+ List *result = NIL;
+
+ pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_PHASE_REBUILD_INDEX);
+
+ foreach(lc, OldIndexes)
+ {
+ Oid ind_oid,
+ ind_oid_new;
+ char *newName;
+ Relation ind;
+
+ ind_oid = lfirst_oid(lc);
+ ind = index_open(ind_oid, AccessShareLock);
+
+ newName = ChooseRelationName(get_rel_name(ind_oid),
+ NULL,
+ "repacknew",
+ get_rel_namespace(ind->rd_index->indrelid),
+ false);
+ ind_oid_new = index_create_copy(NewHeap, ind_oid,
+ ind->rd_rel->reltablespace, newName,
+ false);
+ result = lappend_oid(result, ind_oid_new);
+
+ index_close(ind, AccessShareLock);
+ }
+
+ return result;
+}
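
To illustrate the catch-up sequence in the cluster.c hunk above (replay the
decoded changes without the exclusive lock, acquire AccessExclusiveLock, then
replay once more), here is a minimal standalone sketch in C. It is not part of
the patch: ToyWal, toy_replay(), toy_write() and toy_finish() are hypothetical
names, and the WAL is modelled as a plain counter of changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the WAL stream and the lock state. */
typedef struct ToyWal
{
    size_t written;     /* changes written by other backends so far */
    size_t applied;     /* changes already replayed into the new heap */
    bool   locked;      /* true once the exclusive lock is held */
} ToyWal;

/* Replay all changes up to 'upto'; return how many were applied. */
static size_t
toy_replay(ToyWal *wal, size_t upto)
{
    size_t n;

    assert(upto <= wal->written && upto >= wal->applied);
    n = upto - wal->applied;
    wal->applied = upto;
    return n;
}

/* Other backends can write only while the exclusive lock is not held. */
static bool
toy_write(ToyWal *wal, size_t nchanges)
{
    if (wal->locked)
        return false;
    wal->written += nchanges;
    return true;
}

/*
 * Catch up without the lock, take the lock, then catch up again.
 * 'late' models changes that arrive while we wait for the lock.
 * Returns the number of changes replayed while the lock was held.
 */
static size_t
toy_finish(ToyWal *wal, size_t late)
{
    (void) toy_replay(wal, wal->written);   /* first pass, lock not held */
    (void) toy_write(wal, late);            /* arrives before the lock is granted */
    wal->locked = true;                     /* exclusive lock acquired */
    return toy_replay(wal, wal->written);   /* second pass, under the lock */
}
```

In this model, if 1000 changes exist before the lock request and 3 more arrive
while waiting, only those 3 are replayed under the lock. That is the purpose
of the first pass.
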
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 188e26f0e6e..71b73c21ebf 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -904,7 +904,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
static void
refresh_by_heap_swap(Oid matviewOid, Oid OIDNewHeap, char relpersistence)
{
- finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true,
+ finish_heap_swap(matviewOid, OIDNewHeap, false, false, true, true, true,
RecentXmin, ReadNextMultiXactId(), relpersistence);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 082a3575d62..c79f5b1dc0f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5989,6 +5989,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode,
finish_heap_swap(tab->relid, OIDNewHeap,
false, false, true,
!OidIsValid(tab->newTableSpace),
+ true,
RecentXmin,
ReadNextMultiXactId(),
persistence);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 8863ad0e8bd..6de9d0ba39d 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -125,7 +125,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
TransactionId lastSaneFrozenXid,
MultiXactId lastSaneMinMulti);
static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
- BufferAccessStrategy bstrategy);
+ BufferAccessStrategy bstrategy, bool isTopLevel);
static double compute_parallel_delay(void);
static VacOptValue get_vacoptval_from_boolean(DefElem *def);
static bool vac_tid_reaped(ItemPointer itemptr, void *state);
@@ -633,7 +633,8 @@ vacuum(List *relations, const VacuumParams params, BufferAccessStrategy bstrateg
if (params.options & VACOPT_VACUUM)
{
- if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy))
+ if (!vacuum_rel(vrel->oid, vrel->relation, params, bstrategy,
+ isTopLevel))
continue;
}
@@ -1997,7 +1998,7 @@ vac_truncate_clog(TransactionId frozenXID,
*/
static bool
vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
- BufferAccessStrategy bstrategy)
+ BufferAccessStrategy bstrategy, bool isTopLevel)
{
LOCKMODE lmode;
Relation rel;
@@ -2288,7 +2289,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
- &cluster_params);
+ &cluster_params, isTopLevel);
/* cluster_rel closes the relation, but keeps lock */
rel = NULL;
@@ -2331,7 +2332,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
toast_vacuum_params.options |= VACOPT_PROCESS_MAIN;
toast_vacuum_params.toast_parent = relid;
- vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy);
+ vacuum_rel(toast_relid, NULL, toast_vacuum_params, bstrategy,
+ isTopLevel);
}
/*
diff --git a/src/backend/meson.build b/src/backend/meson.build
index b831a541652..5c148131217 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
subdir('jit/llvm')
subdir('replication/libpqwalreceiver')
subdir('replication/pgoutput')
+subdir('replication/pgoutput_repack')
subdir('snowball')
subdir('utils/mb/conversion_procs')
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index cc03f0706e9..5dc4ae58ffe 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -33,6 +33,7 @@
#include "access/xlogreader.h"
#include "access/xlogrecord.h"
#include "catalog/pg_control.h"
+#include "commands/cluster.h"
#include "replication/decode.h"
#include "replication/logical.h"
#include "replication/message.h"
@@ -472,6 +473,88 @@ heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
TransactionId xid = XLogRecGetXid(buf->record);
SnapBuild *builder = ctx->snapshot_builder;
+ /*
+ * If the change is not intended for logical decoding, do not even
+ * establish transaction for it - REPACK CONCURRENTLY is the typical use
+ * case.
+ *
+ * First, check if REPACK CONCURRENTLY is being performed by this backend.
+ * If so, only decode data changes of the table that it is processing, and
+ * the changes of its TOAST relation.
+ *
+ * (TOAST locator should not be set unless the main is.)
+ */
+ Assert(!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+ OidIsValid(repacked_rel_locator.relNumber));
+
+ if (OidIsValid(repacked_rel_locator.relNumber))
+ {
+ XLogReaderState *r = buf->record;
+ RelFileLocator locator;
+
+ /* Not all records contain the block. */
+ if (XLogRecGetBlockTagExtended(r, 0, &locator, NULL, NULL, NULL) &&
+ !RelFileLocatorEquals(locator, repacked_rel_locator) &&
+ (!OidIsValid(repacked_rel_toast_locator.relNumber) ||
+ !RelFileLocatorEquals(locator, repacked_rel_toast_locator)))
+ return;
+ }
+
+ /*
+ * Second, skip records which do not contain sufficient information for
+ * the decoding.
+ *
+ * The problem we solve here is that REPACK CONCURRENTLY generates WAL
+ * when doing changes in the new table. Those changes should not be useful
+ * for any other user (such as logical replication subscription) because
+ * the new table will eventually be dropped (after REPACK CONCURRENTLY has
+ * assigned its file to the "old table").
+ */
+ switch (info)
+ {
+ case XLOG_HEAP_INSERT:
+ {
+ xl_heap_insert *rec;
+
+ rec = (xl_heap_insert *) XLogRecGetData(buf->record);
+
+ /*
+ * This does happen when 1) raw_heap_insert marks the TOAST
+ * record as HEAP_INSERT_NO_LOGICAL, 2) REPACK CONCURRENTLY
+ * replays inserts performed by other backends.
+ */
+ if ((rec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE) == 0)
+ return;
+
+ break;
+ }
+
+ case XLOG_HEAP_HOT_UPDATE:
+ case XLOG_HEAP_UPDATE:
+ {
+ xl_heap_update *rec;
+
+ rec = (xl_heap_update *) XLogRecGetData(buf->record);
+ if ((rec->flags &
+ (XLH_UPDATE_CONTAINS_NEW_TUPLE |
+ XLH_UPDATE_CONTAINS_OLD_TUPLE |
+ XLH_UPDATE_CONTAINS_OLD_KEY)) == 0)
+ return;
+
+ break;
+ }
+
+ case XLOG_HEAP_DELETE:
+ {
+ xl_heap_delete *rec;
+
+ rec = (xl_heap_delete *) XLogRecGetData(buf->record);
+ if (rec->flags & XLH_DELETE_NO_LOGICAL)
+ return;
+ break;
+ }
+ }
+
ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
/*
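
The locator check added to heap_decode() above can be read as a small
predicate: a record is skipped only if it carries a block reference and that
block belongs neither to the repacked table nor to its TOAST relation. A
standalone sketch in C, assuming a simplified three-field locator; ToyLocator,
toy_locator_equals() and toy_skip_record() are hypothetical names, not
PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for RelFileLocator. */
typedef struct ToyLocator
{
    unsigned spc;
    unsigned db;
    unsigned rel;       /* 0 means "not set" */
} ToyLocator;

static bool
toy_locator_equals(ToyLocator a, ToyLocator b)
{
    return a.spc == b.spc && a.db == b.db && a.rel == b.rel;
}

/* Return true if the record should be ignored by the decoder. */
static bool
toy_skip_record(bool has_block, ToyLocator rec,
                ToyLocator main_rel, ToyLocator toast_rel)
{
    bool toast_set = (toast_rel.rel != 0);

    /* The TOAST locator should not be set unless the main one is. */
    assert(!toast_set || main_rel.rel != 0);

    /* No repack running in this backend: nothing is filtered here. */
    if (main_rel.rel == 0)
        return false;

    /* Records without a block reference are never skipped by this check. */
    return has_block &&
        !toy_locator_equals(rec, main_rel) &&
        (!toast_set || !toy_locator_equals(rec, toast_rel));
}
```

Note that a record with no block reference is passed on to the later checks
rather than skipped, which matches the "Not all records contain the block"
comment in the hunk.
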
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a2f1803622c..d69229905a2 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -486,6 +486,27 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
return SnapBuildMVCCFromHistoric(snap, true);
}
+/*
+ * Build an MVCC snapshot for the initial data load performed by REPACK
+ * CONCURRENTLY command.
+ *
+ * The snapshot will only be used to scan one particular relation, which is
+ * treated like a catalog (therefore ->building_full_snapshot is not
+ * important), and the caller should already have a replication slot setup (so
+ * we do not set MyProc->xmin). XXX Do we still need to add some restrictions?
+ */
+Snapshot
+SnapBuildInitialSnapshotForRepack(SnapBuild *builder)
+{
+ Snapshot snap;
+
+ Assert(builder->state == SNAPBUILD_CONSISTENT);
+ Assert(builder->building_full_snapshot);
+
+ snap = SnapBuildBuildSnapshot(builder);
+ return SnapBuildMVCCFromHistoric(snap, false);
+}
+
/*
* Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
*
diff --git a/src/backend/replication/pgoutput_repack/Makefile b/src/backend/replication/pgoutput_repack/Makefile
new file mode 100644
index 00000000000..4efeb713b70
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/Makefile
@@ -0,0 +1,32 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Makefile for src/backend/replication/pgoutput_repack
+#
+# IDENTIFICATION
+# src/backend/replication/pgoutput_repack
+#
+#-------------------------------------------------------------------------
+
+subdir = src/backend/replication/pgoutput_repack
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+
+OBJS = \
+ $(WIN32RES) \
+ pgoutput_repack.o
+PGFILEDESC = "pgoutput_repack - logical replication output plugin for REPACK command"
+NAME = pgoutput_repack
+
+all: all-shared-lib
+
+include $(top_srcdir)/src/Makefile.shlib
+
+install: all installdirs install-lib
+
+installdirs: installdirs-lib
+
+uninstall: uninstall-lib
+
+clean distclean: clean-lib
+ rm -f $(OBJS)
diff --git a/src/backend/replication/pgoutput_repack/meson.build b/src/backend/replication/pgoutput_repack/meson.build
new file mode 100644
index 00000000000..133e865a4a0
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/meson.build
@@ -0,0 +1,18 @@
+# Copyright (c) 2022-2024, PostgreSQL Global Development Group
+
+pgoutput_repack_sources = files(
+ 'pgoutput_repack.c',
+)
+
+if host_system == 'windows'
+ pgoutput_repack_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+ '--NAME', 'pgoutput_repack',
+ '--FILEDESC', 'pgoutput_repack - logical replication output plugin for REPACK command',])
+endif
+
+pgoutput_repack = shared_module('pgoutput_repack',
+ pgoutput_repack_sources,
+ kwargs: pg_mod_args,
+)
+
+backend_targets += pgoutput_repack
diff --git a/src/backend/replication/pgoutput_repack/pgoutput_repack.c b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
new file mode 100644
index 00000000000..687fbbc59bb
--- /dev/null
+++ b/src/backend/replication/pgoutput_repack/pgoutput_repack.c
@@ -0,0 +1,288 @@
+/*-------------------------------------------------------------------------
+ *
+ * pgoutput_repack.c
+ * Logical Replication output plugin for REPACK command
+ *
+ * Copyright (c) 2012-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/backend/replication/pgoutput_repack/pgoutput_repack.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heaptoast.h"
+#include "commands/cluster.h"
+#include "replication/snapbuild.h"
+
+PG_MODULE_MAGIC;
+
+static void plugin_startup(LogicalDecodingContext *ctx,
+ OutputPluginOptions *opt, bool is_init);
+static void plugin_shutdown(LogicalDecodingContext *ctx);
+static void plugin_begin_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn);
+static void plugin_commit_txn(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, XLogRecPtr commit_lsn);
+static void plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ Relation rel, ReorderBufferChange *change);
+static void plugin_truncate(struct LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn, int nrelations,
+ Relation relations[],
+ ReorderBufferChange *change);
+static void store_change(LogicalDecodingContext *ctx,
+ ConcurrentChangeKind kind, HeapTuple tuple);
+
+void
+_PG_output_plugin_init(OutputPluginCallbacks *cb)
+{
+ AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);
+
+ cb->startup_cb = plugin_startup;
+ cb->begin_cb = plugin_begin_txn;
+ cb->change_cb = plugin_change;
+ cb->truncate_cb = plugin_truncate;
+ cb->commit_cb = plugin_commit_txn;
+ cb->shutdown_cb = plugin_shutdown;
+}
+
+
+/* initialize this plugin */
+static void
+plugin_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
+ bool is_init)
+{
+ ctx->output_plugin_private = NULL;
+
+ /* Probably unnecessary, as we don't use the SQL interface ... */
+ opt->output_type = OUTPUT_PLUGIN_BINARY_OUTPUT;
+
+ if (ctx->output_plugin_options != NIL)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("this plugin does not expect any options")));
+ }
+}
+
+static void
+plugin_shutdown(LogicalDecodingContext *ctx)
+{
+}
+
+/*
+ * As we don't release the slot during processing of particular table, there's
+ * no room for SQL interface, even for debugging purposes. Therefore we need
+ * neither OutputPluginPrepareWrite() nor OutputPluginWrite() in the plugin
+ * callbacks. (Although we might want to write custom callbacks, this API
+ * seems to be unnecessarily generic for our purposes.)
+ */
+
+/* BEGIN callback */
+static void
+plugin_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
+{
+}
+
+/* COMMIT callback */
+static void
+plugin_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ XLogRecPtr commit_lsn)
+{
+}
+
+/*
+ * Callback for individual changed tuples
+ */
+static void
+plugin_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ Relation relation, ReorderBufferChange *change)
+{
+ RepackDecodingState *dstate;
+
+ dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+ /* Only interested in one particular relation. */
+ if (relation->rd_id != dstate->relid)
+ return;
+
+ /* Decode entry depending on its type */
+ switch (change->action)
+ {
+ case REORDER_BUFFER_CHANGE_INSERT:
+ {
+ HeapTuple newtuple;
+
+ newtuple = change->data.tp.newtuple != NULL ?
+ change->data.tp.newtuple : NULL;
+
+ /*
+ * Identity checks in the main function should have made this
+ * impossible.
+ */
+ if (newtuple == NULL)
+ elog(ERROR, "Incomplete insert info.");
+
+ store_change(ctx, CHANGE_INSERT, newtuple);
+ }
+ break;
+ case REORDER_BUFFER_CHANGE_UPDATE:
+ {
+ HeapTuple oldtuple,
+ newtuple;
+
+ oldtuple = change->data.tp.oldtuple != NULL ?
+ change->data.tp.oldtuple : NULL;
+ newtuple = change->data.tp.newtuple != NULL ?
+ change->data.tp.newtuple : NULL;
+
+ if (newtuple == NULL)
+ elog(ERROR, "Incomplete update info.");
+
+ if (oldtuple != NULL)
+ store_change(ctx, CHANGE_UPDATE_OLD, oldtuple);
+
+ store_change(ctx, CHANGE_UPDATE_NEW, newtuple);
+ }
+ break;
+ case REORDER_BUFFER_CHANGE_DELETE:
+ {
+ HeapTuple oldtuple;
+
+ oldtuple = change->data.tp.oldtuple ?
+ change->data.tp.oldtuple : NULL;
+
+ if (oldtuple == NULL)
+ elog(ERROR, "Incomplete delete info.");
+
+ store_change(ctx, CHANGE_DELETE, oldtuple);
+ }
+ break;
+ default:
+ /* Should not come here */
+ Assert(false);
+ break;
+ }
+}
+
+static void
+plugin_truncate(struct LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
+ int nrelations, Relation relations[],
+ ReorderBufferChange *change)
+{
+ RepackDecodingState *dstate;
+ int i;
+ Relation relation = NULL;
+
+ dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+ /* Find the relation we are processing. */
+ for (i = 0; i < nrelations; i++)
+ {
+ relation = relations[i];
+
+ if (RelationGetRelid(relation) == dstate->relid)
+ break;
+ }
+
+ /* Is this truncation of another relation? */
+ if (i == nrelations)
+ return;
+
+ store_change(ctx, CHANGE_TRUNCATE, NULL);
+}
+
+/* Store concurrent data change. */
+static void
+store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
+ HeapTuple tuple)
+{
+ RepackDecodingState *dstate;
+ char *change_raw;
+ ConcurrentChange change;
+ bool flattened = false;
+ Size size;
+ Datum values[1];
+ bool isnull[1];
+ char *dst,
+ *dst_start;
+
+ dstate = (RepackDecodingState *) ctx->output_writer_private;
+
+ size = MAXALIGN(VARHDRSZ) + SizeOfConcurrentChange;
+
+ if (tuple)
+ {
+ /*
+ * ReorderBufferCommit() stores the TOAST chunks in its private memory
+ * context and frees them after having called apply_change().
+ * Therefore we need flat copy (including TOAST) that we eventually
+ * copy into the memory context which is available to
+ * decode_concurrent_changes().
+ */
+ if (HeapTupleHasExternal(tuple))
+ {
+ /*
+ * toast_flatten_tuple_to_datum() might be more convenient but we
+ * don't want the decompression it does.
+ */
+ tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
+ flattened = true;
+ }
+
+ size += tuple->t_len;
+ }
+
+ /* XXX Isn't there any function / macro to do this? */
+ if (size >= 0x3FFFFFFF)
+ elog(ERROR, "Change is too big.");
+
+ /* Construct the change. */
+ change_raw = (char *) palloc0(size);
+ SET_VARSIZE(change_raw, size);
+
+ /*
+ * Since the varlena alignment might not be sufficient for the structure,
+ * set the fields in a local instance and remember where it should
+ * eventually be copied.
+ */
+ change.kind = kind;
+ dst_start = (char *) VARDATA(change_raw);
+
+ /* No other information is needed for TRUNCATE. */
+ if (change.kind == CHANGE_TRUNCATE)
+ {
+ memcpy(dst_start, &change, SizeOfConcurrentChange);
+ goto store;
+ }
+
+ /*
+ * Copy the tuple.
+ *
+ * CAUTION: change->tup_data.t_data must be fixed on retrieval!
+ */
+ memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
+ dst = dst_start + SizeOfConcurrentChange;
+ memcpy(dst, tuple->t_data, tuple->t_len);
+
+ /* The data has been copied. */
+ if (flattened)
+ pfree(tuple);
+
+store:
+ /* Copy the structure so it can be stored. */
+ memcpy(dst_start, &change, SizeOfConcurrentChange);
+
+ /* Store as tuple of 1 bytea column. */
+ values[0] = PointerGetDatum(change_raw);
+ isnull[0] = false;
+ tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
+ values, isnull);
+
+ /* Accounting. */
+ dstate->nchanges++;
+
+ /* Cleanup. */
+ pfree(change_raw);
+}
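
The way store_change() above serializes a change (length header, then the
change structure, then the tuple body, with the structure copied by memcpy()
because the destination may be misaligned) can be sketched outside PostgreSQL
as follows. This is a simplified illustration only: ToyTuple, ToyChange,
toy_pack() and toy_unpack() are hypothetical, a 4-byte size prefix stands in
for the varlena header, and malloc-family calls replace palloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for HeapTupleData and ConcurrentChange. */
typedef struct ToyTuple
{
    size_t  len;        /* payload length in bytes */
    char   *data;       /* payload; the pointer is meaningless once packed */
} ToyTuple;

typedef struct ToyChange
{
    int      kind;
    ToyTuple tup;
} ToyChange;

/*
 * Buffer layout: 4-byte total size, ToyChange, payload bytes. The ToyChange
 * starts at offset 4, which may be misaligned for the struct, so it is
 * accessed only through memcpy(), never through a pointer cast.
 */
static char *
toy_pack(int kind, const ToyTuple *tuple, size_t *size_out)
{
    ToyChange change;
    size_t    size = sizeof(uint32_t) + sizeof(ToyChange);
    uint32_t  size32;
    char     *buf;

    if (tuple != NULL)
        size += tuple->len;

    buf = calloc(1, size);
    if (buf == NULL)
        return NULL;

    size32 = (uint32_t) size;
    memcpy(buf, &size32, sizeof(size32));

    /* Fill a properly aligned local instance first. */
    memset(&change, 0, sizeof(change));
    change.kind = kind;
    if (tuple != NULL)
    {
        change.tup = *tuple;    /* the data pointer is stale in the copy */
        memcpy(buf + sizeof(uint32_t) + sizeof(ToyChange),
               tuple->data, tuple->len);
    }
    memcpy(buf + sizeof(uint32_t), &change, sizeof(ToyChange));

    *size_out = size;
    return buf;
}

/* Copy the header into an aligned struct and repair the payload pointer. */
static void
toy_unpack(char *buf, ToyChange *change)
{
    assert(buf != NULL && change != NULL);
    memcpy(change, buf + sizeof(uint32_t), sizeof(ToyChange));
    change->tup.data = buf + sizeof(uint32_t) + sizeof(ToyChange);
}
```

The pointer repair in toy_unpack() corresponds to the "t_data must be fixed on
retrieval" caution in store_change(): the pointer stored in the packed header
refers to memory that may no longer exist when the change is read back.
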
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2fa045e6b0f..e9ddf39500c 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -25,6 +25,7 @@
#include "access/xlogprefetcher.h"
#include "access/xlogrecovery.h"
#include "commands/async.h"
+#include "commands/cluster.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/autovacuum.h"
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index cd3e43c448a..519f3953638 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -162,7 +162,7 @@ while (<$lwlocklist>)
die
"$wait_event_lwlocks[$lwlock_count] defined in wait_event_names.txt but "
- . " missing from lwlocklist.h"
+ . "missing from lwlocklist.h"
if $lwlock_count < scalar @wait_event_lwlocks;
die
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6fe268a8eec..d27a4c30548 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -64,6 +64,7 @@
#include "catalog/pg_type.h"
#include "catalog/schemapg.h"
#include "catalog/storage.h"
+#include "commands/cluster.h"
#include "commands/policy.h"
#include "commands/publicationcmds.h"
#include "commands/trigger.h"
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index bc7840052fe..6d46537cbe8 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -213,7 +213,6 @@ static List *exportedSnapshots = NIL;
/* Prototypes for local functions */
static void UnregisterSnapshotNoOwner(Snapshot snapshot);
-static void FreeSnapshot(Snapshot snapshot);
static void SnapshotResetXmin(void);
/* ResourceOwner callbacks to track snapshot references */
@@ -657,7 +656,7 @@ CopySnapshot(Snapshot snapshot)
* FreeSnapshot
* Free the memory associated with a snapshot.
*/
-static void
+void
FreeSnapshot(Snapshot snapshot)
{
Assert(snapshot->regd_count == 0);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 59ff6e0923b..528fb08154a 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -4998,18 +4998,27 @@ match_previous_words(int pattern_id,
}
/* REPACK */
- else if (Matches("REPACK"))
+ else if (Matches("REPACK") || Matches("REPACK", "(*)"))
+ COMPLETE_WITH_SCHEMA_QUERY_PLUS(Query_for_list_of_clusterables,
+ "CONCURRENTLY");
+ else if (Matches("REPACK", "CONCURRENTLY"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
- else if (Matches("REPACK", "(*)"))
+ else if (Matches("REPACK", "(*)", "CONCURRENTLY"))
COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
- /* If we have REPACK <sth>, then add "USING INDEX" */
- else if (Matches("REPACK", MatchAnyExcept("(")))
+ /* If we have REPACK [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+ else if (Matches("REPACK", MatchAnyExcept("(|CONCURRENTLY")) ||
+ Matches("REPACK", "CONCURRENTLY", MatchAnyExcept("(")))
COMPLETE_WITH("USING INDEX");
- /* If we have REPACK (*) <sth>, then add "USING INDEX" */
- else if (Matches("REPACK", "(*)", MatchAny))
+ /* If we have REPACK (*) [ CONCURRENTLY ] <sth>, then add "USING INDEX" */
+ else if (Matches("REPACK", "(*)", MatchAnyExcept("CONCURRENTLY")) ||
+ Matches("REPACK", "(*)", "CONCURRENTLY", MatchAnyExcept("(")))
COMPLETE_WITH("USING INDEX");
- /* If we have REPACK <sth> USING, then add the index as well */
- else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+
+ /*
+ * Complete ... [ (*) ] [ CONCURRENTLY ] <sth> USING INDEX, with a list of
+ * indexes for <sth>.
+ */
+ else if (TailMatches(MatchAnyExcept("(|CONCURRENTLY"), "USING", "INDEX"))
{
set_completion_reference(prev3_wd);
COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a2bd5a897f8..b82dd17a966 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -323,14 +323,15 @@ extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
BulkInsertState bistate);
extern TM_Result heap_delete(Relation relation, ItemPointer tid,
CommandId cid, Snapshot crosscheck, bool wait,
- struct TM_FailureData *tmfd, bool changingPart);
+ struct TM_FailureData *tmfd, bool changingPart,
+ bool wal_logical);
extern void heap_finish_speculative(Relation relation, ItemPointer tid);
extern void heap_abort_speculative(Relation relation, ItemPointer tid);
extern TM_Result heap_update(Relation relation, ItemPointer otid,
HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
struct TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes);
+ TU_UpdateIndexes *update_indexes, bool wal_logical);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
bool follow_updates,
@@ -411,6 +412,10 @@ extern HTSV_Result HeapTupleSatisfiesVacuumHorizon(HeapTuple htup, Buffer buffer
TransactionId *dead_after);
extern void HeapTupleSetHintBits(HeapTupleHeader tuple, Buffer buffer,
uint16 infomask, TransactionId xid);
+extern bool HeapTupleMVCCInserted(HeapTuple htup, Snapshot snapshot,
+ Buffer buffer);
+extern bool HeapTupleMVCCNotDeleted(HeapTuple htup, Snapshot snapshot,
+ Buffer buffer);
extern bool HeapTupleHeaderIsOnlyLocked(HeapTupleHeader tuple);
extern bool HeapTupleIsSurelyDead(HeapTuple htup,
struct GlobalVisState *vistest);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 277df6b3cf0..8d4af07f840 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -104,6 +104,8 @@
#define XLH_DELETE_CONTAINS_OLD_KEY (1<<2)
#define XLH_DELETE_IS_SUPER (1<<3)
#define XLH_DELETE_IS_PARTITION_MOVE (1<<4)
+/* See heap_delete() */
+#define XLH_DELETE_NO_LOGICAL (1<<5)
/* convenience macro for checking whether any form of old tuple was logged */
#define XLH_DELETE_CONTAINS_OLD \
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 1c9e802a6b1..289b64edfd9 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -22,6 +22,7 @@
#include "access/xact.h"
#include "commands/vacuum.h"
#include "executor/tuptable.h"
+#include "replication/logical.h"
#include "storage/read_stream.h"
#include "utils/rel.h"
#include "utils/snapshot.h"
@@ -623,6 +624,8 @@ typedef struct TableAmRoutine
Relation OldIndex,
bool use_sort,
TransactionId OldestXmin,
+ Snapshot snapshot,
+ LogicalDecodingContext *decoding_ctx,
TransactionId *xid_cutoff,
MultiXactId *multi_cutoff,
double *num_tuples,
@@ -1627,6 +1630,10 @@ table_relation_copy_data(Relation rel, const RelFileLocator *newrlocator)
* not needed for the relation's AM
* - *xid_cutoff - ditto
* - *multi_cutoff - ditto
+ * - snapshot - if != NULL, ignore data changes done by transactions that this
+ * (MVCC) snapshot considers still in-progress or in the future.
+ * - decoding_ctx - logical decoding context, to capture concurrent data
+ * changes.
*
* Output parameters:
* - *xid_cutoff - rel's new relfrozenxid value, may be invalid
@@ -1639,6 +1646,8 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
Relation OldIndex,
bool use_sort,
TransactionId OldestXmin,
+ Snapshot snapshot,
+ LogicalDecodingContext *decoding_ctx,
TransactionId *xid_cutoff,
MultiXactId *multi_cutoff,
double *num_tuples,
@@ -1647,6 +1656,7 @@ table_relation_copy_for_cluster(Relation OldTable, Relation NewTable,
{
OldTable->rd_tableam->relation_copy_for_cluster(OldTable, NewTable, OldIndex,
use_sort, OldestXmin,
+ snapshot, decoding_ctx,
xid_cutoff, multi_cutoff,
num_tuples, tups_vacuumed,
tups_recently_dead);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 890998d84bb..4a508c57a50 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -13,10 +13,15 @@
#ifndef CLUSTER_H
#define CLUSTER_H
+#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
+#include "replication/logical.h"
#include "storage/lock.h"
+#include "storage/relfilelocator.h"
#include "utils/relcache.h"
+#include "utils/resowner.h"
+#include "utils/tuplestore.h"
/* flag bits for ClusterParams->options */
@@ -25,6 +30,8 @@
#define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
* indisclustered */
#define CLUOPT_ANALYZE 0x08 /* do an ANALYZE */
+#define CLUOPT_CONCURRENT 0x10 /* allow concurrent data changes */
+
/* options for CLUSTER */
typedef struct ClusterParams
@@ -33,14 +40,95 @@ typedef struct ClusterParams
} ClusterParams;
+/*
+ * The following definitions are used by REPACK CONCURRENTLY.
+ */
+
+extern RelFileLocator repacked_rel_locator;
+extern RelFileLocator repacked_rel_toast_locator;
+
+typedef enum
+{
+ CHANGE_INSERT,
+ CHANGE_UPDATE_OLD,
+ CHANGE_UPDATE_NEW,
+ CHANGE_DELETE,
+ CHANGE_TRUNCATE
+} ConcurrentChangeKind;
+
+typedef struct ConcurrentChange
+{
+ /* See the enum above. */
+ ConcurrentChangeKind kind;
+
+ /*
+ * The actual tuple.
+ *
+ * The tuple data follows the ConcurrentChange structure. Before use, make
+ * sure the tuple is correctly aligned (ConcurrentChange can be stored as
+ * bytea) and that tuple->t_data has been fixed up to point to that data.
+ */
+ HeapTupleData tup_data;
+} ConcurrentChange;
+
+#define SizeOfConcurrentChange (offsetof(ConcurrentChange, tup_data) + \
+ sizeof(HeapTupleData))
+
+/*
+ * Logical decoding state.
+ *
+ * Here we store the data changes that we decode from WAL while the table
+ * contents are being copied to new storage. The metadata needed to apply
+ * these changes to the table is stored here as well.
+ */
+typedef struct RepackDecodingState
+{
+ /* The relation whose changes we're decoding. */
+ Oid relid;
+
+ /*
+ * Decoded changes are stored here. Although we try to avoid excessive
+ * batches, it can happen that the changes need to be stored to disk. The
+ * tuplestore does this transparently.
+ */
+ Tuplestorestate *tstore;
+
+ /* The current number of changes in tstore. */
+ double nchanges;
+
+ /*
+ * Descriptor to store the ConcurrentChange structure serialized (bytea).
+ * We can't store the tuple directly because tuplestore only supports
+ * minimal tuples and we may need to transfer the OID system column from the
+ * output plugin. Also we need to transfer the change kind, so it's better
+ * to put everything in the structure than to use 2 tuplestores "in
+ * parallel".
+ */
+ TupleDesc tupdesc_change;
+
+ /* Tuple descriptor needed to update indexes. */
+ TupleDesc tupdesc;
+
+ /* Slot to retrieve data from tstore. */
+ TupleTableSlot *tsslot;
+
+ ResourceOwner resowner;
+} RepackDecodingState;
+
+
+
extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
extern void cluster_rel(RepackCommand command, bool usingindex,
- Relation OldHeap, Oid indexOid, ClusterParams *params);
+ Relation OldHeap, Oid indexOid, ClusterParams *params,
+ bool isTopLevel);
extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
LOCKMODE lockmode);
extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
+extern void repack_decode_concurrent_changes(LogicalDecodingContext *ctx,
+ XLogRecPtr end_of_wal);
+
extern Oid make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
char relpersistence, LOCKMODE lockmode);
extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
@@ -48,6 +136,7 @@ extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
bool swap_toast_by_content,
bool check_constraints,
bool is_internal,
+ bool reindex,
TransactionId frozenXid,
MultiXactId cutoffMulti,
char newrelpersistence);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 5b6639c114c..93917ad5544 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -59,18 +59,20 @@
/*
* Progress parameters for REPACK.
*
- * Note: Since REPACK shares some code with CLUSTER, these values are also
- * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
- * introduce a separate set of constants.)
+ * Note: Since REPACK shares some code with CLUSTER, (some of) these values
+ * are also used by CLUSTER. (CLUSTER is now deprecated, so it makes little
+ * sense to introduce a separate set of constants.)
*/
#define PROGRESS_REPACK_COMMAND 0
#define PROGRESS_REPACK_PHASE 1
#define PROGRESS_REPACK_INDEX_RELID 2
#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED 3
-#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN 4
-#define PROGRESS_REPACK_TOTAL_HEAP_BLKS 5
-#define PROGRESS_REPACK_HEAP_BLKS_SCANNED 6
-#define PROGRESS_REPACK_INDEX_REBUILD_COUNT 7
+#define PROGRESS_REPACK_HEAP_TUPLES_INSERTED 4
+#define PROGRESS_REPACK_HEAP_TUPLES_UPDATED 5
+#define PROGRESS_REPACK_HEAP_TUPLES_DELETED 6
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS 7
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED 8
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT 9
/*
* Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
@@ -79,9 +81,10 @@
#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP 2
#define PROGRESS_REPACK_PHASE_SORT_TUPLES 3
#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP 4
-#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES 5
-#define PROGRESS_REPACK_PHASE_REBUILD_INDEX 6
-#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP 7
+#define PROGRESS_REPACK_PHASE_CATCH_UP 5
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES 6
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX 7
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP 8
/*
* Commands of PROGRESS_REPACK
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 6d4d2d1814c..802fc4b0823 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
extern void SnapBuildSnapDecRefcount(Snapshot snap);
extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildInitialSnapshotForRepack(SnapBuild *builder);
extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
extern void SnapBuildClearExportedSnapshot(void);
diff --git a/src/include/storage/lockdefs.h b/src/include/storage/lockdefs.h
index 7f3ba0352f6..2739327b0da 100644
--- a/src/include/storage/lockdefs.h
+++ b/src/include/storage/lockdefs.h
@@ -36,8 +36,8 @@ typedef int LOCKMODE;
#define AccessShareLock 1 /* SELECT */
#define RowShareLock 2 /* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock 3 /* INSERT, UPDATE, DELETE */
-#define ShareUpdateExclusiveLock 4 /* VACUUM (non-FULL), ANALYZE, CREATE
- * INDEX CONCURRENTLY */
+#define ShareUpdateExclusiveLock 4 /* VACUUM (non-exclusive), ANALYZE, CREATE
+ * INDEX CONCURRENTLY, REPACK CONCURRENTLY */
#define ShareLock 5 /* CREATE INDEX (WITHOUT CONCURRENTLY) */
#define ShareRowExclusiveLock 6 /* like EXCLUSIVE MODE, but allows ROW
* SHARE */
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index f65f83c85cd..1f821fd2ccd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -64,6 +64,8 @@ extern Snapshot GetLatestSnapshot(void);
extern void SnapshotSetCommandId(CommandId curcid);
extern Snapshot CopySnapshot(Snapshot snapshot);
+extern void FreeSnapshot(Snapshot snapshot);
+
extern Snapshot GetCatalogSnapshot(Oid relid);
extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
extern void InvalidateCatalogSnapshot(void);
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index fc82cd67f6c..f16422175f8 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -11,10 +11,11 @@ EXTENSION = injection_points
DATA = injection_points--1.0.sql
PGFILEDESC = "injection_points - facility for injection points"
-REGRESS = injection_points hashagg reindex_conc vacuum
+# REGRESS = injection_points hashagg reindex_conc vacuum
REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
-ISOLATION = basic inplace syscache-update-pruned
+ISOLATION = basic inplace syscache-update-pruned repack
+ISOLATION_OPTS = --temp-config $(top_srcdir)/src/test/modules/injection_points/logical.conf
TAP_TESTS = 1
diff --git a/src/test/modules/injection_points/expected/repack.out b/src/test/modules/injection_points/expected/repack.out
new file mode 100644
index 00000000000..b575e9052ee
--- /dev/null
+++ b/src/test/modules/injection_points/expected/repack.out
@@ -0,0 +1,113 @@
+Parsed test spec with 2 sessions
+
+starting permutation: wait_before_lock change_existing change_new change_subxact1 change_subxact2 check2 wakeup_before_lock check1
+injection_points_attach
+-----------------------
+
+(1 row)
+
+step wait_before_lock:
+ REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+ <waiting ...>
+step change_existing:
+ UPDATE repack_test SET i=10 where i=1;
+ UPDATE repack_test SET j=20 where i=2;
+ UPDATE repack_test SET i=30 where i=3;
+ UPDATE repack_test SET i=40 where i=30;
+ DELETE FROM repack_test WHERE i=4;
+
+step change_new:
+ INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+ UPDATE repack_test SET i=50 where i=5;
+ UPDATE repack_test SET j=60 where i=6;
+ DELETE FROM repack_test WHERE i=7;
+
+step change_subxact1:
+ BEGIN;
+ INSERT INTO repack_test(i, j) VALUES (100, 100);
+ SAVEPOINT s1;
+ UPDATE repack_test SET i=101 where i=100;
+ SAVEPOINT s2;
+ UPDATE repack_test SET i=102 where i=101;
+ COMMIT;
+
+step change_subxact2:
+ BEGIN;
+ SAVEPOINT s1;
+ INSERT INTO repack_test(i, j) VALUES (110, 110);
+ ROLLBACK TO SAVEPOINT s1;
+ INSERT INTO repack_test(i, j) VALUES (110, 111);
+ COMMIT;
+
+step check2:
+ INSERT INTO relfilenodes(node)
+ SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+ SELECT i, j FROM repack_test ORDER BY i, j;
+
+ INSERT INTO data_s2(i, j)
+ SELECT i, j FROM repack_test;
+
+ i| j
+---+---
+ 2| 20
+ 6| 60
+ 8| 8
+ 10| 1
+ 40| 3
+ 50| 5
+102|100
+110|111
+(8 rows)
+
+step wakeup_before_lock:
+ SELECT injection_points_wakeup('repack-concurrently-before-lock');
+
+injection_points_wakeup
+-----------------------
+
+(1 row)
+
+step wait_before_lock: <... completed>
+step check1:
+ INSERT INTO relfilenodes(node)
+ SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+ SELECT count(DISTINCT node) FROM relfilenodes;
+
+ SELECT i, j FROM repack_test ORDER BY i, j;
+
+ INSERT INTO data_s1(i, j)
+ SELECT i, j FROM repack_test;
+
+ SELECT count(*)
+ FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+ WHERE d1.i ISNULL OR d2.i ISNULL;
+
+count
+-----
+ 2
+(1 row)
+
+ i| j
+---+---
+ 2| 20
+ 6| 60
+ 8| 8
+ 10| 1
+ 40| 3
+ 50| 5
+102|100
+110|111
+(8 rows)
+
+count
+-----
+ 0
+(1 row)
+
+injection_points_detach
+-----------------------
+
+(1 row)
+
diff --git a/src/test/modules/injection_points/logical.conf b/src/test/modules/injection_points/logical.conf
new file mode 100644
index 00000000000..c8f264bc6cb
--- /dev/null
+++ b/src/test/modules/injection_points/logical.conf
@@ -0,0 +1 @@
+wal_level = logical
\ No newline at end of file
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index 20390d6b4bf..29561103bbf 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -47,9 +47,13 @@ tests += {
'specs': [
'basic',
'inplace',
+ 'repack',
'syscache-update-pruned',
],
'runningcheck': false, # see syscache-update-pruned
+ # 'repack' requires wal_level = 'logical'.
+ 'regress_args': ['--temp-config', files('logical.conf')],
+
},
'tap': {
'env': {
diff --git a/src/test/modules/injection_points/specs/repack.spec b/src/test/modules/injection_points/specs/repack.spec
new file mode 100644
index 00000000000..75850334986
--- /dev/null
+++ b/src/test/modules/injection_points/specs/repack.spec
@@ -0,0 +1,143 @@
+# Prefix the system columns with underscore as they are not allowed as column
+# names.
+setup
+{
+ CREATE EXTENSION injection_points;
+
+ CREATE TABLE repack_test(i int PRIMARY KEY, j int);
+ INSERT INTO repack_test(i, j) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
+
+ CREATE TABLE relfilenodes(node oid);
+
+ CREATE TABLE data_s1(i int, j int);
+ CREATE TABLE data_s2(i int, j int);
+}
+
+teardown
+{
+ DROP TABLE repack_test;
+ DROP EXTENSION injection_points;
+
+ DROP TABLE relfilenodes;
+ DROP TABLE data_s1;
+ DROP TABLE data_s2;
+}
+
+session s1
+setup
+{
+ SELECT injection_points_set_local();
+ SELECT injection_points_attach('repack-concurrently-before-lock', 'wait');
+}
+# Perform the initial load and wait for s2 to do some data changes.
+step wait_before_lock
+{
+ REPACK (CONCURRENTLY) repack_test USING INDEX repack_test_pkey;
+}
+# Check the table from the perspective of s1.
+#
+# Besides the contents, we also check that relfilenode has changed.
+
+# Have each session write the contents into a table and use FULL JOIN to check
+# if the outputs are identical.
+step check1
+{
+ INSERT INTO relfilenodes(node)
+ SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+ SELECT count(DISTINCT node) FROM relfilenodes;
+
+ SELECT i, j FROM repack_test ORDER BY i, j;
+
+ INSERT INTO data_s1(i, j)
+ SELECT i, j FROM repack_test;
+
+ SELECT count(*)
+ FROM data_s1 d1 FULL JOIN data_s2 d2 USING (i, j)
+ WHERE d1.i ISNULL OR d2.i ISNULL;
+}
+teardown
+{
+ SELECT injection_points_detach('repack-concurrently-before-lock');
+}
+
+session s2
+# Change the existing data. UPDATE changes both key and non-key columns. Also
+# update one row twice to test whether the tuple version generated by this
+# session can be found.
+step change_existing
+{
+ UPDATE repack_test SET i=10 where i=1;
+ UPDATE repack_test SET j=20 where i=2;
+ UPDATE repack_test SET i=30 where i=3;
+ UPDATE repack_test SET i=40 where i=30;
+ DELETE FROM repack_test WHERE i=4;
+}
+# Insert new rows and UPDATE / DELETE some of them. Again, update both key and
+# non-key columns.
+step change_new
+{
+ INSERT INTO repack_test(i, j) VALUES (5, 5), (6, 6), (7, 7), (8, 8);
+ UPDATE repack_test SET i=50 where i=5;
+ UPDATE repack_test SET j=60 where i=6;
+ DELETE FROM repack_test WHERE i=7;
+}
+
+# When applying concurrent data changes, we should see the effects of an
+# in-progress subtransaction.
+#
+# XXX Not sure this test is useful now - it was designed for the patch that
+# preserves tuple visibility and which therefore modifies
+# TransactionIdIsCurrentTransactionId().
+step change_subxact1
+{
+ BEGIN;
+ INSERT INTO repack_test(i, j) VALUES (100, 100);
+ SAVEPOINT s1;
+ UPDATE repack_test SET i=101 where i=100;
+ SAVEPOINT s2;
+ UPDATE repack_test SET i=102 where i=101;
+ COMMIT;
+}
+
+# When applying concurrent data changes, we should not see the effects of a
+# rolled back subtransaction.
+#
+# XXX Is this test useful? See above.
+step change_subxact2
+{
+ BEGIN;
+ SAVEPOINT s1;
+ INSERT INTO repack_test(i, j) VALUES (110, 110);
+ ROLLBACK TO SAVEPOINT s1;
+ INSERT INTO repack_test(i, j) VALUES (110, 111);
+ COMMIT;
+}
+
+# Check the table from the perspective of s2.
+step check2
+{
+ INSERT INTO relfilenodes(node)
+ SELECT relfilenode FROM pg_class WHERE relname='repack_test';
+
+ SELECT i, j FROM repack_test ORDER BY i, j;
+
+ INSERT INTO data_s2(i, j)
+ SELECT i, j FROM repack_test;
+}
+step wakeup_before_lock
+{
+ SELECT injection_points_wakeup('repack-concurrently-before-lock');
+}
+
+# Test if data changes introduced while one session is performing REPACK
+# CONCURRENTLY find their way into the table.
+permutation
+ wait_before_lock
+ change_existing
+ change_new
+ change_subxact1
+ change_subxact2
+ check2
+ wakeup_before_lock
+ check1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3a1d1d28282..fe227bd8a30 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1999,17 +1999,17 @@ pg_stat_progress_cluster| SELECT s.pid,
WHEN 2 THEN 'index scanning heap'::text
WHEN 3 THEN 'sorting tuples'::text
WHEN 4 THEN 'writing new heap'::text
- WHEN 5 THEN 'swapping relation files'::text
- WHEN 6 THEN 'rebuilding index'::text
- WHEN 7 THEN 'performing final cleanup'::text
+ WHEN 6 THEN 'swapping relation files'::text
+ WHEN 7 THEN 'rebuilding index'::text
+ WHEN 8 THEN 'performing final cleanup'::text
ELSE NULL::text
END AS phase,
(s.param3)::oid AS cluster_index_relid,
s.param4 AS heap_tuples_scanned,
s.param5 AS heap_tuples_written,
- s.param6 AS heap_blks_total,
- s.param7 AS heap_blks_scanned,
- s.param8 AS index_rebuild_count
+ s.param8 AS heap_blks_total,
+ s.param9 AS heap_blks_scanned,
+ s.param10 AS index_rebuild_count
FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_copy| SELECT s.pid,
@@ -2081,17 +2081,20 @@ pg_stat_progress_repack| SELECT s.pid,
WHEN 2 THEN 'index scanning heap'::text
WHEN 3 THEN 'sorting tuples'::text
WHEN 4 THEN 'writing new heap'::text
- WHEN 5 THEN 'swapping relation files'::text
- WHEN 6 THEN 'rebuilding index'::text
- WHEN 7 THEN 'performing final cleanup'::text
+ WHEN 5 THEN 'catch-up'::text
+ WHEN 6 THEN 'swapping relation files'::text
+ WHEN 7 THEN 'rebuilding index'::text
+ WHEN 8 THEN 'performing final cleanup'::text
ELSE NULL::text
END AS phase,
(s.param3)::oid AS repack_index_relid,
s.param4 AS heap_tuples_scanned,
- s.param5 AS heap_tuples_written,
- s.param6 AS heap_blks_total,
- s.param7 AS heap_blks_scanned,
- s.param8 AS index_rebuild_count
+ s.param5 AS heap_tuples_inserted,
+ s.param6 AS heap_tuples_updated,
+ s.param7 AS heap_tuples_deleted,
+ s.param8 AS heap_blks_total,
+ s.param9 AS heap_blks_scanned,
+ s.param10 AS index_rebuild_count
FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_vacuum| SELECT s.pid,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 98242e25432..b64ab8dfab4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -485,6 +485,8 @@ CompressFileHandle
CompressionLocation
CompressorState
ComputeXidHorizonsResult
+ConcurrentChange
+ConcurrentChangeKind
ConditionVariable
ConditionVariableMinimallyPadded
ConditionalStack
@@ -1257,6 +1259,7 @@ IndexElem
IndexFetchHeapData
IndexFetchTableData
IndexInfo
+IndexInsertState
IndexList
IndexOnlyScan
IndexOnlyScanState
@@ -2538,6 +2541,7 @@ ReorderBufferUpdateProgressTxnCB
ReorderTuple
RepOriginId
RepackCommand
+RepackDecodingState
RepackStmt
ReparameterizeForeignPathByChild_function
ReplaceVarsFromTargetList_context
--
2.43.0
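
A note on the option flags in src/include/commands/cluster.h above: ClusterParams->options is a bitmask tested with bitwise AND, so a new flag such as CLUOPT_CONCURRENT can only be told apart from CLUOPT_ANALYZE if it has a bit of its own. The standalone C sketch below shows one way to check that. It is not part of the patch: the helper names are hypothetical, and the value 0x10 for CLUOPT_CONCURRENT is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in cluster.h, except where noted. */
#define CLUOPT_VERBOSE              0x01
#define CLUOPT_RECHECK              0x02
#define CLUOPT_RECHECK_ISCLUSTERED  0x04
#define CLUOPT_ANALYZE              0x08
#define CLUOPT_CONCURRENT           0x10	/* assumed value, see above */

/* True if exactly one bit is set in v. */
static inline bool
is_single_bit(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* True if every flag is a single bit and no two flags share a bit. */
static inline bool
flags_are_distinct(const uint32_t *flags, int nflags)
{
	uint32_t	seen = 0;

	for (int i = 0; i < nflags; i++)
	{
		if (!is_single_bit(flags[i]) || (seen & flags[i]) != 0)
			return false;
		seen |= flags[i];
	}
	return true;
}

/* The usual way an option is tested against the bitmask. */
static inline bool
option_is_set(uint32_t options, uint32_t flag)
{
	return (options & flag) != 0;
}
```

With two flags sharing 0x08, option_is_set() would report both options as set whenever either was requested, which is what flags_are_distinct() is meant to catch.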
[application/octet-stream] v21-0002-Add-REPACK-command.patch (133.3K, 5-v21-0002-Add-REPACK-command.patch)
From 40965dfef0f26a92249cda7a956bd03c9358a026 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <[email protected]>
Date: Sat, 26 Jul 2025 19:57:26 +0200
Subject: [PATCH v21 2/6] Add REPACK command
REPACK absorbs the functionality of VACUUM FULL and CLUSTER in a single
command. Because this functionality is completely different from
regular VACUUM, having it separate from VACUUM makes it easier for users
to understand; as for CLUSTER, the term is heavily overloaded in the
IT world and even in Postgres itself, so it's good that we can avoid it.
This also adds pg_repackdb, a new utility that can invoke the new
commands. This is heavily based on vacuumdb. We may still change the
implementation, depending on how Windows likes this one.
Author: Antonin Houska <[email protected]>
Reviewed-by: To fill in
Discussion: https://postgr.es/m/82651.1720540558@antos
Discussion: https://postgr.es/m/[email protected]
---
doc/src/sgml/monitoring.sgml | 223 ++++++-
doc/src/sgml/ref/allfiles.sgml | 2 +
doc/src/sgml/ref/cluster.sgml | 97 +--
doc/src/sgml/ref/clusterdb.sgml | 5 +
doc/src/sgml/ref/pg_repackdb.sgml | 479 ++++++++++++++
doc/src/sgml/ref/repack.sgml | 284 +++++++++
doc/src/sgml/ref/vacuum.sgml | 33 +-
doc/src/sgml/reference.sgml | 2 +
src/backend/access/heap/heapam_handler.c | 32 +-
src/backend/catalog/index.c | 2 +-
src/backend/catalog/system_views.sql | 26 +
src/backend/commands/cluster.c | 758 +++++++++++++++--------
src/backend/commands/vacuum.c | 3 +-
src/backend/parser/gram.y | 88 ++-
src/backend/tcop/utility.c | 20 +-
src/backend/utils/adt/pgstatfuncs.c | 2 +
src/bin/psql/tab-complete.in.c | 33 +-
src/bin/scripts/Makefile | 4 +-
src/bin/scripts/meson.build | 2 +
src/bin/scripts/pg_repackdb.c | 226 +++++++
src/bin/scripts/t/103_repackdb.pl | 24 +
src/bin/scripts/vacuuming.c | 60 +-
src/bin/scripts/vacuuming.h | 11 +-
src/include/commands/cluster.h | 8 +-
src/include/commands/progress.h | 61 +-
src/include/nodes/parsenodes.h | 20 +-
src/include/parser/kwlist.h | 1 +
src/include/tcop/cmdtaglist.h | 1 +
src/include/utils/backend_progress.h | 1 +
src/test/regress/expected/cluster.out | 125 +++-
src/test/regress/expected/rules.out | 23 +
src/test/regress/sql/cluster.sql | 59 ++
src/tools/pgindent/typedefs.list | 3 +
33 files changed, 2271 insertions(+), 447 deletions(-)
create mode 100644 doc/src/sgml/ref/pg_repackdb.sgml
create mode 100644 doc/src/sgml/ref/repack.sgml
create mode 100644 src/bin/scripts/pg_repackdb.c
create mode 100644 src/bin/scripts/t/103_repackdb.pl
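
For readers following the progress-reporting part of this patch: the pg_stat_progress_repack view turns the numeric phase parameter into text with a CASE expression that falls back to NULL. The same mapping can be sketched in standalone C as below. This is illustrative only. The function and array names are hypothetical, and the numbering (0 = initializing through 7 = performing final cleanup) is assumed to follow the phase table this patch adds to monitoring.sgml, without the catch-up phase that a later patch in the series inserts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Phase names in the order of the "REPACK Phases" table (assumed 0-based). */
static const char *const repack_phase_names[] = {
	"initializing",
	"seq scanning heap",
	"index scanning heap",
	"sorting tuples",
	"writing new heap",
	"swapping relation files",
	"rebuilding index",
	"performing final cleanup"
};

#define REPACK_NUM_PHASES \
	((long) (sizeof(repack_phase_names) / sizeof(repack_phase_names[0])))

/*
 * Return the phase name for a numeric phase, or NULL for an unknown value,
 * mirroring the "ELSE NULL" branch of the view's CASE expression.
 */
static inline const char *
repack_phase_name(long phase)
{
	if (phase < 0 || phase >= REPACK_NUM_PHASES)
		return NULL;
	return repack_phase_names[phase];
}
```

A table-driven lookup like this makes the cost of renumbering visible: inserting a phase in the middle shifts every later index, which is why the later patch also has to touch the expected output in rules.out.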
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3f4a27a736e..12e103d319d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -405,6 +405,14 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
</entry>
</row>
+ <row>
+ <entry><structname>pg_stat_progress_repack</structname><indexterm><primary>pg_stat_progress_repack</primary></indexterm></entry>
+ <entry>One row for each backend running
+ <command>REPACK</command>, showing current progress. See
+ <xref linkend="repack-progress-reporting"/>.
+ </entry>
+ </row>
+
<row>
<entry><structname>pg_stat_progress_basebackup</structname><indexterm><primary>pg_stat_progress_basebackup</primary></indexterm></entry>
<entry>One row for each WAL sender process streaming a base backup,
@@ -5506,7 +5514,8 @@ FROM pg_stat_get_backend_idset() AS backendid;
certain commands during command execution. Currently, the only commands
which support progress reporting are <command>ANALYZE</command>,
<command>CLUSTER</command>,
- <command>CREATE INDEX</command>, <command>VACUUM</command>,
+ <command>CREATE INDEX</command>, <command>REPACK</command>,
+ <command>VACUUM</command>,
<command>COPY</command>,
and <xref linkend="protocol-replication-base-backup"/> (i.e., replication
command that <xref linkend="app-pgbasebackup"/> issues to take
@@ -5965,6 +5974,218 @@ FROM pg_stat_get_backend_idset() AS backendid;
</table>
</sect2>
+ <sect2 id="repack-progress-reporting">
+ <title>REPACK Progress Reporting</title>
+
+ <indexterm>
+ <primary>pg_stat_progress_repack</primary>
+ </indexterm>
+
+ <para>
+ Whenever <command>REPACK</command> is running,
+ the <structname>pg_stat_progress_repack</structname> view will contain a
+ row for each backend that is currently running the command. The tables
+ below describe the information that will be reported and provide
+ information about how to interpret it.
+ </para>
+
+ <table id="pg-stat-progress-repack-view" xreflabel="pg_stat_progress_repack">
+ <title><structname>pg_stat_progress_repack</structname> View</title>
+ <tgroup cols="1">
+ <thead>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ Column Type
+ </para>
+ <para>
+ Description
+ </para></entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pid</structfield> <type>integer</type>
+ </para>
+ <para>
+ Process ID of backend.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the database to which this backend is connected.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>datname</structfield> <type>name</type>
+ </para>
+ <para>
+ Name of the database to which this backend is connected.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>relid</structfield> <type>oid</type>
+ </para>
+ <para>
+ OID of the table being repacked.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>phase</structfield> <type>text</type>
+ </para>
+ <para>
+ Current processing phase. See <xref linkend="repack-phases"/>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>repack_index_relid</structfield> <type>oid</type>
+ </para>
+ <para>
+ If the table is being scanned using an index, this is the OID of the
+ index being used; otherwise, it is zero.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>heap_tuples_scanned</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of heap tuples scanned.
+ This counter only advances when the phase is
+ <literal>seq scanning heap</literal>,
+ <literal>index scanning heap</literal>
+ or <literal>writing new heap</literal>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>heap_tuples_written</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of heap tuples written.
+ This counter only advances when the phase is
+ <literal>seq scanning heap</literal>,
+ <literal>index scanning heap</literal>
+ or <literal>writing new heap</literal>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>heap_blks_total</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Total number of heap blocks in the table. This number is reported
+ as of the beginning of <literal>seq scanning heap</literal>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>heap_blks_scanned</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of heap blocks scanned. This counter only advances when the
+ phase is <literal>seq scanning heap</literal>.
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>index_rebuild_count</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of indexes rebuilt. This counter only advances when the phase
+ is <literal>rebuilding index</literal>.
+ </para></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table id="repack-phases">
+ <title>REPACK Phases</title>
+ <tgroup cols="2">
+ <colspec colname="col1" colwidth="1*"/>
+ <colspec colname="col2" colwidth="2*"/>
+ <thead>
+ <row>
+ <entry>Phase</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry><literal>initializing</literal></entry>
+ <entry>
+ The command is preparing to begin scanning the heap. This phase is
+ expected to be very brief.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>seq scanning heap</literal></entry>
+ <entry>
+ The command is currently scanning the table using a sequential scan.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>index scanning heap</literal></entry>
+ <entry>
+ <command>REPACK</command> is currently scanning the table using an index scan.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>sorting tuples</literal></entry>
+ <entry>
+ <command>REPACK</command> is currently sorting tuples.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>writing new heap</literal></entry>
+ <entry>
+ <command>REPACK</command> is currently writing the new heap.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>swapping relation files</literal></entry>
+ <entry>
+ The command is currently swapping newly-built files into place.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>rebuilding index</literal></entry>
+ <entry>
+ The command is currently rebuilding an index.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>performing final cleanup</literal></entry>
+ <entry>
+ The command is performing final cleanup. When this phase is
+ completed, <command>REPACK</command> will end.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
<sect2 id="copy-progress-reporting">
<title>COPY Progress Reporting</title>
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5be638867a..eabf92e3536 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -167,6 +167,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY refreshMaterializedView SYSTEM "refresh_materialized_view.sgml">
<!ENTITY reindex SYSTEM "reindex.sgml">
<!ENTITY releaseSavepoint SYSTEM "release_savepoint.sgml">
+<!ENTITY repack SYSTEM "repack.sgml">
<!ENTITY reset SYSTEM "reset.sgml">
<!ENTITY revoke SYSTEM "revoke.sgml">
<!ENTITY rollback SYSTEM "rollback.sgml">
@@ -212,6 +213,7 @@ Complete list of usable sgml source files in this directory.
<!ENTITY pgIsready SYSTEM "pg_isready.sgml">
<!ENTITY pgReceivewal SYSTEM "pg_receivewal.sgml">
<!ENTITY pgRecvlogical SYSTEM "pg_recvlogical.sgml">
+<!ENTITY pgRepackdb SYSTEM "pg_repackdb.sgml">
<!ENTITY pgResetwal SYSTEM "pg_resetwal.sgml">
<!ENTITY pgRestore SYSTEM "pg_restore.sgml">
<!ENTITY pgRewind SYSTEM "pg_rewind.sgml">
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 8811f169ea0..cfcfb65e349 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -33,51 +33,13 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
<title>Description</title>
<para>
- <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
- to cluster the table specified
- by <replaceable class="parameter">table_name</replaceable>
- based on the index specified by
- <replaceable class="parameter">index_name</replaceable>. The index must
- already have been defined on
- <replaceable class="parameter">table_name</replaceable>.
+ The <command>CLUSTER</command> command is equivalent to
+ <xref linkend="sql-repack"/> with a <literal>USING INDEX</literal>
+ clause. See there for more details.
</para>
- <para>
- When a table is clustered, it is physically reordered
- based on the index information. Clustering is a one-time operation:
- when the table is subsequently updated, the changes are
- not clustered. That is, no attempt is made to store new or
- updated rows according to their index order. (If one wishes, one can
- periodically recluster by issuing the command again. Also, setting
- the table's <literal>fillfactor</literal> storage parameter to less than
- 100% can aid in preserving cluster ordering during updates, since updated
- rows are kept on the same page if enough space is available there.)
- </para>
-
- <para>
- When a table is clustered, <productname>PostgreSQL</productname>
- remembers which index it was clustered by. The form
- <command>CLUSTER <replaceable class="parameter">table_name</replaceable></command>
- reclusters the table using the same index as before. You can also
- use the <literal>CLUSTER</literal> or <literal>SET WITHOUT CLUSTER</literal>
- forms of <link linkend="sql-altertable"><command>ALTER TABLE</command></link> to set the index to be used for
- future cluster operations, or to clear any previous setting.
- </para>
+<!-- Do we need to describe exactly which options map to what? They seem obvious to me. -->
- <para>
- <command>CLUSTER</command> without a
- <replaceable class="parameter">table_name</replaceable> reclusters all the
- previously-clustered tables in the current database that the calling user
- has privileges for. This form of <command>CLUSTER</command> cannot be
- executed inside a transaction block.
- </para>
-
- <para>
- When a table is being clustered, an <literal>ACCESS
- EXCLUSIVE</literal> lock is acquired on it. This prevents any other
- database operations (both reads and writes) from operating on the
- table until the <command>CLUSTER</command> is finished.
- </para>
</refsect1>
<refsect1>
@@ -136,63 +98,12 @@ CLUSTER [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <r
on the table.
</para>
- <para>
- In cases where you are accessing single rows randomly
- within a table, the actual order of the data in the
- table is unimportant. However, if you tend to access some
- data more than others, and there is an index that groups
- them together, you will benefit from using <command>CLUSTER</command>.
- If you are requesting a range of indexed values from a table, or a
- single indexed value that has multiple rows that match,
- <command>CLUSTER</command> will help because once the index identifies the
- table page for the first row that matches, all other rows
- that match are probably already on the same table page,
- and so you save disk accesses and speed up the query.
- </para>
-
- <para>
- <command>CLUSTER</command> can re-sort the table using either an index scan
- on the specified index, or (if the index is a b-tree) a sequential
- scan followed by sorting. It will attempt to choose the method that
- will be faster, based on planner cost parameters and available statistical
- information.
- </para>
-
<para>
While <command>CLUSTER</command> is running, the <xref
linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
pg_temp</literal>.
</para>
- <para>
- When an index scan is used, a temporary copy of the table is created that
- contains the table data in the index order. Temporary copies of each
- index on the table are created as well. Therefore, you need free space on
- disk at least equal to the sum of the table size and the index sizes.
- </para>
-
- <para>
- When a sequential scan and sort is used, a temporary sort file is
- also created, so that the peak temporary space requirement is as much
- as double the table size, plus the index sizes. This method is often
- faster than the index scan method, but if the disk space requirement is
- intolerable, you can disable this choice by temporarily setting <xref
- linkend="guc-enable-sort"/> to <literal>off</literal>.
- </para>
-
- <para>
- It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to
- a reasonably large value (but not more than the amount of RAM you can
- dedicate to the <command>CLUSTER</command> operation) before clustering.
- </para>
-
- <para>
- Because the planner records statistics about the ordering of
- tables, it is advisable to run <link linkend="sql-analyze"><command>ANALYZE</command></link>
- on the newly clustered table.
- Otherwise, the planner might make poor choices of query plans.
- </para>
-
<para>
Because <command>CLUSTER</command> remembers which indexes are clustered,
one can cluster the tables one wants clustered manually the first time,
diff --git a/doc/src/sgml/ref/clusterdb.sgml b/doc/src/sgml/ref/clusterdb.sgml
index 0d2051bf6f1..546c1289c31 100644
--- a/doc/src/sgml/ref/clusterdb.sgml
+++ b/doc/src/sgml/ref/clusterdb.sgml
@@ -64,6 +64,11 @@ PostgreSQL documentation
this utility and via other methods for accessing the server.
</para>
+ <para>
+ <application>clusterdb</application> has been superseded by
+ <application>pg_repackdb</application>.
+ </para>
+
</refsect1>
diff --git a/doc/src/sgml/ref/pg_repackdb.sgml b/doc/src/sgml/ref/pg_repackdb.sgml
new file mode 100644
index 00000000000..32570d071cb
--- /dev/null
+++ b/doc/src/sgml/ref/pg_repackdb.sgml
@@ -0,0 +1,479 @@
+<!--
+doc/src/sgml/ref/pg_repackdb.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="app-pgrepackdb">
+ <indexterm zone="app-pgrepackdb">
+ <primary>pg_repackdb</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle><application>pg_repackdb</application></refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>pg_repackdb</refname>
+ <refpurpose>repack and analyze a <productname>PostgreSQL</productname>
+ database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <cmdsynopsis>
+ <command>pg_repackdb</command>
+ <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+ <arg choice="plain" rep="repeat">
+ <arg choice="opt">
+ <group choice="plain">
+ <arg choice="plain"><option>-t</option></arg>
+ <arg choice="plain"><option>--table</option></arg>
+ </group>
+ <replaceable>table</replaceable>
+ <arg choice="opt">( <replaceable class="parameter">column</replaceable> [,...] )</arg>
+ </arg>
+ </arg>
+
+ <arg choice="opt">
+ <group choice="plain">
+ <arg choice="plain"><replaceable>dbname</replaceable></arg>
+ <arg choice="plain"><option>-a</option></arg>
+ <arg choice="plain"><option>--all</option></arg>
+ </group>
+ </arg>
+ </cmdsynopsis>
+
+ <cmdsynopsis>
+ <command>pg_repackdb</command>
+ <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+ <arg choice="plain" rep="repeat">
+ <arg choice="opt">
+ <group choice="plain">
+ <arg choice="plain"><option>-n</option></arg>
+ <arg choice="plain"><option>--schema</option></arg>
+ </group>
+ <replaceable>schema</replaceable>
+ </arg>
+ </arg>
+
+ <arg choice="opt">
+ <group choice="plain">
+ <arg choice="plain"><replaceable>dbname</replaceable></arg>
+ <arg choice="plain"><option>-a</option></arg>
+ <arg choice="plain"><option>--all</option></arg>
+ </group>
+ </arg>
+ </cmdsynopsis>
+
+ <cmdsynopsis>
+ <command>pg_repackdb</command>
+ <arg rep="repeat"><replaceable>connection-option</replaceable></arg>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+
+ <arg choice="plain" rep="repeat">
+ <arg choice="opt">
+ <group choice="plain">
+ <arg choice="plain"><option>-N</option></arg>
+ <arg choice="plain"><option>--exclude-schema</option></arg>
+ </group>
+ <replaceable>schema</replaceable>
+ </arg>
+ </arg>
+
+ <arg choice="opt">
+ <group choice="plain">
+ <arg choice="plain"><replaceable>dbname</replaceable></arg>
+ <arg choice="plain"><option>-a</option></arg>
+ <arg choice="plain"><option>--all</option></arg>
+ </group>
+ </arg>
+ </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <application>pg_repackdb</application> is a utility for repacking a
+ <productname>PostgreSQL</productname> database.
+ <application>pg_repackdb</application> will also generate internal
+ statistics used by the <productname>PostgreSQL</productname> query
+ optimizer.
+ </para>
+
+ <para>
+ <application>pg_repackdb</application> is a wrapper around the SQL
+ command <link linkend="sql-repack"><command>REPACK</command></link>. There
+ is no effective difference between repacking and analyzing databases via
+ this utility and via other methods for accessing the server.
+ </para>
+
+ </refsect1>
+
+
+ <refsect1>
+ <title>Options</title>
+
+ <para>
+ <application>pg_repackdb</application> accepts the following command-line arguments:
+ <variablelist>
+ <varlistentry>
+ <term><option>-a</option></term>
+ <term><option>--all</option></term>
+ <listitem>
+ <para>
+ Repack all databases.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option><optional>-d</optional> <replaceable class="parameter">dbname</replaceable></option></term>
+ <term><option><optional>--dbname=</optional><replaceable class="parameter">dbname</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the name of the database to be repacked or analyzed,
+ when <option>-a</option>/<option>--all</option> is not used. If this
+ is not specified, the database name is read from the environment
+ variable <envar>PGDATABASE</envar>. If that is not set, the user name
+ specified for the connection is used.
+ The <replaceable>dbname</replaceable> can be
+ a <link linkend="libpq-connstring">connection string</link>. If so,
+ connection string parameters will override any conflicting command
+ line options.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-e</option></term>
+ <term><option>--echo</option></term>
+ <listitem>
+ <para>
+ Echo the commands that <application>pg_repackdb</application>
+ generates and sends to the server.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
+ <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
+ <listitem>
+ <para>
+ Execute the repack or analyze commands in parallel by running
+ <replaceable class="parameter">njobs</replaceable>
+ commands simultaneously. This option may reduce the processing time
+ but it also increases the load on the database server.
+ </para>
+ <para>
+ <application>pg_repackdb</application> will open
+ <replaceable class="parameter">njobs</replaceable> connections to the
+ database, so make sure your <xref linkend="guc-max-connections"/>
+ setting is high enough to accommodate all connections.
+ </para>
+ <para>
+ Note that using this mode might cause deadlock failures if certain
+ system catalogs are processed in parallel.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-n <replaceable class="parameter">schema</replaceable></option></term>
+ <term><option>--schema=<replaceable class="parameter">schema</replaceable></option></term>
+ <listitem>
+ <para>
+ Repack or analyze all tables in
+ <replaceable class="parameter">schema</replaceable> only. Multiple
+ schemas can be repacked by writing multiple <option>-n</option>
+ switches.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-N <replaceable class="parameter">schema</replaceable></option></term>
+ <term><option>--exclude-schema=<replaceable class="parameter">schema</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not repack or analyze any tables in
+ <replaceable class="parameter">schema</replaceable>. Multiple schemas
+ can be excluded by writing multiple <option>-N</option> switches.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-q</option></term>
+ <term><option>--quiet</option></term>
+ <listitem>
+ <para>
+ Do not display progress messages.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t <replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+ <term><option>--table=<replaceable class="parameter">table</replaceable> [ (<replaceable class="parameter">column</replaceable> [,...]) ]</option></term>
+ <listitem>
+ <para>
+ Repack or analyze <replaceable class="parameter">table</replaceable>
+ only. Column names can be specified only in conjunction with
+ the <option>--analyze</option> option. Multiple tables can be
+ repacked by writing multiple
+ <option>-t</option> switches.
+ </para>
+ <tip>
+ <para>
+ If you specify columns, you probably have to escape the parentheses
+ from the shell. (See examples below.)
+ </para>
+ </tip>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--verbose</option></term>
+ <listitem>
+ <para>
+ Print detailed information during processing.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Print the <application>pg_repackdb</application> version and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-z</option></term>
+ <term><option>--analyze</option></term>
+ <listitem>
+ <para>
+ Also calculate statistics for use by the optimizer.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_repackdb</application> command line
+ arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+
+ <para>
+ <application>pg_repackdb</application> also accepts
+ the following command-line arguments for connection parameters:
+ <variablelist>
+ <varlistentry>
+ <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+ <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the host name of the machine on which the server
+ is running. If the value begins with a slash, it is used
+ as the directory for the Unix domain socket.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+ <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the TCP port or local Unix domain socket file
+ extension on which the server
+ is listening for connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-U <replaceable class="parameter">username</replaceable></option></term>
+ <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+ <listitem>
+ <para>
+ User name to connect as.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-w</option></term>
+ <term><option>--no-password</option></term>
+ <listitem>
+ <para>
+ Never issue a password prompt. If the server requires
+ password authentication and a password is not available by
+ other means such as a <filename>.pgpass</filename> file, the
+ connection attempt will fail. This option can be useful in
+ batch jobs and scripts where no user is present to enter a
+ password.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-W</option></term>
+ <term><option>--password</option></term>
+ <listitem>
+ <para>
+ Force <application>pg_repackdb</application> to prompt for a
+ password before connecting to a database.
+ </para>
+
+ <para>
+ This option is never essential, since
+ <application>pg_repackdb</application> will automatically prompt
+ for a password if the server demands password authentication.
+ However, <application>pg_repackdb</application> will waste a
+ connection attempt finding out that the server wants a password.
+ In some cases it is worth typing <option>-W</option> to avoid the extra
+ connection attempt.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--maintenance-db=<replaceable class="parameter">dbname</replaceable></option></term>
+ <listitem>
+ <para>
+ When <option>-a</option>/<option>--all</option> is used, connect
+ to this database to gather the list of databases to repack.
+ If not specified, the <literal>postgres</literal> database will be used,
+ or if that does not exist, <literal>template1</literal> will be used.
+ This can be a <link linkend="libpq-connstring">connection
+ string</link>. If so, connection string parameters will override any
+ conflicting command line options. Also, connection string parameters
+ other than the database name itself will be re-used when connecting
+ to other databases.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect1>
+
+
+ <refsect1>
+ <title>Environment</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><envar>PGDATABASE</envar></term>
+ <term><envar>PGHOST</envar></term>
+ <term><envar>PGPORT</envar></term>
+ <term><envar>PGUSER</envar></term>
+
+ <listitem>
+ <para>
+ Default connection parameters
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><envar>PG_COLOR</envar></term>
+ <listitem>
+ <para>
+ Specifies whether to use color in diagnostic messages. Possible values
+ are <literal>always</literal>, <literal>auto</literal> and
+ <literal>never</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>
+ This utility, like most other <productname>PostgreSQL</productname> utilities,
+ also uses the environment variables supported by <application>libpq</application>
+ (see <xref linkend="libpq-envars"/>).
+ </para>
+
+ </refsect1>
+
+
+ <refsect1>
+ <title>Diagnostics</title>
+
+ <para>
+ In case of difficulty, see
+ <xref linkend="sql-repack"/> and <xref linkend="app-psql"/> for
+ discussions of potential problems and error messages.
+ The database server must be running at the
+ targeted host. Also, any default connection settings and environment
+ variables used by the <application>libpq</application> front-end
+ library will apply.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ To repack the database <literal>test</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb test</userinput>
+</screen>
+ </para>
+
+ <para>
+ To repack and analyze for the optimizer a database named
+ <literal>bigdb</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze bigdb</userinput>
+</screen>
+ </para>
+
+ <para>
+ To repack a single table
+ <literal>foo</literal> in a database named
+ <literal>xyzzy</literal>, and analyze a single column
+ <literal>bar</literal> of the table for the optimizer:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --analyze --verbose --table='foo(bar)' xyzzy</userinput>
+</screen></para>
+
+ <para>
+ To repack all tables in the <literal>foo</literal> and <literal>bar</literal> schemas
+ in a database named <literal>xyzzy</literal>:
+<screen>
+<prompt>$ </prompt><userinput>pg_repackdb --schema='foo' --schema='bar' xyzzy</userinput>
+</screen></para>
+
+
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="sql-repack"/></member>
+ </simplelist>
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/repack.sgml b/doc/src/sgml/ref/repack.sgml
new file mode 100644
index 00000000000..fd9d89f8aaa
--- /dev/null
+++ b/doc/src/sgml/ref/repack.sgml
@@ -0,0 +1,284 @@
+<!--
+doc/src/sgml/ref/repack.sgml
+PostgreSQL documentation
+-->
+
+<refentry id="sql-repack">
+ <indexterm zone="sql-repack">
+ <primary>REPACK</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle>REPACK</refentrytitle>
+ <manvolnum>7</manvolnum>
+ <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>REPACK</refname>
+ <refpurpose>rewrite a table to reclaim disk space</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+<synopsis>
+REPACK [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <replaceable class="parameter">table_name</replaceable> [ USING INDEX [ <replaceable class="parameter">index_name</replaceable> ] ] ]
+
+<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
+
+ VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
+ ANALYSE | ANALYZE
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <command>REPACK</command> reclaims storage occupied by dead
+ tuples. Unlike <command>VACUUM</command>, it does so by rewriting the
+ entire contents of the table specified
+ by <replaceable class="parameter">table_name</replaceable> into a new disk
+ file with no extra space (except for the space guaranteed by
+ the <literal>fillfactor</literal> storage parameter), allowing unused space
+ to be returned to the operating system.
+ </para>
+
+ <para>
+ Without
+ a <replaceable class="parameter">table_name</replaceable>, <command>REPACK</command>
+ processes every table and materialized view in the current database that
+ the current user has the <literal>MAINTAIN</literal> privilege on. This
+ form of <command>REPACK</command> cannot be executed inside a transaction
+ block.
+ </para>
+
+ <para>
+ If a <literal>USING INDEX</literal> clause is specified, the rows are
+ physically reordered based on information from an index. Please see the
+ notes on clustering below.
+ </para>
+
+ <para>
+ When a table is being repacked, an <literal>ACCESS EXCLUSIVE</literal> lock
+ is acquired on it. This prevents any other database operations (both reads
+ and writes) from operating on the table until the <command>REPACK</command>
+ is finished.
+ </para>
+
+ <refsect2 id="sql-repack-notes-on-clustering" xreflabel="Notes on Clustering">
+ <title>Notes on Clustering</title>
+
+ <para>
+ If the <literal>USING INDEX</literal> clause is specified, the rows in
+ the table are physically reordered following an index: if an index name
+ is specified in the command, then that index is used; if no index name
+ is specified, then the index previously configured as the index to
+ cluster on is used. If no index has been configured in this way, an
+ error is thrown. The index given in the <literal>USING INDEX</literal>
+ clause, like an index given to the <command>CLUSTER</command> command,
+ is recorded as the index to cluster on. An index can also be set
+ manually using <command>ALTER TABLE ... CLUSTER ON</command>, and reset
+ with <command>ALTER TABLE ... SET WITHOUT CLUSTER</command>.
+ </para>
+
+ <para>
+ If no table name is specified in <command>REPACK USING INDEX</command>,
+ all tables which have a clustering index defined and which the calling
+ user has privileges for are processed.
+ </para>
+
+ <para>
+ Clustering is a one-time operation: when the table is
+ subsequently updated, the changes are not clustered. That is, no attempt
+ is made to store new or updated rows according to their index order. (If
+ one wishes, one can periodically recluster by issuing the command again.
+ Also, setting the table's <literal>fillfactor</literal> storage parameter
+ to less than 100% can aid in preserving cluster ordering during updates,
+ since updated rows are kept on the same page if enough space is available
+ there.)
+ </para>
+
+ <para>
+ In cases where you are accessing single rows randomly within a table, the
+ actual order of the data in the table is unimportant. However, if you tend
+ to access some data more than others, and there is an index that groups
+ them together, you will benefit from using clustering. If
+ you are requesting a range of indexed values from a table, or a single
+ indexed value that has multiple rows that match,
+ <command>REPACK</command> will help because once the index identifies the
+ table page for the first row that matches, all other rows that match are
+ probably already on the same table page, and so you save disk accesses and
+ speed up the query.
+ </para>
+
+ <para>
+ <command>REPACK</command> can re-sort the table using either an index scan
+ on the specified index, or (if the index is a b-tree) a sequential scan
+ followed by sorting. It will attempt to choose the method that will be
+ faster, based on planner cost parameters and available statistical
+ information.
+ </para>
+
+ <para>
+ Because the planner records statistics about the ordering of tables, it is
+ advisable to
+ run <link linkend="sql-analyze"><command>ANALYZE</command></link> on the
+ newly repacked table. Otherwise, the planner might make poor choices of
+ query plans.
+ </para>
+ </refsect2>
+
+ <refsect2 id="sql-repack-notes-on-resources" xreflabel="Notes on Resources">
+ <title>Notes on Resources</title>
+
+ <para>
+ When an index scan or a sequential scan without sort is used, a temporary
+ copy of the table is created that contains the table data in the index
+ order. Temporary copies of each index on the table are created as well.
+ Therefore, you need free space on disk at least equal to the sum of the
+ table size and the index sizes.
+ </para>
+
+ <para>
+ When a sequential scan and sort is used, a temporary sort file is also
+ created, so that the peak temporary space requirement is as much as double
+ the table size, plus the index sizes. This method is often faster than
+ the index scan method, but if the disk space requirement is intolerable,
+ you can disable this choice by temporarily setting
+ <xref linkend="guc-enable-sort"/> to <literal>off</literal>.
+ </para>
+
+ <para>
+ It is advisable to set <xref linkend="guc-maintenance-work-mem"/> to a
+ reasonably large value (but not more than the amount of RAM you can
+ dedicate to the <command>REPACK</command> operation) before repacking.
+ </para>
+ </refsect2>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Parameters</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><replaceable class="parameter">table_name</replaceable></term>
+ <listitem>
+ <para>
+ The name (possibly schema-qualified) of a table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">index_name</replaceable></term>
+ <listitem>
+ <para>
+ The name of an index.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>VERBOSE</literal></term>
+ <listitem>
+ <para>
+ Prints a progress report at <literal>INFO</literal> level as each
+ table is repacked.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><literal>ANALYZE</literal></term>
+ <term><literal>ANALYSE</literal></term>
+ <listitem>
+ <para>
+ Applies <xref linkend="sql-analyze"/> on the table after repacking. This is
+ currently only supported when a single (non-partitioned) table is specified.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><replaceable class="parameter">boolean</replaceable></term>
+ <listitem>
+ <para>
+ Specifies whether the selected option should be turned on or off.
+ You can write <literal>TRUE</literal>, <literal>ON</literal>, or
+ <literal>1</literal> to enable the option, and <literal>FALSE</literal>,
+ <literal>OFF</literal>, or <literal>0</literal> to disable it. The
+ <replaceable class="parameter">boolean</replaceable> value can also
+ be omitted, in which case <literal>TRUE</literal> is assumed.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+
+ <para>
+ To repack a table, one must have the <literal>MAINTAIN</literal> privilege
+ on the table.
+ </para>
+
+ <para>
+ While <command>REPACK</command> is running, the <xref
+ linkend="guc-search-path"/> is temporarily changed to <literal>pg_catalog,
+ pg_temp</literal>.
+ </para>
+
+ <para>
+ Each backend running <command>REPACK</command> will report its progress
+ in the <structname>pg_stat_progress_repack</structname> view. See
+ <xref linkend="repack-progress-reporting"/> for details.
+ </para>
+
+ <para>
+ Repacking a partitioned table repacks each of its partitions. If an index
+ is specified, each partition is repacked using the partition of that
+ index. <command>REPACK</command> on a partitioned table cannot be executed
+ inside a transaction block.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Examples</title>
+
+ <para>
+ Repack the table <literal>employees</literal>:
+<programlisting>
+REPACK employees;
+</programlisting>
+ </para>
+
+ <para>
+ Repack the table <literal>employees</literal> on the basis of its
+ index <literal>employees_ind</literal> (since an index is used here, this is
+ effectively clustering):
+<programlisting>
+REPACK employees USING INDEX employees_ind;
+</programlisting>
+ </para>
+
+ <para>
+ Repack all tables in the database on which you have
+ the <literal>MAINTAIN</literal> privilege:
+<programlisting>
+REPACK;
+</programlisting></para>
+ </refsect1>
+
+ <refsect1>
+ <title>Compatibility</title>
+
+ <para>
+ There is no <command>REPACK</command> statement in the SQL standard.
+ </para>
+
+ </refsect1>
+
+</refentry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index bd5dcaf86a5..062b658cfcd 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -25,7 +25,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
<phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
- FULL [ <replaceable class="parameter">boolean</replaceable> ]
FREEZE [ <replaceable class="parameter">boolean</replaceable> ]
VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
ANALYZE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -39,6 +38,7 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
SKIP_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
ONLY_DATABASE_STATS [ <replaceable class="parameter">boolean</replaceable> ]
BUFFER_USAGE_LIMIT <replaceable class="parameter">size</replaceable>
+ FULL [ <replaceable class="parameter">boolean</replaceable> ]
<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
@@ -95,20 +95,6 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
<title>Parameters</title>
<variablelist>
- <varlistentry>
- <term><literal>FULL</literal></term>
- <listitem>
- <para>
- Selects <quote>full</quote> vacuum, which can reclaim more
- space, but takes much longer and exclusively locks the table.
- This method also requires extra disk space, since it writes a
- new copy of the table and doesn't release the old copy until
- the operation is complete. Usually this should only be used when a
- significant amount of space needs to be reclaimed from within the table.
- </para>
- </listitem>
- </varlistentry>
-
<varlistentry>
<term><literal>FREEZE</literal></term>
<listitem>
@@ -362,6 +348,23 @@ VACUUM [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] [ <re
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>FULL</literal></term>
+ <listitem>
+ <para>
+ This option, which is deprecated, makes <command>VACUUM</command>
+ behave like <command>REPACK</command> without a
+ <literal>USING INDEX</literal> clause.
+ This method of compacting the table takes much longer than
+ <command>VACUUM</command> and exclusively locks the table.
+ This method also requires extra disk space, since it writes a
+ new copy of the table and doesn't release the old copy until
+ the operation is complete. Usually this should only be used when a
+ significant amount of space needs to be reclaimed from within the table.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><replaceable class="parameter">boolean</replaceable></term>
<listitem>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index ff85ace83fc..2ee08e21f41 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -195,6 +195,7 @@
&refreshMaterializedView;
&reindex;
&releaseSavepoint;
+ &repack;
&reset;
&revoke;
&rollback;
@@ -257,6 +258,7 @@
&pgIsready;
&pgReceivewal;
&pgRecvlogical;
+ &pgRepackdb;
&pgRestore;
&pgVerifyBackup;
&psqlRef;
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bcbac844bb6..79f9de5d760 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -741,13 +741,13 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
if (OldIndex != NULL && !use_sort)
{
const int ci_index[] = {
- PROGRESS_CLUSTER_PHASE,
- PROGRESS_CLUSTER_INDEX_RELID
+ PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_INDEX_RELID
};
int64 ci_val[2];
/* Set phase and OIDOldIndex to columns */
- ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+ ci_val[0] = PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP;
ci_val[1] = RelationGetRelid(OldIndex);
pgstat_progress_update_multi_param(2, ci_index, ci_val);
@@ -759,15 +759,15 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
else
{
/* In scan-and-sort mode and also VACUUM FULL, set phase */
- pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
- PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+ pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP);
tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
heapScan = (HeapScanDesc) tableScan;
indexScan = NULL;
/* Set total heap blocks */
- pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+ pgstat_progress_update_param(PROGRESS_REPACK_TOTAL_HEAP_BLKS,
heapScan->rs_nblocks);
}
@@ -809,7 +809,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
* is manually updated to the correct value when the table
* scan finishes.
*/
- pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+ pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
heapScan->rs_nblocks);
break;
}
@@ -825,7 +825,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
*/
if (prev_cblock != heapScan->rs_cblock)
{
- pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
+ pgstat_progress_update_param(PROGRESS_REPACK_HEAP_BLKS_SCANNED,
(heapScan->rs_cblock +
heapScan->rs_nblocks -
heapScan->rs_startblock
@@ -912,14 +912,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
* In scan-and-sort mode, report increase in number of tuples
* scanned
*/
- pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
+ pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
*num_tuples);
}
else
{
const int ct_index[] = {
- PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
- PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN
+ PROGRESS_REPACK_HEAP_TUPLES_SCANNED,
+ PROGRESS_REPACK_HEAP_TUPLES_WRITTEN
};
int64 ct_val[2];
@@ -952,14 +952,14 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
double n_tuples = 0;
/* Report that we are now sorting tuples */
- pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
- PROGRESS_CLUSTER_PHASE_SORT_TUPLES);
+ pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_PHASE_SORT_TUPLES);
tuplesort_performsort(tuplesort);
/* Report that we are now writing new heap */
- pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
- PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP);
+ pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP);
for (;;)
{
@@ -977,7 +977,7 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap,
values, isnull,
rwstate);
/* Report n_tuples */
- pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
+ pgstat_progress_update_param(PROGRESS_REPACK_HEAP_TUPLES_WRITTEN,
n_tuples);
}
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c4029a4f3d3..3063abff9a5 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -4079,7 +4079,7 @@ reindex_relation(const ReindexStmt *stmt, Oid relid, int flags,
Assert(!ReindexIsProcessingIndex(indexOid));
/* Set index rebuild count */
- pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+ pgstat_progress_update_param(PROGRESS_REPACK_INDEX_REBUILD_COUNT,
i);
i++;
}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 1b3c5a55882..b2b7b10c2be 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1279,6 +1279,32 @@ CREATE VIEW pg_stat_progress_cluster AS
FROM pg_stat_get_progress_info('CLUSTER') AS S
LEFT JOIN pg_database D ON S.datid = D.oid;
+CREATE VIEW pg_stat_progress_repack AS
+ SELECT
+ S.pid AS pid,
+ S.datid AS datid,
+ D.datname AS datname,
+ S.relid AS relid,
+ -- param1 is currently unused
+ CASE S.param2 WHEN 0 THEN 'initializing'
+ WHEN 1 THEN 'seq scanning heap'
+ WHEN 2 THEN 'index scanning heap'
+ WHEN 3 THEN 'sorting tuples'
+ WHEN 4 THEN 'writing new heap'
+ WHEN 5 THEN 'swapping relation files'
+ WHEN 6 THEN 'rebuilding index'
+ WHEN 7 THEN 'performing final cleanup'
+ END AS phase,
+ CAST(S.param3 AS oid) AS repack_index_relid,
+ S.param4 AS heap_tuples_scanned,
+ S.param5 AS heap_tuples_written,
+ S.param6 AS heap_blks_total,
+ S.param7 AS heap_blks_scanned,
+ S.param8 AS index_rebuild_count
+ FROM pg_stat_get_progress_info('REPACK') AS S
+ LEFT JOIN pg_database D ON S.datid = D.oid;
+
+
CREATE VIEW pg_stat_progress_create_index AS
SELECT
S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..8b64f9e6795 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -67,18 +67,41 @@ typedef struct
Oid indexOid;
} RelToCluster;
-
-static void cluster_multiple_rels(List *rtcs, ClusterParams *params);
-static void rebuild_relation(Relation OldHeap, Relation index, bool verbose);
+static bool cluster_rel_recheck(RepackCommand cmd, Relation OldHeap,
+ Oid indexOid, Oid userid, int options);
+static void rebuild_relation(RepackCommand cmd, bool usingindex,
+ Relation OldHeap, Relation index, bool verbose);
static void copy_table_data(Relation NewHeap, Relation OldHeap, Relation OldIndex,
bool verbose, bool *pSwapToastByContent,
TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
-static List *get_tables_to_cluster(MemoryContext cluster_context);
-static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
- Oid indexOid);
-static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
+static List *get_tables_to_repack(RepackCommand cmd, bool usingindex,
+ MemoryContext permcxt);
+static List *get_tables_to_repack_partitioned(RepackCommand cmd,
+ MemoryContext cluster_context,
+ Oid relid, bool rel_is_index);
+static bool cluster_is_permitted_for_relation(RepackCommand cmd,
+ Oid relid, Oid userid);
+static Relation process_single_relation(RepackStmt *stmt,
+ ClusterParams *params);
+static Oid determine_clustered_index(Relation rel, bool usingindex,
+ const char *indexname);
+static const char *
+RepackCommandAsString(RepackCommand cmd)
+{
+ switch (cmd)
+ {
+ case REPACK_COMMAND_REPACK:
+ return "REPACK";
+ case REPACK_COMMAND_VACUUMFULL:
+ return "VACUUM";
+ case REPACK_COMMAND_CLUSTER:
+ return "CLUSTER";
+ }
+ return "???";
+}
+
/*---------------------------------------------------------------------------
* This cluster code allows for clustering multiple tables at once. Because
* of this, we cannot just run everything on a single transaction, or we
@@ -104,191 +127,155 @@ static bool cluster_is_permitted_for_relation(Oid relid, Oid userid);
*---------------------------------------------------------------------------
*/
void
-cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
+ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel)
{
- ListCell *lc;
ClusterParams params = {0};
- bool verbose = false;
Relation rel = NULL;
- Oid indexOid = InvalidOid;
- MemoryContext cluster_context;
+ MemoryContext repack_context;
List *rtcs;
/* Parse option list */
- foreach(lc, stmt->params)
+ foreach_node(DefElem, opt, stmt->params)
{
- DefElem *opt = (DefElem *) lfirst(lc);
-
if (strcmp(opt->defname, "verbose") == 0)
- verbose = defGetBoolean(opt);
+ params.options |= defGetBoolean(opt) ? CLUOPT_VERBOSE : 0;
+ else if (strcmp(opt->defname, "analyze") == 0 ||
+ strcmp(opt->defname, "analyse") == 0)
+ params.options |= defGetBoolean(opt) ? CLUOPT_ANALYZE : 0;
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("unrecognized CLUSTER option \"%s\"",
+ errmsg("unrecognized %s option \"%s\"",
+ RepackCommandAsString(stmt->command),
opt->defname),
parser_errposition(pstate, opt->location)));
}
- params.options = (verbose ? CLUOPT_VERBOSE : 0);
-
+ /*
+ * If a single relation is specified, process it and we're done ... unless
+ * the relation is a partitioned table, in which case we fall through.
+ */
if (stmt->relation != NULL)
{
- /* This is the single-relation case. */
- Oid tableOid;
-
- /*
- * Find, lock, and check permissions on the table. We obtain
- * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
- * single-transaction case.
- */
- tableOid = RangeVarGetRelidExtended(stmt->relation,
- AccessExclusiveLock,
- 0,
- RangeVarCallbackMaintainsTable,
- NULL);
- rel = table_open(tableOid, NoLock);
-
- /*
- * Reject clustering a remote temp table ... their local buffer
- * manager is not going to cope.
- */
- if (RELATION_IS_OTHER_TEMP(rel))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot cluster temporary tables of other sessions")));
-
- if (stmt->indexname == NULL)
- {
- ListCell *index;
-
- /* We need to find the index that has indisclustered set. */
- foreach(index, RelationGetIndexList(rel))
- {
- indexOid = lfirst_oid(index);
- if (get_index_isclustered(indexOid))
- break;
- indexOid = InvalidOid;
- }
-
- if (!OidIsValid(indexOid))
- ereport(ERROR,
- (errcode(ERRCODE_UNDEFINED_OBJECT),
- errmsg("there is no previously clustered index for table \"%s\"",
- stmt->relation->relname)));
- }
- else
- {
- /*
- * The index is expected to be in the same namespace as the
- * relation.
- */
- indexOid = get_relname_relid(stmt->indexname,
- rel->rd_rel->relnamespace);
- if (!OidIsValid(indexOid))
- ereport(ERROR,
- (errcode(ERRCODE_UNDEFINED_OBJECT),
- errmsg("index \"%s\" for table \"%s\" does not exist",
- stmt->indexname, stmt->relation->relname)));
- }
-
- /* For non-partitioned tables, do what we came here to do. */
- if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
- {
- cluster_rel(rel, indexOid, &params);
- /* cluster_rel closes the relation, but keeps lock */
-
+ rel = process_single_relation(stmt, &params);
+ if (rel == NULL)
return;
- }
}
+ /* Don't allow this for now. Maybe we can add support for this later */
+ if (params.options & CLUOPT_ANALYZE)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot ANALYZE multiple tables"));
+
/*
* By here, we know we are in a multi-table situation. In order to avoid
* holding locks for too long, we want to process each table in its own
* transaction. This forces us to disallow running inside a user
* transaction block.
*/
- PreventInTransactionBlock(isTopLevel, "CLUSTER");
+ PreventInTransactionBlock(isTopLevel, RepackCommandAsString(stmt->command));
/* Also, we need a memory context to hold our list of relations */
- cluster_context = AllocSetContextCreate(PortalContext,
- "Cluster",
- ALLOCSET_DEFAULT_SIZES);
+ repack_context = AllocSetContextCreate(PortalContext,
+ "Repack",
+ ALLOCSET_DEFAULT_SIZES);
- /*
- * Either we're processing a partitioned table, or we were not given any
- * table name at all. In either case, obtain a list of relations to
- * process.
- *
- * In the former case, an index name must have been given, so we don't
- * need to recheck its "indisclustered" bit, but we have to check that it
- * is an index that we can cluster on. In the latter case, we set the
- * option bit to have indisclustered verified.
- *
- * Rechecking the relation itself is necessary here in all cases.
- */
params.options |= CLUOPT_RECHECK;
- if (rel != NULL)
+
+ /*
+ * If we don't have a relation yet, determine a relation list. If we do,
+ * then it must be a partitioned table, and we want to process its
+ * partitions.
+ */
+ if (rel == NULL)
{
+ Assert(stmt->indexname == NULL);
+ rtcs = get_tables_to_repack(stmt->command, stmt->usingindex,
+ repack_context);
+ }
+ else
+ {
+ Oid relid;
+ bool rel_is_index;
+
Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
- check_index_is_clusterable(rel, indexOid, AccessShareLock);
- rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
- /* close relation, releasing lock on parent table */
+ /*
+ * If an index name was specified, resolve it now and pass it down.
+ */
+ if (stmt->usingindex)
+ {
+ /*
+ * XXX how should this behave? Passing no index to a partitioned
+ * table could be useful to have certain partitions clustered by
+ * some index, and other partitions by a different index.
+ */
+ if (!stmt->indexname)
+ ereport(ERROR,
+ errmsg("there is no previously clustered index for table \"%s\"",
+ RelationGetRelationName(rel)));
+
+ relid = determine_clustered_index(rel, true, stmt->indexname);
+ if (!OidIsValid(relid))
+ elog(ERROR, "unable to determine index to cluster on");
+ /* XXX is this the right place for this check? */
+ check_index_is_clusterable(rel, relid, AccessExclusiveLock);
+ rel_is_index = true;
+ }
+ else
+ {
+ relid = RelationGetRelid(rel);
+ rel_is_index = false;
+ }
+
+ rtcs = get_tables_to_repack_partitioned(stmt->command, repack_context,
+ relid, rel_is_index);
+
+ /* close parent relation, releasing lock on it */
table_close(rel, AccessExclusiveLock);
+ rel = NULL;
}
- else
- {
- rtcs = get_tables_to_cluster(cluster_context);
- params.options |= CLUOPT_RECHECK_ISCLUSTERED;
- }
-
- /* Do the job. */
- cluster_multiple_rels(rtcs, &params);
-
- /* Start a new transaction for the cleanup work. */
- StartTransactionCommand();
-
- /* Clean up working storage */
- MemoryContextDelete(cluster_context);
-}
-
-/*
- * Given a list of relations to cluster, process each of them in a separate
- * transaction.
- *
- * We expect to be in a transaction at start, but there isn't one when we
- * return.
- */
-static void
-cluster_multiple_rels(List *rtcs, ClusterParams *params)
-{
- ListCell *lc;
/* Commit to get out of starting transaction */
PopActiveSnapshot();
CommitTransactionCommand();
/* Cluster the tables, each in a separate transaction */
- foreach(lc, rtcs)
+ Assert(rel == NULL);
+ foreach_ptr(RelToCluster, rtc, rtcs)
{
- RelToCluster *rtc = (RelToCluster *) lfirst(lc);
- Relation rel;
-
/* Start a new transaction for each relation. */
StartTransactionCommand();
+ /*
+ * Open the target table, coping with the case where it has been
+ * dropped.
+ */
+ rel = try_table_open(rtc->tableOid, AccessExclusiveLock);
+ if (rel == NULL)
+ {
+ CommitTransactionCommand();
+ continue;
+ }
+
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- rel = table_open(rtc->tableOid, AccessExclusiveLock);
-
/* Process this table */
- cluster_rel(rel, rtc->indexOid, params);
+ cluster_rel(stmt->command, stmt->usingindex,
+ rel, rtc->indexOid, &params);
/* cluster_rel closes the relation, but keeps lock */
PopActiveSnapshot();
CommitTransactionCommand();
}
+
+ /* Start a new transaction for the cleanup work. */
+ StartTransactionCommand();
+
+ /* Clean up working storage */
+ MemoryContextDelete(repack_context);
}
/*
@@ -304,11 +291,14 @@ cluster_multiple_rels(List *rtcs, ClusterParams *params)
* them incrementally while we load the table.
*
* If indexOid is InvalidOid, the table will be rewritten in physical order
- * instead of index order. This is the new implementation of VACUUM FULL,
- * and error messages should refer to the operation as VACUUM not CLUSTER.
+ * instead of index order.
+ *
+ * 'cmd' indicates which command is being executed, to be used for error
+ * messages.
*/
void
-cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
+cluster_rel(RepackCommand cmd, bool usingindex,
+ Relation OldHeap, Oid indexOid, ClusterParams *params)
{
Oid tableOid = RelationGetRelid(OldHeap);
Oid save_userid;
@@ -323,13 +313,25 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
/* Check for user-requested abort. */
CHECK_FOR_INTERRUPTS();
- pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
- if (OidIsValid(indexOid))
- pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+ if (cmd == REPACK_COMMAND_REPACK)
+ pgstat_progress_start_command(PROGRESS_COMMAND_REPACK, tableOid);
+ else
+ pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
+ if (cmd == REPACK_COMMAND_REPACK)
+ pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
+ PROGRESS_REPACK_COMMAND_REPACK);
+ else if (cmd == REPACK_COMMAND_CLUSTER)
+ {
+ pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
PROGRESS_CLUSTER_COMMAND_CLUSTER);
+ }
else
- pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+ {
+ Assert(cmd == REPACK_COMMAND_VACUUMFULL);
+ pgstat_progress_update_param(PROGRESS_REPACK_COMMAND,
PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+ }
/*
* Switch to the table owner's userid, so that any index functions are run
@@ -351,63 +353,21 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
* to cluster a not-previously-clustered index.
*/
if (recheck)
- {
- /* Check that the user still has privileges for the relation */
- if (!cluster_is_permitted_for_relation(tableOid, save_userid))
- {
- relation_close(OldHeap, AccessExclusiveLock);
+ if (!cluster_rel_recheck(cmd, OldHeap, indexOid, save_userid,
+ params->options))
goto out;
- }
-
- /*
- * Silently skip a temp table for a remote session. Only doing this
- * check in the "recheck" case is appropriate (which currently means
- * somebody is executing a database-wide CLUSTER or on a partitioned
- * table), because there is another check in cluster() which will stop
- * any attempt to cluster remote temp tables by name. There is
- * another check in cluster_rel which is redundant, but we leave it
- * for extra safety.
- */
- if (RELATION_IS_OTHER_TEMP(OldHeap))
- {
- relation_close(OldHeap, AccessExclusiveLock);
- goto out;
- }
-
- if (OidIsValid(indexOid))
- {
- /*
- * Check that the index still exists
- */
- if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
- {
- relation_close(OldHeap, AccessExclusiveLock);
- goto out;
- }
-
- /*
- * Check that the index is still the one with indisclustered set,
- * if needed.
- */
- if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
- !get_index_isclustered(indexOid))
- {
- relation_close(OldHeap, AccessExclusiveLock);
- goto out;
- }
- }
- }
/*
- * We allow VACUUM FULL, but not CLUSTER, on shared catalogs. CLUSTER
- * would work in most respects, but the index would only get marked as
- * indisclustered in the current database, leading to unexpected behavior
- * if CLUSTER were later invoked in another database.
+ * We allow repacking shared catalogs only when not using an index. It
+ * would work to use an index in most respects, but the index would only
+ * get marked as indisclustered in the current database, leading to
+ * unexpected behavior if CLUSTER were later invoked in another database.
*/
- if (OidIsValid(indexOid) && OldHeap->rd_rel->relisshared)
+ if (usingindex && OldHeap->rd_rel->relisshared)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot cluster a shared catalog")));
+ errmsg("cannot run \"%s\" on a shared catalog",
+ RepackCommandAsString(cmd))));
/*
* Don't process temp tables of other backends ... their local buffer
@@ -415,21 +375,30 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
*/
if (RELATION_IS_OTHER_TEMP(OldHeap))
{
- if (OidIsValid(indexOid))
+ if (cmd == REPACK_COMMAND_CLUSTER)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot cluster temporary tables of other sessions")));
+ else if (cmd == REPACK_COMMAND_REPACK)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot repack temporary tables of other sessions")));
+ }
else
+ {
+ Assert(cmd == REPACK_COMMAND_VACUUMFULL);
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot vacuum temporary tables of other sessions")));
+ }
}
/*
* Also check for active uses of the relation in the current transaction,
* including open scans and pending AFTER trigger events.
*/
- CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
+ CheckTableNotInUse(OldHeap, RepackCommandAsString(cmd));
/* Check heap and index are valid to cluster on */
if (OidIsValid(indexOid))
@@ -469,7 +438,7 @@ cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params)
TransferPredicateLocksToHeapRelation(OldHeap);
/* rebuild_relation does all the dirty work */
- rebuild_relation(OldHeap, index, verbose);
+ rebuild_relation(cmd, usingindex, OldHeap, index, verbose);
/* rebuild_relation closes OldHeap, and index if valid */
out:
@@ -482,6 +451,63 @@ out:
pgstat_progress_end_command();
}
+/*
+ * Check if the table (and its index) still meets the requirements of
+ * cluster_rel().
+ */
+static bool
+cluster_rel_recheck(RepackCommand cmd, Relation OldHeap, Oid indexOid,
+ Oid userid, int options)
+{
+ Oid tableOid = RelationGetRelid(OldHeap);
+
+ /* Check that the user still has privileges for the relation */
+ if (!cluster_is_permitted_for_relation(cmd, tableOid, userid))
+ {
+ relation_close(OldHeap, AccessExclusiveLock);
+ return false;
+ }
+
+ /*
+ * Silently skip a temp table for a remote session. Only doing this check
+ * in the "recheck" case is appropriate (which currently means somebody is
+ * executing a database-wide CLUSTER or on a partitioned table), because
+ * there is another check in cluster() which will stop any attempt to
+ * cluster remote temp tables by name. There is another check in
+ * cluster_rel which is redundant, but we leave it for extra safety.
+ */
+ if (RELATION_IS_OTHER_TEMP(OldHeap))
+ {
+ relation_close(OldHeap, AccessExclusiveLock);
+ return false;
+ }
+
+ if (OidIsValid(indexOid))
+ {
+ /*
+ * Check that the index still exists
+ */
+ if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
+ {
+ relation_close(OldHeap, AccessExclusiveLock);
+ return false;
+ }
+
+ /*
+ * Check that the index is still the one with indisclustered set, if
+ * needed.
+ */
+ if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+ !get_index_isclustered(indexOid))
+ {
+ relation_close(OldHeap, AccessExclusiveLock);
+ return false;
+ }
+ }
+
+ return true;
+}
+
/*
* Verify that the specified heap and index are valid to cluster on
*
@@ -626,7 +652,8 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
* On exit, they are closed, but locks on them are not released.
*/
static void
-rebuild_relation(Relation OldHeap, Relation index, bool verbose)
+rebuild_relation(RepackCommand cmd, bool usingindex,
+ Relation OldHeap, Relation index, bool verbose)
{
Oid tableOid = RelationGetRelid(OldHeap);
Oid accessMethod = OldHeap->rd_rel->relam;
@@ -642,8 +669,8 @@ rebuild_relation(Relation OldHeap, Relation index, bool verbose)
Assert(CheckRelationLockedByMe(OldHeap, AccessExclusiveLock, false) &&
(index == NULL || CheckRelationLockedByMe(index, AccessExclusiveLock, false)));
- if (index)
- /* Mark the correct index as clustered */
+ /* for CLUSTER or REPACK USING INDEX, mark the index as the one to use */
+ if (usingindex)
mark_index_clustered(OldHeap, RelationGetRelid(index), true);
/* Remember info about rel before closing OldHeap */
@@ -1458,8 +1485,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
int i;
/* Report that we are now swapping relation files */
- pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
- PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
+ pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_PHASE_SWAP_REL_FILES);
/* Zero out possible results from swapped_relation_files */
memset(mapped_tables, 0, sizeof(mapped_tables));
@@ -1509,14 +1536,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
/* Report that we are now reindexing relations */
- pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
- PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
+ pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_PHASE_REBUILD_INDEX);
reindex_relation(NULL, OIDOldHeap, reindex_flags, &reindex_params);
/* Report that we are now doing clean up */
- pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
- PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
+ pgstat_progress_update_param(PROGRESS_REPACK_PHASE,
+ PROGRESS_REPACK_PHASE_FINAL_CLEANUP);
/*
* If the relation being rebuilt is pg_class, swap_relation_files()
@@ -1632,69 +1659,137 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
}
}
-
/*
- * Get a list of tables that the current user has privileges on and
- * have indisclustered set. Return the list in a List * of RelToCluster
- * (stored in the specified memory context), each one giving the tableOid
- * and the indexOid on which the table is already clustered.
+ * Determine which relations to process, when REPACK/CLUSTER is called
+ * without specifying a table name. The exact process depends on whether
+ * USING INDEX was given or not, and in any case we only return tables and
+ * materialized views that the current user has privileges to repack/cluster.
+ *
+ * If USING INDEX was given, we scan pg_index to find the indexes that have
+ * indisclustered set; if it was not given, we scan pg_class and return all
+ * tables.
+ *
+ * Return them as a list of RelToCluster in the given memory context.
*/
static List *
-get_tables_to_cluster(MemoryContext cluster_context)
+get_tables_to_repack(RepackCommand command, bool usingindex,
+ MemoryContext permcxt)
{
- Relation indRelation;
+ Relation catalog;
TableScanDesc scan;
- ScanKeyData entry;
- HeapTuple indexTuple;
- Form_pg_index index;
+ HeapTuple tuple;
MemoryContext old_context;
List *rtcs = NIL;
- /*
- * Get all indexes that have indisclustered set and that the current user
- * has the appropriate privileges for.
- */
- indRelation = table_open(IndexRelationId, AccessShareLock);
- ScanKeyInit(&entry,
- Anum_pg_index_indisclustered,
- BTEqualStrategyNumber, F_BOOLEQ,
- BoolGetDatum(true));
- scan = table_beginscan_catalog(indRelation, 1, &entry);
- while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+ if (usingindex)
{
- RelToCluster *rtc;
+ ScanKeyData entry;
- index = (Form_pg_index) GETSTRUCT(indexTuple);
+ catalog = table_open(IndexRelationId, AccessShareLock);
+ ScanKeyInit(&entry,
+ Anum_pg_index_indisclustered,
+ BTEqualStrategyNumber, F_BOOLEQ,
+ BoolGetDatum(true));
+ scan = table_beginscan_catalog(catalog, 1, &entry);
+ while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+ {
+ RelToCluster *rtc;
+ Form_pg_index index;
- if (!cluster_is_permitted_for_relation(index->indrelid, GetUserId()))
- continue;
+ index = (Form_pg_index) GETSTRUCT(tuple);
- /* Use a permanent memory context for the result list */
- old_context = MemoryContextSwitchTo(cluster_context);
+ /*
+ * XXX I think the only reason there's no test failure here is
+ * that we seldom have clustered indexes that would be affected by
+ * concurrency. Maybe we should also do the
+ * ConditionalLockRelationOid+SearchSysCacheExists dance that we
+ * do below.
+ */
+ if (!cluster_is_permitted_for_relation(command, index->indrelid,
+ GetUserId()))
+ continue;
- rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
- rtc->tableOid = index->indrelid;
- rtc->indexOid = index->indexrelid;
- rtcs = lappend(rtcs, rtc);
+ /* Use a permanent memory context for the result list */
+ old_context = MemoryContextSwitchTo(permcxt);
- MemoryContextSwitchTo(old_context);
+ rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+ rtc->tableOid = index->indrelid;
+ rtc->indexOid = index->indexrelid;
+ rtcs = lappend(rtcs, rtc);
+
+ MemoryContextSwitchTo(old_context);
+ }
}
+ else
+ {
+ catalog = table_open(RelationRelationId, AccessShareLock);
+ scan = table_beginscan_catalog(catalog, 0, NULL);
+
+ while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+ {
+ RelToCluster *rtc;
+ Form_pg_class class;
+
+ class = (Form_pg_class) GETSTRUCT(tuple);
+
+ /*
+ * Try to obtain a light lock on the table, to ensure it doesn't
+ * go away while we collect the list. If we cannot, just
+ * disregard the table. XXX we could release at the bottom of the
+ * loop, but for now just hold it until this transaction is
+ * finished.
+ */
+ if (!ConditionalLockRelationOid(class->oid, AccessShareLock))
+ continue;
+
+ /* Verify that the table still exists. */
+ if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(class->oid)))
+ {
+ /* Release useless lock */
+ UnlockRelationOid(class->oid, AccessShareLock);
+ continue;
+ }
+
+ /* Can only process plain tables and matviews */
+ if (class->relkind != RELKIND_RELATION &&
+ class->relkind != RELKIND_MATVIEW)
+ continue;
+
+ if (!cluster_is_permitted_for_relation(command, class->oid,
+ GetUserId()))
+ continue;
+
+ /* Use a permanent memory context for the result list */
+ old_context = MemoryContextSwitchTo(permcxt);
+
+ rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+ rtc->tableOid = class->oid;
+ rtc->indexOid = InvalidOid;
+ rtcs = lappend(rtcs, rtc);
+
+ MemoryContextSwitchTo(old_context);
+ }
+ }
+
table_endscan(scan);
-
- relation_close(indRelation, AccessShareLock);
+ relation_close(catalog, AccessShareLock);
return rtcs;
}
/*
- * Given an index on a partitioned table, return a list of RelToCluster for
+ * Given a partitioned table or its index, return a list of RelToCluster for
* all the children leaves tables/indexes.
*
* Like expand_vacuum_rel, but here caller must hold AccessExclusiveLock
* on the table containing the index.
+ *
+ * 'rel_is_index' tells whether 'relid' is that of an index (true) or of the
+ * owning relation.
*/
static List *
-get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+get_tables_to_repack_partitioned(RepackCommand cmd, MemoryContext cluster_context,
+ Oid relid, bool rel_is_index)
{
List *inhoids;
ListCell *lc;
@@ -1702,17 +1797,33 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
MemoryContext old_context;
/* Do not lock the children until they're processed */
- inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+ inhoids = find_all_inheritors(relid, NoLock, NULL);
foreach(lc, inhoids)
{
- Oid indexrelid = lfirst_oid(lc);
- Oid relid = IndexGetRelation(indexrelid, false);
+ Oid inhoid = lfirst_oid(lc);
+ Oid inhrelid,
+ inhindid;
RelToCluster *rtc;
- /* consider only leaf indexes */
- if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
- continue;
+ if (rel_is_index)
+ {
+ /* consider only leaf indexes */
+ if (get_rel_relkind(inhoid) != RELKIND_INDEX)
+ continue;
+
+ inhrelid = IndexGetRelation(inhoid, false);
+ inhindid = inhoid;
+ }
+ else
+ {
+ /* consider only leaf relations */
+ if (get_rel_relkind(inhoid) != RELKIND_RELATION)
+ continue;
+
+ inhrelid = inhoid;
+ inhindid = InvalidOid;
+ }
/*
* It's possible that the user does not have privileges to CLUSTER the
@@ -1720,15 +1831,15 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
* table. We skip any partitions which the user is not permitted to
* CLUSTER.
*/
- if (!cluster_is_permitted_for_relation(relid, GetUserId()))
+ if (!cluster_is_permitted_for_relation(cmd, inhrelid, GetUserId()))
continue;
/* Use a permanent memory context for the result list */
old_context = MemoryContextSwitchTo(cluster_context);
rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
- rtc->tableOid = relid;
- rtc->indexOid = indexrelid;
+ rtc->tableOid = inhrelid;
+ rtc->indexOid = inhindid;
rtcs = lappend(rtcs, rtc);
MemoryContextSwitchTo(old_context);
@@ -1742,13 +1853,148 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
* function emits a WARNING.
*/
static bool
-cluster_is_permitted_for_relation(Oid relid, Oid userid)
+cluster_is_permitted_for_relation(RepackCommand cmd, Oid relid, Oid userid)
{
if (pg_class_aclcheck(relid, userid, ACL_MAINTAIN) == ACLCHECK_OK)
return true;
+ Assert(cmd == REPACK_COMMAND_CLUSTER || cmd == REPACK_COMMAND_REPACK);
ereport(WARNING,
- (errmsg("permission denied to cluster \"%s\", skipping it",
- get_rel_name(relid))));
+ errmsg("permission denied to execute %s on \"%s\", skipping it",
+ cmd == REPACK_COMMAND_CLUSTER ? "CLUSTER" : "REPACK",
+ get_rel_name(relid)));
+
return false;
}
+
+
+/*
+ * Given a RepackStmt with an indicated relation name, resolve the relation
+ * name, obtain lock on it, then determine what to do based on the relation
+ * type: if it's not a partitioned table, repack it as indicated (using an
+ * existing clustered index, or following the indicated index), and return
+ * NULL.
+ *
+ * On the other hand, if the table is partitioned, do nothing further and
+ * instead return the opened relcache entry, so that caller can process the
+ * partitions using the multiple-table handling code. The index name is not
+ * resolved in this case.
+ */
+static Relation
+process_single_relation(RepackStmt *stmt, ClusterParams *params)
+{
+ Relation rel;
+ Oid tableOid;
+
+ Assert(stmt->relation != NULL);
+ Assert(stmt->command == REPACK_COMMAND_CLUSTER ||
+ stmt->command == REPACK_COMMAND_REPACK);
+
+ /*
+ * Find, lock, and check permissions on the table. We obtain
+ * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+ * single-transaction case.
+ */
+ tableOid = RangeVarGetRelidExtended(stmt->relation,
+ AccessExclusiveLock,
+ 0,
+ RangeVarCallbackMaintainsTable,
+ NULL);
+ rel = table_open(tableOid, NoLock);
+
+ /*
+ * Reject clustering a remote temp table ... their local buffer manager is
+ * not going to cope.
+ */
+ if (RELATION_IS_OTHER_TEMP(rel))
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot execute %s on temporary tables of other sessions",
+ RepackCommandAsString(stmt->command)));
+ }
+
+ /*
+ * For partitioned tables, let caller handle this. Otherwise, process it
+ * here and we're done.
+ */
+ if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return rel;
+ else
+ {
+ Oid indexOid;
+
+ indexOid = determine_clustered_index(rel, stmt->usingindex,
+ stmt->indexname);
+ if (OidIsValid(indexOid))
+ check_index_is_clusterable(rel, indexOid, AccessExclusiveLock);
+ cluster_rel(stmt->command, stmt->usingindex, rel, indexOid, params);
+
+ /* Do an analyze, if requested */
+ if (params->options & CLUOPT_ANALYZE)
+ {
+ VacuumParams vac_params = {0};
+
+ vac_params.options |= VACOPT_ANALYZE;
+ if (params->options & CLUOPT_VERBOSE)
+ vac_params.options |= VACOPT_VERBOSE;
+ analyze_rel(RelationGetRelid(rel), NULL, vac_params, NIL, true,
+ NULL);
+ }
+
+ return NULL;
+ }
+}
+
+/*
+ * Given a relation and the usingindex/indexname options in a
+ * REPACK USING INDEX or CLUSTER command, return the OID of the index to use
+ * for clustering the table.
+ *
+ * Caller must hold lock on the relation so that the set of indexes doesn't
+ * change, and must call check_index_is_clusterable.
+ */
+static Oid
+determine_clustered_index(Relation rel, bool usingindex, const char *indexname)
+{
+ Oid indexOid;
+
+ if (indexname == NULL && usingindex)
+ {
+ ListCell *lc;
+
+ /* Find an index with indisclustered set, or report error */
+ foreach(lc, RelationGetIndexList(rel))
+ {
+ indexOid = lfirst_oid(lc);
+
+ if (get_index_isclustered(indexOid))
+ break;
+ indexOid = InvalidOid;
+ }
+
+ if (!OidIsValid(indexOid))
+ ereport(ERROR,
+ errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("there is no previously clustered index for table \"%s\"",
+ RelationGetRelationName(rel)));
+ }
+ else if (indexname != NULL)
+ {
+ /*
+ * An index was specified; figure out its OID. It must be in the same
+ * namespace as the relation.
+ */
+ indexOid = get_relname_relid(indexname,
+ rel->rd_rel->relnamespace);
+ if (!OidIsValid(indexOid))
+ ereport(ERROR,
+ errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("index \"%s\" for table \"%s\" does not exist",
+ indexname, RelationGetRelationName(rel)));
+ }
+ else
+ indexOid = InvalidOid;
+
+ return indexOid;
+}
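For anyone reading along, the three-way decision in determine_clustered_index() can be summarized in a small standalone sketch. This is only an illustration, not code from the patch: FakeIndex, PickStatus and pick_index() are hypothetical names, the index array stands in for RelationGetIndexList(), the name lookup here scans that array rather than going through get_relname_relid(), and the ereport(ERROR) paths are reduced to status codes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;			/* stand-in for the backend's Oid */
#define INVALID_OID ((Oid) 0)

/* Hypothetical stand-in for one entry of the relation's index list */
typedef struct FakeIndex
{
	Oid			oid;
	const char *name;
	bool		isclustered;	/* mirrors pg_index.indisclustered */
} FakeIndex;

typedef enum
{
	PICK_OK,					/* *result is valid, or INVALID_OID for "no index" */
	PICK_NO_CLUSTERED_INDEX,	/* USING INDEX given without a name, none marked */
	PICK_NO_SUCH_INDEX			/* named index not found */
} PickStatus;

static PickStatus
pick_index(const FakeIndex *indexes, size_t nindexes,
		   bool usingindex, const char *indexname, Oid *result)
{
	size_t		i;

	*result = INVALID_OID;

	if (indexname == NULL && usingindex)
	{
		/* USING INDEX without a name: use the previously clustered index */
		for (i = 0; i < nindexes; i++)
		{
			if (indexes[i].isclustered)
			{
				*result = indexes[i].oid;
				return PICK_OK;
			}
		}
		return PICK_NO_CLUSTERED_INDEX;
	}
	else if (indexname != NULL)
	{
		/* An index was named explicitly: look it up */
		for (i = 0; i < nindexes; i++)
		{
			if (strcmp(indexes[i].name, indexname) == 0)
			{
				*result = indexes[i].oid;
				return PICK_OK;
			}
		}
		return PICK_NO_SUCH_INDEX;
	}

	/* Neither a name nor USING INDEX: rewrite the table without an index */
	return PICK_OK;
}
```

With this shape, a plain "REPACK tbl" falls through to the last branch and yields no index, while "REPACK tbl USING INDEX" requires an index that is already marked as clustered.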
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 733ef40ae7c..8863ad0e8bd 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2287,7 +2287,8 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
cluster_params.options |= CLUOPT_VERBOSE;
/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
- cluster_rel(rel, InvalidOid, &cluster_params);
+ cluster_rel(REPACK_COMMAND_VACUUMFULL, false, rel, InvalidOid,
+ &cluster_params);
/* cluster_rel closes the relation, but keeps lock */
rel = NULL;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index db43034b9db..f9152728021 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -280,7 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
AlterCompositeTypeStmt AlterUserMappingStmt
AlterRoleStmt AlterRoleSetStmt AlterPolicyStmt AlterStatsStmt
AlterDefaultPrivilegesStmt DefACLAction
- AnalyzeStmt CallStmt ClosePortalStmt ClusterStmt CommentStmt
+ AnalyzeStmt CallStmt ClosePortalStmt CommentStmt
ConstraintsSetStmt CopyStmt CreateAsStmt CreateCastStmt
CreateDomainStmt CreateExtensionStmt CreateGroupStmt CreateOpClassStmt
CreateOpFamilyStmt AlterOpFamilyStmt CreatePLangStmt
@@ -297,7 +297,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
GrantStmt GrantRoleStmt ImportForeignSchemaStmt IndexStmt InsertStmt
ListenStmt LoadStmt LockStmt MergeStmt NotifyStmt ExplainableStmt PreparableStmt
CreateFunctionStmt AlterFunctionStmt ReindexStmt RemoveAggrStmt
- RemoveFuncStmt RemoveOperStmt RenameStmt ReturnStmt RevokeStmt RevokeRoleStmt
+ RemoveFuncStmt RemoveOperStmt RenameStmt RepackStmt ReturnStmt RevokeStmt RevokeRoleStmt
RuleActionStmt RuleActionStmtOrEmpty RuleStmt
SecLabelStmt SelectStmt TransactionStmt TransactionStmtLegacy TruncateStmt
UnlistenStmt UpdateStmt VacuumStmt
@@ -316,7 +316,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <str> opt_single_name
%type <list> opt_qualified_name
-%type <boolean> opt_concurrently
+%type <boolean> opt_concurrently opt_usingindex
%type <dbehavior> opt_drop_behavior
%type <list> opt_utility_option_list
%type <list> utility_option_list
@@ -763,7 +763,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
QUOTE QUOTES
RANGE READ REAL REASSIGN RECURSIVE REF_P REFERENCES REFERENCING
- REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
+ REFRESH REINDEX RELATIVE_P RELEASE RENAME REPACK REPEATABLE REPLACE REPLICA
RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROUTINES ROW ROWS RULE
@@ -1025,7 +1025,6 @@ stmt:
| CallStmt
| CheckPointStmt
| ClosePortalStmt
- | ClusterStmt
| CommentStmt
| ConstraintsSetStmt
| CopyStmt
@@ -1099,6 +1098,7 @@ stmt:
| RemoveFuncStmt
| RemoveOperStmt
| RenameStmt
+ | RepackStmt
| RevokeStmt
| RevokeRoleStmt
| RuleStmt
@@ -1135,6 +1135,11 @@ opt_concurrently:
| /*EMPTY*/ { $$ = false; }
;
+opt_usingindex:
+ USING INDEX { $$ = true; }
+ | /* EMPTY */ { $$ = false; }
+ ;
+
opt_drop_behavior:
CASCADE { $$ = DROP_CASCADE; }
| RESTRICT { $$ = DROP_RESTRICT; }
@@ -11912,38 +11917,91 @@ CreateConversionStmt:
/*****************************************************************************
*
* QUERY:
+ * REPACK [ (options) ] [ <qualified_name> [ USING INDEX <index_name> ] ]
+ *
+ * obsolete variants:
* CLUSTER (options) [ <qualified_name> [ USING <index_name> ] ]
* CLUSTER [VERBOSE] [ <qualified_name> [ USING <index_name> ] ]
* CLUSTER [VERBOSE] <index_name> ON <qualified_name> (for pre-8.3)
*
*****************************************************************************/
-ClusterStmt:
- CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+RepackStmt:
+ REPACK opt_utility_option_list qualified_name USING INDEX name
{
- ClusterStmt *n = makeNode(ClusterStmt);
+ RepackStmt *n = makeNode(RepackStmt);
+ n->command = REPACK_COMMAND_REPACK;
+ n->relation = $3;
+ n->indexname = $6;
+ n->usingindex = true;
+ n->params = $2;
+ $$ = (Node *) n;
+ }
+ | REPACK opt_utility_option_list qualified_name opt_usingindex
+ {
+ RepackStmt *n = makeNode(RepackStmt);
+
+ n->command = REPACK_COMMAND_REPACK;
+ n->relation = $3;
+ n->indexname = NULL;
+ n->usingindex = $4;
+ n->params = $2;
+ $$ = (Node *) n;
+ }
+ | REPACK '(' utility_option_list ')'
+ {
+ RepackStmt *n = makeNode(RepackStmt);
+
+ n->command = REPACK_COMMAND_REPACK;
+ n->relation = NULL;
+ n->indexname = NULL;
+ n->usingindex = false;
+ n->params = $3;
+ $$ = (Node *) n;
+ }
+ | REPACK opt_usingindex
+ {
+ RepackStmt *n = makeNode(RepackStmt);
+
+ n->command = REPACK_COMMAND_REPACK;
+ n->relation = NULL;
+ n->indexname = NULL;
+ n->usingindex = $2;
+ n->params = NIL;
+ $$ = (Node *) n;
+ }
+ | CLUSTER '(' utility_option_list ')' qualified_name cluster_index_specification
+ {
+ RepackStmt *n = makeNode(RepackStmt);
+
+ n->command = REPACK_COMMAND_CLUSTER;
n->relation = $5;
n->indexname = $6;
+ n->usingindex = true;
n->params = $3;
$$ = (Node *) n;
}
| CLUSTER opt_utility_option_list
{
- ClusterStmt *n = makeNode(ClusterStmt);
+ RepackStmt *n = makeNode(RepackStmt);
+ n->command = REPACK_COMMAND_CLUSTER;
n->relation = NULL;
n->indexname = NULL;
+ n->usingindex = true;
n->params = $2;
$$ = (Node *) n;
}
/* unparenthesized VERBOSE kept for pre-14 compatibility */
| CLUSTER opt_verbose qualified_name cluster_index_specification
{
- ClusterStmt *n = makeNode(ClusterStmt);
+ RepackStmt *n = makeNode(RepackStmt);
+ n->command = REPACK_COMMAND_CLUSTER;
n->relation = $3;
n->indexname = $4;
+ n->usingindex = true;
if ($2)
n->params = list_make1(makeDefElem("verbose", NULL, @2));
$$ = (Node *) n;
@@ -11951,20 +12009,24 @@ ClusterStmt:
/* unparenthesized VERBOSE kept for pre-17 compatibility */
| CLUSTER VERBOSE
{
- ClusterStmt *n = makeNode(ClusterStmt);
+ RepackStmt *n = makeNode(RepackStmt);
+ n->command = REPACK_COMMAND_CLUSTER;
n->relation = NULL;
n->indexname = NULL;
+ n->usingindex = true;
n->params = list_make1(makeDefElem("verbose", NULL, @2));
$$ = (Node *) n;
}
/* kept for pre-8.3 compatibility */
| CLUSTER opt_verbose name ON qualified_name
{
- ClusterStmt *n = makeNode(ClusterStmt);
+ RepackStmt *n = makeNode(RepackStmt);
+ n->command = REPACK_COMMAND_CLUSTER;
n->relation = $5;
n->indexname = $3;
+ n->usingindex = true;
if ($2)
n->params = list_make1(makeDefElem("verbose", NULL, @2));
$$ = (Node *) n;
@@ -17960,6 +18022,7 @@ unreserved_keyword:
| RELATIVE_P
| RELEASE
| RENAME
+ | REPACK
| REPEATABLE
| REPLACE
| REPLICA
@@ -18592,6 +18655,7 @@ bare_label_keyword:
| RELATIVE_P
| RELEASE
| RENAME
+ | REPACK
| REPEATABLE
| REPLACE
| REPLICA
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 5f442bc3bd4..cf6db581007 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -277,9 +277,9 @@ ClassifyUtilityCommandAsReadOnly(Node *parsetree)
return COMMAND_OK_IN_RECOVERY | COMMAND_OK_IN_READ_ONLY_TXN;
}
- case T_ClusterStmt:
case T_ReindexStmt:
case T_VacuumStmt:
+ case T_RepackStmt:
{
/*
* These commands write WAL, so they're not strictly
@@ -854,14 +854,14 @@ standard_ProcessUtility(PlannedStmt *pstmt,
ExecuteCallStmt(castNode(CallStmt, parsetree), params, isAtomicContext, dest);
break;
- case T_ClusterStmt:
- cluster(pstate, (ClusterStmt *) parsetree, isTopLevel);
- break;
-
case T_VacuumStmt:
ExecVacuum(pstate, (VacuumStmt *) parsetree, isTopLevel);
break;
+ case T_RepackStmt:
+ ExecRepack(pstate, (RepackStmt *) parsetree, isTopLevel);
+ break;
+
case T_ExplainStmt:
ExplainQuery(pstate, (ExplainStmt *) parsetree, params, dest);
break;
@@ -2851,10 +2851,6 @@ CreateCommandTag(Node *parsetree)
tag = CMDTAG_CALL;
break;
- case T_ClusterStmt:
- tag = CMDTAG_CLUSTER;
- break;
-
case T_VacuumStmt:
if (((VacuumStmt *) parsetree)->is_vacuumcmd)
tag = CMDTAG_VACUUM;
@@ -2862,6 +2858,10 @@ CreateCommandTag(Node *parsetree)
tag = CMDTAG_ANALYZE;
break;
+ case T_RepackStmt:
+ tag = CMDTAG_REPACK;
+ break;
+
case T_ExplainStmt:
tag = CMDTAG_EXPLAIN;
break;
@@ -3499,7 +3499,7 @@ GetCommandLogLevel(Node *parsetree)
lev = LOGSTMT_ALL;
break;
- case T_ClusterStmt:
+ case T_RepackStmt:
lev = LOGSTMT_DDL;
break;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index c756c2bebaa..a1e10e8c2f6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -268,6 +268,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
cmdtype = PROGRESS_COMMAND_ANALYZE;
else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
cmdtype = PROGRESS_COMMAND_CLUSTER;
+ else if (pg_strcasecmp(cmd, "REPACK") == 0)
+ cmdtype = PROGRESS_COMMAND_REPACK;
else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
cmdtype = PROGRESS_COMMAND_CREATE_INDEX;
else if (pg_strcasecmp(cmd, "BASEBACKUP") == 0)
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 8b10f2313f3..59ff6e0923b 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -1247,7 +1247,7 @@ static const char *const sql_commands[] = {
"DELETE FROM", "DISCARD", "DO", "DROP", "END", "EXECUTE", "EXPLAIN",
"FETCH", "GRANT", "IMPORT FOREIGN SCHEMA", "INSERT INTO", "LISTEN", "LOAD", "LOCK",
"MERGE INTO", "MOVE", "NOTIFY", "PREPARE",
- "REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE",
+ "REASSIGN", "REFRESH MATERIALIZED VIEW", "REINDEX", "RELEASE", "REPACK",
"RESET", "REVOKE", "ROLLBACK",
"SAVEPOINT", "SECURITY LABEL", "SELECT", "SET", "SHOW", "START",
"TABLE", "TRUNCATE", "UNLISTEN", "UPDATE", "VACUUM", "VALUES", "WITH",
@@ -4997,6 +4997,37 @@ match_previous_words(int pattern_id,
COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
}
+/* REPACK */
+ else if (Matches("REPACK"))
+ COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+ else if (Matches("REPACK", "(*)"))
+ COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_clusterables);
+ /* If we have REPACK <sth>, then add "USING INDEX" */
+ else if (Matches("REPACK", MatchAnyExcept("(")))
+ COMPLETE_WITH("USING INDEX");
+ /* If we have REPACK (*) <sth>, then add "USING INDEX" */
+ else if (Matches("REPACK", "(*)", MatchAny))
+ COMPLETE_WITH("USING INDEX");
+ /* If we have REPACK <sth> USING INDEX, then add the index as well */
+ else if (Matches("REPACK", MatchAny, "USING", "INDEX"))
+ {
+ set_completion_reference(prev3_wd);
+ COMPLETE_WITH_SCHEMA_QUERY(Query_for_index_of_table);
+ }
+ else if (HeadMatches("REPACK", "(*") &&
+ !HeadMatches("REPACK", "(*)"))
+ {
+ /*
+ * This fires if we're in an unfinished parenthesized option list.
+ * get_previous_words treats a completed parenthesized option list as
+ * one word, so the above test is correct.
+ */
+ if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
+ COMPLETE_WITH("VERBOSE");
+ else if (TailMatches("VERBOSE"))
+ COMPLETE_WITH("ON", "OFF");
+ }
+
/* SECURITY LABEL */
else if (Matches("SECURITY"))
COMPLETE_WITH("LABEL");
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index 019ca06455d..f0c1bd4175c 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -16,7 +16,7 @@ subdir = src/bin/scripts
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
-PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready
+PROGRAMS = createdb createuser dropdb dropuser clusterdb vacuumdb reindexdb pg_isready pg_repackdb
override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
@@ -31,6 +31,7 @@ clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport su
vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+pg_repackdb: pg_repackdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
install: all installdirs
$(INSTALL_PROGRAM) createdb$(X) '$(DESTDIR)$(bindir)'/createdb$(X)
@@ -41,6 +42,7 @@ install: all installdirs
$(INSTALL_PROGRAM) vacuumdb$(X) '$(DESTDIR)$(bindir)'/vacuumdb$(X)
$(INSTALL_PROGRAM) reindexdb$(X) '$(DESTDIR)$(bindir)'/reindexdb$(X)
$(INSTALL_PROGRAM) pg_isready$(X) '$(DESTDIR)$(bindir)'/pg_isready$(X)
+ $(INSTALL_PROGRAM) pg_repackdb$(X) '$(DESTDIR)$(bindir)'/pg_repackdb$(X)
installdirs:
$(MKDIR_P) '$(DESTDIR)$(bindir)'
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index a4fed59d1c9..18410fb80dd 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -42,6 +42,7 @@ vacuuming_common = static_library('libvacuuming_common',
binaries = [
'vacuumdb',
+ 'pg_repackdb'
]
foreach binary : binaries
binary_sources = files('@0@.c'.format(binary))
@@ -80,6 +81,7 @@ tests += {
't/100_vacuumdb.pl',
't/101_vacuumdb_all.pl',
't/102_vacuumdb_stages.pl',
+ 't/103_repackdb.pl',
't/200_connstr.pl',
],
},
diff --git a/src/bin/scripts/pg_repackdb.c b/src/bin/scripts/pg_repackdb.c
new file mode 100644
index 00000000000..23326372a77
--- /dev/null
+++ b/src/bin/scripts/pg_repackdb.c
@@ -0,0 +1,226 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_repackdb
+ * A utility to run REPACK
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * FIXME: this is missing a way to specify the index to use to repack one
+ * table, or whether to pass a USING INDEX clause when multiple tables are
+ * used. Something like --index[=indexname]. Adding that bleeds into
+ * vacuuming.c as well.
+ *
+ * src/bin/scripts/pg_repackdb.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "common.h"
+#include "common/logging.h"
+#include "fe_utils/option_utils.h"
+#include "vacuuming.h"
+
+static void help(const char *progname);
+void check_objfilter(void);
+
+int
+main(int argc, char *argv[])
+{
+ static struct option long_options[] = {
+ {"host", required_argument, NULL, 'h'},
+ {"port", required_argument, NULL, 'p'},
+ {"username", required_argument, NULL, 'U'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"echo", no_argument, NULL, 'e'},
+ {"quiet", no_argument, NULL, 'q'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"all", no_argument, NULL, 'a'},
+ {"table", required_argument, NULL, 't'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"jobs", required_argument, NULL, 'j'},
+ {"schema", required_argument, NULL, 'n'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"maintenance-db", required_argument, NULL, 2},
+ {NULL, 0, NULL, 0}
+ };
+
+ const char *progname;
+ int optindex;
+ int c;
+ const char *dbname = NULL;
+ const char *maintenance_db = NULL;
+ ConnParams cparams;
+ bool echo = false;
+ bool quiet = false;
+ vacuumingOptions vacopts;
+ SimpleStringList objects = {NULL, NULL};
+ int concurrentCons = 1;
+ int tbl_count = 0;
+
+ /* initialize options */
+ memset(&vacopts, 0, sizeof(vacopts));
+ vacopts.mode = MODE_REPACK;
+
+ /* the same for connection parameters */
+ memset(&cparams, 0, sizeof(cparams));
+ cparams.prompt_password = TRI_DEFAULT;
+
+ pg_logging_init(argv[0]);
+ progname = get_progname(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
+
+ handle_help_version_opts(argc, argv, progname, help);
+
+ while ((c = getopt_long(argc, argv, "ad:eh:j:n:N:p:qt:U:vwW",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'a':
+ objfilter |= OBJFILTER_ALL_DBS;
+ break;
+ case 'd':
+ objfilter |= OBJFILTER_DATABASE;
+ dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ echo = true;
+ break;
+ case 'h':
+ cparams.pghost = pg_strdup(optarg);
+ break;
+ case 'j':
+ if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
+ &concurrentCons))
+ exit(1);
+ break;
+ case 'n':
+ objfilter |= OBJFILTER_SCHEMA;
+ simple_string_list_append(&objects, optarg);
+ break;
+ case 'N':
+ objfilter |= OBJFILTER_SCHEMA_EXCLUDE;
+ simple_string_list_append(&objects, optarg);
+ break;
+ case 'p':
+ cparams.pgport = pg_strdup(optarg);
+ break;
+ case 'q':
+ quiet = true;
+ break;
+ case 't':
+ objfilter |= OBJFILTER_TABLE;
+ simple_string_list_append(&objects, optarg);
+ tbl_count++;
+ break;
+ case 'U':
+ cparams.pguser = pg_strdup(optarg);
+ break;
+ case 'v':
+ vacopts.verbose = true;
+ break;
+ case 'w':
+ cparams.prompt_password = TRI_NO;
+ break;
+ case 'W':
+ cparams.prompt_password = TRI_YES;
+ break;
+ case 2:
+ maintenance_db = pg_strdup(optarg);
+ break;
+ default:
+ /* getopt_long already emitted a complaint */
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+ }
+
+ /*
+ * Non-option argument specifies database name as long as it wasn't
+ * already specified with -d / --dbname
+ */
+ if (optind < argc && dbname == NULL)
+ {
+ objfilter |= OBJFILTER_DATABASE;
+ dbname = argv[optind];
+ optind++;
+ }
+
+ if (optind < argc)
+ {
+ pg_log_error("too many command-line arguments (first is \"%s\")",
+ argv[optind]);
+ pg_log_error_hint("Try \"%s --help\" for more information.", progname);
+ exit(1);
+ }
+
+ /*
+ * Validate the combination of filters specified in the command-line
+ * options.
+ */
+ check_objfilter();
+
+ vacuuming_main(&cparams, dbname, maintenance_db, &vacopts, &objects,
+ false, tbl_count, concurrentCons,
+ progname, echo, quiet);
+ exit(0);
+}
+
+/*
+ * Verify that the filters given on the command line are compatible.
+ */
+void
+check_objfilter(void)
+{
+ if ((objfilter & OBJFILTER_ALL_DBS) &&
+ (objfilter & OBJFILTER_DATABASE))
+ pg_fatal("cannot repack all databases and a specific one at the same time");
+
+ if ((objfilter & OBJFILTER_TABLE) &&
+ (objfilter & OBJFILTER_SCHEMA))
+ pg_fatal("cannot repack all tables in schema(s) and specific table(s) at the same time");
+
+ if ((objfilter & OBJFILTER_TABLE) &&
+ (objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+ pg_fatal("cannot repack specific table(s) and exclude schema(s) at the same time");
+
+ if ((objfilter & OBJFILTER_SCHEMA) &&
+ (objfilter & OBJFILTER_SCHEMA_EXCLUDE))
+ pg_fatal("cannot repack all tables in schema(s) and exclude schema(s) at the same time");
+}
+
+static void
+help(const char *progname)
+{
+ printf(_("%s repacks a PostgreSQL database.\n\n"), progname);
+ printf(_("Usage:\n"));
+ printf(_(" %s [OPTION]... [DBNAME]\n"), progname);
+ printf(_("\nOptions:\n"));
+ printf(_(" -a, --all repack all databases\n"));
+ printf(_(" -d, --dbname=DBNAME database to repack\n"));
+ printf(_(" -e, --echo show the commands being sent to the server\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to repack\n"));
+ printf(_(" -n, --schema=SCHEMA repack tables in the specified schema(s) only\n"));
+ printf(_(" -N, --exclude-schema=SCHEMA do not repack tables in the specified schema(s)\n"));
+ printf(_(" -q, --quiet don't write any messages\n"));
+ printf(_(" -t, --table='TABLE' repack specific table(s) only\n"));
+ printf(_(" -v, --verbose write a lot of output\n"));
+ printf(_(" -V, --version output version information, then exit\n"));
+ printf(_(" -?, --help show this help, then exit\n"));
+ printf(_("\nConnection options:\n"));
+ printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
+ printf(_(" -p, --port=PORT database server port\n"));
+ printf(_(" -U, --username=USERNAME user name to connect as\n"));
+ printf(_(" -w, --no-password never prompt for password\n"));
+ printf(_(" -W, --password force password prompt\n"));
+ printf(_(" --maintenance-db=DBNAME alternate maintenance database\n"));
+ printf(_("\nRead the description of the SQL command REPACK for details.\n"));
+ printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+ printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
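The filter validation in check_objfilter() is a set of pairwise bitmask tests. A minimal standalone sketch of the same idea follows. The FILTER_* values and objfilter_conflict() are hypothetical (the real OBJFILTER_* definitions are not part of this excerpt), and the sketch returns the message instead of calling pg_fatal() so that it can be exercised in isolation.

```c
#include <stddef.h>

/* Hypothetical flag values; the real OBJFILTER_* bits live elsewhere */
enum
{
	FILTER_ALL_DBS = 1 << 0,
	FILTER_DATABASE = 1 << 1,
	FILTER_TABLE = 1 << 2,
	FILTER_SCHEMA = 1 << 3,
	FILTER_SCHEMA_EXCLUDE = 1 << 4
};

/*
 * Return a description of the first incompatible pair of filters, or NULL
 * if the combination is acceptable.
 */
static const char *
objfilter_conflict(unsigned int filter)
{
	if ((filter & FILTER_ALL_DBS) && (filter & FILTER_DATABASE))
		return "cannot repack all databases and a specific one at the same time";

	if ((filter & FILTER_TABLE) && (filter & FILTER_SCHEMA))
		return "cannot repack all tables in schema(s) and specific table(s) at the same time";

	if ((filter & FILTER_TABLE) && (filter & FILTER_SCHEMA_EXCLUDE))
		return "cannot repack specific table(s) and exclude schema(s) at the same time";

	if ((filter & FILTER_SCHEMA) && (filter & FILTER_SCHEMA_EXCLUDE))
		return "cannot repack all tables in schema(s) and exclude schema(s) at the same time";

	return NULL;
}
```

Note that combinations not listed, such as all databases together with a schema filter, are accepted by these checks.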
diff --git a/src/bin/scripts/t/103_repackdb.pl b/src/bin/scripts/t/103_repackdb.pl
new file mode 100644
index 00000000000..51de4d7ab34
--- /dev/null
+++ b/src/bin/scripts/t/103_repackdb.pl
@@ -0,0 +1,24 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('pg_repackdb');
+program_version_ok('pg_repackdb');
+program_options_handling_ok('pg_repackdb');
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+$node->issues_sql_like(
+ [ 'pg_repackdb', 'postgres' ],
+ qr/statement: REPACK.*;/,
+ 'SQL REPACK run');
+
+
+done_testing();
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
index 9be37fcc45a..e07071c38ee 100644
--- a/src/bin/scripts/vacuuming.c
+++ b/src/bin/scripts/vacuuming.c
@@ -1,6 +1,6 @@
/*-------------------------------------------------------------------------
* vacuuming.c
- * Common routines for vacuumdb
+ * Common routines for vacuumdb and pg_repackdb
*
* Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -166,6 +166,14 @@ vacuum_one_database(ConnParams *cparams,
conn = connectDatabase(cparams, progname, echo, false, true);
+ if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+ {
+ /* XXX arguably, here we should use VACUUM FULL instead of failing */
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+ "REPACK", "19");
+ }
+
if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
{
PQfinish(conn);
@@ -258,9 +266,15 @@ vacuum_one_database(ConnParams *cparams,
if (stage != ANALYZE_NO_STAGE)
printf(_("%s: processing database \"%s\": %s\n"),
progname, PQdb(conn), _(stage_messages[stage]));
- else
+ else if (vacopts->mode == MODE_VACUUM)
printf(_("%s: vacuuming database \"%s\"\n"),
progname, PQdb(conn));
+ else
+ {
+ Assert(vacopts->mode == MODE_REPACK);
+ printf(_("%s: repacking database \"%s\"\n"),
+ progname, PQdb(conn));
+ }
fflush(stdout);
}
@@ -350,7 +364,7 @@ vacuum_one_database(ConnParams *cparams,
* through ParallelSlotsGetIdle.
*/
ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, sql.data,
+ run_vacuum_command(free_slot->connection, vacopts, sql.data,
echo, tabname);
cell = cell->next;
@@ -363,7 +377,7 @@ vacuum_one_database(ConnParams *cparams,
}
/* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
- if (vacopts->skip_database_stats &&
+ if (vacopts->mode == MODE_VACUUM && vacopts->skip_database_stats &&
stage == ANALYZE_NO_STAGE)
{
const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
@@ -376,7 +390,7 @@ vacuum_one_database(ConnParams *cparams,
}
ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+ run_vacuum_command(free_slot->connection, vacopts, cmd, echo, NULL);
if (!ParallelSlotsWaitCompletion(sa))
failed = true;
@@ -708,6 +722,12 @@ vacuum_all_databases(ConnParams *cparams,
int i;
conn = connectMaintenanceDatabase(cparams, progname, echo);
+ if (vacopts->mode == MODE_REPACK && PQserverVersion(conn) < 190000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" command on server versions older than PostgreSQL %s",
+ "REPACK", "19");
+ }
result = executeQuery(conn,
"SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
echo);
@@ -761,7 +781,7 @@ vacuum_all_databases(ConnParams *cparams,
}
/*
- * Construct a vacuum/analyze command to run based on the given
+ * Construct a vacuum/analyze/repack command to run based on the given
* options, in the given string buffer, which may contain previous garbage.
*
* The table name used must be already properly quoted. The command generated
@@ -777,7 +797,13 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
resetPQExpBuffer(sql);
- if (vacopts->analyze_only)
+ if (vacopts->mode == MODE_REPACK)
+ {
+ appendPQExpBufferStr(sql, "REPACK");
+ if (vacopts->verbose)
+ appendPQExpBufferStr(sql, " (VERBOSE)");
+ }
+ else if (vacopts->analyze_only)
{
appendPQExpBufferStr(sql, "ANALYZE");
@@ -938,8 +964,8 @@ prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
* Any errors during command execution are reported to stderr.
*/
void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
- const char *table)
+run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+ const char *sql, bool echo, const char *table)
{
bool status;
@@ -952,13 +978,21 @@ run_vacuum_command(PGconn *conn, const char *sql, bool echo,
{
if (table)
{
- pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
- table, PQdb(conn), PQerrorMessage(conn));
+ if (vacopts->mode == MODE_VACUUM)
+ pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+ table, PQdb(conn), PQerrorMessage(conn));
+ else
+ pg_log_error("repacking of table \"%s\" in database \"%s\" failed: %s",
+ table, PQdb(conn), PQerrorMessage(conn));
}
else
{
- pg_log_error("vacuuming of database \"%s\" failed: %s",
- PQdb(conn), PQerrorMessage(conn));
+ if (vacopts->mode == MODE_VACUUM)
+ pg_log_error("vacuuming of database \"%s\" failed: %s",
+ PQdb(conn), PQerrorMessage(conn));
+ else
+ pg_log_error("repacking of database \"%s\" failed: %s",
+ PQdb(conn), PQerrorMessage(conn));
}
}
}
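To illustrate the client-side behaviour added to vacuuming.c, here is a rough standalone sketch of the version gate plus the command string. It is an approximation under stated assumptions: build_repack_command() and REPACK_MIN_SERVER_VERSION are hypothetical names, plain snprintf() replaces PQExpBuffer, and appending the table name and the trailing semicolon is assumed from the rest of prepare_vacuum_command(), which is not shown in this hunk. The 190000 threshold follows the PQserverVersion() encoding for PostgreSQL 10 and later (major * 10000 + minor).

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed minimum: PQserverVersion() reports 19.0 as 190000 */
#define REPACK_MIN_SERVER_VERSION 190000

/*
 * Build "REPACK [(VERBOSE)] [table];" into buf.  Returns false if the server
 * is too old for REPACK or the buffer is too small.  quoted_table may be
 * NULL; if given, it must already be properly quoted.
 */
static bool
build_repack_command(char *buf, size_t buflen, int server_version,
					 bool verbose, const char *quoted_table)
{
	int			n;

	if (server_version < REPACK_MIN_SERVER_VERSION)
		return false;

	n = snprintf(buf, buflen, "REPACK%s%s%s;",
				 verbose ? " (VERBOSE)" : "",
				 quoted_table ? " " : "",
				 quoted_table ? quoted_table : "");

	/* snprintf reports the length it wanted; reject truncated output */
	return n >= 0 && (size_t) n < buflen;
}
```

A server reporting 180003 (18.3) is rejected, which corresponds to the pg_fatal() path in the patch.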
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
index d3f000840fa..154bc9925c0 100644
--- a/src/bin/scripts/vacuuming.h
+++ b/src/bin/scripts/vacuuming.h
@@ -17,6 +17,12 @@
#include "fe_utils/connect_utils.h"
#include "fe_utils/simple_list.h"
+typedef enum
+{
+ MODE_VACUUM,
+ MODE_REPACK
+} RunMode;
+
/* For analyze-in-stages mode */
#define ANALYZE_NO_STAGE -1
#define ANALYZE_NUM_STAGES 3
@@ -24,6 +30,7 @@
/* vacuum options controlled by user flags */
typedef struct vacuumingOptions
{
+ RunMode mode;
bool analyze_only;
bool verbose;
bool and_analyze;
@@ -87,8 +94,8 @@ extern void vacuum_all_databases(ConnParams *cparams,
extern void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
vacuumingOptions *vacopts, const char *table);
-extern void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
- const char *table);
+extern void run_vacuum_command(PGconn *conn, vacuumingOptions *vacopts,
+ const char *sql, bool echo, const char *table);
extern char *escape_quotes(const char *src);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 60088a64cbb..890998d84bb 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -24,6 +24,7 @@
#define CLUOPT_RECHECK 0x02 /* recheck relation state */
#define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
* indisclustered */
+#define CLUOPT_ANALYZE 0x08 /* do an ANALYZE */
/* options for CLUSTER */
typedef struct ClusterParams
@@ -31,8 +32,11 @@ typedef struct ClusterParams
bits32 options; /* bitmask of CLUOPT_* */
} ClusterParams;
-extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
-extern void cluster_rel(Relation OldHeap, Oid indexOid, ClusterParams *params);
+
+extern void ExecRepack(ParseState *pstate, RepackStmt *stmt, bool isTopLevel);
+
+extern void cluster_rel(RepackCommand command, bool usingindex,
+ Relation OldHeap, Oid indexOid, ClusterParams *params);
extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
LOCKMODE lockmode);
extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 1cde4bd9bcf..5b6639c114c 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -56,24 +56,51 @@
#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS 4
#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE 5
-/* Progress parameters for cluster */
-#define PROGRESS_CLUSTER_COMMAND 0
-#define PROGRESS_CLUSTER_PHASE 1
-#define PROGRESS_CLUSTER_INDEX_RELID 2
-#define PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED 3
-#define PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN 4
-#define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS 5
-#define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED 6
-#define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT 7
+/*
+ * Progress parameters for REPACK.
+ *
+ * Note: Since REPACK shares some code with CLUSTER, these values are also
+ * used by CLUSTER. (CLUSTER is now deprecated, so it makes little sense to
+ * introduce a separate set of constants.)
+ */
+#define PROGRESS_REPACK_COMMAND 0
+#define PROGRESS_REPACK_PHASE 1
+#define PROGRESS_REPACK_INDEX_RELID 2
+#define PROGRESS_REPACK_HEAP_TUPLES_SCANNED 3
+#define PROGRESS_REPACK_HEAP_TUPLES_WRITTEN 4
+#define PROGRESS_REPACK_TOTAL_HEAP_BLKS 5
+#define PROGRESS_REPACK_HEAP_BLKS_SCANNED 6
+#define PROGRESS_REPACK_INDEX_REBUILD_COUNT 7
-/* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
-#define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP 1
-#define PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP 2
-#define PROGRESS_CLUSTER_PHASE_SORT_TUPLES 3
-#define PROGRESS_CLUSTER_PHASE_WRITE_NEW_HEAP 4
-#define PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES 5
-#define PROGRESS_CLUSTER_PHASE_REBUILD_INDEX 6
-#define PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP 7
+/*
+ * Phases of repack (as advertised via PROGRESS_REPACK_PHASE).
+ */
+#define PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP 1
+#define PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP 2
+#define PROGRESS_REPACK_PHASE_SORT_TUPLES 3
+#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP 4
+#define PROGRESS_REPACK_PHASE_SWAP_REL_FILES 5
+#define PROGRESS_REPACK_PHASE_REBUILD_INDEX 6
+#define PROGRESS_REPACK_PHASE_FINAL_CLEANUP 7
+
+/*
+ * Commands of PROGRESS_REPACK
+ *
+ * Currently we only have one command, so the PROGRESS_REPACK_COMMAND
+ * parameter is not necessary. However, it makes cluster.c simpler if we have
+ * the same set of parameters for CLUSTER and REPACK - see the note on REPACK
+ * parameters above.
+ */
+#define PROGRESS_REPACK_COMMAND_REPACK 1
+
+/*
+ * Progress parameters for cluster.
+ *
+ * Although we need to report REPACK and CLUSTER in separate views, the
+ * parameters and phases of CLUSTER are a subset of those of REPACK. Therefore
+ * we just use the appropriate values defined for REPACK above instead of
+ * defining a separate set of constants here.
+ */
/* Commands of PROGRESS_CLUSTER */
#define PROGRESS_CLUSTER_COMMAND_CLUSTER 1
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 86a236bd58b..fcc25a0c592 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3949,16 +3949,26 @@ typedef struct AlterSystemStmt
} AlterSystemStmt;
/* ----------------------
- * Cluster Statement (support pbrown's cluster index implementation)
+ * Repack Statement
* ----------------------
*/
-typedef struct ClusterStmt
+typedef enum RepackCommand
+{
+ REPACK_COMMAND_CLUSTER,
+ REPACK_COMMAND_REPACK,
+ REPACK_COMMAND_VACUUMFULL,
+} RepackCommand;
+
+typedef struct RepackStmt
{
NodeTag type;
- RangeVar *relation; /* relation being indexed, or NULL if all */
- char *indexname; /* original index defined */
+ RepackCommand command; /* type of command being run */
+ RangeVar *relation; /* relation being repacked */
+ char *indexname; /* order tuples by this index */
+ bool usingindex; /* whether USING INDEX is specified */
List *params; /* list of DefElem nodes */
-} ClusterStmt;
+} RepackStmt;
+
/* ----------------------
* Vacuum and Analyze Statements
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index a4af3f717a1..22559369e2c 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -374,6 +374,7 @@ PG_KEYWORD("reindex", REINDEX, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("relative", RELATIVE_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("release", RELEASE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("rename", RENAME, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("repack", REPACK, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index d250a714d59..cceb312f2b3 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -196,6 +196,7 @@ PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
+PG_CMDTAG(CMDTAG_REPACK, "REPACK", false, false, false)
PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
PG_CMDTAG(CMDTAG_REVOKE_ROLE, "REVOKE ROLE", false, false, false)
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index dda813ab407..e69e366dcdc 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -28,6 +28,7 @@ typedef enum ProgressCommandType
PROGRESS_COMMAND_CREATE_INDEX,
PROGRESS_COMMAND_BASEBACKUP,
PROGRESS_COMMAND_COPY,
+ PROGRESS_COMMAND_REPACK,
} ProgressCommandType;
#define PGSTAT_NUM_PROGRESS_PARAM 20
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 4d40a6809ab..5256628b51d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -254,6 +254,63 @@ ORDER BY 1;
clstr_tst_pkey
(3 rows)
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+ a | b | c | substring | length
+----+-----+------------------+--------------------------------+--------
+ 10 | 14 | catorce | |
+ 18 | 5 | cinco | |
+ 9 | 4 | cuatro | |
+ 26 | 19 | diecinueve | |
+ 12 | 18 | dieciocho | |
+ 30 | 16 | dieciseis | |
+ 24 | 17 | diecisiete | |
+ 2 | 10 | diez | |
+ 23 | 12 | doce | |
+ 11 | 2 | dos | |
+ 25 | 9 | nueve | |
+ 31 | 8 | ocho | |
+ 1 | 11 | once | |
+ 28 | 15 | quince | |
+ 32 | 6 | seis | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzy | 500000
+ 29 | 7 | siete | |
+ 15 | 13 | trece | |
+ 22 | 30 | treinta | |
+ 17 | 32 | treinta y dos | |
+ 3 | 31 | treinta y uno | |
+ 5 | 3 | tres | |
+ 20 | 1 | uno | |
+ 6 | 20 | veinte | |
+ 14 | 25 | veinticinco | |
+ 21 | 24 | veinticuatro | |
+ 4 | 22 | veintidos | |
+ 19 | 29 | veintinueve | |
+ 16 | 28 | veintiocho | |
+ 27 | 26 | veintiseis | |
+ 13 | 27 | veintisiete | |
+ 7 | 23 | veintitres | |
+ 8 | 21 | veintiuno | |
+ 0 | 100 | in child table | |
+ 0 | 100 | in child table 2 | |
+(34 rows)
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+ERROR: insert or update on table "clstr_tst" violates foreign key constraint "clstr_tst_con"
+DETAIL: Key (b)=(1111) is not present in table "clstr_tst_s".
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
+ conname
+----------------------
+ clstr_tst_a_not_null
+ clstr_tst_con
+ clstr_tst_pkey
+(3 rows)
+
SELECT relname, relkind,
EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
FROM pg_class c WHERE relname LIKE 'clstr_tst%' ORDER BY relname;
@@ -381,6 +438,35 @@ SELECT * FROM clstr_1;
2
(2 rows)
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR; -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because there is nothing like clustering index here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+ relname
+---------
+ clstr_1
+ clstr_3
+(2 rows)
+
+SET SESSION AUTHORIZATION regress_clstr_user;
-- Test MVCC-safety of cluster. There isn't much we can do to verify the
-- results with a single backend...
CREATE TABLE clustertest (key int PRIMARY KEY);
@@ -495,6 +581,43 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
ERROR: cannot mark index clustered in partitioned table
ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
ERROR: cannot mark index clustered in partitioned table
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+ relname | level | relkind | ?column?
+-------------+-------+---------+----------
+ clstrpart | 0 | p | t
+ clstrpart1 | 1 | p | t
+ clstrpart11 | 2 | r | f
+ clstrpart12 | 2 | p | t
+ clstrpart2 | 1 | r | f
+ clstrpart3 | 1 | p | t
+ clstrpart33 | 2 | r | f
+(7 rows)
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+ relname | level | relkind | ?column?
+-------------+-------+---------+----------
+ clstrpart | 0 | p | t
+ clstrpart1 | 1 | p | t
+ clstrpart11 | 2 | r | f
+ clstrpart12 | 2 | p | t
+ clstrpart2 | 1 | r | f
+ clstrpart3 | 1 | p | t
+ clstrpart33 | 2 | r | f
+(7 rows)
+
DROP TABLE clstrpart;
-- Ownership of partitions is checked
CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
@@ -513,7 +636,7 @@ CREATE TEMP TABLE ptnowner_oldnodes AS
JOIN pg_class AS c ON c.oid=tree.relid;
SET SESSION AUTHORIZATION regress_ptnowner;
CLUSTER ptnowner USING ptnowner_i_idx;
-WARNING: permission denied to cluster "ptnowner2", skipping it
+WARNING: permission denied to execute CLUSTER on "ptnowner2", skipping it
RESET SESSION AUTHORIZATION;
SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a
JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 35e8aad7701..3a1d1d28282 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2071,6 +2071,29 @@ pg_stat_progress_create_index| SELECT s.pid,
s.param15 AS partitions_done
FROM (pg_stat_get_progress_info('CREATE INDEX'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
LEFT JOIN pg_database d ON ((s.datid = d.oid)));
+pg_stat_progress_repack| SELECT s.pid,
+ s.datid,
+ d.datname,
+ s.relid,
+ CASE s.param2
+ WHEN 0 THEN 'initializing'::text
+ WHEN 1 THEN 'seq scanning heap'::text
+ WHEN 2 THEN 'index scanning heap'::text
+ WHEN 3 THEN 'sorting tuples'::text
+ WHEN 4 THEN 'writing new heap'::text
+ WHEN 5 THEN 'swapping relation files'::text
+ WHEN 6 THEN 'rebuilding index'::text
+ WHEN 7 THEN 'performing final cleanup'::text
+ ELSE NULL::text
+ END AS phase,
+ (s.param3)::oid AS repack_index_relid,
+ s.param4 AS heap_tuples_scanned,
+ s.param5 AS heap_tuples_written,
+ s.param6 AS heap_blks_total,
+ s.param7 AS heap_blks_scanned,
+ s.param8 AS index_rebuild_count
+ FROM (pg_stat_get_progress_info('REPACK'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+ LEFT JOIN pg_database d ON ((s.datid = d.oid)));
pg_stat_progress_vacuum| SELECT s.pid,
s.datid,
d.datname,
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b7115f86104..cfcc3dc9761 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -76,6 +76,19 @@ INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
ORDER BY 1;
+-- REPACK handles individual tables identically to CLUSTER, but it's worth
+-- checking if it handles table hierarchies identically as well.
+REPACK clstr_tst USING INDEX clstr_tst_c;
+
+-- Verify that inheritance link still works
+INSERT INTO clstr_tst_inh VALUES (0, 100, 'in child table 2');
+SELECT a,b,c,substring(d for 30), length(d) from clstr_tst;
+
+-- Verify that foreign key link still works
+INSERT INTO clstr_tst (b, c) VALUES (1111, 'this should fail');
+
+SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass
+ORDER BY 1;
SELECT relname, relkind,
EXISTS(SELECT 1 FROM pg_class WHERE oid = c.reltoastrelid) AS hastoast
@@ -159,6 +172,34 @@ INSERT INTO clstr_1 VALUES (1);
CLUSTER clstr_1;
SELECT * FROM clstr_1;
+-- REPACK w/o argument performs no ordering, so we can only check which tables
+-- have the relfilenode changed.
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_old AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+SET client_min_messages = ERROR; -- order of "skipping" warnings may vary
+REPACK;
+RESET client_min_messages;
+
+RESET SESSION AUTHORIZATION;
+CREATE TEMP TABLE relnodes_new AS
+(SELECT relname, relfilenode
+FROM pg_class
+WHERE relname IN ('clstr_1', 'clstr_2', 'clstr_3'));
+
+-- Do the actual comparison. Unlike CLUSTER, clstr_3 should have been
+-- processed because there is nothing like clustering index here.
+SELECT o.relname FROM relnodes_old o
+JOIN relnodes_new n ON o.relname = n.relname
+WHERE o.relfilenode <> n.relfilenode
+ORDER BY o.relname;
+
+SET SESSION AUTHORIZATION regress_clstr_user;
+
-- Test MVCC-safety of cluster. There isn't much we can do to verify the
-- results with a single backend...
@@ -229,6 +270,24 @@ SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM o
CLUSTER clstrpart;
ALTER TABLE clstrpart SET WITHOUT CLUSTER;
ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+
+-- Check that REPACK sets new relfilenodes: it should process exactly the same
+-- tables as CLUSTER did.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart USING INDEX clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
+-- And finally the same for REPACK w/o index.
+DROP TABLE old_cluster_info;
+DROP TABLE new_cluster_info;
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+REPACK clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+
DROP TABLE clstrpart;
-- Ownership of partitions is checked
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..98242e25432 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2537,6 +2537,8 @@ ReorderBufferTupleCidKey
ReorderBufferUpdateProgressTxnCB
ReorderTuple
RepOriginId
+RepackCommand
+RepackStmt
ReparameterizeForeignPathByChild_function
ReplaceVarsFromTargetList_context
ReplaceVarsNoMatchOption
@@ -2603,6 +2605,7 @@ RtlNtStatusToDosError_t
RuleInfo
RuleLock
RuleStmt
+RunMode
RunningTransactions
RunningTransactionsData
SASLStatus
--
2.43.0
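A note on reading the progress hunks in the patch above: the PROGRESS_REPACK_* macros are 0-based slot numbers, while pg_stat_get_progress_info() exposes the slots as param1..param20, so PROGRESS_REPACK_PHASE (1) shows up as s.param2 in the pg_stat_progress_repack view and PROGRESS_REPACK_INDEX_RELID (2) as s.param3. A minimal C sketch of that mapping follows. Both helper functions are hypothetical names made up for illustration and are not part of the patch; only the constants and the phase strings are taken from the hunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values copied from the progress.h and backend_progress.h hunks. */
#define PGSTAT_NUM_PROGRESS_PARAM				20
#define PROGRESS_REPACK_PHASE					1
#define PROGRESS_REPACK_INDEX_RELID				2
#define PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP	4

/*
 * Hypothetical helper (not in the patch): write the name of the view
 * column ("paramN") that carries the 0-based progress slot 'index'.
 */
static void
progress_param_column(int index, char *buf, size_t buflen)
{
	assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
	snprintf(buf, buflen, "param%d", index + 1);
}

/*
 * Hypothetical helper (not in the patch): phase code to phase text,
 * mirroring the CASE expression of pg_stat_progress_repack.  Returns NULL
 * for codes the view would also report as NULL.
 */
static const char *
repack_phase_name(long phase)
{
	static const char *const names[] = {
		"initializing",				/* 0: no macro, set before the first phase */
		"seq scanning heap",		/* PROGRESS_REPACK_PHASE_SEQ_SCAN_HEAP */
		"index scanning heap",		/* PROGRESS_REPACK_PHASE_INDEX_SCAN_HEAP */
		"sorting tuples",			/* PROGRESS_REPACK_PHASE_SORT_TUPLES */
		"writing new heap",			/* PROGRESS_REPACK_PHASE_WRITE_NEW_HEAP */
		"swapping relation files",	/* PROGRESS_REPACK_PHASE_SWAP_REL_FILES */
		"rebuilding index",			/* PROGRESS_REPACK_PHASE_REBUILD_INDEX */
		"performing final cleanup",	/* PROGRESS_REPACK_PHASE_FINAL_CLEANUP */
	};

	if (phase < 0 || phase >= (long) (sizeof(names) / sizeof(names[0])))
		return NULL;
	return names[phase];
}
```

Calling progress_param_column(PROGRESS_REPACK_PHASE, buf, sizeof(buf)) fills buf with "param2", which is the column the view's CASE expression inspects.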
[application/octet-stream] v21-0004-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch (5.4K, 6-v21-0004-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch)
From b9384aa62c96c94d45bb7e97a56acda5590f0c5f Mon Sep 17 00:00:00 2001
From: Antonin Houska <[email protected]>
Date: Mon, 11 Aug 2025 15:23:05 +0200
Subject: [PATCH v21 4/6] Move conversion of a "historic" to MVCC snapshot to a
separate function.
The conversion is now handled by SnapBuildMVCCFromHistoric(). REPACK
CONCURRENTLY will also need it.
---
src/backend/replication/logical/snapbuild.c | 51 +++++++++++++++++----
src/backend/utils/time/snapmgr.c | 3 +-
src/include/replication/snapbuild.h | 1 +
src/include/utils/snapmgr.h | 1 +
4 files changed, 45 insertions(+), 11 deletions(-)
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 98ddee20929..a2f1803622c 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -440,10 +440,7 @@ Snapshot
SnapBuildInitialSnapshot(SnapBuild *builder)
{
Snapshot snap;
- TransactionId xid;
TransactionId safeXid;
- TransactionId *newxip;
- int newxcnt = 0;
Assert(XactIsoLevel == XACT_REPEATABLE_READ);
Assert(builder->building_full_snapshot);
@@ -485,6 +482,31 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
MyProc->xmin = snap->xmin;
+ /* Convert the historic snapshot to MVCC snapshot. */
+ return SnapBuildMVCCFromHistoric(snap, true);
+}
+
+/*
+ * Turn a historic MVCC snapshot into an ordinary MVCC snapshot.
+ *
+ * Unlike a regular (non-historic) MVCC snapshot, the xip array of this
+ * snapshot contains not only running main transactions, but also their
+ * subtransactions. This difference has no impact on XidInMVCCSnapshot().
+ *
+ * Pass true for 'in_place' if you don't care about modifying the source
+ * snapshot. If you need a new instance, and one that was allocated as a
+ * single chunk of memory, pass false.
+ */
+Snapshot
+SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place)
+{
+ TransactionId xid;
+ TransactionId *oldxip = snapshot->xip;
+ uint32 oldxcnt = snapshot->xcnt;
+ TransactionId *newxip;
+ int newxcnt = 0;
+ Snapshot result;
+
/* allocate in transaction context */
newxip = (TransactionId *)
palloc(sizeof(TransactionId) * GetMaxSnapshotXidCount());
@@ -495,7 +517,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
* classical snapshot by marking all non-committed transactions as
* in-progress. This can be expensive.
*/
- for (xid = snap->xmin; NormalTransactionIdPrecedes(xid, snap->xmax);)
+ for (xid = snapshot->xmin; NormalTransactionIdPrecedes(xid, snapshot->xmax);)
{
void *test;
@@ -503,7 +525,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
* Check whether transaction committed using the decoding snapshot
* meaning of ->xip.
*/
- test = bsearch(&xid, snap->xip, snap->xcnt,
+ test = bsearch(&xid, snapshot->xip, snapshot->xcnt,
sizeof(TransactionId), xidComparator);
if (test == NULL)
@@ -520,11 +542,22 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
}
/* adjust remaining snapshot fields as needed */
- snap->snapshot_type = SNAPSHOT_MVCC;
- snap->xcnt = newxcnt;
- snap->xip = newxip;
+ snapshot->xcnt = newxcnt;
+ snapshot->xip = newxip;
- return snap;
+ if (in_place)
+ result = snapshot;
+ else
+ {
+ result = CopySnapshot(snapshot);
+
+ /* Restore the original values so the source is intact. */
+ snapshot->xip = oldxip;
+ snapshot->xcnt = oldxcnt;
+ }
+ result->snapshot_type = SNAPSHOT_MVCC;
+
+ return result;
}
/*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 65561cc6bc3..bc7840052fe 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -212,7 +212,6 @@ typedef struct ExportedSnapshot
static List *exportedSnapshots = NIL;
/* Prototypes for local functions */
-static Snapshot CopySnapshot(Snapshot snapshot);
static void UnregisterSnapshotNoOwner(Snapshot snapshot);
static void FreeSnapshot(Snapshot snapshot);
static void SnapshotResetXmin(void);
@@ -602,7 +601,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
* The copy is palloc'd in TopTransactionContext and has initial refcounts set
* to 0. The returned snapshot has the copied flag set.
*/
-static Snapshot
+Snapshot
CopySnapshot(Snapshot snapshot)
{
Snapshot newsnap;
diff --git a/src/include/replication/snapbuild.h b/src/include/replication/snapbuild.h
index 44031dcf6e3..6d4d2d1814c 100644
--- a/src/include/replication/snapbuild.h
+++ b/src/include/replication/snapbuild.h
@@ -73,6 +73,7 @@ extern void FreeSnapshotBuilder(SnapBuild *builder);
extern void SnapBuildSnapDecRefcount(Snapshot snap);
extern Snapshot SnapBuildInitialSnapshot(SnapBuild *builder);
+extern Snapshot SnapBuildMVCCFromHistoric(Snapshot snapshot, bool in_place);
extern const char *SnapBuildExportSnapshot(SnapBuild *builder);
extern void SnapBuildClearExportedSnapshot(void);
extern void SnapBuildResetExportedSnapshotState(void);
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 604c1f90216..f65f83c85cd 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -63,6 +63,7 @@ extern Snapshot GetTransactionSnapshot(void);
extern Snapshot GetLatestSnapshot(void);
extern void SnapshotSetCommandId(CommandId curcid);
+extern Snapshot CopySnapshot(Snapshot snapshot);
extern Snapshot GetCatalogSnapshot(Oid relid);
extern Snapshot GetNonHistoricCatalogSnapshot(Oid relid);
extern void InvalidateCatalogSnapshot(void);
--
2.43.0
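For readers following the 0004 patch above: the core of SnapBuildMVCCFromHistoric() is an inversion of the xip array. In a historic snapshot xip lists the committed transactions between xmin and xmax, whereas an MVCC snapshot wants the ones still in progress, so the loop walks the range and keeps every XID that bsearch() does not find. The following is only a simplified, standalone sketch of that idea. The type and function names are invented for illustration, plain integer comparison stands in for NormalTransactionIdPrecedes() (so XID wraparound is ignored), and the sketch simply stops when the output buffer is full.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Xid;			/* stand-in for TransactionId */

/* Comparator for bsearch(); plays the role of xidComparator. */
static int
xid_cmp(const void *a, const void *b)
{
	Xid			x = *(const Xid *) a;
	Xid			y = *(const Xid *) b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper (not in the patch): 'committed' is the sorted xip
 * array of a historic snapshot.  Store in 'running' every XID in
 * [xmin, xmax) that is absent from it, and return how many were stored.
 */
static size_t
invert_committed_xids(Xid xmin, Xid xmax,
					  const Xid *committed, size_t ncommitted,
					  Xid *running, size_t maxrunning)
{
	size_t		n = 0;

	assert(xmin <= xmax);		/* no wraparound handling in this sketch */

	for (Xid xid = xmin; xid < xmax; xid++)
	{
		/* Found in the historic xip means committed, so skip it. */
		if (bsearch(&xid, committed, ncommitted,
					sizeof(Xid), xid_cmp) != NULL)
			continue;

		if (n >= maxrunning)
			break;				/* simplification: silently stop when full */
		running[n++] = xid;
	}

	return n;
}
```

With xmin = 100, xmax = 106 and committed = {101, 103, 104}, the function stores 100, 102 and 105. Separately from this loop, the in_place flag of the real function only decides whether the rebuilt array is installed in the caller's snapshot or in a CopySnapshot() copy, with the source's xip and xcnt restored afterwards.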
[application/octet-stream] v21-0001-Split-vacuumdb-to-create-vacuuming.c-h.patch (69.3K, 7-v21-0001-Split-vacuumdb-to-create-vacuuming.c-h.patch)
From 2206b215a8855cf8a9c29889f5feab4a0bd8a7e0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <[email protected]>
Date: Sat, 30 Aug 2025 14:39:49 +0200
Subject: [PATCH v21 1/6] Split vacuumdb to create vacuuming.c/h
---
src/bin/scripts/Makefile | 4 +-
src/bin/scripts/meson.build | 28 +-
src/bin/scripts/vacuumdb.c | 1048 +----------------------------------
src/bin/scripts/vacuuming.c | 978 ++++++++++++++++++++++++++++++++
src/bin/scripts/vacuuming.h | 95 ++++
5 files changed, 1119 insertions(+), 1034 deletions(-)
create mode 100644 src/bin/scripts/vacuuming.c
create mode 100644 src/bin/scripts/vacuuming.h
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index f6b4d40810b..019ca06455d 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -28,7 +28,7 @@ createuser: createuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport
dropdb: dropdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
dropuser: dropuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-vacuumdb: vacuumdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+vacuumdb: vacuumdb.o vacuuming.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
@@ -50,7 +50,7 @@ uninstall:
clean distclean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
- rm -f common.o $(WIN32RES)
+ rm -f common.o vacuuming.o $(WIN32RES)
rm -rf tmp_check
export with_icu
diff --git a/src/bin/scripts/meson.build b/src/bin/scripts/meson.build
index 80df7c33257..a4fed59d1c9 100644
--- a/src/bin/scripts/meson.build
+++ b/src/bin/scripts/meson.build
@@ -12,7 +12,6 @@ binaries = [
'createuser',
'dropuser',
'clusterdb',
- 'vacuumdb',
'reindexdb',
'pg_isready',
]
@@ -35,6 +34,33 @@ foreach binary : binaries
bin_targets += binary
endforeach
+vacuuming_common = static_library('libvacuuming_common',
+ files('common.c', 'vacuuming.c'),
+ dependencies: [frontend_code, libpq],
+ kwargs: internal_lib_args,
+)
+
+binaries = [
+ 'vacuumdb',
+]
+foreach binary : binaries
+ binary_sources = files('@0@.c'.format(binary))
+
+ if host_system == 'windows'
+ binary_sources += rc_bin_gen.process(win32ver_rc, extra_args: [
+ '--NAME', binary,
+ '--FILEDESC', '@0@ - PostgreSQL utility'.format(binary),])
+ endif
+
+ binary = executable(binary,
+ binary_sources,
+ link_with: [vacuuming_common],
+ dependencies: [frontend_code, libpq],
+ kwargs: default_bin_args,
+ )
+ bin_targets += binary
+endforeach
+
tests += {
'name': 'scripts',
'sd': meson.current_source_dir(),
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index fd236087e90..b1be61ddf25 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -14,92 +14,13 @@
#include <limits.h>
-#include "catalog/pg_attribute_d.h"
-#include "catalog/pg_class_d.h"
#include "common.h"
-#include "common/connect.h"
#include "common/logging.h"
-#include "fe_utils/cancel.h"
#include "fe_utils/option_utils.h"
-#include "fe_utils/parallel_slot.h"
-#include "fe_utils/query_utils.h"
-#include "fe_utils/simple_list.h"
-#include "fe_utils/string_utils.h"
-
-
-/* vacuum options controlled by user flags */
-typedef struct vacuumingOptions
-{
- bool analyze_only;
- bool verbose;
- bool and_analyze;
- bool full;
- bool freeze;
- bool disable_page_skipping;
- bool skip_locked;
- int min_xid_age;
- int min_mxid_age;
- int parallel_workers; /* >= 0 indicates user specified the
- * parallel degree, otherwise -1 */
- bool no_index_cleanup;
- bool force_index_cleanup;
- bool do_truncate;
- bool process_main;
- bool process_toast;
- bool skip_database_stats;
- char *buffer_usage_limit;
- bool missing_stats_only;
-} vacuumingOptions;
-
-/* object filter options */
-typedef enum
-{
- OBJFILTER_NONE = 0, /* no filter used */
- OBJFILTER_ALL_DBS = (1 << 0), /* -a | --all */
- OBJFILTER_DATABASE = (1 << 1), /* -d | --dbname */
- OBJFILTER_TABLE = (1 << 2), /* -t | --table */
- OBJFILTER_SCHEMA = (1 << 3), /* -n | --schema */
- OBJFILTER_SCHEMA_EXCLUDE = (1 << 4), /* -N | --exclude-schema */
-} VacObjFilter;
-
-static VacObjFilter objfilter = OBJFILTER_NONE;
-
-static SimpleStringList *retrieve_objects(PGconn *conn,
- vacuumingOptions *vacopts,
- SimpleStringList *objects,
- bool echo);
-
-static void vacuum_one_database(ConnParams *cparams,
- vacuumingOptions *vacopts,
- int stage,
- SimpleStringList *objects,
- SimpleStringList **found_objs,
- int concurrentCons,
- const char *progname, bool echo, bool quiet);
-
-static void vacuum_all_databases(ConnParams *cparams,
- vacuumingOptions *vacopts,
- bool analyze_in_stages,
- SimpleStringList *objects,
- int concurrentCons,
- const char *progname, bool echo, bool quiet);
-
-static void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
- vacuumingOptions *vacopts, const char *table);
-
-static void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
- const char *table);
+#include "vacuuming.h"
static void help(const char *progname);
-
-void check_objfilter(void);
-
-static char *escape_quotes(const char *src);
-
-/* For analyze-in-stages mode */
-#define ANALYZE_NO_STAGE -1
-#define ANALYZE_NUM_STAGES 3
-
+static void check_objfilter(void);
int
main(int argc, char *argv[])
@@ -145,10 +66,6 @@ main(int argc, char *argv[])
int c;
const char *dbname = NULL;
const char *maintenance_db = NULL;
- char *host = NULL;
- char *port = NULL;
- char *username = NULL;
- enum trivalue prompt_password = TRI_DEFAULT;
ConnParams cparams;
bool echo = false;
bool quiet = false;
@@ -168,13 +85,18 @@ main(int argc, char *argv[])
vacopts.process_main = true;
vacopts.process_toast = true;
+ /* the same for connection parameters */
+ memset(&cparams, 0, sizeof(cparams));
+ cparams.prompt_password = TRI_DEFAULT;
+
pg_logging_init(argv[0]);
progname = get_progname(argv[0]);
set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pgscripts"));
- handle_help_version_opts(argc, argv, "vacuumdb", help);
+ handle_help_version_opts(argc, argv, progname, help);
- while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "ad:efFh:j:n:N:p:P:qt:U:vwWzZ",
+ long_options, &optindex)) != -1)
{
switch (c)
{
@@ -195,7 +117,7 @@ main(int argc, char *argv[])
vacopts.freeze = true;
break;
case 'h':
- host = pg_strdup(optarg);
+ cparams.pghost = pg_strdup(optarg);
break;
case 'j':
if (!option_parse_int(optarg, "-j/--jobs", 1, INT_MAX,
@@ -211,7 +133,7 @@ main(int argc, char *argv[])
simple_string_list_append(&objects, optarg);
break;
case 'p':
- port = pg_strdup(optarg);
+ cparams.pgport = pg_strdup(optarg);
break;
case 'P':
if (!option_parse_int(optarg, "-P/--parallel", 0, INT_MAX,
@@ -227,16 +149,16 @@ main(int argc, char *argv[])
tbl_count++;
break;
case 'U':
- username = pg_strdup(optarg);
+ cparams.pguser = pg_strdup(optarg);
break;
case 'v':
vacopts.verbose = true;
break;
case 'w':
- prompt_password = TRI_NO;
+ cparams.prompt_password = TRI_NO;
break;
case 'W':
- prompt_password = TRI_YES;
+ cparams.prompt_password = TRI_YES;
break;
case 'z':
vacopts.and_analyze = true;
@@ -380,66 +302,9 @@ main(int argc, char *argv[])
pg_fatal("cannot use the \"%s\" option without \"%s\" or \"%s\"",
"missing-stats-only", "analyze-only", "analyze-in-stages");
- /* fill cparams except for dbname, which is set below */
- cparams.pghost = host;
- cparams.pgport = port;
- cparams.pguser = username;
- cparams.prompt_password = prompt_password;
- cparams.override_dbname = NULL;
-
- setup_cancel_handler(NULL);
-
- /* Avoid opening extra connections. */
- if (tbl_count && (concurrentCons > tbl_count))
- concurrentCons = tbl_count;
-
- if (objfilter & OBJFILTER_ALL_DBS)
- {
- cparams.dbname = maintenance_db;
-
- vacuum_all_databases(&cparams, &vacopts,
- analyze_in_stages,
- &objects,
- concurrentCons,
- progname, echo, quiet);
- }
- else
- {
- if (dbname == NULL)
- {
- if (getenv("PGDATABASE"))
- dbname = getenv("PGDATABASE");
- else if (getenv("PGUSER"))
- dbname = getenv("PGUSER");
- else
- dbname = get_user_name_or_exit(progname);
- }
-
- cparams.dbname = dbname;
-
- if (analyze_in_stages)
- {
- int stage;
- SimpleStringList *found_objs = NULL;
-
- for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
- {
- vacuum_one_database(&cparams, &vacopts,
- stage,
- &objects,
- vacopts.missing_stats_only ? &found_objs : NULL,
- concurrentCons,
- progname, echo, quiet);
- }
- }
- else
- vacuum_one_database(&cparams, &vacopts,
- ANALYZE_NO_STAGE,
- &objects, NULL,
- concurrentCons,
- progname, echo, quiet);
- }
-
+ vacuuming_main(&cparams, dbname, maintenance_db, &vacopts, &objects,
+ analyze_in_stages, tbl_count, concurrentCons,
+ progname, echo, quiet);
exit(0);
}
@@ -466,885 +331,6 @@ check_objfilter(void)
pg_fatal("cannot vacuum all tables in schema(s) and exclude schema(s) at the same time");
}
-/*
- * Returns a newly malloc'd version of 'src' with escaped single quotes and
- * backslashes.
- */
-static char *
-escape_quotes(const char *src)
-{
- char *result = escape_single_quotes_ascii(src);
-
- if (!result)
- pg_fatal("out of memory");
- return result;
-}
-
-/*
- * vacuum_one_database
- *
- * Process tables in the given database.
- *
- * There are two ways to specify the list of objects to process:
- *
- * 1) The "found_objs" parameter is a double pointer to a fully qualified list
- * of objects to process, as returned by a previous call to
- * vacuum_one_database().
- *
- * a) If both "found_objs" (the double pointer) and "*found_objs" (the
- * once-dereferenced double pointer) are not NULL, this list takes
- * priority, and anything specified in "objects" is ignored.
- *
- * b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
- * (the once-dereferenced double pointer) _is_ NULL, the "objects"
- * parameter takes priority, and the results of the catalog query
- * described in (2) are stored in "found_objs".
- *
- * c) If "found_objs" (the double pointer) is NULL, the "objects"
- * parameter again takes priority, and the results of the catalog query
- * are not saved.
- *
- * 2) The "objects" parameter is a user-specified list of objects to process.
- * When (1b) or (1c) applies, this function performs a catalog query to
- * retrieve a fully qualified list of objects to process, as described
- * below.
- *
- * a) If "objects" is not NULL, the catalog query gathers only the objects
- * listed in "objects".
- *
- * b) If "objects" is NULL, all tables in the database are gathered.
- *
- * Note that this function is only concerned with running exactly one stage
- * when in analyze-in-stages mode; caller must iterate on us if necessary.
- *
- * If concurrentCons is > 1, multiple connections are used to vacuum tables
- * in parallel.
- */
-static void
-vacuum_one_database(ConnParams *cparams,
- vacuumingOptions *vacopts,
- int stage,
- SimpleStringList *objects,
- SimpleStringList **found_objs,
- int concurrentCons,
- const char *progname, bool echo, bool quiet)
-{
- PQExpBufferData sql;
- PGconn *conn;
- SimpleStringListCell *cell;
- ParallelSlotArray *sa;
- int ntups = 0;
- bool failed = false;
- const char *initcmd;
- SimpleStringList *ret = NULL;
- const char *stage_commands[] = {
- "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
- "SET default_statistics_target=10; RESET vacuum_cost_delay;",
- "RESET default_statistics_target;"
- };
- const char *stage_messages[] = {
- gettext_noop("Generating minimal optimizer statistics (1 target)"),
- gettext_noop("Generating medium optimizer statistics (10 targets)"),
- gettext_noop("Generating default (full) optimizer statistics")
- };
-
- Assert(stage == ANALYZE_NO_STAGE ||
- (stage >= 0 && stage < ANALYZE_NUM_STAGES));
-
- conn = connectDatabase(cparams, progname, echo, false, true);
-
- if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "disable-page-skipping", "9.6");
- }
-
- if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "no-index-cleanup", "12");
- }
-
- if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "force-index-cleanup", "12");
- }
-
- if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "no-truncate", "12");
- }
-
- if (!vacopts->process_main && PQserverVersion(conn) < 160000)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "no-process-main", "16");
- }
-
- if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "no-process-toast", "14");
- }
-
- if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "skip-locked", "12");
- }
-
- if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "--min-xid-age", "9.6");
- }
-
- if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "--min-mxid-age", "9.6");
- }
-
- if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "--parallel", "13");
- }
-
- if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "--buffer-usage-limit", "16");
- }
-
- if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
- {
- PQfinish(conn);
- pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
- "--missing-stats-only", "15");
- }
-
- /* skip_database_stats is used automatically if server supports it */
- vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
-
- if (!quiet)
- {
- if (stage != ANALYZE_NO_STAGE)
- printf(_("%s: processing database \"%s\": %s\n"),
- progname, PQdb(conn), _(stage_messages[stage]));
- else
- printf(_("%s: vacuuming database \"%s\"\n"),
- progname, PQdb(conn));
- fflush(stdout);
- }
-
- /*
- * If the caller provided the results of a previous catalog query, just
- * use that. Otherwise, run the catalog query ourselves and set the
- * return variable if provided.
- */
- if (found_objs && *found_objs)
- ret = *found_objs;
- else
- {
- ret = retrieve_objects(conn, vacopts, objects, echo);
- if (found_objs)
- *found_objs = ret;
- }
-
- /*
- * Count the number of objects in the catalog query result. If there are
- * none, we are done.
- */
- for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
- ntups++;
-
- if (ntups == 0)
- {
- PQfinish(conn);
- return;
- }
-
- /*
- * Ensure concurrentCons is sane. If there are more connections than
- * vacuumable relations, we don't need to use them all.
- */
- if (concurrentCons > ntups)
- concurrentCons = ntups;
- if (concurrentCons <= 0)
- concurrentCons = 1;
-
- /*
- * All slots need to be prepared to run the appropriate analyze stage, if
- * caller requested that mode. We have to prepare the initial connection
- * ourselves before setting up the slots.
- */
- if (stage == ANALYZE_NO_STAGE)
- initcmd = NULL;
- else
- {
- initcmd = stage_commands[stage];
- executeCommand(conn, initcmd, echo);
- }
-
- /*
- * Setup the database connections. We reuse the connection we already have
- * for the first slot. If not in parallel mode, the first slot in the
- * array contains the connection.
- */
- sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
- ParallelSlotsAdoptConn(sa, conn);
-
- initPQExpBuffer(&sql);
-
- cell = ret->head;
- do
- {
- const char *tabname = cell->val;
- ParallelSlot *free_slot;
-
- if (CancelRequested)
- {
- failed = true;
- goto finish;
- }
-
- free_slot = ParallelSlotsGetIdle(sa, NULL);
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
- vacopts, tabname);
-
- /*
- * Execute the vacuum. All errors are handled in processQueryResult
- * through ParallelSlotsGetIdle.
- */
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, sql.data,
- echo, tabname);
-
- cell = cell->next;
- } while (cell != NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- {
- failed = true;
- goto finish;
- }
-
- /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
- if (vacopts->skip_database_stats && stage == ANALYZE_NO_STAGE)
- {
- const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
- ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
-
- if (!free_slot)
- {
- failed = true;
- goto finish;
- }
-
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
- run_vacuum_command(free_slot->connection, cmd, echo, NULL);
-
- if (!ParallelSlotsWaitCompletion(sa))
- failed = true;
- }
-
-finish:
- ParallelSlotsTerminate(sa);
- pg_free(sa);
-
- termPQExpBuffer(&sql);
-
- if (failed)
- exit(1);
-}
-
-/*
- * Prepare the list of tables to process by querying the catalogs.
- *
- * Since we execute the constructed query with the default search_path (which
- * could be unsafe), everything in this query MUST be fully qualified.
- *
- * First, build a WITH clause for the catalog query if any tables were
- * specified, with a set of values made of relation names and their optional
- * set of columns. This is used to match any provided column lists with the
- * generated qualified identifiers and to filter for the tables provided via
- * --table. If a listed table does not exist, the catalog query will fail.
- */
-static SimpleStringList *
-retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
- SimpleStringList *objects, bool echo)
-{
- PQExpBufferData buf;
- PQExpBufferData catalog_query;
- PGresult *res;
- SimpleStringListCell *cell;
- SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
- bool objects_listed = false;
-
- initPQExpBuffer(&catalog_query);
- for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
- {
- char *just_table = NULL;
- const char *just_columns = NULL;
-
- if (!objects_listed)
- {
- appendPQExpBufferStr(&catalog_query,
- "WITH listed_objects (object_oid, column_list) "
- "AS (\n VALUES (");
- objects_listed = true;
- }
- else
- appendPQExpBufferStr(&catalog_query, ",\n (");
-
- if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
- {
- appendStringLiteralConn(&catalog_query, cell->val, conn);
- appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
- }
-
- if (objfilter & OBJFILTER_TABLE)
- {
- /*
- * Split relation and column names given by the user, this is used
- * to feed the CTE with values on which are performed pre-run
- * validity checks as well. For now these happen only on the
- * relation name.
- */
- splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
- &just_table, &just_columns);
-
- appendStringLiteralConn(&catalog_query, just_table, conn);
- appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
- }
-
- if (just_columns && just_columns[0] != '\0')
- appendStringLiteralConn(&catalog_query, just_columns, conn);
- else
- appendPQExpBufferStr(&catalog_query, "NULL");
-
- appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
-
- pg_free(just_table);
- }
-
- /* Finish formatting the CTE */
- if (objects_listed)
- appendPQExpBufferStr(&catalog_query, "\n)\n");
-
- appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
-
- if (objects_listed)
- appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
-
- appendPQExpBufferStr(&catalog_query,
- " FROM pg_catalog.pg_class c\n"
- " JOIN pg_catalog.pg_namespace ns"
- " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
- " CROSS JOIN LATERAL (SELECT c.relkind IN ("
- CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
- CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
- " LEFT JOIN pg_catalog.pg_class t"
- " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
-
- /*
- * Used to match the tables or schemas listed by the user, completing the
- * JOIN clause.
- */
- if (objects_listed)
- {
- appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
- " ON listed_objects.object_oid"
- " OPERATOR(pg_catalog.=) ");
-
- if (objfilter & OBJFILTER_TABLE)
- appendPQExpBufferStr(&catalog_query, "c.oid\n");
- else
- appendPQExpBufferStr(&catalog_query, "ns.oid\n");
- }
-
- /*
- * Exclude temporary tables, beginning the WHERE clause.
- */
- appendPQExpBufferStr(&catalog_query,
- " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
- CppAsString2(RELPERSISTENCE_TEMP) "\n");
-
- /*
- * Used to match the tables or schemas listed by the user, for the WHERE
- * clause.
- */
- if (objects_listed)
- {
- if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
- appendPQExpBufferStr(&catalog_query,
- " AND listed_objects.object_oid IS NULL\n");
- else
- appendPQExpBufferStr(&catalog_query,
- " AND listed_objects.object_oid IS NOT NULL\n");
- }
-
- /*
- * If no tables were listed, filter for the relevant relation types. If
- * tables were given via --table, don't bother filtering by relation type.
- * Instead, let the server decide whether a given relation can be
- * processed in which case the user will know about it.
- */
- if ((objfilter & OBJFILTER_TABLE) == 0)
- {
- /*
- * vacuumdb should generally follow the behavior of the underlying
- * VACUUM and ANALYZE commands. If analyze_only is true, process
- * regular tables, materialized views, and partitioned tables, just
- * like ANALYZE (with no specific target tables) does. Otherwise,
- * process only regular tables and materialized views, since VACUUM
- * skips partitioned tables when no target tables are specified.
- */
- if (vacopts->analyze_only)
- appendPQExpBufferStr(&catalog_query,
- " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
- CppAsString2(RELKIND_RELATION) ", "
- CppAsString2(RELKIND_MATVIEW) ", "
- CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
- else
- appendPQExpBufferStr(&catalog_query,
- " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
- CppAsString2(RELKIND_RELATION) ", "
- CppAsString2(RELKIND_MATVIEW) "])\n");
-
- }
-
- /*
- * For --min-xid-age and --min-mxid-age, the age of the relation is the
- * greatest of the ages of the main relation and its associated TOAST
- * table. The commands generated by vacuumdb will also process the TOAST
- * table for the relation if necessary, so it does not need to be
- * considered separately.
- */
- if (vacopts->min_xid_age != 0)
- {
- appendPQExpBuffer(&catalog_query,
- " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
- " pg_catalog.age(t.relfrozenxid)) "
- " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
- " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
- " '0'::pg_catalog.xid\n",
- vacopts->min_xid_age);
- }
-
- if (vacopts->min_mxid_age != 0)
- {
- appendPQExpBuffer(&catalog_query,
- " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
- " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
- " '%d'::pg_catalog.int4\n"
- " AND c.relminmxid OPERATOR(pg_catalog.!=)"
- " '0'::pg_catalog.xid\n",
- vacopts->min_mxid_age);
- }
-
- if (vacopts->missing_stats_only)
- {
- appendPQExpBufferStr(&catalog_query, " AND (\n");
-
- /* regular stats */
- appendPQExpBufferStr(&catalog_query,
- " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
- " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
- " AND NOT a.attisdropped\n"
- " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
- " AND a.attgenerated OPERATOR(pg_catalog.<>) "
- CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
- " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
- " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
- " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
- " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
- /* extended stats */
- appendPQExpBufferStr(&catalog_query,
- " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
- " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
- " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
- " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
- " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
- /* expression indexes */
- appendPQExpBufferStr(&catalog_query,
- " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
- " JOIN pg_catalog.pg_index i"
- " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
- " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
- " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
- " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
- " AND NOT a.attisdropped\n"
- " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
- " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
- " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
- " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
- " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
-
- /* inheritance and regular stats */
- appendPQExpBufferStr(&catalog_query,
- " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
- " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
- " AND NOT a.attisdropped\n"
- " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
- " AND a.attgenerated OPERATOR(pg_catalog.<>) "
- CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
- " AND c.relhassubclass\n"
- " AND NOT p.inherited\n"
- " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
- " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
- " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
- " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
- " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
- " AND s.stainherit))\n");
-
- /* inheritance and extended stats */
- appendPQExpBufferStr(&catalog_query,
- " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
- " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
- " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
- " AND c.relhassubclass\n"
- " AND NOT p.inherited\n"
- " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
- " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
- " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
- " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
- " AND d.stxdinherit))\n");
-
- appendPQExpBufferStr(&catalog_query, " )\n");
- }
-
- /*
- * Execute the catalog query. We use the default search_path for this
- * query for consistency with table lookups done elsewhere by the user.
- */
- appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
- executeCommand(conn, "RESET search_path;", echo);
- res = executeQuery(conn, catalog_query.data, echo);
- termPQExpBuffer(&catalog_query);
- PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
-
- /*
- * Build qualified identifiers for each table, including the column list
- * if given.
- */
- initPQExpBuffer(&buf);
- for (int i = 0; i < PQntuples(res); i++)
- {
- appendPQExpBufferStr(&buf,
- fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
- PQgetvalue(res, i, 0),
- PQclientEncoding(conn)));
-
- if (objects_listed && !PQgetisnull(res, i, 2))
- appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
-
- simple_string_list_append(found_objs, buf.data);
- resetPQExpBuffer(&buf);
- }
- termPQExpBuffer(&buf);
- PQclear(res);
-
- return found_objs;
-}
-
-/*
- * Vacuum/analyze all connectable databases.
- *
- * In analyze-in-stages mode, we process all databases in one stage before
- * moving on to the next stage. That ensure minimal stats are available
- * quickly everywhere before generating more detailed ones.
- */
-static void
-vacuum_all_databases(ConnParams *cparams,
- vacuumingOptions *vacopts,
- bool analyze_in_stages,
- SimpleStringList *objects,
- int concurrentCons,
- const char *progname, bool echo, bool quiet)
-{
- PGconn *conn;
- PGresult *result;
- int stage;
- int i;
-
- conn = connectMaintenanceDatabase(cparams, progname, echo);
- result = executeQuery(conn,
- "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
- echo);
- PQfinish(conn);
-
- if (analyze_in_stages)
- {
- SimpleStringList **found_objs = NULL;
-
- if (vacopts->missing_stats_only)
- found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
-
- /*
- * When analyzing all databases in stages, we analyze them all in the
- * fastest stage first, so that initial statistics become available
- * for all of them as soon as possible.
- *
- * This means we establish several times as many connections, but
- * that's a secondary consideration.
- */
- for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
- {
- for (i = 0; i < PQntuples(result); i++)
- {
- cparams->override_dbname = PQgetvalue(result, i, 0);
-
- vacuum_one_database(cparams, vacopts,
- stage,
- objects,
- vacopts->missing_stats_only ? &found_objs[i] : NULL,
- concurrentCons,
- progname, echo, quiet);
- }
- }
- }
- else
- {
- for (i = 0; i < PQntuples(result); i++)
- {
- cparams->override_dbname = PQgetvalue(result, i, 0);
-
- vacuum_one_database(cparams, vacopts,
- ANALYZE_NO_STAGE,
- objects, NULL,
- concurrentCons,
- progname, echo, quiet);
- }
- }
-
- PQclear(result);
-}
-
-/*
- * Construct a vacuum/analyze command to run based on the given options, in the
- * given string buffer, which may contain previous garbage.
- *
- * The table name used must be already properly quoted. The command generated
- * depends on the server version involved and it is semicolon-terminated.
- */
-static void
-prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
- vacuumingOptions *vacopts, const char *table)
-{
- const char *paren = " (";
- const char *comma = ", ";
- const char *sep = paren;
-
- resetPQExpBuffer(sql);
-
- if (vacopts->analyze_only)
- {
- appendPQExpBufferStr(sql, "ANALYZE");
-
- /* parenthesized grammar of ANALYZE is supported since v11 */
- if (serverVersion >= 110000)
- {
- if (vacopts->skip_locked)
- {
- /* SKIP_LOCKED is supported since v12 */
- Assert(serverVersion >= 120000);
- appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
- sep = comma;
- }
- if (vacopts->verbose)
- {
- appendPQExpBuffer(sql, "%sVERBOSE", sep);
- sep = comma;
- }
- if (vacopts->buffer_usage_limit)
- {
- Assert(serverVersion >= 160000);
- appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
- vacopts->buffer_usage_limit);
- sep = comma;
- }
- if (sep != paren)
- appendPQExpBufferChar(sql, ')');
- }
- else
- {
- if (vacopts->verbose)
- appendPQExpBufferStr(sql, " VERBOSE");
- }
- }
- else
- {
- appendPQExpBufferStr(sql, "VACUUM");
-
- /* parenthesized grammar of VACUUM is supported since v9.0 */
- if (serverVersion >= 90000)
- {
- if (vacopts->disable_page_skipping)
- {
- /* DISABLE_PAGE_SKIPPING is supported since v9.6 */
- Assert(serverVersion >= 90600);
- appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
- sep = comma;
- }
- if (vacopts->no_index_cleanup)
- {
- /* "INDEX_CLEANUP FALSE" has been supported since v12 */
- Assert(serverVersion >= 120000);
- Assert(!vacopts->force_index_cleanup);
- appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
- sep = comma;
- }
- if (vacopts->force_index_cleanup)
- {
- /* "INDEX_CLEANUP TRUE" has been supported since v12 */
- Assert(serverVersion >= 120000);
- Assert(!vacopts->no_index_cleanup);
- appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
- sep = comma;
- }
- if (!vacopts->do_truncate)
- {
- /* TRUNCATE is supported since v12 */
- Assert(serverVersion >= 120000);
- appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
- sep = comma;
- }
- if (!vacopts->process_main)
- {
- /* PROCESS_MAIN is supported since v16 */
- Assert(serverVersion >= 160000);
- appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
- sep = comma;
- }
- if (!vacopts->process_toast)
- {
- /* PROCESS_TOAST is supported since v14 */
- Assert(serverVersion >= 140000);
- appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
- sep = comma;
- }
- if (vacopts->skip_database_stats)
- {
- /* SKIP_DATABASE_STATS is supported since v16 */
- Assert(serverVersion >= 160000);
- appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
- sep = comma;
- }
- if (vacopts->skip_locked)
- {
- /* SKIP_LOCKED is supported since v12 */
- Assert(serverVersion >= 120000);
- appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
- sep = comma;
- }
- if (vacopts->full)
- {
- appendPQExpBuffer(sql, "%sFULL", sep);
- sep = comma;
- }
- if (vacopts->freeze)
- {
- appendPQExpBuffer(sql, "%sFREEZE", sep);
- sep = comma;
- }
- if (vacopts->verbose)
- {
- appendPQExpBuffer(sql, "%sVERBOSE", sep);
- sep = comma;
- }
- if (vacopts->and_analyze)
- {
- appendPQExpBuffer(sql, "%sANALYZE", sep);
- sep = comma;
- }
- if (vacopts->parallel_workers >= 0)
- {
- /* PARALLEL is supported since v13 */
- Assert(serverVersion >= 130000);
- appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
- vacopts->parallel_workers);
- sep = comma;
- }
- if (vacopts->buffer_usage_limit)
- {
- Assert(serverVersion >= 160000);
- appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
- vacopts->buffer_usage_limit);
- sep = comma;
- }
- if (sep != paren)
- appendPQExpBufferChar(sql, ')');
- }
- else
- {
- if (vacopts->full)
- appendPQExpBufferStr(sql, " FULL");
- if (vacopts->freeze)
- appendPQExpBufferStr(sql, " FREEZE");
- if (vacopts->verbose)
- appendPQExpBufferStr(sql, " VERBOSE");
- if (vacopts->and_analyze)
- appendPQExpBufferStr(sql, " ANALYZE");
- }
- }
-
- appendPQExpBuffer(sql, " %s;", table);
-}
-
-/*
- * Send a vacuum/analyze command to the server, returning after sending the
- * command.
- *
- * Any errors during command execution are reported to stderr.
- */
-static void
-run_vacuum_command(PGconn *conn, const char *sql, bool echo,
- const char *table)
-{
- bool status;
-
- if (echo)
- printf("%s\n", sql);
-
- status = PQsendQuery(conn, sql) == 1;
-
- if (!status)
- {
- if (table)
- pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
- table, PQdb(conn), PQerrorMessage(conn));
- else
- pg_log_error("vacuuming of database \"%s\" failed: %s",
- PQdb(conn), PQerrorMessage(conn));
- }
-}
static void
help(const char *progname)
diff --git a/src/bin/scripts/vacuuming.c b/src/bin/scripts/vacuuming.c
new file mode 100644
index 00000000000..9be37fcc45a
--- /dev/null
+++ b/src/bin/scripts/vacuuming.c
@@ -0,0 +1,978 @@
+/*-------------------------------------------------------------------------
+ * vacuuming.c
+ * Common routines for vacuumdb
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/scripts/vacuuming.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <limits.h>
+
+#include "catalog/pg_attribute_d.h"
+#include "catalog/pg_class_d.h"
+#include "common/connect.h"
+#include "common/logging.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/string_utils.h"
+#include "vacuuming.h"
+
+VacObjFilter objfilter = OBJFILTER_NONE;
+
+
+/*
+ * Executes vacuum/analyze as indicated, or dies in case of failure.
+ */
+void
+vacuuming_main(ConnParams *cparams, const char *dbname,
+ const char *maintenance_db, vacuumingOptions *vacopts,
+ SimpleStringList *objects, bool analyze_in_stages,
+ int tbl_count, int concurrentCons,
+ const char *progname, bool echo, bool quiet)
+{
+ setup_cancel_handler(NULL);
+
+ /* Avoid opening extra connections. */
+ if (tbl_count && (concurrentCons > tbl_count))
+ concurrentCons = tbl_count;
+
+ if (objfilter & OBJFILTER_ALL_DBS)
+ {
+ cparams->dbname = maintenance_db;
+
+ vacuum_all_databases(cparams, vacopts,
+ analyze_in_stages,
+ objects,
+ concurrentCons,
+ progname, echo, quiet);
+ }
+ else
+ {
+ if (dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ dbname = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ dbname = getenv("PGUSER");
+ else
+ dbname = get_user_name_or_exit(progname);
+ }
+
+ cparams->dbname = dbname;
+
+ if (analyze_in_stages)
+ {
+ int stage;
+ SimpleStringList *found_objs = NULL;
+
+ for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+ {
+ vacuum_one_database(cparams, vacopts,
+ stage,
+ objects,
+ vacopts->missing_stats_only ? &found_objs : NULL,
+ concurrentCons,
+ progname, echo, quiet);
+ }
+ }
+ else
+ vacuum_one_database(cparams, vacopts,
+ ANALYZE_NO_STAGE,
+ objects, NULL,
+ concurrentCons,
+ progname, echo, quiet);
+ }
+}
+
+
+/*
+ * vacuum_one_database
+ *
+ * Process tables in the given database.
+ *
+ * There are two ways to specify the list of objects to process:
+ *
+ * 1) The "found_objs" parameter is a double pointer to a fully qualified list
+ * of objects to process, as returned by a previous call to
+ * vacuum_one_database().
+ *
+ * a) If both "found_objs" (the double pointer) and "*found_objs" (the
+ * once-dereferenced double pointer) are not NULL, this list takes
+ * priority, and anything specified in "objects" is ignored.
+ *
+ * b) If "found_objs" (the double pointer) is not NULL but "*found_objs"
+ * (the once-dereferenced double pointer) _is_ NULL, the "objects"
+ * parameter takes priority, and the results of the catalog query
+ * described in (2) are stored in "found_objs".
+ *
+ * c) If "found_objs" (the double pointer) is NULL, the "objects"
+ * parameter again takes priority, and the results of the catalog query
+ * are not saved.
+ *
+ * 2) The "objects" parameter is a user-specified list of objects to process.
+ * When (1b) or (1c) applies, this function performs a catalog query to
+ * retrieve a fully qualified list of objects to process, as described
+ * below.
+ *
+ * a) If "objects" is not NULL, the catalog query gathers only the objects
+ * listed in "objects".
+ *
+ * b) If "objects" is NULL, all tables in the database are gathered.
+ *
+ * Note that this function is only concerned with running exactly one stage
+ * when in analyze-in-stages mode; caller must iterate on us if necessary.
+ *
+ * If concurrentCons is > 1, multiple connections are used to vacuum tables
+ * in parallel.
+ */
+void
+vacuum_one_database(ConnParams *cparams,
+ vacuumingOptions *vacopts,
+ int stage,
+ SimpleStringList *objects,
+ SimpleStringList **found_objs,
+ int concurrentCons,
+ const char *progname, bool echo, bool quiet)
+{
+ PQExpBufferData sql;
+ PGconn *conn;
+ SimpleStringListCell *cell;
+ ParallelSlotArray *sa;
+ int ntups = 0;
+ bool failed = false;
+ const char *initcmd;
+ SimpleStringList *ret = NULL;
+ const char *stage_commands[] = {
+ "SET default_statistics_target=1; SET vacuum_cost_delay=0;",
+ "SET default_statistics_target=10; RESET vacuum_cost_delay;",
+ "RESET default_statistics_target;"
+ };
+ const char *stage_messages[] = {
+ gettext_noop("Generating minimal optimizer statistics (1 target)"),
+ gettext_noop("Generating medium optimizer statistics (10 targets)"),
+ gettext_noop("Generating default (full) optimizer statistics")
+ };
+
+ Assert(stage == ANALYZE_NO_STAGE ||
+ (stage >= 0 && stage < ANALYZE_NUM_STAGES));
+
+ conn = connectDatabase(cparams, progname, echo, false, true);
+
+ if (vacopts->disable_page_skipping && PQserverVersion(conn) < 90600)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "disable-page-skipping", "9.6");
+ }
+
+ if (vacopts->no_index_cleanup && PQserverVersion(conn) < 120000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "no-index-cleanup", "12");
+ }
+
+ if (vacopts->force_index_cleanup && PQserverVersion(conn) < 120000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "force-index-cleanup", "12");
+ }
+
+ if (!vacopts->do_truncate && PQserverVersion(conn) < 120000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "no-truncate", "12");
+ }
+
+ if (!vacopts->process_main && PQserverVersion(conn) < 160000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "no-process-main", "16");
+ }
+
+ if (!vacopts->process_toast && PQserverVersion(conn) < 140000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "no-process-toast", "14");
+ }
+
+ if (vacopts->skip_locked && PQserverVersion(conn) < 120000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "skip-locked", "12");
+ }
+
+ if (vacopts->min_xid_age != 0 && PQserverVersion(conn) < 90600)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--min-xid-age", "9.6");
+ }
+
+ if (vacopts->min_mxid_age != 0 && PQserverVersion(conn) < 90600)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--min-mxid-age", "9.6");
+ }
+
+ if (vacopts->parallel_workers >= 0 && PQserverVersion(conn) < 130000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--parallel", "13");
+ }
+
+ if (vacopts->buffer_usage_limit && PQserverVersion(conn) < 160000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--buffer-usage-limit", "16");
+ }
+
+ if (vacopts->missing_stats_only && PQserverVersion(conn) < 150000)
+ {
+ PQfinish(conn);
+ pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+ "--missing-stats-only", "15");
+ }
+
+ /* skip_database_stats is used automatically if server supports it */
+ vacopts->skip_database_stats = (PQserverVersion(conn) >= 160000);
+
+ if (!quiet)
+ {
+ if (stage != ANALYZE_NO_STAGE)
+ printf(_("%s: processing database \"%s\": %s\n"),
+ progname, PQdb(conn), _(stage_messages[stage]));
+ else
+ printf(_("%s: vacuuming database \"%s\"\n"),
+ progname, PQdb(conn));
+ fflush(stdout);
+ }
+
+ /*
+ * If the caller provided the results of a previous catalog query, just
+ * use that. Otherwise, run the catalog query ourselves and set the
+ * return variable if provided.
+ */
+ if (found_objs && *found_objs)
+ ret = *found_objs;
+ else
+ {
+ ret = retrieve_objects(conn, vacopts, objects, echo);
+ if (found_objs)
+ *found_objs = ret;
+ }
+
+ /*
+ * Count the number of objects in the catalog query result. If there are
+ * none, we are done.
+ */
+ for (cell = ret ? ret->head : NULL; cell; cell = cell->next)
+ ntups++;
+
+ if (ntups == 0)
+ {
+ PQfinish(conn);
+ return;
+ }
+
+ /*
+ * Ensure concurrentCons is sane. If there are more connections than
+ * vacuumable relations, we don't need to use them all.
+ */
+ if (concurrentCons > ntups)
+ concurrentCons = ntups;
+ if (concurrentCons <= 0)
+ concurrentCons = 1;
+
+ /*
+ * All slots need to be prepared to run the appropriate analyze stage, if
+ * caller requested that mode. We have to prepare the initial connection
+ * ourselves before setting up the slots.
+ */
+ if (stage == ANALYZE_NO_STAGE)
+ initcmd = NULL;
+ else
+ {
+ initcmd = stage_commands[stage];
+ executeCommand(conn, initcmd, echo);
+ }
+
+ /*
+ * Set up the database connections. We reuse the connection we already have
+ * for the first slot. If not in parallel mode, the first slot in the
+ * array contains the connection.
+ */
+ sa = ParallelSlotsSetup(concurrentCons, cparams, progname, echo, initcmd);
+ ParallelSlotsAdoptConn(sa, conn);
+
+ initPQExpBuffer(&sql);
+
+ cell = ret->head;
+ do
+ {
+ const char *tabname = cell->val;
+ ParallelSlot *free_slot;
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ free_slot = ParallelSlotsGetIdle(sa, NULL);
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ prepare_vacuum_command(&sql, PQserverVersion(free_slot->connection),
+ vacopts, tabname);
+
+ /*
+ * Execute the vacuum. All errors are handled in processQueryResult
+ * through ParallelSlotsGetIdle.
+ */
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, sql.data,
+ echo, tabname);
+
+ cell = cell->next;
+ } while (cell != NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /* If we used SKIP_DATABASE_STATS, mop up with ONLY_DATABASE_STATS */
+ if (vacopts->skip_database_stats &&
+ stage == ANALYZE_NO_STAGE)
+ {
+ const char *cmd = "VACUUM (ONLY_DATABASE_STATS);";
+ ParallelSlot *free_slot = ParallelSlotsGetIdle(sa, NULL);
+
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ run_vacuum_command(free_slot->connection, cmd, echo, NULL);
+
+ if (!ParallelSlotsWaitCompletion(sa))
+ failed = true;
+ }
+
+finish:
+ ParallelSlotsTerminate(sa);
+ pg_free(sa);
+
+ termPQExpBuffer(&sql);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * Prepare the list of tables to process by querying the catalogs.
+ *
+ * Since we execute the constructed query with the default search_path (which
+ * could be unsafe), everything in this query MUST be fully qualified.
+ *
+ * First, build a WITH clause for the catalog query if any tables were
+ * specified, with a set of values made of relation names and their optional
+ * set of columns. This is used to match any provided column lists with the
+ * generated qualified identifiers and to filter for the tables provided via
+ * --table. If a listed table does not exist, the catalog query will fail.
+ */
+SimpleStringList *
+retrieve_objects(PGconn *conn, vacuumingOptions *vacopts,
+ SimpleStringList *objects, bool echo)
+{
+ PQExpBufferData buf;
+ PQExpBufferData catalog_query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ SimpleStringList *found_objs = palloc0(sizeof(SimpleStringList));
+ bool objects_listed = false;
+
+ initPQExpBuffer(&catalog_query);
+ for (cell = objects ? objects->head : NULL; cell; cell = cell->next)
+ {
+ char *just_table = NULL;
+ const char *just_columns = NULL;
+
+ if (!objects_listed)
+ {
+ appendPQExpBufferStr(&catalog_query,
+ "WITH listed_objects (object_oid, column_list) AS (\n"
+ " VALUES (");
+ objects_listed = true;
+ }
+ else
+ appendPQExpBufferStr(&catalog_query, ",\n (");
+
+ if (objfilter & (OBJFILTER_SCHEMA | OBJFILTER_SCHEMA_EXCLUDE))
+ {
+ appendStringLiteralConn(&catalog_query, cell->val, conn);
+ appendPQExpBufferStr(&catalog_query, "::pg_catalog.regnamespace, ");
+ }
+
+ if (objfilter & OBJFILTER_TABLE)
+ {
+ /*
+ * Split relation and column names given by the user. They feed the
+ * CTE with values on which pre-run validity checks are also
+ * performed. For now these checks happen only on the relation
+ * name.
+ */
+ splitTableColumnsSpec(cell->val, PQclientEncoding(conn),
+ &just_table, &just_columns);
+
+ appendStringLiteralConn(&catalog_query, just_table, conn);
+ appendPQExpBufferStr(&catalog_query, "::pg_catalog.regclass, ");
+ }
+
+ if (just_columns && just_columns[0] != '\0')
+ appendStringLiteralConn(&catalog_query, just_columns, conn);
+ else
+ appendPQExpBufferStr(&catalog_query, "NULL");
+
+ appendPQExpBufferStr(&catalog_query, "::pg_catalog.text)");
+
+ pg_free(just_table);
+ }
+
+ /* Finish formatting the CTE */
+ if (objects_listed)
+ appendPQExpBufferStr(&catalog_query, "\n)\n");
+
+ appendPQExpBufferStr(&catalog_query, "SELECT c.relname, ns.nspname");
+
+ if (objects_listed)
+ appendPQExpBufferStr(&catalog_query, ", listed_objects.column_list");
+
+ appendPQExpBufferStr(&catalog_query,
+ " FROM pg_catalog.pg_class c\n"
+ " JOIN pg_catalog.pg_namespace ns"
+ " ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+ " CROSS JOIN LATERAL (SELECT c.relkind IN ("
+ CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
+ CppAsString2(RELKIND_PARTITIONED_INDEX) ")) as p (inherited)\n"
+ " LEFT JOIN pg_catalog.pg_class t"
+ " ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
+
+ /*
+ * Used to match the tables or schemas listed by the user, completing the
+ * JOIN clause.
+ */
+ if (objects_listed)
+ {
+ appendPQExpBufferStr(&catalog_query, " LEFT JOIN listed_objects"
+ " ON listed_objects.object_oid"
+ " OPERATOR(pg_catalog.=) ");
+
+ if (objfilter & OBJFILTER_TABLE)
+ appendPQExpBufferStr(&catalog_query, "c.oid\n");
+ else
+ appendPQExpBufferStr(&catalog_query, "ns.oid\n");
+ }
+
+ /*
+ * Exclude temporary tables, beginning the WHERE clause.
+ */
+ appendPQExpBufferStr(&catalog_query,
+ " WHERE c.relpersistence OPERATOR(pg_catalog.!=) "
+ CppAsString2(RELPERSISTENCE_TEMP) "\n");
+
+ /*
+ * Used to match the tables or schemas listed by the user, for the WHERE
+ * clause.
+ */
+ if (objects_listed)
+ {
+ if (objfilter & OBJFILTER_SCHEMA_EXCLUDE)
+ appendPQExpBufferStr(&catalog_query,
+ " AND listed_objects.object_oid IS NULL\n");
+ else
+ appendPQExpBufferStr(&catalog_query,
+ " AND listed_objects.object_oid IS NOT NULL\n");
+ }
+
+ /*
+ * If no tables were listed, filter for the relevant relation types. If
+ * tables were given via --table, don't bother filtering by relation type.
+ * Instead, let the server decide whether a given relation can be
+ * processed, in which case the user will know about it.
+ */
+ if ((objfilter & OBJFILTER_TABLE) == 0)
+ {
+ /*
+ * vacuumdb should generally follow the behavior of the underlying
+ * VACUUM and ANALYZE commands. If analyze_only is true, process
+ * regular tables, materialized views, and partitioned tables, just
+ * like ANALYZE (with no specific target tables) does. Otherwise,
+ * process only regular tables and materialized views, since VACUUM
+ * skips partitioned tables when no target tables are specified.
+ */
+ if (vacopts->analyze_only)
+ appendPQExpBufferStr(&catalog_query,
+ " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
+ CppAsString2(RELKIND_RELATION) ", "
+ CppAsString2(RELKIND_MATVIEW) ", "
+ CppAsString2(RELKIND_PARTITIONED_TABLE) "])\n");
+ else
+ appendPQExpBufferStr(&catalog_query,
+ " AND c.relkind OPERATOR(pg_catalog.=) ANY (array["
+ CppAsString2(RELKIND_RELATION) ", "
+ CppAsString2(RELKIND_MATVIEW) "])\n");
+ }
+
+ /*
+ * For --min-xid-age and --min-mxid-age, the age of the relation is the
+ * greatest of the ages of the main relation and its associated TOAST
+ * table. The commands generated by vacuumdb will also process the TOAST
+ * table for the relation if necessary, so it does not need to be
+ * considered separately.
+ */
+ if (vacopts->min_xid_age != 0)
+ {
+ appendPQExpBuffer(&catalog_query,
+ " AND GREATEST(pg_catalog.age(c.relfrozenxid),"
+ " pg_catalog.age(t.relfrozenxid)) "
+ " OPERATOR(pg_catalog.>=) '%d'::pg_catalog.int4\n"
+ " AND c.relfrozenxid OPERATOR(pg_catalog.!=)"
+ " '0'::pg_catalog.xid\n",
+ vacopts->min_xid_age);
+ }
+
+ if (vacopts->min_mxid_age != 0)
+ {
+ appendPQExpBuffer(&catalog_query,
+ " AND GREATEST(pg_catalog.mxid_age(c.relminmxid),"
+ " pg_catalog.mxid_age(t.relminmxid)) OPERATOR(pg_catalog.>=)"
+ " '%d'::pg_catalog.int4\n"
+ " AND c.relminmxid OPERATOR(pg_catalog.!=)"
+ " '0'::pg_catalog.xid\n",
+ vacopts->min_mxid_age);
+ }
+
+ if (vacopts->missing_stats_only)
+ {
+ appendPQExpBufferStr(&catalog_query, " AND (\n");
+
+ /* regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND a.attgenerated OPERATOR(pg_catalog.<>) "
+ CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* expression indexes */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " JOIN pg_catalog.pg_index i"
+ " ON i.indexrelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " WHERE i.indrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND i.indkey[a.attnum OPERATOR(pg_catalog.-) 1::pg_catalog.int2]"
+ " OPERATOR(pg_catalog.=) 0::pg_catalog.int2\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit OPERATOR(pg_catalog.=) p.inherited))\n");
+
+ /* inheritance and regular stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_attribute a\n"
+ " WHERE a.attrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND a.attnum OPERATOR(pg_catalog.>) 0::pg_catalog.int2\n"
+ " AND NOT a.attisdropped\n"
+ " AND a.attstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND a.attgenerated OPERATOR(pg_catalog.<>) "
+ CppAsString2(ATTRIBUTE_GENERATED_VIRTUAL) "\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic s\n"
+ " WHERE s.starelid OPERATOR(pg_catalog.=) a.attrelid\n"
+ " AND s.staattnum OPERATOR(pg_catalog.=) a.attnum\n"
+ " AND s.stainherit))\n");
+
+ /* inheritance and extended stats */
+ appendPQExpBufferStr(&catalog_query,
+ " OR EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext e\n"
+ " WHERE e.stxrelid OPERATOR(pg_catalog.=) c.oid\n"
+ " AND e.stxstattarget IS DISTINCT FROM 0::pg_catalog.int2\n"
+ " AND c.relhassubclass\n"
+ " AND NOT p.inherited\n"
+ " AND EXISTS (SELECT NULL FROM pg_catalog.pg_inherits h\n"
+ " WHERE h.inhparent OPERATOR(pg_catalog.=) c.oid)\n"
+ " AND NOT EXISTS (SELECT NULL FROM pg_catalog.pg_statistic_ext_data d\n"
+ " WHERE d.stxoid OPERATOR(pg_catalog.=) e.oid\n"
+ " AND d.stxdinherit))\n");
+
+ appendPQExpBufferStr(&catalog_query, " )\n");
+ }
+
+ /*
+ * Execute the catalog query. We use the default search_path for this
+ * query for consistency with table lookups done elsewhere by the user.
+ */
+ appendPQExpBufferStr(&catalog_query, " ORDER BY c.relpages DESC;");
+ executeCommand(conn, "RESET search_path;", echo);
+ res = executeQuery(conn, catalog_query.data, echo);
+ termPQExpBuffer(&catalog_query);
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
+
+ /*
+ * Build qualified identifiers for each table, including the column list
+ * if given.
+ */
+ initPQExpBuffer(&buf);
+ for (int i = 0; i < PQntuples(res); i++)
+ {
+ appendPQExpBufferStr(&buf,
+ fmtQualifiedIdEnc(PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 0),
+ PQclientEncoding(conn)));
+
+ if (objects_listed && !PQgetisnull(res, i, 2))
+ appendPQExpBufferStr(&buf, PQgetvalue(res, i, 2));
+
+ simple_string_list_append(found_objs, buf.data);
+ resetPQExpBuffer(&buf);
+ }
+ termPQExpBuffer(&buf);
+ PQclear(res);
+
+ return found_objs;
+}
+
+/*
+ * Vacuum/analyze all connectable databases.
+ *
+ * In analyze-in-stages mode, we process all databases in one stage before
+ * moving on to the next stage. That ensures minimal stats are available
+ * quickly everywhere before generating more detailed ones.
+ */
+void
+vacuum_all_databases(ConnParams *cparams,
+ vacuumingOptions *vacopts,
+ bool analyze_in_stages,
+ SimpleStringList *objects,
+ int concurrentCons,
+ const char *progname, bool echo, bool quiet)
+{
+ PGconn *conn;
+ PGresult *result;
+ int stage;
+ int i;
+
+ conn = connectMaintenanceDatabase(cparams, progname, echo);
+ result = executeQuery(conn,
+ "SELECT datname FROM pg_database WHERE datallowconn AND datconnlimit <> -2 ORDER BY 1;",
+ echo);
+ PQfinish(conn);
+
+ if (analyze_in_stages)
+ {
+ SimpleStringList **found_objs = NULL;
+
+ if (vacopts->missing_stats_only)
+ found_objs = palloc0(PQntuples(result) * sizeof(SimpleStringList *));
+
+ /*
+ * When analyzing all databases in stages, we analyze them all in the
+ * fastest stage first, so that initial statistics become available
+ * for all of them as soon as possible.
+ *
+ * This means we establish several times as many connections, but
+ * that's a secondary consideration.
+ */
+ for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
+ {
+ for (i = 0; i < PQntuples(result); i++)
+ {
+ cparams->override_dbname = PQgetvalue(result, i, 0);
+
+ vacuum_one_database(cparams, vacopts,
+ stage,
+ objects,
+ vacopts->missing_stats_only ? &found_objs[i] : NULL,
+ concurrentCons,
+ progname, echo, quiet);
+ }
+ }
+ }
+ else
+ {
+ for (i = 0; i < PQntuples(result); i++)
+ {
+ cparams->override_dbname = PQgetvalue(result, i, 0);
+
+ vacuum_one_database(cparams, vacopts,
+ ANALYZE_NO_STAGE,
+ objects, NULL,
+ concurrentCons,
+ progname, echo, quiet);
+ }
+ }
+
+ PQclear(result);
+}
+
+/*
+ * Construct a vacuum/analyze command to run based on the given
+ * options, in the given string buffer, which may contain previous garbage.
+ *
+ * The table name used must be already properly quoted. The command generated
+ * depends on the server version involved and it is semicolon-terminated.
+ */
+void
+prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
+ vacuumingOptions *vacopts, const char *table)
+{
+ const char *paren = " (";
+ const char *comma = ", ";
+ const char *sep = paren;
+
+ resetPQExpBuffer(sql);
+
+ if (vacopts->analyze_only)
+ {
+ appendPQExpBufferStr(sql, "ANALYZE");
+
+ /* parenthesized grammar of ANALYZE is supported since v11 */
+ if (serverVersion >= 110000)
+ {
+ if (vacopts->skip_locked)
+ {
+ /* SKIP_LOCKED is supported since v12 */
+ Assert(serverVersion >= 120000);
+ appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+ sep = comma;
+ }
+ if (vacopts->verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (vacopts->buffer_usage_limit)
+ {
+ Assert(serverVersion >= 160000);
+ appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+ vacopts->buffer_usage_limit);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBufferChar(sql, ')');
+ }
+ else
+ {
+ if (vacopts->verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ }
+ }
+ else
+ {
+ appendPQExpBufferStr(sql, "VACUUM");
+
+ /* parenthesized grammar of VACUUM is supported since v9.0 */
+ if (serverVersion >= 90000)
+ {
+ if (vacopts->disable_page_skipping)
+ {
+ /* DISABLE_PAGE_SKIPPING is supported since v9.6 */
+ Assert(serverVersion >= 90600);
+ appendPQExpBuffer(sql, "%sDISABLE_PAGE_SKIPPING", sep);
+ sep = comma;
+ }
+ if (vacopts->no_index_cleanup)
+ {
+ /* "INDEX_CLEANUP FALSE" has been supported since v12 */
+ Assert(serverVersion >= 120000);
+ Assert(!vacopts->force_index_cleanup);
+ appendPQExpBuffer(sql, "%sINDEX_CLEANUP FALSE", sep);
+ sep = comma;
+ }
+ if (vacopts->force_index_cleanup)
+ {
+ /* "INDEX_CLEANUP TRUE" has been supported since v12 */
+ Assert(serverVersion >= 120000);
+ Assert(!vacopts->no_index_cleanup);
+ appendPQExpBuffer(sql, "%sINDEX_CLEANUP TRUE", sep);
+ sep = comma;
+ }
+ if (!vacopts->do_truncate)
+ {
+ /* TRUNCATE is supported since v12 */
+ Assert(serverVersion >= 120000);
+ appendPQExpBuffer(sql, "%sTRUNCATE FALSE", sep);
+ sep = comma;
+ }
+ if (!vacopts->process_main)
+ {
+ /* PROCESS_MAIN is supported since v16 */
+ Assert(serverVersion >= 160000);
+ appendPQExpBuffer(sql, "%sPROCESS_MAIN FALSE", sep);
+ sep = comma;
+ }
+ if (!vacopts->process_toast)
+ {
+ /* PROCESS_TOAST is supported since v14 */
+ Assert(serverVersion >= 140000);
+ appendPQExpBuffer(sql, "%sPROCESS_TOAST FALSE", sep);
+ sep = comma;
+ }
+ if (vacopts->skip_database_stats)
+ {
+ /* SKIP_DATABASE_STATS is supported since v16 */
+ Assert(serverVersion >= 160000);
+ appendPQExpBuffer(sql, "%sSKIP_DATABASE_STATS", sep);
+ sep = comma;
+ }
+ if (vacopts->skip_locked)
+ {
+ /* SKIP_LOCKED is supported since v12 */
+ Assert(serverVersion >= 120000);
+ appendPQExpBuffer(sql, "%sSKIP_LOCKED", sep);
+ sep = comma;
+ }
+ if (vacopts->full)
+ {
+ appendPQExpBuffer(sql, "%sFULL", sep);
+ sep = comma;
+ }
+ if (vacopts->freeze)
+ {
+ appendPQExpBuffer(sql, "%sFREEZE", sep);
+ sep = comma;
+ }
+ if (vacopts->verbose)
+ {
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
+ sep = comma;
+ }
+ if (vacopts->and_analyze)
+ {
+ appendPQExpBuffer(sql, "%sANALYZE", sep);
+ sep = comma;
+ }
+ if (vacopts->parallel_workers >= 0)
+ {
+ /* PARALLEL is supported since v13 */
+ Assert(serverVersion >= 130000);
+ appendPQExpBuffer(sql, "%sPARALLEL %d", sep,
+ vacopts->parallel_workers);
+ sep = comma;
+ }
+ if (vacopts->buffer_usage_limit)
+ {
+ Assert(serverVersion >= 160000);
+ appendPQExpBuffer(sql, "%sBUFFER_USAGE_LIMIT '%s'", sep,
+ vacopts->buffer_usage_limit);
+ sep = comma;
+ }
+ if (sep != paren)
+ appendPQExpBufferChar(sql, ')');
+ }
+ else
+ {
+ if (vacopts->full)
+ appendPQExpBufferStr(sql, " FULL");
+ if (vacopts->freeze)
+ appendPQExpBufferStr(sql, " FREEZE");
+ if (vacopts->verbose)
+ appendPQExpBufferStr(sql, " VERBOSE");
+ if (vacopts->and_analyze)
+ appendPQExpBufferStr(sql, " ANALYZE");
+ }
+ }
+
+ appendPQExpBuffer(sql, " %s;", table);
+}
+
+/*
+ * Send a vacuum/analyze command to the server, returning after sending the
+ * command.
+ *
+ * Any errors during command execution are reported to stderr.
+ */
+void
+run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+ const char *table)
+{
+ bool status;
+
+ if (echo)
+ printf("%s\n", sql);
+
+ status = PQsendQuery(conn, sql) == 1;
+
+ if (!status)
+ {
+ if (table)
+ {
+ pg_log_error("vacuuming of table \"%s\" in database \"%s\" failed: %s",
+ table, PQdb(conn), PQerrorMessage(conn));
+ }
+ else
+ {
+ pg_log_error("vacuuming of database \"%s\" failed: %s",
+ PQdb(conn), PQerrorMessage(conn));
+ }
+ }
+}
+
+/*
+ * Returns a newly malloc'd version of 'src' with escaped single quotes and
+ * backslashes.
+ */
+char *
+escape_quotes(const char *src)
+{
+ char *result = escape_single_quotes_ascii(src);
+
+ if (!result)
+ pg_fatal("out of memory");
+ return result;
+}
diff --git a/src/bin/scripts/vacuuming.h b/src/bin/scripts/vacuuming.h
new file mode 100644
index 00000000000..d3f000840fa
--- /dev/null
+++ b/src/bin/scripts/vacuuming.h
@@ -0,0 +1,95 @@
+/*-------------------------------------------------------------------------
+ *
+ * vacuuming.h
+ * Common declarations for vacuuming.c
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/bin/scripts/vacuuming.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef VACUUMING_H
+#define VACUUMING_H
+
+#include "common.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/simple_list.h"
+
+/* For analyze-in-stages mode */
+#define ANALYZE_NO_STAGE -1
+#define ANALYZE_NUM_STAGES 3
+
+/* vacuum options controlled by user flags */
+typedef struct vacuumingOptions
+{
+ bool analyze_only;
+ bool verbose;
+ bool and_analyze;
+ bool full;
+ bool freeze;
+ bool disable_page_skipping;
+ bool skip_locked;
+ int min_xid_age;
+ int min_mxid_age;
+ int parallel_workers; /* >= 0 indicates user specified the
+ * parallel degree, otherwise -1 */
+ bool no_index_cleanup;
+ bool force_index_cleanup;
+ bool do_truncate;
+ bool process_main;
+ bool process_toast;
+ bool skip_database_stats;
+ char *buffer_usage_limit;
+ bool missing_stats_only;
+} vacuumingOptions;
+
+/* object filter options */
+typedef enum
+{
+ OBJFILTER_NONE = 0, /* no filter used */
+ OBJFILTER_ALL_DBS = (1 << 0), /* -a | --all */
+ OBJFILTER_DATABASE = (1 << 1), /* -d | --dbname */
+ OBJFILTER_TABLE = (1 << 2), /* -t | --table */
+ OBJFILTER_SCHEMA = (1 << 3), /* -n | --schema */
+ OBJFILTER_SCHEMA_EXCLUDE = (1 << 4), /* -N | --exclude-schema */
+} VacObjFilter;
+
+extern VacObjFilter objfilter;
+
+extern void vacuuming_main(ConnParams *cparams, const char *dbname,
+ const char *maintenance_db, vacuumingOptions *vacopts,
+ SimpleStringList *objects, bool analyze_in_stages,
+ int tbl_count, int concurrentCons,
+ const char *progname, bool echo, bool quiet);
+
+extern SimpleStringList *retrieve_objects(PGconn *conn,
+ vacuumingOptions *vacopts,
+ SimpleStringList *objects,
+ bool echo);
+
+extern void vacuum_one_database(ConnParams *cparams,
+ vacuumingOptions *vacopts,
+ int stage,
+ SimpleStringList *objects,
+ SimpleStringList **found_objs,
+ int concurrentCons,
+ const char *progname, bool echo, bool quiet);
+
+extern void vacuum_all_databases(ConnParams *cparams,
+ vacuumingOptions *vacopts,
+ bool analyze_in_stages,
+ SimpleStringList *objects,
+ int concurrentCons,
+ const char *progname, bool echo, bool quiet);
+
+extern void prepare_vacuum_command(PQExpBuffer sql, int serverVersion,
+ vacuumingOptions *vacopts, const char *table);
+
+extern void run_vacuum_command(PGconn *conn, const char *sql, bool echo,
+ const char *table);
+
+extern char *escape_quotes(const char *src);
+
+#endif /* VACUUMING_H */
--
2.43.0
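
For readers skimming the patch: prepare_vacuum_command() builds its option list with a "separator pointer" idiom. The separator starts as " (" and switches to ", " after the first option is emitted, so the closing ")" is appended only if at least one option was written. Below is a minimal standalone sketch of that idiom, not the patch's code. It uses plain snprintf() instead of PQExpBuffer, and the names demo_options, append_pair and build_demo_command are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced option set; not the patch's vacuumingOptions. */
typedef struct demo_options
{
	bool		freeze;
	bool		verbose;
	int			parallel_workers;	/* -1 means "not specified" */
} demo_options;

/* Append s1 followed by s2 at offset "used"; return the new length. */
static size_t
append_pair(char *buf, size_t buflen, size_t used,
			const char *s1, const char *s2)
{
	int			n = snprintf(buf + used, buflen - used, "%s%s", s1, s2);

	/* This sketch simply asserts that the caller's buffer is big enough. */
	assert(n >= 0 && (size_t) n < buflen - used);
	return used + (size_t) n;
}

/*
 * Build "VACUUM [(opt, opt, ...)] table;" in buf.  The table name is assumed
 * to be already quoted, as the patch's function also requires.
 */
void
build_demo_command(char *buf, size_t buflen,
				   const demo_options *opts, const char *table)
{
	const char *paren = " (";
	const char *comma = ", ";
	const char *sep = paren;	/* first option opens the parenthesis */
	size_t		used;

	used = append_pair(buf, buflen, 0, "VACUUM", "");

	if (opts->freeze)
	{
		used = append_pair(buf, buflen, used, sep, "FREEZE");
		sep = comma;
	}
	if (opts->verbose)
	{
		used = append_pair(buf, buflen, used, sep, "VERBOSE");
		sep = comma;
	}
	if (opts->parallel_workers >= 0)
	{
		char		num[32];

		snprintf(num, sizeof(num), "PARALLEL %d", opts->parallel_workers);
		used = append_pair(buf, buflen, used, sep, num);
		sep = comma;
	}

	/* Close the list only if some option actually opened it. */
	if (sep != paren)
		used = append_pair(buf, buflen, used, ")", "");

	used = append_pair(buf, buflen, used, " ", table);
	append_pair(buf, buflen, used, ";", "");
}
```

With freeze set and parallel_workers = 4, the sketch writes the options as one parenthesized, comma-separated list before the table name. With no options set, it emits no parentheses at all. Comparing sep against paren avoids a separate "any option written" flag.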