public inbox for [email protected]
help / color / mirror / Atom feedFrom: Alexander Korotkov <[email protected]>
To: Álvaro Herrera <[email protected]>
Cc: Alexander Korotkov <[email protected]>
Cc: [email protected]
Cc: Maxim Orlov <[email protected]>
Cc: Svetlana Derevyanko <[email protected]>
Subject: Re: pgsql: reindexdb: Add the index-level REINDEX with multiple jobs
Date: Tue, 11 Mar 2025 23:33:25 +0200
Message-ID: <CAPpHfdsE4O15X=rbNt9f1KAyg2k6XcmHfUOrcWURKQ-yHPggAg@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAPpHfds0Y7pcOrhwuFSJXvs58-=tv=_odWMCoCch01P2QaXq1w@mail.gmail.com>
<[email protected]>
On Mon, Mar 10, 2025 at 1:12 PM Álvaro Herrera <[email protected]> wrote:
> On 2025-Mar-09, Alexander Korotkov wrote:
>
> > After second thought it's not so hard to fix. The attached patch does
> > it by putting REINDEX commands related to the same table into a single
> > SQL statement. Possibly, that could be better than revert given we
> > need to handle compatibility. What do you think?
>
> Oh, this is an egg of Columbus solution, I like it. It seems to work as
> intended on a quick test. Please add some commentary to run_reindex_command.
>
> Maybe another, possibly better way to do this would be to use libpq
> pipeline mode, sending all the commands for a table in one pipeline
> instead of a single command. The advantage of this would be that
> server-side log entries for each command would be separate, and they
> wouldn't appear clumped together in pg_stat_activity and so on.
> However, this would require more invasive changes, so it might be better
> to leave that for a future project -- it's certainly a harder sell for
> such a change to be backpatched. So I'm +1 on your current patch for 17
> and master.
Thank you for your feedback! I also think that pipelining would be a
better options, but it's too invasive for backpatching. I've written
comments for gen_reindex_command() and run_reindex_command(). I'm
going to push this patch to master and 17 if no objections.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
[application/octet-stream] v2-0001-reindexdb-Fix-the-index-level-REINDEX-with-multip.patch (8.3K, 2-v2-0001-reindexdb-Fix-the-index-level-REINDEX-with-multip.patch)
download | inline diff:
From 8c3188e9713cf0401a14f64fb45389293bbbbd42 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <[email protected]>
Date: Tue, 11 Mar 2025 23:31:14 +0200
Subject: [PATCH v2] reindexdb: Fix the index-level REINDEX with multiple jobs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
47f99a407d introduced a parallel index-level REINDEX. The code was written
assuming that running run_reindex_command() with async == true can schedule
a number of queries for a connection. That's not true, and the second query
sent using run_reindex_command() will wait for the completion of the previous
one.
This commit fixes that by putting REINDEX commands for the same table into a
single query.
Reported-by: Álvaro Herrera <[email protected]>
Discussion: https://postgr.es/m/202503071820.j25zn3lo4hvn%40alvherre.pgsql
Reviewed-by: Álvaro Herrera <[email protected]>
Backpatch-through: 17
---
src/bin/scripts/reindexdb.c | 126 +++++++++++++++++++++---------------
1 file changed, 74 insertions(+), 52 deletions(-)
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index b00c8112869..355f2a52391 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -49,10 +49,13 @@ static void reindex_all_databases(ConnParams *cparams,
bool syscatalog, SimpleStringList *schemas,
SimpleStringList *tables,
SimpleStringList *indexes);
-static void run_reindex_command(PGconn *conn, ReindexType type,
+static void gen_reindex_command(PGconn *conn, ReindexType type,
const char *name, bool echo, bool verbose,
- bool concurrently, bool async,
- const char *tablespace);
+ bool concurrently, const char *tablespace,
+ PQExpBufferData *sql);
+static void run_reindex_command(PGconn *conn, ReindexType type,
+ const char *name, bool echo, bool async,
+ PQExpBufferData *sq);
static void help(const char *progname);
@@ -284,7 +287,6 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
ParallelSlotArray *sa;
bool failed = false;
int items_count = 0;
- char *prev_index_table_name = NULL;
ParallelSlot *free_slot = NULL;
conn = connectDatabase(cparams, progname, echo, false, true);
@@ -430,8 +432,8 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
cell = process_list->head;
do
{
+ PQExpBufferData sql;
const char *objname = cell->val;
- bool need_new_slot = true;
if (CancelRequested)
{
@@ -439,35 +441,45 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
goto finish;
}
- /*
- * For parallel index-level REINDEX, the indices of the same table are
- * ordered together and they are to be processed by the same job. So,
- * we don't switch the job as soon as the index belongs to the same
- * table as the previous one.
- */
- if (parallel && process_type == REINDEX_INDEX)
+ free_slot = ParallelSlotsGetIdle(sa, NULL);
+ if (!free_slot)
{
- if (prev_index_table_name != NULL &&
- strcmp(prev_index_table_name, indices_tables_cell->val) == 0)
- need_new_slot = false;
- prev_index_table_name = indices_tables_cell->val;
- indices_tables_cell = indices_tables_cell->next;
+ failed = true;
+ goto finish;
}
- if (need_new_slot)
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ initPQExpBuffer(&sql);
+ if (parallel && process_type == REINDEX_INDEX)
{
- free_slot = ParallelSlotsGetIdle(sa, NULL);
- if (!free_slot)
+ /*
+ * For parallel index-level REINDEX, the indices of the same table
+ * are ordered together and they are to be processed by the same
+ * job. So, we put all the relevant REINDEX commands into the
+ * same SQL query to be processed by this job at once.
+ */
+ gen_reindex_command(free_slot->connection, process_type, objname,
+ echo, verbose, concurrently, tablespace, &sql);
+ while (indices_tables_cell->next &&
+ strcmp(indices_tables_cell->val, indices_tables_cell->next->val) == 0)
{
- failed = true;
- goto finish;
+ indices_tables_cell = indices_tables_cell->next;
+ cell = cell->next;
+ objname = cell->val;
+ appendPQExpBufferChar(&sql, '\n');
+ gen_reindex_command(free_slot->connection, process_type, objname,
+ echo, verbose, concurrently, tablespace, &sql);
}
-
- ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
+ indices_tables_cell = indices_tables_cell->next;
+ }
+ else
+ {
+ gen_reindex_command(free_slot->connection, process_type, objname,
+ echo, verbose, concurrently, tablespace, &sql);
}
-
run_reindex_command(free_slot->connection, process_type, objname,
- echo, verbose, concurrently, true, tablespace);
+ echo, true, &sql);
+ termPQExpBuffer(&sql);
cell = cell->next;
} while (cell != NULL);
@@ -495,57 +507,57 @@ finish:
exit(1);
}
+/*
+ * Append a SQL command required to reindex a given database object to the
+ * '*sql' string.
+ */
static void
-run_reindex_command(PGconn *conn, ReindexType type, const char *name,
- bool echo, bool verbose, bool concurrently, bool async,
- const char *tablespace)
+gen_reindex_command(PGconn *conn, ReindexType type, const char *name,
+ bool echo, bool verbose, bool concurrently,
+ const char *tablespace, PQExpBufferData *sql)
{
const char *paren = "(";
const char *comma = ", ";
const char *sep = paren;
- PQExpBufferData sql;
- bool status;
Assert(name);
/* build the REINDEX query */
- initPQExpBuffer(&sql);
-
- appendPQExpBufferStr(&sql, "REINDEX ");
+ appendPQExpBufferStr(sql, "REINDEX ");
if (verbose)
{
- appendPQExpBuffer(&sql, "%sVERBOSE", sep);
+ appendPQExpBuffer(sql, "%sVERBOSE", sep);
sep = comma;
}
if (tablespace)
{
- appendPQExpBuffer(&sql, "%sTABLESPACE %s", sep,
+ appendPQExpBuffer(sql, "%sTABLESPACE %s", sep,
fmtIdEnc(tablespace, PQclientEncoding(conn)));
sep = comma;
}
if (sep != paren)
- appendPQExpBufferStr(&sql, ") ");
+ appendPQExpBufferStr(sql, ") ");
/* object type */
switch (type)
{
case REINDEX_DATABASE:
- appendPQExpBufferStr(&sql, "DATABASE ");
+ appendPQExpBufferStr(sql, "DATABASE ");
break;
case REINDEX_INDEX:
- appendPQExpBufferStr(&sql, "INDEX ");
+ appendPQExpBufferStr(sql, "INDEX ");
break;
case REINDEX_SCHEMA:
- appendPQExpBufferStr(&sql, "SCHEMA ");
+ appendPQExpBufferStr(sql, "SCHEMA ");
break;
case REINDEX_SYSTEM:
- appendPQExpBufferStr(&sql, "SYSTEM ");
+ appendPQExpBufferStr(sql, "SYSTEM ");
break;
case REINDEX_TABLE:
- appendPQExpBufferStr(&sql, "TABLE ");
+ appendPQExpBufferStr(sql, "TABLE ");
break;
}
@@ -555,37 +567,49 @@ run_reindex_command(PGconn *conn, ReindexType type, const char *name,
* object type.
*/
if (concurrently)
- appendPQExpBufferStr(&sql, "CONCURRENTLY ");
+ appendPQExpBufferStr(sql, "CONCURRENTLY ");
/* object name */
switch (type)
{
case REINDEX_DATABASE:
case REINDEX_SYSTEM:
- appendPQExpBufferStr(&sql,
+ appendPQExpBufferStr(sql,
fmtIdEnc(name, PQclientEncoding(conn)));
break;
case REINDEX_INDEX:
case REINDEX_TABLE:
- appendQualifiedRelation(&sql, name, conn, echo);
+ appendQualifiedRelation(sql, name, conn, echo);
break;
case REINDEX_SCHEMA:
- appendPQExpBufferStr(&sql, name);
+ appendPQExpBufferStr(sql, name);
break;
}
/* finish the query */
- appendPQExpBufferChar(&sql, ';');
+ appendPQExpBufferChar(sql, ';');
+}
+
+/*
+ * Run one or more reindex commands accumulated in the '*sql' string against
+ * a given database connection. Note that the 'async' parameter skips the
+ * PQfinish() but does not support pipelining.
+ */
+static void
+run_reindex_command(PGconn *conn, ReindexType type, const char *name,
+ bool echo, bool async, PQExpBufferData *sql)
+{
+ bool status;
if (async)
{
if (echo)
- printf("%s\n", sql.data);
+ printf("%s\n", sql->data);
- status = PQsendQuery(conn, sql.data) == 1;
+ status = PQsendQuery(conn, sql->data) == 1;
}
else
- status = executeMaintenanceCommand(conn, sql.data, echo);
+ status = executeMaintenanceCommand(conn, sql->data, echo);
if (!status)
{
@@ -618,8 +642,6 @@ run_reindex_command(PGconn *conn, ReindexType type, const char *name,
exit(1);
}
}
-
- termPQExpBuffer(&sql);
}
/*
--
2.39.5 (Apple Git-154)
view thread (11+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: pgsql: reindexdb: Add the index-level REINDEX with multiple jobs
In-Reply-To: <CAPpHfdsE4O15X=rbNt9f1KAyg2k6XcmHfUOrcWURKQ-yHPggAg@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox