From: Peter Eisentraut <[email protected]>
To: Tomas Vondra <[email protected]>
To: [email protected]
Subject: Re: pgsql: Allow parallel CREATE INDEX for GIN indexes
Date: Fri, 7 Mar 2025 22:22:19 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>

The new tuplesort_getgintuple() in tuplesortvariants.c has a branch that
does "return false" even though the function's return type is GinTuple *.
That is probably a mistake.  Please check.
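
To illustrate the concern, here is a minimal sketch of a pointer-returning getter. It is not the actual tuplesort_getgintuple() code: the GinTuple struct below is a hypothetical stand-in, and get_next_tuple() and its parameters are invented for this example. With the pre-C23 <stdbool.h>, "false" expands to the integer constant 0, which is also a null pointer constant, so a compiler may accept "return false" in such a function without complaint. Writing NULL states the intent and matches the return type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real GinTuple, just enough to compile. */
typedef struct GinTuple
{
    int         keylen;
} GinTuple;

/*
 * Hypothetical getter (NOT the real tuplesort_getgintuple()).
 * Returns the next tuple from "items", or NULL once all have been returned.
 */
GinTuple *
get_next_tuple(GinTuple *items, size_t nitems, size_t *pos)
{
    assert(pos != NULL);

    if (*pos >= nitems)
        return NULL;        /* not "return false": the result is a pointer */

    return &items[(*pos)++];
}
```

A caller would loop until the function returns NULL, which is the usual convention for "no more tuples" in a pointer-returning API.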

Also, this code contains a "pgrminclude ignore", but we don't use those 
anymore.


On 03.03.25 17:10, Tomas Vondra wrote:
> Allow parallel CREATE INDEX for GIN indexes
> 
> Allow using parallel workers to build a GIN index, similarly to BTREE
> and BRIN. For large tables this may result in significant speedup when
> the build is CPU-bound.
> 
> The work is divided so that each worker builds index entries on a subset
> of the table, determined by the regular parallel scan used to read the
> data. Each worker uses a local tuplesort to sort and merge the entries
> for the same key. The TID lists do not overlap (for a given key), which
> means the merge sort simply concatenates the two lists. The merged
> entries are written into a shared tuplesort for the leader.
> 
> The leader needs to merge the sorted entries again, before writing them
> into the index. But this way a significant part of the work happens in
> the workers, and the leader is left with merging fewer large entries,
> which is more efficient.
> 
> Most of the parallelism infrastructure is a simplified copy of the code
> used by BTREE indexes, omitting the parts irrelevant for GIN indexes
> (e.g. uniqueness checks).
> 
> Original patch by me, with reviews and substantial improvements by
> Matthias van de Meent, certainly enough to make him a co-author.
> 
> Author: Tomas Vondra, Matthias van de Meent
> Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
> Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
> 
> Branch
> ------
> master
> 
> Details
> -------
> https://git.postgresql.org/pg/commitdiff/8492feb98f6df3f0f03e84ed56f0d1cbb2ac514c
> 
> Modified Files
> --------------
> src/backend/access/gin/gininsert.c         | 1649 +++++++++++++++++++++++++++-
> src/backend/access/gin/ginutil.c           |   30 +-
> src/backend/access/transam/parallel.c      |    4 +
> src/backend/utils/sort/tuplesortvariants.c |  198 ++++
> src/include/access/gin.h                   |   15 +
> src/include/access/gin_private.h           |    1 +
> src/include/access/gin_tuple.h             |   44 +
> src/include/utils/tuplesort.h              |    8 +
> src/tools/pgindent/typedefs.list           |    4 +
> 9 files changed, 1937 insertions(+), 16 deletions(-)
> 
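
The quoted commit message says that the TID lists for a given key do not overlap, so merging them amounts to concatenation. A minimal sketch of that idea follows. It is only an illustration under stated assumptions, not the code from gininsert.c: TIDs are modeled as plain 64-bit integers (the real code uses ItemPointerData), and merge_disjoint_tids() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a TID; PostgreSQL itself uses ItemPointerData. */
typedef uint64_t Tid;

/*
 * Merge two sorted TID lists whose ranges do not overlap.
 * Returns a newly allocated array of na + nb entries (caller frees),
 * or NULL if both lists are empty or the allocation fails.
 */
Tid *
merge_disjoint_tids(const Tid *a, size_t na, const Tid *b, size_t nb)
{
    Tid        *out;

    if (na + nb == 0)
        return NULL;

    /* Make "a" the list that starts with the smaller TID. */
    if (na > 0 && nb > 0 && b[0] < a[0])
    {
        const Tid  *tmp_list = a;
        size_t      tmp_len = na;

        a = b;
        na = nb;
        b = tmp_list;
        nb = tmp_len;
    }

    /* No interleaving is assumed: "a" must end before "b" begins. */
    assert(na == 0 || nb == 0 || a[na - 1] < b[0]);

    out = malloc((na + nb) * sizeof(Tid));
    if (out == NULL)
        return NULL;

    /* Concatenate; no element-by-element comparison is needed. */
    if (na > 0)
        memcpy(out, a, na * sizeof(Tid));
    if (nb > 0)
        memcpy(out + na, b, nb * sizeof(Tid));

    return out;
}
```

Because the ranges are disjoint, the merge costs two memory copies rather than a comparison per element, which is the efficiency the commit message alludes to.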





