public inbox for [email protected]
* pgsql: Allow parallel CREATE INDEX for GIN indexes
@ 2025-03-03 16:10 Tomas Vondra <[email protected]>
2025-03-07 21:22 ` Re: pgsql: Allow parallel CREATE INDEX for GIN indexes Peter Eisentraut <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Tomas Vondra @ 2025-03-03 16:10 UTC (permalink / raw)
To: [email protected]
Allow parallel CREATE INDEX for GIN indexes
Allow using parallel workers to build a GIN index, similarly to BTREE
and BRIN. For large tables this may result in significant speedup when
the build is CPU-bound.
The work is divided so that each worker builds index entries on a subset
of the table, determined by the regular parallel scan used to read the
data. Each worker uses a local tuplesort to sort and merge the entries
for the same key. The TID lists do not overlap (for a given key), which
means the merge sort simply concatenates the two lists. The merged
entries are written into a shared tuplesort for the leader.
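To illustrate why non-overlapping TID lists make the worker-side merge cheap, here is a minimal sketch in C. It is not the code from the commit: the `Tid` struct, `tid_cmp()` and `tid_concat()` are hypothetical stand-ins, and plain `malloc()` is used instead of PostgreSQL memory contexts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a heap tuple identifier (hypothetical type). */
typedef struct Tid
{
    uint32_t    block;
    uint16_t    offset;
} Tid;

/* Three-way comparison: block number first, then offset. */
static int
tid_cmp(Tid a, Tid b)
{
    if (a.block != b.block)
        return (a.block < b.block) ? -1 : 1;
    if (a.offset != b.offset)
        return (a.offset < b.offset) ? -1 : 1;
    return 0;
}

/*
 * Merge two sorted TID lists whose ranges do not overlap. Because one list
 * lies entirely before the other, the merge is a plain concatenation.
 * Returns a malloc'd array of na + nb items, or NULL if both lists are
 * empty or the allocation fails.
 */
static Tid *
tid_concat(const Tid *a, size_t na, const Tid *b, size_t nb)
{
    Tid        *out;

    if (na + nb == 0)
        return NULL;

    /* Put the list with the smaller first TID in front. */
    if (na > 0 && nb > 0 && tid_cmp(a[0], b[0]) > 0)
    {
        const Tid  *tmp = a;
        size_t      ntmp = na;

        a = b;
        na = nb;
        b = tmp;
        nb = ntmp;
    }

    /* Precondition assumed by this sketch: the ranges are disjoint. */
    assert(na == 0 || nb == 0 || tid_cmp(a[na - 1], b[0]) < 0);

    out = malloc((na + nb) * sizeof(Tid));
    if (out == NULL)
        return NULL;
    if (na > 0)
        memcpy(out, a, na * sizeof(Tid));
    if (nb > 0)
        memcpy(out + na, b, nb * sizeof(Tid));
    return out;
}
```

No per-item comparison is needed beyond checking which list comes first, so the cost is two memory copies rather than an element-by-element merge.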
The leader needs to merge the sorted entries again, before writing them
into the index. But this way a significant part of the work happens in
the workers, and the leader is left with merging fewer large entries,
which is more efficient.
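The effect of pre-merging in the workers can be pictured as coalescing a key-sorted stream of entries. The sketch below is an illustration only, under simplifying assumptions: keys are plain integers, each entry records just a TID count rather than the list itself, and the `Entry` type and `coalesce_entries()` function are hypothetical names, not part of the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: an index key plus the number of TIDs collected for it. */
typedef struct Entry
{
    int         key;
    size_t      ntids;
} Entry;

/*
 * Coalesce adjacent entries with equal keys in a key-sorted array, in place.
 * Returns the number of entries left after merging.
 */
static size_t
coalesce_entries(Entry *entries, size_t n)
{
    size_t      out = 0;

    assert(entries != NULL || n == 0);

    if (n == 0)
        return 0;

    for (size_t i = 1; i < n; i++)
    {
        if (entries[i].key == entries[out].key)
            entries[out].ntids += entries[i].ntids; /* same key: combine */
        else
            entries[++out] = entries[i];            /* new key: start next */
    }
    return out + 1;
}
```

Each worker running a step like this before handing its data over means the leader sees one larger entry per key per worker instead of many small ones.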
Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for GIN indexes
(e.g. uniqueness checks).
Original patch by me, with reviews and substantial improvements by
Matthias van de Meent, certainly enough to make him a co-author.
Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/8492feb98f6df3f0f03e84ed56f0d1cbb2ac514c
Modified Files
--------------
src/backend/access/gin/gininsert.c         | 1649 +++++++++++++++++++++++++++-
src/backend/access/gin/ginutil.c           |   30 +-
src/backend/access/transam/parallel.c      |    4 +
src/backend/utils/sort/tuplesortvariants.c |  198 ++++
src/include/access/gin.h                   |   15 +
src/include/access/gin_private.h           |    1 +
src/include/access/gin_tuple.h             |   44 +
src/include/utils/tuplesort.h              |    8 +
src/tools/pgindent/typedefs.list           |    4 +
9 files changed, 1937 insertions(+), 16 deletions(-)
* Re: pgsql: Allow parallel CREATE INDEX for GIN indexes
From: Peter Eisentraut @ 2025-03-07 21:22 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; [email protected]
The new tuplesort_getgintuple() in tuplesortvariants.c has a branch that
does "return false" even though the function's return type is GinTuple
*. That is probably a mistake. Check please.
Also, this code contains a "pgrminclude ignore", but we don't use those
anymore.
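For readers wondering why such a branch compiles at all: in C99 through C17, `false` from `<stdbool.h>` is a macro expanding to the integer constant 0, which is a valid null pointer constant, so the compiler accepts it silently. The sketch below shows the clearer form with `NULL`. It is not the tuplesort code; `Item`, `Cursor` and `cursor_next()` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item and cursor types, used only for this illustration. */
typedef struct Item
{
    int         value;
} Item;

typedef struct Cursor
{
    Item       *items;
    size_t      nitems;
    size_t      pos;
} Cursor;

/*
 * Return the next item, or NULL once the cursor is exhausted.
 * The return type is a pointer, so the "no more data" branch returns NULL
 * rather than the boolean constant false.
 */
static Item *
cursor_next(Cursor *cur)
{
    assert(cur != NULL);

    if (cur->pos >= cur->nitems)
        return NULL;

    return &cur->items[cur->pos++];
}
```

Callers then test the result against `NULL`, which matches the declared return type and keeps the intent obvious.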
* Re: pgsql: Allow parallel CREATE INDEX for GIN indexes
From: Peter Eisentraut @ 2025-04-01 13:30 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; [email protected]
On 07.03.25 22:22, Peter Eisentraut wrote:
> The new tuplesort_getgintuple() in tuplesortvariants.c has a branch that
> does "return false" even though the function's return type is GinTuple
> *. That is probably a mistake. Check please.
>
> Also, this code contains a "pgrminclude ignore", but we don't use those
> anymore.
Fixes committed.
* Re: pgsql: Allow parallel CREATE INDEX for GIN indexes
From: Tomas Vondra @ 2025-04-02 10:32 UTC (permalink / raw)
To: Peter Eisentraut <[email protected]>; Tomas Vondra <[email protected]>; [email protected]
On 4/1/25 15:30, Peter Eisentraut wrote:
> On 07.03.25 22:22, Peter Eisentraut wrote:
>> The new tuplesort_getgintuple() in tuplesortvariants.c has a branch
>> that does "return false" even though the function's return type is
>> GinTuple *. That is probably a mistake. Check please.
>>
>> Also, this code contains a "pgrminclude ignore", but we don't use
>> those anymore.
>
> Fixes committed.
>
Thank you! I apparently missed your report on pgsql-committers :-(
regards
--
Tomas Vondra