public inbox for [email protected]  
help / color / mirror / Atom feed
From: Tomas Vondra <[email protected]>
To: [email protected]
Subject: pgsql: Compress TID lists when writing GIN tuples to disk
Date: Tue, 04 Mar 2025 18:02:15 +0000
Message-ID: <[email protected]> (raw)

Compress TID lists when writing GIN tuples to disk

When serializing GIN tuples to tuplesorts during parallel index builds,
we can significantly reduce the amount of data by compressing the TID
lists. The GIN opclasses may produce a lot of data (depending on how
many keys are extracted from each row), and the TID compression is very
efficient and effective.

If the number of distinct keys is high, the first worker pass (reading
data from the table and writing them into a private tuplesort) may not
benefit from the compression very much. It is likely to spill data to
disk before the TID lists get long enough for the compression to help.
The second pass (writing the merged data into the shared tuplesort) is
more likely to benefit from compression.

The compression can be seen as a way to reduce the amount of disk space
needed by the parallel builds, because the data is written twice. First
into the per-worker tuplesorts, then into the shared tuplesort.

Author: Tomas Vondra
Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke
Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/0b2a45a5d1f2b088640a7eb7817ca6a0513a2717

Modified Files
--------------
src/backend/access/gin/gininsert.c | 116 ++++++++++++++++++++++++++++++-------
src/tools/pgindent/typedefs.list   |   1 +
2 files changed, 95 insertions(+), 22 deletions(-)



reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: pgsql: Compress TID lists when writing GIN tuples to disk
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox