public inbox for [email protected]help / color / mirror / Atom feed
Re: Reduce build times of pg_trgm GIN indexes 3+ messages / 2 participants [nested] [flat]
* Re: Reduce build times of pg_trgm GIN indexes @ 2026-01-09 12:06 David Geier <[email protected]> 2026-01-12 22:10 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: David Geier @ 2026-01-09 12:06 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; pgsql-hackers Hi Heikki! Thanks for looking at the patch set. On 06.01.2026 18:00, Heikki Linnakangas wrote: > On 05/01/2026 17:01, David Geier wrote: >> - v1-0002-Optimized-comparison-functions.patch: Use FunctionCallInvoke() >> instead of FunctionCall2Coll(). This saves a bunch of per-comparison >> setup code, such as calling InitFunctionCallInfoData(). > > You lose the check for NULL result with this. That's probably still > worth checking. It seems like existing code where all args are not null, has that safety check. Added it for consistency. >> - v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch >> ginExtractEntries() deduplicates and sorts the entries returned from the >> extract value function. In case of pg_trgm, that is completely redundant >> because the trigrams are already deduplicated and sorted. The current >> version of this patch is just to demonstrate the potential. We need to >> think about what we want here. Ideally, we would require the extraction >> function to provide the entries deduplicated and sorted. Alternatively, >> we could indicate to ginExtractEntries() if the entries are already >> deduplicated and sorted. If we don't want to alter the signature of the >> extract value function, we could e.g. use the MSB of the nentries >> argument. > > Yeah, this seems wrong as it is. You're assuming that if the extract > function returns nullFlags == NULL, the array is already sorted and > deduped. As said, that was just for demonstration purposes of the possible gains. I've changed the code now such that the extractValue function of the GIN index can indicate via the third argument uniqueAndSorted, if the returned keys are already unique and sorted. Unfortunately, it seems like this patch regresses performance. See measurements below. I haven't had the time to investigate why that is. It's pretty counter intuitive, given that this patch effectively only removes code. Maybe you could re-test patch 0004 and share your runtimes? >> v1-0007-Faster-qunique-comparator.patch: qunique() doesn't require a >> full sort comparator (-1 = less, 0 = equal, 1 = greater) but only a >> equal/unequal comparator (e.g. 0 = equal and 1 = unequal). The same >> optimization can be done in plenty of other places in our code base. >> Likely, in most of them the gains are insignificant. > > Makes sense. I'm a little disappointed the compiler won't do that > optimization for us.. I thought the same. > > Perhaps we should introduce a new qunique_eq() function with a different > callback signature: > > /* like qunique(), but the callback function returns true/false rather > than int */ > static inline size_t > qunique_eq(void *array, size_t elements, size_t width, > bool (*equal) (const void *, const void *)) > I would prefer to change qunique() instead. That would enforce using an adequate comparison function from the get go. There are only ~15 calls to qunique(). So refactoring this should also be a fairly small patch. I can do that if there's agreement for that approach. >> v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch: Typically lots >> of text is actually ASCII. Hence, we provide a fast path for this case >> which is exercised if the MSB of the current character is unset. > > This uses pg_ascii_tolower() when for ASCII characters when built with > the IGNORECASE. I don't think that's correct, if the proper collation > would do something more complicated for than what pg_ascii_tolower() does. Oh, that's evil. I had tested that specifically. But it only worked because the code in master uses str_tolower() with DEFAULT_COLLATION_OID. So using a different locale like in the following example does something different than when creating a database with the same locale. postgres=# select lower('III' COLLATE "tr_TR"); lower ------- ııı postgres=# select show_trgm('III' COLLATE "tr_TR"); show_trgm ------------------------- {" i"," ii","ii ",iii} (1 row) But when using tr_TR as default locale of the database the following happens: postgres=# select lower('III' COLLATE "tr_TR"); lower ------- ııı postgres=# select show_trgm('III');sü show_trgm --------------------------------------- {0xbbd8dd,0xf26fab,0xf31e1a,0x2af4f1} I'm wondering if that's intentional to begin with. Shouldn't the code instead pass PG_GET_COLLATION() to str_tolower()? Might require some research to see how other index types handle locales. Coming back to the original problem: the lengthy comment at the top of pg_locale_libc.c, suggests that in some cases ASCII characters are handled the pg_ascii_tolower() way for the default locale. See for example tolower_libc_mb(). So a character by character conversion using that function will yield a different result than strlower_libc_mb(). I'm wondering why that is. Anyways, we could limit the optimization to only kick in when the used locale follows the same rules as pg_ascii_tolower(). We could test that when creating the locale and store that info in pg_locale_struct. Thoughts? > Did you measure how big is the impact from each individual patch? > Patches 1 and 2 seem pretty much ready to be committed, but I wonder if > they make any difference on their own. Here is the impact of each patch. I ran again CREATE INDEX three times and took the fastest run. The run of each patch includes all previous patches as well. For example, the timings for patch 0003 were measured with a binary that also had patch 0002 and 0001 applied. To get the impact of each patch in isolation, the delta to the previous run was taken. Code | movies |delta | lineitem | delta ------------------------------------|--------|-------|------------------ master | 10,311 | 0 | 256,986 | 0 v1-0001-Inline-ginCompareAttEntries | 9,694 | 617 | 239,778 | 17,208 v1-0002-Optimized-comparison-func | 9,510 | 184 | 238,094 | 1,684 v1-0003-Use-sort_template.h | 8,661 | 849 | 231,190 | 6,904 v1-0004-Avoid-dedup-and-sort-in | 9,305 | -644 | 232,472 | -1,282 v1-0005-Make-btint4cmp-branchless | 8,240 | 1,065 | 228,387 | 4,085 v1-0006-Use-radix-sort | 6,976 | 1,264 | 207,687 | 20,700 v1-0007-Faster-qunique-comparator | 5,911 | 1,065 | 203,744 | 3,943 v1-0008-Add-ASCII-fastpath | 3,409 | 2,502 | 161,469 | 42,275 Attached is v2 of the patch set with the aforementioned changes. I've also fixed the white space errors in 0003, 0004 and 0008, as reported by Kirill. -- David Geier Attachments: [text/x-patch] v2-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch (4.0K, 2-v2-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch) download | inline diff: From 00c75bd58b7982c8bfe62d1260e9366766bc7f34 Mon Sep 17 00:00:00 2001 From: David Geier <[email protected]> Date: Fri, 14 Nov 2025 11:37:40 +0100 Subject: [PATCH v2 8/8] Add ASCII fastpath to generate_trgm_only() --- contrib/pg_trgm/trgm_op.c | 124 ++++++++++++++++++++------------------ 1 file changed, 65 insertions(+), 59 deletions(-) diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c index 39b586f5b9a..d2087b3a45e 100644 --- a/contrib/pg_trgm/trgm_op.c +++ b/contrib/pg_trgm/trgm_op.c @@ -226,32 +226,6 @@ show_limit(PG_FUNCTION_ARGS) PG_RETURN_FLOAT4(similarity_threshold); } -/* - * Finds first word in string, returns pointer to the word, - * endword points to the character after word - */ -static char * -find_word(char *str, int lenstr, char **endword, int *charlen) -{ - char *beginword = str; - - while (beginword - str < lenstr && !ISWORDCHR(beginword)) - beginword += pg_mblen(beginword); - - if (beginword - str >= lenstr) - return NULL; - - *endword = beginword; - *charlen = 0; - while (*endword - str < lenstr && ISWORDCHR(*endword)) - { - *endword += pg_mblen(*endword); - (*charlen)++; - } - - return beginword; -} - /* * Reduce a trigram (three possibly multi-byte characters) to a trgm, * which is always exactly three bytes. If we have three single-byte @@ -337,58 +311,90 @@ make_trigrams(trgm *tptr, char *str, int bytelen, int charlen) static int generate_trgm_only(trgm *trg, char *str, int slen, TrgmBound *bounds) { - trgm *tptr; - char *buf; - int charlen, - bytelen; - char *bword, - *eword; + trgm *tptr = trg; + char *buf; if (slen + LPADDING + RPADDING < 3 || slen == 0) return 0; - tptr = trg; - - /* Allocate a buffer for case-folded, blank-padded words */ - buf = (char *) palloc(slen * pg_database_encoding_max_length() + 4); + buf = palloc_array(char, slen * pg_database_encoding_max_length() + 4 + 1); + memset(buf, ' ', LPADDING); - if (LPADDING > 0) + for (int i = 0; i < slen; ) { - *buf = ' '; - if (LPADDING > 1) - *(buf + 1) = ' '; - } + int num_bytes = LPADDING; + int num_chars = LPADDING; + char *word; - eword = str; - while ((bword = find_word(eword, slen - (eword - str), &eword, &charlen)) != NULL) - { + /* Extract next word */ + while (i < slen) + { + if ((str[i] & 0x80) == 0) /* Fast path for ASCII-only */ + { + if (isalnum(str[i])) + { #ifdef IGNORECASE - bword = str_tolower(bword, eword - bword, DEFAULT_COLLATION_OID); - bytelen = strlen(bword); + buf[num_bytes++] = pg_ascii_tolower(str[i++]); #else - bytelen = eword - bword; + buf[num_bytes++] = str[i++]; #endif + } + else + { + i++; + break; + } + } + else + { + const int mblen = pg_mblen(str + i); + Assert(mblen >= 2); /* Otherwise, it would be ASCII */ + + if (ISWORDCHR(str + i)) + { + memcpy(buf + num_bytes, str + i, mblen); + num_bytes += mblen; + i += mblen; + } + else + { + i += mblen; + break; + } + } + + num_chars++; + } - memcpy(buf + LPADDING, bword, bytelen); + if (num_chars > LPADDING) + { + memset(buf + num_bytes, ' ', RPADDING); + num_bytes += RPADDING; + num_chars += RPADDING; + word = buf; #ifdef IGNORECASE - pfree(bword); + if (num_chars != num_bytes) + { + word = str_tolower(buf, num_bytes, DEFAULT_COLLATION_OID); + num_bytes = strlen(word); /* String can get shorter from lower-casing */ + } #endif - buf[LPADDING + bytelen] = ' '; - buf[LPADDING + bytelen + 1] = ' '; + if (bounds) + bounds[tptr - trg] |= TRGM_BOUND_LEFT; + + tptr = make_trigrams(tptr, word, num_bytes, num_chars); + + if (bounds) + bounds[tptr - trg - 1] |= TRGM_BOUND_RIGHT; - /* Calculate trigrams marking their bounds if needed */ - if (bounds) - bounds[tptr - trg] |= TRGM_BOUND_LEFT; - tptr = make_trigrams(tptr, buf, bytelen + LPADDING + RPADDING, - charlen + LPADDING + RPADDING); - if (bounds) - bounds[tptr - trg - 1] |= TRGM_BOUND_RIGHT; + if (word != buf) + pfree(word); + } } pfree(buf); - return tptr - trg; } -- 2.51.0 [text/x-patch] v2-0007-Faster-qunique-comparator.patch (1.9K, 3-v2-0007-Faster-qunique-comparator.patch) download | inline diff: From 457a3d0d57a834a80237d628d353408f3e2f4378 Mon Sep 17 00:00:00 2001 From: David Geier <[email protected]> Date: Wed, 12 Nov 2025 14:27:13 +0100 Subject: [PATCH v2 7/8] Faster qunique() comparator --- contrib/pg_trgm/trgm_op.c | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c index 039c273f6a1..39b586f5b9a 100644 --- a/contrib/pg_trgm/trgm_op.c +++ b/contrib/pg_trgm/trgm_op.c @@ -139,6 +139,14 @@ CMPTRGM_UNSIGNED(const void *a, const void *b) : CMPPCHAR_UNS(a, b, 2)); } +static inline int +CMPTRGM_EQ(const void *a, const void *b) +{ + char *aa = (char *)a; + char *bb = (char *)b; + return aa[0] != bb[0] || aa[1] != bb[1] || aa[2] != bb[2] ? 1 : 0; +} + /* * This gets called on the first call. It replaces the function pointer so * that subsequent calls are routed directly to the chosen implementation. @@ -482,15 +490,11 @@ generate_trgm(char *str, int slen) if (len > 1) { if (GetDefaultCharSignedness()) - { radix_sort_trigrams_signed((trgm *)GETARR(trg), len); - len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED); - } else - { trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm)); - len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED); - } + + len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_EQ); } SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len)); @@ -1038,15 +1042,11 @@ generate_wildcard_trgm(const char *str, int slen) if (len > 1) { if (GetDefaultCharSignedness()) - { radix_sort_trigrams_signed((trgm *)GETARR(trg), len); - len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED); - } else - { trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm)); - len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED); - } + + len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_EQ); } SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len)); -- 2.51.0 [text/x-patch] v2-0006-Use-radix-sort.patch (2.9K, 4-v2-0006-Use-radix-sort.patch) download | inline diff: From 447ce313602e768f29c43d4a5359f4c909b8db46 Mon Sep 17 00:00:00 2001 From: David Geier <[email protected]> Date: Tue, 11 Nov 2025 13:18:59 +0100 Subject: [PATCH v2 6/8] Use radix sort --- contrib/pg_trgm/trgm_op.c | 64 +++++++++++++++++++++++++++++++++------ 1 file changed, 54 insertions(+), 10 deletions(-) diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c index 6af120fa1ad..039c273f6a1 100644 --- a/contrib/pg_trgm/trgm_op.c +++ b/contrib/pg_trgm/trgm_op.c @@ -155,14 +155,6 @@ CMPTRGM_CHOOSE(const void *a, const void *b) } /* Define our specialized sort function name */ -#define ST_SORT trigram_qsort_signed -#define ST_ELEMENT_TYPE_VOID -#define ST_COMPARE(a, b) CMPTRGM_SIGNED(a, b) -#define ST_SCOPE static -#define ST_DEFINE -#define ST_DECLARE -#include "lib/sort_template.h" - #define ST_SORT trigram_qsort_unsigned #define ST_ELEMENT_TYPE_VOID #define ST_COMPARE(a, b) CMPTRGM_UNSIGNED(a, b) @@ -392,6 +384,58 @@ generate_trgm_only(trgm *trg, char *str, int slen, TrgmBound *bounds) return tptr - trg; } +/* + * Needed to properly handle negative numbers in case char is signed. + */ +static inline unsigned char FlipSign(char x) +{ + return x^0x80; +} + +static void radix_sort_trigrams_signed(trgm *trg, int count) +{ + trgm *buffer = palloc_array(trgm, count); + trgm *starts[256]; + trgm *from = trg; + trgm *to = buffer; + int freqs[3][256]; + + /* + * Compute frequencies to partition the buffer. + */ + memset(freqs, 0, sizeof(freqs)); + + for (int i=0; i<count; i++) + for (int j=0; j<3; j++) + freqs[j][FlipSign(trg[i][j])]++; + + /* + * Do the sorting. Start with last character because that's the is "LSB" + * in a trigram. Avoid unnecessary copies by ping-ponging between the buffers. + */ + for (int i=2; i>=0; i--) + { + trgm *old_from = from; + trgm *next = to; + + for (int j=0; j<256; j++) + { + starts[j] = next; + next += freqs[i][j]; + } + + for (int j=0; j<count; j++) + memcpy(starts[FlipSign(from[j][i])]++, from[j], sizeof(trgm)); + + from = to; + to = old_from; + } + + Assert(to == buffer); + memcpy(trg, buffer, sizeof(trgm) * count); + pfree(buffer); +} + /* * Guard against possible overflow in the palloc requests below. (We * don't worry about the additive constants, since palloc can detect @@ -439,7 +483,7 @@ generate_trgm(char *str, int slen) { if (GetDefaultCharSignedness()) { - trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm)); + radix_sort_trigrams_signed((trgm *)GETARR(trg), len); len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED); } else @@ -995,7 +1039,7 @@ generate_wildcard_trgm(const char *str, int slen) { if (GetDefaultCharSignedness()) { - trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm)); + radix_sort_trigrams_signed((trgm *)GETARR(trg), len); len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED); } else -- 2.51.0 [text/x-patch] v2-0005-Make-btint4cmp-branchless.patch (1.0K, 5-v2-0005-Make-btint4cmp-branchless.patch) download | inline diff: From c68aa700a245f1eb05f701a227776e24d0c0afe7 Mon Sep 17 00:00:00 2001 From: David Geier <[email protected]> Date: Mon, 10 Nov 2025 15:40:11 +0100 Subject: [PATCH v2 5/8] Make btint4cmp() branchless --- src/backend/access/nbtree/nbtcompare.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c index bffc4b7709c..5ae27c22621 100644 --- a/src/backend/access/nbtree/nbtcompare.c +++ b/src/backend/access/nbtree/nbtcompare.c @@ -60,6 +60,7 @@ #include "utils/fmgrprotos.h" #include "utils/skipsupport.h" #include "utils/sortsupport.h" +#include "common/int.h" #ifdef STRESS_SORT_INT_MIN #define A_LESS_THAN_B INT_MIN @@ -202,12 +203,7 @@ btint4cmp(PG_FUNCTION_ARGS) int32 a = PG_GETARG_INT32(0); int32 b = PG_GETARG_INT32(1); - if (a > b) - PG_RETURN_INT32(A_GREATER_THAN_B); - else if (a == b) - PG_RETURN_INT32(0); - else - PG_RETURN_INT32(A_LESS_THAN_B); + PG_RETURN_INT32(pg_cmp_s32(a, b)); } Datum -- 2.51.0 [text/x-patch] v2-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch (4.9K, 6-v2-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch) download | inline diff: From 0cdc87640cf2a2d8c946aff1669706f99280c555 Mon Sep 17 00:00:00 2001 From: David Geier <[email protected]> Date: Mon, 10 Nov 2025 14:40:37 +0100 Subject: [PATCH v2 4/8] Avoid dedup and sort in ginExtractEntries --- contrib/pg_trgm/trgm_gin.c | 2 ++ doc/src/sgml/gin.sgml | 7 ++++++- src/backend/access/gin/ginutil.c | 32 +++++++++++++++++++------------- 3 files changed, 27 insertions(+), 14 deletions(-) diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c index 66ff6adde99..862e650efec 100644 --- a/contrib/pg_trgm/trgm_gin.c +++ b/contrib/pg_trgm/trgm_gin.c @@ -36,10 +36,12 @@ gin_extract_value_trgm(PG_FUNCTION_ARGS) { text *val = (text *) PG_GETARG_TEXT_PP(0); int32 *nentries = (int32 *) PG_GETARG_POINTER(1); + bool *uniqueAndSorted = (bool *) PG_GETARG_POINTER(3); Datum *entries = NULL; TRGM *trg; int32 trglen; + *uniqueAndSorted = true; *nentries = 0; trg = generate_trgm(VARDATA_ANY(val), VARSIZE_ANY_EXHDR(val)); diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml index 82410b1fbdf..b96478731f8 100644 --- a/doc/src/sgml/gin.sgml +++ b/doc/src/sgml/gin.sgml @@ -167,7 +167,7 @@ <variablelist> <varlistentry> <term><function>Datum *extractValue(Datum itemValue, int32 *nkeys, - bool **nullFlags)</function></term> + bool **nullFlags, bool *uniqueAndSorted)</function></term> <listitem> <para> Returns a palloc'd array of keys given an item to be indexed. The @@ -177,6 +177,11 @@ <literal>*nullFlags</literal>, and set these null flags as needed. <literal>*nullFlags</literal> can be left <symbol>NULL</symbol> (its initial value) if all keys are non-null. + If the returned keys do not contain duplicates and are sorted w.r.t. the comparison + function of the GIN type's operator class, store <symbol>true</symbol> in + <literal>uniqueAndSorted</literal>. <literal>uniqueAndSorted</literal> can be left + <symbol>false</symbol> (its initial value) if the keys are either unsorted or contain + duplicates. In that case, duplicate removal and sorting is performed by the GIN index. The return value can be <symbol>NULL</symbol> if the item contains no keys. </para> </listitem> diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c index 75a18f457bc..22b588483d0 100644 --- a/src/backend/access/gin/ginutil.c +++ b/src/backend/access/gin/ginutil.c @@ -464,6 +464,7 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum, { Datum *entries; bool *nullFlags; + bool uniqueAndSorted = false; int32 i; /* @@ -483,11 +484,12 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum, /* OK, call the opclass's extractValueFn */ nullFlags = NULL; /* in case extractValue doesn't set it */ entries = (Datum *) - DatumGetPointer(FunctionCall3Coll(&ginstate->extractValueFn[attnum - 1], + DatumGetPointer(FunctionCall4Coll(&ginstate->extractValueFn[attnum - 1], ginstate->supportCollation[attnum - 1], value, PointerGetDatum(nentries), - PointerGetDatum(&nullFlags))); + PointerGetDatum(&nullFlags), + PointerGetDatum(&uniqueAndSorted))); /* * Generate a placeholder if the item contained no keys. @@ -502,13 +504,6 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum, return entries; } - /* - * If the extractValueFn didn't create a nullFlags array, create one, - * assuming that everything's non-null. - */ - if (nullFlags == NULL) - nullFlags = (bool *) palloc0(*nentries * sizeof(bool)); - /* * If there's more than one key, sort and unique-ify. * @@ -516,11 +511,18 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum, * pretty bad too. For small numbers of keys it'd likely be better to use * a simple insertion sort. */ - if (*nentries > 1) + if (*nentries > 1 && !uniqueAndSorted) { keyEntryData *keydata; cmpEntriesArg arg; + /* + * If the extractValueFn didn't create a nullFlags array, create one, + * assuming that everything's non-null. + */ + if (nullFlags == NULL) + nullFlags = (bool *) palloc0(*nentries * sizeof(bool)); + keydata = palloc_array(keyEntryData, *nentries); for (i = 0; i < *nentries; i++) { @@ -568,9 +570,13 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum, /* * Create GinNullCategory representation from nullFlags. */ - *categories = (GinNullCategory *) palloc0(*nentries * sizeof(GinNullCategory)); - for (i = 0; i < *nentries; i++) - (*categories)[i] = (nullFlags[i] ? GIN_CAT_NULL_KEY : GIN_CAT_NORM_KEY); + StaticAssertStmt(GIN_CAT_NORM_KEY == 0, "Assuming GIN_CAT_NORM_KEY is 0"); + *categories = palloc0_array(GinNullCategory, *nentries); + + if (nullFlags != NULL) + for (i = 0; i < *nentries; i++) + if (nullFlags[i]) + (*categories)[i] = GIN_CAT_NULL_KEY; return entries; } -- 2.51.0 [text/x-patch] v2-0003-Use-sort_template.h.patch (2.6K, 7-v2-0003-Use-sort_template.h.patch) download | inline diff: From c38f3517d05d3cefa0fc6d994f4e8f6ac14273d9 Mon Sep 17 00:00:00 2001 From: David Geier <[email protected]> Date: Mon, 10 Nov 2025 13:35:11 +0100 Subject: [PATCH v2 3/8] Use sort_template.h --- contrib/pg_trgm/trgm_op.c | 47 ++++++++++++++++++++++++++++++--------- 1 file changed, 37 insertions(+), 10 deletions(-) diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c index 81182a15e07..6af120fa1ad 100644 --- a/contrib/pg_trgm/trgm_op.c +++ b/contrib/pg_trgm/trgm_op.c @@ -154,6 +154,23 @@ CMPTRGM_CHOOSE(const void *a, const void *b) return CMPTRGM(a, b); } +/* Define our specialized sort function name */ +#define ST_SORT trigram_qsort_signed +#define ST_ELEMENT_TYPE_VOID +#define ST_COMPARE(a, b) CMPTRGM_SIGNED(a, b) +#define ST_SCOPE static +#define ST_DEFINE +#define ST_DECLARE +#include "lib/sort_template.h" + +#define ST_SORT trigram_qsort_unsigned +#define ST_ELEMENT_TYPE_VOID +#define ST_COMPARE(a, b) CMPTRGM_UNSIGNED(a, b) +#define ST_SCOPE static +#define ST_DEFINE +#define ST_DECLARE +#include "lib/sort_template.h" + /* * Deprecated function. * Use "pg_trgm.similarity_threshold" GUC variable instead of this function. @@ -209,12 +226,6 @@ show_limit(PG_FUNCTION_ARGS) PG_RETURN_FLOAT4(similarity_threshold); } -static int -comp_trgm(const void *a, const void *b) -{ - return CMPTRGM(a, b); -} - /* * Finds first word in string, returns pointer to the word, * endword points to the character after word @@ -426,8 +437,16 @@ generate_trgm(char *str, int slen) */ if (len > 1) { - qsort(GETARR(trg), len, sizeof(trgm), comp_trgm); - len = qunique(GETARR(trg), len, sizeof(trgm), comp_trgm); + if (GetDefaultCharSignedness()) + { + trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm)); + len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED); + } + else + { + trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm)); + len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED); + } } SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len)); @@ -974,8 +993,16 @@ generate_wildcard_trgm(const char *str, int slen) */ if (len > 1) { - qsort(GETARR(trg), len, sizeof(trgm), comp_trgm); - len = qunique(GETARR(trg), len, sizeof(trgm), comp_trgm); + if (GetDefaultCharSignedness()) + { + trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm)); + len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED); + } + else + { + trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm)); + len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED); + } } SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len)); -- 2.51.0 [text/x-patch] v2-0002-Optimized-comparison-functions.patch (4.1K, 8-v2-0002-Optimized-comparison-functions.patch) download | inline diff: From ae80c2dc190493f8c9811c1949c7ebced265abd5 Mon Sep 17 00:00:00 2001 From: David Geier <[email protected]> Date: Mon, 10 Nov 2025 13:35:01 +0100 Subject: [PATCH v2 2/8] Optimized comparison functions --- src/backend/access/gin/ginutil.c | 23 ++++++++++++++++------- src/include/access/gin_private.h | 19 +++++++++++++++---- 2 files changed, 31 insertions(+), 11 deletions(-) diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c index d205093e21d..75a18f457bc 100644 --- a/src/backend/access/gin/ginutil.c +++ b/src/backend/access/gin/ginutil.c @@ -114,6 +114,7 @@ initGinState(GinState *state, Relation index) for (i = 0; i < origTupdesc->natts; i++) { Form_pg_attribute attr = TupleDescAttr(origTupdesc, i); + FunctionCallInfoBaseData *fci = &state->compareFnCallInfo[i].fcinfo; if (state->oneCol) state->tupdesc[i] = state->origTupdesc; @@ -222,6 +223,10 @@ initGinState(GinState *state, Relation index) state->supportCollation[i] = index->rd_indcollation[i]; else state->supportCollation[i] = DEFAULT_COLLATION_OID; + + InitFunctionCallInfoData(*fci, &state->compareFn[i], 2, state->supportCollation[i], NULL, NULL); + fci->args[0].isnull = false; + fci->args[1].isnull = false; } } @@ -402,8 +407,7 @@ typedef struct typedef struct { - FmgrInfo *cmpDatumFunc; - Oid collation; + FunctionCallInfoBaseData *cmpFuncInfo; bool haveDups; } cmpEntriesArg; @@ -425,9 +429,15 @@ cmpEntries(const void *a, const void *b, void *arg) else if (bb->isnull) res = -1; /* not-NULL "<" NULL */ else - res = DatumGetInt32(FunctionCall2Coll(data->cmpDatumFunc, - data->collation, - aa->datum, bb->datum)); + { + FunctionCallInfo fci = data->cmpFuncInfo; + fci->args[0].value = aa->datum; + fci->args[1].value = bb->datum; + res = DatumGetInt32(FunctionCallInvoke(fci)); + + if (fci->isnull) + elog(ERROR, "function %u returned NULL", fci->flinfo->fn_oid); + } /* * Detect if we have any duplicates. If there are equal keys, qsort must @@ -518,8 +528,7 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum, keydata[i].isnull = nullFlags[i]; } - arg.cmpDatumFunc = &ginstate->compareFn[attnum - 1]; - arg.collation = ginstate->supportCollation[attnum - 1]; + arg.cmpFuncInfo = &ginstate->compareFnCallInfo[attnum - 1].fcinfo; arg.haveDups = false; qsort_arg(keydata, *nentries, sizeof(keyEntryData), cmpEntries, &arg); diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h index e155045ce8a..7cf19c8a5dc 100644 --- a/src/include/access/gin_private.h +++ b/src/include/access/gin_private.h @@ -51,6 +51,11 @@ typedef struct GinOptions #define GIN_SHARE BUFFER_LOCK_SHARE #define GIN_EXCLUSIVE BUFFER_LOCK_EXCLUSIVE +typedef union CompareFuncCallInfoData +{ + FunctionCallInfoBaseData fcinfo; + char fcinfo_data[SizeForFunctionCallInfo(2)]; +} CompareFuncCallInfoData; /* * GinState: working data structure describing the index being worked on @@ -77,6 +82,10 @@ typedef struct GinState /* * Per-index-column opclass support functions */ + + + CompareFuncCallInfoData compareFnCallInfo[INDEX_MAX_KEYS]; + FmgrInfo compareFn[INDEX_MAX_KEYS]; FmgrInfo extractValueFn[INDEX_MAX_KEYS]; FmgrInfo extractQueryFn[INDEX_MAX_KEYS]; @@ -504,6 +513,8 @@ ginCompareEntries(GinState *ginstate, OffsetNumber attnum, Datum a, GinNullCategory categorya, Datum b, GinNullCategory categoryb) { + FunctionCallInfoBaseData *fci; + /* if not of same null category, sort by that first */ if (categorya != categoryb) return (categorya < categoryb) ? -1 : 1; @@ -512,10 +523,10 @@ ginCompareEntries(GinState *ginstate, OffsetNumber attnum, if (categorya != GIN_CAT_NORM_KEY) return 0; - /* both not null, so safe to call the compareFn */ - return DatumGetInt32(FunctionCall2Coll(&ginstate->compareFn[attnum - 1], - ginstate->supportCollation[attnum - 1], - a, b)); + fci = &ginstate->compareFnCallInfo[attnum - 1].fcinfo; + fci->args[0].value = a; + fci->args[1].value = b; + return DatumGetInt32(FunctionCallInvoke(fci)); } /* -- 2.51.0 [text/x-patch] v2-0001-Inline-ginCompareAttEntries.patch (4.0K, 9-v2-0001-Inline-ginCompareAttEntries.patch) download | inline diff: From a484e7c69bec62474b041eb4e533601e4883dab3 Mon Sep 17 00:00:00 2001 From: David Geier <[email protected]> Date: Thu, 6 Nov 2025 09:42:27 +0100 Subject: [PATCH v2 1/8] Inline ginCompareAttEntries --- src/backend/access/gin/ginutil.c | 38 --------------------------- src/include/access/gin_private.h | 44 +++++++++++++++++++++++++++----- 2 files changed, 38 insertions(+), 44 deletions(-) diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c index a546cac18d3..d205093e21d 100644 --- a/src/backend/access/gin/ginutil.c +++ b/src/backend/access/gin/ginutil.c @@ -387,44 +387,6 @@ GinInitMetabuffer(Buffer b) ((char *) metadata + sizeof(GinMetaPageData)) - (char *) page; } -/* - * Compare two keys of the same index column - */ -int -ginCompareEntries(GinState *ginstate, OffsetNumber attnum, - Datum a, GinNullCategory categorya, - Datum b, GinNullCategory categoryb) -{ - /* if not of same null category, sort by that first */ - if (categorya != categoryb) - return (categorya < categoryb) ? -1 : 1; - - /* all null items in same category are equal */ - if (categorya != GIN_CAT_NORM_KEY) - return 0; - - /* both not null, so safe to call the compareFn */ - return DatumGetInt32(FunctionCall2Coll(&ginstate->compareFn[attnum - 1], - ginstate->supportCollation[attnum - 1], - a, b)); -} - -/* - * Compare two keys of possibly different index columns - */ -int -ginCompareAttEntries(GinState *ginstate, - OffsetNumber attnuma, Datum a, GinNullCategory categorya, - OffsetNumber attnumb, Datum b, GinNullCategory categoryb) -{ - /* attribute number is the first sort key */ - if (attnuma != attnumb) - return (attnuma < attnumb) ? -1 : 1; - - return ginCompareEntries(ginstate, attnuma, a, categorya, b, categoryb); -} - - /* * Support for sorting key datums in ginExtractEntries * diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h index b33f7cec5b4..e155045ce8a 100644 --- a/src/include/access/gin_private.h +++ b/src/include/access/gin_private.h @@ -97,12 +97,6 @@ extern Buffer GinNewBuffer(Relation index); extern void GinInitBuffer(Buffer b, uint32 f); extern void GinInitPage(Page page, uint32 f, Size pageSize); extern void GinInitMetabuffer(Buffer b); -extern int ginCompareEntries(GinState *ginstate, OffsetNumber attnum, - Datum a, GinNullCategory categorya, - Datum b, GinNullCategory categoryb); -extern int ginCompareAttEntries(GinState *ginstate, - OffsetNumber attnuma, Datum a, GinNullCategory categorya, - OffsetNumber attnumb, Datum b, GinNullCategory categoryb); extern Datum *ginExtractEntries(GinState *ginstate, OffsetNumber attnum, Datum value, bool isNull, int32 *nentries, GinNullCategory **categories); @@ -502,6 +496,44 @@ ginCompareItemPointers(ItemPointer a, ItemPointer b) return pg_cmp_u64(ia, ib); } +/* + * Compare two keys of the same index column + */ +static inline int +ginCompareEntries(GinState *ginstate, OffsetNumber attnum, + Datum a, GinNullCategory categorya, + Datum b, GinNullCategory categoryb) +{ + /* if not of same null category, sort by that first */ + if (categorya != categoryb) + return (categorya < categoryb) ? -1 : 1; + + /* all null items in same category are equal */ + if (categorya != GIN_CAT_NORM_KEY) + return 0; + + /* both not null, so safe to call the compareFn */ + return DatumGetInt32(FunctionCall2Coll(&ginstate->compareFn[attnum - 1], + ginstate->supportCollation[attnum - 1], + a, b)); +} + +/* + * Compare two keys of possibly different index columns + */ +static inline int +ginCompareAttEntries(GinState *ginstate, + OffsetNumber attnuma, Datum a, GinNullCategory categorya, + OffsetNumber attnumb, Datum b, GinNullCategory categoryb) +{ + /* attribute number is the first sort key */ + if (attnuma != attnumb) + return (attnuma < attnumb) ? -1 : 1; + + return ginCompareEntries(ginstate, attnuma, a, categorya, b, categoryb); + +} + extern int ginTraverseLock(Buffer buffer, bool searchMode); #endif /* GIN_PRIVATE_H */ -- 2.51.0 ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes 2026-01-09 12:06 Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]> @ 2026-01-12 22:10 ` Heikki Linnakangas <[email protected]> 2026-01-21 15:45 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: Heikki Linnakangas @ 2026-01-12 22:10 UTC (permalink / raw) To: David Geier <[email protected]>; pgsql-hackers On 09/01/2026 14:06, David Geier wrote: > On 06.01.2026 18:00, Heikki Linnakangas wrote: >> On 05/01/2026 17:01, David Geier wrote: >>> v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch: Typically lots >>> of text is actually ASCII. Hence, we provide a fast path for this case >>> which is exercised if the MSB of the current character is unset. >> >> This uses pg_ascii_tolower() when for ASCII characters when built with >> the IGNORECASE. I don't think that's correct, if the proper collation >> would do something more complicated for than what pg_ascii_tolower() does. > > Oh, that's evil. I had tested that specifically. But it only worked > because the code in master uses str_tolower() with > DEFAULT_COLLATION_OID. So using a different locale like in the following > example does something different than when creating a database with the > same locale. > > postgres=# select lower('III' COLLATE "tr_TR"); > lower > ------- > ııı > > postgres=# select show_trgm('III' COLLATE "tr_TR"); > show_trgm > ------------------------- > {" i"," ii","ii ",iii} > (1 row) > > But when using tr_TR as default locale of the database the following > happens: > > postgres=# select lower('III' COLLATE "tr_TR"); > lower > ------- > ııı > > postgres=# select show_trgm('III');sü > show_trgm > --------------------------------------- > {0xbbd8dd,0xf26fab,0xf31e1a,0x2af4f1} > > I'm wondering if that's intentional to begin with. Shouldn't the code > instead pass PG_GET_COLLATION() to str_tolower()? Might require some > research to see how other index types handle locales. > > Coming back to the original problem: the lengthy comment at the top of > pg_locale_libc.c, suggests that in some cases ASCII characters are > handled the pg_ascii_tolower() way for the default locale. See for > example tolower_libc_mb(). So a character by character conversion using > that function will yield a different result than strlower_libc_mb(). I'm > wondering why that is. Hmm, yeah, that feels funny. The trigram code predates per-column collation support, so I guess we never really thought through how it should interact with COLLATE clauses. > Anyways, we could limit the optimization to only kick in when the used > locale follows the same rules as pg_ascii_tolower(). We could test that > when creating the locale and store that info in pg_locale_struct. I think that's only possible for libc locales, which operate one character at a time. In ICU locales, lower-casing a character can depend on the surrounding characters, so you cannot just test the conversion of every ascii character individually. It would make sense for libc locales though, and I hope the ICU functions are a little faster anyway. Although, we probably should be using case-folding rather than lower-casing with ICU locales anyway. Case-folding is designed for string matching. It'd be a backwards-compatibility breaking change, though. - Heikki ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes 2026-01-09 12:06 Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]> 2026-01-12 22:10 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]> @ 2026-01-21 15:45 ` David Geier <[email protected]> 0 siblings, 0 replies; 3+ messages in thread From: David Geier @ 2026-01-21 15:45 UTC (permalink / raw) To: Heikki Linnakangas <[email protected]>; pgsql-hackers >> Oh, that's evil. I had tested that specifically. But it only worked >> because the code in master uses str_tolower() with >> DEFAULT_COLLATION_OID. So using a different locale like in the following >> example does something different than when creating a database with the >> same locale. >> >> postgres=# select lower('III' COLLATE "tr_TR"); >> lower >> ------- >> ııı >> >> postgres=# select show_trgm('III' COLLATE "tr_TR"); >> show_trgm >> ------------------------- >> {" i"," ii","ii ",iii} >> (1 row) >> >> But when using tr_TR as default locale of the database the following >> happens: >> >> postgres=# select lower('III' COLLATE "tr_TR"); >> lower >> ------- >> ııı >> >> postgres=# select show_trgm('III');sü >> show_trgm >> --------------------------------------- >> {0xbbd8dd,0xf26fab,0xf31e1a,0x2af4f1} >> >> I'm wondering if that's intentional to begin with. Shouldn't the code >> instead pass PG_GET_COLLATION() to str_tolower()? Might require some >> research to see how other index types handle locales. >> >> Coming back to the original problem: the lengthy comment at the top of >> pg_locale_libc.c, suggests that in some cases ASCII characters are >> handled the pg_ascii_tolower() way for the default locale. See for >> example tolower_libc_mb(). So a character by character conversion using >> that function will yield a different result than strlower_libc_mb(). I'm >> wondering why that is. > > Hmm, yeah, that feels funny. The trigram code predates per-column > collation support, so I guess we never really thought through how it > should interact with COLLATE clauses. I've written a patch to fix that. See [1]. >> Anyways, we could limit the optimization to only kick in when the used >> locale follows the same rules as pg_ascii_tolower(). We could test that >> when creating the locale and store that info in pg_locale_struct. > > I think that's only possible for libc locales, which operate one > character at a time. In ICU locales, lower-casing a character can depend > on the surrounding characters, so you cannot just test the conversion of > every ascii character individually. It would make sense for libc locales > though, and I hope the ICU functions are a little faster anyway. > > Although, we probably should be using case-folding rather than lower- > casing with ICU locales anyway. Case-folding is designed for string > matching. It'd be a backwards-compatibility breaking change, though. Oh, I wasn't ware of that. Doing it only for libc locales seems still useful. Good point with the casefolding. I'll look into that. How do we usually go about such backwards-compatibility breaking changes? Could we have pg_upgrade reindex all GIN indexes? Would that be acceptable? [1] https://www.postgresql.org/message-id/flat/db087c3e-230e-4119-8a03-8b5d74956bc2%40gmail.com -- David Geier ^ permalink raw reply [nested|flat] 3+ messages in thread
end of thread, other threads:[~2026-01-21 15:45 UTC | newest] Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-01-09 12:06 Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]> 2026-01-12 22:10 ` Heikki Linnakangas <[email protected]> 2026-01-21 15:45 ` David Geier <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox