public inbox for [email protected]
help / color / mirror / Atom feedRe: Reduce build times of pg_trgm GIN indexes
31+ messages / 8 participants
[nested] [flat]
* Re: Reduce build times of pg_trgm GIN indexes
@ 2026-01-21 20:50 Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: Matthias van de Meent @ 2026-01-21 20:50 UTC (permalink / raw)
To: David Geier <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; pgsql-hackers
On Wed, 21 Jan 2026 at 16:45, David Geier <[email protected]> wrote:
>
> How do we usually go about such backwards-compatibility breaking
> changes?
When it concerns a bug, we mention the change in the release notes
with a warning to reindex affected indexes to be sure no known
corruption remains. See e.g. the final entry in the PG18 release
notes' migration section here:
https://www.postgresql.org/docs/18/release-18.html#RELEASE-18-MIGRATION.
> Could we have pg_upgrade reindex all GIN indexes? Would that be
> acceptable?
No. We'd handle this like any other collation/opclass fixes; we ask
users to reindex their indexes in their own time after they've
upgraded their cluster. Note that in this case it concerns an issue
with just one GIN opclass, not all GIN indexes; so even if we were to
address this in pg_upgrade it wouldn't be a correct choice to reindex
every GIN index, as only a subset of those would be affected by this
issue.
Generally speaking, pg_upgrade doesn't concern itself with the
validity of the data structures that are described by the catalogs
that it upgrades, it only concerns itself with that it correctly
transcribes the catalogs from one version to another, and that the
data files of the old cluster are transfered correctly without
changes.
Kind regards,
Matthias van de Meent
Databricks (https://www.databricks.com)
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
@ 2026-01-23 10:18 ` David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: David Geier @ 2026-01-23 10:18 UTC (permalink / raw)
To: Matthias van de Meent <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; pgsql-hackers
Hi Matthias,
On 21.01.2026 21:50, Matthias van de Meent wrote:
> On Wed, 21 Jan 2026 at 16:45, David Geier <[email protected]> wrote:
>>
>> How do we usually go about such backwards-compatibility breaking
>> changes?
>
> When it concerns a bug, we mention the change in the release notes
> with a warning to reindex affected indexes to be sure no known
> corruption remains. See e.g. the final entry in the PG18 release
> notes' migration section here:
> https://www.postgresql.org/docs/18/release-18.html#RELEASE-18-MIGRATION.
>
>> Could we have pg_upgrade reindex all GIN indexes? Would that be
>> acceptable?
>
> No. We'd handle this like any other collation/opclass fixes; we ask
> users to reindex their indexes in their own time after they've
> upgraded their cluster. Note that in this case it concerns an issue
> with just one GIN opclass, not all GIN indexes; so even if we were to
> address this in pg_upgrade it wouldn't be a correct choice to reindex
> every GIN index, as only a subset of those would be affected by this
> issue.
>
> Generally speaking, pg_upgrade doesn't concern itself with the
> validity of the data structures that are described by the catalogs
> that it upgrades, it only concerns itself with that it correctly
> transcribes the catalogs from one version to another, and that the
> data files of the old cluster are transfered correctly without
> changes.
Thanks for the clarifications and the link to the release notes. That's
very helpful. Then I know how to move on and will update the patch
accordingly.
--
David Geier
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
@ 2026-03-02 12:17 ` David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: David Geier @ 2026-03-02 12:17 UTC (permalink / raw)
To: Matthias van de Meent <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; pgsql-hackers
On 23.01.2026 11:18, David Geier wrote:
> Hi Matthias,
>
> On 21.01.2026 21:50, Matthias van de Meent wrote:
>> On Wed, 21 Jan 2026 at 16:45, David Geier <[email protected]> wrote:
>>>
>>> How do we usually go about such backwards-compatibility breaking
>>> changes?
>>
>> When it concerns a bug, we mention the change in the release notes
>> with a warning to reindex affected indexes to be sure no known
>> corruption remains. See e.g. the final entry in the PG18 release
>> notes' migration section here:
>> https://www.postgresql.org/docs/18/release-18.html#RELEASE-18-MIGRATION.
>>
>>> Could we have pg_upgrade reindex all GIN indexes? Would that be
>>> acceptable?
>>
>> No. We'd handle this like any other collation/opclass fixes; we ask
>> users to reindex their indexes in their own time after they've
>> upgraded their cluster. Note that in this case it concerns an issue
>> with just one GIN opclass, not all GIN indexes; so even if we were to
>> address this in pg_upgrade it wouldn't be a correct choice to reindex
>> every GIN index, as only a subset of those would be affected by this
>> issue.
>>
>> Generally speaking, pg_upgrade doesn't concern itself with the
>> validity of the data structures that are described by the catalogs
>> that it upgrades, it only concerns itself with that it correctly
>> transcribes the catalogs from one version to another, and that the
>> data files of the old cluster are transfered correctly without
>> changes.
>
> Thanks for the clarifications and the link to the release notes. That's
> very helpful. Then I know how to move on and will update the patch
> accordingly.
Attached are the patches rebased on latest master.
I've removed the ASCII fast-path patch 0006 as it turned out to be more
complicated to make work than expected.
I kept the radix sort patch because it gives a decent speedup but I
would like to focus for now on getting patches 0001 - 0004 merged.
They're all simple and, the way I see it, uncontroversial.
I remeasured the savings of 0001 - 0004, which comes on top of the
already committed patch that inlined the comparison function, which gave
another ~5%:
Data set | Patched (ms) | Master (ms) | Speedup
--------------------|--------------|--------------|----------
movies(plot) | 8,058 | 10,311 | 1.27x
lineitem(l_comment) | 223,233 | 256,986 | 1.19x
I've also registered the change at the commit fest, see
https://commitfest.postgresql.org/patch/6418/.
--
David Geier
Attachments:
[text/x-patch] v4-0005-Optimize-generate_trgm-with-radix-sort.patch (2.9K, 2-v4-0005-Optimize-generate_trgm-with-radix-sort.patch)
download | inline diff:
From cc377266ea8071bb000f0c05d0aca0ab2012cdfe Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Tue, 11 Nov 2025 13:18:59 +0100
Subject: [PATCH v4 5/5] Optimize generate_trgm() with radix sort
---
contrib/pg_trgm/trgm_op.c | 64 +++++++++++++++++++++++++++++++++------
1 file changed, 54 insertions(+), 10 deletions(-)
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 07daf111729..1603a55022b 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -235,14 +235,6 @@ CMPTRGM_CHOOSE(const void *a, const void *b)
}
/* Define our specialized sort function name */
-#define ST_SORT trigram_qsort_signed
-#define ST_ELEMENT_TYPE_VOID
-#define ST_COMPARE(a, b) CMPTRGM_SIGNED(a, b)
-#define ST_SCOPE static
-#define ST_DEFINE
-#define ST_DECLARE
-#include "lib/sort_template.h"
-
#define ST_SORT trigram_qsort_unsigned
#define ST_ELEMENT_TYPE_VOID
#define ST_COMPARE(a, b) CMPTRGM_UNSIGNED(a, b)
@@ -564,6 +556,58 @@ generate_trgm_only(growable_trgm_array *dst, char *str, int slen, TrgmBound **bo
pfree(buf);
}
+/*
+ * Needed to properly handle negative numbers in case char is signed.
+ */
+static inline unsigned char FlipSign(char x)
+{
+ return x^0x80;
+}
+
+static void radix_sort_trigrams_signed(trgm *trg, int count)
+{
+ trgm *buffer = palloc_array(trgm, count);
+ trgm *starts[256];
+ trgm *from = trg;
+ trgm *to = buffer;
+ int freqs[3][256];
+
+ /*
+ * Compute frequencies to partition the buffer.
+ */
+ memset(freqs, 0, sizeof(freqs));
+
+ for (int i=0; i<count; i++)
+ for (int j=0; j<3; j++)
+ freqs[j][FlipSign(trg[i][j])]++;
+
+ /*
+ * Do the sorting. Start with last character because that's the is "LSB"
+ * in a trigram. Avoid unnecessary copies by ping-ponging between the buffers.
+ */
+ for (int i=2; i>=0; i--)
+ {
+ trgm *old_from = from;
+ trgm *next = to;
+
+ for (int j=0; j<256; j++)
+ {
+ starts[j] = next;
+ next += freqs[i][j];
+ }
+
+ for (int j=0; j<count; j++)
+ memcpy(starts[FlipSign(from[j][i])]++, from[j], sizeof(trgm));
+
+ from = to;
+ to = old_from;
+ }
+
+ Assert(to == buffer);
+ memcpy(trg, buffer, sizeof(trgm) * count);
+ pfree(buffer);
+}
+
/*
* Make array of trigrams with sorting and removing duplicate items.
*
@@ -589,7 +633,7 @@ generate_trgm(char *str, int slen)
if (len > 1)
{
if (GetDefaultCharSignedness())
- trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+ radix_sort_trigrams_signed((trgm *)GETARR(trg), len);
else
trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
@@ -1124,7 +1168,7 @@ generate_wildcard_trgm(const char *str, int slen)
if (len > 1)
{
if (GetDefaultCharSignedness())
- trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+ radix_sort_trigrams_signed((trgm *)GETARR(trg), len);
else
trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
--
2.51.0
[text/x-patch] v4-0004-Faster-qunique-comparator-in-generate_trgm.patch (2.0K, 3-v4-0004-Faster-qunique-comparator-in-generate_trgm.patch)
download | inline diff:
From 6d3a0f61928958b51989e10d5f85db2deef210cf Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Wed, 12 Nov 2025 14:27:13 +0100
Subject: [PATCH v4 4/5] Faster qunique() comparator in generate_trgm()
---
contrib/pg_trgm/trgm_op.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 85df5ef2310..07daf111729 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -211,6 +211,14 @@ CMPTRGM_UNSIGNED(const void *a, const void *b)
: CMPPCHAR_UNS(a, b, 2));
}
+static inline int
+CMPTRGM_EQ(const void *a, const void *b)
+{
+ char *aa = (char *)a;
+ char *bb = (char *)b;
+ return aa[0] != bb[0] || aa[1] != bb[1] || aa[2] != bb[2] ? 1 : 0;
+}
+
/*
* This gets called on the first call. It replaces the function pointer so
* that subsequent calls are routed directly to the chosen implementation.
@@ -581,15 +589,11 @@ generate_trgm(char *str, int slen)
if (len > 1)
{
if (GetDefaultCharSignedness())
- {
trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
- len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
- }
else
- {
trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
- len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
- }
+
+ len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_EQ);
}
SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
@@ -1120,15 +1124,11 @@ generate_wildcard_trgm(const char *str, int slen)
if (len > 1)
{
if (GetDefaultCharSignedness())
- {
trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
- len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
- }
else
- {
trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
- len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
- }
+
+ len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_EQ);
}
trg->flag = ARRKEY;
--
2.51.0
[text/x-patch] v4-0003-Make-btint4cmp-branchless.patch (1.0K, 4-v4-0003-Make-btint4cmp-branchless.patch)
download | inline diff:
From 49ea44e1bc26d3166239b4efd257c5f8c4f136fd Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Mon, 10 Nov 2025 15:40:11 +0100
Subject: [PATCH v4 3/5] Make btint4cmp() branchless
---
src/backend/access/nbtree/nbtcompare.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c
index 1d343377e98..ac16e3d993d 100644
--- a/src/backend/access/nbtree/nbtcompare.c
+++ b/src/backend/access/nbtree/nbtcompare.c
@@ -61,6 +61,7 @@
#include "utils/fmgrprotos.h"
#include "utils/skipsupport.h"
#include "utils/sortsupport.h"
+#include "common/int.h"
#ifdef STRESS_SORT_INT_MIN
#define A_LESS_THAN_B INT_MIN
@@ -203,12 +204,7 @@ btint4cmp(PG_FUNCTION_ARGS)
int32 a = PG_GETARG_INT32(0);
int32 b = PG_GETARG_INT32(1);
- if (a > b)
- PG_RETURN_INT32(A_GREATER_THAN_B);
- else if (a == b)
- PG_RETURN_INT32(0);
- else
- PG_RETURN_INT32(A_LESS_THAN_B);
+ PG_RETURN_INT32(pg_cmp_s32(a, b));
}
Datum
--
2.51.0
[text/x-patch] v4-0002-Optimize-generate_trgm-with-sort_template.h.patch (2.6K, 5-v4-0002-Optimize-generate_trgm-with-sort_template.h.patch)
download | inline diff:
From ba64b2fed648f9b3fef53e244d2665ddf1ac9ea3 Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Mon, 10 Nov 2025 13:35:11 +0100
Subject: [PATCH v4 2/5] Optimize generate_trgm() with sort_template.h
---
contrib/pg_trgm/trgm_op.c | 47 ++++++++++++++++++++++++++++++---------
1 file changed, 37 insertions(+), 10 deletions(-)
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index ee89e548d16..85df5ef2310 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -226,6 +226,23 @@ CMPTRGM_CHOOSE(const void *a, const void *b)
return CMPTRGM(a, b);
}
+/* Define our specialized sort function name */
+#define ST_SORT trigram_qsort_signed
+#define ST_ELEMENT_TYPE_VOID
+#define ST_COMPARE(a, b) CMPTRGM_SIGNED(a, b)
+#define ST_SCOPE static
+#define ST_DEFINE
+#define ST_DECLARE
+#include "lib/sort_template.h"
+
+#define ST_SORT trigram_qsort_unsigned
+#define ST_ELEMENT_TYPE_VOID
+#define ST_COMPARE(a, b) CMPTRGM_UNSIGNED(a, b)
+#define ST_SCOPE static
+#define ST_DEFINE
+#define ST_DECLARE
+#include "lib/sort_template.h"
+
/*
* Deprecated function.
* Use "pg_trgm.similarity_threshold" GUC variable instead of this function.
@@ -281,12 +298,6 @@ show_limit(PG_FUNCTION_ARGS)
PG_RETURN_FLOAT4(similarity_threshold);
}
-static int
-comp_trgm(const void *a, const void *b)
-{
- return CMPTRGM(a, b);
-}
-
/*
* Finds first word in string, returns pointer to the word,
* endword points to the character after word
@@ -569,8 +580,16 @@ generate_trgm(char *str, int slen)
*/
if (len > 1)
{
- qsort(GETARR(trg), len, sizeof(trgm), comp_trgm);
- len = qunique(GETARR(trg), len, sizeof(trgm), comp_trgm);
+ if (GetDefaultCharSignedness())
+ {
+ trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+ len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
+ }
+ else
+ {
+ trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
+ len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
+ }
}
SET_VARSIZE(trg, CALCGTSIZE(ARRKEY, len));
@@ -1100,8 +1119,16 @@ generate_wildcard_trgm(const char *str, int slen)
len = arr.length;
if (len > 1)
{
- qsort(GETARR(trg), len, sizeof(trgm), comp_trgm);
- len = qunique(GETARR(trg), len, sizeof(trgm), comp_trgm);
+ if (GetDefaultCharSignedness())
+ {
+ trigram_qsort_signed((void *) GETARR(trg), len, sizeof(trgm));
+ len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_SIGNED);
+ }
+ else
+ {
+ trigram_qsort_unsigned((void *) GETARR(trg), len, sizeof(trgm));
+ len = qunique(GETARR(trg), len, sizeof(trgm), CMPTRGM_UNSIGNED);
+ }
}
trg->flag = ARRKEY;
--
2.51.0
[text/x-patch] v4-0001-Optimize-sort-and-deduplication-in-ginExtractEntr.patch (7.8K, 6-v4-0001-Optimize-sort-and-deduplication-in-ginExtractEntr.patch)
download | inline diff:
From 983e5543c0802201d8c2c43dbde0a19ae3d5ed1b Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Wed, 14 Jan 2026 10:54:32 +0100
Subject: [PATCH v4 1/5] Optimize sort and deduplication in ginExtractEntries
---
src/backend/access/gin/ginutil.c | 152 +++++++++++++------------------
src/include/access/gin_private.h | 2 +-
2 files changed, 66 insertions(+), 88 deletions(-)
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index ff927279cc3..15f4a686795 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -28,6 +28,7 @@
#include "utils/index_selfuncs.h"
#include "utils/rel.h"
#include "utils/typcache.h"
+#include "lib/qunique.h"
/*
@@ -387,19 +388,6 @@ GinInitMetabuffer(Buffer b)
((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
}
-/*
- * Support for sorting key datums in ginExtractEntries
- *
- * Note: we only have to worry about null and not-null keys here;
- * ginExtractEntries never generates more than one placeholder null,
- * so it doesn't have to sort those.
- */
-typedef struct
-{
- Datum datum;
- bool isnull;
-} keyEntryData;
-
typedef struct
{
FmgrInfo *cmpDatumFunc;
@@ -410,24 +398,14 @@ typedef struct
static int
cmpEntries(const void *a, const void *b, void *arg)
{
- const keyEntryData *aa = (const keyEntryData *) a;
- const keyEntryData *bb = (const keyEntryData *) b;
+ const Datum *aa = (const Datum *) a;
+ const Datum *bb = (const Datum *) b;
cmpEntriesArg *data = (cmpEntriesArg *) arg;
int res;
- if (aa->isnull)
- {
- if (bb->isnull)
- res = 0; /* NULL "=" NULL */
- else
- res = 1; /* NULL ">" not-NULL */
- }
- else if (bb->isnull)
- res = -1; /* not-NULL "<" NULL */
- else
- res = DatumGetInt32(FunctionCall2Coll(data->cmpDatumFunc,
- data->collation,
- aa->datum, bb->datum));
+ res = DatumGetInt32(FunctionCall2Coll(data->cmpDatumFunc,
+ data->collation,
+ *aa, *bb));
/*
* Detect if we have any duplicates. If there are equal keys, qsort must
@@ -440,6 +418,14 @@ cmpEntries(const void *a, const void *b, void *arg)
return res;
}
+#define ST_SORT qsort_arg_entries
+#define ST_ELEMENT_TYPE Datum
+#define ST_COMPARE_ARG_TYPE cmpEntriesArg
+#define ST_COMPARE(a, b, arg) cmpEntries(a, b, arg)
+#define ST_SCOPE static
+#define ST_DEFINE
+#define ST_DECLARE
+#include "lib/sort_template.h"
/*
* Extract the index key values from an indexable item
@@ -450,11 +436,13 @@ cmpEntries(const void *a, const void *b, void *arg)
Datum *
ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
Datum value, bool isNull,
- int32 *nentries, GinNullCategory **categories)
+ int32 *nentries_p, GinNullCategory **categories_p)
{
Datum *entries;
bool *nullFlags;
- int32 i;
+ GinNullCategory *categories;
+ bool hasNull;
+ int32 nentries;
/*
* We don't call the extractValueFn on a null item. Instead generate a
@@ -462,42 +450,60 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
*/
if (isNull)
{
- *nentries = 1;
+ *nentries_p = 1;
entries = palloc_object(Datum);
entries[0] = (Datum) 0;
- *categories = palloc_object(GinNullCategory);
- (*categories)[0] = GIN_CAT_NULL_ITEM;
+ *categories_p = palloc_object(GinNullCategory);
+ (*categories_p)[0] = GIN_CAT_NULL_ITEM;
return entries;
}
/* OK, call the opclass's extractValueFn */
nullFlags = NULL; /* in case extractValue doesn't set it */
+ nentries = 0;
entries = (Datum *)
DatumGetPointer(FunctionCall3Coll(&ginstate->extractValueFn[attnum - 1],
ginstate->supportCollation[attnum - 1],
value,
- PointerGetDatum(nentries),
+ PointerGetDatum(&nentries),
PointerGetDatum(&nullFlags)));
/*
* Generate a placeholder if the item contained no keys.
*/
- if (entries == NULL || *nentries <= 0)
+ if (entries == NULL || nentries <= 0)
{
- *nentries = 1;
+ *nentries_p = 1;
entries = palloc_object(Datum);
entries[0] = (Datum) 0;
- *categories = palloc_object(GinNullCategory);
- (*categories)[0] = GIN_CAT_EMPTY_ITEM;
+ *categories_p = palloc_object(GinNullCategory);
+ (*categories_p)[0] = GIN_CAT_EMPTY_ITEM;
return entries;
}
/*
- * If the extractValueFn didn't create a nullFlags array, create one,
- * assuming that everything's non-null.
+ * Scan the items for any NULLs. All NULLs are considered equal, so we
+ * just need to check and remember if there are any. We remove them from
+ * the array here, and if necessary, put back one NULL entry to represent
+ * them all after deduplication.
*/
- if (nullFlags == NULL)
- nullFlags = (bool *) palloc0(*nentries * sizeof(bool));
+ hasNull = false;
+ if (nullFlags)
+ {
+ int32 numNonNulls = 0;
+
+ for (int32 i = 0; i < nentries; i++)
+ {
+ if (nullFlags[i])
+ hasNull = true;
+ else
+ {
+ entries[numNonNulls] = entries[i];
+ numNonNulls++;
+ }
+ }
+ nentries = numNonNulls;
+ }
/*
* If there's more than one key, sort and unique-ify.
@@ -506,63 +512,35 @@ ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
* pretty bad too. For small numbers of keys it'd likely be better to use
* a simple insertion sort.
*/
- if (*nentries > 1)
+ if (nentries > 1)
{
- keyEntryData *keydata;
cmpEntriesArg arg;
-
- keydata = palloc_array(keyEntryData, *nentries);
- for (i = 0; i < *nentries; i++)
- {
- keydata[i].datum = entries[i];
- keydata[i].isnull = nullFlags[i];
- }
-
arg.cmpDatumFunc = &ginstate->compareFn[attnum - 1];
arg.collation = ginstate->supportCollation[attnum - 1];
arg.haveDups = false;
- qsort_arg(keydata, *nentries, sizeof(keyEntryData),
- cmpEntries, &arg);
- if (arg.haveDups)
- {
- /* there are duplicates, must get rid of 'em */
- int32 j;
-
- entries[0] = keydata[0].datum;
- nullFlags[0] = keydata[0].isnull;
- j = 1;
- for (i = 1; i < *nentries; i++)
- {
- if (cmpEntries(&keydata[i - 1], &keydata[i], &arg) != 0)
- {
- entries[j] = keydata[i].datum;
- nullFlags[j] = keydata[i].isnull;
- j++;
- }
- }
- *nentries = j;
- }
- else
- {
- /* easy, no duplicates */
- for (i = 0; i < *nentries; i++)
- {
- entries[i] = keydata[i].datum;
- nullFlags[i] = keydata[i].isnull;
- }
- }
+ qsort_arg_entries(entries, nentries, &arg);
- pfree(keydata);
+ if (arg.haveDups)
+ nentries = qunique_arg(entries, nentries, sizeof(Datum), cmpEntries, &arg);
}
/*
- * Create GinNullCategory representation from nullFlags.
+ * Create GinNullCategory representation.
*/
- *categories = (GinNullCategory *) palloc0(*nentries * sizeof(GinNullCategory));
- for (i = 0; i < *nentries; i++)
- (*categories)[i] = (nullFlags[i] ? GIN_CAT_NULL_KEY : GIN_CAT_NORM_KEY);
+ StaticAssertStmt(GIN_CAT_NORM_KEY == 0, "Assuming GIN_CAT_NORM_KEY=0");
+ categories = palloc0_array(GinNullCategory, nentries + (hasNull ? 1 : 0));
+
+ /* Put back a NULL entry, if there were any */
+ if (hasNull)
+ {
+ entries[nentries] = (Datum) 0;
+ categories[nentries] = GIN_CAT_NULL_KEY;
+ nentries++;
+ }
+ *nentries_p = nentries;
+ *categories_p = categories;
return entries;
}
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 7c3b4db94cd..c878546b9d2 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -99,7 +99,7 @@ extern void GinInitPage(Page page, uint32 f, Size pageSize);
extern void GinInitMetabuffer(Buffer b);
extern Datum *ginExtractEntries(GinState *ginstate, OffsetNumber attnum,
Datum value, bool isNull,
- int32 *nentries, GinNullCategory **categories);
+ int32 *nentries_p, GinNullCategory **categories_p);
extern OffsetNumber gintuple_get_attrnum(GinState *ginstate, IndexTuple tuple);
extern Datum gintuple_get_key(GinState *ginstate, IndexTuple tuple,
--
2.51.0
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
@ 2026-04-07 11:27 ` Heikki Linnakangas <[email protected]>
2026-04-08 02:15 ` Re: Reduce build times of pg_trgm GIN indexes John Naylor <[email protected]>
2026-04-09 11:28 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 15:06 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
0 siblings, 4 replies; 31+ messages in thread
From: Heikki Linnakangas @ 2026-04-07 11:27 UTC (permalink / raw)
To: David Geier <[email protected]>; Matthias van de Meent <[email protected]>; +Cc: pgsql-hackers
On 03/03/2026 19:31, David Geier wrote:
>> Attached are the patches rebased on latest master.
>>
>> I've removed the ASCII fast-path patch 0006 as it turned out to be more
>> complicated to make work than expected.
>>
>> I kept the radix sort patch because it gives a decent speedup but I
>> would like to focus for now on getting patches 0001 - 0004 merged.
>> They're all simple and, the way I see it, uncontroversial.
>>
>> I remeasured the savings of 0001 - 0004, which comes on top of the
>> already committed patch that inlined the comparison function, which gave
>> another ~5%:
>>
>> Data set | Patched (ms) | Master (ms) | Speedup
>> --------------------|--------------|--------------|----------
>> movies(plot) | 8,058 | 10,311 | 1.27x
>> lineitem(l_comment) | 223,233 | 256,986 | 1.19x
>>
>> I've also registered the change at the commit fest, see
>> https://commitfest.postgresql.org/patch/6418/.
>
> Attached is v5 that removes an incorrect assertion from the radix sort code.
>
> v5-0001-Optimize-sort-and-deduplication-in-ginExtractEntr.patch
> v5-0002-Optimize-generate_trgm-with-sort_template.h.patch
> v5-0003-Make-btint4cmp-branchless.patch
> v5-0004-Faster-qunique-comparator-in-generate_trgm.patch
> v5-0005-Optimize-generate_trgm-with-radix-sort.patch
Pushed 0001 as commit 6f5ad00ab7.
I squashed 0002 and 0004 into one commit, and did some more refactoring:
I created a trigram_qsort() helper function that calls the signed or
unsigned variant, so that that logic doesn't need to be duplicated in
the callers. For symmetry, I also added a trigram_qunique() helper
function which just calls qunique() with the new, faster CMPTRGM_EQ
comparator. Pushed these as commit 9f3755ea07.
Patch 0003 gives me pause. It's a tiny patch:
> @@ -203,12 +204,7 @@ btint4cmp(PG_FUNCTION_ARGS)
> int32 a = PG_GETARG_INT32(0);
> int32 b = PG_GETARG_INT32(1);
>
> - if (a > b)
> - PG_RETURN_INT32(A_GREATER_THAN_B);
> - else if (a == b)
> - PG_RETURN_INT32(0);
> - else
> - PG_RETURN_INT32(A_LESS_THAN_B);
> + PG_RETURN_INT32(pg_cmp_s32(a, b));
> }
But the comments on the pg_cmp functions say:
> * NB: If the comparator function is inlined, some compilers may produce
> * worse code with these helper functions than with code with the
> * following form:
> *
> * if (a < b)
> * return -1;
> * if (a > b)
> * return 1;
> * return 0;
> *
So, uh, is that really a universal improvement? Is that comment about
producing worse code outdated?
- Heikki
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
@ 2026-04-08 02:15 ` John Naylor <[email protected]>
2026-04-13 15:05 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
3 siblings, 1 reply; 31+ messages in thread
From: John Naylor @ 2026-04-08 02:15 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
On Tue, Apr 7, 2026 at 6:27 PM Heikki Linnakangas <[email protected]> wrote:
> But the comments on the pg_cmp functions say:
>
> > * NB: If the comparator function is inlined, some compilers may produce
> > * worse code with these helper functions than with code with the
> > * following form:
> > *
> > * if (a < b)
> > * return -1;
> > * if (a > b)
> > * return 1;
> > * return 0;
> > *
>
> So, uh, is that really a universal improvement? Is that comment about
> producing worse code outdated?
No, it's quite recent:
https://www.postgresql.org/message-id/20240212230423.GA3519%40nathanxps13
--
John Naylor
Amazon Web Services
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-08 02:15 ` Re: Reduce build times of pg_trgm GIN indexes John Naylor <[email protected]>
@ 2026-04-13 15:05 ` David Geier <[email protected]>
2026-04-14 13:05 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: David Geier @ 2026-04-13 15:05 UTC (permalink / raw)
To: John Naylor <[email protected]>; Heikki Linnakangas <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; pgsql-hackers
On 08.04.2026 04:15, John Naylor wrote:
> On Tue, Apr 7, 2026 at 6:27 PM Heikki Linnakangas <[email protected]> wrote:
>> But the comments on the pg_cmp functions say:
>>
>>> * NB: If the comparator function is inlined, some compilers may produce
>>> * worse code with these helper functions than with code with the
>>> * following form:
>>> *
>>> * if (a < b)
>>> * return -1;
>>> * if (a > b)
>>> * return 1;
>>> * return 0;
>>> *
>>
>> So, uh, is that really a universal improvement? Is that comment about
>> producing worse code outdated?
Well spotted. Thanks!
>
> No, it's quite recent:
>
> https://www.postgresql.org/message-id/20240212230423.GA3519%40nathanxps13
In my original benchmarks it was faster. I'll rebase the remaining
commits and do some more analysis.
--
David Geier
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-08 02:15 ` Re: Reduce build times of pg_trgm GIN indexes John Naylor <[email protected]>
2026-04-13 15:05 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
@ 2026-04-14 13:05 ` David Geier <[email protected]>
0 siblings, 0 replies; 31+ messages in thread
From: David Geier @ 2026-04-14 13:05 UTC (permalink / raw)
To: John Naylor <[email protected]>; Heikki Linnakangas <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; pgsql-hackers
On 13.04.2026 17:05, David Geier wrote:
> On 08.04.2026 04:15, John Naylor wrote:
>> On Tue, Apr 7, 2026 at 6:27 PM Heikki Linnakangas <[email protected]> wrote:
>>> But the comments on the pg_cmp functions say:
>>>
>>>> * NB: If the comparator function is inlined, some compilers may produce
>>>> * worse code with these helper functions than with code with the
>>>> * following form:
>>>> *
>>>> * if (a < b)
>>>> * return -1;
>>>> * if (a > b)
>>>> * return 1;
>>>> * return 0;
>>>> *
>>>
>>> So, uh, is that really a universal improvement? Is that comment about
>>> producing worse code outdated?
>
> Well spotted. Thanks!
>
>>
>> No, it's quite recent:
>>
>> https://www.postgresql.org/message-id/20240212230423.GA3519%40nathanxps13
FWICS, this would only matter if btint4cmp() would get inlined
somewhere, where the compiler could actually make use of understanding
that parts of the if-cascade are not needed. Andres' example was
return DO_COMPARE(a, b) < 0 ?
(DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a))
: (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a : c));
In the case of btint4cmp(), it's only ever invoked from the function
manager, where it cannot be inlined.
Or are there ways to invoke btint4cmp() that can be inlined, which I'm
unaware of?
> In my original benchmarks it was faster. I'll rebase the remaining
> commits and do some more analysis.
Here is the disassembly and the perf top output of master vs patched. I
compiled with GCC 15.2.0.
The unpatched version of btint4cmp() contains a conditional jump, which
is mispredicted frequently in the sort. The patched version is
completely branchless.
master
======
Dump of assembler code for function btint4cmp:
0x00005aa9e33ccdb0 <+0>: endbr64
0x00005aa9e33ccdb4 <+4>: mov 0x20(%rdi),%edx
0x00005aa9e33ccdb7 <+7>: mov $0x1,%eax
0x00005aa9e33ccdbc <+12>: cmp %edx,0x30(%rdi)
0x00005aa9e33ccdbf <+15>: jl 0x5aa9e33ccdca <btint4cmp+26>
0x00005aa9e33ccdc1 <+17>: setne %al
0x00005aa9e33ccdc4 <+20>: movzbl %al,%eax
0x00005aa9e33ccdc7 <+23>: neg %rax
0x00005aa9e33ccdca <+26>: ret
37.22% pg_trgm.so [.] trigram_qsort_signed.constprop.0
7.99% postgres [.] cmpEntryAccumulator
6.60% postgres [.] ginCombineData
6.03% postgres [.] FunctionCall2Coll
3.19% postgres [.] btint4cmp
2.30% postgres [.] rbt_insert
2.29% pg_trgm.so [.] generate_trgm
2.24% postgres [.] pg_mblen_range
1.77% libc.so.6 [.] __towlower_l
1.73% pg_trgm.so [.] trigram_qsort_signed_med3
1.56% postgres [.] pg_utf2wchar_with_len
Patched
=======
Dump of assembler code for function btint4cmp:
0x000055a69e87bdb0 <+0>: endbr64
0x000055a69e87bdb4 <+4>: mov 0x20(%rdi),%eax
0x000055a69e87bdb7 <+7>: cmp %eax,0x30(%rdi)
0x000055a69e87bdba <+10>: setl %al
0x000055a69e87bdbd <+13>: setg %dl
0x000055a69e87bdc0 <+16>: movzbl %dl,%edx
0x000055a69e87bdc3 <+19>: movzbl %al,%eax
0x000055a69e87bdc6 <+22>: sub %edx,%eax
0x000055a69e87bdc8 <+24>: cltq
0x000055a69e87bdca <+26>: ret
38.07% pg_trgm.so [.] trigram_qsort_signed.constprop.0
7.69% postgres [.] cmpEntryAccumulator
6.96% postgres [.] ginCombineData
3.90% postgres [.] FunctionCall2Coll
2.54% postgres [.] pg_mblen_range
2.40% postgres [.] btint4cmp
2.38% pg_trgm.so [.] generate_trgm
1.86% postgres [.] rbt_insert
1.80% libc.so.6 [.] __towlower_l
1.73% pg_trgm.so [.] trigram_qsort_signed_med3
1.66% postgres [.] pg_utf2wchar_with_len
--
David Geier
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
@ 2026-04-09 11:28 ` Bertrand Drouvot <[email protected]>
2026-04-13 09:41 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
3 siblings, 1 reply; 31+ messages in thread
From: Bertrand Drouvot @ 2026-04-09 11:28 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
Hi,
On Tue, Apr 07, 2026 at 02:27:40PM +0300, Heikki Linnakangas wrote:
> On 03/03/2026 19:31, David Geier wrote:
> > > Attached are the patches rebased on latest master.
> > >
> > > I've removed the ASCII fast-path patch 0006 as it turned out to be more
> > > complicated to make work than expected.
> > >
> > > I kept the radix sort patch because it gives a decent speedup but I
> > > would like to focus for now on getting patches 0001 - 0004 merged.
> > > They're all simple and, the way I see it, uncontroversial.
> > >
> > > I remeasured the savings of 0001 - 0004, which comes on top of the
> > > already committed patch that inlined the comparison function, which gave
> > > another ~5%:
> > >
> > > Data set | Patched (ms) | Master (ms) | Speedup
> > > --------------------|--------------|--------------|----------
> > > movies(plot) | 8,058 | 10,311 | 1.27x
> > > lineitem(l_comment) | 223,233 | 256,986 | 1.19x
> > >
> > > I've also registered the change at the commit fest, see
> > > https://commitfest.postgresql.org/patch/6418/.
> >
> > Attached is v5 that removes an incorrect assertion from the radix sort code.
> >
> > v5-0001-Optimize-sort-and-deduplication-in-ginExtractEntr.patch
> > v5-0002-Optimize-generate_trgm-with-sort_template.h.patch
> > v5-0003-Make-btint4cmp-branchless.patch
> > v5-0004-Faster-qunique-comparator-in-generate_trgm.patch
> > v5-0005-Optimize-generate_trgm-with-radix-sort.patch
>
> Pushed 0001 as commit 6f5ad00ab7.
This commit makes use of StaticAssertStmt() that has been deprecated in
d50c86e74375. The attached, fixes it.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-09 11:28 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
@ 2026-04-13 09:41 ` Peter Eisentraut <[email protected]>
2026-04-13 11:04 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: Peter Eisentraut @ 2026-04-13 09:41 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; Heikki Linnakangas <[email protected]>; +Cc: David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
On 09.04.26 13:28, Bertrand Drouvot wrote:
> Hi,
>
> On Tue, Apr 07, 2026 at 02:27:40PM +0300, Heikki Linnakangas wrote:
>> On 03/03/2026 19:31, David Geier wrote:
>>>> Attached are the patches rebased on latest master.
>>>>
>>>> I've removed the ASCII fast-path patch 0006 as it turned out to be more
>>>> complicated to make work than expected.
>>>>
>>>> I kept the radix sort patch because it gives a decent speedup but I
>>>> would like to focus for now on getting patches 0001 - 0004 merged.
>>>> They're all simple and, the way I see it, uncontroversial.
>>>>
>>>> I remeasured the savings of 0001 - 0004, which comes on top of the
>>>> already committed patch that inlined the comparison function, which gave
>>>> another ~5%:
>>>>
>>>> Data set | Patched (ms) | Master (ms) | Speedup
>>>> --------------------|--------------|--------------|----------
>>>> movies(plot) | 8,058 | 10,311 | 1.27x
>>>> lineitem(l_comment) | 223,233 | 256,986 | 1.19x
>>>>
>>>> I've also registered the change at the commit fest, see
>>>> https://commitfest.postgresql.org/patch/6418/.
>>>
>>> Attached is v5 that removes an incorrect assertion from the radix sort code.
>>>
>>> v5-0001-Optimize-sort-and-deduplication-in-ginExtractEntr.patch
>>> v5-0002-Optimize-generate_trgm-with-sort_template.h.patch
>>> v5-0003-Make-btint4cmp-branchless.patch
>>> v5-0004-Faster-qunique-comparator-in-generate_trgm.patch
>>> v5-0005-Optimize-generate_trgm-with-radix-sort.patch
>>
>> Pushed 0001 as commit 6f5ad00ab7.
>
> This commit makes use of StaticAssertStmt() that has been deprecated in
> d50c86e74375. The attached, fixes it.
I think the position of the static assertion is correct, because it
refers to the palloc0_array() that follows. Maybe the comment could be
a bit clearer, like "using palloc0_array requires GIN_CAT_NORM_KEY==0"?
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-09 11:28 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
2026-04-13 09:41 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
@ 2026-04-13 11:04 ` Bertrand Drouvot <[email protected]>
2026-04-13 15:03 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: Bertrand Drouvot @ 2026-04-13 11:04 UTC (permalink / raw)
To: Peter Eisentraut <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
Hi,
On Mon, Apr 13, 2026 at 11:41:02AM +0200, Peter Eisentraut wrote:
> On 09.04.26 13:28, Bertrand Drouvot wrote:
> >
> > This commit makes use of StaticAssertStmt() that has been deprecated in
> > d50c86e74375. The attached, fixes it.
>
> I think the position of the static assertion is correct, because it refers
> to the palloc0_array() that follows. Maybe the comment could be a bit
> clearer, like "using palloc0_array requires GIN_CAT_NORM_KEY==0"?
Yeah that looks better to not lose the connection with palloc0_array() here.
Done that way in the attached and adding new braces to avoid warning from
-Wdeclaration-after-statement.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-09 11:28 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
2026-04-13 09:41 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-13 11:04 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
@ 2026-04-13 15:03 ` David Geier <[email protected]>
2026-04-13 17:15 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: David Geier @ 2026-04-13 15:03 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; Peter Eisentraut <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
Hi!
On 13.04.2026 13:04, Bertrand Drouvot wrote:
> Hi,
>
> On Mon, Apr 13, 2026 at 11:41:02AM +0200, Peter Eisentraut wrote:
>> On 09.04.26 13:28, Bertrand Drouvot wrote:
>>>
>>> This commit makes use of StaticAssertStmt() that has been deprecated in
>>> d50c86e74375. The attached, fixes it.
I cannot find a comment close to StaticAssertStmt() that says it got
deprecated. Is the goal to completely get rid of StaticAssertStmt()?
>> I think the position of the static assertion is correct, because it refers
>> to the palloc0_array() that follows. Maybe the comment could be a bit
>> clearer, like "using palloc0_array requires GIN_CAT_NORM_KEY==0"?
>
> Yeah that looks better to not lose the connection with palloc0_array() here.
> Done that way in the attached and adding new braces to avoid warning from
> -Wdeclaration-after-statement.
Looks good to me.
--
David Geier
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-09 11:28 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
2026-04-13 09:41 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-13 11:04 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
2026-04-13 15:03 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
@ 2026-04-13 17:15 ` Bertrand Drouvot <[email protected]>
2026-04-14 09:22 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: Bertrand Drouvot @ 2026-04-13 17:15 UTC (permalink / raw)
To: David Geier <[email protected]>; +Cc: Peter Eisentraut <[email protected]>; Heikki Linnakangas <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
Hi,
On Mon, Apr 13, 2026 at 05:03:11PM +0200, David Geier wrote:
> Hi!
>
> On 13.04.2026 13:04, Bertrand Drouvot wrote:
> > Hi,
> >
> > On Mon, Apr 13, 2026 at 11:41:02AM +0200, Peter Eisentraut wrote:
> >> On 09.04.26 13:28, Bertrand Drouvot wrote:
> >>>
> >>> This commit makes use of StaticAssertStmt() that has been deprecated in
> >>> d50c86e74375. The attached, fixes it.
>
> I cannot find a comment close to StaticAssertStmt() that says it got
> deprecated.
The comment on top of it's definition is:
"
/*
* StaticAssertStmt() was previously used to make static assertions work as a
* statement, but its use is now deprecated.
*/
"
> Is the goal to completely get rid of StaticAssertStmt()?
According to its comment, I'd say so.
> > Yeah that looks better to not lose the connection with palloc0_array() here.
> > Done that way in the attached and adding new braces to avoid warning from
> > -Wdeclaration-after-statement.
>
> Looks good to me.
Thanks for looking at it!
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-09 11:28 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
2026-04-13 09:41 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-13 11:04 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
2026-04-13 15:03 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-13 17:15 ` Re: Reduce build times of pg_trgm GIN indexes Bertrand Drouvot <[email protected]>
@ 2026-04-14 09:22 ` Heikki Linnakangas <[email protected]>
0 siblings, 0 replies; 31+ messages in thread
From: Heikki Linnakangas @ 2026-04-14 09:22 UTC (permalink / raw)
To: Bertrand Drouvot <[email protected]>; David Geier <[email protected]>; +Cc: Peter Eisentraut <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
On 13/04/2026 20:15, Bertrand Drouvot wrote:
> On Mon, Apr 13, 2026 at 05:03:11PM +0200, David Geier wrote:
>> On 13.04.2026 13:04, Bertrand Drouvot wrote:
>>> On Mon, Apr 13, 2026 at 11:41:02AM +0200, Peter Eisentraut wrote:
>>>> On 09.04.26 13:28, Bertrand Drouvot wrote:
>>>>>
>>>>> This commit makes use of StaticAssertStmt() that has been deprecated in
>>>>> d50c86e74375. The attached, fixes it.
>>
>> I cannot find a comment close to StaticAssertStmt() that says it got
>> deprecated.
>
> The comment on top of it's definition is:
>
> "
> /*
> * StaticAssertStmt() was previously used to make static assertions work as a
> * statement, but its use is now deprecated.
> */
> "
>
>> Is the goal to completely get rid of StaticAssertStmt()?
>
> According to its comment, I'd say so.
>
>>> Yeah that looks better to not lose the connection with palloc0_array() here.
>>> Done that way in the attached and adding new braces to avoid warning from
>>> -Wdeclaration-after-statement.
>>
>> Looks good to me.
>
> Thanks for looking at it!
Committed this StaticAssertStmt/Decl() fix, thanks
- Heikki
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
@ 2026-04-12 18:05 ` Tom Lane <[email protected]>
2026-04-13 15:22 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
3 siblings, 2 replies; 31+ messages in thread
From: Tom Lane @ 2026-04-12 18:05 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
Heikki Linnakangas <[email protected]> writes:
> Pushed 0001 as commit 6f5ad00ab7.
This commit has caused Coverity to start complaining that
most of ginExtractEntries() is unreachable:
*** CID 1691468: Control flow issues (DEADCODE)
/srv/coverity/git/pgsql-git/postgresql/src/backend/access/gin/ginutil.c: 495 in ginExtractEntries()
489 /*
490 * Scan the items for any NULLs. All NULLs are considered equal, so we
491 * just need to check and remember if there are any. We remove them from
492 * the array here, and after deduplication, put back one NULL entry to
493 * represent them all.
494 */
>>> CID 1691468: Control flow issues (DEADCODE)
>>> Execution cannot reach this statement: "hasNull = false;".
495 hasNull = false;
496 if (nullFlags)
497 {
498 int32 numNonNulls = 0;
499
500 for (int32 i = 0; i < nentries; i++)
Evidently, it does not realize that the extractValueFn() can change
nentries from its initial value of zero. I wouldn't be too surprised
if that's related to our casting of the pointer to uintptr_t --- that
may cause it to not see the passed pointer as a potential reference
mechanism.
I would just write that off as Coverity not being smart enough, except
that I'm worried that some compiler might make a similar deduction and
break the function completely. Was the switch to a local variable
for nentries really a useful win performance-wise?
regards, tom lane
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
@ 2026-04-13 15:22 ` David Geier <[email protected]>
1 sibling, 0 replies; 31+ messages in thread
From: David Geier @ 2026-04-13 15:22 UTC (permalink / raw)
To: Tom Lane <[email protected]>; Heikki Linnakangas <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; pgsql-hackers
On 12.04.2026 20:05, Tom Lane wrote:
> Heikki Linnakangas <[email protected]> writes:
>> Pushed 0001 as commit 6f5ad00ab7.
>
> This commit has caused Coverity to start complaining that
> most of ginExtractEntries() is unreachable:
>
> *** CID 1691468: Control flow issues (DEADCODE)
> /srv/coverity/git/pgsql-git/postgresql/src/backend/access/gin/ginutil.c: 495 in ginExtractEntries()
> 489 /*
> 490 * Scan the items for any NULLs. All NULLs are considered equal, so we
> 491 * just need to check and remember if there are any. We remove them from
> 492 * the array here, and after deduplication, put back one NULL entry to
> 493 * represent them all.
> 494 */
>>>> CID 1691468: Control flow issues (DEADCODE)
>>>> Execution cannot reach this statement: "hasNull = false;".
> 495 hasNull = false;
> 496 if (nullFlags)
> 497 {
> 498 int32 numNonNulls = 0;
> 499
> 500 for (int32 i = 0; i < nentries; i++)
>
> Evidently, it does not realize that the extractValueFn() can change
> nentries from its initial value of zero. I wouldn't be too surprised
> if that's related to our casting of the pointer to uintptr_t --- that
> may cause it to not see the passed pointer as a potential reference
> mechanism.
Curious that we don't see that more frequently for other functions that
have output arguments. But maybe there are just too few?
> I would just write that off as Coverity not being smart enough, except
> that I'm worried that some compiler might make a similar deduction and
> break the function completely. Was the switch to a local variable
> for nentries really a useful win performance-wise?
I haven't benchmarked the variant with using the pointer directly. I can
do that.
--
David Geier
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
@ 2026-04-13 16:14 ` Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
1 sibling, 1 reply; 31+ messages in thread
From: Heikki Linnakangas @ 2026-04-13 16:14 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
On 12/04/2026 21:05, Tom Lane wrote:
> Heikki Linnakangas <[email protected]> writes:
>> Pushed 0001 as commit 6f5ad00ab7.
>
> This commit has caused Coverity to start complaining that
> most of ginExtractEntries() is unreachable:
>
> *** CID 1691468: Control flow issues (DEADCODE)
> /srv/coverity/git/pgsql-git/postgresql/src/backend/access/gin/ginutil.c: 495 in ginExtractEntries()
> 489 /*
> 490 * Scan the items for any NULLs. All NULLs are considered equal, so we
> 491 * just need to check and remember if there are any. We remove them from
> 492 * the array here, and after deduplication, put back one NULL entry to
> 493 * represent them all.
> 494 */
>>>> CID 1691468: Control flow issues (DEADCODE)
>>>> Execution cannot reach this statement: "hasNull = false;".
> 495 hasNull = false;
> 496 if (nullFlags)
> 497 {
> 498 int32 numNonNulls = 0;
> 499
> 500 for (int32 i = 0; i < nentries; i++)
>
> Evidently, it does not realize that the extractValueFn() can change
> nentries from its initial value of zero. I wouldn't be too surprised
> if that's related to our casting of the pointer to uintptr_t --- that
> may cause it to not see the passed pointer as a potential reference
> mechanism.
>
> I would just write that off as Coverity not being smart enough, except
> that I'm worried that some compiler might make a similar deduction and
> break the function completely. Was the switch to a local variable
> for nentries really a useful win performance-wise?
I didn't do it for performance, but because I find the function easier
to read that way. We could change it back.
It's a pretty scary thought that a compiler might misoptimize that
though. In the same function we have 'nullFlags', too, as a local
variable, even before this commit. Not sure why Coverity doesn't
complain about that.
> /*
> * PointerGetDatum
> * Returns datum representation for a pointer.
> */
> static inline Datum
> PointerGetDatum(const void *X)
> {
> return (Datum) (uintptr_t) X;
> }
Hmm, is that 'const' incorrect? This function doesn't modify *X, but the
resulting address will be used to modify it. Maybe changing it to
non-const "void *X" would give Coverity a hint.
- Heikki
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
@ 2026-04-14 07:02 ` David Geier <[email protected]>
2026-04-15 11:06 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: David Geier @ 2026-04-14 07:02 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; Tom Lane <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; pgsql-hackers
> I didn't do it for performance, but because I find the function easier
> to read that way. We could change it back.
>
> It's a pretty scary thought that a compiler might misoptimize that
> though. In the same function we have 'nullFlags', too, as a local
> variable, even before this commit. Not sure why Coverity doesn't
> complain about that.
>
>> /*
>> * PointerGetDatum
>> * Returns datum representation for a pointer.
>> */
>> static inline Datum
>> PointerGetDatum(const void *X)
>> {
>> return (Datum) (uintptr_t) X;
>> }
>
> Hmm, is that 'const' incorrect? This function doesn't modify *X, but the
> resulting address will be used to modify it. Maybe changing it to non-
> const "void *X" would give Coverity a hint.
Ah, that could be it.
Is there a way for me to run Coverity on a patch to test that out?
Which Coverity CI do we actually use? Is it this one here [1]?
[1] https://scan.coverity.com/projects/209?
--
David Geier
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
@ 2026-04-15 11:06 ` Heikki Linnakangas <[email protected]>
2026-04-15 19:12 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: Heikki Linnakangas @ 2026-04-15 11:06 UTC (permalink / raw)
To: David Geier <[email protected]>; Tom Lane <[email protected]>; Peter Eisentraut <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; pgsql-hackers
On 14/04/2026 10:02, David Geier wrote:
>> I didn't do it for performance, but because I find the function easier
>> to read that way. We could change it back.
>>
>> It's a pretty scary thought that a compiler might misoptimize that
>> though. In the same function we have 'nullFlags', too, as a local
>> variable, even before this commit. Not sure why Coverity doesn't
>> complain about that.
>>
>>> /*
>>> * PointerGetDatum
>>> * Returns datum representation for a pointer.
>>> */
>>> static inline Datum
>>> PointerGetDatum(const void *X)
>>> {
>>> return (Datum) (uintptr_t) X;
>>> }
>>
>> Hmm, is that 'const' incorrect? This function doesn't modify *X, but the
>> resulting address will be used to modify it. Maybe changing it to non-
>> const "void *X" would give Coverity a hint.
This was briefly discussed when PointerGetDatum() was changed from a
macro to a static inline function [1]. On that email, Peter pointed out
that the compiler was doing the same deduction that Coverity did now,
i.e. that if you pass the Datum returned by PointerGetDatum(&foo) to a
function, it cannot change *foo. I'm surprised we dismissed that worry
so quickly. If the compiler optimizes based on that assumption, you can
get incorrect code.
Three alternative fixes were discussed on that thread. Here's a fourth
one that I think is better;
#define PointerGetDatum(X) \
((Datum) (uintptr_t) (true ? (X) : NULL))
I found this trick with the dummy conditional expression at [2]. It
always evaluates to just (X), but it has the effect that you get a
compiler error if (X) is not a pointer.
[1]
https://www.postgresql.org/message-id/812568f2-ff1d-ebd9-aee6-e00d8f2e0fb6%40enterprisedb.com
[2] See "TO_VOID_PTR_EXPR()" at
https://medium.com/@pauljlucas/generic-in-c-d7ab47e3b5ab
> Ah, that could be it.
> Is there a way for me to run Coverity on a patch to test that out?
Not really I'm afraid. I can commit a fix and we'll see if it helps the
next time that Coverity runs (= Sunday).
> Which Coverity CI do we actually use? Is it this one here [1]?
>
> [1] https://scan.coverity.com/projects/209?
Yeah, that's the one, but only the security team has access.
- Heikki
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-15 11:06 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
@ 2026-04-15 19:12 ` Peter Eisentraut <[email protected]>
2026-04-15 21:25 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: Peter Eisentraut @ 2026-04-15 19:12 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; David Geier <[email protected]>; Tom Lane <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; pgsql-hackers
On 15.04.26 13:06, Heikki Linnakangas wrote:
> On 14/04/2026 10:02, David Geier wrote:
>>> I didn't do it for performance, but because I find the function easier
>>> to read that way. We could change it back.
>>>
>>> It's a pretty scary thought that a compiler might misoptimize that
>>> though. In the same function we have 'nullFlags', too, as a local
>>> variable, even before this commit. Not sure why Coverity doesn't
>>> complain about that.
>>>
>>>> /*
>>>> * PointerGetDatum
>>>> * Returns datum representation for a pointer.
>>>> */
>>>> static inline Datum
>>>> PointerGetDatum(const void *X)
>>>> {
>>>> return (Datum) (uintptr_t) X;
>>>> }
>>>
>>> Hmm, is that 'const' incorrect? This function doesn't modify *X, but the
>>> resulting address will be used to modify it. Maybe changing it to non-
>>> const "void *X" would give Coverity a hint.
>
> This was briefly discussed when PointerGetDatum() was changed from a
> macro to a static inline function [1]. On that email, Peter pointed out
> that the compiler was doing the same deduction that Coverity did now,
> i.e. that if you pass the Datum returned by PointerGetDatum(&foo) to a
> function, it cannot change *foo. I'm surprised we dismissed that worry
> so quickly. If the compiler optimizes based on that assumption, you can
> get incorrect code.
I don't think this is in evidence. AFAICT, it's just Coverity that is
complaining here, which is its right, but the code is not incorrect.
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-15 11:06 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-15 19:12 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
@ 2026-04-15 21:25 ` Tom Lane <[email protected]>
2026-04-16 08:45 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: Tom Lane @ 2026-04-15 21:25 UTC (permalink / raw)
To: Peter Eisentraut <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
Peter Eisentraut <[email protected]> writes:
> On 15.04.26 13:06, Heikki Linnakangas wrote:
>> This was briefly discussed when PointerGetDatum() was changed from a
>> macro to a static inline function [1]. On that email, Peter pointed out
>> that the compiler was doing the same deduction that Coverity did now,
>> i.e. that if you pass the Datum returned by PointerGetDatum(&foo) to a
>> function, it cannot change *foo. I'm surprised we dismissed that worry
>> so quickly. If the compiler optimizes based on that assumption, you can
>> get incorrect code.
> I don't think this is in evidence. AFAICT, it's just Coverity that is
> complaining here, which is its right, but the code is not incorrect.
Are you sure? This seems like the sort of thing that will bite us on
the rear sometime in the future, as the compiler geeks put in more and
more aggressive optimizations.
I think we should at least test the theory that changing
PointerGetDatum to remove the const cast would silence Coverity's
complaint. If it does not then we're attributing too much
intelligence to Coverity. But if it does, then we've correctly
identified why it's complaining, and we should take seriously the
idea that they aren't the only ones making this sort of deduction
(or won't be for long).
regards, tom lane
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-15 11:06 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-15 19:12 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-15 21:25 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
@ 2026-04-16 08:45 ` Peter Eisentraut <[email protected]>
2026-04-16 09:49 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-16 14:43 ` Re: Reduce build times of pg_trgm GIN indexes Andres Freund <[email protected]>
0 siblings, 2 replies; 31+ messages in thread
From: Peter Eisentraut @ 2026-04-16 08:45 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
On 15.04.26 23:25, Tom Lane wrote:
> Peter Eisentraut <[email protected]> writes:
>> On 15.04.26 13:06, Heikki Linnakangas wrote:
>>> This was briefly discussed when PointerGetDatum() was changed from a
>>> macro to a static inline function [1]. On that email, Peter pointed out
>>> that the compiler was doing the same deduction that Coverity did now,
>>> i.e. that if you pass the Datum returned by PointerGetDatum(&foo) to a
>>> function, it cannot change *foo. I'm surprised we dismissed that worry
>>> so quickly. If the compiler optimizes based on that assumption, you can
>>> get incorrect code.
>
>> I don't think this is in evidence. AFAICT, it's just Coverity that is
>> complaining here, which is its right, but the code is not incorrect.
>
> Are you sure? This seems like the sort of thing that will bite us on
> the rear sometime in the future, as the compiler geeks put in more and
> more aggressive optimizations.
>
> I think we should at least test the theory that changing
> PointerGetDatum to remove the const cast would silence Coverity's
> complaint. If it does not then we're attributing too much
> intelligence to Coverity. But if it does, then we've correctly
> identified why it's complaining, and we should take seriously the
> idea that they aren't the only ones making this sort of deduction
> (or won't be for long).
I think it's quite clear to me that Coverity is complaining about this
correctly, in its view of the world. Compilers sometimes complain about
this, too, although in this case they apparently don't look quite as
deeply to do this analysis.
What I'm missing here is, essentially where the previous thread stopped:
What is the overall message that we want to communicate with the API?
If the default assumption is that what pointers converted to Datums
point to should not be modified on the other side (where the Datum is
converted back to a pointer), then the current declaration of
PointerGetDatum() is suitable, and the GIN code can be considered an
exception and we make a special API for that. The previous thread
proposed NonconstPointerGetDatum().
(If this is the resolution, I also have half a patch somewhere that
makes the string input argument for the InputFunctionCall family of
functions const, which also seems intuitively sensible.)
If, on the other hand, the decision is that there is in fact no such
guarantee, that consumers of Datums are free to modify whatever they
seem fit, then we should drop the const of PointerGetDatum and fix the
fallout up the call stack.
The macro proposed by Heikki, I don't know, still doesn't actually
answer this question, just (possibly) makes these warnings go away in a
slightly mysterious way.
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-15 11:06 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-15 19:12 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-15 21:25 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-16 08:45 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
@ 2026-04-16 09:49 ` Heikki Linnakangas <[email protected]>
2026-04-16 14:37 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
1 sibling, 1 reply; 31+ messages in thread
From: Heikki Linnakangas @ 2026-04-16 09:49 UTC (permalink / raw)
To: Peter Eisentraut <[email protected]>; Tom Lane <[email protected]>; +Cc: David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
On 16/04/2026 11:45, Peter Eisentraut wrote:
> On 15.04.26 23:25, Tom Lane wrote:
>> Peter Eisentraut <[email protected]> writes:
>>> On 15.04.26 13:06, Heikki Linnakangas wrote:
>>>> This was briefly discussed when PointerGetDatum() was changed from a
>>>> macro to a static inline function [1]. On that email, Peter pointed out
>>>> that the compiler was doing the same deduction that Coverity did now,
>>>> i.e. that if you pass the Datum returned by PointerGetDatum(&foo) to a
>>>> function, it cannot change *foo. I'm surprised we dismissed that worry
>>>> so quickly. If the compiler optimizes based on that assumption, you can
>>>> get incorrect code.
>>
>>> I don't think this is in evidence. AFAICT, it's just Coverity that is
>>> complaining here, which is its right, but the code is not incorrect.
>>
>> Are you sure? This seems like the sort of thing that will bite us on
>> the rear sometime in the future, as the compiler geeks put in more and
>> more aggressive optimizations.
>>
>> I think we should at least test the theory that changing
>> PointerGetDatum to remove the const cast would silence Coverity's
>> complaint. If it does not then we're attributing too much
>> intelligence to Coverity. But if it does, then we've correctly
>> identified why it's complaining, and we should take seriously the
>> idea that they aren't the only ones making this sort of deduction
>> (or won't be for long).
>
> I think it's quite clear to me that Coverity is complaining about this
> correctly, in its view of the world. Compilers sometimes complain about
> this, too, although in this case they apparently don't look quite as
> deeply to do this analysis.
>
> What I'm missing here is, essentially where the previous thread stopped:
> What is the overall message that we want to communicate with the API?
>
> If the default assumption is that what pointers converted to Datums
> point to should not be modified on the other side (where the Datum is
> converted back to a pointer), then the current declaration of
> PointerGetDatum() is suitable, and the GIN code can be considered an
> exception and we make a special API for that. The previous thread
> proposed NonconstPointerGetDatum().
>
> (If this is the resolution, I also have half a patch somewhere that
> makes the string input argument for the InputFunctionCall family of
> functions const, which also seems intuitively sensible.)
>
> If, on the other hand, the decision is that there is in fact no such
> guarantee, that consumers of Datums are free to modify whatever they
> seem fit, then we should drop the const of PointerGetDatum and fix the
> fallout up the call stack.
>
> The macro proposed by Heikki, I don't know, still doesn't actually
> answer this question, just (possibly) makes these warnings go away in a
> slightly mysterious way.
My intention was that if you do:
const foo *ptr;
...
datum = PointerGetDatum(ptr);
Then you're not allowed to modify the contents of *ptr through the
'datum'. But if you do:
foo *ptr;
...
datum = PointerGetDatum(ptr);
Then it is allowed.
I don't know how to tell the compiler exactly that. I tried various
hacks with _Generic(), but couldn't make it work. The macro casts away
the 'const' in the first case, which might disable some optimizations
that the compiler could otherwise do.
We could have all three:
ConstPointerGetDatum(const void *): Promises that the returned Datum is
not used to modify
NonConstPointerGetDatum(void *): No such promise.
PointerGetDatum(): Macro as I proposed. Like NonConstPointerGetDatum(),
but for backwards-compatibility it doesn't emit a warning if you pass a
const pointer to it.
- Heikki
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-15 11:06 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-15 19:12 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-15 21:25 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-16 08:45 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-16 09:49 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
@ 2026-04-16 14:37 ` Tom Lane <[email protected]>
2026-04-16 17:30 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: Tom Lane @ 2026-04-16 14:37 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: Peter Eisentraut <[email protected]>; David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
Heikki Linnakangas <[email protected]> writes:
> On 16/04/2026 11:45, Peter Eisentraut wrote:
>> What I'm missing here is, essentially where the previous thread stopped:
>> What is the overall message that we want to communicate with the API?
Good point.
>> If the default assumption is that what pointers converted to Datums
>> point to should not be modified on the other side (where the Datum is
>> converted back to a pointer), then the current declaration of
>> PointerGetDatum() is suitable, and the GIN code can be considered an
>> exception and we make a special API for that. The previous thread
>> proposed NonconstPointerGetDatum().
I think there can be no doubt that most functions receiving a
pass-by-ref Datum are not supposed to scribble on the pointed-to
data. So it makes sense to me that PointerGetDatum should carry
an implication of const-ness, and then we need to invent a new
notation to use in the small number of places where that's not
appropriate. I'd capitalize it as NonConstPointerGetDatum,
but other than that nit that naming suggestion seems fine to me.
Of course, then the *real* question is why DatumGetPointer
doesn't deliver a const pointer. But I don't see how to get
there without extremely invasive changes.
> We could have all three:
Not excited about making massive changes for this.
I remain far less certain than Peter is that this discussion has
anything to do with why Coverity is complaining about
ginExtractEntries. I still think we should make some minimum-effort
change to see if the complaint goes away before expending a lot of
brain cells on choosing a final fix.
regards, tom lane
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-15 11:06 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-15 19:12 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-15 21:25 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-16 08:45 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-16 09:49 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-16 14:37 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
@ 2026-04-16 17:30 ` Heikki Linnakangas <[email protected]>
2026-04-16 17:47 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: Heikki Linnakangas @ 2026-04-16 17:30 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Peter Eisentraut <[email protected]>; David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
On 16/04/2026 17:37, Tom Lane wrote:
> Heikki Linnakangas <[email protected]> writes:
>> On 16/04/2026 11:45, Peter Eisentraut wrote:
>>> What I'm missing here is, essentially where the previous thread stopped:
>>> What is the overall message that we want to communicate with the API?
>
> Good point.
>
>>> If the default assumption is that what pointers converted to Datums
>>> point to should not be modified on the other side (where the Datum is
>>> converted back to a pointer), then the current declaration of
>>> PointerGetDatum() is suitable, and the GIN code can be considered an
>>> exception and we make a special API for that. The previous thread
>>> proposed NonconstPointerGetDatum().
>
> I think there can be no doubt that most functions receiving a
> pass-by-ref Datum are not supposed to scribble on the pointed-to
> data. So it makes sense to me that PointerGetDatum should carry
> an implication of const-ness, and then we need to invent a new
> notation to use in the small number of places where that's not
> appropriate. I'd capitalize it as NonConstPointerGetDatum,
> but other than that nit that naming suggestion seems fine to me.
That makes sense. My worry is that we're changing the rules in a very
subtle way: It used to be OK to use PointerGetDatum(), pass the
resulting datum to something that modifies it. Now we say it's not OK,
and you must use NonConstPointerGetDatum(). You don't get any compiler
warnings if you use it wrong, except for this one coverity warning
apparently, but it doesn't catch this reliably either.
> Of course, then the *real* question is why DatumGetPointer
> doesn't deliver a const pointer. But I don't see how to get
> there without extremely invasive changes.
Good point.
>> We could have all three:
>
> Not excited about making massive changes for this.
Having all three would be a very localized change in postgres.h.
> I remain far less certain than Peter is that this discussion has
> anything to do with why Coverity is complaining about
> ginExtractEntries. I still think we should make some minimum-effort
> change to see if the complaint goes away before expending a lot of
> brain cells on choosing a final fix.
I think I'm going to commit my proposal to turn PointerGetDatum() back
into a macro, and see if that makes Coverity happy. Then we'll know, and
we can decide on the next steps. Any objections?
One open question is whether we should backpatch any of this. I guess
compilers don't misoptimize this in practice, or we would've gotten more
reports, but I really can't rationalize why not and a new compiler
version might well start hitting this.
- Heikki
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-15 11:06 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-15 19:12 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-15 21:25 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-16 08:45 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-16 09:49 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-16 14:37 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-16 17:30 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
@ 2026-04-16 17:47 ` Tom Lane <[email protected]>
2026-04-17 19:21 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: Tom Lane @ 2026-04-16 17:47 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; +Cc: Peter Eisentraut <[email protected]>; David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
Heikki Linnakangas <[email protected]> writes:
> On 16/04/2026 17:37, Tom Lane wrote:
>> Not excited about making massive changes for this.
> Having all three would be a very localized change in postgres.h.
Sure, but *using* them in a consistent way would be invasive.
>> I remain far less certain than Peter is that this discussion has
>> anything to do with why Coverity is complaining about
>> ginExtractEntries. I still think we should make some minimum-effort
>> change to see if the complaint goes away before expending a lot of
>> brain cells on choosing a final fix.
> I think I'm going to commit my proposal to turn PointerGetDatum() back
> into a macro, and see if that makes Coverity happy. Then we'll know, and
> we can decide on the next steps. Any objections?
WFM.
regards, tom lane
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-15 11:06 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-15 19:12 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-15 21:25 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-16 08:45 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-16 09:49 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-16 14:37 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-16 17:30 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-16 17:47 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
@ 2026-04-17 19:21 ` Heikki Linnakangas <[email protected]>
2026-04-22 21:11 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: Heikki Linnakangas @ 2026-04-17 19:21 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Peter Eisentraut <[email protected]>; David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
On 16/04/2026 20:47, Tom Lane wrote:
> Heikki Linnakangas <[email protected]> writes:
>> On 16/04/2026 17:37, Tom Lane wrote:
>>> Not excited about making massive changes for this.
>
>> Having all three would be a very localized change in postgres.h.
>
> Sure, but *using* them in a consistent way would be invasive.
>
>>> I remain far less certain than Peter is that this discussion has
>>> anything to do with why Coverity is complaining about
>>> ginExtractEntries. I still think we should make some minimum-effort
>>> change to see if the complaint goes away before expending a lot of
>>> brain cells on choosing a final fix.
>
>> I think I'm going to commit my proposal to turn PointerGetDatum() back
>> into a macro, and see if that makes Coverity happy. Then we'll know, and
>> we can decide on the next steps. Any objections?
>
> WFM.
Grepping for PointerGetDatum(), there are a bunch of wrappers of it for
specific types, like:
static inline Datum CStringGetDatum(const char *X)
static inline Datum NumericGetDatum(Numeric X)
Most are marked "const". These all potentially have the same problem,
but I think for these it is a good assumption that resulting Datum will
not be used to modify *X, so we can leave them alone. I guess we didn't
do that for NumericGetDatum just because the Numeric typedef doesn't
allow that.
There's also:
static inline Datum fetch_att(const void *T, bool attbyval, int attlen)
The "const" seems reasonable on that too.
This is an interesting case:
static inline Datum
EOHPGetRWDatum(const struct ExpandedObjectHeader *eohptr)
{
return PointerGetDatum(eohptr->eoh_rw_ptr);
}
That RW stands for read/write, which sounds alarming. But the returned
datum points to eohptr->eoh_rw_ptr rather than *eohptr itself, so I
think the 'const' is correct here after all.
So, pushed a commit that changes just PointerGetDatum() itself, leaving
all those others alone.
- Heikki
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-15 11:06 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-15 19:12 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-15 21:25 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-16 08:45 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-16 09:49 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-16 14:37 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-16 17:30 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-16 17:47 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-17 19:21 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
@ 2026-04-22 21:11 ` Heikki Linnakangas <[email protected]>
0 siblings, 0 replies; 31+ messages in thread
From: Heikki Linnakangas @ 2026-04-22 21:11 UTC (permalink / raw)
To: Tom Lane <[email protected]>; Peter Eisentraut <[email protected]>; pgsql-hackers; David Geier <[email protected]>
On 17/04/2026 22:21, Heikki Linnakangas wrote:
> On 16/04/2026 20:47, Tom Lane wrote:
>> Heikki Linnakangas <[email protected]> writes:
>>> On 16/04/2026 17:37, Tom Lane wrote:
>>>> Not excited about making massive changes for this.
>>
>>> Having all three would be a very localized change in postgres.h.
>>
>> Sure, but *using* them in a consistent way would be invasive.
>>
>>>> I remain far less certain than Peter is that this discussion has
>>>> anything to do with why Coverity is complaining about
>>>> ginExtractEntries. I still think we should make some minimum-effort
>>>> change to see if the complaint goes away before expending a lot of
>>>> brain cells on choosing a final fix.
>>
>>> I think I'm going to commit my proposal to turn PointerGetDatum() back
>>> into a macro, and see if that makes Coverity happy. Then we'll know, and
>>> we can decide on the next steps. Any objections?
>>
>> WFM.
>
> ...
>
> So, pushed a commit that changes just PointerGetDatum() itself, leaving
> all those others alone.
As we thought, this made the Coverity warning go away.
I'm happy with the status quo in master, but if we want to introduce new
ConstPointerGetDatum() or NonConstPointerGetDatum() variants instead of
the macro, now is the time to do it.
For backbranches, IMHO we should go with the macro. It's a little scary
to replace such a widely used function as PointerGetDatum() in
back-branches, but I do think this should be fixed. Introducing new
variants doesn't seems even less backpatchable.
- Heikki
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-13 16:14 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-15 11:06 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-15 19:12 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
2026-04-15 21:25 ` Re: Reduce build times of pg_trgm GIN indexes Tom Lane <[email protected]>
2026-04-16 08:45 ` Re: Reduce build times of pg_trgm GIN indexes Peter Eisentraut <[email protected]>
@ 2026-04-16 14:43 ` Andres Freund <[email protected]>
1 sibling, 0 replies; 31+ messages in thread
From: Andres Freund @ 2026-04-16 14:43 UTC (permalink / raw)
To: [email protected]; Peter Eisentraut <[email protected]>; Tom Lane <[email protected]>; +Cc: Heikki Linnakangas <[email protected]>; David Geier <[email protected]>; Matthias van de Meent <[email protected]>; pgsql-hackers
Hi,
On April 16, 2026 4:45:55 AM EDT, Peter Eisentraut <[email protected]> wrote:
>On 15.04.26 23:25, Tom Lane wrote:
>> Peter Eisentraut <[email protected]> writes:
>>> On 15.04.26 13:06, Heikki Linnakangas wrote:
>>>> This was briefly discussed when PointerGetDatum() was changed from a
>>>> macro to a static inline function [1]. On that email, Peter pointed out
>>>> that the compiler was doing the same deduction that Coverity did now,
>>>> i.e. that if you pass the Datum returned by PointerGetDatum(&foo) to a
>>>> function, it cannot change *foo. I'm surprised we dismissed that worry
>>>> so quickly. If the compiler optimizes based on that assumption, you can
>>>> get incorrect code.
>>
>>> I don't think this is in evidence. AFAICT, it's just Coverity that is
>>> complaining here, which is its right, but the code is not incorrect.
>>
>> Are you sure? This seems like the sort of thing that will bite us on
>> the rear sometime in the future, as the compiler geeks put in more and
>> more aggressive optimizations.
>>
>> I think we should at least test the theory that changing
>> PointerGetDatum to remove the const cast would silence Coverity's
>> complaint. If it does not then we're attributing too much
>> intelligence to Coverity. But if it does, then we've correctly
>> identified why it's complaining, and we should take seriously the
>> idea that they aren't the only ones making this sort of deduction
>> (or won't be for long).
>
>I think it's quite clear to me that Coverity is complaining about this correctly, in its view of the world. Compilers sometimes complain about this, too, although in this case they apparently don't look quite as deeply to do this analysis.
>
>What I'm missing here is, essentially where the previous thread stopped: What is the overall message that we want to communicate with the API?
>
>If the default assumption is that what pointers converted to Datums point to should not be modified on the other side (where the Datum is converted back to a pointer), then the current declaration of PointerGetDatum() is suitable, and the GIN code can be considered an exception and we make a special API for that. The previous thread proposed NonconstPointerGetDatum().
To me it seems way way way to dangerous to just redefine what PointerGetDatum() means for all existing callers, without doing an exhaustive verification of all the callers.
Separately from that, it doesn't seem defensible to take a const pointer and return a non const one. Why is that sane?
Greetings,
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
@ 2026-04-13 15:06 ` David Geier <[email protected]>
2026-04-14 14:24 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
3 siblings, 1 reply; 31+ messages in thread
From: David Geier @ 2026-04-13 15:06 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; Matthias van de Meent <[email protected]>; +Cc: pgsql-hackers
Hi Heikki!
> Pushed 0001 as commit 6f5ad00ab7.
>
> I squashed 0002 and 0004 into one commit, and did some more refactoring:
> I created a trigram_qsort() helper function that calls the signed or
> unsigned variant, so that that logic doesn't need to be duplicated in
> the callers. For symmetry, I also added a trigram_qunique() helper
> function which just calls qunique() with the new, faster CMPTRGM_EQ
> comparator. Pushed these as commit 9f3755ea07.
Thanks for committing these patches.
--
David Geier
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-13 15:06 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
@ 2026-04-14 14:24 ` David Geier <[email protected]>
2026-04-22 21:26 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
0 siblings, 1 reply; 31+ messages in thread
From: David Geier @ 2026-04-14 14:24 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; Matthias van de Meent <[email protected]>; +Cc: pgsql-hackers
On 13.04.2026 17:06, David Geier wrote:
>> I squashed 0002 and 0004 into one commit, and did some more refactoring:
>> I created a trigram_qsort() helper function that calls the signed or
>> unsigned variant, so that that logic doesn't need to be duplicated in
>> the callers. For symmetry, I also added a trigram_qunique() helper
>> function which just calls qunique() with the new, faster CMPTRGM_EQ
>> comparator. Pushed these as commit 9f3755ea07.
>
> Thanks for committing these patches.
Attached are the remaining patches (previously 0003 and 0005) rebased on
latest master. Currently, there's no radix sort variant for the unsigned
char case. Do we care about this case or is it fine if that case runs
slower?
The following perf profiles show that trigram_qsort() goes from ~34%
down to ~7% with the radix sort optimization. The optimized run also
includes the btint4cmp() optimization. Without that the result would be
even better.
With that change we could move on and tackle optimizing
1. 41.52% generate_trgm_only() by e.g. using an ASCII fast-patch
2. 32.72% ginInsertBAEntries() by no longer using the RB-tree but
e.g. also the radix sort
master
- heapam_index_build_range_scan
- 99.40% ginBuildCallback
- ginHeapTupleBulkInsert
- 66.55% ginExtractEntries
- 65.29% FunctionCall3Coll
- gin_extract_value_trgm
- 62.80% generate_trgm
+ 34.33% trigram_qsort (inlined)
+ 26.20% generate_trgm_only
+ 2.23% trigram_qunique (inlined)
+ 1.74% detoast_attr
+ 1.19% qsort_arg_entries
+ 32.72% ginInsertBAEntries
patched
- heapam_index_build_range_scan
- 99.42% ginBuildCallback
- 95.95% ginHeapTupleBulkInsert
- 59.11% ginExtractEntries
- 56.93% FunctionCall3Coll
- gin_extract_value_trgm
- 52.19% generate_trgm
+ 41.52% generate_trgm_only
+ 7.14% trigram_qsort (inlined)
+ 3.53% trigram_qunique (inlined)
+ 4.08% detoast_attr
+ 2.13% qsort_arg_entries
+ 36.78% ginInsertBAEntries
--
David Geier
Attachments:
[text/x-patch] v6-0002-Optimize-generate_trgm-with-radix-sort.patch (2.2K, 2-v6-0002-Optimize-generate_trgm-with-radix-sort.patch)
download | inline diff:
From b20910a814e8c8b64c3d74841fe27a7c12dd927a Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Tue, 11 Nov 2025 13:18:59 +0100
Subject: [PATCH v6 2/2] Optimize generate_trgm() with radix sort
---
contrib/pg_trgm/trgm_op.c | 59 +++++++++++++++++++++++++++++++++------
1 file changed, 51 insertions(+), 8 deletions(-)
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 0aca9b5826f..f2dbe88ece0 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -226,13 +226,56 @@ CMPTRGM_CHOOSE(const void *a, const void *b)
return CMPTRGM(a, b);
}
-#define ST_SORT trigram_qsort_signed
-#define ST_ELEMENT_TYPE_VOID
-#define ST_COMPARE(a, b) CMPTRGM_SIGNED(a, b)
-#define ST_SCOPE static
-#define ST_DEFINE
-#define ST_DECLARE
-#include "lib/sort_template.h"
+/*
+ * Needed to properly handle negative numbers in case char is signed.
+ */
+static inline unsigned char FlipSign(char x)
+{
+ return x^0x80;
+}
+
+static void radix_sort_trigrams_signed(trgm *trg, int count)
+{
+ trgm *buffer = palloc_array(trgm, count);
+ trgm *starts[256];
+ trgm *from = trg;
+ trgm *to = buffer;
+ int freqs[3][256];
+
+ /*
+ * Compute frequencies to partition the buffer.
+ */
+ memset(freqs, 0, sizeof(freqs));
+
+ for (int i=0; i<count; i++)
+ for (int j=0; j<3; j++)
+ freqs[j][FlipSign(trg[i][j])]++;
+
+ /*
+ * Do the sorting. Start with last character because that's the is "LSB"
+ * in a trigram. Avoid unnecessary copies by ping-ponging between the buffers.
+ */
+ for (int i=2; i>=0; i--)
+ {
+ trgm *old_from = from;
+ trgm *next = to;
+
+ for (int j=0; j<256; j++)
+ {
+ starts[j] = next;
+ next += freqs[i][j];
+ }
+
+ for (int j=0; j<count; j++)
+ memcpy(starts[FlipSign(from[j][i])]++, from[j], sizeof(trgm));
+
+ from = to;
+ to = old_from;
+ }
+
+ memcpy(trg, buffer, sizeof(trgm) * count);
+ pfree(buffer);
+}
#define ST_SORT trigram_qsort_unsigned
#define ST_ELEMENT_TYPE_VOID
@@ -247,7 +290,7 @@ static void
trigram_qsort(trgm *array, size_t n)
{
if (GetDefaultCharSignedness())
- trigram_qsort_signed(array, n, sizeof(trgm));
+ radix_sort_trigrams_signed(array, n);
else
trigram_qsort_unsigned(array, n, sizeof(trgm));
}
--
2.51.0
[text/x-patch] v6-0001-Make-btint4cmp-branchless.patch (1.0K, 3-v6-0001-Make-btint4cmp-branchless.patch)
download | inline diff:
From 0f39f8618c784299b30566a0bf68beef8b2b8aef Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Mon, 10 Nov 2025 15:40:11 +0100
Subject: [PATCH v6 1/2] Make btint4cmp() branchless
---
src/backend/access/nbtree/nbtcompare.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c
index 1d343377e98..ac16e3d993d 100644
--- a/src/backend/access/nbtree/nbtcompare.c
+++ b/src/backend/access/nbtree/nbtcompare.c
@@ -61,6 +61,7 @@
#include "utils/fmgrprotos.h"
#include "utils/skipsupport.h"
#include "utils/sortsupport.h"
+#include "common/int.h"
#ifdef STRESS_SORT_INT_MIN
#define A_LESS_THAN_B INT_MIN
@@ -203,12 +204,7 @@ btint4cmp(PG_FUNCTION_ARGS)
int32 a = PG_GETARG_INT32(0);
int32 b = PG_GETARG_INT32(1);
- if (a > b)
- PG_RETURN_INT32(A_GREATER_THAN_B);
- else if (a == b)
- PG_RETURN_INT32(0);
- else
- PG_RETURN_INT32(A_LESS_THAN_B);
+ PG_RETURN_INT32(pg_cmp_s32(a, b));
}
Datum
--
2.51.0
^ permalink raw reply [nested|flat] 31+ messages in thread
* Re: Reduce build times of pg_trgm GIN indexes
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-03-02 12:17 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-07 11:27 ` Re: Reduce build times of pg_trgm GIN indexes Heikki Linnakangas <[email protected]>
2026-04-13 15:06 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
2026-04-14 14:24 ` Re: Reduce build times of pg_trgm GIN indexes David Geier <[email protected]>
@ 2026-04-22 21:26 ` David Geier <[email protected]>
0 siblings, 0 replies; 31+ messages in thread
From: David Geier @ 2026-04-22 21:26 UTC (permalink / raw)
To: Heikki Linnakangas <[email protected]>; Matthias van de Meent <[email protected]>; +Cc: pgsql-hackers
On 14.04.2026 16:24, David Geier wrote:
> Attached are the remaining patches (previously 0003 and 0005) rebased on
> latest master. Currently, there's no radix sort variant for the unsigned
> char case. Do we care about this case or is it fine if that case runs
> slower?
>
> The following perf profiles show that trigram_qsort() goes from ~34%
> down to ~7% with the radix sort optimization. The optimized run also
> includes the btint4cmp() optimization. Without that the result would be
> even better.
>
> With that change we could move on and tackle optimizing
>
> 1. 41.52% generate_trgm_only() by e.g. using an ASCII fast-patch
> 2. 32.72% ginInsertBAEntries() by no longer using the RB-tree but
> e.g. also the radix sort
Attached is the rebased patch set as well as a new patch that optimizes
ginInsertBAEntries(). Performance improvements are as follows, measured
with the same benchmark I used in the first mail of this thread.
Runtimes and deltas are in milliseconds.
Code | movies | delta | lineitem | delta
-----------------------------------|--------|--------|------------------
master | 11,160 | - | 248,146 | -
v7-0001-Make-btint4cmp-branchless | 9,509 | 1,651 | 236,760 | 11,386
v7-0002-Use-radix-sort | 6,123 | 3,386 | 214,632 | 22,128
v7-0003-Replace-RB-tree | 4,755 | 1,368 | 144,252 | 70,380
I optimized ginInsertBAEntries() by replacing the red-black tree by a
hash map (simplehash.h) on the keys (e.g. trigrams), where each hash map
entry contains an item pointer list with the row TIDs the key occurs in.
This is in the spirit of the original red-black tree implementation but
this way the deduplication of each key is handled in O(1), instead of
O(log(num_unique_keys)). This reduces the overall runtime complexity from
O(num_total_keys * log(num_unique_keys))
->
O(num_total_keys + num_unique_keys * log(num_unique_keys))
As there are normally a lot less keys than rows, this is complexity-wise
preferable. And even if all keys are unique, the rebalancing operations
are pretty expensive and in reality outweigh the slightly better
complexity in the worst-case.
- The sorting of the item pointer lists for each key stays as is, except
for that I switched to sort_template.h.
- The red-black tree code in rbtree.c/.h is now completely unused and
could be removed. Thoughts?
- The size on disk changed very slightly. I think this is because the
memory accounting is slightly different and with that slightly more or
less keys / item pointers can be processed in one go.
- FWICS, we can use datumIsEqual() and datum_image_hash() in the hash
map because as of today, the GIN index code doesn't work properly with
non-deterministic collations / non-image-equal types, e.g. see [1].
- I also simplified ginInsertBAEntries(). Previously, it used an
insertion order that minimizes the number of RB-tree rebalancing
operations for sorted inputs. With the hash map approach, it's better
to insert in sort order as this increases the likelihood of hitting
the same hash map entry again for equal keys.
- This patch set also improves parallel GIN index builds as the parallel
code builds on top of ginInsertBAEntries().
- Regression tests pass and gin_index_check() from amcheck couldn't find
an error in the data structures.
To be fair: 0001 is less interesting after removing the RB-tree because
a lot less key comparisons are being performed. I still think it makes
sense to merge it as it should also accelerate e.g. B-tree traversals.
With these changes the number one bottleneck becomes the trigram
generation in generate_trgm_only(). The following perf profile nicely
shows that:
- 99.89% 0.00% postgres postgres [.] ginbuild
ginbuild
table_index_build_scan (inlined)
- heapam_index_build_range_scan
- 94.05% ginBuildCallback
- 86.48% ginHeapTupleBulkInsert
- 68.57% ginExtractEntries
- 64.69% FunctionCall3Coll
- gin_extract_value_trgm
- 62.75% generate_trgm
- 39.57% generate_trgm_only
+ 18.53% str_tolower
+ 16.91% find_word (inlined)
+ 1.36% make_trigrams (inlined)
0.66% __strlen_avx2
+ 18.49% trigram_qsort (inlined)
+ 4.17% trigram_qunique (inlined)
0.79% trgm2int
+ 3.33% qsort_arg_entries
- 17.28% ginInsertBAEntries
- ginInsertBAEntry (inlined)
- 8.04% ginbuild_insert (inlined)
- 4.89% ginbuild_insert_hash_internal (inlined)
1.25% gin_equal_key (inlined)
0.68% ginbuild_initial_bucket (inlined)
+ 3.14% gin_hash_key (inlined)
+ 2.45% AllocSetRealloc
+ 0.85% asm_exc_page_fault
0.52% getDatumCopy (inlined)
+ 6.00% ginEntryInsert
+ 1.19% ginBeginBAScan
+ 2.99% heap_getnext
[1]
https://www.postgresql.org/message-id/flat/8ef4899c4acfebca45cc6c042a6dc611d25ffab1.camel%40cybertec...
--
David Geier
Attachments:
[text/x-patch] v7-0003-Replace-RB-tree-with-hash-map-and-sort-in-GIN-ind.patch (14.6K, 2-v7-0003-Replace-RB-tree-with-hash-map-and-sort-in-GIN-ind.patch)
download | inline diff:
From c3fb626cae56bcb620af02be55dd18ccab97da72 Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Wed, 22 Apr 2026 14:00:40 +0200
Subject: [PATCH v7 3/3] Replace RB-tree with hash map and sort in GIN index
---
src/backend/access/gin/ginbulk.c | 343 +++++++++++++++----------------
src/include/access/gin_private.h | 24 +--
2 files changed, 172 insertions(+), 195 deletions(-)
diff --git a/src/backend/access/gin/ginbulk.c b/src/backend/access/gin/ginbulk.c
index 85865b39105..e2d23786225 100644
--- a/src/backend/access/gin/ginbulk.c
+++ b/src/backend/access/gin/ginbulk.c
@@ -17,107 +17,118 @@
#include <limits.h>
#include "access/gin_private.h"
+#include "common/hashfn.h"
#include "utils/datum.h"
#include "utils/memutils.h"
+#define DEF_NENTRY 2048 /* Initial hash table size */
+#define DEF_ITEMS_PER_KEY 8 /* Initial ItemPointer array size per key */
-#define DEF_NENTRY 2048 /* GinEntryAccumulator allocation quantum */
-#define DEF_NPTR 5 /* ItemPointer initial allocation quantum */
-
-
-/* Combiner function for rbtree.c */
-static void
-ginCombineData(RBTNode *existing, const RBTNode *newdata, void *arg)
+typedef struct GinHashKey
{
- GinEntryAccumulator *eo = (GinEntryAccumulator *) existing;
- const GinEntryAccumulator *en = (const GinEntryAccumulator *) newdata;
- BuildAccumulator *accum = (BuildAccumulator *) arg;
+ OffsetNumber attnum;
+ GinNullCategory category;
+ Datum key;
+} GinHashKey;
- /*
- * Note this code assumes that newdata contains only one itempointer.
- */
- if (eo->count >= eo->maxcount)
- {
- if (eo->maxcount > INT_MAX)
- ereport(ERROR,
- (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
- errmsg("posting list is too long"),
- errhint("Reduce \"maintenance_work_mem\".")));
+typedef struct GinHashEntry
+{
+ GinHashKey hashkey;
+ uint32 hash;
+ char status;
+ ItemPointerData * items;
+ uint32 numItems;
+ uint32 allocatedItems;
+} GinHashEntry;
+
+typedef struct GinSortEntry
+{
+ GinHashKey hashkey;
+ ItemPointerData * items;
+ uint32 numItems;
+} GinSortEntry;
+
+static uint32 gin_hash_key(struct ginbuild_hash *tb, GinHashKey *key);
+static bool gin_equal_key(struct ginbuild_hash *tb, GinHashKey *a, GinHashKey *b);
+
+#define SH_PREFIX ginbuild
+#define SH_ELEMENT_TYPE GinHashEntry
+#define SH_KEY_TYPE GinHashKey
+#define SH_KEY hashkey
+#define SH_HASH_KEY(tb, key) gin_hash_key(tb, &key)
+#define SH_EQUAL(tb, a, b) gin_equal_key(tb, &a, &b)
+#define SH_SCOPE static inline
+#define SH_STORE_HASH
+#define SH_GET_HASH(tb, a) (a)->hash
+#define SH_DEFINE
+#define SH_DECLARE
+#include "lib/simplehash.h"
+
+static uint32
+gin_hash_key(struct ginbuild_hash *tb, GinHashKey *key)
+{
+ BuildAccumulator *accum = (BuildAccumulator *) tb->private_data;
+ uint32 hash;
- accum->allocatedMemory -= GetMemoryChunkSpace(eo->list);
- eo->maxcount *= 2;
- eo->list = (ItemPointerData *)
- repalloc_huge(eo->list, sizeof(ItemPointerData) * eo->maxcount);
- accum->allocatedMemory += GetMemoryChunkSpace(eo->list);
- }
+ hash = hash_combine(0, murmurhash32((uint32) key->attnum));
+ hash = hash_combine(hash, murmurhash32((uint32) key->category));
- /* If item pointers are not ordered, they will need to be sorted later */
- if (eo->shouldSort == false)
+ if (key->category == GIN_CAT_NORM_KEY)
{
- int res;
-
- res = ginCompareItemPointers(eo->list + eo->count - 1, en->list);
- Assert(res != 0);
+ CompactAttribute *att;
- if (res > 0)
- eo->shouldSort = true;
+ att = TupleDescCompactAttr(accum->ginstate->origTupdesc, key->attnum - 1);
+ hash = hash_combine(hash, datum_image_hash(key->key, att->attbyval, att->attlen));
}
- eo->list[eo->count] = en->list[0];
- eo->count++;
+ return hash;
}
-/* Comparator function for rbtree.c */
-static int
-cmpEntryAccumulator(const RBTNode *a, const RBTNode *b, void *arg)
+static bool
+gin_equal_key(struct ginbuild_hash *tb, GinHashKey *a, GinHashKey *b)
{
- const GinEntryAccumulator *ea = (const GinEntryAccumulator *) a;
- const GinEntryAccumulator *eb = (const GinEntryAccumulator *) b;
- BuildAccumulator *accum = (BuildAccumulator *) arg;
-
- return ginCompareAttEntries(accum->ginstate,
- ea->attnum, ea->key, ea->category,
- eb->attnum, eb->key, eb->category);
-}
+ BuildAccumulator *accum = (BuildAccumulator *) tb->private_data;
+ CompactAttribute *att;
-/* Allocator function for rbtree.c */
-static RBTNode *
-ginAllocEntryAccumulator(void *arg)
-{
- BuildAccumulator *accum = (BuildAccumulator *) arg;
- GinEntryAccumulator *ea;
+ if (a->attnum != b->attnum)
+ return false;
+ if (a->category != b->category)
+ return false;
+ if (a->category != GIN_CAT_NORM_KEY)
+ return true;
/*
- * Allocate memory by rather big chunks to decrease overhead. We have no
- * need to reclaim RBTNodes individually, so this costs nothing.
+ * Compare the actual key values using image equality.
+ * This is correct because we don't want to deduplicate at this point.
*/
- if (accum->entryallocator == NULL || accum->eas_used >= DEF_NENTRY)
- {
- accum->entryallocator = palloc_array(GinEntryAccumulator, DEF_NENTRY);
- accum->allocatedMemory += GetMemoryChunkSpace(accum->entryallocator);
- accum->eas_used = 0;
- }
-
- /* Allocate new RBTNode from current chunk */
- ea = accum->entryallocator + accum->eas_used;
- accum->eas_used++;
-
- return (RBTNode *) ea;
+ att = TupleDescCompactAttr(accum->ginstate->origTupdesc, a->attnum - 1);
+ return datumIsEqual(a->key, b->key, att->attbyval, att->attlen);
}
+#define ST_SORT sort_itempointers
+#define ST_ELEMENT_TYPE ItemPointerData
+#define ST_COMPARE(a, b) ginCompareItemPointers(a, b)
+#define ST_SCOPE static
+#define ST_DEFINE
+#include "lib/sort_template.h"
+
+#define ST_SORT sort_keys
+#define ST_ELEMENT_TYPE GinSortEntry
+#define ST_COMPARE_ARG_TYPE GinState
+#define ST_COMPARE(a, b, state) ginCompareAttEntries(state, a->hashkey.attnum, a->hashkey.key, a->hashkey.category, b->hashkey.attnum, b->hashkey.key, b->hashkey.category)
+#define ST_SCOPE static
+#define ST_DEFINE
+#include "lib/sort_template.h"
+
void
ginInitBA(BuildAccumulator *accum)
{
/* accum->ginstate is intentionally not set here */
- accum->allocatedMemory = 0;
- accum->entryallocator = NULL;
- accum->eas_used = 0;
- accum->tree = rbt_create(sizeof(GinEntryAccumulator),
- cmpEntryAccumulator,
- ginCombineData,
- ginAllocEntryAccumulator,
- NULL, /* no freefunc needed */
- accum);
+ accum->hash = ginbuild_create(CurrentMemoryContext, DEF_NENTRY, accum);
+ accum->allocatedMemory = accum->hash->size * sizeof(GinHashEntry);
+ accum->sorted_entries = NULL;
+ accum->num_entries = 0;
+ accum->current_pos = 0;
}
/*
@@ -142,124 +153,109 @@ getDatumCopy(BuildAccumulator *accum, OffsetNumber attnum, Datum value)
}
/*
- * Find/store one entry from indexed value.
+ * Insert one entry into the hash map.
+ * If the key already exists, append to its ItemPointer array.
+ * Otherwise, create a new hash entry with a new ItemPointer array.
*/
static void
ginInsertBAEntry(BuildAccumulator *accum,
ItemPointer heapptr, OffsetNumber attnum,
Datum key, GinNullCategory category)
{
- GinEntryAccumulator eatmp;
- GinEntryAccumulator *ea;
- bool isNew;
-
- /*
- * For the moment, fill only the fields of eatmp that will be looked at by
- * cmpEntryAccumulator or ginCombineData.
- */
- eatmp.attnum = attnum;
- eatmp.key = key;
- eatmp.category = category;
- /* temporarily set up single-entry itempointer list */
- eatmp.list = heapptr;
+ GinHashKey hashkey;
+ GinHashEntry *entry;
+ bool found;
+ uint64 oldsize;
+
+ hashkey.attnum = attnum;
+ hashkey.category = category;
+ if (category == GIN_CAT_NORM_KEY)
+ hashkey.key = getDatumCopy(accum, attnum, key);
+ else
+ hashkey.key = key;
- ea = (GinEntryAccumulator *) rbt_insert(accum->tree, (RBTNode *) &eatmp,
- &isNew);
+ oldsize = accum->hash->size;
+ entry = ginbuild_insert(accum->hash, hashkey, &found);
- if (isNew)
+ if (!found)
{
- /*
- * Finish initializing new tree entry, including making permanent
- * copies of the datum (if it's not null) and itempointer.
- */
- if (category == GIN_CAT_NORM_KEY)
- ea->key = getDatumCopy(accum, attnum, key);
- ea->maxcount = DEF_NPTR;
- ea->count = 1;
- ea->shouldSort = false;
- ea->list = palloc_array(ItemPointerData, DEF_NPTR);
- ea->list[0] = *heapptr;
- accum->allocatedMemory += GetMemoryChunkSpace(ea->list);
+ entry->items = palloc_array(ItemPointerData, DEF_ITEMS_PER_KEY);
+ entry->numItems = 0;
+ entry->allocatedItems = DEF_ITEMS_PER_KEY;
+ accum->allocatedMemory += (accum->hash->size - oldsize) * sizeof(GinHashEntry);
+ accum->allocatedMemory += GetMemoryChunkSpace(entry->items);
}
- else
+
+ if (entry->numItems >= entry->allocatedItems)
{
- /*
- * ginCombineData did everything needed.
- */
+ uint32 new_allocated;
+
+ if (entry->allocatedItems > UINT32_MAX / 2)
+ ereport(ERROR,
+ (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("too many GIN item pointers for a single key"),
+ errhint("Reduce \"maintenance_work_mem\".")));
+
+ accum->allocatedMemory -= GetMemoryChunkSpace(entry->items);
+ new_allocated = entry->allocatedItems * 2;
+ entry->items = repalloc_huge(entry->items, sizeof(ItemPointerData) * new_allocated);
+ entry->allocatedItems = new_allocated;
+ accum->allocatedMemory += GetMemoryChunkSpace(entry->items);
}
+
+ entry->items[entry->numItems++] = *heapptr;
}
-/*
- * Insert the entries for one heap pointer.
- *
- * Since the entries are being inserted into a balanced binary tree, you
- * might think that the order of insertion wouldn't be critical, but it turns
- * out that inserting the entries in sorted order results in a lot of
- * rebalancing operations and is slow. To prevent this, we attempt to insert
- * the nodes in an order that will produce a nearly-balanced tree if the input
- * is in fact sorted.
- *
- * We do this as follows. First, we imagine that we have an array whose size
- * is the smallest power of two greater than or equal to the actual array
- * size. Second, we insert the middle entry of our virtual array into the
- * tree; then, we insert the middles of each half of our virtual array, then
- * middles of quarters, etc.
- */
void
ginInsertBAEntries(BuildAccumulator *accum,
ItemPointer heapptr, OffsetNumber attnum,
Datum *entries, GinNullCategory *categories,
int32 nentries)
{
- uint32 step = nentries;
-
if (nentries <= 0)
return;
Assert(ItemPointerIsValid(heapptr) && attnum >= FirstOffsetNumber);
- /*
- * step will contain largest power of 2 and <= nentries
- */
- step |= (step >> 1);
- step |= (step >> 2);
- step |= (step >> 4);
- step |= (step >> 8);
- step |= (step >> 16);
- step >>= 1;
- step++;
-
- while (step > 0)
- {
- int i;
-
- for (i = step - 1; i < nentries && i >= 0; i += step << 1 /* *2 */ )
- ginInsertBAEntry(accum, heapptr, attnum,
- entries[i], categories[i]);
-
- step >>= 1; /* /2 */
- }
+ for (int i = 0; i < nentries; i++)
+ ginInsertBAEntry(accum, heapptr, attnum, entries[i], categories[i]);
}
-static int
-qsortCompareItemPointers(const void *a, const void *b)
-{
- int res = ginCompareItemPointers((const ItemPointerData *) a, (const ItemPointerData *) b);
-
- /* Assert that there are no equal item pointers being sorted */
- Assert(res != 0);
- return res;
-}
-
-/* Prepare to read out the rbtree contents using ginGetBAEntry */
+/* Prepare to read out the hash table contents using ginGetBAEntry */
void
ginBeginBAScan(BuildAccumulator *accum)
{
- rbt_begin_iterate(accum->tree, LeftRightWalk, &accum->tree_walk);
+ ginbuild_iterator iter;
+ GinHashEntry *entry;
+ uint32 i = 0;
+
+ accum->num_entries = accum->hash->members;
+ accum->current_pos = 0;
+
+ if (accum->num_entries == 0)
+ return;
+
+ accum->sorted_entries = palloc_array(GinSortEntry, accum->num_entries);
+ ginbuild_start_iterate(accum->hash, &iter);
+
+ while ((entry = ginbuild_iterate(accum->hash, &iter)) != NULL)
+ {
+ GinSortEntry *se = &accum->sorted_entries[i];
+ sort_itempointers(entry->items, entry->numItems);
+
+ se->hashkey = entry->hashkey;
+ se->items = entry->items;
+ se->numItems = entry->numItems;
+ i++;
+ }
+
+ Assert(i == accum->num_entries);
+ sort_keys(accum->sorted_entries, accum->num_entries, accum->ginstate);
+ accum->current_pos = 0;
}
/*
- * Get the next entry in sequence from the BuildAccumulator's rbtree.
+ * Get the next entry in sequence from the BuildAccumulator's sorted hash entries.
* This consists of a single key datum and a list (array) of one or more
* heap TIDs in which that key is found. The list is guaranteed sorted.
*/
@@ -268,25 +264,18 @@ ginGetBAEntry(BuildAccumulator *accum,
OffsetNumber *attnum, Datum *key, GinNullCategory *category,
uint32 *n)
{
- GinEntryAccumulator *entry;
- ItemPointerData *list;
-
- entry = (GinEntryAccumulator *) rbt_iterate(&accum->tree_walk);
+ GinSortEntry *entry;
- if (entry == NULL)
+ if (accum->current_pos >= accum->num_entries)
return NULL; /* no more entries */
- *attnum = entry->attnum;
- *key = entry->key;
- *category = entry->category;
- list = entry->list;
- *n = entry->count;
-
- Assert(list != NULL && entry->count > 0);
+ entry = &accum->sorted_entries[accum->current_pos];
+ accum->current_pos++;
- if (entry->shouldSort && entry->count > 1)
- qsort(list, entry->count, sizeof(ItemPointerData),
- qsortCompareItemPointers);
+ *attnum = entry->hashkey.attnum;
+ *key = entry->hashkey.key;
+ *category = entry->hashkey.category;
+ *n = entry->numItems;
- return list;
+ return entry->items;
}
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index 6725ee2839f..39c015b1355 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -18,7 +18,6 @@
#include "common/int.h"
#include "catalog/pg_am_d.h"
#include "fmgr.h"
-#include "lib/rbtree.h"
#include "nodes/tidbitmap.h"
#include "storage/bufmgr.h"
@@ -420,26 +419,15 @@ extern void ginadjustmembers(Oid opfamilyoid,
List *functions);
/* ginbulk.c */
-typedef struct GinEntryAccumulator
-{
- RBTNode rbtnode;
- Datum key;
- GinNullCategory category;
- OffsetNumber attnum;
- bool shouldSort;
- ItemPointerData *list;
- uint32 maxcount; /* allocated size of list[] */
- uint32 count; /* current number of list[] entries */
-} GinEntryAccumulator;
typedef struct
{
- GinState *ginstate;
- Size allocatedMemory;
- GinEntryAccumulator *entryallocator;
- uint32 eas_used;
- RBTree *tree;
- RBTreeIterator tree_walk;
+ GinState * ginstate;
+ Size allocatedMemory;
+ struct ginbuild_hash * hash;
+ struct GinSortEntry * sorted_entries;
+ uint32 num_entries;
+ uint32 current_pos;
} BuildAccumulator;
extern void ginInitBA(BuildAccumulator *accum);
--
2.51.0
[text/x-patch] v7-0002-Optimize-generate_trgm-with-radix-sort.patch (2.2K, 3-v7-0002-Optimize-generate_trgm-with-radix-sort.patch)
download | inline diff:
From eb8f6c90152ce976161b04c9d540f338a0640b2a Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Tue, 11 Nov 2025 13:18:59 +0100
Subject: [PATCH v7 2/3] Optimize generate_trgm() with radix sort
---
contrib/pg_trgm/trgm_op.c | 59 +++++++++++++++++++++++++++++++++------
1 file changed, 51 insertions(+), 8 deletions(-)
diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index 3fd087852e7..f5dc5242c7d 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -226,13 +226,56 @@ CMPTRGM_CHOOSE(const void *a, const void *b)
return CMPTRGM(a, b);
}
-#define ST_SORT trigram_qsort_signed
-#define ST_ELEMENT_TYPE_VOID
-#define ST_COMPARE(a, b) CMPTRGM_SIGNED(a, b)
-#define ST_SCOPE static
-#define ST_DEFINE
-#define ST_DECLARE
-#include "lib/sort_template.h"
+/*
+ * Needed to properly handle negative numbers in case char is signed.
+ */
+static inline unsigned char FlipSign(char x)
+{
+ return x^0x80;
+}
+
+static void radix_sort_trigrams_signed(trgm *trg, int count)
+{
+ trgm *buffer = palloc_array(trgm, count);
+ trgm *starts[256];
+ trgm *from = trg;
+ trgm *to = buffer;
+ int freqs[3][256];
+
+ /*
+ * Compute frequencies to partition the buffer.
+ */
+ memset(freqs, 0, sizeof(freqs));
+
+ for (int i=0; i<count; i++)
+ for (int j=0; j<3; j++)
+ freqs[j][FlipSign(trg[i][j])]++;
+
+ /*
+ * Do the sorting. Start with last character because that's the is "LSB"
+ * in a trigram. Avoid unnecessary copies by ping-ponging between the buffers.
+ */
+ for (int i=2; i>=0; i--)
+ {
+ trgm *old_from = from;
+ trgm *next = to;
+
+ for (int j=0; j<256; j++)
+ {
+ starts[j] = next;
+ next += freqs[i][j];
+ }
+
+ for (int j=0; j<count; j++)
+ memcpy(starts[FlipSign(from[j][i])]++, from[j], sizeof(trgm));
+
+ from = to;
+ to = old_from;
+ }
+
+ memcpy(trg, buffer, sizeof(trgm) * count);
+ pfree(buffer);
+}
#define ST_SORT trigram_qsort_unsigned
#define ST_ELEMENT_TYPE_VOID
@@ -247,7 +290,7 @@ static void
trigram_qsort(trgm *array, size_t n)
{
if (GetDefaultCharSignedness())
- trigram_qsort_signed(array, n, sizeof(trgm));
+ radix_sort_trigrams_signed(array, n);
else
trigram_qsort_unsigned(array, n, sizeof(trgm));
}
--
2.51.0
[text/x-patch] v7-0001-Make-btint4cmp-branchless.patch (1.0K, 4-v7-0001-Make-btint4cmp-branchless.patch)
download | inline diff:
From abeea158b0f58344abb085f914d5cc5f804395c8 Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Mon, 10 Nov 2025 15:40:11 +0100
Subject: [PATCH v7 1/3] Make btint4cmp() branchless
---
src/backend/access/nbtree/nbtcompare.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/src/backend/access/nbtree/nbtcompare.c b/src/backend/access/nbtree/nbtcompare.c
index 1d343377e98..ac16e3d993d 100644
--- a/src/backend/access/nbtree/nbtcompare.c
+++ b/src/backend/access/nbtree/nbtcompare.c
@@ -61,6 +61,7 @@
#include "utils/fmgrprotos.h"
#include "utils/skipsupport.h"
#include "utils/sortsupport.h"
+#include "common/int.h"
#ifdef STRESS_SORT_INT_MIN
#define A_LESS_THAN_B INT_MIN
@@ -203,12 +204,7 @@ btint4cmp(PG_FUNCTION_ARGS)
int32 a = PG_GETARG_INT32(0);
int32 b = PG_GETARG_INT32(1);
- if (a > b)
- PG_RETURN_INT32(A_GREATER_THAN_B);
- else if (a == b)
- PG_RETURN_INT32(0);
- else
- PG_RETURN_INT32(A_LESS_THAN_B);
+ PG_RETURN_INT32(pg_cmp_s32(a, b));
}
Datum
--
2.51.0
^ permalink raw reply [nested|flat] 31+ messages in thread
end of thread, other threads:[~2026-04-22 21:26 UTC | newest]
Thread overview: 31+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-01-21 20:50 Re: Reduce build times of pg_trgm GIN indexes Matthias van de Meent <[email protected]>
2026-01-23 10:18 ` David Geier <[email protected]>
2026-03-02 12:17 ` David Geier <[email protected]>
2026-04-07 11:27 ` Heikki Linnakangas <[email protected]>
2026-04-08 02:15 ` John Naylor <[email protected]>
2026-04-13 15:05 ` David Geier <[email protected]>
2026-04-14 13:05 ` David Geier <[email protected]>
2026-04-09 11:28 ` Bertrand Drouvot <[email protected]>
2026-04-13 09:41 ` Peter Eisentraut <[email protected]>
2026-04-13 11:04 ` Bertrand Drouvot <[email protected]>
2026-04-13 15:03 ` David Geier <[email protected]>
2026-04-13 17:15 ` Bertrand Drouvot <[email protected]>
2026-04-14 09:22 ` Heikki Linnakangas <[email protected]>
2026-04-12 18:05 ` Tom Lane <[email protected]>
2026-04-13 15:22 ` David Geier <[email protected]>
2026-04-13 16:14 ` Heikki Linnakangas <[email protected]>
2026-04-14 07:02 ` David Geier <[email protected]>
2026-04-15 11:06 ` Heikki Linnakangas <[email protected]>
2026-04-15 19:12 ` Peter Eisentraut <[email protected]>
2026-04-15 21:25 ` Tom Lane <[email protected]>
2026-04-16 08:45 ` Peter Eisentraut <[email protected]>
2026-04-16 09:49 ` Heikki Linnakangas <[email protected]>
2026-04-16 14:37 ` Tom Lane <[email protected]>
2026-04-16 17:30 ` Heikki Linnakangas <[email protected]>
2026-04-16 17:47 ` Tom Lane <[email protected]>
2026-04-17 19:21 ` Heikki Linnakangas <[email protected]>
2026-04-22 21:11 ` Heikki Linnakangas <[email protected]>
2026-04-16 14:43 ` Andres Freund <[email protected]>
2026-04-13 15:06 ` David Geier <[email protected]>
2026-04-14 14:24 ` David Geier <[email protected]>
2026-04-22 21:26 ` David Geier <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox