public inbox for [email protected]
help / color / mirror / Atom feedbtree_gin, bigint and number literals
4+ messages / 2 participants
[nested] [flat]
* btree_gin, bigint and number literals
@ 2025-01-31 17:02 Quentin de Metz <[email protected]>
2025-01-31 20:42 ` Re: btree_gin, bigint and number literals Tom Lane <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Quentin de Metz @ 2025-01-31 17:02 UTC (permalink / raw)
To: pgsql-novice
Hi all,
On a multi-column GIN index over a bigint column and a text column, the query planner does not filter the index on the bigint column when a condition on this column is specified with a number literal.
Please find below a simple reproduction script:
CREATE TABLE cars (owner_id BIGINT, license_plate TEXT);
INSERT INTO cars (owner_id, license_plate)
SELECT i % 100, md5(random()::text)
FROM generate_series(1, 10000) AS t(i);
CREATE EXTENSION btree_gin;
CREATE EXTENSION pg_trgm;
CREATE INDEX testidx ON cars USING GIN (owner_id, license_plate gin_trgm_ops);
-- below, we see that the Index Cond does not take the owner_id condition into account
EXPLAIN SELECT * FROM cars WHERE owner_id = 12 AND license_plate ILIKE '%abc%';
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on cars (cost=13.32..112.93 rows=1 width=41)
Recheck Cond: (license_plate ~~* '%abc%'::text)
Filter: (owner_id = 12)
-> Bitmap Index Scan on testidx (cost=0.00..13.32 rows=101 width=0)
Index Cond: (license_plate ~~* '%abc%'::text)
-- when explicitly casting the number literal to bigint, the Index Cond takes it into account
EXPLAIN SELECT * FROM cars WHERE owner_id = 12::bigint AND license_plate ILIKE '%abc%';
QUERY PLAN
---------------------------------------------------------------------------------------
Bitmap Heap Scan on cars (cost=21.52..25.54 rows=1 width=41)
Recheck Cond: ((owner_id = '12'::bigint) AND (license_plate ~~* '%abc%'::text))
-> Bitmap Index Scan on testidx (cost=0.00..21.52 rows=1 width=0)
Index Cond: ((owner_id = '12'::bigint) AND (license_plate ~~* '%abc%'::text))
-- the more permanent solution seems to be altering the operator type
ALTER OPERATOR FAMILY int8_ops USING gin ADD
OPERATOR 1 < (int8, int4) ,
OPERATOR 2 <= (int8, int4) ,
OPERATOR 3 = (int8, int4) ,
OPERATOR 4 >= (int8, int4) ,
OPERATOR 5 > (int8, int4) ,
FUNCTION 1 (int8, int4) btint84cmp(int8, int4);
QUERY PLAN
-----------------------------------------------------------------------------
Bitmap Heap Scan on cars (cost=21.52..25.54 rows=1 width=41)
Recheck Cond: ((owner_id = 12) AND (license_plate ~~* '%abc%'::text))
-> Bitmap Index Scan on testidx (cost=0.00..21.52 rows=1 width=0)
Index Cond: ((owner_id = 12) AND (license_plate ~~* '%abc%'::text))
Would you be open to considering a patch to include the ALTER OPERATOR snippet in the btree_gin install script, so that this works out of the box?
Quentin de Metz
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: btree_gin, bigint and number literals
2025-01-31 17:02 btree_gin, bigint and number literals Quentin de Metz <[email protected]>
@ 2025-01-31 20:42 ` Tom Lane <[email protected]>
2025-08-11 10:11 ` Re: btree_gin, bigint and number literals Quentin de Metz <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Tom Lane @ 2025-01-31 20:42 UTC (permalink / raw)
To: Quentin de Metz <[email protected]>; +Cc: pgsql-novice
"Quentin de Metz" <[email protected]> writes:
> On a multi-column GIN index over a bigint column and a text column, the query planner does not filter the index on the bigint column when a condition on this column is specified with a number literal.
Yeah, because "owner_id = 12" will use int84eq, which as you observe
is not supported by btree_gin's opclass.
> Would you be open to considering a patch to include the ALTER OPERATOR snippet in the btree_gin install script, so that this works out of the box?
I'd be quite surprised if that "just works" without any corresponding
changes in the C code, because btree_gin.c only knows about applying
same-type-on-both-sides comparison functions. (int8 vs int4 might
appear to work as long as you don't try very hard, but for example
it'd fail on 32-bit or big-endian hardware.) If you feel like writing
a patch that actually takes care of the matter fully, step right up.
regards, tom lane
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: btree_gin, bigint and number literals
2025-01-31 17:02 btree_gin, bigint and number literals Quentin de Metz <[email protected]>
2025-01-31 20:42 ` Re: btree_gin, bigint and number literals Tom Lane <[email protected]>
@ 2025-08-11 10:11 ` Quentin de Metz <[email protected]>
2025-08-11 14:10 ` Re: btree_gin, bigint and number literals Tom Lane <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Quentin de Metz @ 2025-08-11 10:11 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: pgsql-novice
Hello Tom,
I see that you have commited a change (e2b64fcef35f70f96fa92db56fbfa9ac2da136c7) which addresses this issue. Thank you!
I looked into this issue recently and still don't understand why this would not work for other hardware variants. Will it yield the wrong plan, or will the plan's execution yield wrong results?
I'm surprised because the SQL changes I proposed seemed relatively aligned with the existing extension source code which references support functions defined in code related to btree indexes (e.g. btint2cmp). These functions are already hardware-independent. Aren't these functions the ones called by the engine when executing a query and going through the index - as explained here (https://www.postgresql.org/docs/18/xindex.html#XINDEX-OPFAMILY)?
Also there a specific reason the integer-related operator classes defined in btree_gin (int2_ops, int4_ops, int8_ops) don't belong to the same operator family? That seems to be the direction suggested by the documentation I linked to above.
Finally, what is your approach to testing on 32-bit or big-endian hardware?
Thank you for your guidance,
Quentin
On Fri, Jan 31, 2025, at 21:42, Tom Lane wrote:
> "Quentin de Metz" <[email protected]> writes:
>> On a multi-column GIN index over a bigint column and a text column, the query planner does not filter the index on the bigint column when a condition on this column is specified with a number literal.
>
> Yeah, because "owner_id = 12" will use int84eq, which as you observe
> is not supported by btree_gin's opclass.
>
>> Would you be open to considering a patch to include the ALTER OPERATOR snippet in the btree_gin install script, so that this works out of the box?
>
> I'd be quite surprised if that "just works" without any corresponding
> changes in the C code, because btree_gin.c only knows about applying
> same-type-on-both-sides comparison functions. (int8 vs int4 might
> appear to work as long as you don't try very hard, but for example
> it'd fail on 32-bit or big-endian hardware.) If you feel like writing
> a patch that actually takes care of the matter fully, step right up.
>
> regards, tom lane
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: btree_gin, bigint and number literals
2025-01-31 17:02 btree_gin, bigint and number literals Quentin de Metz <[email protected]>
2025-01-31 20:42 ` Re: btree_gin, bigint and number literals Tom Lane <[email protected]>
2025-08-11 10:11 ` Re: btree_gin, bigint and number literals Quentin de Metz <[email protected]>
@ 2025-08-11 14:10 ` Tom Lane <[email protected]>
0 siblings, 0 replies; 4+ messages in thread
From: Tom Lane @ 2025-08-11 14:10 UTC (permalink / raw)
To: Quentin de Metz <[email protected]>; +Cc: pgsql-novice
"Quentin de Metz" <[email protected]> writes:
> I looked into this issue recently and still don't understand why
> this would not work for other hardware variants. Will it yield the
> wrong plan, or will the plan's execution yield wrong results?
Your original patch will yield wrong results in some cases, up to and
including crashing. That's because the representations of different
datatypes as Datums aren't guaranteed compatible.
> I'm surprised because the SQL changes I proposed seemed relatively
> aligned with the existing extension source code which references
> support functions defined in code related to btree indexes
> (e.g. btint2cmp). These functions are already hardware-independent.
Um ... not really. If you dig down into what btint8cmp does, for
example, it's different on 32-bit hardware than 64-bit hardware.
We do our best to hide that behind macros in the source code, but
that doesn't mean the difference isn't there. So you can't just
blindly apply btint8cmp to int4 or int2 values.
You might find this thread informative:
https://www.postgresql.org/message-id/flat/[email protected]
as it's proposing to remove one of the reasons why there's a problem.
But even if that goes in, I still would not trust your original patch.
There's no code-level guarantee that treating a Datum of one type as
being of some other type is okay.
> Also there a specific reason the integer-related operator classes
> defined in btree_gin (int2_ops, int4_ops, int8_ops) don't belong to
> the same operator family? That seems to be the direction suggested
> by the documentation I linked to above.
Whoever invented the btree_gin extension didn't make them that way.
It might be that btree_gin actually predates our invention of operator
families, not sure. Anyway we don't have an easily-compatible way
to rearrange them into a single family, and there wouldn't be that
much benefit in doing so. (We invented operator families mostly
to allow the planner to reason about cross-type comparisons, and
it only cares about btree and hash families.)
> Finally, what is your approach to testing on 32-bit or big-endian hardware?
Well, I've still got an ancient MacBook (32-bit PPC), and when that
isn't suitable I have an account with the GCC compile farm:
https://portal.cfarm.net
In principle you could also use a QEMU VM, but I've found that its
emulation of other architectures frequently leaves a lot to be
desired.
Of course, our own buildfarm also provides testing of such
architectures, but we try hard to iron out portability problems
before patches get to the buildfarm.
regards, tom lane
^ permalink raw reply [nested|flat] 4+ messages in thread
end of thread, other threads:[~2025-08-11 14:10 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-01-31 17:02 btree_gin, bigint and number literals Quentin de Metz <[email protected]>
2025-01-31 20:42 ` Tom Lane <[email protected]>
2025-08-11 10:11 ` Quentin de Metz <[email protected]>
2025-08-11 14:10 ` Tom Lane <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox