public inbox for [email protected]
From: Mark Dilger <[email protected]>
To: Jeff Davis <[email protected]>
Cc: Daniel Verite <[email protected]>
Cc: [email protected]
Subject: Re: Use CASEFOLD() internally rather than LOWER()
Date: Wed, 25 Mar 2026 07:40:23 -0700
Message-ID: <CAHgHdKtb2jD+DaTJU+3jnQRZ9hEXSDcPCR8DCCzZTTVeo4jQcA@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<CAHgHdKt+_+QhHK8WXQSoMNeUz43Cp2zGNEVX6=0RSaksA9zyJw@mail.gmail.com>
<[email protected]>
On Tue, Mar 24, 2026 at 4:07 PM Jeff Davis <[email protected]> wrote:
> On Sat, 2026-03-21 at 20:14 -0700, Mark Dilger wrote:
> > After v2-0001, ILIKE uses str_casefold() for matching, but pg_trgm still
> > uses str_tolower() for trigram extraction (trgm_op.c:352 and :948).
> > With builtin collations, these produce different results.
>
> Interesting, thank you. As stated in the original message, I was unsure
> about changing pg_trgm without adjusting the regex logic, also:
>
>
> https://www.postgresql.org/message-id/[email protected]
>
> do you have a suggestion about an easy way to do that, or should we
> revisit in the next cycle?
>
pg_trgm indexes appear to be lossy, with recheck logic. I would think you
just need to make the index return a candidate set that includes at least
every row the regex would match, and then let the recheck prune that down.
My concern is pg_trgm returning fewer than all the matching rows, so that
after recheck you get fewer results than a seqscan would have returned.
Would switching to casefold be strictly broader than what the regex code
matches? If so, you would just need to convert pg_trgm to use casefold and
then rely on the recheck machinery.
Sorry if this misses something discussed upthread. I'm clearly assuming
here that you don't mind that such a change necessitates a REINDEX.
--
*Mark Dilger*