From: Mark Dilger <[email protected]>
To: Jeff Davis <[email protected]>
Cc: Daniel Verite <[email protected]>
Cc: [email protected]
Subject: Re: Use CASEFOLD() internally rather than LOWER()
Date: Sat, 21 Mar 2026 20:14:37 -0700
Message-ID: <CAHgHdKt+_+QhHK8WXQSoMNeUz43Cp2zGNEVX6=0RSaksA9zyJw@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

On Tue, Mar 3, 2026 at 1:01 PM Jeff Davis <[email protected]> wrote:

> On Sat, 2026-02-28 at 14:27 +0100, Daniel Verite wrote:
> > I tried 0001 with a non-UTF8 database and got quickly stuck:
>
> Attached new versions. I moved the encoding check into the SQL-callable
> casefold() function, and other callers use str_casefold(). That
> slightly simplifies what happens in ILIKE, also.
>
> I removed the citext changes. citext has somewhat of a legacy status, I
> think, so I'm not sure it makes sense to try to modernize or change it.
> Also, some SQL-language functions in citext use LOWER(), so the changes
> aren't enough: we'd need to make the SQL CASEFOLD function callable in
> other encodings, and also run a citext upgrade script to change the
> definitions.
>
> Note that these changes affect the result of some expressions (e.g.
> ILIKE), so could theoretically make an expression index or predicate
> index inconsistent.
>

Thanks for the patches!

After v2-0001, ILIKE uses str_casefold() for matching, but pg_trgm still
uses str_tolower() for trigram extraction (trgm_op.c:352 and :948).
With builtin collations, the two functions produce different results, so
a trigram index scan and a sequential scan can disagree on the same
ILIKE query.
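
To illustrate the kind of mismatch I mean, here is a minimal sketch in
Python rather than the actual backend code. It assumes Python's
str.casefold() (full Unicode case folding) and str.lower() as stand-ins
for str_casefold() and str_tolower(); the mapping a particular builtin
collation applies may differ. The trigrams() helper, the stored value,
and the pattern are all hypothetical, and the padding follows the
documented pg_trgm convention of two leading spaces and one trailing
space per word.

```python
def trigrams(word: str) -> set[str]:
    """Return the 3-character windows of a word padded pg_trgm-style."""
    padded = "  " + word + " "  # two leading spaces, one trailing space
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


stored = "STRASSE"   # hypothetical indexed value
pattern = "straße"   # hypothetical ILIKE pattern without wildcards

# Matching via case folding: both sides fold to "strasse", so they match.
seq_scan_match = stored.casefold() == pattern.casefold()

# Trigram extraction via lowercasing: "ß" survives lower(), so the
# pattern yields trigrams that the stored value does not contain.
index_match = trigrams(pattern.lower()) <= trigrams(stored.lower())

print(seq_scan_match, index_match)
```

In this sketch the folded comparison accepts the row while the
lowercased trigram check rejects it, which is the shape of disagreement
between a sequential scan and a trigram index lookup.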


Attachments:

  [application/octet-stream] WIP-v3-0001-Demonstrate-inconsistency-in-gin-index-vs-seq-sca.patch-WIP (12.1K, 3-WIP-v3-0001-Demonstrate-inconsistency-in-gin-index-vs-seq-sca.patch-WIP)