From: Mark Dilger <[email protected]>
To: Jeff Davis <[email protected]>
Cc: Daniel Verite <[email protected]>
Cc: [email protected]
Subject: Re: Use CASEFOLD() internally rather than LOWER()
Date: Wed, 25 Mar 2026 17:01:26 -0700
Message-ID: <CAHgHdKuGR7aJxZu7VTPA+kEDkzqJvKmi5799rhW+sKyt-WVihQ@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<CAHgHdKt+_+QhHK8WXQSoMNeUz43Cp2zGNEVX6=0RSaksA9zyJw@mail.gmail.com>
<[email protected]>
<CAHgHdKtb2jD+DaTJU+3jnQRZ9hEXSDcPCR8DCCzZTTVeo4jQcA@mail.gmail.com>
<[email protected]>
On Wed, Mar 25, 2026 at 2:02 PM Jeff Davis <[email protected]> wrote:
> I think the precise question would be: "are there any two characters
> that lowercase to the same character but do not casefold to the same
> character?".
>
I don't know. I set up a test to iterate across all character pairs in
all locales on my system and didn't find any. Other searching suggests
that the Turkish and Azerbaijani locales do have this characteristic:
I (U+0049) lowercases to ı (U+0131) but case folds to i (U+0069), while
ı (U+0131) both lowercases and case folds to ı (U+0131). I have not
confirmed that empirically, though.
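
For anyone who wants to repeat the search, here is a rough sketch of the kind of check I mean, in Python rather than against the backend. It groups characters by their lowercase form and reports pairs within a group whose case-folded forms differ. The names are made up for illustration. turkic_lower() is only a hypothetical stand-in for lowercasing under a tr/az locale, and str.casefold() is Python's locale-independent Unicode folding, which need not match what any given PostgreSQL provider does.

```python
from collections import defaultdict
from itertools import combinations


def turkic_lower(s):
    # Hypothetical stand-in for tr/az lowercasing (an assumption, not a
    # PostgreSQL API): I -> dotless i, dotted capital I -> i, then the
    # default mapping for everything else.
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


def all_code_points():
    # Every Unicode scalar value; surrogates are skipped.
    for cp in range(0x110000):
        if not 0xD800 <= cp <= 0xDFFF:
            yield chr(cp)


def lower_vs_fold_conflicts(chars, lower=str.lower, fold=str.casefold):
    # Group characters by lowercase form, then report pairs in the same
    # group whose case-folded forms are not equal.
    groups = defaultdict(list)
    for ch in chars:
        groups[lower(ch)].append(ch)
    conflicts = []
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            if fold(a) != fold(b):
                conflicts.append((a, b))
    return conflicts
```

Calling lower_vs_fold_conflicts(all_code_points()) scans everything with the default mappings, and passing lower=turkic_lower approximates the Turkish/Azerbaijani case described above. This only says something about Python's tables, so a real answer still needs the same loop run against each provider and locale in the backend.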
> I don't have a counterexample, so perhaps using casefold would still be
> fine.
>
> Thoughts? Should we enhance regexes to consider more than two case
> variants first, or should we proceed with some of these patches (and/or
> a similar change to pg_trgm)?
>
I don't want to take a strong position either way. I'm still wrapping my
head around the various implications of the proposed changes, and don't
feel I have a complete picture yet.
--
*Mark Dilger*