public inbox for [email protected]
help / color / mirror / Atom feedFrom: Chao Li <[email protected]>
To: Jeff Davis <[email protected]>
Cc: [email protected]
Subject: Re: Use CASEFOLD() internally rather than LOWER()
Date: Tue, 13 Jan 2026 10:14:02 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
> On Jan 13, 2026, at 02:22, Jeff Davis <[email protected]> wrote:
>
> There are a number of internal callers of LOWER(), and conceptually
> those should all be using CASEFOLD(). Patches attached.
>
> I'm not sure if we want the citext patch -- it would require REINDEX of
> all existing citext indexes after upgrade, and there's already a
> documented tip ("Consider using nondeterministic collations...), so
> perhaps it's a legacy extension anyway.
>
> It would be nice to make the tsearch change this release, as there are
> already changes that could require a reindex.
>
> I didn't change pg_trgm yet, because I think that we have to change the
> regex machinery to be aware of more than two case variants first (and
> potentially increasing string lengths, too).
>
> Regards,
> Jeff Davis
>
>
> <v1-0001-ILIKE-use-CASEFOLD-rather-than-LOWER.patch><v1-0002-citext-use-CASEFOLD-rather-than-LOWER.patch><v1-0003-dict_xsyn-use-CASEFOLD-rather-than-LOWER.patch><v1-0004-tsearch-use-CASEFOLD-rather-than-LOWER.patch>
Hi Jeff,
Thanks for the patch. I have reviewed the patch set and got a few comments for tests:
1 - 0001
```
+SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_C_UTF8;
```
Do we want to added one more test:
```
SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_UNICODE_FAST;
?column?
----------
t
(1 row)
```
Which tests the different behaviors against collations.
2 - 0002
Do we need to add a test:
```
SELECT 'straße'::citext = 'STRASSE'::citext;
?column?
----------
f
(1 row)
```
I initially thought to add test cases with different collations, but after debugging, I found that citext intentionally ignores specified collation.
3 - 0003 LGTM. Seems the existing test coverage is good enough.
4 - 0004
I thought to suggest add a test:
```
SELECT to_tsvector('straße') @@ to_tsquery('strasse');
?column?
----------
f
(1 row)
```
But I don’t see existing tests under backend/tsearch. So, I’m now not sure whether or not to insist the suggestion.
BWT, while reviewing this patch, I noticed a copy-paste error in str_casefold():
```
errmsg("could not determine which collation to use for %s function",
- "lower()"),
+ "casefold()”),
```
I have posted a patch to fix. See [1].
[1] https://postgr.es/m/CAEoWx2mMmm9fTZYgE-r_T-KPTFR1rKO029QV-S-6n=7US_9EMA@mail.gmail.com
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
view thread (3+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected]
Subject: Re: Use CASEFOLD() internally rather than LOWER()
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox