public inbox for [email protected]  
help / color / mirror / Atom feed
From: Chao Li <[email protected]>
To: Jeff Davis <[email protected]>
Cc: [email protected]
Subject: Re: Use CASEFOLD() internally rather than LOWER()
Date: Tue, 13 Jan 2026 10:14:02 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>



> On Jan 13, 2026, at 02:22, Jeff Davis <[email protected]> wrote:
> 
> There are a number of internal callers of LOWER(), and conceptually
> those should all be using CASEFOLD(). Patches attached.
> 
> I'm not sure if we want the citext patch -- it would require REINDEX of
> all existing citext indexes after upgrade, and there's already a
> documented tip ("Consider using nondeterministic collations...), so
> perhaps it's a legacy extension anyway.
> 
> It would be nice to make the tsearch change this release, as there are
> already changes that could require a reindex.
> 
> I didn't change pg_trgm yet, because I think that we have to change the
> regex machinery to be aware of more than two case variants first (and
> potentially increasing string lengths, too).
> 
> Regards,
> Jeff Davis
> 
> 
> <v1-0001-ILIKE-use-CASEFOLD-rather-than-LOWER.patch><v1-0002-citext-use-CASEFOLD-rather-than-LOWER.patch><v1-0003-dict_xsyn-use-CASEFOLD-rather-than-LOWER.patch><v1-0004-tsearch-use-CASEFOLD-rather-than-LOWER.patch>

Hi Jeff,

Thanks for the patch. I have reviewed the patch set and got a few comments for tests:

1 - 0001
```
+SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_C_UTF8;
```

Do we want to added one more test:
```
SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_UNICODE_FAST;
 ?column?
----------
 t
(1 row)
```
Which tests the different behaviors against collations.

2 - 0002

Do we need to add a test:
```
SELECT 'straße'::citext = 'STRASSE'::citext;
 ?column?
----------
 f
(1 row)
```

I initially thought to add test cases with different collations, but after debugging, I found that citext intentionally ignores specified collation.

3 - 0003 LGTM. Seems the existing test coverage is good enough.

4 - 0004

I thought to suggest add a test:
```
SELECT to_tsvector('straße') @@ to_tsquery('strasse');
 ?column?
----------
 f
(1 row)
```

But I don’t see existing tests under backend/tsearch. So, I’m now not sure whether or not to insist the suggestion.

BWT, while reviewing this patch, I noticed a copy-paste error in str_casefold():
```
                                 errmsg("could not determine which collation to use for %s function",
-                                               "lower()"),
+                                               "casefold()”),
```

I have posted a patch to fix. See [1].

[1] https://postgr.es/m/CAEoWx2mMmm9fTZYgE-r_T-KPTFR1rKO029QV-S-6n=7US_9EMA@mail.gmail.com

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/










view thread (3+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Use CASEFOLD() internally rather than LOWER()
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox