From: Mark Dilger
Date: Wed, 25 Mar 2026 17:01:26 -0700
Subject: Re: Use CASEFOLD() internally rather than LOWER()
To: Jeff Davis
Cc: Daniel Verite, pgsql-hackers@postgresql.org


On Wed, Mar 25, 2026 at 2:02 PM Jeff Davis <pgsql@j-davis.com> wrote:

> I think the precise question would be: "are there any two characters
> that lowercase to the same character but do not casefold to the same
> character?".

I don't know.  I'll set up a test to iterate over all character pairs in every locale... no, I didn't find any on my system.  Other searching suggests that the Turkish and Azerbaijani locales do have this characteristic, with I (U+0049) lowercasing to ı (U+0131) and case folding to i (U+0069), while ı (U+0131) lowercases to ı (U+0131) but also case folds to ı (U+0131), so the pair would share a lowercase form but not a case-folded one.  I have not confirmed that empirically, though.
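A minimal sketch of that kind of search, in Python rather than against PostgreSQL itself. Everything here is an assumption made for illustration: `find_pairs` and `turkic_lower` are hypothetical helpers, Python's `str.lower()` and `str.casefold()` are locale-independent, and `turkic_lower` only hard-codes the two dotted/dotless I mappings described above instead of consulting a real tr/az locale. It groups characters by their lowercase form and reports pairs within a group whose case-folded forms differ.

```python
from collections import defaultdict
from itertools import combinations


def turkic_lower(ch):
    # Hypothetical stand-in for lowercasing under Turkish/Azerbaijani rules:
    # I (U+0049) -> dotless i (U+0131), dotted capital I (U+0130) -> i (U+0069).
    # All other characters fall back to Python's locale-independent lower().
    if ch == "I":
        return "\u0131"
    if ch == "\u0130":
        return "i"
    return ch.lower()


def find_pairs(codepoints, lower, fold):
    # Bucket characters by their lowercase form.
    groups = defaultdict(list)
    for cp in codepoints:
        ch = chr(cp)
        groups[lower(ch)].append(ch)
    # Within each bucket, keep pairs whose case-folded forms differ.
    return [
        (a, b)
        for members in groups.values()
        for a, b in combinations(members, 2)
        if fold(a) != fold(b)
    ]


# Small code point range only, to keep the illustration quick.
default_pairs = find_pairs(range(0x250), str.lower, str.casefold)
turkic_pairs = find_pairs(range(0x250), turkic_lower, str.casefold)

print("default lowercasing:", default_pairs)
print("Turkic-style lowercasing:", turkic_pairs)
```

With the Turkic-style mapping, I (U+0049) and ı (U+0131) land in the same lowercase bucket but fold differently, so the pair is reported; with the default mapping they lowercase to different characters and are never compared. This says nothing about what a given PostgreSQL collation provider actually does, which would still need to be checked empirically.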

> I don't have a counterexample, so perhaps using casefold would still be
> fine.
>
> Thoughts? Should we enhance regexes to consider more than two case
> variants first, or should we proceed with some of these patches (and/or
> a similar change to pg_trgm)?

I don't want to take a strong position either way.  I'm still wrapping my head around the various implications of the proposed changes, and don't feel I have a complete picture yet.

--

Mark Dilger