Subject: Re: Use CASEFOLD() internally rather than LOWER()
From: Chao Li
In-Reply-To: <64d7949bad90545f981ac7513fb0b4954daca2c9.camel@j-davis.com>
Date: Tue, 13 Jan 2026 10:14:02 +0800
Cc: 
pgsql-hackers@postgresql.org
Message-Id: <1A46D941-E0A4-4B3E-AAEA-1F7B6CCD24E6@gmail.com>
References: <64d7949bad90545f981ac7513fb0b4954daca2c9.camel@j-davis.com>
To: Jeff Davis

> On Jan 13, 2026, at 02:22, Jeff Davis wrote:
>
> There are a number of internal callers of LOWER(), and conceptually
> those should all be using CASEFOLD(). Patches attached.
>
> I'm not sure if we want the citext patch -- it would require REINDEX of
> all existing citext indexes after upgrade, and there's already a
> documented tip ("Consider using nondeterministic collations...), so
> perhaps it's a legacy extension anyway.
>
> It would be nice to make the tsearch change this release, as there are
> already changes that could require a reindex.
>
> I didn't change pg_trgm yet, because I think that we have to change the
> regex machinery to be aware of more than two case variants first (and
> potentially increasing string lengths, too).
>
> Regards,
> Jeff Davis

Hi Jeff,

Thanks for the patch. I have reviewed the patch set and have a few comments on the tests:

1 - 0001
```
+SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_C_UTF8;
```

Do we want to add one more test:
```
SELECT U&'straße' ILIKE U&'STRASSE' COLLATE PG_UNICODE_FAST;
 ?column?
----------
 t
(1 row)
```

This would test how the behavior differs between collations.

2 - 0002

Do we need to add a test:
```
SELECT 'straße'::citext = 'STRASSE'::citext;
 ?column?
----------
 f
(1 row)
```

I initially thought about adding test cases with different collations, but after debugging, I found that citext intentionally ignores the specified collation.

3 - 0003

LGTM. The existing test coverage seems good enough.
4 - 0004

I thought about suggesting a test:
```
SELECT to_tsvector('straße') @@ to_tsquery('strasse');
 ?column?
----------
 f
(1 row)
```

But I don't see existing tests under backend/tsearch, so I'm not sure whether to insist on the suggestion.

BTW, while reviewing this patch, I noticed a copy-paste error in str_casefold():
```
                 errmsg("could not determine which collation to use for %s function",
-                        "lower()"),
+                        "casefold()"),
```

I have posted a patch to fix it. See [1].

[1] https://postgr.es/m/CAEoWx2mMmm9fTZYgE-r_T-KPTFR1rKO029QV-S-6n=7US_9EMA@mail.gmail.com

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
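
Note on the straße / STRASSE test cases above: the difference between lowercasing and case folding can be sketched outside the database. The snippet below is an illustration only and is not part of the patch set. It assumes Python's str.lower() and str.casefold() as stand-ins for SQL LOWER() and CASEFOLD(); the helper names lower_equal and casefold_equal are hypothetical, and PostgreSQL's actual results depend on the collation in use.

```python
# Illustration only: Python's str.lower() and str.casefold() stand in for
# SQL LOWER() and CASEFOLD(). PostgreSQL's behavior depends on the collation.

def lower_equal(a: str, b: str) -> bool:
    """Compare two strings after lowercasing both sides."""
    return a.lower() == b.lower()


def casefold_equal(a: str, b: str) -> bool:
    """Compare two strings after full Unicode case folding of both sides."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    word, upper = "straße", "STRASSE"
    # Lowercasing leaves 'ß' unchanged, so the two sides differ.
    print("lower match:   ", lower_equal(word, upper))
    # Full case folding maps 'ß' to 'ss', so the two sides match.
    print("casefold match:", casefold_equal(word, upper))
```

With full case folding, 'ß' becomes 'ss', which is why a casefold-based comparison can treat the two spellings as equal while a lowercase-based one does not.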