Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w5PPX-003DsX-18 for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Mar 2026 14:40:43 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w5PPV-00ESqg-2o for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Mar 2026 14:40:42 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w5PPV-00ESqX-1M for pgsql-hackers@lists.postgresql.org; Wed, 25 Mar 2026 14:40:42 +0000 Received: from mail-oi1-x229.google.com ([2607:f8b0:4864:20::229]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w5PPO-00000000ytK-2Iki for pgsql-hackers@postgresql.org; Wed, 25 Mar 2026 14:40:35 +0000 Received: by mail-oi1-x229.google.com with SMTP id 5614622812f47-466ec4c6852so1268982b6e.3 for ; Wed, 25 Mar 2026 07:40:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774449633; cv=none; d=google.com; s=arc-20240605; b=cg/eWRS/dw4vo0sE/KzI5ArQkgafFR9e3I+LhwBid30RLrvkUk02gvxrizu8+Ch143 AB3nTHyrviwXWYH3b4FYA0P2pP407TFsG8IKi2theBm6B0RlxwiOzi8DiJ7aaQ2qufZq pY0TohxfRy6uPMFqSZqPLNbY5bBXZeOkkWk4tq5LhBoCP2cIz2kaAfIhrQIkToqCyaQR W66QhtHRPGJ43G4vuwyilDfeTOYXS9Yj8MHO17k5WAOjKGIDht/m4yMX3+17as+glopD Pkt16k2vzPVqsooZSJAQba9pgAahRZ66WC5Dh1yJMq9DLWlfU9Mt+6GB34smb5iJK944 Vy0g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=vJaQbLW1xri07X4sFw9hmE30HTTpe9EYPRcmYr71qAE=; fh=i3fuNMLSXUCJzdIQpdqBHt2T5P6F5sWSItd3qf1b9z0=; b=QzLqwQYBSrjMZE9YrHfud9+VvvD5wFOVcdU5Y4A3Syzv7jP26w27OWMZV558nNwk3x wDPElAnkyMUqN67OJjb2nl9qm0Lk2n9A6lSX3AGr6aZqM8NPkIizwF+kPQngn8dUpAA9 YUrvrtuOPZB8W8JNCg/fKLe28cXGDT7Iezys9Ib+6bYAjy+TeVUkP7n9LKhlLBaQ2j7V mGqVidhdVChmondbtGuVr3AnncewxyJV9pdnzQGHdQy4OzWzZrWNnYjeEsJU0X9lDztu IwjgNUvC94MOAFA4NlDjCkHlsTzYCwbEgjQD9B+xZsHd/+z/ssG1G75eVHfe55LeJ6pi tp7A==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=enterprisedb.com; s=google; t=1774449633; x=1775054433; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=vJaQbLW1xri07X4sFw9hmE30HTTpe9EYPRcmYr71qAE=; b=AXIdUkG83tzE1+GXr+EvrUY1UWcvnAvTFB7hRGkeuZJBOb6tsBAnvn93dVhWFUKX1u 7pVbwT1yv8dA5jWK0hbRunl6evt7sevFXvdqYdbjrH1WDTe2s5U3fvitD6fxV3pUeYJ6 qCZfXE0+LrEwMbyPqYEObaTn2s6L9nym/BZ/hl/1nCuRNhVyZkzXmRze5RD8EYJN6hci adFoZlfO0Xz2Lu74YISclWmMwklXJOcnnjvJULw/Rpsn/73uI9kbuvRdSqCUoVWbGR9S 79zyNZwjTM7UMzjqeZ60vTwLhWfRfVBtTPOcwZW9srgO8tzGJjqV98XiI3VSKP4vVGIa NvBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774449633; x=1775054433; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=vJaQbLW1xri07X4sFw9hmE30HTTpe9EYPRcmYr71qAE=; b=r2O9DCNM9kG7FwjxpWYu6DFVbyvQQ9Bsruj8AFSWKfHTwHYIbIvlCvm+WfmEBBfilY 7rS0ckm0CJFX4t2B8SqDoxNo+DPUvBhv35rz6yc1hXew/ElGXj2/P+l6BQ1Js/OxyIc7 G9R+qrBpy+0dcfLZcipqzDa3zYeeQeAWMB+IKvvzizN8y+8CSuatnqRleUCACEw0F8IY E3RNowjFuQxG9m9Of+aDAEAZ7A720LdOxxiT3yuyeWh/dhfqw2IJAwN0vfjnZZHvIZSz pb40m21hPmNk/zC7Z04P4po3lxnuJkeJ9Ii66k/y5vb7o+oNEO0IjaRm46ukqQ34FO2r MoqA== X-Forwarded-Encrypted: i=1; AJvYcCVIlc78iuyuSrVyXMwRHtgh66QnKPkL16z0cn3Y16enoF50pga3dYm4aHgA4Jf3jJXidrYtzKhOWg/8t/ix@postgresql.org X-Gm-Message-State: AOJu0Ywj3c+FYWYln44XMYc1/46xcp9shHj97iUHWxLYqugd6Z01PHIZ FjjH5epPFi7/fAxFDidKDuCOZCqfkxJ8caLP8tYfVE8/3VDkbQX9jefjjAiqxBQUrYrmwhMaW18 cjScE7E9Cfdx5uvYiys6HBD+luUelpiqYqAlrEjNCWHKcBPludmY= X-Gm-Gg: ATEYQzzfi4G3AL+1Be7rnLFCM7aWN3HHvQQxfXMGdQCuP/aqzOvOs6j+oY89MKxVauR PaPrc+nIKChnqDxMrM7rTbrNyvaFiRWhFKbDrdzcPPEurS+jg/u5Q4jl0AQcrMPqQyU2ACwO0+e LKXlYelEV722oMuiffTWFtUjITC3phT7yuunWXPpzG8IxSLnmLI9mzG46aokqhgC64XJ61F1cKA l0PHO+wJ+gR+wByFmvuck2lQhP6zrThX4/RGTDWYlJsFuYD0bDDpvO4/1HkZRk6fmynUw06X9Dq 4QuT5r63x5EWMECPqd7b X-Received: by 2002:a05:6808:4fcf:b0:45e:6cac:92c9 with SMTP id 5614622812f47-46a5c72c1d6mr1612272b6e.27.1774449633423; Wed, 25 Mar 2026 07:40:33 -0700 (PDT) MIME-Version: 1.0 References: <0a4f41b84efa5d821aba166658fb845f1ab97f03.camel@j-davis.com> <449d14b3cb80e259bbd9d8728ea204b15b6a025a.camel@j-davis.com> In-Reply-To: <449d14b3cb80e259bbd9d8728ea204b15b6a025a.camel@j-davis.com> From: Mark Dilger Date: Wed, 25 Mar 2026 07:40:23 -0700 X-Gm-Features: AQROBzCer-wsUTp4--QEbprOYd8kHjzyKefE_mHnn5NVbhhPbiNQsAXwj076hnM Message-ID: Subject: Re: Use CASEFOLD() internally rather than LOWER() To: Jeff Davis Cc: Daniel Verite , pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="000000000000128e05064dda402f" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000128e05064dda402f Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Mar 24, 2026 at 4:07=E2=80=AFPM Jeff Davis wrot= e: > On Sat, 2026-03-21 at 20:14 -0700, Mark Dilger wrote: > > After v2-0001, ILIKE uses str_casefold() for matching, but pg_trgm > > still > > uses str_tolower() for trigram extraction (trgm_op.c:352 and :948). > > With builtin collations, these produce different results. > > Interesting, thank you. As stated in the original message, I was unsure > about changing pg_trgm without adjusting the regex logic, also: > > > https://www.postgresql.org/message-id/64d7949bad90545f981ac7513fb0b4954da= ca2c9.camel@j-davis.com > > do you have a suggestion about an easy way to do that, or should we > revisit in the next cycle? > pg_trgm appears to be lossy, with recheck logic. I would think you just need to make it give answers which at least include everything that a regex would match, and then allow recheck to prune that down. My concern is having pg_trgm give less than all the answers, so that after recheck you get fewer results than a seqscan would have returned. Would switching to casefold be strictly broader than regex? If so, you would just need to convert pg_trgm to use casefold and then rely on the recheck machinery. Sorry if this misses something discussed upthread. I'm clearly assuming here that you don't mind that such a change necessitates a REINDEX. --=20 *Mark Dilger* --000000000000128e05064dda402f Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Tue, Mar 24,= 2026 at 4:07=E2=80=AFPM Jeff Davis <pgsql@j-davis.com> wrote:
On Sat, 2026-0= 3-21 at 20:14 -0700, Mark Dilger wrote:
> After v2-0001, ILIKE uses str_casefold() for matching, but pg_trgm
> still
> uses str_tolower() for trigram extraction (trgm_op.c:352 and :948). > With builtin collations, these produce different results.

Interesting, thank you. As stated in the original message, I was unsure
about changing pg_trgm without adjusting the regex logic, also:

http= s://www.postgresql.org/message-id/64d7949bad90545f981ac7513fb0b4954daca2c9.= camel@j-davis.com

do you have a suggestion about an easy way to do that, or should we
revisit in the next cycle?

pg_trgm appe= ars to be lossy, with recheck logic.=C2=A0 I would think you just need to m= ake it give answers which at least include everything that a regex would ma= tch, and then allow recheck to prune that down.=C2=A0 My concern is having = pg_trgm give less than all the answers, so that after recheck you get fewer= results than a seqscan would have returned.=C2=A0 Would switching to casef= old be strictly broader than regex?=C2=A0 If so, you would just need to con= vert pg_trgm to use casefold and then rely on the recheck machinery.
<= div>
Sorry if this misses something discussed upthread.=C2=A0= I'm clearly assuming here that you don't mind that such a change n= ecessitates a REINDEX.=C2=A0

--

Mar= k Dilger
--000000000000128e05064dda402f--