From: Peter Eisentraut
To: Daniel Verite
Cc: Robert Haas, Pgsql-Hackers
Date: Fri, 3 May 2024 20:58:30 +0200
Subject: Re: Support LIKE with nondeterministic collations
On 03.05.24 17:47, Daniel Verite wrote:
> Peter Eisentraut wrote:
>
>> However, off the top of my head, this definition has three flaws: (1)
>> It would make the single-character wildcard effectively an
>> any-number-of-characters wildcard, but only in some circumstances, which
>> could be confusing, (2) it would be difficult to compute, because you'd
>> have to check equality against all possible single-character strings,
>> and (3) it is not what the SQL standard says.
>
> For #1 we're currently using the definition of a "character" as
> being any single point of code,

That is the definition used throughout SQL and PostgreSQL. We can't
change that without redefining everything. To pick just one example,
the various trim functions also behave in seemingly inconsistent ways
when you apply them to strings in different normalization forms. The
better fix there is to enforce the normalization form somehow.

> Intuitively I think that our interpretation of "character" here should
> be whatever sequence of code points are between character
> boundaries [1], and that the equality of such characters would be the
> equality of their sequences of code points, with the string equality
> check of the collation, whatever the length of these sequences.
>
> [1]:
> https://unicode-org.github.io/icu/userguide/boundaryanalysis/#character-boundary

Even that page says that what we are calling a character here is
really called a grapheme cluster. In a different world, pattern
matching, character trimming, etc. would work by grapheme, but they
do not.
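To make the code point versus grapheme cluster distinction in this exchange concrete, here is a small sketch in Python (Python's str, not PostgreSQL code; the helper names naive_graphemes and per_code_point_ops are hypothetical). It shows that the NFC and NFD forms of "café" differ in code point count, so a one-code-point wildcard and a trim of a character set give different answers on them, while a cluster count agrees. The clustering here only attaches combining marks to the preceding code point; it is an assumption-laden simplification, not a full UAX #29 segmentation.

```python
import re
import unicodedata


def naive_graphemes(s: str) -> list[str]:
    """Hypothetical, simplified clustering: attach each code point with a
    nonzero canonical combining class to the preceding cluster.
    This is NOT full UAX #29 grapheme cluster segmentation; it ignores
    ZWJ emoji sequences, Hangul syllables, regional indicators, and more."""
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # combining mark joins the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


def per_code_point_ops(s: str) -> dict[str, object]:
    """Operations that, like SQL's, count one code point as one character."""
    return {
        "code_points": len(s),
        "graphemes": len(naive_graphemes(s)),
        # "." matches exactly one code point, analogous to LIKE's "_" wildcard
        "wildcard_match": re.fullmatch("caf.", s) is not None,
        # strip a trailing precomposed U+00E9, analogous to rtrim(s, 'é')
        "rstrip": s.rstrip("\u00e9"),
    }


word = "caf\u00e9"  # "café" with precomposed U+00E9
nfc = unicodedata.normalize("NFC", word)
nfd = unicodedata.normalize("NFD", word)  # "cafe" + U+0301 COMBINING ACUTE ACCENT

for label, s in (("NFC", nfc), ("NFD", nfd)):
    print(label, per_code_point_ops(s))

# Normalizing to a single form first makes the per-code-point results agree.
assert per_code_point_ops(unicodedata.normalize("NFC", nfd)) == per_code_point_ops(nfc)
```

On the NFC form the wildcard matches and the trim removes the accented letter; on the NFD form neither happens, although the naive cluster count is 4 in both cases. Normalizing to one form before operating, as suggested above, removes the discrepancy without changing what a "character" means.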