From: Peter Eisentraut <[email protected]>
To: Daniel Verite <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Pgsql-Hackers <[email protected]>
Subject: Re: Support LIKE with nondeterministic collations
Date: Fri, 3 May 2024 20:58:30 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>

On 03.05.24 17:47, Daniel Verite wrote:
> 	Peter Eisentraut wrote:
> 
>>   However, off the top of my head, this definition has three flaws: (1)
>> It would make the single-character wildcard effectively an
>> any-number-of-characters wildcard, but only in some circumstances, which
>> could be confusing, (2) it would be difficult to compute, because you'd
>> have to check equality against all possible single-character strings,
>> and (3) it is not what the SQL standard says.
> 
> For #1 we're currently using the definition of a "character" as
> being any single point of code,

That is the definition that is used throughout SQL and PostgreSQL.  We 
can't change that without redefining everything.  To pick just one 
example, the various trim functions also behave in seemingly inconsistent 
ways when you apply them to strings in different normalization forms. 
The better fix there is to enforce the normalization form somehow.
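
As a rough illustration of the trim point, here is a minimal Python 
sketch (not SQL, and not a statement about how PostgreSQL implements 
trim).  It assumes only that Python's str.rstrip(), like the trim 
functions, removes individual code points.  The variable names are made 
up for the example.

```python
import unicodedata

# The same visible string "café" in two normalization forms.
nfc = unicodedata.normalize("NFC", "caf\u00e9")  # é is one code point, U+00E9
nfd = unicodedata.normalize("NFD", "caf\u00e9")  # é is "e" followed by U+0301

# Trim a trailing U+0301 (combining acute accent), code point by code point.
trimmed_nfc = nfc.rstrip("\u0301")
trimmed_nfd = nfd.rstrip("\u0301")

print(len(nfc), len(nfd))    # number of code points in each form
print(trimmed_nfc == nfc)    # the NFC string has no U+0301 to remove
print(trimmed_nfd)           # the NFD string loses its accent
```

The two inputs look identical when displayed, yet trimming changes only 
the NFD one.  Normalizing both to the same form before trimming makes 
the results agree.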

> Intuitively I think that our interpretation of "character" here should
> be whatever sequence of code points are between character
> boundaries [1], and that the equality of such characters would be the
> equality of their sequences of code points, with the string equality
> check of the collation, whatever the length of these sequences.
> 
> [1]:
> https://unicode-org.github.io/icu/userguide/boundaryanalysis/#character-boundary

Even that page says that what we are calling a character here is really 
called a grapheme cluster.

In a different world, pattern matching, character trimming, etc. would 
work by grapheme, but they do not.
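
To make the code point versus grapheme cluster distinction concrete, 
here is a small Python sketch.  The helper approx_clusters() is 
hypothetical and only groups a base code point with the combining marks 
that follow it.  That is a crude approximation for illustration, not 
the Unicode text segmentation rules that ICU implements.

```python
import unicodedata

def approx_clusters(s):
    """Group each code point with the combining marks that follow it.

    Approximation only: real grapheme cluster boundaries follow the
    Unicode text segmentation rules, which cover many more cases.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch    # attach combining mark to previous cluster
        else:
            clusters.append(ch)   # start a new cluster
    return clusters

s = "e\u0301a"                    # "e" + combining acute accent, then "a"
print(len(s))                     # count by code point
print(len(approx_clusters(s)))    # count by approximate cluster
```

Under the code point definition this string has three characters, while 
a reader sees two.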






