From: Peter Eisentraut <[email protected]>
To: Daniel Verite <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Pgsql-Hackers <[email protected]>
Subject: Re: Support LIKE with nondeterministic collations
Date: Fri, 3 May 2024 20:53:52 +0200
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>

On 03.05.24 16:58, Daniel Verite wrote:
>     * Generating bounds for a sort key (prefix matching)
> 
>     Having sort keys for strings allows for easy creation of bounds -
>     sort keys that are guaranteed to be smaller or larger than any sort
>     key from a given range. For example, if bounds are produced for a
>     sortkey of string “smith”, strings between upper and lower bounds
>     with one level would include “Smith”, “SMITH”, “sMiTh”. Two kinds
>     of upper bounds can be generated - the first one will match only
>     strings of equal length, while the second one will match all the
>     strings with the same initial prefix.
> 
>     CLDR 1.9/ICU 4.6 and later map U+FFFF to a collation element with
>     the maximum primary weight, so that for example the string
>     “smith\uFFFF” can be used as the upper bound rather than modifying
>     the sort key for “smith”.
> 
> In other words it says that
> 
>    col LIKE 'smith%' collate "nd"
> 
> is equivalent to:
> 
>    col >= 'smith' collate "nd" AND col < U&'smith\ffff' collate "nd"
> 
> which could be obtained from an index scan, assuming a btree
> index on "col" collate "nd".
> 
> U+FFFF is a valid code point but a "non-character" [1] so it's
> not supposed to be present in normal strings.

Thanks, this could be very useful!
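
For anyone who wants to try the rewritten range form by hand, here is a
minimal sketch. The collation definition (ICU locale 'und-u-ks-level1'),
the table name "people", the index name, and the sample rows are all
assumptions made up for illustration; none of them come from the patch
or from the quoted ICU text. It also assumes a UTF8 database built with
ICU support.

```sql
-- Hypothetical setup: a primary-strength (case- and accent-insensitive)
-- nondeterministic ICU collation, named "nd" as in the quoted example.
CREATE COLLATION nd (
    provider = icu,
    locale = 'und-u-ks-level1',
    deterministic = false
);

-- Illustrative table; the column carries the collation, so comparisons
-- and the btree index below both use "nd" implicitly.
CREATE TABLE people (col text COLLATE nd);
CREATE INDEX people_col_idx ON people (col);

INSERT INTO people VALUES
    ('smith'), ('Smith'), ('SMITHSON'), ('smit'), ('smythe');

-- Hand-written range form of: col LIKE 'smith%' collate "nd"
-- U&'smith\ffff' appends U+FFFF, which the quoted ICU text describes
-- as mapping to the maximum primary weight.
SELECT col
FROM people
WHERE col >= 'smith'
  AND col < U&'smith\ffff';
```

If the upper bound behaves as the quoted text describes, the rows whose
values start with "smith" at the primary level are the ones intended to
fall inside the range, while 'smit' and 'smythe' fall outside it.
Running EXPLAIN on the SELECT is one way to check whether the planner
picks the btree index for a given data set.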






