From: Peter Eisentraut <[email protected]>
To: Daniel Verite <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Pgsql-Hackers <[email protected]>
Subject: Re: Support LIKE with nondeterministic collations
Date: Fri, 3 May 2024 20:53:52 +0200
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>
On 03.05.24 16:58, Daniel Verite wrote:
> * Generating bounds for a sort key (prefix matching)
>
> Having sort keys for strings allows for easy creation of bounds -
> sort keys that are guaranteed to be smaller or larger than any sort
> key from a given range. For example, if bounds are produced for a
> sortkey of string “smith”, strings between upper and lower bounds
> with one level would include “Smith”, “SMITH”, “sMiTh”. Two kinds
> of upper bounds can be generated - the first one will match only
> strings of equal length, while the second one will match all the
> strings with the same initial prefix.
>
> CLDR 1.9/ICU 4.6 and later map U+FFFF to a collation element with
> the maximum primary weight, so that for example the string
> “smith\uFFFF” can be used as the upper bound rather than modifying
> the sort key for “smith”.
>
> In other words it says that
>
> col LIKE 'smith%' collate "nd"
>
> is equivalent to:
>
> col >= 'smith' collate "nd" AND col < U&'smith\ffff' collate "nd"
>
> which could be obtained from an index scan, assuming a btree
> index on "col" collate "nd".
>
> U+FFFF is a valid code point but a "non-character" [1] so it's
> not supposed to be present in normal strings.
Thanks, this could be very useful!
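
As a rough illustration of the bounds idea quoted above, here is a toy
sketch in Python. It is not PostgreSQL or ICU code. The names sort_key
and prefix_range are hypothetical, str.casefold() is only a stand-in
for a primary-level sort key, and a sorted list with bisect plays the
role of the btree index.

```python
import bisect

def sort_key(s: str) -> str:
    # Assumption: casefold() stands in for a level-1 (case-insensitive)
    # sort key. A real nondeterministic collation would use ICU sort keys.
    return s.casefold()

def prefix_range(sorted_keys, prefix):
    """Return (lo, hi) so that sorted_keys[lo:hi] are the keys that
    satisfy key >= key(prefix) AND key < key(prefix + U+FFFF)."""
    lower = sort_key(prefix)
    # In this toy model U+FFFF sorts after every other BMP character.
    # Code points above U+FFFF would sort even higher here, which is a
    # limitation of the stand-in and not of the ICU behaviour quoted above.
    upper = sort_key(prefix + "\uffff")
    lo = bisect.bisect_left(sorted_keys, lower)
    hi = bisect.bisect_left(sorted_keys, upper)
    return lo, hi

words = ["Jones", "SMITH", "Smithson", "sMiTh", "smythe", "Smit"]
index = sorted(words, key=sort_key)      # entries in "index" order
keys = [sort_key(w) for w in index]      # the keys the index is ordered by

lo, hi = prefix_range(keys, "smith")
matches = index[lo:hi]
print(matches)
```

The two bisect calls correspond to the two ends of the range condition
in the rewritten query, so only the entries between them are visited.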