public inbox for [email protected]  
From: Jeff Davis <[email protected]>
To: Daniel Verite <[email protected]>
To: Peter Eisentraut <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Pgsql-Hackers <[email protected]>
Subject: Re: Support LIKE with nondeterministic collations
Date: Wed, 31 Jul 2024 15:26:34 -0700
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>

On Fri, 2024-05-03 at 16:58 +0200, Daniel Verite wrote:
>    * Generating bounds for a sort key (prefix matching)
> 
>    Having sort keys for strings allows for easy creation of bounds -
>    sort keys that are guaranteed to be smaller or larger than any sort
>    key from a given range. For example, if bounds are produced for a
>    sortkey of string “smith”, strings between upper and lower bounds
>    with one level would include “Smith”, “SMITH”, “sMiTh”. Two kinds
>    of upper bounds can be generated - the first one will match only
>    strings of equal length, while the second one will match all the
>    strings with the same initial prefix.
> 
>    CLDR 1.9/ICU 4.6 and later map U+FFFF to a collation element with
>    the maximum primary weight, so that for example the string
>    “smith\uFFFF” can be used as the upper bound rather than modifying
>    the sort key for “smith”.
> 
> In other words it says that
> 
>   col LIKE 'smith%' collate "nd"
> 
> is equivalent to:
> 
>   col >= 'smith' collate "nd" AND col < U&'smith\ffff' collate "nd"

That logic seems to assume something about the collation. If you have a
collation that orders strings by their sha256 hash, that would entirely
break the connection between prefixes and ranges, and it wouldn't work.
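
A minimal sketch of that concern, not taken from the thread or from any
patch: plain code point order stands in for a well-behaved collation, and
a hypothetical sha256_key() plays the hash-ordered one. The helper names
and the sample strings are assumptions made for illustration, and the
sketch does not model case-insensitive matching under a nondeterministic
collation.

```python
import hashlib


def codepoint_key(s: str) -> str:
    # Python compares str values by code point, so the string itself
    # serves as the sort key for this stand-in collation.
    return s


def sha256_key(s: str) -> bytes:
    # Hypothetical "collation" that orders strings by their SHA-256 digest.
    return hashlib.sha256(s.encode("utf-8")).digest()


def in_prefix_range(value: str, prefix: str, key) -> bool:
    # Range form of the rewrite: prefix <= value < prefix + U+FFFF,
    # with both comparisons done on sort keys.
    return key(prefix) <= key(value) < key(prefix + "\uffff")


def mismatches(values, prefix, key):
    # Values for which the range test disagrees with a plain prefix match.
    return [
        v for v in values
        if v.startswith(prefix) != in_prefix_range(v, prefix, key)
    ]


samples = (
    [f"smith{i}" for i in range(500)]
    + [f"jones{i}" for i in range(500)]
    + ["smith", "smiti", "smitg"]
)

# Code point order keeps every "smith..." string inside the range.
print(len(mismatches(samples, "smith", codepoint_key)))

# Hash order scatters them, so the range test and the prefix test disagree.
print(len(mismatches(samples, "smith", sha256_key)))
```

With code point order the two tests agree on every sample. Under the hash
ordering they disagree, because a digest carries no information about
which strings share a prefix.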

Is there something about the way collations are defined that inherently
maintains a connection between a prefix and a range? Does it remain
true even when strange rules are added to a collation?

Regards,
	Jeff Davis