public inbox for [email protected]  
help / color / mirror / Atom feed
From: PG Doc comments form <[email protected]>
To: [email protected]
Cc: [email protected]
Subject: Does the POSITION() function takes into account the COLLATION... or not ?!?
Date: Tue, 15 Feb 2022 16:17:53 +0000
Message-ID: <[email protected]> (raw)

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/functions-string.html
Description:

The doc doesn't say anything about the impact of the COLLATION on the
results of any of the string functions, so I tried some interesting tests
with POSITION():

SELECT POSITION(('ß' COLLATE "de_DE.utf8") IN 'DASS'); -- should return 3
instead of 0 !?!
SELECT POSITION(('ss' COLLATE "de_DE.utf8") IN 'daß'); -- should return 3
instead of 0 !?!

SELECT POSITION(('oe' COLLATE "fr_FR.utf8") IN 'bœuf'); -- should return 2
instead of 0 !?!
SELECT POSITION(('œ' COLLATE "fr_FR.utf8") IN 'boeuf'); -- should return 2
instead of 0 !?!

SELECT POSITION(('å' COLLATE "en_US.utf8") IN 'yeah'); -- should return 3
instead of 0 !?!
SELECT POSITION(('o' COLLATE "en_US.utf8") IN 'ångström'); -- should return
7 instead of 0 !?!

==> up to here, this seems pretty enough to conclude that POSITION() doesn't
care at all about COLLATION and always perform a byte search.

Now the great surprise comes !...

CREATE COLLATION public.case_and_accent_insensitive (
      provider = icu, 
      locale = 'und-u-ks-level1', 
      deterministic = false
    );
SELECT POSITION(('o' COLLATE public.case_and_accent_insensitive) IN
'ångström'); 

=> gives ERROR: "nondeterministic collations are not supported for substring
searches"

Does the POSITION() function pretends taking into account the COLLATION ??
or not ??

- If not, then why the hell is there this error message about
nondeterministic collations while the POSITION() doesn't care at all about
the COLLATION...
- If yes, then the first 6 lines of SQL above are returning the wrong
value... (are there any specific technical limitations here ?)

I would like to have something in the doc about that... i.e. either some
examples showing how the COLLATION is impacting the results of the
POSITION() function ; or a statement which confirms that the POSITION()
function doesn't care at all about the COLLATION (+ the advice to
systematically add ...COLLATE "C"... on one of the POSITION()'s arguments in
order to avoid the dumb "nondeterministic collations are not supported"
error, while in fine no collations at all are supported)

Thanks.


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Does the POSITION() function takes into account the COLLATION... or not ?!?
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox