public inbox for [email protected]  
help / color / mirror / Atom feed
Does the POSITION() function takes into account the COLLATION... or not ?!?
6+ messages / 5 participants
[nested] [flat]

* Does the POSITION() function takes into account the COLLATION... or not ?!?
@ 2022-02-15 16:17  PG Doc comments form <[email protected]>
  0 siblings, 2 replies; 6+ messages in thread

From: PG Doc comments form @ 2022-02-15 16:17 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/functions-string.html
Description:

The doc doesn't say anything about the impact of the COLLATION on the
results of any of the string functions, so I tried some interesting tests
with POSITION():

SELECT POSITION(('ß' COLLATE "de_DE.utf8") IN 'DASS'); -- should return 3
instead of 0 !?!
SELECT POSITION(('ss' COLLATE "de_DE.utf8") IN 'daß'); -- should return 3
instead of 0 !?!

SELECT POSITION(('oe' COLLATE "fr_FR.utf8") IN 'bœuf'); -- should return 2
instead of 0 !?!
SELECT POSITION(('œ' COLLATE "fr_FR.utf8") IN 'boeuf'); -- should return 2
instead of 0 !?!

SELECT POSITION(('å' COLLATE "en_US.utf8") IN 'yeah'); -- should return 3
instead of 0 !?!
SELECT POSITION(('o' COLLATE "en_US.utf8") IN 'ångström'); -- should return
7 instead of 0 !?!

==> up to here, this seems pretty enough to conclude that POSITION() doesn't
care at all about COLLATION and always perform a byte search.

Now the great surprise comes !...

CREATE COLLATION public.case_and_accent_insensitive (
      provider = icu, 
      locale = 'und-u-ks-level1', 
      deterministic = false
    );
SELECT POSITION(('o' COLLATE public.case_and_accent_insensitive) IN
'ångström'); 

=> gives ERROR: "nondeterministic collations are not supported for substring
searches"

Does the POSITION() function pretends taking into account the COLLATION ??
or not ??

- If not, then why the hell is there this error message about
nondeterministic collations while the POSITION() doesn't care at all about
the COLLATION...
- If yes, then the first 6 lines of SQL above are returning the wrong
value... (are there any specific technical limitations here ?)

I would like to have something in the doc about that... i.e. either some
examples showing how the COLLATION is impacting the results of the
POSITION() function ; or a statement which confirms that the POSITION()
function doesn't care at all about the COLLATION (+ the advice to
systematically add ...COLLATE "C"... on one of the POSITION()'s arguments in
order to avoid the dumb "nondeterministic collations are not supported"
error, while in fine no collations at all are supported)

Thanks.


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Does the POSITION() function takes into account the COLLATION... or not ?!?
@ 2022-02-16 10:56  Peter Eisentraut <[email protected]>
  parent: PG Doc comments form <[email protected]>
  1 sibling, 1 reply; 6+ messages in thread

From: Peter Eisentraut @ 2022-02-16 10:56 UTC (permalink / raw)
  To: [email protected]; [email protected]

On 15.02.22 17:17, PG Doc comments form wrote:
> Does the POSITION() function pretends taking into account the COLLATION ??
> or not ??
> 
> - If not, then why the hell is there this error message about
> nondeterministic collations while the POSITION() doesn't care at all about
> the COLLATION...
> - If yes, then the first 6 lines of SQL above are returning the wrong
> value... (are there any specific technical limitations here ?)

I think you have that backwards.  Your examples would only succeed if 
POSITION() supported nondeterministic collations, but it doesn't.





^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Does the POSITION() function takes into account the COLLATION... or not ?!?
@ 2022-02-16 11:01  Pierre-Aurélien GEORGES <[email protected]>
  parent: Peter Eisentraut <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Pierre-Aurélien GEORGES @ 2022-02-16 11:01 UTC (permalink / raw)
  To: Peter Eisentraut <[email protected]>; [email protected]; [email protected]

ok, let's take it forwards, then :-)

Can someone provide some examples where a supported deterministic 
collation is having an impact on the result of a call to function 
POSITION() ?

(as I said in my message, examples would be greatly appreciated)


Le 16/02/2022 à 11:56, Peter Eisentraut a écrit :
> On 15.02.22 17:17, PG Doc comments form wrote:
>> Does the POSITION() function pretends taking into account the 
>> COLLATION ??
>> or not ??
>>
>> - If not, then why the hell is there this error message about
>> nondeterministic collations while the POSITION() doesn't care at all 
>> about
>> the COLLATION...
>> - If yes, then the first 6 lines of SQL above are returning the wrong
>> value... (are there any specific technical limitations here ?)
>
> I think you have that backwards.  Your examples would only succeed if 
> POSITION() supported nondeterministic collations, but it doesn't.





^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Does the POSITION() function takes into account the COLLATION... or not ?!?
@ 2022-02-16 14:07  Peter Eisentraut <[email protected]>
  parent: Pierre-Aurélien GEORGES <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Peter Eisentraut @ 2022-02-16 14:07 UTC (permalink / raw)
  To: Pierre-Aurélien GEORGES <[email protected]>; +Cc: [email protected]; [email protected]

On Wed, Feb 16, 2022 at 12:01 PM Pierre-Aurélien GEORGES <
[email protected]> wrote:

> ok, let's take it forwards, then :-)
>
> Can someone provide some examples where a supported deterministic
> collation is having an impact on the result of a call to function
> POSITION() ?
>

There are no such cases currently.


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Does the POSITION() function takes into account the COLLATION... or not ?!?
@ 2022-02-16 16:10  David G. Johnston <[email protected]>
  parent: PG Doc comments form <[email protected]>
  1 sibling, 1 reply; 6+ messages in thread

From: David G. Johnston @ 2022-02-16 16:10 UTC (permalink / raw)
  To: [email protected]; Pg Docs <[email protected]>

On Tue, Feb 15, 2022 at 11:17 AM PG Doc comments form <
[email protected]> wrote:

> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/14/functions-string.html
> Description:
>
>
> SELECT POSITION(('å' COLLATE "en_US.utf8") IN 'yeah'); -- should return 3
> instead of 0 !?!
> SELECT POSITION(('o' COLLATE "en_US.utf8") IN 'ångström'); -- should return
> 7 instead of 0 !?!
>
> ==> up to here, this seems pretty enough to conclude that POSITION()
> doesn't
> care at all about COLLATION and always perform a byte search.
>

IIUC "Performing a byte search" is what happens when you use a
deterministic collation where the only test is for equality.  So your
examples are not useful in distinguishing whether a collation agnostic byte
search or a deterministic collation search is happening.



> I would like to have something in the doc about that... i.e. either some
> examples showing how the COLLATION is impacting the results of the
> POSITION() function


I don't disagree, but lack of documentation regarding string functions and
collations is not just limited to the position function.  I find the fact
the word collation doesn't appear anywhere on the string functions page to
be potentially worthy of change.

How collations behave is documented, in particular:

"A collation is either deterministic or nondeterministic. A deterministic
collation uses deterministic comparisons, which means that it considers
strings to be equal only if they consist of the same byte sequence."

I'll admit I'm definitely an unfavorably biased observer here and don't
deal with these nuances on a daily basis.  I have the general impression
that our documentation is correct and sufficient but could be made more
user-friendly.  Updating a single function doesn't do that though and in
some ways makes things worse when other related elements, and the
presentation of the material as a whole, doesn't go along with that single
change.  Based on this, and the above observation about your test cases, I
don't see much motivation for change here.  The effort seems to outweigh
the reward.  But for someone who feels differently and submits a patch
there is, IMO, room enough for improvement that a well-written one is
likely to be welcomed.

David J.


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Does the POSITION() function takes into account the COLLATION... or not ?!?
@ 2022-02-16 16:18  Tom Lane <[email protected]>
  parent: David G. Johnston <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Tom Lane @ 2022-02-16 16:18 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: [email protected]; Pg Docs <[email protected]>

"David G. Johnston" <[email protected]> writes:
> On Tue, Feb 15, 2022 at 11:17 AM PG Doc comments form <
> [email protected]> wrote:
>> ==> up to here, this seems pretty enough to conclude that POSITION()
>> doesn't
>> care at all about COLLATION and always perform a byte search.

> How collations behave is documented, in particular:

> "A collation is either deterministic or nondeterministic. A deterministic
> collation uses deterministic comparisons, which means that it considers
> strings to be equal only if they consist of the same byte sequence."

Right.  POSITION does indeed do bytewise search, which is a legal
implementation for any deterministic collation, but perhaps not
for a nondeterministic one.  Thus, it throws a feature-not-supported
error if asked to use a nondeterministic collation.  As you say,
this is far from the only function that acts that way.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 6+ messages in thread


end of thread, other threads:[~2022-02-16 16:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2022-02-15 16:17 Does the POSITION() function takes into account the COLLATION... or not ?!? PG Doc comments form <[email protected]>
2022-02-16 10:56 ` Peter Eisentraut <[email protected]>
2022-02-16 11:01   ` Pierre-Aurélien GEORGES <[email protected]>
2022-02-16 14:07     ` Peter Eisentraut <[email protected]>
2022-02-16 16:10 ` David G. Johnston <[email protected]>
2022-02-16 16:18   ` Tom Lane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox