Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nKMsx-0000dZ-Kp for pgsql-docs@arkaria.postgresql.org; Wed, 16 Feb 2022 16:10:31 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1nKMsw-0008Ku-Hu for pgsql-docs@arkaria.postgresql.org; Wed, 16 Feb 2022 16:10:30 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nKMsw-0008Kl-4C for pgsql-docs@lists.postgresql.org; Wed, 16 Feb 2022 16:10:30 +0000 Received: from mail-vs1-xe31.google.com ([2607:f8b0:4864:20::e31]) by makus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1nKMst-00009s-95 for pgsql-docs@lists.postgresql.org; Wed, 16 Feb 2022 16:10:29 +0000 Received: by mail-vs1-xe31.google.com with SMTP id i27so2979685vsr.10 for ; Wed, 16 Feb 2022 08:10:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=NwMdqfsu0+j2VrmVb5L5fD7OO6ytPSQBIVfZWjQbP2Y=; b=SJoryBExAIfFRiRlfhoGFxJhig5drH3hGyuiRTHZUEDdw7KpJg+BTztHEuy1fzjdOU mIqsQLyPXvKb67p9joAJCM/HRsAm5UFHgh7mEyF/PRAXxkKW76A/5WGP3W4jwWIQebjq lbUzO7nuPlPJAJqYo64b85AR0l5gBokgGjAyFWYdmr6aiamw+LluCzReXnt+cqb+xqM6 N4elInHRUvf2KB9O/tTOvrYt3RsYlnm2fy9ESUgz02rzB8ZUK6dVFn8KcRcGA2WSYaYc 6xK/jMMNYa006qfPZYRkI5WgyLiGPwMbJIMNtdlUyRzXy5l0yMkt/z14VTqkjUr2zsFj OU6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=NwMdqfsu0+j2VrmVb5L5fD7OO6ytPSQBIVfZWjQbP2Y=; b=WTJqCD6oo3vslaORhvu83TN9E+/uyKsLdtzLnPmmGwSs+AwcNKFHPTGSEIf1kO00nI Xs/hDybdEsXDfUCY7GkSJ/TwFKkI6bny3Acngsoc0i50mrk8rNyoiCKZEtpGKdu9cWk4 7bUOeiclbXY4TVBdWKG6TlhlpHM6rrz/oeccfGDk59Eiu3onI6/qeV5UxVrFwpoTKw/P 8zNZ9XFP32+scsid07povtCVzDsorJw9+ISWDNZheqPc5O1SRFME0szWJ9mLIoAQs6KU c7TcYF5EnKEJ0ltktPHdyIVqE7jJz1JN3tctoVLC9n4XzWACVp1g+PUH2qTkbzQ/Ymwg rKgA== X-Gm-Message-State: AOAM531PXJCkBy7tJf7HzMSHKaC9eDJzyXTcScvi35VEJzlUTx3/fILR B5aObTMJNfAemwhqf80UNJ/nnBjiwICZkTSuuH0= X-Google-Smtp-Source: ABdhPJyoVI6kum1zjTQUxJvaML1vTLFxZ7R7UF+Nep8NLg2mEwT8vgzRjBI59eTnudwAbZz7I0z2NMePwAYBWFOSC0w= X-Received: by 2002:a67:c01c:0:b0:31b:6444:2ba2 with SMTP id v28-20020a67c01c000000b0031b64442ba2mr1445101vsi.40.1645027825767; Wed, 16 Feb 2022 08:10:25 -0800 (PST) MIME-Version: 1.0 References: <164494187300.23318.373331246819207718@wrigleys.postgresql.org> In-Reply-To: <164494187300.23318.373331246819207718@wrigleys.postgresql.org> From: "David G. Johnston" Date: Wed, 16 Feb 2022 09:10:08 -0700 Message-ID: Subject: Re: Does the POSITION() function takes into account the COLLATION... or not ?!? To: pageorge@unice.fr, Pg Docs Content-Type: multipart/alternative; boundary="00000000000033565e05d824e344" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000033565e05d824e344 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Feb 15, 2022 at 11:17 AM PG Doc comments form < noreply@postgresql.org> wrote: > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/14/functions-string.html > Description: > > > SELECT POSITION(('=C3=A5' COLLATE "en_US.utf8") IN 'yeah'); -- should ret= urn 3 > instead of 0 !?! > SELECT POSITION(('o' COLLATE "en_US.utf8") IN '=C3=A5ngstr=C3=B6m'); -- s= hould return > 7 instead of 0 !?! > > =3D=3D> up to here, this seems pretty enough to conclude that POSITION() > doesn't > care at all about COLLATION and always perform a byte search. > IIUC "Performing a byte search" is what happens when you use a deterministic collation where the only test is for equality. So your examples are not useful in distinguishing whether a collation agnostic byte search or a deterministic collation search is happening. > I would like to have something in the doc about that... i.e. either some > examples showing how the COLLATION is impacting the results of the > POSITION() function I don't disagree, but lack of documentation regarding string functions and collations is not just limited to the position function. I find the fact the word collation doesn't appear anywhere on the string functions page to be potentially worthy of change. How collations behave is documented, in particular: "A collation is either deterministic or nondeterministic. A deterministic collation uses deterministic comparisons, which means that it considers strings to be equal only if they consist of the same byte sequence." I'll admit I'm definitely an unfavorably biased observer here and don't deal with these nuances on a daily basis. I have the general impression that our documentation is correct and sufficient but could be made more user-friendly. Updating a single function doesn't do that though and in some ways makes things worse when other related elements, and the presentation of the material as a whole, doesn't go along with that single change. Based on this, and the above observation about your test cases, I don't see much motivation for change here. The effort seems to outweigh the reward. But for someone who feels differently and submits a patch there is, IMO, room enough for improvement that a well-written one is likely to be welcomed. David J. --00000000000033565e05d824e344 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Tue, Feb 15, 2022 at 11:17 AM PG Doc comments form <= noreply@postgresql.org> wr= ote:
The following documentation comment has been l= ogged on the website:

Page: https://www.postgresql.org/docs/14/fun= ctions-string.html
Description:


SELECT POSITION(('=C3=A5' COLLATE "en_US.utf8") IN 'y= eah'); -- should return 3
instead of 0 !?!
SELECT POSITION(('o' COLLATE "en_US.utf8") IN '=C3=A5= ngstr=C3=B6m'); -- should return
7 instead of 0 !?!

=3D=3D> up to here, this seems pretty enough to conclude that POSITION()= doesn't
care at all about COLLATION and always perform a byte search.

IIUC "Performing a byte search" is what= happens when you use a deterministic=C2=A0collation where the only test is= for equality.=C2=A0 So your examples are not useful in distinguishing whet= her a collation agnostic byte search or a deterministic collation search is= happening.

=C2=A0
I would like to have something in the doc about that... i.e. either some examples showing how the COLLATION is impacting the results of the
POSITION() function

I don't disagree= , but lack of documentation regarding string functions and collations is no= t just limited to the position function.=C2=A0 I find the fact the word col= lation doesn't appear anywhere on the string functions page to be poten= tially worthy of change.

How collations b= ehave is documented, in particular:

"A collation = is either deterministic or nondeterministic. A deterministic collation uses= deterministic comparisons, which means that it considers strings to be equ= al only if they consist of the same byte sequence."

I'll admit I'm definitely an unfavorably biased o= bserver here and don't deal with these nuances on a daily basis.=C2=A0 = I have the general impression that our documentation is correct and suffici= ent but could be made more user-friendly.=C2=A0 Updating a single function = doesn't do that though and in some ways makes things worse when other r= elated elements, and the presentation of the material as a whole, doesn'= ;t go along with that single change.=C2=A0 Based on this, and the above obs= ervation about your test cases, I don't see much motivation for change = here.=C2=A0 The effort seems to outweigh the reward.=C2=A0 But for someone = who feels differently and submits a patch there is, IMO, room enough for im= provement that a well-written one is likely to be welcomed.

David J.

--00000000000033565e05d824e344--