public inbox for [email protected]
help / color / mirror / Atom feedFrom: Masahiko Sawada <[email protected]>
To: Tom Lane <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Aleksander Alekseev <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Chao Li <[email protected]>
Cc: Dagfinn Ilmari Mannsåker <[email protected]>
Cc: Andrey Borodin <[email protected]>
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Date: Wed, 25 Mar 2026 18:09:22 -0700
Message-ID: <CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<CAJ7c6TNCGT2eZ8JttAZKUhVv3jJFTHvjFw6typzSGkM5_vePow@mail.gmail.com>
<CAD21AoCcikUgU5coyvM7PLJg5M-75LzQhm-Uxmqn_7M=DHkj+w@mail.gmail.com>
<CAJ7c6TPu7NGrSj-V2-ns0GTNGx9cmoKQgzt-p2MpJepiGjQnYw@mail.gmail.com>
<CAD21AoC2Ooy=W=4YjQ_GO5nEYZ2tsn43=TkLf7ysE-96ygohKg@mail.gmail.com>
<[email protected]>
<CAD21AoA7UMWG8YO2Edf1UWSdtaof6gOnF98bJGoz1z1JBzozuQ@mail.gmail.com>
<CAJ7c6TMd4ebu7oha2C9QZ_aKNE5-keEJ=bE_k_-vrMuw2p1LNg@mail.gmail.com>
<CAJ7c6TNzQES3EhzoftNUEg=mfi69SYEQ=Fh-=rr84b7zHECHeg@mail.gmail.com>
<CAD21AoAj+KOc74wOr04oGo9ySYmf_TYkMi5fByLF80cqyZmB=g@mail.gmail.com>
<[email protected]>
<CAJ7c6TN7fjWDWSVmhQuNCPYLVL9cs9ifc4D_Hj6wUQd2Lw2oNQ@mail.gmail.com>
<CAD21AoAMST1V1OE=yvg3vQn=p6DDJBAaaFDx39z=P-+Z9iXwOQ@mail.gmail.com>
<[email protected]>
<[email protected]>
On Wed, Mar 25, 2026 at 5:35 PM Tom Lane <[email protected]> wrote:
>
> Tomas Vondra <[email protected]> writes:
> > On 3/26/26 00:40, Tom Lane wrote:
> >> I believe what's happening there is that in cs_CZ locale,
> >> "V" doesn't follow simple ASCII sort ordering.
>
> > With cs_CZ all letters sort *before* numbers, while in en_US it's the
> > other way around. V is not special in any way.
>
> Ah, sorry, I should have researched a bit instead of relying on
> fading memory. The quirk I was thinking of is that in cs_CZ,
> "ch" sorts after "h":
>
> u8=# select 'h' < 'ch'::text collate "en_US";
> ?column?
> ----------
> f
> (1 row)
>
> u8=# select 'h' < 'ch'::text collate "cs_CZ";
> ?column?
> ----------
> t
> (1 row)
>
> Regular hex encoding isn't bitten by that because it doesn't
> use 'h' in the text form ... but this base32hex thingie does.
>
> However, your point is also correct:
>
> u8=# select '0' < 'C'::text ;
> ?column?
> ----------
> t
> (1 row)
>
> u8=# select '0' < 'C'::text collate "cs_CZ";
> ?column?
> ----------
> f
> (1 row)
>
> and that breaks "text ordering matches numeric ordering"
> for both traditional hex and base32hex. So maybe this
> is not as big a deal as I first thought. We need a fix
> for the new test though. Probably adding COLLATE "C"
> would be enough.
Thank you for the report and the analysis.
I've reproduced the issue with "cs_CZ" collation and adding COLLATE
"C" to the query resolves it. It seems also a good idea to add a note
in the documentation too as users might face the same issue. For
example,
To maintain the lexicographical sort order of the encoded data, ensure
that the text is sorted using the C collation (e.g., using COLLATE
"C"). Natural language collations may sort characters differently and
break the ordering.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
view thread (63+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
In-Reply-To: <CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox