public inbox for [email protected]
help / color / mirror / Atom feedFrom: Masahiko Sawada <[email protected]>
To: Tom Lane <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Aleksander Alekseev <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Chao Li <[email protected]>
Cc: Dagfinn Ilmari Mannsåker <[email protected]>
Cc: Andrey Borodin <[email protected]>
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Date: Wed, 25 Mar 2026 20:13:55 -0700
Message-ID: <CAD21AoCd9e5YvVWQY5JyEzK6tGLtJJWgw1+zgGpAwTOnJ7X1GQ@mail.gmail.com> (raw)
In-Reply-To: <CAD21AoC=92zAEmaGt+3U1jSD77jgdQT_HXUwPPtZEze2sJ=vWA@mail.gmail.com>
References: <[email protected]>
<[email protected]>
<CAJ7c6TNCGT2eZ8JttAZKUhVv3jJFTHvjFw6typzSGkM5_vePow@mail.gmail.com>
<CAD21AoCcikUgU5coyvM7PLJg5M-75LzQhm-Uxmqn_7M=DHkj+w@mail.gmail.com>
<CAJ7c6TPu7NGrSj-V2-ns0GTNGx9cmoKQgzt-p2MpJepiGjQnYw@mail.gmail.com>
<CAD21AoC2Ooy=W=4YjQ_GO5nEYZ2tsn43=TkLf7ysE-96ygohKg@mail.gmail.com>
<[email protected]>
<CAD21AoA7UMWG8YO2Edf1UWSdtaof6gOnF98bJGoz1z1JBzozuQ@mail.gmail.com>
<CAJ7c6TMd4ebu7oha2C9QZ_aKNE5-keEJ=bE_k_-vrMuw2p1LNg@mail.gmail.com>
<CAJ7c6TNzQES3EhzoftNUEg=mfi69SYEQ=Fh-=rr84b7zHECHeg@mail.gmail.com>
<CAD21AoAj+KOc74wOr04oGo9ySYmf_TYkMi5fByLF80cqyZmB=g@mail.gmail.com>
<[email protected]>
<CAJ7c6TN7fjWDWSVmhQuNCPYLVL9cs9ifc4D_Hj6wUQd2Lw2oNQ@mail.gmail.com>
<CAD21AoAMST1V1OE=yvg3vQn=p6DDJBAaaFDx39z=P-+Z9iXwOQ@mail.gmail.com>
<[email protected]>
<[email protected]>
<CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com>
<CAD21AoC=92zAEmaGt+3U1jSD77jgdQT_HXUwPPtZEze2sJ=vWA@mail.gmail.com>
On Wed, Mar 25, 2026 at 6:21 PM Masahiko Sawada <[email protected]> wrote:
>
> On Wed, Mar 25, 2026 at 6:09 PM Masahiko Sawada <[email protected]> wrote:
> >
> > On Wed, Mar 25, 2026 at 5:35 PM Tom Lane <[email protected]> wrote:
> > >
> > > Tomas Vondra <[email protected]> writes:
> > > > On 3/26/26 00:40, Tom Lane wrote:
> > > >> I believe what's happening there is that in cs_CZ locale,
> > > >> "V" doesn't follow simple ASCII sort ordering.
> > >
> > > > With cs_CZ all letters sort *before* numbers, while in en_US it's the
> > > > other way around. V is not special in any way.
> > >
> > > Ah, sorry, I should have researched a bit instead of relying on
> > > fading memory. The quirk I was thinking of is that in cs_CZ,
> > > "ch" sorts after "h":
> > >
> > > u8=# select 'h' < 'ch'::text collate "en_US";
> > > ?column?
> > > ----------
> > > f
> > > (1 row)
> > >
> > > u8=# select 'h' < 'ch'::text collate "cs_CZ";
> > > ?column?
> > > ----------
> > > t
> > > (1 row)
> > >
> > > Regular hex encoding isn't bitten by that because it doesn't
> > > use 'h' in the text form ... but this base32hex thingie does.
> > >
> > > However, your point is also correct:
> > >
> > > u8=# select '0' < 'C'::text ;
> > > ?column?
> > > ----------
> > > t
> > > (1 row)
> > >
> > > u8=# select '0' < 'C'::text collate "cs_CZ";
> > > ?column?
> > > ----------
> > > f
> > > (1 row)
> > >
> > > and that breaks "text ordering matches numeric ordering"
> > > for both traditional hex and base32hex. So maybe this
> > > is not as big a deal as I first thought. We need a fix
> > > for the new test though. Probably adding COLLATE "C"
> > > would be enough.
> >
> > Thank you for the report and the analysis.
> >
> > I've reproduced the issue with "cs_CZ" collation and adding COLLATE
> > "C" to the query resolves it. It seems also a good idea to add a note
> > in the documentation too as users might face the same issue. For
> > example,
> >
> > To maintain the lexicographical sort order of the encoded data, ensure
> > that the text is sorted using the C collation (e.g., using COLLATE
> > "C"). Natural language collations may sort characters differently and
> > break the ordering.
> >
>
> Attached the patch doing the above idea.
Pushed the fix without the documentation changes to make the buildfarm
animals happy first.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
view thread (63+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
In-Reply-To: <CAD21AoCd9e5YvVWQY5JyEzK6tGLtJJWgw1+zgGpAwTOnJ7X1GQ@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox