From: Masahiko Sawada <[email protected]>
To: Sergey Prokhorenko <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Date: Thu, 23 Oct 2025 14:00:50 -0700
Message-ID: <CAD21AoCxy_3VW2_z5Rxc-tFsuqeyGsA-F_kD-tx6XXBC56nTCg@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<CAJ7c6TOramr1UTLcyB128LWMqita1Y7=arq3KHaU=qikf5yKOQ@mail.gmail.com>
	<[email protected]>
	<[email protected]>

On Thu, Oct 23, 2025 at 10:34 AM Sergey Prokhorenko
<[email protected]> wrote:
>
> >> The value of converting uuid to base32 is not obvious though, so I
> >> would recommend explaining it in more detail.
>
> > Yes, and maybe some examples of other systems that adopted this format would be handy too.
>
> DNSSEC (https://en.wikipedia.org/wiki/Domain_Name_System_Security_Extensions)
> many encoders and decoders
>
> > Sergey, can you, please, extend reasoning why this particular format is prominent? RFC 4648 describes a bunch of formats.
>
>
> > Best regards, Andrey Borodin.
>
>
> Base32hex:
> 1. Preserves sort order (unlike base64)
> 2. Compact
> 3. Standardized and therefore implemented consistently everywhere
> 4. Implemented in many programming languages' standard libraries
> 5. Does not require specifying character case during dictation
> 6. Has simple and high-performance encoding and decoding algorithms (necessary for system integration using JSON)
>
> Having a single compact text encoding eliminates the problem of incompatibility. The authors and contributors of RFC 9562 were categorically against having multiple encodings for UUIDs. They wanted only one compact, sort-order-preserving text encoding. For compatibility, they added the canonical UUID format. Due to time constraints, the compact encoding was not included in RFC 9562.
>
> In databases, UUIDs should preferably be stored in binary format (the UUID type in PostgreSQL) according to RFC 9562.
>
> Intermediate formats (bytea) reduce performance, which is the very reason we even abandoned the more compact base36 encoding.

Given that what uuid_to_base32hex() actually does is encode the
input UUID, I think it would be confusing to have a similar function
separate from encode(). Also, we could end up introducing a dedicated
pair of encoding and decoding functions for UUID for every encoding
method we want to support, bloating the set of built-in functions.

So as a first step, +1 for supporting base32hex in the encode() and
decode() functions and for supporting UUID <-> bytea conversion. I
believe that would cover most use cases, and the cost of the UUID <->
bytea conversion is negligible.
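
To make the discussion concrete, here is a minimal Python sketch of what
a base32hex encoding of the 16 UUID bytes could look like. It is not the
patch's implementation: the function names only mirror the thread
subject, and dropping the "=" padding (26 characters instead of 32) is
an assumption of this sketch. It uses the RFC 4648 "Extended Hex"
alphabet, whose ascending ASCII order is what makes the text sort the
same way as the binary value.

```python
import uuid

# RFC 4648 "Extended Hex" alphabet: digits, then A-V, in ascending ASCII order.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def uuid_to_base32hex(u: uuid.UUID) -> str:
    """Encode the 16 UUID bytes as 26 base32hex characters (no '=' padding)."""
    # 128 bits plus 2 zero bits on the right gives 130 bits = 26 groups of 5.
    n = int.from_bytes(u.bytes, "big") << 2
    return "".join(ALPHABET[(n >> shift) & 0x1F] for shift in range(125, -1, -5))


def base32hex_to_uuid(s: str) -> uuid.UUID:
    """Decode 26 unpadded base32hex characters back into a UUID."""
    if len(s) != 26:
        raise ValueError("expected 26 base32hex characters")
    n = 0
    for ch in s.upper():
        # str.index raises ValueError for characters outside the alphabet.
        n = (n << 5) | ALPHABET.index(ch)
    if n & 0x3:
        raise ValueError("non-zero padding bits")
    return uuid.UUID(int=n >> 2)


if __name__ == "__main__":
    ids = [uuid.uuid4() for _ in range(1000)]
    encoded = [uuid_to_base32hex(u) for u in ids]
    # Round trip is lossless.
    assert [base32hex_to_uuid(s) for s in encoded] == ids
    # Sorting the text form gives the same order as sorting the raw bytes.
    by_bytes = [uuid_to_base32hex(u) for u in sorted(ids, key=lambda u: u.bytes)]
    assert sorted(encoded) == by_bytes
```

Because every encoded value has the same length and the alphabet is in
ASCII order, plain string comparison matches byte-wise comparison of the
underlying UUIDs, which is the sort-order property cited in the quoted
list.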

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




