public inbox for [email protected]  
Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
26+ messages / 5 participants

* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
@ 2026-02-09 12:19 Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  0 siblings, 1 reply; 26+ messages in thread

From: Aleksander Alekseev @ 2026-02-09 12:19 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Andrey Borodin <[email protected]>; Masahiko Sawada <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

Hi,

> I briefly tested the patched version of v3. The implemented
> functionality works correctly.
>
> ---
> You can also add a case with the error from v3-0002
> "invalid base32hex end sequence" to the tests :
>
> +        ereport(ERROR,
> +                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                 errmsg("invalid base32hex end sequence"),
> +                 errhint("Input data has non-zero padding bits.")));
>
> ---
> I agree with Masahiko Sawada; information about conversions
> should be added to the documentation.
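
To show what the quoted "non-zero padding bits" check guards against,
here is a minimal standalone sketch in C. It is not code from the patch:
b32hex_value() and b32hex_has_nonzero_tail() are hypothetical helper
names, and the sketch deliberately ignores '=' padding and whitespace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Map one base32hex character to its 5-bit value, or -1 if invalid. */
static int
b32hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'V')
        return c - 'A' + 10;
    if (c >= 'a' && c <= 'v')
        return c - 'a' + 10;
    return -1;
}

/*
 * Hypothetical helper (not part of the patch): returns true when the
 * unpadded input would leave non-zero bits after the last whole byte.
 */
static bool
b32hex_has_nonzero_tail(const char *src, size_t len)
{
    uint32_t    buf = 0;
    int         nbits = 0;

    for (size_t i = 0; i < len; i++)
    {
        int         val = b32hex_value((unsigned char) src[i]);

        if (val < 0)
            return false;       /* invalid symbols are a different error */

        buf = (buf << 5) | (uint32_t) val;
        nbits += 5;
        if (nbits >= 8)
        {
            nbits -= 8;
            buf &= (1u << nbits) - 1;   /* drop the completed byte */
        }
    }

    return nbits > 0 && buf != 0;
}
```

With this logic '04' is fine (it is the encoding of the single byte
0x01), whereas '01' leaves the bits 01 behind. An input like
decode('01', 'base32hex') therefore looks like a reasonable candidate
for the requested regression test.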

Here is the rebased patch.

v4-0001 implements uuid <-> bytea casting
v4-0002 implements base32hex encoding/decoding
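
As an illustration of what the unpadded encoding in v4-0002 is expected
to produce, the following is a minimal standalone sketch in C. It is not
the patch code itself; the name b32hex_encode_unpadded() and its
signature are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

static const char b32hex_alphabet[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

/*
 * Unpadded base32hex encoder (illustrative only).  dst must have room
 * for (len * 8 + 4) / 5 characters plus a terminating NUL.  Returns the
 * number of characters written, not counting the NUL.
 */
static size_t
b32hex_encode_unpadded(const unsigned char *src, size_t len, char *dst)
{
    uint32_t    buf = 0;
    int         nbits = 0;
    size_t      out = 0;

    for (size_t i = 0; i < len; i++)
    {
        /* Append 8 input bits, then emit every complete 5-bit group. */
        buf = (buf << 8) | src[i];
        nbits += 8;
        while (nbits >= 5)
        {
            nbits -= 5;
            dst[out++] = b32hex_alphabet[(buf >> nbits) & 0x1F];
        }
        buf &= (1u << nbits) - 1;   /* keep only the unconsumed bits */
    }

    /* Left-align any leftover bits in a final character, zero-filled. */
    if (nbits > 0)
        dst[out++] = b32hex_alphabet[(buf << (5 - nbits)) & 0x1F];

    dst[out] = '\0';
    return out;
}
```

Feeding it the 16 bytes of 123e4567-e89b-12d3-a456-426614174000 gives
28V4APV8JC9D792M89J185Q000, the same 26-character value that appears in
the regression test output of the patch.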

Unless I missed something, 0001 is ready to be merged.

I only rebased v3 and improved the commit messages, but I didn't
account for Masahiko Sawada's feedback for 0002. Andrey, are you still
working on this, or can others pick it up?

The patch is not on the commitfest, so I'm about to add it.

-- 
Best regards,
Aleksander Alekseev


Attachments:

  [text/x-patch] v4-0002-Add-base32hex-encoding-support-to-encode-and-deco.patch (12.2K, 2-v4-0002-Add-base32hex-encoding-support-to-encode-and-deco.patch)
  download | inline diff:
From d80aad5edf1f02f27c2d9c2c001f5ef6b748b332 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Wed, 29 Oct 2025 15:53:12 +0400
Subject: [PATCH v4 2/2] Add base32hex encoding support to encode() and
 decode()

Implement base32hex encoding/decoding per RFC 4648 Section 7 for
encode() and decode() functions. This encoding uses the extended hex
alphabet (0-9, A-V) which preserves sort order.

The encode() function produces unpadded output, while decode() accepts
both padded and unpadded input. Decoding is case-insensitive.

This is particularly useful for encoding UUIDs compactly:

    SELECT encode(uuid_value::bytea, 'base32hex');

produces a 26-character string compared to the standard 36-character
UUID representation.

Author: Andrey Borodin <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Aleksander Alekseev <[email protected]>
Suggested-by: Sergey Prokhorenko <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml |  25 +++++
 src/backend/utils/adt/encode.c           | 124 +++++++++++++++++++++++
 src/test/regress/expected/uuid.out       |  88 ++++++++++++++++
 src/test/regress/sql/uuid.sql            |  27 +++++
 4 files changed, 264 insertions(+)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index b256381e01f..257b1bf4c6b 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -729,6 +729,7 @@
        <parameter>format</parameter> values are:
        <link linkend="encode-format-base64"><literal>base64</literal></link>,
        <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
+       <link linkend="encode-format-base32hex"><literal>base32hex</literal></link>,
        <link linkend="encode-format-escape"><literal>escape</literal></link>,
        <link linkend="encode-format-hex"><literal>hex</literal></link>.
       </para>
@@ -804,6 +805,30 @@
      </listitem>
     </varlistentry>
 
+    <varlistentry id="encode-format-base32hex">
+     <term>base32hex
+      <indexterm>
+       <primary>base32hex format</primary>
+      </indexterm></term>
+     <listitem>
+      <para>
+       The <literal>base32hex</literal> format is that of
+       <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
+       RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
+       (0-9, A-V) which preserves sort order when encoding binary data.
+       The <function>encode</function> function produces unpadded output,
+       while <function>decode</function> accepts both padded and unpadded
+       input. Decoding is case-insensitive and ignores whitespace characters.
+      </para>
+      <para>
+       This format is particularly useful for encoding UUIDs in a compact,
+       sortable format: <literal>encode(uuid_value::bytea, 'base32hex')</literal>
+       produces a 26-character string compared to the standard 36-character
+       UUID representation.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry id="encode-format-escape">
      <term>escape
      <indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index f5f835e944a..94cc0722422 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -825,6 +825,124 @@ esc_dec_len(const char *src, size_t srclen)
 	return len;
 }
 
+/*
+ * BASE32HEX
+ */
+
+static const char base32hex_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
+
+static uint64
+base32hex_enc_len(const char *src, size_t srclen)
+{
+	/* 5 bits per base32hex character, so round up (srclen * 8 + 4) / 5 */
+	return ((uint64) srclen * 8 + 4) / 5;
+}
+
+static uint64
+base32hex_dec_len(const char *src, size_t srclen)
+{
+	/* Decode length is (srclen * 5) / 8, but we may have padding */
+	return ((uint64) srclen * 5) / 8;
+}
+
+static uint64
+base32hex_encode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint64		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+
+	for (i = 0; i < srclen; i++)
+	{
+		/* Add 8 bits to the buffer */
+		bits_buffer = (bits_buffer << 8) | data[i];
+		bits_in_buffer += 8;
+
+		/* Extract 5-bit chunks while we have enough bits */
+		while (bits_in_buffer >= 5)
+		{
+			bits_in_buffer -= 5;
+			/* Extract top 5 bits */
+			dst[output_pos++] = base32hex_table[(bits_buffer >> bits_in_buffer) & 0x1F];
+			/* Clear the extracted bits by masking */
+			bits_buffer &= ((1ULL << bits_in_buffer) - 1);
+		}
+	}
+
+	/* Handle remaining bits (if any) */
+	if (bits_in_buffer > 0)
+	{
+		dst[output_pos++] = base32hex_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];
+	}
+
+	return output_pos;
+}
+
+static uint64
+base32hex_decode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint64		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+	size_t		decode_len = srclen;
+
+	/*
+	 * RFC 4648 allows padding with '=' to make the length a multiple of 8.
+	 * Count and skip trailing padding characters.
+	 */
+	while (decode_len > 0 && data[decode_len - 1] == '=')
+		decode_len--;
+
+	for (i = 0; i < decode_len; i++)
+	{
+		unsigned char c = data[i];
+		int			val;
+
+		/* Skip whitespace */
+		if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
+			continue;
+
+		/* Decode base32hex character (0-9, A-V, case-insensitive) */
+		if (c >= '0' && c <= '9')
+			val = c - '0';
+		else if (c >= 'A' && c <= 'V')
+			val = c - 'A' + 10;
+		else if (c >= 'a' && c <= 'v')
+			val = c - 'a' + 10;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen((const char *) &c), (const char *) &c)));
+
+		/* Add 5 bits to buffer */
+		bits_buffer = (bits_buffer << 5) | val;
+		bits_in_buffer += 5;
+
+		/* Extract 8-bit bytes when we have enough bits */
+		while (bits_in_buffer >= 8)
+		{
+			bits_in_buffer -= 8;
+			dst[output_pos++] = (unsigned char) (bits_buffer >> bits_in_buffer);
+			/* Clear the extracted bits */
+			bits_buffer &= ((1ULL << bits_in_buffer) - 1);
+		}
+	}
+
+	/* Verify no extra bits remain (padding bits should be zero) */
+	if (bits_in_buffer > 0 && (bits_buffer & ((1ULL << bits_in_buffer) - 1)) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid base32hex end sequence"),
+				 errhint("Input data has non-zero padding bits.")));
+
+	return output_pos;
+}
+
 /*
  * Common
  */
@@ -854,6 +972,12 @@ static const struct
 			pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
 		}
 	},
+	{
+		"base32hex",
+		{
+			base32hex_enc_len, base32hex_dec_len, base32hex_encode, base32hex_decode
+		}
+	},
 	{
 		"escape",
 		{
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 24486084aaf..86d21a29093 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -321,5 +321,93 @@ SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
 SELECT '\x1234567890abcdef'::bytea::uuid; -- error
 ERROR:  invalid length for UUID
 DETAIL:  Expected 16 bytes, got 8.
+-- base32hex encoding via encode/decode
+SELECT encode('00000000-0000-0000-0000-000000000000'::uuid::bytea, 'base32hex');
+           encode           
+----------------------------
+ 00000000000000000000000000
+(1 row)
+
+SELECT encode('11111111-1111-1111-1111-111111111111'::uuid::bytea, 'base32hex');
+           encode           
+----------------------------
+ 248H248H248H248H248H248H24
+(1 row)
+
+SELECT encode('ffffffff-ffff-ffff-ffff-ffffffffffff'::uuid::bytea, 'base32hex');
+           encode           
+----------------------------
+ VVVVVVVVVVVVVVVVVVVVVVVVVS
+(1 row)
+
+SELECT encode('123e4567-e89b-12d3-a456-426614174000'::uuid::bytea, 'base32hex');
+           encode           
+----------------------------
+ 28V4APV8JC9D792M89J185Q000
+(1 row)
+
+-- test decode with base32hex
+SELECT decode('00000000000000000000000000', 'base32hex')::uuid;
+                decode                
+--------------------------------------
+ 00000000-0000-0000-0000-000000000000
+(1 row)
+
+SELECT decode('28V4APV8JC9D792M89J185Q000', 'base32hex')::uuid;
+                decode                
+--------------------------------------
+ 123e4567-e89b-12d3-a456-426614174000
+(1 row)
+
+-- test round-trip conversions
+SELECT decode(encode('00000000-0000-0000-0000-000000000000'::uuid::bytea, 'base32hex'), 'base32hex')::uuid;
+                decode                
+--------------------------------------
+ 00000000-0000-0000-0000-000000000000
+(1 row)
+
+SELECT encode(decode('28V4APV8JC9D792M89J185Q000', 'base32hex')::uuid::bytea, 'base32hex');
+           encode           
+----------------------------
+ 28V4APV8JC9D792M89J185Q000
+(1 row)
+
+SELECT decode(encode('123e4567-e89b-12d3-a456-426614174000'::uuid::bytea, 'base32hex'), 'base32hex')::uuid;
+                decode                
+--------------------------------------
+ 123e4567-e89b-12d3-a456-426614174000
+(1 row)
+
+-- test case insensitivity
+SELECT decode('28v4apv8jc9d792m89j185q000', 'base32hex')::uuid;
+                decode                
+--------------------------------------
+ 123e4567-e89b-12d3-a456-426614174000
+(1 row)
+
+SELECT decode('28V4APV8JC9D792M89J185Q000', 'base32hex')::uuid;
+                decode                
+--------------------------------------
+ 123e4567-e89b-12d3-a456-426614174000
+(1 row)
+
+-- test RFC 4648 padding (32 chars with 6 '=' signs)
+SELECT decode('28V4APV8JC9D792M89J185Q000======', 'base32hex')::uuid;
+                decode                
+--------------------------------------
+ 123e4567-e89b-12d3-a456-426614174000
+(1 row)
+
+SELECT decode('00000000000000000000000000======', 'base32hex')::uuid;
+                decode                
+--------------------------------------
+ 00000000-0000-0000-0000-000000000000
+(1 row)
+
+-- test error cases for base32hex
+SELECT decode('28V4APV8JC9D792M89J185Q00W', 'base32hex')::uuid;  -- invalid character W
+ERROR:  invalid symbol "W" found while decoding base32hex sequence
+SELECT decode('28V4APV8JC9D792M89J185Q00!', 'base32hex')::uuid;  -- invalid character !
+ERROR:  invalid symbol "!" found while decoding base32hex sequence
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 63520d0b640..44e8fa8b243 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -151,5 +151,32 @@ SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
 SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
 SELECT '\x1234567890abcdef'::bytea::uuid; -- error
 
+-- base32hex encoding via encode/decode
+SELECT encode('00000000-0000-0000-0000-000000000000'::uuid::bytea, 'base32hex');
+SELECT encode('11111111-1111-1111-1111-111111111111'::uuid::bytea, 'base32hex');
+SELECT encode('ffffffff-ffff-ffff-ffff-ffffffffffff'::uuid::bytea, 'base32hex');
+SELECT encode('123e4567-e89b-12d3-a456-426614174000'::uuid::bytea, 'base32hex');
+
+-- test decode with base32hex
+SELECT decode('00000000000000000000000000', 'base32hex')::uuid;
+SELECT decode('28V4APV8JC9D792M89J185Q000', 'base32hex')::uuid;
+
+-- test round-trip conversions
+SELECT decode(encode('00000000-0000-0000-0000-000000000000'::uuid::bytea, 'base32hex'), 'base32hex')::uuid;
+SELECT encode(decode('28V4APV8JC9D792M89J185Q000', 'base32hex')::uuid::bytea, 'base32hex');
+SELECT decode(encode('123e4567-e89b-12d3-a456-426614174000'::uuid::bytea, 'base32hex'), 'base32hex')::uuid;
+
+-- test case insensitivity
+SELECT decode('28v4apv8jc9d792m89j185q000', 'base32hex')::uuid;
+SELECT decode('28V4APV8JC9D792M89J185Q000', 'base32hex')::uuid;
+
+-- test RFC 4648 padding (32 chars with 6 '=' signs)
+SELECT decode('28V4APV8JC9D792M89J185Q000======', 'base32hex')::uuid;
+SELECT decode('00000000000000000000000000======', 'base32hex')::uuid;
+
+-- test error cases for base32hex
+SELECT decode('28V4APV8JC9D792M89J185Q00W', 'base32hex')::uuid;  -- invalid character W
+SELECT decode('28V4APV8JC9D792M89J185Q00!', 'base32hex')::uuid;  -- invalid character !
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.0



  [text/x-patch] v4-0001-Allow-explicit-casting-between-bytea-and-UUID.patch (5.1K, 3-v4-0001-Allow-explicit-casting-between-bytea-and-UUID.patch)
  download | inline diff:
From 9d7db65047941eb1c9f8d79a3790719a486a3863 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <[email protected]>
Date: Tue, 28 Oct 2025 16:33:17 +0000
Subject: [PATCH v4 1/2] Allow explicit casting between bytea and UUID
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This enables using encode() and decode() to convert UUIDs to and from
alternative formats, such as base64.

Author:	Dagfinn Ilmari Mannsåker <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Aleksander Alekseev <[email protected]>
Reviewed-by: Jelte Fennema-Nio <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 src/backend/utils/adt/bytea.c      | 27 +++++++++++++++++++++++++++
 src/include/catalog/pg_cast.dat    |  6 ++++++
 src/include/catalog/pg_proc.dat    |  7 +++++++
 src/test/regress/expected/uuid.out | 16 ++++++++++++++++
 src/test/regress/sql/uuid.sql      |  4 ++++
 5 files changed, 60 insertions(+)

diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index fd7662d41ee..0e6d97412d0 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -28,6 +28,7 @@
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
+#include "utils/uuid.h"
 #include "varatt.h"
 
 /* GUC variable */
@@ -1340,3 +1341,29 @@ int8_bytea(PG_FUNCTION_ARGS)
 {
 	return int8send(fcinfo);
 }
+
+/* Cast bytea -> uuid */
+Datum
+bytea_uuid(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+	pg_uuid_t  *uuid;
+
+	if (len != UUID_LEN)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+				 errmsg("invalid length for UUID"),
+				 errdetail("Expected %d bytes, got %d.", UUID_LEN, len)));
+
+	uuid = (pg_uuid_t *) palloc(sizeof(pg_uuid_t));
+	memcpy(uuid->data, VARDATA_ANY(v), UUID_LEN);
+	PG_RETURN_UUID_P(uuid);
+}
+
+/* Cast uuid -> bytea; can just use uuid_send() */
+Datum
+uuid_bytea(PG_FUNCTION_ARGS)
+{
+	return uuid_send(fcinfo);
+}
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index 9b1cfb1b590..a7b6d812c5a 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -362,6 +362,12 @@
 { castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
   castcontext => 'e', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and uuid type
+{ castsource => 'bytea', casttarget => 'uuid', castfunc => 'uuid(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'uuid', casttarget => 'bytea', castfunc => 'bytea(uuid)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 83f6501df38..d2b30390671 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1208,6 +1208,13 @@
   proname => 'int8', prorettype => 'int8', proargtypes => 'bytea',
   prosrc => 'bytea_int8' },
 
+{ oid => '9880', descr => 'convert uuid to bytea',
+  proname => 'bytea', prorettype => 'bytea', proargtypes => 'uuid',
+  prosrc => 'uuid_bytea' },
+{ oid => '9881', descr => 'convert bytea to uuid',
+  proname => 'uuid', prorettype => 'uuid', proargtypes => 'bytea',
+  prosrc => 'bytea_uuid' },
+
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
   prosrc => 'hashint2' },
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 95392003b86..24486084aaf 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -305,5 +305,21 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
  
 (1 row)
 
+-- casts
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
+               bytea                
+------------------------------------
+ \x5b35380a714349129b55f322699c6770
+(1 row)
+
+SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
+                 uuid                 
+--------------------------------------
+ 019a2f85-9ced-7225-b99d-9c55044a2563
+(1 row)
+
+SELECT '\x1234567890abcdef'::bytea::uuid; -- error
+ERROR:  invalid length for UUID
+DETAIL:  Expected 16 bytes, got 8.
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 465153a0341..63520d0b640 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -146,6 +146,10 @@ SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
+-- casts
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
+SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
+SELECT '\x1234567890abcdef'::bytea::uuid; -- error
 
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.0




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
@ 2026-02-18 14:57 ` Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  0 siblings, 1 reply; 26+ messages in thread

From: Aleksander Alekseev @ 2026-02-18 14:57 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Andrey Borodin <[email protected]>; Masahiko Sawada <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

Hi,

> I only rebased v3 and improved the commit messages, but I didn't
> account for Masahiko Sawada's feedback for 0002. Andrey, are you still
> working on this, or can others pick it up?
>
> The patch is not on the commitfest, so I'm about to add it.

Here is patch v5, where I accounted for the previous feedback from
Masahiko Sawada and also made some other changes; see below.

> How about the error message like "invalid input length for type uuid"?
> I think "uuid" should be lower case as it indicates PostgreSQL uuid
> data type, and it's better to use %s format instead of directly
> writing "uuid" (see string_to_uuid() for example).

Makes sense. Fixed.

> As for the errdetail message, should we add "bytea" also after "got %d"?

You probably meant "got %d bytes", not "got %d bytea". I believe the
current message is fine, but maybe native speakers will correct us.

> We already have tests for casting bytes to integer data types in
> strings.sql. I suggest moving the casting tests from bytea to uuid
> into there.

I disagree on the grounds that there are zero tests related to UUID in
strings.sql; uuid.sql is a more appropriate place for these tests IMO.
However, if someone seconds the idea, we can easily move the tests at
any time.

> For the uuid.sql file, we could add a test to verify that
> a UUID value remains unchanged when it's cast to bytea and back to
> UUID. For example,
>
> SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;

Good point. Added.

> base32hex_encode() doesn't seem to add '=' paddings, but is it
> intentional? I don't see any description in RFC 4648 that we can omit
> '=' paddings.

You are right, both base32 and base32hex should add padding;
substring() can be used to strip it if necessary. Fixed.
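
The effect of this change on output length can be sketched with the two
formulas used by base32hex_enc_len() in v4 (unpadded) and v5 (padded).
The function names below are made up for the example; only the
arithmetic is taken from the patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Characters needed without '=' padding: ceil(nbytes * 8 / 5). */
static uint64_t
b32hex_unpadded_len(size_t nbytes)
{
    return ((uint64_t) nbytes * 8 + 4) / 5;
}

/* Characters produced with '=' padding: whole 8-character groups. */
static uint64_t
b32hex_padded_len(size_t nbytes)
{
    return ((uint64_t) nbytes + 4) / 5 * 8;
}
```

For a 16-byte UUID the unpadded length is 26 and the padded length is
32, i.e. 26 data characters followed by six '=' signs. That is why the
26-character form is obtained by taking the first 26 characters of the
padded output.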

> I think the patch should add tests not only for uuid data type but
> also for general cases like other encodings.

Yes, and a good place for these tests would be next to the other tests
for encode() and decode(), i.e. strings.sql. Fixed.

While working on it, I noticed some inconsistencies between the
base32hex implementation and our current implementation of base64. As
an example, we don't allow a bare `=` as input:

```
=# SELECT decode('=', 'base64');
ERROR:  unexpected "=" while decoding base64 sequence
```

... while base32hex did. I fixed such inconsistencies too.
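
One way such a check could look is sketched below in C. This is only an
assumption about a reasonable rule, not the logic actually used in v5:
the name b32hex_padding_ok() is hypothetical, whitespace handling is
left out, and the allowed pad counts (1, 3, 4 or 6) come from how RFC
4648 pads a final base32 group.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical padding validator: '=' may appear only as a trailing
 * run, and padded input must consist of whole 8-character groups.
 */
static bool
b32hex_padding_ok(const char *src, size_t len)
{
    size_t      data_len = len;
    size_t      npad;

    /* Strip the trailing run of '=' characters. */
    while (data_len > 0 && src[data_len - 1] == '=')
        data_len--;
    npad = len - data_len;

    /* '=' anywhere before the trailing run is never valid. */
    for (size_t i = 0; i < data_len; i++)
    {
        if (src[i] == '=')
            return false;
    }

    if (npad == 0)
        return true;            /* unpadded input is accepted as-is */

    /* Padding needs some data and must complete an 8-character group. */
    if (data_len == 0 || len % 8 != 0)
        return false;

    /* A final base32 group carries 1, 3, 4 or 6 pad characters. */
    return npad == 1 || npad == 3 || npad == 4 || npad == 6;
}
```

Under this rule a lone '=' is rejected, in line with the base64
behaviour shown above, while a 26-character UUID encoding followed by
six '=' signs is accepted.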

> In uuid.sql tests, how about adding some tests to check if base32hex
> maintains the sortability of UUIDv7 data?

Agreed. Added.
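
The sortability property itself can be demonstrated outside the
database with a small C sketch. The helpers b32hex_encode() and
order_preserved() are invented for this example, and the comparison is
byte-wise (what strcmp() does); text comparison under a non-C collation
may order the encoded strings differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compact unpadded base32hex encoder; dst needs (len * 8 + 4) / 5 + 1 bytes. */
static void
b32hex_encode(const unsigned char *src, size_t len, char *dst)
{
    static const char alphabet[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
    size_t      nchars = (len * 8 + 4) / 5;

    for (size_t k = 0; k < nchars; k++)
    {
        size_t      byte = (k * 5) / 8;
        int         shift = (int) ((k * 5) % 8);
        /* 16-bit window over the current byte and the next (or zeros). */
        uint32_t    window = (uint32_t) src[byte] << 8;

        if (byte + 1 < len)
            window |= src[byte + 1];
        dst[k] = alphabet[(window >> (11 - shift)) & 0x1F];
    }
    dst[nchars] = '\0';
}

/* Reduce a comparison result to -1, 0 or 1. */
static int
sign(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Returns 1 if two equal-length byte strings compare the same way
 * before and after encoding.  len must be at most 39 so that the
 * encoded form fits in the 64-byte buffers.
 */
static int
order_preserved(const unsigned char *a, const unsigned char *b, size_t len)
{
    char        ea[64];
    char        eb[64];

    b32hex_encode(a, len, ea);
    b32hex_encode(b, len, eb);
    return sign(memcmp(a, b, len)) == sign(strcmp(ea, eb));
}
```

Since UUIDv7 values begin with a big-endian timestamp, two values
generated at different times differ first in those leading bytes, and
the encoded strings differ in the same direction.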

> I think we should update the documentation in the uuid section about
> casting data between bytea and uuid. For references, we have a similar
> description for bytea and integer[1].

Fair point. Fixed.

-- 
Best regards,
Aleksander Alekseev


Attachments:

  [text/x-patch] v5-0001-Allow-explicit-casting-between-bytea-and-UUID.patch (6.2K, 2-v5-0001-Allow-explicit-casting-between-bytea-and-UUID.patch)
  download | inline diff:
From 02bb2101b393587294cb3a93bd766091ca1cd32b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <[email protected]>
Date: Tue, 28 Oct 2025 16:33:17 +0000
Subject: [PATCH v5 1/2] Allow explicit casting between bytea and UUID
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This enables using encode() and decode() to convert UUIDs to and from
alternative formats, such as base64.

Author:	Dagfinn Ilmari Mannsåker <[email protected]>
Author: Aleksander Alekseev <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Jelte Fennema-Nio <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml         | 11 +++++++++++
 src/backend/utils/adt/bytea.c      | 27 +++++++++++++++++++++++++++
 src/include/catalog/pg_cast.dat    |  6 ++++++
 src/include/catalog/pg_proc.dat    |  7 +++++++
 src/test/regress/expected/uuid.out | 22 ++++++++++++++++++++++
 src/test/regress/sql/uuid.sql      |  5 +++++
 6 files changed, 78 insertions(+)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 3017c674040..f8264b119ab 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4439,6 +4439,17 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
     Output is always in the standard form.
    </para>
 
+   <para>
+   It is possible to cast <type>uuid</type> values to and from type
+   <type>bytea</type>. This allows using <literal>encode()</literal>
+   and <literal>decode()</literal> functions for <type>uuid</type>.
+   Some examples:
+<programlisting>
+encode('1ea3d64c-bc40-4cc3-84bb-6b11ee31e5c2'::uuid::bytea, 'base64')
+decode('HqPWTLxATMOEu2sR7jHlwg==', 'base64')::uuid
+</programlisting>
+   </para>
+
    <para>
     See <xref linkend="functions-uuid"/> for how to generate a UUID in
     <productname>PostgreSQL</productname>.
diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index fd7662d41ee..4dc83671aa5 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -28,6 +28,7 @@
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
+#include "utils/uuid.h"
 #include "varatt.h"
 
 /* GUC variable */
@@ -1340,3 +1341,29 @@ int8_bytea(PG_FUNCTION_ARGS)
 {
 	return int8send(fcinfo);
 }
+
+/* Cast bytea -> uuid */
+Datum
+bytea_uuid(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+	pg_uuid_t  *uuid;
+
+	if (len != UUID_LEN)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+				 errmsg("invalid input length for type %s", "uuid"),
+				 errdetail("Expected %d bytes, got %d.", UUID_LEN, len)));
+
+	uuid = (pg_uuid_t *) palloc(sizeof(pg_uuid_t));
+	memcpy(uuid->data, VARDATA_ANY(v), UUID_LEN);
+	PG_RETURN_UUID_P(uuid);
+}
+
+/* Cast uuid -> bytea; can just use uuid_send() */
+Datum
+uuid_bytea(PG_FUNCTION_ARGS)
+{
+	return uuid_send(fcinfo);
+}
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index 9b1cfb1b590..a7b6d812c5a 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -362,6 +362,12 @@
 { castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
   castcontext => 'e', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and uuid type
+{ castsource => 'bytea', casttarget => 'uuid', castfunc => 'uuid(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'uuid', casttarget => 'bytea', castfunc => 'bytea(uuid)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 83f6501df38..d2b30390671 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1208,6 +1208,13 @@
   proname => 'int8', prorettype => 'int8', proargtypes => 'bytea',
   prosrc => 'bytea_int8' },
 
+{ oid => '9880', descr => 'convert uuid to bytea',
+  proname => 'bytea', prorettype => 'bytea', proargtypes => 'uuid',
+  prosrc => 'uuid_bytea' },
+{ oid => '9881', descr => 'convert bytea to uuid',
+  proname => 'uuid', prorettype => 'uuid', proargtypes => 'bytea',
+  prosrc => 'bytea_uuid' },
+
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
   prosrc => 'hashint2' },
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 95392003b86..d157ef7d0b3 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -305,5 +305,27 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
  
 (1 row)
 
+-- casts
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
+               bytea                
+------------------------------------
+ \x5b35380a714349129b55f322699c6770
+(1 row)
+
+SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
+                 uuid                 
+--------------------------------------
+ 019a2f85-9ced-7225-b99d-9c55044a2563
+(1 row)
+
+SELECT '\x1234567890abcdef'::bytea::uuid; -- error
+ERROR:  invalid input length for type uuid
+DETAIL:  Expected 16 bytes, got 8.
+SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
+ matched 
+---------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 465153a0341..f512f4dea1d 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -146,6 +146,11 @@ SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
+-- casts
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
+SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
+SELECT '\x1234567890abcdef'::bytea::uuid; -- error
+SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
 
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.0



  [text/x-patch] v5-0002-Add-base32hex-encoding-support-to-encode-and-deco.patch (13.9K, 3-v5-0002-Add-base32hex-encoding-support-to-encode-and-deco.patch)
  download | inline diff:
From 857ec382cf2a220748e11d6bed28c8b006282589 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Wed, 29 Oct 2025 15:53:12 +0400
Subject: [PATCH v5 2/2] Add base32hex encoding support to encode() and
 decode()

Implement base32hex encoding/decoding per RFC 4648 Section 7 for
encode() and decode() functions. This encoding uses the extended hex
alphabet (0-9, A-V) which preserves sort order.

The encode() function produces padded output, while decode() accepts
both padded and unpadded input. Decoding is case-insensitive.

Author: Andrey Borodin <[email protected]>
Author: Aleksander Alekseev <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Suggested-by: Sergey Prokhorenko <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml |  25 ++++
 src/backend/utils/adt/encode.c           | 158 +++++++++++++++++++++++
 src/test/regress/expected/strings.out    | 107 ++++++++++++++-
 src/test/regress/expected/uuid.out       |  16 +++
 src/test/regress/sql/strings.sql         |  29 ++++-
 src/test/regress/sql/uuid.sql            |   9 ++
 6 files changed, 342 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index b256381e01f..51be0463eec 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -729,6 +729,7 @@
        <parameter>format</parameter> values are:
        <link linkend="encode-format-base64"><literal>base64</literal></link>,
        <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
+       <link linkend="encode-format-base32hex"><literal>base32hex</literal></link>,
        <link linkend="encode-format-escape"><literal>escape</literal></link>,
        <link linkend="encode-format-hex"><literal>hex</literal></link>.
       </para>
@@ -804,6 +805,30 @@
      </listitem>
     </varlistentry>
 
+    <varlistentry id="encode-format-base32hex">
+     <term>base32hex
+      <indexterm>
+       <primary>base32hex format</primary>
+      </indexterm></term>
+     <listitem>
+      <para>
+       The <literal>base32hex</literal> format is that of
+       <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
+       RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
+       (0-9, A-V) which preserves sort order when encoding binary data.
+       The <function>encode</function> function produces padded output,
+       while <function>decode</function> accepts both padded and unpadded
+       input. Decoding is case-insensitive and ignores whitespace characters.
+      </para>
+      <para>
+       This format can be used for encoding UUIDs in a compact, sortable format:
+       <literal>substring(encode(uuid_value :: bytea, 'base32hex') from 1 for 26)</literal>
+       produces a 26-character string compared to the standard 36-character
+       UUID representation.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry id="encode-format-escape">
      <term>escape
      <indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index f5f835e944a..6153a5cb5f2 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -825,6 +825,158 @@ esc_dec_len(const char *src, size_t srclen)
 	return len;
 }
 
+/*
+ * BASE32HEX
+ */
+
+static const char base32hex_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
+
+static uint64
+base32hex_enc_len(const char *src, size_t srclen)
+{
+	/* 5 bytes encode to 8 characters, round up to multiple of 8 for padding */
+	return ((uint64) srclen + 4) / 5 * 8;
+}
+
+static uint64
+base32hex_dec_len(const char *src, size_t srclen)
+{
+	/* Decode length is (srclen * 5) / 8, but we may have padding */
+	return ((uint64) srclen * 5) / 8;
+}
+
+static uint64
+base32hex_encode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint64		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+
+	for (i = 0; i < srclen; i++)
+	{
+		/* Add 8 bits to the buffer */
+		bits_buffer = (bits_buffer << 8) | data[i];
+		bits_in_buffer += 8;
+
+		/* Extract 5-bit chunks while we have enough bits */
+		while (bits_in_buffer >= 5)
+		{
+			bits_in_buffer -= 5;
+			/* Extract top 5 bits */
+			dst[output_pos++] = base32hex_table[(bits_buffer >> bits_in_buffer) & 0x1F];
+			/* Clear the extracted bits by masking */
+			bits_buffer &= ((1ULL << bits_in_buffer) - 1);
+		}
+	}
+
+	/* Handle remaining bits (if any) */
+	if (bits_in_buffer > 0)
+	{
+		dst[output_pos++] = base32hex_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];
+	}
+
+	/* Add padding to make length a multiple of 8 (per RFC 4648) */
+	while (output_pos % 8 != 0)
+	{
+		dst[output_pos++] = '=';
+	}
+
+	return output_pos;
+}
+
+static uint64
+base32hex_decode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint64		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+	int			pos = 0;		/* position within 8-character group (0-7) */
+	bool		end = false;	/* have we seen padding? */
+
+	for (i = 0; i < srclen; i++)
+	{
+		unsigned char c = data[i];
+		int			val;
+
+		/* Skip whitespace */
+		if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
+			continue;
+
+		if (c == '=')
+		{
+			/*
+			 * Padding is only valid at positions 2, 4, 5, or 7 within an
+			 * 8-character group (corresponding to 1, 2, 3, or 4 input bytes).
+			 * We only check the position for the first '=' character.
+			 */
+			if (!end)
+			{
+				if (pos != 2 && pos != 4 && pos != 5 && pos != 7)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("unexpected \"=\" while decoding %s sequence",
+									"base32hex")));
+				end = true;
+			}
+			pos++;
+			continue;
+		}
+
+		/* No data characters allowed after padding */
+		if (end)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
+							pg_mblen((const char *) &c), (const char *) &c,
+							"base32hex")));
+
+		/* Decode base32hex character (0-9, A-V, case-insensitive) */
+		if (c >= '0' && c <= '9')
+			val = c - '0';
+		else if (c >= 'A' && c <= 'V')
+			val = c - 'A' + 10;
+		else if (c >= 'a' && c <= 'v')
+			val = c - 'a' + 10;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
+							pg_mblen((const char *) &c), (const char *) &c,
+							"base32hex")));
+
+		/* Add 5 bits to buffer */
+		bits_buffer = (bits_buffer << 5) | val;
+		bits_in_buffer += 5;
+		pos++;
+
+		/* Extract 8-bit bytes when we have enough bits */
+		while (bits_in_buffer >= 8)
+		{
+			bits_in_buffer -= 8;
+			dst[output_pos++] = (unsigned char) (bits_buffer >> bits_in_buffer);
+			/* Clear the extracted bits */
+			bits_buffer &= ((1ULL << bits_in_buffer) - 1);
+		}
+
+		/* Reset position after each complete 8-character group */
+		if (pos == 8)
+			pos = 0;
+	}
+
+	/* Verify no extra bits remain (padding bits should be zero) */
+	if (bits_in_buffer > 0 && (bits_buffer & ((1ULL << bits_in_buffer) - 1)) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid base32hex end sequence"),
+				 errhint("Input data has non-zero padding bits.")));
+
+	return output_pos;
+}
+
 /*
  * Common
  */
@@ -854,6 +1006,12 @@ static const struct
 			pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
 		}
 	},
+	{
+		"base32hex",
+		{
+			base32hex_enc_len, base32hex_dec_len, base32hex_encode, base32hex_decode
+		}
+	},
 	{
 		"escape",
 		{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index f38688b5c37..910757537e7 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2605,9 +2605,114 @@ SELECT decode('00', 'invalid');           -- error
 ERROR:  unrecognized encoding: "invalid"
 HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
+SELECT encode('', 'base32hex');  -- ''
+ encode 
+--------
+ 
+(1 row)
+
+SELECT encode('\x11', 'base32hex');  -- '24======'
+  encode  
+----------
+ 24======
+(1 row)
+
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+  encode  
+----------
+ 24H0====
+(1 row)
+
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+  encode  
+----------
+ 24H36===
+(1 row)
+
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+  encode  
+----------
+ 24H36H0=
+(1 row)
+
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+  encode  
+----------
+ 24H36H2L
+(1 row)
+
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+      encode      
+------------------
+ 24H36H2LCO======
+(1 row)
+
+SELECT decode('', 'base32hex');  -- ''
+ decode 
+--------
+ \x
+(1 row)
+
+SELECT decode('24======', 'base32hex');  -- \x11
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+ decode 
+--------
+ \x1122
+(1 row)
+
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+  decode  
+----------
+ \x112233
+(1 row)
+
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+   decode   
+------------
+ \x11223344
+(1 row)
+
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+    decode    
+--------------
+ \x1122334455
+(1 row)
+
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('=', 'base32hex');  -- error
+ERROR:  unexpected "=" while decoding base32hex sequence
+SELECT decode('W', 'base32hex');  -- error
+ERROR:  invalid symbol "W" found while decoding base32hex sequence
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+ERROR:  invalid symbol "2" found while decoding base32hex sequence
+--
+-- base64url encoding/decoding
+--
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
  encode 
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index d157ef7d0b3..df25a6aa5c5 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -327,5 +327,21 @@ SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
  t
 (1 row)
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig :: bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff'] :: uuid[]
+) AS orig ORDER BY enc;
+                 orig                 |               enc                
+--------------------------------------+----------------------------------
+ 00000000-0000-0000-0000-000000000000 | 00000000000000000000000000======
+ 11111111-1111-1111-1111-111111111111 | 248H248H248H248H248H248H24======
+ 123e4567-e89b-12d3-a456-426614174000 | 28V4APV8JC9D792M89J185Q000======
+ ffffffff-ffff-ffff-ffff-ffffffffffff | VVVVVVVVVVVVVVVVVVVVVVVVVS======
+(4 rows)
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index d8a09737668..b5237e85172 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -835,10 +835,37 @@ SELECT encode('\x01'::bytea, 'invalid');  -- error
 SELECT decode('00', 'invalid');           -- error
 
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
 
+SELECT encode('', 'base32hex');  -- ''
+SELECT encode('\x11', 'base32hex');  -- '24======'
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+
+SELECT decode('', 'base32hex');  -- ''
+SELECT decode('24======', 'base32hex');  -- \x11
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+SELECT decode('=', 'base32hex');  -- error
+SELECT decode('W', 'base32hex');  -- error
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+
+
+--
+-- base64url encoding/decoding
+--
+
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
 SELECT decode('abc-_w', 'base64url');      -- \x69b73eff
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index f512f4dea1d..278a5773ada 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -152,5 +152,14 @@ SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
 SELECT '\x1234567890abcdef'::bytea::uuid; -- error
 SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig :: bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff'] :: uuid[]
+) AS orig ORDER BY enc;
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.0




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
@ 2026-03-14 04:10   ` Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  0 siblings, 1 reply; 26+ messages in thread

From: Masahiko Sawada @ 2026-03-14 04:10 UTC (permalink / raw)
  To: Aleksander Alekseev <[email protected]>; +Cc: pgsql-hackers; Andrey Borodin <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

On Wed, Feb 18, 2026 at 6:58 AM Aleksander Alekseev
<[email protected]> wrote:
>
> Hi,
>
> > I only rebased v3 and improved the commit messages, but I didn't
> > account for Masahiko Sawada's feedback for 0002. Andrey, are you still
> > working on this, or can others pick it up?
> >
> > The patch is not on the commitfest, so I'm about to add it.
>
> Here is patch v5 where I accounted for the previous feedback from
> Masahiko Sawada and also made some other changes, see below.
>
> > How about the error message like "invalid input length for type uuid"?
> > I think "uuid" should be lower case as it indicates PostgreSQL uuid
> > data type, and it's better to use %s format instead of directly
> > writing "uuid" (see string_to_uuid() for example).
>
> Makes sense. Fixed.
>
> > As for the errdetail message, should we add "bytea" also after "got %d"?
>
> You probably meant "got %d bytes", not "got %d bytea". I believe the
> current message is fine, but maybe native speakers will correct us.
>
> > We already have tests for casting bytes to integer data types in
> > strings.sql. I suggest moving the casting tests from bytea to uuid
> > into there.
>
> I disagree on the grounds that there are zero tests related to UUID in
> strings.sql; uuid.sql is a more appropriate place for these tests IMO.
> However if someone seconds the idea we can easily move the tests at
> any time.
>
> > For the uuid.sql file, we could add a test to verify that
> > a UUID value remains unchanged when it's cast to bytea and back to
> > UUID. For example,
> >
> > SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
>
> Good point. Added.
>
> > base32hex_encode() doesn't seem to add '=' paddings, but is it
> > intentional? I don't see any description in RFC 4648 that we can omit
> > '=' paddings.
>
> You are right, both base32 and base32hex should add paddings;
> substring() can be used if necessary. Fixed.
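To make the length arithmetic concrete, here is a minimal standalone sketch. The helper names b32hex_padded_len() and b32hex_unpadded_len() are hypothetical; only the padded formula mirrors base32hex_enc_len() from the patch. For a 16-byte UUID it gives 32 characters padded and 26 unpadded, which is why substring(... FROM 1 FOR 26) drops exactly the padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Padded length: every started group of 5 input bytes becomes 8 characters. */
static uint64_t
b32hex_padded_len(size_t srclen)
{
	return ((uint64_t) srclen + 4) / 5 * 8;
}

/* Unpadded length: one character per started 5-bit group of input. */
static uint64_t
b32hex_unpadded_len(size_t srclen)
{
	return ((uint64_t) srclen * 8 + 4) / 5;
}
```

The difference between the two values is the number of '=' characters encode() appends, so it is zero whenever the input length is a multiple of 5.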
>
> > I think the patch should add tests not only for uuid data type but
> > also for general cases like other encodings.
>
> Yes, and the good place for these tests would be closer to other tests
> for encode() and decode() i.e. strings.sql. Fixed.
>
> While working on it I noticed some inconsistencies between base32hex
> implementation and our current implementation of base64. As an
> example, we don't allow `=` input:
>
> ```
> =# SELECT decode('=', 'base64');
> ERROR:  unexpected "=" while decoding base64 sequence
> ```
>
> ... while base32hex did. I fixed such inconsistencies too.
>
> > In uuid.sql tests, how about adding some tests to check if base32hex
> > maintains the sortability of UUIDv7 data?
>
> Agree. Added.
>
> > I think we should update the documentation in the uuid section about
> > casting data between bytea and uuid. For references, we have a similar
> > description for bytea and integer[1].
>
> Fair point. Fixed.
>

Thank you for updating the patch!

I've reviewed both patches and have some comments.

* 0001 patch

+       ereport(ERROR,
+               (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+                errmsg("invalid input length for type %s", "uuid"),
+                errdetail("Expected %d bytes, got %d.", UUID_LEN, len)));

I think we need to handle plural and singular forms depending on the
value. Or we can change it to "Expected size %d, got %d".
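For reference, the backend provides errdetail_plural(), which takes a singular format, a plural format, and the count used to choose between them. The selection itself can be sketched standalone as below; format_len_detail() is a hypothetical helper for illustration only, not something from the patch, and it ignores translation (where plural rules differ per language).

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the detail message into buf, choosing the
 * singular or plural wording from the expected byte count.
 * Returns the value of snprintf().
 */
static int
format_len_detail(char *buf, size_t buflen, int expected, int got)
{
	const char *fmt = (expected == 1)
		? "Expected %d byte, got %d."
		: "Expected %d bytes, got %d.";

	return snprintf(buf, buflen, fmt, expected, got);
}
```

Since UUID_LEN is a constant 16, the plural form would always be selected here, which is part of why the simpler "Expected size %d, got %d" wording is also an option.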

* 0002 patch:

                errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
                        "base64", "base64url", "escape", "hex")));

We need to add 'base32hex' here.

---
+                   ereport(ERROR,
+                           (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                            errmsg("unexpected \"=\" while decoding %s sequence",
+                                   "base32hex")));

I think we can directly write 'base32hex' in the error message.

---
+   /* Verify no extra bits remain (padding bits should be zero) */
+   if (bits_in_buffer > 0 && (bits_buffer & ((1ULL << bits_in_buffer) - 1)) != 0)
+       ereport(ERROR,
+               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                errmsg("invalid base32hex end sequence"),
+                errhint("Input data has non-zero padding bits.")));

This code checks if the remaining bits of the input data are all zero.
IIUC we don't have a similar check for base64 and base64url. For
instance, the following input data is accepted:

=# select decode('AB', 'base64');
 decode
--------
 \x00
(1 row)

I think it's better to have consistent behavior across our encodings.
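To illustrate what the check looks at, here is a rough standalone sketch that follows the same bit-buffer approach as base32hex_decode() but only reports the bits left over once all whole bytes have been extracted. The names b32hex_val() and b32hex_leftover_bits() are hypothetical, and padding and whitespace handling are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Map one base32hex character to its 5-bit value, or -1 if it is invalid. */
static int
b32hex_val(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'A' && c <= 'V')
		return c - 'A' + 10;
	if (c >= 'a' && c <= 'v')
		return c - 'a' + 10;
	return -1;
}

/*
 * Hypothetical helper: return the value of the bits remaining after whole
 * bytes are extracted from an unpadded base32hex string, or -1 if an
 * invalid character is found.  A non-zero result is what the patch
 * reports as "non-zero padding bits".
 */
static int
b32hex_leftover_bits(const char *s, size_t len)
{
	uint64_t	bits_buffer = 0;
	int			bits_in_buffer = 0;
	size_t		i;

	for (i = 0; i < len; i++)
	{
		int			val = b32hex_val((unsigned char) s[i]);

		if (val < 0)
			return -1;

		/* Add 5 bits, then discard every complete byte */
		bits_buffer = (bits_buffer << 5) | (uint64_t) val;
		bits_in_buffer += 5;
		while (bits_in_buffer >= 8)
		{
			bits_in_buffer -= 8;
			bits_buffer &= ((UINT64_C(1) << bits_in_buffer) - 1);
		}
	}

	return (int) bits_buffer;
}
```

With this, "24" leaves zero bits behind, while "11" leaves a non-zero remainder, which is the case the new test with decode('11=', 'base32hex') exercises.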

I've attached a patch for the 0002 patch part that fixes the above
points (except for the last point) and has some minor fixes as well.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Attachments:

  [text/x-patch] fix_0002_masahiko.patch (7.4K, 2-fix_0002_masahiko.patch)
  download | inline diff:
diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 51be0463eec..7aa3805f1ec 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -815,14 +815,16 @@
        The <literal>base32hex</literal> format is that of
        <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
        RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
-       (0-9, A-V) which preserves sort order when encoding binary data.
-       The <function>encode</function> function produces padded output,
-       while <function>decode</function> accepts both padded and unpadded
-       input. Decoding is case-insensitive and ignores whitespace characters.
+       (<literal>0</literal>-<literal>9</literal> and
+       <literal>A</literal>-<literal>V</literal>) which preserves sort order
+       when encoding binary data. The <function>encode</function> function
+       produces padded output, while <function>decode</function> accepts both
+       padded and unpadded input. Decoding is case-insensitive and ignores
+       whitespace characters.
       </para>
       <para>
        This format can be used for encoding UUIDs in a compact, sortable format:
-       <literal>substring(encode(uuid_value :: bytea, 'base32hex') from 1 for 26)</literal>
+       <literal>substring(encode(uuid_value::bytea, 'base32hex') FROM 1 FOR 26)</literal>
        produces a 26-character string compared to the standard 36-character
        UUID representation.
       </para>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 6153a5cb5f2..ea11bc3f3b5 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -65,8 +65,8 @@ binary_encode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -115,8 +115,8 @@ binary_decode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -873,15 +873,11 @@ base32hex_encode(const char *src, size_t srclen, char *dst)
 
 	/* Handle remaining bits (if any) */
 	if (bits_in_buffer > 0)
-	{
 		dst[output_pos++] = base32hex_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];
-	}
 
 	/* Add padding to make length a multiple of 8 (per RFC 4648) */
 	while (output_pos % 8 != 0)
-	{
 		dst[output_pos++] = '=';
-	}
 
 	return output_pos;
 }
@@ -909,8 +905,8 @@ base32hex_decode(const char *src, size_t srclen, char *dst)
 		if (c == '=')
 		{
 			/*
-			 * Padding is only valid at positions 2, 4, 5, or 7 within an
-			 * 8-character group (corresponding to 1, 2, 3, or 4 input bytes).
+			 * The first padding is only valid at positions 2, 4, 5, or 7 within
+			 * an 8-character group (corresponding to 1, 2, 3, or 4 input bytes).
 			 * We only check the position for the first '=' character.
 			 */
 			if (!end)
@@ -918,8 +914,7 @@ base32hex_decode(const char *src, size_t srclen, char *dst)
 				if (pos != 2 && pos != 4 && pos != 5 && pos != 7)
 					ereport(ERROR,
 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-							 errmsg("unexpected \"=\" while decoding %s sequence",
-									"base32hex")));
+							 errmsg("unexpected \"=\" while decoding base32hex sequence")));
 				end = true;
 			}
 			pos++;
@@ -930,9 +925,8 @@ base32hex_decode(const char *src, size_t srclen, char *dst)
 		if (end)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
-							pg_mblen((const char *) &c), (const char *) &c,
-							"base32hex")));
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen((const char *) &c), (const char *) &c)));
 
 		/* Decode base32hex character (0-9, A-V, case-insensitive) */
 		if (c >= '0' && c <= '9')
@@ -944,9 +938,8 @@ base32hex_decode(const char *src, size_t srclen, char *dst)
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
-							pg_mblen((const char *) &c), (const char *) &c,
-							"base32hex")));
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen((const char *) &c), (const char *) &c)));
 
 		/* Add 5 bits to buffer */
 		bits_buffer = (bits_buffer << 5) | val;
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 910757537e7..53b1a14c895 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2600,10 +2600,10 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
 -- report an error with a hint listing valid encodings when an invalid encoding is specified
 SELECT encode('\x01'::bytea, 'invalid');  -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 SELECT decode('00', 'invalid');           -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 --
 -- base32hex encoding/decoding
 --
@@ -2698,6 +2698,12 @@ SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
  \x11
 (1 row)
 
+SELECT decode('24=======', 'base32hex');  -- OK
+ decode 
+--------
+ \x11
+(1 row)
+
 SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
      decode     
 ----------------
@@ -2710,6 +2716,9 @@ SELECT decode('W', 'base32hex');  -- error
 ERROR:  invalid symbol "W" found while decoding base32hex sequence
 SELECT decode('24H36H0=24', 'base32hex'); -- error
 ERROR:  invalid symbol "2" found while decoding base32hex sequence
+SELECT decode('11=', 'base32hex'); -- error
+ERROR:  invalid base32hex end sequence
+HINT:  Input data has non-zero padding bits.
 --
 -- base64url encoding/decoding
 --
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index b5237e85172..6ebc192a9b1 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -856,10 +856,12 @@ SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
 SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
 
 SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+SELECT decode('24=======', 'base32hex');  -- OK
 SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
 SELECT decode('=', 'base32hex');  -- error
 SELECT decode('W', 'base32hex');  -- error
 SELECT decode('24H36H0=24', 'base32hex'); -- error
+SELECT decode('11=', 'base32hex'); -- error
 
 
 --



* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
@ 2026-03-18 11:14     ` Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  0 siblings, 1 reply; 26+ messages in thread

From: Aleksander Alekseev @ 2026-03-18 11:14 UTC (permalink / raw)
  To: Masahiko Sawada <[email protected]>; +Cc: pgsql-hackers; Andrey Borodin <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

Hi,

> I've attached a patch for the 0002 patch part that fixes the above
> points (except for the last point) and has some minor fixes as well.

Applied, thanks.

> +   /* Verify no extra bits remain (padding bits should be zero) */
> +   if (bits_in_buffer > 0 && (bits_buffer & ((1ULL << bits_in_buffer) - 1)) != 0)
> +       ereport(ERROR,
> +               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                errmsg("invalid base32hex end sequence"),
> +                errhint("Input data has non-zero padding bits.")));
>
> This code checks if the remaining bits of the input data are all zero.
> IIUC we don't have a similar check for base64 and base64url. For
> instance, the following input data is accepted:
>
> =# select decode('AB', 'base64');
>  decode
> --------
>  \x00
> (1 row)
>
> I think it's better to have consistent behavior across our encodings.

Agree. Fixed.

-- 
Best regards,
Aleksander Alekseev


Attachments:

  [text/x-patch] v6-0001-Allow-explicit-casting-between-bytea-and-UUID.patch (6.2K, 2-v6-0001-Allow-explicit-casting-between-bytea-and-UUID.patch)
  download | inline diff:
From bb3958f39f50ad356b9aafd2f9f460700d4d48ff Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <[email protected]>
Date: Tue, 28 Oct 2025 16:33:17 +0000
Subject: [PATCH v6 1/2] Allow explicit casting between bytea and UUID
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This enables using encode() and decode() to convert UUIDs to and from
alternative formats, such as base64.

Author: Dagfinn Ilmari Mannsåker <[email protected]>
Author: Aleksander Alekseev <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Jelte Fennema-Nio <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml         | 11 +++++++++++
 src/backend/utils/adt/bytea.c      | 27 +++++++++++++++++++++++++++
 src/include/catalog/pg_cast.dat    |  6 ++++++
 src/include/catalog/pg_proc.dat    |  7 +++++++
 src/test/regress/expected/uuid.out | 22 ++++++++++++++++++++++
 src/test/regress/sql/uuid.sql      |  5 +++++
 6 files changed, 78 insertions(+)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 3017c674040..f8264b119ab 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4439,6 +4439,17 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
     Output is always in the standard form.
    </para>
 
+   <para>
+   It is possible to cast <type>uuid</type> values to and from type
+   <type>bytea</type>. This allows using <literal>encode()</literal>
+   and <literal>decode()</literal> functions for <type>uuid</type>.
+   Some examples:
+<programlisting>
+encode('1ea3d64c-bc40-4cc3-84bb-6b11ee31e5c2'::uuid::bytea, 'base64')
+decode('HqPWTLxATMOEu2sR7jHlwg==', 'base64')::uuid
+</programlisting>
+   </para>
+
    <para>
     See <xref linkend="functions-uuid"/> for how to generate a UUID in
     <productname>PostgreSQL</productname>.
diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index fd7662d41ee..4dc83671aa5 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -28,6 +28,7 @@
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
+#include "utils/uuid.h"
 #include "varatt.h"
 
 /* GUC variable */
@@ -1340,3 +1341,29 @@ int8_bytea(PG_FUNCTION_ARGS)
 {
 	return int8send(fcinfo);
 }
+
+/* Cast bytea -> uuid */
+Datum
+bytea_uuid(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+	pg_uuid_t  *uuid;
+
+	if (len != UUID_LEN)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+				 errmsg("invalid input length for type %s", "uuid"),
+				 errdetail("Expected %d bytes, got %d.", UUID_LEN, len)));
+
+	uuid = (pg_uuid_t *) palloc(sizeof(pg_uuid_t));
+	memcpy(uuid->data, VARDATA_ANY(v), UUID_LEN);
+	PG_RETURN_UUID_P(uuid);
+}
+
+/* Cast uuid -> bytea; can just use uuid_send() */
+Datum
+uuid_bytea(PG_FUNCTION_ARGS)
+{
+	return uuid_send(fcinfo);
+}
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index 9b1cfb1b590..a7b6d812c5a 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -362,6 +362,12 @@
 { castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
   castcontext => 'e', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and uuid type
+{ castsource => 'bytea', casttarget => 'uuid', castfunc => 'uuid(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'uuid', casttarget => 'bytea', castfunc => 'bytea(uuid)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fc8d82665b8..84e7adde0e5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1208,6 +1208,13 @@
   proname => 'int8', prorettype => 'int8', proargtypes => 'bytea',
   prosrc => 'bytea_int8' },
 
+{ oid => '9880', descr => 'convert uuid to bytea',
+  proname => 'bytea', prorettype => 'bytea', proargtypes => 'uuid',
+  prosrc => 'uuid_bytea' },
+{ oid => '9881', descr => 'convert bytea to uuid',
+  proname => 'uuid', prorettype => 'uuid', proargtypes => 'bytea',
+  prosrc => 'bytea_uuid' },
+
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
   prosrc => 'hashint2' },
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 95392003b86..d157ef7d0b3 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -305,5 +305,27 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
  
 (1 row)
 
+-- casts
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
+               bytea                
+------------------------------------
+ \x5b35380a714349129b55f322699c6770
+(1 row)
+
+SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
+                 uuid                 
+--------------------------------------
+ 019a2f85-9ced-7225-b99d-9c55044a2563
+(1 row)
+
+SELECT '\x1234567890abcdef'::bytea::uuid; -- error
+ERROR:  invalid input length for type uuid
+DETAIL:  Expected 16 bytes, got 8.
+SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
+ matched 
+---------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 465153a0341..f512f4dea1d 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -146,6 +146,11 @@ SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
+-- casts
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
+SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
+SELECT '\x1234567890abcdef'::bytea::uuid; -- error
+SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
 
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.0



  [text/x-patch] v6-0002-Add-base32hex-encoding-support-to-encode-and-deco.patch (15.5K, 3-v6-0002-Add-base32hex-encoding-support-to-encode-and-deco.patch)
From e14d4ce24a614c313ce8af4ff0df2fcdbc4d5d08 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Wed, 29 Oct 2025 15:53:12 +0400
Subject: [PATCH v6 2/2] Add base32hex encoding support to encode() and
 decode()

Implement base32hex encoding/decoding per RFC 4648 Section 7 for
encode() and decode() functions. This encoding uses the extended hex
alphabet (0-9, A-V) which preserves sort order.

The encode() function produces padded output, while decode() accepts
both padded and unpadded input. Decoding is case-insensitive.

Author: Andrey Borodin <[email protected]>
Author: Aleksander Alekseev <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Suggested-by: Sergey Prokhorenko <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml |  27 ++++
 src/backend/utils/adt/encode.c           | 152 ++++++++++++++++++++++-
 src/test/regress/expected/strings.out    | 123 +++++++++++++++++-
 src/test/regress/expected/uuid.out       |  16 +++
 src/test/regress/sql/strings.sql         |  31 ++++-
 src/test/regress/sql/uuid.sql            |   9 ++
 6 files changed, 350 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index b256381e01f..7aa3805f1ec 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -729,6 +729,7 @@
        <parameter>format</parameter> values are:
        <link linkend="encode-format-base64"><literal>base64</literal></link>,
        <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
+       <link linkend="encode-format-base32hex"><literal>base32hex</literal></link>,
        <link linkend="encode-format-escape"><literal>escape</literal></link>,
        <link linkend="encode-format-hex"><literal>hex</literal></link>.
       </para>
@@ -804,6 +805,32 @@
      </listitem>
     </varlistentry>
 
+    <varlistentry id="encode-format-base32hex">
+     <term>base32hex
+      <indexterm>
+       <primary>base32hex format</primary>
+      </indexterm></term>
+     <listitem>
+      <para>
+       The <literal>base32hex</literal> format is that of
+       <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
+       RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
+       (<literal>0</literal>-<literal>9</literal> and
+       <literal>A</literal>-<literal>V</literal>) which preserves sort order
+       when encoding binary data. The <function>encode</function> function
+       produces padded output, while <function>decode</function> accepts both
+       padded and unpadded input. Decoding is case-insensitive and ignores
+       whitespace characters.
+      </para>
+      <para>
+       This format can be used for encoding UUIDs in a compact, sortable format:
+       <literal>substring(encode(uuid_value::bytea, 'base32hex') FROM 1 FOR 26)</literal>
+       produces a 26-character string compared to the standard 36-character
+       UUID representation.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry id="encode-format-escape">
      <term>escape
      <indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index f5f835e944a..793cc7f2a34 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -65,8 +65,8 @@ binary_encode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -115,8 +115,8 @@ binary_decode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -825,6 +825,144 @@ esc_dec_len(const char *src, size_t srclen)
 	return len;
 }
 
+/*
+ * BASE32HEX
+ */
+
+static const char base32hex_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
+
+static uint64
+base32hex_enc_len(const char *src, size_t srclen)
+{
+	/* 5 bytes encode to 8 characters, round up to multiple of 8 for padding */
+	return ((uint64) srclen + 4) / 5 * 8;
+}
+
+static uint64
+base32hex_dec_len(const char *src, size_t srclen)
+{
+	/* Decode length is (srclen * 5) / 8, but we may have padding */
+	return ((uint64) srclen * 5) / 8;
+}
+
+static uint64
+base32hex_encode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint64		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+
+	for (i = 0; i < srclen; i++)
+	{
+		/* Add 8 bits to the buffer */
+		bits_buffer = (bits_buffer << 8) | data[i];
+		bits_in_buffer += 8;
+
+		/* Extract 5-bit chunks while we have enough bits */
+		while (bits_in_buffer >= 5)
+		{
+			bits_in_buffer -= 5;
+			/* Extract top 5 bits */
+			dst[output_pos++] = base32hex_table[(bits_buffer >> bits_in_buffer) & 0x1F];
+			/* Clear the extracted bits by masking */
+			bits_buffer &= ((1ULL << bits_in_buffer) - 1);
+		}
+	}
+
+	/* Handle remaining bits (if any) */
+	if (bits_in_buffer > 0)
+		dst[output_pos++] = base32hex_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];
+
+	/* Add padding to make length a multiple of 8 (per RFC 4648) */
+	while (output_pos % 8 != 0)
+		dst[output_pos++] = '=';
+
+	return output_pos;
+}
+
+static uint64
+base32hex_decode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint64		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+	int			pos = 0;		/* position within 8-character group (0-7) */
+	bool		end = false;	/* have we seen padding? */
+
+	for (i = 0; i < srclen; i++)
+	{
+		unsigned char c = data[i];
+		int			val;
+
+		/* Skip whitespace */
+		if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
+			continue;
+
+		if (c == '=')
+		{
+			/*
+			 * The first padding is only valid at positions 2, 4, 5, or 7 within
+			 * an 8-character group (corresponding to 1, 2, 3, or 4 input bytes).
+			 * We only check the position for the first '=' character.
+			 */
+			if (!end)
+			{
+				if (pos != 2 && pos != 4 && pos != 5 && pos != 7)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("unexpected \"=\" while decoding base32hex sequence")));
+				end = true;
+			}
+			pos++;
+			continue;
+		}
+
+		/* No data characters allowed after padding */
+		if (end)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen((const char *) &c), (const char *) &c)));
+
+		/* Decode base32hex character (0-9, A-V, case-insensitive) */
+		if (c >= '0' && c <= '9')
+			val = c - '0';
+		else if (c >= 'A' && c <= 'V')
+			val = c - 'A' + 10;
+		else if (c >= 'a' && c <= 'v')
+			val = c - 'a' + 10;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen((const char *) &c), (const char *) &c)));
+
+		/* Add 5 bits to buffer */
+		bits_buffer = (bits_buffer << 5) | val;
+		bits_in_buffer += 5;
+		pos++;
+
+		/* Extract 8-bit bytes when we have enough bits */
+		while (bits_in_buffer >= 8)
+		{
+			bits_in_buffer -= 8;
+			dst[output_pos++] = (unsigned char) (bits_buffer >> bits_in_buffer);
+			/* Clear the extracted bits */
+			bits_buffer &= ((1ULL << bits_in_buffer) - 1);
+		}
+
+		/* Reset position after each complete 8-character group */
+		if (pos == 8)
+			pos = 0;
+	}
+
+	return output_pos;
+}
+
 /*
  * Common
  */
@@ -854,6 +992,12 @@ static const struct
 			pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
 		}
 	},
+	{
+		"base32hex",
+		{
+			base32hex_enc_len, base32hex_dec_len, base32hex_encode, base32hex_decode
+		}
+	},
 	{
 		"escape",
 		{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index f38688b5c37..12d7da9695d 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2600,14 +2600,131 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
 -- report an error with a hint listing valid encodings when an invalid encoding is specified
 SELECT encode('\x01'::bytea, 'invalid');  -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 SELECT decode('00', 'invalid');           -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
+SELECT encode('', 'base32hex');  -- ''
+ encode 
+--------
+ 
+(1 row)
+
+SELECT encode('\x11', 'base32hex');  -- '24======'
+  encode  
+----------
+ 24======
+(1 row)
+
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+  encode  
+----------
+ 24H0====
+(1 row)
+
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+  encode  
+----------
+ 24H36===
+(1 row)
+
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+  encode  
+----------
+ 24H36H0=
+(1 row)
+
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+  encode  
+----------
+ 24H36H2L
+(1 row)
+
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+      encode      
+------------------
+ 24H36H2LCO======
+(1 row)
+
+SELECT decode('', 'base32hex');  -- ''
+ decode 
+--------
+ \x
+(1 row)
+
+SELECT decode('24======', 'base32hex');  -- \x11
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+ decode 
+--------
+ \x1122
+(1 row)
+
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+  decode  
+----------
+ \x112233
+(1 row)
+
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+   decode   
+------------
+ \x11223344
+(1 row)
+
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+    decode    
+--------------
+ \x1122334455
+(1 row)
+
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24=======', 'base32hex');  -- OK
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('=', 'base32hex');  -- error
+ERROR:  unexpected "=" while decoding base32hex sequence
+SELECT decode('W', 'base32hex');  -- error
+ERROR:  invalid symbol "W" found while decoding base32hex sequence
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+ERROR:  invalid symbol "2" found while decoding base32hex sequence
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+ decode 
+--------
+ \x08
+(1 row)
+
+--
+-- base64url encoding/decoding
+--
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
  encode 
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index d157ef7d0b3..df25a6aa5c5 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -327,5 +327,21 @@ SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
  t
 (1 row)
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig :: bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff'] :: uuid[]
+) AS orig ORDER BY enc;
+                 orig                 |               enc                
+--------------------------------------+----------------------------------
+ 00000000-0000-0000-0000-000000000000 | 00000000000000000000000000======
+ 11111111-1111-1111-1111-111111111111 | 248H248H248H248H248H248H24======
+ 123e4567-e89b-12d3-a456-426614174000 | 28V4APV8JC9D792M89J185Q000======
+ ffffffff-ffff-ffff-ffff-ffffffffffff | VVVVVVVVVVVVVVVVVVVVVVVVVS======
+(4 rows)
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index d8a09737668..28d582f39bd 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -835,10 +835,39 @@ SELECT encode('\x01'::bytea, 'invalid');  -- error
 SELECT decode('00', 'invalid');           -- error
 
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
 
+SELECT encode('', 'base32hex');  -- ''
+SELECT encode('\x11', 'base32hex');  -- '24======'
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+
+SELECT decode('', 'base32hex');  -- ''
+SELECT decode('24======', 'base32hex');  -- \x11
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+SELECT decode('24=======', 'base32hex');  -- OK
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+SELECT decode('=', 'base32hex');  -- error
+SELECT decode('W', 'base32hex');  -- error
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+
+
+--
+-- base64url encoding/decoding
+--
+
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
 SELECT decode('abc-_w', 'base64url');      -- \x69b73eff
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index f512f4dea1d..278a5773ada 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -152,5 +152,14 @@ SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
 SELECT '\x1234567890abcdef'::bytea::uuid; -- error
 SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig :: bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff'] :: uuid[]
+) AS orig ORDER BY enc;
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.0
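
A minimal standalone sketch (not code from the patch) of the 26-character
UUID form mentioned in the documentation hunk above. The function name
sketch_base32hex_encode_nopad is hypothetical. The loop mirrors
base32hex_encode() from the patch but omits the '=' padding, which is what
the substring(... FROM 1 FOR 26) expression achieves in SQL.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch.  Encodes srclen bytes as
 * base32hex without '=' padding and NUL-terminates the result.  dst must
 * have room for (srclen * 8 + 4) / 5 + 1 bytes.  Returns the number of
 * characters written, not counting the terminator.
 */
size_t
sketch_base32hex_encode_nopad(const unsigned char *src, size_t srclen,
                              char *dst)
{
    static const char table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
    uint64_t    bits_buffer = 0;
    int         bits_in_buffer = 0;
    size_t      output_pos = 0;
    size_t      i;

    for (i = 0; i < srclen; i++)
    {
        /* Add 8 bits to the buffer */
        bits_buffer = (bits_buffer << 8) | src[i];
        bits_in_buffer += 8;

        /* Emit one character per complete 5-bit chunk */
        while (bits_in_buffer >= 5)
        {
            bits_in_buffer -= 5;
            dst[output_pos++] = table[(bits_buffer >> bits_in_buffer) & 0x1F];
            bits_buffer &= (UINT64_C(1) << bits_in_buffer) - 1;
        }
    }

    /* Left-align any remaining bits in a final 5-bit chunk */
    if (bits_in_buffer > 0)
        dst[output_pos++] = table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];

    dst[output_pos] = '\0';
    return output_pos;
}
```

Under these assumptions, sixteen 0x11 bytes encode to
248H248H248H248H248H248H24 and sixteen 0xff bytes to
VVVVVVVVVVVVVVVVVVVVVVVVVS, matching the first 26 characters of the
uuid.out expectations above. Because the alphabet is in ASCII order, a
byte-wise comparison of the encoded strings orders them the same way as the
raw bytes.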




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
@ 2026-03-18 17:52       ` Masahiko Sawada <[email protected]>
  2026-03-19 11:18         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
  0 siblings, 2 replies; 26+ messages in thread

From: Masahiko Sawada @ 2026-03-18 17:52 UTC (permalink / raw)
  To: Aleksander Alekseev <[email protected]>; +Cc: pgsql-hackers; Andrey Borodin <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

On Wed, Mar 18, 2026 at 4:14 AM Aleksander Alekseev
<[email protected]> wrote:
>
> Hi,
>
> > I've attached a patch for the 0002 patch part that fixes the above
> > points (except for the last point) and has some minor fixes as well.
>
> Applied, thanks.
>
> > +   /* Verify no extra bits remain (padding bits should be zero) */
> > +   if (bits_in_buffer > 0 && (bits_buffer & ((1ULL << bits_in_buffer) - 1)) != 0)
> > +       ereport(ERROR,
> > +               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > +                errmsg("invalid base32hex end sequence"),
> > +                errhint("Input data has non-zero padding bits.")));
> >
> > This code checks if the remaining bits of the input data are all zero.
> > IIUC we don't have a similar check for base64 and base64url. For
> > instance, the following input data is accepted:
> >
> > =# select decode('AB', 'base64');
> >  decode
> > --------
> >  \x00
> > (1 row)
> >
> > I think it's better to have consistent behavior across our encodings.
>
> Agree. Fixed.
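
For readers following the padding-bit discussion above, here is a minimal
standalone sketch (not code from the patch) of the same bit-buffer decoding
loop. The function name sketch_base32hex_decode and its leftover output
parameter are hypothetical, and '=' handling is simplified to stopping at
the first padding character. It only illustrates how trailing bits that do
not fill a whole byte end up being dropped rather than rejected.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch.  Decodes base32hex text from
 * src into dst, stopping at the first '=' (a simplification of the patch's
 * padding checks).  Returns the number of bytes written, or -1 on an
 * invalid symbol.  dst must have room for srclen * 5 / 8 bytes.  If
 * leftover is not NULL, it receives the trailing bits that did not fill a
 * whole byte; the sketch does not reject them when they are non-zero.
 */
int
sketch_base32hex_decode(const char *src, size_t srclen,
                        unsigned char *dst, unsigned int *leftover)
{
    uint64_t    bits_buffer = 0;
    int         bits_in_buffer = 0;
    int         output_pos = 0;
    size_t      i;

    for (i = 0; i < srclen; i++)
    {
        unsigned char c = (unsigned char) src[i];
        int         val;

        if (c == '=')
            break;

        /* Map 0-9, A-V (either case) to 0..31 */
        if (c >= '0' && c <= '9')
            val = c - '0';
        else if (c >= 'A' && c <= 'V')
            val = c - 'A' + 10;
        else if (c >= 'a' && c <= 'v')
            val = c - 'a' + 10;
        else
            return -1;

        /* Add 5 bits, then emit a byte once 8 or more are buffered */
        bits_buffer = (bits_buffer << 5) | (uint64_t) val;
        bits_in_buffer += 5;
        while (bits_in_buffer >= 8)
        {
            bits_in_buffer -= 8;
            dst[output_pos++] = (unsigned char) (bits_buffer >> bits_in_buffer);
            bits_buffer &= (UINT64_C(1) << bits_in_buffer) - 1;
        }
    }

    if (leftover != NULL)
        *leftover = (unsigned int) bits_buffer;
    return output_pos;
}
```

With this sketch, decoding "11" yields the single byte 0x08 and leaves the
two trailing bits 01 unused, which is the case that the regression test
decode('11=', 'base32hex') exercises.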

Thank you for updating the patches!

I've made some minor changes to both patches (e.g., rewording the
documentation changes and commit messages) and attached the updated
patches.

I'm going to push these patches unless there are further comments.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Attachments:

  [text/x-patch] v7-0002-Add-base32hex-support-to-encode-and-decode-functi.patch (15.8K, 2-v7-0002-Add-base32hex-support-to-encode-and-decode-functi.patch)
From 5fc039df55008ff8578a16cb27697b55407bd6c5 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Wed, 29 Oct 2025 15:53:12 +0400
Subject: [PATCH v7 2/2] Add base32hex support to encode() and decode()
 functions.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds support for base32hex encoding and decoding, as defined in
RFC 4648 Section 7. Unlike standard base32, base32hex uses the
extended hex alphabet (0-9, A-V) which preserves the lexicographical
order of the encoded data.

This is particularly useful for representing UUIDv7 values in a
compact string format while maintaining their time-ordered sort
property.

The encode() function produces output padded with '=', while decode()
accepts both padded and unpadded input. As with hex decoding, decoding
is case-insensitive.

Suggested-by: Sergey Prokhorenko <[email protected]>
Author: Andrey Borodin <[email protected]>
Co-authored-by: Aleksander Alekseev <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Илья Чердаков <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml |  27 ++++
 src/backend/utils/adt/encode.c           | 152 ++++++++++++++++++++++-
 src/test/regress/expected/strings.out    | 117 ++++++++++++++++-
 src/test/regress/expected/uuid.out       |  16 +++
 src/test/regress/sql/strings.sql         |  30 ++++-
 src/test/regress/sql/uuid.sql            |   9 ++
 6 files changed, 343 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index b256381e01f..3f77f6d20b0 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -729,6 +729,7 @@
        <parameter>format</parameter> values are:
        <link linkend="encode-format-base64"><literal>base64</literal></link>,
        <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
+       <link linkend="encode-format-base32hex"><literal>base32hex</literal></link>,
        <link linkend="encode-format-escape"><literal>escape</literal></link>,
        <link linkend="encode-format-hex"><literal>hex</literal></link>.
       </para>
@@ -804,6 +805,32 @@
      </listitem>
     </varlistentry>
 
+    <varlistentry id="encode-format-base32hex">
+     <term>base32hex
+      <indexterm>
+       <primary>base32hex format</primary>
+      </indexterm></term>
+     <listitem>
+      <para>
+       The <literal>base32hex</literal> format is that of
+       <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
+       RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
+       (<literal>0</literal>-<literal>9</literal> and
+       <literal>A</literal>-<literal>V</literal>) which preserves the lexicographical
+       sort order of the encoded data. The <function>encode</function> function
+       produces output padded with <literal>'='</literal>, while <function>decode</function>
+       accepts both padded and unpadded input. Decoding is case-insensitive and ignores
+       whitespace characters.
+      </para>
+      <para>
+       This format is useful for encoding UUIDs in a compact, sortable format:
+       <literal>substring(encode(uuid_value::bytea, 'base32hex') FROM 1 FOR 26)</literal>
+       produces a 26-character string compared to the standard 36-character
+       UUID representation.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry id="encode-format-escape">
      <term>escape
      <indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index f5f835e944a..793cc7f2a34 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -65,8 +65,8 @@ binary_encode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -115,8 +115,8 @@ binary_decode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -825,6 +825,144 @@ esc_dec_len(const char *src, size_t srclen)
 	return len;
 }
 
+/*
+ * BASE32HEX
+ */
+
+static const char base32hex_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
+
+static uint64
+base32hex_enc_len(const char *src, size_t srclen)
+{
+	/* 5 bytes encode to 8 characters, round up to multiple of 8 for padding */
+	return ((uint64) srclen + 4) / 5 * 8;
+}
+
+static uint64
+base32hex_dec_len(const char *src, size_t srclen)
+{
+	/* Decode length is (srclen * 5) / 8, but we may have padding */
+	return ((uint64) srclen * 5) / 8;
+}
+
+static uint64
+base32hex_encode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint64		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+
+	for (i = 0; i < srclen; i++)
+	{
+		/* Add 8 bits to the buffer */
+		bits_buffer = (bits_buffer << 8) | data[i];
+		bits_in_buffer += 8;
+
+		/* Extract 5-bit chunks while we have enough bits */
+		while (bits_in_buffer >= 5)
+		{
+			bits_in_buffer -= 5;
+			/* Extract top 5 bits */
+			dst[output_pos++] = base32hex_table[(bits_buffer >> bits_in_buffer) & 0x1F];
+			/* Clear the extracted bits by masking */
+			bits_buffer &= ((1ULL << bits_in_buffer) - 1);
+		}
+	}
+
+	/* Handle remaining bits (if any) */
+	if (bits_in_buffer > 0)
+		dst[output_pos++] = base32hex_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];
+
+	/* Add padding to make length a multiple of 8 (per RFC 4648) */
+	while (output_pos % 8 != 0)
+		dst[output_pos++] = '=';
+
+	return output_pos;
+}
+
+static uint64
+base32hex_decode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint64		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+	int			pos = 0;		/* position within 8-character group (0-7) */
+	bool		end = false;	/* have we seen padding? */
+
+	for (i = 0; i < srclen; i++)
+	{
+		unsigned char c = data[i];
+		int			val;
+
+		/* Skip whitespace */
+		if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
+			continue;
+
+		if (c == '=')
+		{
+			/*
+			 * The first padding is only valid at positions 2, 4, 5, or 7 within
+			 * an 8-character group (corresponding to 1, 2, 3, or 4 input bytes).
+			 * We only check the position for the first '=' character.
+			 */
+			if (!end)
+			{
+				if (pos != 2 && pos != 4 && pos != 5 && pos != 7)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("unexpected \"=\" while decoding base32hex sequence")));
+				end = true;
+			}
+			pos++;
+			continue;
+		}
+
+		/* No data characters allowed after padding */
+		if (end)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen((const char *) &c), (const char *) &c)));
+
+		/* Decode base32hex character (0-9, A-V, case-insensitive) */
+		if (c >= '0' && c <= '9')
+			val = c - '0';
+		else if (c >= 'A' && c <= 'V')
+			val = c - 'A' + 10;
+		else if (c >= 'a' && c <= 'v')
+			val = c - 'a' + 10;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen((const char *) &c), (const char *) &c)));
+
+		/* Add 5 bits to buffer */
+		bits_buffer = (bits_buffer << 5) | val;
+		bits_in_buffer += 5;
+		pos++;
+
+		/* Extract 8-bit bytes when we have enough bits */
+		while (bits_in_buffer >= 8)
+		{
+			bits_in_buffer -= 8;
+			dst[output_pos++] = (unsigned char) (bits_buffer >> bits_in_buffer);
+			/* Clear the extracted bits */
+			bits_buffer &= ((1ULL << bits_in_buffer) - 1);
+		}
+
+		/* Reset position after each complete 8-character group */
+		if (pos == 8)
+			pos = 0;
+	}
+
+	return output_pos;
+}
+
 /*
  * Common
  */
@@ -854,6 +992,12 @@ static const struct
 			pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
 		}
 	},
+	{
+		"base32hex",
+		{
+			base32hex_enc_len, base32hex_dec_len, base32hex_encode, base32hex_decode
+		}
+	},
 	{
 		"escape",
 		{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index f38688b5c37..75a4a4f38a6 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2600,14 +2600,125 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
 -- report an error with a hint listing valid encodings when an invalid encoding is specified
 SELECT encode('\x01'::bytea, 'invalid');  -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 SELECT decode('00', 'invalid');           -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
+SELECT encode('', 'base32hex');  -- ''
+ encode 
+--------
+ 
+(1 row)
+
+SELECT encode('\x11', 'base32hex');  -- '24======'
+  encode  
+----------
+ 24======
+(1 row)
+
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+  encode  
+----------
+ 24H0====
+(1 row)
+
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+  encode  
+----------
+ 24H36===
+(1 row)
+
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+  encode  
+----------
+ 24H36H0=
+(1 row)
+
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+  encode  
+----------
+ 24H36H2L
+(1 row)
+
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+      encode      
+------------------
+ 24H36H2LCO======
+(1 row)
+
+SELECT decode('', 'base32hex');  -- ''
+ decode 
+--------
+ \x
+(1 row)
+
+SELECT decode('24======', 'base32hex');  -- \x11
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+ decode 
+--------
+ \x1122
+(1 row)
+
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+  decode  
+----------
+ \x112233
+(1 row)
+
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+   decode   
+------------
+ \x11223344
+(1 row)
+
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+    decode    
+--------------
+ \x1122334455
+(1 row)
+
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('=', 'base32hex');  -- error
+ERROR:  unexpected "=" while decoding base32hex sequence
+SELECT decode('W', 'base32hex');  -- error
+ERROR:  invalid symbol "W" found while decoding base32hex sequence
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+ERROR:  invalid symbol "2" found while decoding base32hex sequence
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+ decode 
+--------
+ \x08
+(1 row)
+
+--
+-- base64url encoding/decoding
+--
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
  encode 
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index d157ef7d0b3..d2a45a0f07c 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -327,5 +327,21 @@ SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
  t
 (1 row)
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig::bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff']::uuid[]
+) AS orig ORDER BY enc;
+                 orig                 |               enc                
+--------------------------------------+----------------------------------
+ 00000000-0000-0000-0000-000000000000 | 00000000000000000000000000======
+ 11111111-1111-1111-1111-111111111111 | 248H248H248H248H248H248H24======
+ 123e4567-e89b-12d3-a456-426614174000 | 28V4APV8JC9D792M89J185Q000======
+ ffffffff-ffff-ffff-ffff-ffffffffffff | VVVVVVVVVVVVVVVVVVVVVVVVVS======
+(4 rows)
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index d8a09737668..9f86f2cac19 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -835,10 +835,38 @@ SELECT encode('\x01'::bytea, 'invalid');  -- error
 SELECT decode('00', 'invalid');           -- error
 
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
 
+SELECT encode('', 'base32hex');  -- ''
+SELECT encode('\x11', 'base32hex');  -- '24======'
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+
+SELECT decode('', 'base32hex');  -- ''
+SELECT decode('24======', 'base32hex');  -- \x11
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+SELECT decode('=', 'base32hex');  -- error
+SELECT decode('W', 'base32hex');  -- error
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+
+
+--
+-- base64url encoding/decoding
+--
+
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
 SELECT decode('abc-_w', 'base64url');      -- \x69b73eff
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index f512f4dea1d..ee14802630a 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -152,5 +152,14 @@ SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
 SELECT '\x1234567890abcdef'::bytea::uuid; -- error
 SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig::bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff']::uuid[]
+) AS orig ORDER BY enc;
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.53.0



  [text/x-patch] v7-0001-Allow-explicit-casting-between-bytea-and-uuid.patch (6.5K, 3-v7-0001-Allow-explicit-casting-between-bytea-and-uuid.patch)
From 113b779dc1c126c4c5d9ebf113b84603051d2d4f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <[email protected]>
Date: Tue, 28 Oct 2025 16:33:17 +0000
Subject: [PATCH v7 1/2] Allow explicit casting between bytea and uuid.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This enables the use of functions such as encode() and decode() with
UUID values, allowing them to be converted to and from alternative
formats like base64 or hex.

The cast maps the 16-byte internal representation of a UUID directly
to a bytea datum. This is more efficient than going through a text
forepresentation.

Author:	Dagfinn Ilmari Mannsåker <[email protected]>
Co-authored-by: Aleksander Alekseev <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Reviewed-by: Jelte Fennema-Nio <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml         | 11 +++++++++++
 src/backend/utils/adt/bytea.c      | 27 +++++++++++++++++++++++++++
 src/include/catalog/pg_cast.dat    |  6 ++++++
 src/include/catalog/pg_proc.dat    |  7 +++++++
 src/test/regress/expected/uuid.out | 22 ++++++++++++++++++++++
 src/test/regress/sql/uuid.sql      |  5 +++++
 6 files changed, 78 insertions(+)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 3017c674040..d8d91678e86 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4439,6 +4439,17 @@ a0ee-bc99-9c0b-4ef8-bb6d-6bb9-bd38-0a11
     Output is always in the standard form.
    </para>
 
+   <para>
+    It is possible to cast <type>uuid</type> values to and from type
+    <type>bytea</type>. This is useful for using functions such as
+    <function>encode()</function> and <function>decode()</function>
+    with UUID values. For example:
+<programlisting>
+encode('1ea3d64c-bc40-4cc3-84bb-6b11ee31e5c2'::uuid::bytea, 'base64')
+decode('HqPWTLxATMOEu2sR7jHlwg==', 'base64')::uuid
+</programlisting>
+   </para>
+
    <para>
     See <xref linkend="functions-uuid"/> for how to generate a UUID in
     <productname>PostgreSQL</productname>.
diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
index fd7662d41ee..4dc83671aa5 100644
--- a/src/backend/utils/adt/bytea.c
+++ b/src/backend/utils/adt/bytea.c
@@ -28,6 +28,7 @@
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/sortsupport.h"
+#include "utils/uuid.h"
 #include "varatt.h"
 
 /* GUC variable */
@@ -1340,3 +1341,29 @@ int8_bytea(PG_FUNCTION_ARGS)
 {
 	return int8send(fcinfo);
 }
+
+/* Cast bytea -> uuid */
+Datum
+bytea_uuid(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+	pg_uuid_t  *uuid;
+
+	if (len != UUID_LEN)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+				 errmsg("invalid input length for type %s", "uuid"),
+				 errdetail("Expected %d bytes, got %d.", UUID_LEN, len)));
+
+	uuid = (pg_uuid_t *) palloc(sizeof(pg_uuid_t));
+	memcpy(uuid->data, VARDATA_ANY(v), UUID_LEN);
+	PG_RETURN_UUID_P(uuid);
+}
+
+/* Cast uuid -> bytea; can just use uuid_send() */
+Datum
+uuid_bytea(PG_FUNCTION_ARGS)
+{
+	return uuid_send(fcinfo);
+}
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index 9b1cfb1b590..a7b6d812c5a 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -362,6 +362,12 @@
 { castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
   castcontext => 'e', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and uuid type
+{ castsource => 'bytea', casttarget => 'uuid', castfunc => 'uuid(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'uuid', casttarget => 'bytea', castfunc => 'bytea(uuid)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fc8d82665b8..84e7adde0e5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1208,6 +1208,13 @@
   proname => 'int8', prorettype => 'int8', proargtypes => 'bytea',
   prosrc => 'bytea_int8' },
 
+{ oid => '9880', descr => 'convert uuid to bytea',
+  proname => 'bytea', prorettype => 'bytea', proargtypes => 'uuid',
+  prosrc => 'uuid_bytea' },
+{ oid => '9881', descr => 'convert bytea to uuid',
+  proname => 'uuid', prorettype => 'uuid', proargtypes => 'bytea',
+  prosrc => 'bytea_uuid' },
+
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
   prosrc => 'hashint2' },
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 95392003b86..d157ef7d0b3 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -305,5 +305,27 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
  
 (1 row)
 
+-- casts
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
+               bytea                
+------------------------------------
+ \x5b35380a714349129b55f322699c6770
+(1 row)
+
+SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
+                 uuid                 
+--------------------------------------
+ 019a2f85-9ced-7225-b99d-9c55044a2563
+(1 row)
+
+SELECT '\x1234567890abcdef'::bytea::uuid; -- error
+ERROR:  invalid input length for type uuid
+DETAIL:  Expected 16 bytes, got 8.
+SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
+ matched 
+---------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 465153a0341..f512f4dea1d 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -146,6 +146,11 @@ SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
+-- casts
+SELECT '5b35380a-7143-4912-9b55-f322699c6770'::uuid::bytea;
+SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
+SELECT '\x1234567890abcdef'::bytea::uuid; -- error
+SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
 
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.53.0




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
@ 2026-03-19 11:18         ` Aleksander Alekseev <[email protected]>
  2026-03-19 12:12           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Chengxi Sun <[email protected]>
  1 sibling, 1 reply; 26+ messages in thread

From: Aleksander Alekseev @ 2026-03-19 11:18 UTC (permalink / raw)
  To: Masahiko Sawada <[email protected]>; +Cc: pgsql-hackers; Andrey Borodin <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

Hi,

> I've made some minor changes to both patches (e.g., rewording the
> documentation changes and commit messages etc), and attached the
> updated patches.
>
> I'm going to push these patches unless there is no further comment.

Many thanks! One little nitpick.

In 0001:

"""
The cast maps the 16-byte internal representation of a UUID directly
to a bytea datum. This is more efficient than going through a text
forepresentation.
"""

I'm pretty confident there is no such word as "forepresentation".


-- 
Best regards,
Aleksander Alekseev






* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 11:18         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
@ 2026-03-19 12:12           ` Chengxi Sun <[email protected]>
  2026-03-19 12:18             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  0 siblings, 1 reply; 26+ messages in thread

From: Chengxi Sun @ 2026-03-19 12:12 UTC (permalink / raw)
  To: Aleksander Alekseev <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

I have a concern with base32hex_decode(). It only checks where the first "="
appears, but it does not validate the final group length or the required
amount of padding. Because of that, some invalid inputs are accepted silently.

For example:

postgres=# SET bytea_output = hex;
SET
postgres=# SELECT '0' AS input, decode('0', 'base32hex');
 input | decode
-------+--------
 0     | \x
(1 row)

postgres=# SELECT '000' AS input , decode('000', 'base32hex');
 input | decode
-------+--------
 000   | \x00
(1 row)

postgres=# SELECT '24=' as input , decode('24=', 'base32hex');
 input | decode
-------+--------
 24=   | \x11
(1 row)

These look fine, but if we check the same inputs with Python:
% python3 - <<'PY'
import base64

tests = [
    "24",
    "24======",
    "0",
    "000",
    "24=",
]

for s in tests:
    try:
        out = base64.b32hexdecode(s, casefold=True)
        print(f"{s!r} -> OK {out.hex()}")
    except Exception as e:
        print(f"{s!r} -> ERROR: {e}")
PY

The outputs are:
'24' -> ERROR: Incorrect padding
'24======' -> OK 11
'0' -> ERROR: Incorrect padding
'000' -> ERROR: Incorrect padding
'24=' -> ERROR: Incorrect padding

I might be missing some context here, so I wanted to ask: is this behavior
intentional, or would it make sense to enforce stricter validation for
base32hex input?

Best regards,

Chengxi Sun



* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 11:18         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-19 12:12           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Chengxi Sun <[email protected]>
@ 2026-03-19 12:18             ` Aleksander Alekseev <[email protected]>
  2026-03-19 14:14               ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Chengxi Sun <[email protected]>
  2026-03-19 19:24               ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  0 siblings, 2 replies; 26+ messages in thread

From: Aleksander Alekseev @ 2026-03-19 12:18 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Chengxi Sun <[email protected]>; Masahiko Sawada <[email protected]>; Andrey Borodin <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

Hi,

> I might be missing some context here, so I wanted to ask: is this behavior intentional,
> or would it make sense to enforce stricter validation for Base32hex input?

That's intentional - see the discussion above:

"""
[...]
This code checks if the remaining bits of the input data are all zero.
IIUC we don't have a similar check for base64 and base64url. For
instance, the following input data is accepted:

=# select decode('AB', 'base64');
 decode
--------
 \x00
(1 row)
"""

Also see the documentation with respect to padding.
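
To make the lenient behavior concrete, here is a minimal Python sketch of a
decoder that accepts the inputs shown in this thread. It is not the C code
from the patch: the name lenient_b32hex_decode and the exact rule for where
"=" is allowed (rejected only at the start of a group) are assumptions made
for illustration. Leftover bits that do not fill a whole byte are discarded
without checking that they are zero.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # RFC 4648 extended hex alphabet


def lenient_b32hex_decode(text: str) -> bytes:
    """Hypothetical lenient decoder; a sketch, not the PostgreSQL code."""
    out = bytearray()
    bits = 0          # bit accumulator
    nbits = 0         # number of valid bits currently in the accumulator
    pos = 0           # position within the current 8-symbol group
    seen_pad = False  # have we seen "=" yet?

    for ch in text:
        if ch.isspace():
            continue
        if ch == "=":
            # Assumption: "=" is rejected only when no data symbol precedes
            # it in the current group.
            if not seen_pad and pos == 0:
                raise ValueError('unexpected "="')
            seen_pad = True
            continue
        if seen_pad:
            # Data after padding is always an error.
            raise ValueError(f'invalid symbol "{ch}"')
        val = ALPHABET.find(ch.upper())  # decoding is case-insensitive
        if val < 0:
            raise ValueError(f'invalid symbol "{ch}"')

        bits = (bits << 5) | val
        nbits += 5
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1
        pos = (pos + 1) % 8

    # Any leftover bits (fewer than 8) are dropped, zero or not.
    return bytes(out)


for s in ["24", "24======", "0", "000", "24=", "11="]:
    print(repr(s), "->", lenient_b32hex_decode(s).hex())
```

With this rule, "0" yields an empty result, and "000", "24=" and "11=" each
decode to one byte, matching the psql output quoted above, whereas a strict
decoder would reject all four.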

-- 
Best regards,
Aleksander Alekseev






* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 11:18         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-19 12:12           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Chengxi Sun <[email protected]>
  2026-03-19 12:18             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
@ 2026-03-19 14:14               ` Chengxi Sun <[email protected]>
  1 sibling, 0 replies; 26+ messages in thread

From: Chengxi Sun @ 2026-03-19 14:14 UTC (permalink / raw)
  To: Aleksander Alekseev <[email protected]>; +Cc: pgsql-hackers; Masahiko Sawada <[email protected]>; Andrey Borodin <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

>
>
> This code checks if the remaining bits of the input data are all zero.
> IIUC we don't have a similar check for base64 and base64url. For
> instance, the following input data is accepted:
>
> =# select decode('AB', 'base64');
>  decode
> --------
>  \x00
> (1 row)
> """
>

Thanks for the clarification, that makes sense.

Best regards,



* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 11:18         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-19 12:12           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Chengxi Sun <[email protected]>
  2026-03-19 12:18             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
@ 2026-03-19 19:24               ` Masahiko Sawada <[email protected]>
  1 sibling, 0 replies; 26+ messages in thread

From: Masahiko Sawada @ 2026-03-19 19:24 UTC (permalink / raw)
  To: Aleksander Alekseev <[email protected]>; +Cc: pgsql-hackers; Chengxi Sun <[email protected]>; Andrey Borodin <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

On Thu, Mar 19, 2026 at 5:19 AM Aleksander Alekseev
<[email protected]> wrote:
>
> Hi,
>
> > I might be missing some context here, so I wanted to ask: is this behavior intentional,
> > or would it make sense to enforce stricter validation for Base32hex input?
>
> That's intentional - see the discussion above:
>
> """
> [...]
> This code checks if the remaining bits of the input data are all zero.
> IIUC we don't have a similar check for base64 and base64url. For
> instance, the following input data is accepted:
>
> =# select decode('AB', 'base64');
>  decode
> --------
>  \x00
> (1 row)
> """

Right. I've also tested base32hex encoding/decoding against other
libraries, such as Python's. IIUC our base32hex implementation doesn't
necessarily behave exactly the same as other libraries, because it aims
for consistency with the existing encodings such as base64, but I
believe that it doesn't contradict the RFC.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com






* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
@ 2026-03-19 14:36         ` Dagfinn Ilmari Mannsåker <[email protected]>
  2026-03-19 21:33           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  1 sibling, 1 reply; 26+ messages in thread

From: Dagfinn Ilmari Mannsåker @ 2026-03-19 14:36 UTC (permalink / raw)
  To: Masahiko Sawada <[email protected]>; +Cc: Aleksander Alekseev <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>

Masahiko Sawada <[email protected]> writes:

> I've made some minor changes to both patches (e.g., rewording the
> documentation changes and commit messages etc), and attached the
> updated patches.
>
> I'm going to push these patches unless there is no further comment.

Just one minor nitpick on my patch, which is that it should use
palloc_object(), which I wasn't aware of when I wrote it originally.

> diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
> index fd7662d41ee..4dc83671aa5 100644
> --- a/src/backend/utils/adt/bytea.c
> +++ b/src/backend/utils/adt/bytea.c
[...]
> +	if (len != UUID_LEN)
> +		ereport(ERROR,
> +				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
> +				 errmsg("invalid input length for type %s", "uuid"),
> +				 errdetail("Expected %d bytes, got %d.", UUID_LEN, len)));
> +
> +	uuid = (pg_uuid_t *) palloc(sizeof(pg_uuid_t));

this should be:

+	uuid = palloc_object(pg_uuid_t);

> +	memcpy(uuid->data, VARDATA_ANY(v), UUID_LEN);
> +	PG_RETURN_UUID_P(uuid);
> +}
> +


- ilmari






* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
@ 2026-03-19 21:33           ` Masahiko Sawada <[email protected]>
  2026-03-20 03:06             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Chengxi Sun <[email protected]>
  2026-03-20 13:02             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  0 siblings, 2 replies; 26+ messages in thread

From: Masahiko Sawada @ 2026-03-19 21:33 UTC (permalink / raw)
  To: Dagfinn Ilmari Mannsåker <[email protected]>; +Cc: Aleksander Alekseev <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>

On Thu, Mar 19, 2026 at 7:36 AM Dagfinn Ilmari Mannsåker
<[email protected]> wrote:
>
> Masahiko Sawada <[email protected]> writes:
>
> > I've made some minor changes to both patches (e.g., rewording the
> > documentation changes and commit messages etc), and attached the
> > updated patches.
> >
> > I'm going to push these patches unless there is no further comment.
>
> Just one minor nitpick on my patch, which is that it should use
> palloc_object(), which I wasn't aware of when I wrote it originally.
>
> > diff --git a/src/backend/utils/adt/bytea.c b/src/backend/utils/adt/bytea.c
> > index fd7662d41ee..4dc83671aa5 100644
> > --- a/src/backend/utils/adt/bytea.c
> > +++ b/src/backend/utils/adt/bytea.c
> [...]
> > +     if (len != UUID_LEN)
> > +             ereport(ERROR,
> > +                             (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
> > +                              errmsg("invalid input length for type %s", "uuid"),
> > +                              errdetail("Expected %d bytes, got %d.", UUID_LEN, len)));
> > +
> > +     uuid = (pg_uuid_t *) palloc(sizeof(pg_uuid_t));
>
> this should be:
>
> +       uuid = palloc_object(pg_uuid_t);
>
> > +     memcpy(uuid->data, VARDATA_ANY(v), UUID_LEN);
> > +     PG_RETURN_UUID_P(uuid);
> > +}
> > +

Good catch. I've pushed the 0001 patch after incorporating this change.

I haven't pushed the 0002 patch yet, as I found a bug in the decoding
code during self-review:

+           ereport(ERROR,
+                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                    errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+                           pg_mblen((const char *) &c), (const char *) &c)));

We should not use pg_mblen() anymore (cf. CVE-2026-2006). Also, since
'c' is just a single-byte copy on the stack, passing its address leads
to a buffer over-read if the invalid character is multi-byte.
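
The over-read can be illustrated with a small Python sketch, assuming a
UTF-8 encoding. The helper names utf8_char_len and bounded_char_len are
hypothetical, and whether the real pg_mblen_range() clamps the length or
raises an error on a truncated character is not shown in this thread; the
sketch simply clamps.

```python
def utf8_char_len(lead: int) -> int:
    """Length in bytes that a UTF-8 lead byte claims for its character."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    return 1  # continuation or invalid lead byte: treat as one byte


def bounded_char_len(buf: bytes, pos: int) -> int:
    """Claimed character length, never exceeding the bytes left in buf."""
    return min(utf8_char_len(buf[pos]), len(buf) - pos)


src = "あ".encode("utf-8")  # three bytes: e3 81 82
one_byte_copy = src[:1]     # analogous to the single byte 'c' on the stack

# The lead byte claims 3 bytes, but the copy holds only 1, so printing
# "%.*s" with that length would read 2 bytes past the copy.
print(utf8_char_len(one_byte_copy[0]), len(one_byte_copy))

# Measuring against the original buffer and its end stays in bounds.
print(bounded_char_len(src, 0), bounded_char_len(one_byte_copy, 0))
```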

Also, a small nitpick: we can use uint32 instead of uint64 for
'bits_buffer'. I've attached the updated patch as well as a diff
against the previous version.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Attachments:

  [text/x-patch] fix_issue_masahiko.patch (2.6K, 2-fix_issue_masahiko.patch)
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 214d835c624..dc86df27efa 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -885,17 +885,17 @@ base32hex_encode(const char *src, size_t srclen, char *dst)
 static uint64
 base32hex_decode(const char *src, size_t srclen, char *dst)
 {
-	const unsigned char *data = (const unsigned char *) src;
-	uint64		bits_buffer = 0;
+	const char	*srcend = src + srclen,
+		*s = src;
+	uint32		bits_buffer = 0;
 	int			bits_in_buffer = 0;
 	uint64		output_pos = 0;
-	size_t		i;
 	int			pos = 0;		/* position within 8-character group (0-7) */
 	bool		end = false;	/* have we seen padding? */
 
-	for (i = 0; i < srclen; i++)
+	while (s < srcend)
 	{
-		unsigned char c = data[i];
+		char c = *s++;
 		int			val;
 
 		/* Skip whitespace */
@@ -927,7 +927,7 @@ base32hex_decode(const char *src, size_t srclen, char *dst)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
-							pg_mblen((const char *) &c), (const char *) &c)));
+							pg_mblen_range(s - 1, srcend), s - 1)));
 
 		/* Decode base32hex character (0-9, A-V, case-insensitive) */
 		if (c >= '0' && c <= '9')
@@ -940,7 +940,7 @@ base32hex_decode(const char *src, size_t srclen, char *dst)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
-							pg_mblen((const char *) &c), (const char *) &c)));
+							pg_mblen_range(s - 1, srcend), s - 1)));
 
 		/* Add 5 bits to buffer */
 		bits_buffer = (bits_buffer << 5) | val;
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 75a4a4f38a6..0166a57b0d4 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2716,6 +2716,8 @@ SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (c
  \x08
 (1 row)
 
+SELECT decode('あ', 'base32hex'); -- error
+ERROR:  invalid symbol "あ" found while decoding base32hex sequence
 --
 -- base64url encoding/decoding
 --
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 9f86f2cac19..13fcfe21241 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -861,6 +861,7 @@ SELECT decode('=', 'base32hex');  -- error
 SELECT decode('W', 'base32hex');  -- error
 SELECT decode('24H36H0=24', 'base32hex'); -- error
 SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+SELECT decode('あ', 'base32hex'); -- error
 
 
 --


  [text/x-patch] v8-0001-Add-base32hex-support-to-encode-and-decode-functi.patch (15.9K, 3-v8-0001-Add-base32hex-support-to-encode-and-decode-functi.patch)
From 10b3df363ed5c1ff491e4e006f6b0528fc930ad2 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Wed, 29 Oct 2025 15:53:12 +0400
Subject: [PATCH v8] Add base32hex support to encode() and decode() functions.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds support for base32hex encoding and decoding, as defined in
RFC 4648 Section 7. Unlike standard base32, base32hex uses the
extended hex alphabet (0-9, A-V) which preserves the lexicographical
order of the encoded data.

This is particularly useful for representing UUIDv7 values in a
compact string format while maintaining their time-ordered sort
property.

The encode() function produces output padded with '=', while decode()
accepts both padded and unpadded input. Following the behavior of
other encoding types, decoding is case-insensitive.

Suggested-by: Sergey Prokhorenko <[email protected]>
Author: Andrey Borodin <[email protected]>
Co-authored-by: Aleksander Alekseev <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Илья Чердаков <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml |  27 ++++
 src/backend/utils/adt/encode.c           | 153 ++++++++++++++++++++++-
 src/test/regress/expected/strings.out    | 119 +++++++++++++++++-
 src/test/regress/expected/uuid.out       |  16 +++
 src/test/regress/sql/strings.sql         |  31 ++++-
 src/test/regress/sql/uuid.sql            |   9 ++
 6 files changed, 347 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index b256381e01f..3f77f6d20b0 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -729,6 +729,7 @@
        <parameter>format</parameter> values are:
        <link linkend="encode-format-base64"><literal>base64</literal></link>,
        <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
+       <link linkend="encode-format-base32hex"><literal>base32hex</literal></link>,
        <link linkend="encode-format-escape"><literal>escape</literal></link>,
        <link linkend="encode-format-hex"><literal>hex</literal></link>.
       </para>
@@ -804,6 +805,32 @@
      </listitem>
     </varlistentry>
 
+    <varlistentry id="encode-format-base32hex">
+     <term>base32hex
+      <indexterm>
+       <primary>base32hex format</primary>
+      </indexterm></term>
+     <listitem>
+      <para>
+       The <literal>base32hex</literal> format is that of
+       <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
+       RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
+       (<literal>0</literal>-<literal>9</literal> and
+       <literal>A</literal>-<literal>V</literal>) which preserves the lexicographical
+       sort order of the encoded data. The <function>encode</function> function
+       produces output padded with <literal>'='</literal>, while <function>decode</function>
+       accepts both padded and unpadded input. Decoding is case-insensitive and ignores
+       whitespace characters.
+      </para>
+      <para>
+       This format is useful for encoding UUIDs in a compact, sortable format:
+       <literal>substring(encode(uuid_value::bytea, 'base32hex') FROM 1 FOR 26)</literal>
+       produces a 26-character string compared to the standard 36-character
+       UUID representation.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry id="encode-format-escape">
      <term>escape
      <indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index f5f835e944a..dc86df27efa 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -65,8 +65,8 @@ binary_encode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -115,8 +115,8 @@ binary_decode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -825,6 +825,145 @@ esc_dec_len(const char *src, size_t srclen)
 	return len;
 }
 
+/*
+ * BASE32HEX
+ */
+
+static const char base32hex_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
+
+static uint64
+base32hex_enc_len(const char *src, size_t srclen)
+{
+	/* 5 bytes encode to 8 characters, round up to multiple of 8 for padding */
+	return ((uint64) srclen + 4) / 5 * 8;
+}
+
+static uint64
+base32hex_dec_len(const char *src, size_t srclen)
+{
+	/* Upper bound: 8 characters decode to 5 bytes; padding and whitespace add none */
+	return ((uint64) srclen * 5) / 8;
+}
+
+static uint64
+base32hex_encode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint64		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+
+	for (i = 0; i < srclen; i++)
+	{
+		/* Add 8 bits to the buffer */
+		bits_buffer = (bits_buffer << 8) | data[i];
+		bits_in_buffer += 8;
+
+		/* Extract 5-bit chunks while we have enough bits */
+		while (bits_in_buffer >= 5)
+		{
+			bits_in_buffer -= 5;
+			/* Extract top 5 bits */
+			dst[output_pos++] = base32hex_table[(bits_buffer >> bits_in_buffer) & 0x1F];
+			/* Clear the extracted bits by masking */
+			bits_buffer &= ((1ULL << bits_in_buffer) - 1);
+		}
+	}
+
+	/* Handle remaining bits (if any) */
+	if (bits_in_buffer > 0)
+		dst[output_pos++] = base32hex_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];
+
+	/* Add padding to make length a multiple of 8 (per RFC 4648) */
+	while (output_pos % 8 != 0)
+		dst[output_pos++] = '=';
+
+	return output_pos;
+}
+
+static uint64
+base32hex_decode(const char *src, size_t srclen, char *dst)
+{
+	const char	*srcend = src + srclen,
+		*s = src;
+	uint32		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	int			pos = 0;		/* position within 8-character group (0-7) */
+	bool		end = false;	/* have we seen padding? */
+
+	while (s < srcend)
+	{
+		char c = *s++;
+		int			val;
+
+		/* Skip whitespace */
+		if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
+			continue;
+
+		if (c == '=')
+		{
+			/*
+			 * The first padding is only valid at positions 2, 4, 5, or 7
+			 * within an 8-character group (corresponding to 1, 2, 3, or 4
+			 * input bytes). We only check the position for the first '='
+			 * character.
+			 */
+			if (!end)
+			{
+				if (pos != 2 && pos != 4 && pos != 5 && pos != 7)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("unexpected \"=\" while decoding base32hex sequence")));
+				end = true;
+			}
+			pos++;
+			continue;
+		}
+
+		/* No data characters allowed after padding */
+		if (end)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen_range(s - 1, srcend), s - 1)));
+
+		/* Decode base32hex character (0-9, A-V, case-insensitive) */
+		if (c >= '0' && c <= '9')
+			val = c - '0';
+		else if (c >= 'A' && c <= 'V')
+			val = c - 'A' + 10;
+		else if (c >= 'a' && c <= 'v')
+			val = c - 'a' + 10;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen_range(s - 1, srcend), s - 1)));
+
+		/* Add 5 bits to buffer */
+		bits_buffer = (bits_buffer << 5) | val;
+		bits_in_buffer += 5;
+		pos++;
+
+		/* Extract 8-bit bytes when we have enough bits */
+		while (bits_in_buffer >= 8)
+		{
+			bits_in_buffer -= 8;
+			dst[output_pos++] = (unsigned char) (bits_buffer >> bits_in_buffer);
+			/* Clear the extracted bits */
+			bits_buffer &= ((1ULL << bits_in_buffer) - 1);
+		}
+
+		/* Reset position after each complete 8-character group */
+		if (pos == 8)
+			pos = 0;
+	}
+
+	return output_pos;
+}
+
 /*
  * Common
  */
@@ -854,6 +993,12 @@ static const struct
 			pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
 		}
 	},
+	{
+		"base32hex",
+		{
+			base32hex_enc_len, base32hex_dec_len, base32hex_encode, base32hex_decode
+		}
+	},
 	{
 		"escape",
 		{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index f38688b5c37..0166a57b0d4 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2600,14 +2600,127 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
 -- report an error with a hint listing valid encodings when an invalid encoding is specified
 SELECT encode('\x01'::bytea, 'invalid');  -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 SELECT decode('00', 'invalid');           -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
+SELECT encode('', 'base32hex');  -- ''
+ encode 
+--------
+ 
+(1 row)
+
+SELECT encode('\x11', 'base32hex');  -- '24======'
+  encode  
+----------
+ 24======
+(1 row)
+
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+  encode  
+----------
+ 24H0====
+(1 row)
+
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+  encode  
+----------
+ 24H36===
+(1 row)
+
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+  encode  
+----------
+ 24H36H0=
+(1 row)
+
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+  encode  
+----------
+ 24H36H2L
+(1 row)
+
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+      encode      
+------------------
+ 24H36H2LCO======
+(1 row)
+
+SELECT decode('', 'base32hex');  -- ''
+ decode 
+--------
+ \x
+(1 row)
+
+SELECT decode('24======', 'base32hex');  -- \x11
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+ decode 
+--------
+ \x1122
+(1 row)
+
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+  decode  
+----------
+ \x112233
+(1 row)
+
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+   decode   
+------------
+ \x11223344
+(1 row)
+
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+    decode    
+--------------
+ \x1122334455
+(1 row)
+
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('=', 'base32hex');  -- error
+ERROR:  unexpected "=" while decoding base32hex sequence
+SELECT decode('W', 'base32hex');  -- error
+ERROR:  invalid symbol "W" found while decoding base32hex sequence
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+ERROR:  invalid symbol "2" found while decoding base32hex sequence
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+ decode 
+--------
+ \x08
+(1 row)
+
+SELECT decode('あ', 'base32hex'); -- error
+ERROR:  invalid symbol "あ" found while decoding base32hex sequence
+--
+-- base64url encoding/decoding
+--
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
  encode 
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index d157ef7d0b3..d2a45a0f07c 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -327,5 +327,21 @@ SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
  t
 (1 row)
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig::bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff']::uuid[]
+) AS orig ORDER BY enc;
+                 orig                 |               enc                
+--------------------------------------+----------------------------------
+ 00000000-0000-0000-0000-000000000000 | 00000000000000000000000000======
+ 11111111-1111-1111-1111-111111111111 | 248H248H248H248H248H248H24======
+ 123e4567-e89b-12d3-a456-426614174000 | 28V4APV8JC9D792M89J185Q000======
+ ffffffff-ffff-ffff-ffff-ffffffffffff | VVVVVVVVVVVVVVVVVVVVVVVVVS======
+(4 rows)
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index d8a09737668..13fcfe21241 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -835,10 +835,39 @@ SELECT encode('\x01'::bytea, 'invalid');  -- error
 SELECT decode('00', 'invalid');           -- error
 
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
 
+SELECT encode('', 'base32hex');  -- ''
+SELECT encode('\x11', 'base32hex');  -- '24======'
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+
+SELECT decode('', 'base32hex');  -- ''
+SELECT decode('24======', 'base32hex');  -- \x11
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+SELECT decode('=', 'base32hex');  -- error
+SELECT decode('W', 'base32hex');  -- error
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+SELECT decode('あ', 'base32hex'); -- error
+
+
+--
+-- base64url encoding/decoding
+--
+
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
 SELECT decode('abc-_w', 'base64url');      -- \x69b73eff
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index f512f4dea1d..ee14802630a 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -152,5 +152,14 @@ SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
 SELECT '\x1234567890abcdef'::bytea::uuid; -- error
 SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig::bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff']::uuid[]
+) AS orig ORDER BY enc;
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.53.0




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
  2026-03-19 21:33           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
@ 2026-03-20 03:06             ` Chengxi Sun <[email protected]>
  1 sibling, 0 replies; 26+ messages in thread

From: Chengxi Sun @ 2026-03-20 03:06 UTC (permalink / raw)
  To: Masahiko Sawada <[email protected]>; +Cc: Dagfinn Ilmari Mannsåker <[email protected]>; Aleksander Alekseev <[email protected]>; pgsql-hackers; Andrey Borodin <[email protected]>

I've run the regression tests. LGTM overall.

Patch v8-0001 had some formatting issues, but they are already addressed in
the fix-issue patch.

Best regards,

Chengxi Sun



* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
@ 2026-03-20 13:02             ` Aleksander Alekseev <[email protected]>
  2026-03-20 14:24               ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  1 sibling, 1 reply; 26+ messages in thread

From: Aleksander Alekseev @ 2026-03-20 13:02 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Masahiko Sawada <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>; Andrey Borodin <[email protected]>

Hi,

> Also, a small nitpick is that we can use uint32 instead of uint64 for
> 'bits_buffer'. I've attached the updated patch as well as the
> difference from the previous version.

Then I suggest using uint32 for the bits_buffer variable in
base32hex_encode() too. Also, we should use 1U instead of 1ULL when
masking a uint32 value.
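
To illustrate why 32 bits should be enough, here is a minimal standalone
sketch of the encoder loop. It is not part of the patch: the sketch_*
names, the sketch_max_bits counter and the fixed-size scratch buffer are
hypothetical and exist only for this illustration. It assumes the same
algorithm as base32hex_encode() in the attached patch and asserts that
the buffer never holds more than 12 bits (at most 4 leftover bits plus
8 new ones).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char sketch_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

/* Largest number of bits ever held in the buffer (illustration only). */
static int	sketch_max_bits = 0;

/*
 * Encode srclen bytes into dst as padded base32hex and NUL-terminate it.
 * dst must have room for ((srclen + 4) / 5) * 8 + 1 bytes.
 */
static size_t
sketch_base32hex_encode(const unsigned char *src, size_t srclen, char *dst)
{
	uint32_t	bits_buffer = 0;
	int			bits_in_buffer = 0;
	size_t		output_pos = 0;

	for (size_t i = 0; i < srclen; i++)
	{
		/* Add 8 bits to the buffer */
		bits_buffer = (bits_buffer << 8) | src[i];
		bits_in_buffer += 8;

		/* At most 4 leftover bits plus 8 new ones */
		assert(bits_in_buffer <= 12);
		if (bits_in_buffer > sketch_max_bits)
			sketch_max_bits = bits_in_buffer;

		while (bits_in_buffer >= 5)
		{
			bits_in_buffer -= 5;
			dst[output_pos++] = sketch_table[(bits_buffer >> bits_in_buffer) & 0x1F];
			/* 1U suffices: the shift count is at most 7 here */
			bits_buffer &= ((1U << bits_in_buffer) - 1);
		}
	}

	/* Flush the remaining 1-4 bits, left-aligned in a 5-bit group */
	if (bits_in_buffer > 0)
		dst[output_pos++] = sketch_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];

	/* Pad to a multiple of 8 characters */
	while (output_pos % 8 != 0)
		dst[output_pos++] = '=';

	dst[output_pos] = '\0';
	return output_pos;
}

/* Return 1 if encoding src yields exactly the expected string. */
static int
sketch_encode_matches(const unsigned char *src, size_t srclen, const char *expected)
{
	char		buf[64];

	/* The scratch buffer covers inputs of up to 35 bytes */
	if (srclen > 35)
		return 0;
	sketch_base32hex_encode(src, srclen, buf);
	return strcmp(buf, expected) == 0;
}
```

Feeding it the inputs from the regression tests (for instance \x11 and
\x112233445566) should reproduce the expected strings from strings.out,
and sketch_max_bits should stay at 12 regardless of the input length.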

-- 
Best regards,
Aleksander Alekseev


Attachments:

  [text/x-patch] v9-0001-Add-base32hex-support-to-encode-and-decode-functi.patch (15.9K, 2-v9-0001-Add-base32hex-support-to-encode-and-decode-functi.patch)
From 02758297caf1022c3f07838b322113238c2eaa90 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Wed, 29 Oct 2025 15:53:12 +0400
Subject: [PATCH v9] Add base32hex support to encode() and decode() functions.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds support for base32hex encoding and decoding, as defined in
RFC 4648 Section 7. Unlike standard base32, base32hex uses the
extended hex alphabet (0-9, A-V) which preserves the lexicographical
order of the encoded data.

This is particularly useful for representing UUIDv7 values in a
compact string format while maintaining their time-ordered sort
property.

The encode() function produces output padded with '=', while decode()
accepts both padded and unpadded input. Following the behavior of
other encoding types, decoding is case-insensitive.

Suggested-by: Sergey Prokhorenko <[email protected]>
Author: Andrey Borodin <[email protected]>
Co-authored-by: Aleksander Alekseev <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Илья Чердаков <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml |  27 ++++
 src/backend/utils/adt/encode.c           | 153 ++++++++++++++++++++++-
 src/test/regress/expected/strings.out    | 119 +++++++++++++++++-
 src/test/regress/expected/uuid.out       |  16 +++
 src/test/regress/sql/strings.sql         |  31 ++++-
 src/test/regress/sql/uuid.sql            |   9 ++
 6 files changed, 347 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index b256381e01f..3f77f6d20b0 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -729,6 +729,7 @@
        <parameter>format</parameter> values are:
        <link linkend="encode-format-base64"><literal>base64</literal></link>,
        <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
+       <link linkend="encode-format-base32hex"><literal>base32hex</literal></link>,
        <link linkend="encode-format-escape"><literal>escape</literal></link>,
        <link linkend="encode-format-hex"><literal>hex</literal></link>.
       </para>
@@ -804,6 +805,32 @@
      </listitem>
     </varlistentry>
 
+    <varlistentry id="encode-format-base32hex">
+     <term>base32hex
+      <indexterm>
+       <primary>base32hex format</primary>
+      </indexterm></term>
+     <listitem>
+      <para>
+       The <literal>base32hex</literal> format is that of
+       <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
+       RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
+       (<literal>0</literal>-<literal>9</literal> and
+       <literal>A</literal>-<literal>V</literal>) which preserves the lexicographical
+       sort order of the encoded data. The <function>encode</function> function
+       produces output padded with <literal>'='</literal>, while <function>decode</function>
+       accepts both padded and unpadded input. Decoding is case-insensitive and ignores
+       whitespace characters.
+      </para>
+      <para>
+       This format is useful for encoding UUIDs in a compact, sortable format:
+       <literal>substring(encode(uuid_value::bytea, 'base32hex') FROM 1 FOR 26)</literal>
+       produces a 26-character string compared to the standard 36-character
+       UUID representation.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry id="encode-format-escape">
      <term>escape
      <indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index f5f835e944a..334fa080b95 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -65,8 +65,8 @@ binary_encode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -115,8 +115,8 @@ binary_decode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -825,6 +825,145 @@ esc_dec_len(const char *src, size_t srclen)
 	return len;
 }
 
+/*
+ * BASE32HEX
+ */
+
+static const char base32hex_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
+
+static uint64
+base32hex_enc_len(const char *src, size_t srclen)
+{
+	/* 5 bytes encode to 8 characters, round up to multiple of 8 for padding */
+	return ((uint64) srclen + 4) / 5 * 8;
+}
+
+static uint64
+base32hex_dec_len(const char *src, size_t srclen)
+{
+	/* Upper bound: 8 characters decode to 5 bytes; padding and whitespace add none */
+	return ((uint64) srclen * 5) / 8;
+}
+
+static uint64
+base32hex_encode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint32		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+
+	for (i = 0; i < srclen; i++)
+	{
+		/* Add 8 bits to the buffer */
+		bits_buffer = (bits_buffer << 8) | data[i];
+		bits_in_buffer += 8;
+
+		/* Extract 5-bit chunks while we have enough bits */
+		while (bits_in_buffer >= 5)
+		{
+			bits_in_buffer -= 5;
+			/* Extract top 5 bits */
+			dst[output_pos++] = base32hex_table[(bits_buffer >> bits_in_buffer) & 0x1F];
+			/* Clear the extracted bits by masking */
+			bits_buffer &= ((1U << bits_in_buffer) - 1);
+		}
+	}
+
+	/* Handle remaining bits (if any) */
+	if (bits_in_buffer > 0)
+		dst[output_pos++] = base32hex_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];
+
+	/* Add padding to make length a multiple of 8 (per RFC 4648) */
+	while (output_pos % 8 != 0)
+		dst[output_pos++] = '=';
+
+	return output_pos;
+}
+
+static uint64
+base32hex_decode(const char *src, size_t srclen, char *dst)
+{
+	const char *srcend = src + srclen,
+			   *s = src;
+	uint32		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	int			pos = 0;		/* position within 8-character group (0-7) */
+	bool		end = false;	/* have we seen padding? */
+
+	while (s < srcend)
+	{
+		char		c = *s++;
+		int			val;
+
+		/* Skip whitespace */
+		if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
+			continue;
+
+		if (c == '=')
+		{
+			/*
+			 * The first padding is only valid at positions 2, 4, 5, or 7
+			 * within an 8-character group (corresponding to 1, 2, 3, or 4
+			 * input bytes). We only check the position for the first '='
+			 * character.
+			 */
+			if (!end)
+			{
+				if (pos != 2 && pos != 4 && pos != 5 && pos != 7)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("unexpected \"=\" while decoding base32hex sequence")));
+				end = true;
+			}
+			pos++;
+			continue;
+		}
+
+		/* No data characters allowed after padding */
+		if (end)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen_range(s - 1, srcend), s - 1)));
+
+		/* Decode base32hex character (0-9, A-V, case-insensitive) */
+		if (c >= '0' && c <= '9')
+			val = c - '0';
+		else if (c >= 'A' && c <= 'V')
+			val = c - 'A' + 10;
+		else if (c >= 'a' && c <= 'v')
+			val = c - 'a' + 10;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen_range(s - 1, srcend), s - 1)));
+
+		/* Add 5 bits to buffer */
+		bits_buffer = (bits_buffer << 5) | val;
+		bits_in_buffer += 5;
+		pos++;
+
+		/* Extract 8-bit bytes when we have enough bits */
+		while (bits_in_buffer >= 8)
+		{
+			bits_in_buffer -= 8;
+			dst[output_pos++] = (unsigned char) (bits_buffer >> bits_in_buffer);
+			/* Clear the extracted bits */
+			bits_buffer &= ((1U << bits_in_buffer) - 1);
+		}
+
+		/* Reset position after each complete 8-character group */
+		if (pos == 8)
+			pos = 0;
+	}
+
+	return output_pos;
+}
+
 /*
  * Common
  */
@@ -854,6 +993,12 @@ static const struct
 			pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
 		}
 	},
+	{
+		"base32hex",
+		{
+			base32hex_enc_len, base32hex_dec_len, base32hex_encode, base32hex_decode
+		}
+	},
 	{
 		"escape",
 		{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index f38688b5c37..0166a57b0d4 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2600,14 +2600,127 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
 -- report an error with a hint listing valid encodings when an invalid encoding is specified
 SELECT encode('\x01'::bytea, 'invalid');  -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 SELECT decode('00', 'invalid');           -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
+SELECT encode('', 'base32hex');  -- ''
+ encode 
+--------
+ 
+(1 row)
+
+SELECT encode('\x11', 'base32hex');  -- '24======'
+  encode  
+----------
+ 24======
+(1 row)
+
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+  encode  
+----------
+ 24H0====
+(1 row)
+
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+  encode  
+----------
+ 24H36===
+(1 row)
+
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+  encode  
+----------
+ 24H36H0=
+(1 row)
+
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+  encode  
+----------
+ 24H36H2L
+(1 row)
+
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+      encode      
+------------------
+ 24H36H2LCO======
+(1 row)
+
+SELECT decode('', 'base32hex');  -- ''
+ decode 
+--------
+ \x
+(1 row)
+
+SELECT decode('24======', 'base32hex');  -- \x11
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+ decode 
+--------
+ \x1122
+(1 row)
+
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+  decode  
+----------
+ \x112233
+(1 row)
+
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+   decode   
+------------
+ \x11223344
+(1 row)
+
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+    decode    
+--------------
+ \x1122334455
+(1 row)
+
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('=', 'base32hex');  -- error
+ERROR:  unexpected "=" while decoding base32hex sequence
+SELECT decode('W', 'base32hex');  -- error
+ERROR:  invalid symbol "W" found while decoding base32hex sequence
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+ERROR:  invalid symbol "2" found while decoding base32hex sequence
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+ decode 
+--------
+ \x08
+(1 row)
+
+SELECT decode('あ', 'base32hex'); -- error
+ERROR:  invalid symbol "あ" found while decoding base32hex sequence
+--
+-- base64url encoding/decoding
+--
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
  encode 
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index d157ef7d0b3..d2a45a0f07c 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -327,5 +327,21 @@ SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
  t
 (1 row)
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig::bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff']::uuid[]
+) AS orig ORDER BY enc;
+                 orig                 |               enc                
+--------------------------------------+----------------------------------
+ 00000000-0000-0000-0000-000000000000 | 00000000000000000000000000======
+ 11111111-1111-1111-1111-111111111111 | 248H248H248H248H248H248H24======
+ 123e4567-e89b-12d3-a456-426614174000 | 28V4APV8JC9D792M89J185Q000======
+ ffffffff-ffff-ffff-ffff-ffffffffffff | VVVVVVVVVVVVVVVVVVVVVVVVVS======
+(4 rows)
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index d8a09737668..13fcfe21241 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -835,10 +835,39 @@ SELECT encode('\x01'::bytea, 'invalid');  -- error
 SELECT decode('00', 'invalid');           -- error
 
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
 
+SELECT encode('', 'base32hex');  -- ''
+SELECT encode('\x11', 'base32hex');  -- '24======'
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+
+SELECT decode('', 'base32hex');  -- ''
+SELECT decode('24======', 'base32hex');  -- \x11
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+SELECT decode('=', 'base32hex');  -- error
+SELECT decode('W', 'base32hex');  -- error
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+SELECT decode('あ', 'base32hex'); -- error
+
+
+--
+-- base64url encoding/decoding
+--
+
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
 SELECT decode('abc-_w', 'base64url');      -- \x69b73eff
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index f512f4dea1d..ee14802630a 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -152,5 +152,14 @@ SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
 SELECT '\x1234567890abcdef'::bytea::uuid; -- error
 SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig::bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff']::uuid[]
+) AS orig ORDER BY enc;
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.0




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
  2026-03-19 21:33           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-20 13:02             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
@ 2026-03-20 14:24               ` Aleksander Alekseev <[email protected]>
  2026-03-24 01:17                 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  0 siblings, 1 reply; 26+ messages in thread

From: Aleksander Alekseev @ 2026-03-20 14:24 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Masahiko Sawada <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>; Andrey Borodin <[email protected]>

Hi,

> > Also, a small nitpick is that we can use uint32 instead of uint64 for
> > 'bits_buffer'. I've attached the updated patch as well as the
> > difference from the previous version.
>
> Then I suggest using uint32 for the bits_buffer variable in
> base32hex_encode() too. Also we should use 1U instead of 1ULL with
> uint32.
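
For reference, a rough sketch of why 32 bits should be plenty here. It
models only the bit accounting, under the assumption (which matches the
masking done after each extraction in the patch) that leftover bits are
the only thing carried between steps. peak_pending_bits() is a
hypothetical helper written for this illustration, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simulate only the bit accounting of a shift-register codec: each step
 * adds in_bits to the buffer, then drains it out_bits at a time. Returns
 * the largest number of bits pending at once over n steps.
 */
static int
peak_pending_bits(size_t n, int in_bits, int out_bits)
{
	int			bits = 0;
	int			peak = 0;

	assert(in_bits > 0 && out_bits > 0);

	for (size_t i = 0; i < n; i++)
	{
		bits += in_bits;
		if (bits > peak)
			peak = bits;
		/* Drain complete output units; fewer than out_bits bits remain */
		while (bits >= out_bits)
			bits -= out_bits;
	}
	return peak;
}
```

Calling peak_pending_bits(n, 8, 5) models encoding and
peak_pending_bits(n, 5, 8) models decoding. In both directions the peak
is bounded by (out_bits - 1) + in_bits, that is 4 + 8 or 7 + 5 bits, so
the shifts by bits_in_buffer stay well inside a uint32.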

CI is not happy with the new test:

```
 SELECT decode('あ', 'base32hex'); -- error
-ERROR:  invalid symbol "あ" found while decoding base32hex sequence
+ERROR:  invalid symbol "ã" found while decoding base32hex sequence
```

It passes locally, though. My best guess is that something is off
with the database encoding on CI, so we shouldn't use this test.
We already have a similar test that uses ASCII symbols only.
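
For what it's worth, here is a minimal sketch of one way such output
could arise. It assumes (this is a guess, not something confirmed from
the CI logs) that the server encoding on CI is single-byte, so the
length passed to "%.*s" is 1 instead of 3. In UTF-8, "あ" is the three
bytes E3 81 82, and the byte E3 on its own reads as "ã" in Latin-1.
char_len() and format_symbol() are hypothetical helpers written for this
illustration; they are not PostgreSQL's pg_mblen_range().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a multibyte length lookup. It models only two
 * cases: a single-byte encoding (always 1) and UTF-8 (length taken from
 * the lead byte). The result is clamped to the bytes left in the string.
 */
static int
char_len(const char *s, int is_utf8)
{
	unsigned char lead = (unsigned char) s[0];
	size_t		remaining = strlen(s);
	size_t		len = 1;

	if (is_utf8)
	{
		if ((lead & 0xE0) == 0xC0)
			len = 2;
		else if ((lead & 0xF0) == 0xE0)
			len = 3;
		else if ((lead & 0xF8) == 0xF0)
			len = 4;
	}
	if (len > remaining)
		len = remaining;
	return (int) len;
}

/* Build the message the way an errmsg() using "%.*s" would. */
static int
format_symbol(char *out, size_t outlen, const char *s, int is_utf8)
{
	assert(out != NULL && outlen > 0);
	return snprintf(out, outlen, "invalid symbol \"%.*s\"",
					char_len(s, is_utf8), s);
}
```

With is_utf8 set, the message carries all three bytes of the character;
with it cleared, only the first byte E3 ends up between the quotes.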

-- 
Best regards,
Aleksander Alekseev


Attachments:

  [text/x-patch] v10-0001-Add-base32hex-support-to-encode-and-decode-funct.patch (15.7K, 2-v10-0001-Add-base32hex-support-to-encode-and-decode-funct.patch)
From cf76996ba9370e7c8a0aabf3a322f15ff1353600 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Wed, 29 Oct 2025 15:53:12 +0400
Subject: [PATCH v10] Add base32hex support to encode() and decode() functions.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds support for base32hex encoding and decoding, as defined in
RFC 4648 Section 7. Unlike standard base32, base32hex uses the
extended hex alphabet (0-9, A-V) which preserves the lexicographical
order of the encoded data.

This is particularly useful for representing UUIDv7 values in a
compact string format while maintaining their time-ordered sort
property.

The encode() function produces output padded with '=', while decode()
accepts both padded and unpadded input. Following the behavior of
other encoding types, decoding is case-insensitive.

Suggested-by: Sergey Prokhorenko <[email protected]>
Author: Andrey Borodin <[email protected]>
Co-authored-by: Aleksander Alekseev <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Илья Чердаков <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml |  27 ++++
 src/backend/utils/adt/encode.c           | 153 ++++++++++++++++++++++-
 src/test/regress/expected/strings.out    | 117 ++++++++++++++++-
 src/test/regress/expected/uuid.out       |  16 +++
 src/test/regress/sql/strings.sql         |  30 ++++-
 src/test/regress/sql/uuid.sql            |   9 ++
 6 files changed, 344 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index b256381e01f..3f77f6d20b0 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -729,6 +729,7 @@
        <parameter>format</parameter> values are:
        <link linkend="encode-format-base64"><literal>base64</literal></link>,
        <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
+       <link linkend="encode-format-base32hex"><literal>base32hex</literal></link>,
        <link linkend="encode-format-escape"><literal>escape</literal></link>,
        <link linkend="encode-format-hex"><literal>hex</literal></link>.
       </para>
@@ -804,6 +805,32 @@
      </listitem>
     </varlistentry>
 
+    <varlistentry id="encode-format-base32hex">
+     <term>base32hex
+      <indexterm>
+       <primary>base32hex format</primary>
+      </indexterm></term>
+     <listitem>
+      <para>
+       The <literal>base32hex</literal> format is that of
+       <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
+       RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
+       (<literal>0</literal>-<literal>9</literal> and
+       <literal>A</literal>-<literal>V</literal>) which preserves the lexicographical
+       sort order of the encoded data. The <function>encode</function> function
+       produces output padded with <literal>'='</literal>, while <function>decode</function>
+       accepts both padded and unpadded input. Decoding is case-insensitive and ignores
+       whitespace characters.
+      </para>
+      <para>
+       This format is useful for encoding UUIDs in a compact, sortable format:
+       <literal>substring(encode(uuid_value::bytea, 'base32hex') FROM 1 FOR 26)</literal>
+       produces a 26-character string compared to the standard 36-character
+       UUID representation.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry id="encode-format-escape">
      <term>escape
      <indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index f5f835e944a..334fa080b95 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -65,8 +65,8 @@ binary_encode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -115,8 +115,8 @@ binary_decode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -825,6 +825,145 @@ esc_dec_len(const char *src, size_t srclen)
 	return len;
 }
 
+/*
+ * BASE32HEX
+ */
+
+static const char base32hex_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
+
+static uint64
+base32hex_enc_len(const char *src, size_t srclen)
+{
+	/* 5 bytes encode to 8 characters, round up to multiple of 8 for padding */
+	return ((uint64) srclen + 4) / 5 * 8;
+}
+
+static uint64
+base32hex_dec_len(const char *src, size_t srclen)
+{
+	/* Decode length is (srclen * 5) / 8, but we may have padding */
+	return ((uint64) srclen * 5) / 8;
+}
+
+static uint64
+base32hex_encode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint32		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+
+	for (i = 0; i < srclen; i++)
+	{
+		/* Add 8 bits to the buffer */
+		bits_buffer = (bits_buffer << 8) | data[i];
+		bits_in_buffer += 8;
+
+		/* Extract 5-bit chunks while we have enough bits */
+		while (bits_in_buffer >= 5)
+		{
+			bits_in_buffer -= 5;
+			/* Extract top 5 bits */
+			dst[output_pos++] = base32hex_table[(bits_buffer >> bits_in_buffer) & 0x1F];
+			/* Clear the extracted bits by masking */
+			bits_buffer &= ((1U << bits_in_buffer) - 1);
+		}
+	}
+
+	/* Handle remaining bits (if any) */
+	if (bits_in_buffer > 0)
+		dst[output_pos++] = base32hex_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];
+
+	/* Add padding to make length a multiple of 8 (per RFC 4648) */
+	while (output_pos % 8 != 0)
+		dst[output_pos++] = '=';
+
+	return output_pos;
+}
+
+static uint64
+base32hex_decode(const char *src, size_t srclen, char *dst)
+{
+	const char *srcend = src + srclen,
+			   *s = src;
+	uint32		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	int			pos = 0;		/* position within 8-character group (0-7) */
+	bool		end = false;	/* have we seen padding? */
+
+	while (s < srcend)
+	{
+		char		c = *s++;
+		int			val;
+
+		/* Skip whitespace */
+		if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
+			continue;
+
+		if (c == '=')
+		{
+			/*
+			 * The first padding is only valid at positions 2, 4, 5, or 7
+			 * within an 8-character group (corresponding to 1, 2, 3, or 4
+			 * input bytes). We only check the position for the first '='
+			 * character.
+			 */
+			if (!end)
+			{
+				if (pos != 2 && pos != 4 && pos != 5 && pos != 7)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("unexpected \"=\" while decoding base32hex sequence")));
+				end = true;
+			}
+			pos++;
+			continue;
+		}
+
+		/* No data characters allowed after padding */
+		if (end)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen_range(s - 1, srcend), s - 1)));
+
+		/* Decode base32hex character (0-9, A-V, case-insensitive) */
+		if (c >= '0' && c <= '9')
+			val = c - '0';
+		else if (c >= 'A' && c <= 'V')
+			val = c - 'A' + 10;
+		else if (c >= 'a' && c <= 'v')
+			val = c - 'a' + 10;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen_range(s - 1, srcend), s - 1)));
+
+		/* Add 5 bits to buffer */
+		bits_buffer = (bits_buffer << 5) | val;
+		bits_in_buffer += 5;
+		pos++;
+
+		/* Extract 8-bit bytes when we have enough bits */
+		while (bits_in_buffer >= 8)
+		{
+			bits_in_buffer -= 8;
+			dst[output_pos++] = (unsigned char) (bits_buffer >> bits_in_buffer);
+			/* Clear the extracted bits */
+			bits_buffer &= ((1U << bits_in_buffer) - 1);
+		}
+
+		/* Reset position after each complete 8-character group */
+		if (pos == 8)
+			pos = 0;
+	}
+
+	return output_pos;
+}
+
 /*
  * Common
  */
@@ -854,6 +993,12 @@ static const struct
 			pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
 		}
 	},
+	{
+		"base32hex",
+		{
+			base32hex_enc_len, base32hex_dec_len, base32hex_encode, base32hex_decode
+		}
+	},
 	{
 		"escape",
 		{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index f38688b5c37..75a4a4f38a6 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2600,14 +2600,125 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
 -- report an error with a hint listing valid encodings when an invalid encoding is specified
 SELECT encode('\x01'::bytea, 'invalid');  -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 SELECT decode('00', 'invalid');           -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
+SELECT encode('', 'base32hex');  -- ''
+ encode 
+--------
+ 
+(1 row)
+
+SELECT encode('\x11', 'base32hex');  -- '24======'
+  encode  
+----------
+ 24======
+(1 row)
+
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+  encode  
+----------
+ 24H0====
+(1 row)
+
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+  encode  
+----------
+ 24H36===
+(1 row)
+
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+  encode  
+----------
+ 24H36H0=
+(1 row)
+
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+  encode  
+----------
+ 24H36H2L
+(1 row)
+
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+      encode      
+------------------
+ 24H36H2LCO======
+(1 row)
+
+SELECT decode('', 'base32hex');  -- ''
+ decode 
+--------
+ \x
+(1 row)
+
+SELECT decode('24======', 'base32hex');  -- \x11
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+ decode 
+--------
+ \x1122
+(1 row)
+
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+  decode  
+----------
+ \x112233
+(1 row)
+
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+   decode   
+------------
+ \x11223344
+(1 row)
+
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+    decode    
+--------------
+ \x1122334455
+(1 row)
+
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('=', 'base32hex');  -- error
+ERROR:  unexpected "=" while decoding base32hex sequence
+SELECT decode('W', 'base32hex');  -- error
+ERROR:  invalid symbol "W" found while decoding base32hex sequence
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+ERROR:  invalid symbol "2" found while decoding base32hex sequence
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+ decode 
+--------
+ \x08
+(1 row)
+
+--
+-- base64url encoding/decoding
+--
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
  encode 
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index d157ef7d0b3..d2a45a0f07c 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -327,5 +327,21 @@ SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
  t
 (1 row)
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig::bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff']::uuid[]
+) AS orig ORDER BY enc;
+                 orig                 |               enc                
+--------------------------------------+----------------------------------
+ 00000000-0000-0000-0000-000000000000 | 00000000000000000000000000======
+ 11111111-1111-1111-1111-111111111111 | 248H248H248H248H248H248H24======
+ 123e4567-e89b-12d3-a456-426614174000 | 28V4APV8JC9D792M89J185Q000======
+ ffffffff-ffff-ffff-ffff-ffffffffffff | VVVVVVVVVVVVVVVVVVVVVVVVVS======
+(4 rows)
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index d8a09737668..9f86f2cac19 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -835,10 +835,38 @@ SELECT encode('\x01'::bytea, 'invalid');  -- error
 SELECT decode('00', 'invalid');           -- error
 
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
 
+SELECT encode('', 'base32hex');  -- ''
+SELECT encode('\x11', 'base32hex');  -- '24======'
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+
+SELECT decode('', 'base32hex');  -- ''
+SELECT decode('24======', 'base32hex');  -- \x11
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+SELECT decode('=', 'base32hex');  -- error
+SELECT decode('W', 'base32hex');  -- error
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+
+
+--
+-- base64url encoding/decoding
+--
+
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
 SELECT decode('abc-_w', 'base64url');      -- \x69b73eff
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index f512f4dea1d..ee14802630a 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -152,5 +152,14 @@ SELECT '\x019a2f859ced7225b99d9c55044a2563'::bytea::uuid;
 SELECT '\x1234567890abcdef'::bytea::uuid; -- error
 SELECT v = v::bytea::uuid as matched FROM gen_random_uuid() v;
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT orig, encode(orig::bytea, 'base32hex') AS enc
+FROM unnest(ARRAY[
+    '123e4567-e89b-12d3-a456-426614174000',
+    '00000000-0000-0000-0000-000000000000',
+    '11111111-1111-1111-1111-111111111111',
+    'ffffffff-ffff-ffff-ffff-ffffffffffff']::uuid[]
+) AS orig ORDER BY enc;
+
 -- clean up
 DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.0




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
  2026-03-19 21:33           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-20 13:02             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-20 14:24               ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
@ 2026-03-24 01:17                 ` Masahiko Sawada <[email protected]>
  2026-03-24 15:31                   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  0 siblings, 1 reply; 26+ messages in thread

From: Masahiko Sawada @ 2026-03-24 01:17 UTC (permalink / raw)
  To: Aleksander Alekseev <[email protected]>; +Cc: pgsql-hackers; Dagfinn Ilmari Mannsåker <[email protected]>; Andrey Borodin <[email protected]>

On Fri, Mar 20, 2026 at 7:24 AM Aleksander Alekseev
<[email protected]> wrote:
>
> Hi,
>
> > > Also, a small nitpick is that we can use uint32 instead of uint64 for
> > > 'bits_buffer'. I've attached the updated patch as well as the
> > > difference from the previous version.
> >
> > Then I suggest using uint32 for the bits_buffer variable in
> > base32hex_encode() too. Also we should use 1U instead of 1ULL with
> > uint32.
>
> CI is not happy with the new test:
>
> ```
>  SELECT decode('あ', 'base32hex'); -- error
> -ERROR:  invalid symbol "あ" found while decoding base32hex sequence
> +ERROR:  invalid symbol "ã" found while decoding base32hex sequence
> ```
>
> It passes locally, though. My best guess is that something is off
> with the database encoding on CI, so we shouldn't use this test.
> We already have a similar test that uses ASCII symbols only.

Good catch. Yes, we should not have a test whose result depends on the
database encoding, and it seems we can simply omit this one.

The patch looks basically good to me. I've made some changes to the
regression tests because I want round-trip coverage: I've merged the
sortability checks into the existing tests and added round-trip tests.
With this change, every test run exercises both round-trip and
sortability checks on random UUID values while keeping the test time
minimal. Feedback is very welcome.
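
To illustrate the two properties being tested, here is a standalone
sketch outside the server. It is not the code from the patch: it is a
simplified, unpadded, uppercase-only variant of the same shift-register
approach, and b32hex_encode(), b32hex_decode(), roundtrip_ok() and
order_preserved() are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char b32hex[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

/* Encode n bytes as unpadded base32hex; returns characters written. */
static size_t
b32hex_encode(const unsigned char *src, size_t n, char *dst)
{
	uint32_t	buf = 0;
	int			bits = 0;
	size_t		out = 0;

	for (size_t i = 0; i < n; i++)
	{
		buf = (buf << 8) | src[i];
		bits += 8;
		while (bits >= 5)
		{
			bits -= 5;
			dst[out++] = b32hex[(buf >> bits) & 0x1F];
			buf &= (1U << bits) - 1;	/* keep only the leftover bits */
		}
	}
	if (bits > 0)
		dst[out++] = b32hex[(buf << (5 - bits)) & 0x1F];
	dst[out] = '\0';
	return out;
}

/* Decode unpadded base32hex; returns bytes written, (size_t) -1 on error. */
static size_t
b32hex_decode(const char *src, unsigned char *dst)
{
	uint32_t	buf = 0;
	int			bits = 0;
	size_t		out = 0;

	assert(src != NULL && dst != NULL);

	for (; *src != '\0'; src++)
	{
		const char *p = strchr(b32hex, *src);

		if (p == NULL)
			return (size_t) -1;
		buf = (buf << 5) | (uint32_t) (p - b32hex);
		bits += 5;
		while (bits >= 8)
		{
			bits -= 8;
			dst[out++] = (unsigned char) (buf >> bits);
			buf &= (1U << bits) - 1;
		}
	}
	return out;					/* trailing partial bits are dropped */
}

/* Returns 1 if a 16-byte value survives encode + decode unchanged. */
static int
roundtrip_ok(const unsigned char uuid[16])
{
	char		text[27];
	unsigned char back[16];

	return b32hex_encode(uuid, 16, text) == 26 &&
		b32hex_decode(text, back) == 16 &&
		memcmp(uuid, back, 16) == 0;
}

/* Returns 1 if byte order and encoded-text order agree for a and b. */
static int
order_preserved(const unsigned char a[16], const unsigned char b[16])
{
	char		ta[27];
	char		tb[27];
	int			c1;
	int			c2;

	b32hex_encode(a, 16, ta);
	b32hex_encode(b, 16, tb);
	c1 = memcmp(a, b, 16);
	c2 = strcmp(ta, tb);
	return (c1 < 0) == (c2 < 0) && (c1 > 0) == (c2 > 0);
}
```

A 16-byte UUID is 128 bits, so the unpadded text is 26 characters (25
full 5-bit groups plus one group holding the last 3 bits). Feeding
random 16-byte values to roundtrip_ok() and order_preserved() mirrors,
in spirit, what the regression tests check through encode() and
decode().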


Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v11-0001-Add-base32hex-support-to-encode-and-decode-funct.patch (17.1K, 2-v11-0001-Add-base32hex-support-to-encode-and-decode-funct.patch)
From 9460ad2548369afcd7fe6c3f3e779415589ebeb6 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Wed, 29 Oct 2025 15:53:12 +0400
Subject: [PATCH v11] Add base32hex support to encode() and decode() functions.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds support for base32hex encoding and decoding, as defined in
RFC 4648 Section 7. Unlike standard base32, base32hex uses the
extended hex alphabet (0-9, A-V) which preserves the lexicographical
order of the encoded data.

This is particularly useful for representing UUIDv7 values in a
compact string format while maintaining their time-ordered sort
property.

The encode() function produces output padded with '=', while decode()
accepts both padded and unpadded input. Following the behavior of
other encoding types, decoding is case-insensitive.

Suggested-by: Sergey Prokhorenko <[email protected]>
Author: Andrey Borodin <[email protected]>
Co-authored-by: Aleksander Alekseev <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Илья Чердаков <[email protected]>
Reviewed-by: Chengxi Sun <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml |  27 ++++
 src/backend/utils/adt/encode.c           | 153 ++++++++++++++++++++++-
 src/test/regress/expected/strings.out    | 133 +++++++++++++++++++-
 src/test/regress/expected/uuid.out       |  18 ++-
 src/test/regress/sql/strings.sql         |  47 ++++++-
 src/test/regress/sql/uuid.sql            |   8 +-
 6 files changed, 373 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index b256381e01f..3f77f6d20b0 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -729,6 +729,7 @@
        <parameter>format</parameter> values are:
        <link linkend="encode-format-base64"><literal>base64</literal></link>,
        <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
+       <link linkend="encode-format-base32hex"><literal>base32hex</literal></link>,
        <link linkend="encode-format-escape"><literal>escape</literal></link>,
        <link linkend="encode-format-hex"><literal>hex</literal></link>.
       </para>
@@ -804,6 +805,32 @@
      </listitem>
     </varlistentry>
 
+    <varlistentry id="encode-format-base32hex">
+     <term>base32hex
+      <indexterm>
+       <primary>base32hex format</primary>
+      </indexterm></term>
+     <listitem>
+      <para>
+       The <literal>base32hex</literal> format is that of
+       <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
+       RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
+       (<literal>0</literal>-<literal>9</literal> and
+       <literal>A</literal>-<literal>V</literal>) which preserves the lexicographical
+       sort order of the encoded data. The <function>encode</function> function
+       produces output padded with <literal>'='</literal>, while <function>decode</function>
+       accepts both padded and unpadded input. Decoding is case-insensitive and ignores
+       whitespace characters.
+      </para>
+      <para>
+       This format is useful for encoding UUIDs in a compact, sortable format:
+       <literal>substring(encode(uuid_value::bytea, 'base32hex') FROM 1 FOR 26)</literal>
+       produces a 26-character string compared to the standard 36-character
+       UUID representation.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry id="encode-format-escape">
      <term>escape
      <indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index f5f835e944a..334fa080b95 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -65,8 +65,8 @@ binary_encode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -115,8 +115,8 @@ binary_decode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base64", "base64url", "base32hex", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -825,6 +825,145 @@ esc_dec_len(const char *src, size_t srclen)
 	return len;
 }
 
+/*
+ * BASE32HEX
+ */
+
+static const char base32hex_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
+
+static uint64
+base32hex_enc_len(const char *src, size_t srclen)
+{
+	/* 5 bytes encode to 8 characters, round up to multiple of 8 for padding */
+	return ((uint64) srclen + 4) / 5 * 8;
+}
+
+static uint64
+base32hex_dec_len(const char *src, size_t srclen)
+{
+	/* Decode length is (srclen * 5) / 8, but we may have padding */
+	return ((uint64) srclen * 5) / 8;
+}
+
+static uint64
+base32hex_encode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint32		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+
+	for (i = 0; i < srclen; i++)
+	{
+		/* Add 8 bits to the buffer */
+		bits_buffer = (bits_buffer << 8) | data[i];
+		bits_in_buffer += 8;
+
+		/* Extract 5-bit chunks while we have enough bits */
+		while (bits_in_buffer >= 5)
+		{
+			bits_in_buffer -= 5;
+			/* Extract top 5 bits */
+			dst[output_pos++] = base32hex_table[(bits_buffer >> bits_in_buffer) & 0x1F];
+			/* Clear the extracted bits by masking */
+			bits_buffer &= ((1U << bits_in_buffer) - 1);
+		}
+	}
+
+	/* Handle remaining bits (if any) */
+	if (bits_in_buffer > 0)
+		dst[output_pos++] = base32hex_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];
+
+	/* Add padding to make length a multiple of 8 (per RFC 4648) */
+	while (output_pos % 8 != 0)
+		dst[output_pos++] = '=';
+
+	return output_pos;
+}
+
+static uint64
+base32hex_decode(const char *src, size_t srclen, char *dst)
+{
+	const char *srcend = src + srclen,
+			   *s = src;
+	uint32		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	int			pos = 0;		/* position within 8-character group (0-7) */
+	bool		end = false;	/* have we seen padding? */
+
+	while (s < srcend)
+	{
+		char		c = *s++;
+		int			val;
+
+		/* Skip whitespace */
+		if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
+			continue;
+
+		if (c == '=')
+		{
+			/*
+			 * The first padding is only valid at positions 2, 4, 5, or 7
+			 * within an 8-character group (corresponding to 1, 2, 3, or 4
+			 * input bytes). We only check the position for the first '='
+			 * character.
+			 */
+			if (!end)
+			{
+				if (pos != 2 && pos != 4 && pos != 5 && pos != 7)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("unexpected \"=\" while decoding base32hex sequence")));
+				end = true;
+			}
+			pos++;
+			continue;
+		}
+
+		/* No data characters allowed after padding */
+		if (end)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen_range(s - 1, srcend), s - 1)));
+
+		/* Decode base32hex character (0-9, A-V, case-insensitive) */
+		if (c >= '0' && c <= '9')
+			val = c - '0';
+		else if (c >= 'A' && c <= 'V')
+			val = c - 'A' + 10;
+		else if (c >= 'a' && c <= 'v')
+			val = c - 'a' + 10;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen_range(s - 1, srcend), s - 1)));
+
+		/* Add 5 bits to buffer */
+		bits_buffer = (bits_buffer << 5) | val;
+		bits_in_buffer += 5;
+		pos++;
+
+		/* Extract 8-bit bytes when we have enough bits */
+		while (bits_in_buffer >= 8)
+		{
+			bits_in_buffer -= 8;
+			dst[output_pos++] = (unsigned char) (bits_buffer >> bits_in_buffer);
+			/* Clear the extracted bits */
+			bits_buffer &= ((1U << bits_in_buffer) - 1);
+		}
+
+		/* Reset position after each complete 8-character group */
+		if (pos == 8)
+			pos = 0;
+	}
+
+	return output_pos;
+}
+
 /*
  * Common
  */
@@ -854,6 +993,12 @@ static const struct
 			pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
 		}
 	},
+	{
+		"base32hex",
+		{
+			base32hex_enc_len, base32hex_dec_len, base32hex_encode, base32hex_decode
+		}
+	},
 	{
 		"escape",
 		{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index f38688b5c37..59c463ae514 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2600,14 +2600,141 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
 -- report an error with a hint listing valid encodings when an invalid encoding is specified
 SELECT encode('\x01'::bytea, 'invalid');  -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 SELECT decode('00', 'invalid');           -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base64", "base64url", "base32hex", "escape", and "hex".
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
+SELECT encode('', 'base32hex');  -- ''
+ encode 
+--------
+ 
+(1 row)
+
+SELECT encode('\x11', 'base32hex');  -- '24======'
+  encode  
+----------
+ 24======
+(1 row)
+
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+  encode  
+----------
+ 24H0====
+(1 row)
+
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+  encode  
+----------
+ 24H36===
+(1 row)
+
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+  encode  
+----------
+ 24H36H0=
+(1 row)
+
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+  encode  
+----------
+ 24H36H2L
+(1 row)
+
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+      encode      
+------------------
+ 24H36H2LCO======
+(1 row)
+
+SELECT decode('', 'base32hex');  -- ''
+ decode 
+--------
+ \x
+(1 row)
+
+SELECT decode('24======', 'base32hex');  -- \x11
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+ decode 
+--------
+ \x1122
+(1 row)
+
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+  decode  
+----------
+ \x112233
+(1 row)
+
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+   decode   
+------------
+ \x11223344
+(1 row)
+
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+    decode    
+--------------
+ \x1122334455
+(1 row)
+
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+ decode 
+--------
+ \x08
+(1 row)
+
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('=', 'base32hex');  -- error
+ERROR:  unexpected "=" while decoding base32hex sequence
+SELECT decode('W', 'base32hex');  -- error
+ERROR:  invalid symbol "W" found while decoding base32hex sequence
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+ERROR:  invalid symbol "2" found while decoding base32hex sequence
+-- Check round-trip capability of base32hex encoding for multiple random UUIDs.
+DO $$
+DECLARE
+  v1 uuid;
+  v2 uuid;
+BEGIN
+  FOR i IN 1..10 LOOP
+    v1 := gen_random_uuid();
+    v2 := decode(encode(v1::bytea, 'base32hex'), 'base32hex')::uuid;
+
+    IF v1 != v2 THEN
+      RAISE NOTICE 'base32hex encoding round-trip failed, expected % got %', v1, v2;
+    END IF;
+  END LOOP;
+END;
+$$;
+--
+-- base64url encoding/decoding
+--
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
  encode 
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index d157ef7d0b3..142c529e693 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -13,7 +13,8 @@ CREATE TABLE guid2
 CREATE TABLE guid3
 (
 	id SERIAL,
-	guid_field UUID
+	guid_field UUID,
+	guid_encoded text GENERATED ALWAYS AS (encode(guid_field::bytea, 'base32hex')) STORED
 );
 -- inserting invalid data tests
 -- too long
@@ -226,11 +227,20 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 (1 row)
 
 -- test sortability of v7
+INSERT INTO guid3 (guid_field) VALUES ('00000000-0000-0000-0000-000000000000'::uuid);
 INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+INSERT INTO guid3 (guid_field) VALUES ('ffffffff-ffff-ffff-ffff-ffffffffffff'::uuid);
 SELECT array_agg(id ORDER BY guid_field) FROM guid3;
-       array_agg        
-------------------------
- {1,2,3,4,5,6,7,8,9,10}
+          array_agg           
+------------------------------
+ {1,2,3,4,5,6,7,8,9,10,11,12}
+(1 row)
+
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT array_agg(id ORDER BY guid_encoded) FROM guid3;
+          array_agg           
+------------------------------
+ {1,2,3,4,5,6,7,8,9,10,11,12}
 (1 row)
 
 -- Check the timestamp offsets for v7.
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index d8a09737668..dec2bd7c5e8 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -835,10 +835,55 @@ SELECT encode('\x01'::bytea, 'invalid');  -- error
 SELECT decode('00', 'invalid');           -- error
 
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
 
+SELECT encode('', 'base32hex');  -- ''
+SELECT encode('\x11', 'base32hex');  -- '24======'
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+
+SELECT decode('', 'base32hex');  -- ''
+SELECT decode('24======', 'base32hex');  -- \x11
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+
+SELECT decode('24', 'base32hex');  -- OK, padding `=` are optional
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+SELECT decode('=', 'base32hex');  -- error
+SELECT decode('W', 'base32hex');  -- error
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+
+-- Check round-trip capability of base32hex encoding for multiple random UUIDs.
+DO $$
+DECLARE
+  v1 uuid;
+  v2 uuid;
+BEGIN
+  FOR i IN 1..10 LOOP
+    v1 := gen_random_uuid();
+    v2 := decode(encode(v1::bytea, 'base32hex'), 'base32hex')::uuid;
+
+    IF v1 != v2 THEN
+      RAISE NOTICE 'base32hex encoding round-trip failed, expected % got %', v1, v2;
+    END IF;
+  END LOOP;
+END;
+$$;
+
+
+--
+-- base64url encoding/decoding
+--
+
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
 SELECT decode('abc-_w', 'base64url');      -- \x69b73eff
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index f512f4dea1d..f2ff00f5ddd 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -13,7 +13,8 @@ CREATE TABLE guid2
 CREATE TABLE guid3
 (
 	id SERIAL,
-	guid_field UUID
+	guid_field UUID,
+	guid_encoded text GENERATED ALWAYS AS (encode(guid_field::bytea, 'base32hex')) STORED
 );
 
 -- inserting invalid data tests
@@ -116,9 +117,14 @@ INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
 SELECT count(DISTINCT guid_field) FROM guid1;
 
 -- test sortability of v7
+INSERT INTO guid3 (guid_field) VALUES ('00000000-0000-0000-0000-000000000000'::uuid);
 INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+INSERT INTO guid3 (guid_field) VALUES ('ffffffff-ffff-ffff-ffff-ffffffffffff'::uuid);
 SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT array_agg(id ORDER BY guid_encoded) FROM guid3;
+
 -- Check the timestamp offsets for v7.
 --
 -- generate UUIDv7 values with timestamps ranging from 1970 (the Unix epoch year)
-- 
2.53.0




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
  2026-03-19 21:33           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-20 13:02             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-20 14:24               ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 01:17                 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
@ 2026-03-24 15:31                   ` Aleksander Alekseev <[email protected]>
  2026-03-24 17:26                     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  0 siblings, 1 reply; 26+ messages in thread

From: Aleksander Alekseev @ 2026-03-24 15:31 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Chao Li <[email protected]>; Masahiko Sawada <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>; Andrey Borodin <[email protected]>

Hi,

> > The patch looks basically good to me. I've made some changes to the
> > regression test part as I want to have round-trip tests. I've merged
> > the tests checking the sortability to the existing tests and added
> > round-trip tests. With this change, we can test round-trip tests and
> > sortability tests with random UUID value in every test run while
> > minimizing the test time. Feedback is very welcome.

v11 looks good to me.

> It looks like leading, trailing, and embedded whitespace are all ignored. But I don’t see a test case covering this behavior, so maybe it would be good to add one.

I intentionally didn't include this test because the code is trivial:

```
        /* Skip whitespace */
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
            continue;
```

And also because we never tested it for base64. If we want to start
testing it, we should add tests for both base64 and base32hex, which
IMO should be a separate patch.
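
For anyone who wants to check the whitespace behavior outside the tree,
here is a minimal standalone sketch of the decode loop. It is not the
code from the patch: sketch_base32hex_decode() is a hypothetical name,
it returns -1 instead of calling ereport(), and it simply stops at the
first '=' rather than validating the padding position.

```c
#include <stddef.h>

/*
 * Hypothetical standalone helper (not part of the patch): decode base32hex
 * text into dst, ignoring whitespace. Returns the number of bytes written,
 * or -1 on an invalid symbol. dst needs room for (srclen * 5) / 8 bytes.
 */
static long
sketch_base32hex_decode(const char *src, size_t srclen, unsigned char *dst)
{
	unsigned int bits_buffer = 0;
	int			bits_in_buffer = 0;
	long		output_pos = 0;
	size_t		i;

	for (i = 0; i < srclen; i++)
	{
		char		c = src[i];
		int			val;

		/* Same whitespace test as in the patch */
		if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
			continue;

		/* Simplification: treat the first '=' as the end of the data */
		if (c == '=')
			break;

		/* Map 0-9, A-V (either case) to a 5-bit value */
		if (c >= '0' && c <= '9')
			val = c - '0';
		else if (c >= 'A' && c <= 'V')
			val = c - 'A' + 10;
		else if (c >= 'a' && c <= 'v')
			val = c - 'a' + 10;
		else
			return -1;

		/* Add 5 bits; at most one full byte can become available */
		bits_buffer = (bits_buffer << 5) | (unsigned int) val;
		bits_in_buffer += 5;

		if (bits_in_buffer >= 8)
		{
			bits_in_buffer -= 8;
			dst[output_pos++] = (unsigned char) (bits_buffer >> bits_in_buffer);
			bits_buffer &= (1U << bits_in_buffer) - 1;
		}
	}

	return output_pos;
}
```

With this sketch, "24H0====" and " 24\tH0\n====" should both decode to
the two bytes 0x11 0x22, since the skipped characters never reach the
bit buffer.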

-- 
Best regards,
Aleksander Alekseev






* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
  2026-03-19 21:33           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-20 13:02             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-20 14:24               ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 01:17                 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-24 15:31                   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
@ 2026-03-24 17:26                     ` Masahiko Sawada <[email protected]>
  2026-03-26 01:09                       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-26 16:37                       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Andrey Borodin <[email protected]>
  2026-03-26 17:30                       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  0 siblings, 3 replies; 26+ messages in thread

From: Masahiko Sawada @ 2026-03-24 17:26 UTC (permalink / raw)
  To: Aleksander Alekseev <[email protected]>; +Cc: pgsql-hackers; Chao Li <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>; Andrey Borodin <[email protected]>

On Tue, Mar 24, 2026 at 8:31 AM Aleksander Alekseev
<[email protected]> wrote:
>
> Hi,
>
> > > The patch looks basically good to me. I've made some changes to the
> > > regression test part as I want to have round-trip tests. I've merged
> > > the tests checking the sortability to the existing tests and added
> > > round-trip tests. With this change, we can test round-trip tests and
> > > sortability tests with random UUID value in every test run while
> > > minimizing the test time. Feedback is very welcome.
>
> v11 looks good to me.
>
> > It looks like leading, trailing, and embedded whitespace are all ignored. But I don’t see a test case covering this behavior, so maybe it would be good to add one.
>
> I intentionally didn't include this test because the code is trivial:
>
> ```
>         /* Skip whitespace */
>         if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
>             continue;
> ```
>
> And also because we never tested it for base64. If we want to start
> testing it we should add tests both for base64 and base32hex which IMO
> should be a separate patch.

Agreed.

I've attached an updated version of the patch with the following changes:

- changed the order of encodings in the doc and the hint message to
maintain alphabetical order.
- changed the query example that extracts data from the encoded UUID
value to use rtrim(), as it's more intuitive.
- added some regression tests for decoding unpadded inputs.
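
To illustrate the rtrim() point, here is a rough standalone sketch of
the length arithmetic. The sketch_* names are hypothetical and not part
of the patch; the encoder mirrors the loop in base32hex_encode() but
also NUL-terminates its output, and sketch_rtrim_padding() only stands
in for what rtrim(..., '=') does on the SQL side.

```c
#include <stddef.h>

static const char sketch_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

/* Padded length, same arithmetic as base32hex_enc_len() in the patch */
static size_t
sketch_padded_len(size_t srclen)
{
	return (srclen + 4) / 5 * 8;
}

/* Length without trailing '=': ceil(srclen * 8 / 5) */
static size_t
sketch_unpadded_len(size_t srclen)
{
	return (srclen * 8 + 4) / 5;
}

/* Encode src into dst; dst needs sketch_padded_len(srclen) + 1 bytes */
static size_t
sketch_base32hex_encode(const unsigned char *src, size_t srclen, char *dst)
{
	unsigned int bits_buffer = 0;
	int			bits_in_buffer = 0;
	size_t		output_pos = 0;
	size_t		i;

	for (i = 0; i < srclen; i++)
	{
		bits_buffer = (bits_buffer << 8) | src[i];
		bits_in_buffer += 8;

		/* Emit one character per complete 5-bit chunk */
		while (bits_in_buffer >= 5)
		{
			bits_in_buffer -= 5;
			dst[output_pos++] = sketch_table[(bits_buffer >> bits_in_buffer) & 0x1F];
			bits_buffer &= (1U << bits_in_buffer) - 1;
		}
	}

	/* Left-align leftover bits in a final character */
	if (bits_in_buffer > 0)
		dst[output_pos++] = sketch_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];

	/* Pad to a multiple of 8 characters */
	while (output_pos % 8 != 0)
		dst[output_pos++] = '=';

	dst[output_pos] = '\0';
	return output_pos;
}

/* Stand-in for rtrim(text, '='): strips trailing '=' and returns new length */
static size_t
sketch_rtrim_padding(char *buf, size_t len)
{
	while (len > 0 && buf[len - 1] == '=')
		len--;
	buf[len] = '\0';
	return len;
}
```

For a 16-byte UUID this gives 32 characters padded and 26 once the
trailing '=' are stripped, which matches the 26-character figure in the
doc paragraph.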

I'm going to push the patch unless there are comments on these changes.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/octet-stream] v12-0001-Add-base32hex-support-to-encode-and-decode-funct.patch (17.8K, 2-v12-0001-Add-base32hex-support-to-encode-and-decode-funct.patch)
  download | inline diff:
From 243404426a0b367d81b3c56af12ab59858a64bf6 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Wed, 29 Oct 2025 15:53:12 +0400
Subject: [PATCH v12] Add base32hex support to encode() and decode() functions.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This adds support for base32hex encoding and decoding, as defined in
RFC 4648 Section 7. Unlike standard base32, base32hex uses the
extended hex alphabet (0-9, A-V) which preserves the lexicographical
order of the encoded data.

This is particularly useful for representing UUIDv7 values in a
compact string format while maintaining their time-ordered sort
property.

The encode() function produces output padded with '=', while decode()
accepts both padded and unpadded input. Following the behavior of
other encoding types, decoding is case-insensitive.

Suggested-by: Sergey Prokhorenko <[email protected]>
Author: Andrey Borodin <[email protected]>
Co-authored-by: Aleksander Alekseev <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Илья Чердаков <[email protected]>
Reviewed-by: Chengxi Sun <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Discussion: https://postgr.es/m/CAJ7c6TOramr1UTLcyB128LWMqita1Y7%3Darq3KHaU%3Dqikf5yKOQ%40mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml |  27 ++++
 src/backend/utils/adt/encode.c           | 153 +++++++++++++++++++++-
 src/test/regress/expected/strings.out    | 160 ++++++++++++++++++++++-
 src/test/regress/expected/uuid.out       |  18 ++-
 src/test/regress/sql/strings.sql         |  56 +++++++-
 src/test/regress/sql/uuid.sql            |   8 +-
 6 files changed, 409 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index b256381e01f..0aaf9bc68f1 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -727,6 +727,7 @@
       <para>
        Encodes binary data into a textual representation; supported
        <parameter>format</parameter> values are:
+       <link linkend="encode-format-base32hex"><literal>base32hex</literal></link>,
        <link linkend="encode-format-base64"><literal>base64</literal></link>,
        <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
        <link linkend="encode-format-escape"><literal>escape</literal></link>,
@@ -766,6 +767,32 @@
    functions support the following textual formats:
 
    <variablelist>
+    <varlistentry id="encode-format-base32hex">
+     <term>base32hex
+      <indexterm>
+       <primary>base32hex format</primary>
+      </indexterm></term>
+     <listitem>
+      <para>
+       The <literal>base32hex</literal> format is that of
+       <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
+       RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
+       (<literal>0</literal>-<literal>9</literal> and
+       <literal>A</literal>-<literal>V</literal>) which preserves the lexicographical
+       sort order of the encoded data. The <function>encode</function> function
+       produces output padded with <literal>'='</literal>, while <function>decode</function>
+       accepts both padded and unpadded input. Decoding is case-insensitive and ignores
+       whitespace characters.
+      </para>
+      <para>
+       This format is useful for encoding UUIDs in a compact, sortable format:
+       <literal>rtrim(encode(uuid_value::bytea, 'base32hex'), '=')</literal>
+       produces a 26-character string compared to the standard 36-character
+       UUID representation.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry id="encode-format-base64">
      <term>base64
      <indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index f5f835e944a..5f1645e8b14 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -65,8 +65,8 @@ binary_encode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base32hex", "base64", "base64url", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -115,8 +115,8 @@ binary_decode(PG_FUNCTION_ARGS)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized encoding: \"%s\"", namebuf),
-				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", and \"%s\".",
-						 "base64", "base64url", "escape", "hex")));
+				 errhint("Valid encodings are \"%s\", \"%s\", \"%s\", \"%s\", and \"%s\".",
+						 "base32hex", "base64", "base64url", "escape", "hex")));
 
 	dataptr = VARDATA_ANY(data);
 	datalen = VARSIZE_ANY_EXHDR(data);
@@ -825,6 +825,145 @@ esc_dec_len(const char *src, size_t srclen)
 	return len;
 }
 
+/*
+ * BASE32HEX
+ */
+
+static const char base32hex_table[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";
+
+static uint64
+base32hex_enc_len(const char *src, size_t srclen)
+{
+	/* 5 bytes encode to 8 characters, round up to multiple of 8 for padding */
+	return ((uint64) srclen + 4) / 5 * 8;
+}
+
+static uint64
+base32hex_dec_len(const char *src, size_t srclen)
+{
+	/* Each 8 characters of input produces at most 5 bytes of output */
+	return ((uint64) srclen * 5) / 8;
+}
+
+static uint64
+base32hex_encode(const char *src, size_t srclen, char *dst)
+{
+	const unsigned char *data = (const unsigned char *) src;
+	uint32		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	size_t		i;
+
+	for (i = 0; i < srclen; i++)
+	{
+		/* Add 8 bits to the buffer */
+		bits_buffer = (bits_buffer << 8) | data[i];
+		bits_in_buffer += 8;
+
+		/* Extract 5-bit chunks while we have enough bits */
+		while (bits_in_buffer >= 5)
+		{
+			bits_in_buffer -= 5;
+			/* Extract top 5 bits */
+			dst[output_pos++] = base32hex_table[(bits_buffer >> bits_in_buffer) & 0x1F];
+			/* Clear the extracted bits by masking */
+			bits_buffer &= ((1U << bits_in_buffer) - 1);
+		}
+	}
+
+	/* Handle remaining bits (if any) */
+	if (bits_in_buffer > 0)
+		dst[output_pos++] = base32hex_table[(bits_buffer << (5 - bits_in_buffer)) & 0x1F];
+
+	/* Add padding to make length a multiple of 8 (per RFC 4648) */
+	while (output_pos % 8 != 0)
+		dst[output_pos++] = '=';
+
+	return output_pos;
+}
+
+static uint64
+base32hex_decode(const char *src, size_t srclen, char *dst)
+{
+	const char *srcend = src + srclen,
+			   *s = src;
+	uint32		bits_buffer = 0;
+	int			bits_in_buffer = 0;
+	uint64		output_pos = 0;
+	int			pos = 0;		/* position within 8-character group (0-7) */
+	bool		end = false;	/* have we seen padding? */
+
+	while (s < srcend)
+	{
+		char		c = *s++;
+		int			val;
+
+		/* Skip whitespace */
+		if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
+			continue;
+
+		if (c == '=')
+		{
+			/*
+			 * The first padding is only valid at positions 2, 4, 5, or 7
+			 * within an 8-character group (corresponding to 1, 2, 3, or 4
+			 * input bytes). We only check the position for the first '='
+			 * character.
+			 */
+			if (!end)
+			{
+				if (pos != 2 && pos != 4 && pos != 5 && pos != 7)
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("unexpected \"=\" while decoding base32hex sequence")));
+				end = true;
+			}
+			pos++;
+			continue;
+		}
+
+		/* No data characters allowed after padding */
+		if (end)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen_range(s - 1, srcend), s - 1)));
+
+		/* Decode base32hex character (0-9, A-V, case-insensitive) */
+		if (c >= '0' && c <= '9')
+			val = c - '0';
+		else if (c >= 'A' && c <= 'V')
+			val = c - 'A' + 10;
+		else if (c >= 'a' && c <= 'v')
+			val = c - 'a' + 10;
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid symbol \"%.*s\" found while decoding base32hex sequence",
+							pg_mblen_range(s - 1, srcend), s - 1)));
+
+		/* Add 5 bits to buffer */
+		bits_buffer = (bits_buffer << 5) | val;
+		bits_in_buffer += 5;
+		pos++;
+
+		/* Extract 8-bit bytes when we have enough bits */
+		while (bits_in_buffer >= 8)
+		{
+			bits_in_buffer -= 8;
+			dst[output_pos++] = (unsigned char) (bits_buffer >> bits_in_buffer);
+			/* Clear the extracted bits */
+			bits_buffer &= ((1U << bits_in_buffer) - 1);
+		}
+
+		/* Reset position after each complete 8-character group */
+		if (pos == 8)
+			pos = 0;
+	}
+
+	return output_pos;
+}
+
 /*
  * Common
  */
@@ -854,6 +993,12 @@ static const struct
 			pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
 		}
 	},
+	{
+		"base32hex",
+		{
+			base32hex_enc_len, base32hex_dec_len, base32hex_encode, base32hex_decode
+		}
+	},
 	{
 		"escape",
 		{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index f38688b5c37..cc8de98a74a 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2600,14 +2600,168 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
 -- report an error with a hint listing valid encodings when an invalid encoding is specified
 SELECT encode('\x01'::bytea, 'invalid');  -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base32hex", "base64", "base64url", "escape", and "hex".
 SELECT decode('00', 'invalid');           -- error
 ERROR:  unrecognized encoding: "invalid"
-HINT:  Valid encodings are "base64", "base64url", "escape", and "hex".
+HINT:  Valid encodings are "base32hex", "base64", "base64url", "escape", and "hex".
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
+SELECT encode('', 'base32hex');  -- ''
+ encode 
+--------
+ 
+(1 row)
+
+SELECT encode('\x11', 'base32hex');  -- '24======'
+  encode  
+----------
+ 24======
+(1 row)
+
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+  encode  
+----------
+ 24H0====
+(1 row)
+
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+  encode  
+----------
+ 24H36===
+(1 row)
+
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+  encode  
+----------
+ 24H36H0=
+(1 row)
+
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+  encode  
+----------
+ 24H36H2L
+(1 row)
+
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+      encode      
+------------------
+ 24H36H2LCO======
+(1 row)
+
+SELECT decode('', 'base32hex');  -- ''
+ decode 
+--------
+ \x
+(1 row)
+
+SELECT decode('24======', 'base32hex');  -- \x11
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+ decode 
+--------
+ \x1122
+(1 row)
+
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+  decode  
+----------
+ \x112233
+(1 row)
+
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+   decode   
+------------
+ \x11223344
+(1 row)
+
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+    decode    
+--------------
+ \x1122334455
+(1 row)
+
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+-- Tests for decoding unpadded base32hex strings. Padding '=' are optional.
+SELECT decode('24', 'base32hex');
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24H', 'base32hex');
+ decode 
+--------
+ \x11
+(1 row)
+
+SELECT decode('24H36', 'base32hex');
+  decode  
+----------
+ \x112233
+(1 row)
+
+SELECT decode('24H36H0', 'base32hex');
+   decode   
+------------
+ \x11223344
+(1 row)
+
+SELECT decode('2', 'base32hex'); -- \x, 5 bits isn't enough for a byte, so nothing is emitted
+ decode 
+--------
+ \x
+(1 row)
+
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+ decode 
+--------
+ \x08
+(1 row)
+
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+     decode     
+----------------
+ \x112233445566
+(1 row)
+
+SELECT decode('=', 'base32hex');  -- error
+ERROR:  unexpected "=" while decoding base32hex sequence
+SELECT decode('W', 'base32hex');  -- error
+ERROR:  invalid symbol "W" found while decoding base32hex sequence
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+ERROR:  invalid symbol "2" found while decoding base32hex sequence
+-- Check round-trip capability of base32hex encoding for multiple random UUIDs.
+DO $$
+DECLARE
+  v1 uuid;
+  v2 uuid;
+BEGIN
+  FOR i IN 1..10 LOOP
+    v1 := gen_random_uuid();
+    v2 := decode(encode(v1::bytea, 'base32hex'), 'base32hex')::uuid;
+
+    IF v1 != v2 THEN
+      RAISE EXCEPTION 'base32hex encoding round-trip failed, expected % got %', v1, v2;
+    END IF;
+  END LOOP;
+  RAISE NOTICE 'OK';
+END;
+$$;
+NOTICE:  OK
+--
+-- base64url encoding/decoding
+--
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
  encode 
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index d157ef7d0b3..142c529e693 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -13,7 +13,8 @@ CREATE TABLE guid2
 CREATE TABLE guid3
 (
 	id SERIAL,
-	guid_field UUID
+	guid_field UUID,
+	guid_encoded text GENERATED ALWAYS AS (encode(guid_field::bytea, 'base32hex')) STORED
 );
 -- inserting invalid data tests
 -- too long
@@ -226,11 +227,20 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 (1 row)
 
 -- test sortability of v7
+INSERT INTO guid3 (guid_field) VALUES ('00000000-0000-0000-0000-000000000000'::uuid);
 INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+INSERT INTO guid3 (guid_field) VALUES ('ffffffff-ffff-ffff-ffff-ffffffffffff'::uuid);
 SELECT array_agg(id ORDER BY guid_field) FROM guid3;
-       array_agg        
-------------------------
- {1,2,3,4,5,6,7,8,9,10}
+          array_agg           
+------------------------------
+ {1,2,3,4,5,6,7,8,9,10,11,12}
+(1 row)
+
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT array_agg(id ORDER BY guid_encoded) FROM guid3;
+          array_agg           
+------------------------------
+ {1,2,3,4,5,6,7,8,9,10,11,12}
 (1 row)
 
 -- Check the timestamp offsets for v7.
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index d8a09737668..c1d240cea6c 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -835,10 +835,64 @@ SELECT encode('\x01'::bytea, 'invalid');  -- error
 SELECT decode('00', 'invalid');           -- error
 
 --
--- base64url encoding/decoding
+-- base32hex encoding/decoding
 --
 SET bytea_output TO hex;
 
+SELECT encode('', 'base32hex');  -- ''
+SELECT encode('\x11', 'base32hex');  -- '24======'
+SELECT encode('\x1122', 'base32hex');  -- '24H0===='
+SELECT encode('\x112233', 'base32hex');  -- '24H36==='
+SELECT encode('\x11223344', 'base32hex');  -- '24H36H0='
+SELECT encode('\x1122334455', 'base32hex');  -- '24H36H2L'
+SELECT encode('\x112233445566', 'base32hex');  -- '24H36H2LCO======'
+
+SELECT decode('', 'base32hex');  -- ''
+SELECT decode('24======', 'base32hex');  -- \x11
+SELECT decode('24H0====', 'base32hex');  -- \x1122
+SELECT decode('24H36===', 'base32hex');  -- \x112233
+SELECT decode('24H36H0=', 'base32hex');  -- \x11223344
+SELECT decode('24H36H2L', 'base32hex');  -- \x1122334455
+SELECT decode('24H36H2LCO======', 'base32hex');  -- \x112233445566
+
+-- Tests for decoding unpadded base32hex strings. Padding '=' are optional.
+SELECT decode('24', 'base32hex');
+SELECT decode('24H', 'base32hex');
+SELECT decode('24H36', 'base32hex');
+SELECT decode('24H36H0', 'base32hex');
+
+SELECT decode('2', 'base32hex'); -- \x, 5 bits isn't enough for a byte, so nothing is emitted
+
+SELECT decode('11=', 'base32hex');  -- OK, non-zero padding bits are accepted (consistent with base64)
+SELECT decode('24h36h2lco', 'base32hex');  -- OK, the encoding is case-insensitive
+
+SELECT decode('=', 'base32hex');  -- error
+SELECT decode('W', 'base32hex');  -- error
+SELECT decode('24H36H0=24', 'base32hex'); -- error
+
+-- Check round-trip capability of base32hex encoding for multiple random UUIDs.
+DO $$
+DECLARE
+  v1 uuid;
+  v2 uuid;
+BEGIN
+  FOR i IN 1..10 LOOP
+    v1 := gen_random_uuid();
+    v2 := decode(encode(v1::bytea, 'base32hex'), 'base32hex')::uuid;
+
+    IF v1 != v2 THEN
+      RAISE EXCEPTION 'base32hex encoding round-trip failed, expected % got %', v1, v2;
+    END IF;
+  END LOOP;
+  RAISE NOTICE 'OK';
+END;
+$$;
+
+
+--
+-- base64url encoding/decoding
+--
+
 -- Simple encoding/decoding
 SELECT encode('\x69b73eff', 'base64url');  -- abc-_w
 SELECT decode('abc-_w', 'base64url');      -- \x69b73eff
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index f512f4dea1d..f2ff00f5ddd 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -13,7 +13,8 @@ CREATE TABLE guid2
 CREATE TABLE guid3
 (
 	id SERIAL,
-	guid_field UUID
+	guid_field UUID,
+	guid_encoded text GENERATED ALWAYS AS (encode(guid_field::bytea, 'base32hex')) STORED
 );
 
 -- inserting invalid data tests
@@ -116,9 +117,14 @@ INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
 SELECT count(DISTINCT guid_field) FROM guid1;
 
 -- test sortability of v7
+INSERT INTO guid3 (guid_field) VALUES ('00000000-0000-0000-0000-000000000000'::uuid);
 INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+INSERT INTO guid3 (guid_field) VALUES ('ffffffff-ffff-ffff-ffff-ffffffffffff'::uuid);
 SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
+-- make sure base32hex encoding works with UUIDs and preserves ordering
+SELECT array_agg(id ORDER BY guid_encoded) FROM guid3;
+
 -- Check the timestamp offsets for v7.
 --
 -- generate UUIDv7 values with timestamps ranging from 1970 (the Unix epoch year)
-- 
2.47.3
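
For readers who want to check the test vectors above outside of PostgreSQL, here is a minimal Python sketch of the encoding. It is not the patch's C implementation: the function names `encode_base32hex` and `decode_base32hex_lenient` are made up for illustration, and the decoder only imitates the lenient cases exercised by the tests (optional padding, lowercase input, leftover bits dropped). It does not reproduce the server's error messages, and it silently accepts a bare '=' where the patch raises an error.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # RFC 4648 "Extended Hex" alphabet


def encode_base32hex(data: bytes) -> str:
    """Encode bytes as '='-padded base32hex text (illustrative helper)."""
    chars = []
    acc = 0    # bit accumulator
    nbits = 0  # number of not-yet-emitted bits held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 5:
            nbits -= 5
            chars.append(ALPHABET[(acc >> nbits) & 0x1F])
        acc &= (1 << nbits) - 1  # keep only the bits not yet emitted
    if nbits:
        # Left-align the remaining bits in a final 5-bit group.
        chars.append(ALPHABET[(acc << (5 - nbits)) & 0x1F])
    padding = "=" * (-len(chars) % 8)
    return "".join(chars) + padding


def decode_base32hex_lenient(text: str) -> bytes:
    """Decode base32hex, tolerating lowercase and missing padding.

    Assumption: trailing bits that do not fill a whole byte are dropped,
    which matches the outputs shown in the regression tests.
    """
    out = bytearray()
    acc = 0
    nbits = 0
    for ch in text.rstrip("=").upper():
        value = ALPHABET.find(ch)
        if value < 0:
            raise ValueError(f"invalid symbol {ch!r}")
        acc = (acc << 5) | value
        nbits += 5
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out)


print(encode_base32hex(bytes.fromhex("1122")))      # 24H0====
print(decode_base32hex_lenient("24h36h2lco").hex())  # 112233445566
```

The bit-accumulator approach keeps both directions symmetric: the encoder consumes 8 bits and emits 5 at a time, the decoder does the reverse.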




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
  2026-03-19 21:33           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-20 13:02             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-20 14:24               ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 01:17                 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-24 15:31                   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 17:26                     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
@ 2026-03-26 01:09                       ` Masahiko Sawada <[email protected]>
  2026-03-26 01:21                         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2 siblings, 1 reply; 26+ messages in thread

From: Masahiko Sawada @ 2026-03-26 01:09 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Aleksander Alekseev <[email protected]>; pgsql-hackers; Chao Li <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>; Andrey Borodin <[email protected]>

On Wed, Mar 25, 2026 at 5:35 PM Tom Lane <[email protected]> wrote:
>
> Tomas Vondra <[email protected]> writes:
> > On 3/26/26 00:40, Tom Lane wrote:
> >> I believe what's happening there is that in cs_CZ locale,
> >> "V" doesn't follow simple ASCII sort ordering.
>
> > With cs_CZ all letters sort *before* numbers, while in en_US it's the
> > other way around. V is not special in any way.
>
> Ah, sorry, I should have researched a bit instead of relying on
> fading memory.  The quirk I was thinking of is that in cs_CZ,
> "ch" sorts after "h":
>
> u8=# select 'h' < 'ch'::text collate "en_US";
>  ?column?
> ----------
>  f
> (1 row)
>
> u8=# select 'h' < 'ch'::text collate "cs_CZ";
>  ?column?
> ----------
>  t
> (1 row)
>
> Regular hex encoding isn't bitten by that because it doesn't
> use 'h' in the text form ... but this base32hex thingie does.
>
> However, your point is also correct:
>
> u8=# select '0' < 'C'::text ;
>  ?column?
> ----------
>  t
> (1 row)
>
> u8=# select '0' < 'C'::text collate "cs_CZ";
>  ?column?
> ----------
>  f
> (1 row)
>
> and that breaks "text ordering matches numeric ordering"
> for both traditional hex and base32hex.  So maybe this
> is not as big a deal as I first thought.  We need a fix
> for the new test though.  Probably adding COLLATE "C"
> would be enough.

Thank you for the report and the analysis.

I've reproduced the issue with the "cs_CZ" collation, and adding COLLATE
"C" to the query resolves it. It also seems like a good idea to add a note
to the documentation, as users might face the same issue. For
example:

To maintain the lexicographical sort order of the encoded data, ensure
that the text is sorted using the C collation (e.g., using COLLATE
"C"). Natural language collations may sort characters differently and
break the ordering.
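
To make the point concrete outside of the server, here is a small Python sketch. It is only an illustration under stated assumptions: `uuid_to_base32hex` and `letters_first_key` are hypothetical helpers, and the "letters before digits" key is a crude stand-in for the cs_CZ behaviour Tomas described, not a real collation.

```python
import random
import uuid

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"  # RFC 4648 "Extended Hex" alphabet


def uuid_to_base32hex(u: uuid.UUID) -> str:
    """Unpadded base32hex text of a UUID's 16 bytes (26 characters)."""
    # 128 bits is not a multiple of 5, so append two zero bits to get 130.
    n = u.int << 2
    return "".join(ALPHABET[(n >> shift) & 0x1F] for shift in range(125, -1, -5))


def letters_first_key(text: str):
    """Sort key that ranks every letter before every digit.

    Assumption: this only imitates the 'letters sort before numbers'
    effect mentioned in the thread; real collations are more involved.
    """
    return [(0, ch) if ch.isalpha() else (1, ch) for ch in text]


rng = random.Random(42)  # fixed seed so the run is repeatable
values = [uuid.UUID(int=0), uuid.UUID(int=2**128 - 1)]
values += [uuid.UUID(int=rng.getrandbits(128)) for _ in range(10)]

# Reference order: numeric value, which equals byte-wise order of the UUID.
by_value = sorted(values, key=lambda u: u.int)
# Python compares str by code point, which plays the role of COLLATE "C".
by_codepoint = sorted(values, key=uuid_to_base32hex)
# Letters-first comparison, playing the role of a natural language collation.
by_letters_first = sorted(
    values, key=lambda u: letters_first_key(uuid_to_base32hex(u))
)

print(by_codepoint == by_value)      # True
print(by_letters_first == by_value)  # False
```

The all-zeros UUID encodes to a string of '0' characters and the all-ones UUID to one starting with 'V', so any ordering that puts letters ahead of digits reverses them.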

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com






* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
  2026-03-19 21:33           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-20 13:02             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-20 14:24               ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 01:17                 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-24 15:31                   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 17:26                     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-26 01:09                       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
@ 2026-03-26 01:21                         ` Masahiko Sawada <[email protected]>
  2026-03-26 03:13                           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  0 siblings, 1 reply; 26+ messages in thread

From: Masahiko Sawada @ 2026-03-26 01:21 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Aleksander Alekseev <[email protected]>; pgsql-hackers; Chao Li <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>; Andrey Borodin <[email protected]>

On Wed, Mar 25, 2026 at 6:09 PM Masahiko Sawada <[email protected]> wrote:
>
> On Wed, Mar 25, 2026 at 5:35 PM Tom Lane <[email protected]> wrote:
> >
> > Tomas Vondra <[email protected]> writes:
> > > On 3/26/26 00:40, Tom Lane wrote:
> > >> I believe what's happening there is that in cs_CZ locale,
> > >> "V" doesn't follow simple ASCII sort ordering.
> >
> > > With cs_CZ all letters sort *before* numbers, while in en_US it's the
> > > other way around. V is not special in any way.
> >
> > Ah, sorry, I should have researched a bit instead of relying on
> > fading memory.  The quirk I was thinking of is that in cs_CZ,
> > "ch" sorts after "h":
> >
> > u8=# select 'h' < 'ch'::text collate "en_US";
> >  ?column?
> > ----------
> >  f
> > (1 row)
> >
> > u8=# select 'h' < 'ch'::text collate "cs_CZ";
> >  ?column?
> > ----------
> >  t
> > (1 row)
> >
> > Regular hex encoding isn't bitten by that because it doesn't
> > use 'h' in the text form ... but this base32hex thingie does.
> >
> > However, your point is also correct:
> >
> > u8=# select '0' < 'C'::text ;
> >  ?column?
> > ----------
> >  t
> > (1 row)
> >
> > u8=# select '0' < 'C'::text collate "cs_CZ";
> >  ?column?
> > ----------
> >  f
> > (1 row)
> >
> > and that breaks "text ordering matches numeric ordering"
> > for both traditional hex and base32hex.  So maybe this
> > is not as big a deal as I first thought.  We need a fix
> > for the new test though.  Probably adding COLLATE "C"
> > would be enough.
>
> Thank you for the report and the analysis.
>
> I've reproduced the issue with the "cs_CZ" collation, and adding COLLATE
> "C" to the query resolves it. It also seems like a good idea to add a note
> to the documentation, as users might face the same issue. For
> example:
>
> To maintain the lexicographical sort order of the encoded data, ensure
> that the text is sorted using the C collation (e.g., using COLLATE
> "C"). Natural language collations may sort characters differently and
> break the ordering.
>

Attached is a patch implementing the above idea.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Attachments:

  [text/x-patch] 0001-Fix-UUID-sortability-tests-in-base32hex-encoding.patch (3.6K, 2-0001-Fix-UUID-sortability-tests-in-base32hex-encoding.patch)
  download | inline diff:
From a64f3f64a9f04c1f5da9a51fe760c40480585fd4 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <[email protected]>
Date: Wed, 25 Mar 2026 18:07:11 -0700
Subject: [PATCH] Fix UUID sortability tests in base32hex encoding.

The recently added test for base32hex encoding of UUIDs failed on
buildfarm member hippopotamus using natural language locales (such as
cs_CZ). This happened because those collations may sort characters
differently, which breaks the strict byte-wise lexicographical
ordering expected by base32hex encoding.

This commit fixes the regression tests by explicitly using the C
collation. Additionally, add a note to the documentation to warn users
that they must use the C collation if they want to maintain the
lexicographical sort order of the encoded data.

Per buildfarm member hippopotamus.

Analyzed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
 doc/src/sgml/func/func-binarystring.sgml | 9 +++++++++
 src/test/regress/expected/uuid.out       | 7 +++++--
 src/test/regress/sql/uuid.sql            | 7 +++++--
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 0aaf9bc68f1..2ad2cdbea82 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -790,6 +790,15 @@
        produces a 26-character string compared to the standard 36-character
        UUID representation.
       </para>
+
+      <note>
+       <para>
+        To maintain the lexicographical sort order of the encoded data,
+        ensure that the text is sorted using the C collation
+        (e.g., using <literal>COLLATE "C"</literal>). Natural language
+        collations may sort characters differently and break the ordering.
+       </para>
+      </note>
      </listitem>
     </varlistentry>
 
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 142c529e693..9c5dda9e9ab 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -236,8 +236,11 @@ SELECT array_agg(id ORDER BY guid_field) FROM guid3;
  {1,2,3,4,5,6,7,8,9,10,11,12}
 (1 row)
 
--- make sure base32hex encoding works with UUIDs and preserves ordering
-SELECT array_agg(id ORDER BY guid_encoded) FROM guid3;
+-- Test base32hex encoding of UUIDs and its lexicographical sorting property.
+-- COLLATE "C" is required to prevent buildfarm failures in non-C locales
+-- where natural language collations (such as cs_CZ) would break strict
+-- byte-wise ordering.
+SELECT array_agg(id ORDER BY guid_encoded COLLATE "C") FROM guid3;
           array_agg           
 ------------------------------
  {1,2,3,4,5,6,7,8,9,10,11,12}
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index f2ff00f5ddd..8cc2ad40614 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -122,8 +122,11 @@ INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
 INSERT INTO guid3 (guid_field) VALUES ('ffffffff-ffff-ffff-ffff-ffffffffffff'::uuid);
 SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
--- make sure base32hex encoding works with UUIDs and preserves ordering
-SELECT array_agg(id ORDER BY guid_encoded) FROM guid3;
+-- Test base32hex encoding of UUIDs and its lexicographical sorting property.
+-- COLLATE "C" is required to prevent buildfarm failures in non-C locales
+-- where natural language collations (such as cs_CZ) would break strict
+-- byte-wise ordering.
+SELECT array_agg(id ORDER BY guid_encoded COLLATE "C") FROM guid3;
 
 -- Check the timestamp offsets for v7.
 --
-- 
2.53.0




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
  2026-03-19 21:33           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-20 13:02             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-20 14:24               ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 01:17                 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-24 15:31                   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 17:26                     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-26 01:09                       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-26 01:21                         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
@ 2026-03-26 03:13                           ` Masahiko Sawada <[email protected]>
  0 siblings, 0 replies; 26+ messages in thread

From: Masahiko Sawada @ 2026-03-26 03:13 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Aleksander Alekseev <[email protected]>; pgsql-hackers; Chao Li <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>; Andrey Borodin <[email protected]>

On Wed, Mar 25, 2026 at 6:21 PM Masahiko Sawada <[email protected]> wrote:
>
> On Wed, Mar 25, 2026 at 6:09 PM Masahiko Sawada <[email protected]> wrote:
> >
> > On Wed, Mar 25, 2026 at 5:35 PM Tom Lane <[email protected]> wrote:
> > >
> > > Tomas Vondra <[email protected]> writes:
> > > > On 3/26/26 00:40, Tom Lane wrote:
> > > >> I believe what's happening there is that in cs_CZ locale,
> > > >> "V" doesn't follow simple ASCII sort ordering.
> > >
> > > > With cs_CZ all letters sort *before* numbers, while in en_US it's the
> > > > other way around. V is not special in any way.
> > >
> > > Ah, sorry, I should have researched a bit instead of relying on
> > > fading memory.  The quirk I was thinking of is that in cs_CZ,
> > > "ch" sorts after "h":
> > >
> > > u8=# select 'h' < 'ch'::text collate "en_US";
> > >  ?column?
> > > ----------
> > >  f
> > > (1 row)
> > >
> > > u8=# select 'h' < 'ch'::text collate "cs_CZ";
> > >  ?column?
> > > ----------
> > >  t
> > > (1 row)
> > >
> > > Regular hex encoding isn't bitten by that because it doesn't
> > > use 'h' in the text form ... but this base32hex thingie does.
> > >
> > > However, your point is also correct:
> > >
> > > u8=# select '0' < 'C'::text ;
> > >  ?column?
> > > ----------
> > >  t
> > > (1 row)
> > >
> > > u8=# select '0' < 'C'::text collate "cs_CZ";
> > >  ?column?
> > > ----------
> > >  f
> > > (1 row)
> > >
> > > and that breaks "text ordering matches numeric ordering"
> > > for both traditional hex and base32hex.  So maybe this
> > > is not as big a deal as I first thought.  We need a fix
> > > for the new test though.  Probably adding COLLATE "C"
> > > would be enough.
> >
> > Thank you for the report and the analysis.
> >
> > I've reproduced the issue with the "cs_CZ" collation, and adding COLLATE
> > "C" to the query resolves it. It also seems like a good idea to add a note
> > to the documentation, as users might face the same issue. For
> > example:
> >
> > To maintain the lexicographical sort order of the encoded data, ensure
> > that the text is sorted using the C collation (e.g., using COLLATE
> > "C"). Natural language collations may sort characters differently and
> > break the ordering.
> >
>
> Attached is a patch implementing the above idea.

Pushed the fix without the documentation changes to make the buildfarm
animals happy first.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com






* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
  2026-03-19 21:33           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-20 13:02             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-20 14:24               ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 01:17                 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-24 15:31                   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 17:26                     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
@ 2026-03-26 16:37                       ` Andrey Borodin <[email protected]>
  2 siblings, 0 replies; 26+ messages in thread

From: Andrey Borodin @ 2026-03-26 16:37 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; Aleksander Alekseev <[email protected]>; pgsql-hackers; Chao Li <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>; Sergey Prokhorenko <[email protected]>



> On 26 Mar 2026, at 04:40, Tom Lane <[email protected]> wrote:
> 
> I wonder whether this discovery puts enough of a hole in the
> value-proposition for base32hex that we should just revert
> this patch altogether.  

After thinking more about it, I do not see grounds for reverting.

> "It works except in some locales"

It works per RFC. It adds value. It's documented precisely.

Sortability in cs_CZ only stands in the way of this format becoming "the one UUID format to rule them all" instead of the canonical one in the future.
We should let the IETF WG know that digits and letters are not always ordered as they expect. Hopefully Sergey will handle this.

BTW, thanks to Aleksander and Masahiko for pushing this over the finish line! I'm listed as author, but they did 99.9% of the work on this functionality.


Best regards, Andrey Borodin.





* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
  2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-02-18 14:57 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-14 04:10   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-18 11:14     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-18 17:52       ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-19 14:36         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Dagfinn Ilmari Mannsåker <[email protected]>
  2026-03-19 21:33           ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-20 13:02             ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-20 14:24               ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 01:17                 ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
  2026-03-24 15:31                   ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
  2026-03-24 17:26                     ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Masahiko Sawada <[email protected]>
@ 2026-03-26 17:30                       ` Masahiko Sawada <[email protected]>
  2026-03-26 17:59                         ` Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Andrey Borodin <[email protected]>
  2 siblings, 1 reply; 26+ messages in thread

From: Masahiko Sawada @ 2026-03-26 17:30 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Tom Lane <[email protected]>; Aleksander Alekseev <[email protected]>; pgsql-hackers; Chao Li <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

On Thu, Mar 26, 2026 at 1:32 AM Andrey Borodin <[email protected]> wrote:
>
>
>
> > On 26 Mar 2026, at 04:40, Tom Lane <[email protected]> wrote:
> >
> > I wonder whether this discovery puts enough of a hole in the
> > value-proposition for base32hex that we should just revert
> > this patch altogether.  "It works except in some locales"
> > isn't a very appetizing prospect, so the whole idea is starting
> > to feel more like a foot-gun than a widely-useful feature.
>
> To be precise, this discovery casts a shadow on the argument "[base32hex is ]lexicographically sortable format that preserves temporal ordering for UUIDv7". And, actually, for any UUID. But I do not think it invalidates the argument completely.
>
> It's taken from the RFC[0], actually, which states:
>  One property with this alphabet, which the base64 and base32
>  alphabets lack, is that encoded data maintains its sort order when
>  the encoded data is compared bit-wise.
>
>
> The RFC does not list any other benefits.
> Personally, I like that it's compact, visually better than base64, and RFC-compliant.
> And IMO the argument "base32hex is lexicographically sortable format that preserves ordering for UUID in C locale" is still very strong.
> Though there's a little footy shooty in the last 3 words.

Yeah, I still find that base32hex is useful.

As I mentioned in another email, I think we should note in the
documentation that "base32hex is lexicographically sortable format
that preserves ordering for UUID in C locale". I've attached the
patch. Feedback is very welcome.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Attachments:

  [text/x-patch] v1-0001-doc-Add-note-about-collation-requirements-for-bas.patch (1.6K, 2-v1-0001-doc-Add-note-about-collation-requirements-for-bas.patch)
  download | inline diff:
From 515c666b60f7f81f6b2a004ebfb91b358188470c Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <[email protected]>
Date: Thu, 26 Mar 2026 10:17:23 -0700
Subject: [PATCH v1] doc: Add note about collation requirements for base32hex
 sortability.

While fixing the base32hex UUID sortability test in commit 89210037a0a,
it turned out that the expected lexicographical order is only maintained
under the C collation (or an equivalent byte-wise collation).

Since this is not just a testing quirk but could be a real trap users
might fall into when sorting encoded data in their databases, we added
a note to the documentation to make this requirement explicitly clear.

Reviewed-by:
Discussion: https://postgr.es/m/CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 0aaf9bc68f1..9f731d7bca0 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -790,6 +790,14 @@
        produces a 26-character string compared to the standard 36-character
        UUID representation.
       </para>
+      <note>
+       <para>
+        To maintain the lexicographical sort order of the encoded data,
+        ensure that the text is sorted using the C collation
+        (e.g., using <literal>COLLATE "C"</literal>). Natural language
+        collations may sort characters differently and break the ordering.
+       </para>
+      </note>
      </listitem>
     </varlistentry>
 
-- 
2.53.0




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

From: Andrey Borodin @ 2026-03-26 17:59 UTC (permalink / raw)
  To: Masahiko Sawada <[email protected]>; +Cc: Tom Lane <[email protected]>; Aleksander Alekseev <[email protected]>; pgsql-hackers; Chao Li <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>



> On 26 Mar 2026, at 22:30, Masahiko Sawada <[email protected]> wrote:
> 
>  Feedback is very welcome.

The patch is fine from my POV.

Please consider these small improvements to the patch. Basically, we refer to the wording used by the RFC where possible.
0001 is unchanged.


Best regards, Andrey Borodin.



Attachments:

  [application/octet-stream] v2-0001-doc-Add-note-about-collation-requirements-for-bas.patch (1.6K, 2-v2-0001-doc-Add-note-about-collation-requirements-for-bas.patch)
From a6d2896079aef1885d34e9cb47da80c302987056 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <[email protected]>
Date: Thu, 26 Mar 2026 10:17:23 -0700
Subject: [PATCH v2 1/2] doc: Add note about collation requirements for
 base32hex sortability.

While fixing the base32hex UUID sortability test in commit 89210037a0a,
it turned out that the expected lexicographical order is only maintained
under the C collation (or an equivalent byte-wise collation).

Since this is not just a testing quirk but could be a real trap users
might fall into when sorting encoded data in their databases, we added
a note to the documentation to make this requirement explicitly clear.

Reviewed-by:
Discussion: https://postgr.es/m/CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 0aaf9bc68f1..9f731d7bca0 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -790,6 +790,14 @@
        produces a 26-character string compared to the standard 36-character
        UUID representation.
       </para>
+      <note>
+       <para>
+        To maintain the lexicographical sort order of the encoded data,
+        ensure that the text is sorted using the C collation
+        (e.g., using <literal>COLLATE "C"</literal>). Natural language
+        collations may sort characters differently and break the ordering.
+       </para>
+      </note>
      </listitem>
     </varlistentry>
 
-- 
2.51.2



  [application/octet-stream] v2-0002-Small-improvements.patch (1.7K, 3-v2-0002-Small-improvements.patch)
From 6a6f4cc5bc2910c9f22c268d1f10a7ac407f6e05 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <[email protected]>
Date: Thu, 26 Mar 2026 22:55:55 +0500
Subject: [PATCH v2 2/2] Small improvements

---
 doc/src/sgml/func/func-binarystring.sgml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 9f731d7bca0..dc6b7e57ea7 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -778,14 +778,14 @@
        <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
        RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
        (<literal>0</literal>-<literal>9</literal> and
-       <literal>A</literal>-<literal>V</literal>) which preserves the lexicographical
-       sort order of the encoded data. The <function>encode</function> function
+       <literal>A</literal>-<literal>V</literal>) which preserves the sort order of
+       the encoded data when compared byte-wise. The <function>encode</function> function
        produces output padded with <literal>'='</literal>, while <function>decode</function>
        accepts both padded and unpadded input. Decoding is case-insensitive and ignores
        whitespace characters.
       </para>
       <para>
-       This format is useful for encoding UUIDs in a compact, sortable format:
+       This format is useful for encoding UUIDs in a compact, byte-wise sortable format:
        <literal>rtrim(encode(uuid_value::bytea, 'base32hex'), '=')</literal>
        produces a 26-character string compared to the standard 36-character
        UUID representation.
-- 
2.51.2




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

From: Masahiko Sawada @ 2026-03-26 22:01 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Tom Lane <[email protected]>; Aleksander Alekseev <[email protected]>; pgsql-hackers; Chao Li <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

On Thu, Mar 26, 2026 at 10:59 AM Andrey Borodin <[email protected]> wrote:
>
>
>
> > On 26 Mar 2026, at 22:30, Masahiko Sawada <[email protected]> wrote:
> >
> >  Feedback is very welcome.
>
> The patch is fine from my POV.
>
> Please consider these small improvements to the patch. Basically, we refer to the wording used by the RFC where possible.
> 0001 is unchanged.

Thank you for the suggestion. It looks good to me.

I've merged these patches and am going to push barring any objections.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Attachments:

  [text/x-patch] v2-0001-doc-Clarify-collation-requirements-for-base32hex-.patch (2.8K, 2-v2-0001-doc-Clarify-collation-requirements-for-base32hex-.patch)
From 35d321a9e3216052c917b4d1a61b93ecb1414e42 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <[email protected]>
Date: Thu, 26 Mar 2026 10:17:23 -0700
Subject: [PATCH v2] doc: Clarify collation requirements for base32hex
 sortability.

While fixing the base32hex UUID sortability test in commit
89210037a0a, it turned out that the expected lexicographical order is
only maintained under the C collation (or an equivalent byte-wise
collation). Natural language collations may employ different rules,
breaking the sortability.

This commit updates the documentation to explicitly state that
base32hex is "byte-wise sortable", ensuring users do not fall into the
trap of using natural language collations when querying their encoded
data.

Co-Authored-by: Masahiko Sawada <[email protected]>
Co-Authored-by: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/CAD21AoAwX1D6baSGuQXm0mzPXPWB07kgaoaaahjNHHenbdY24A@mail.gmail.com
---
 doc/src/sgml/func/func-binarystring.sgml | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 0aaf9bc68f1..dc6b7e57ea7 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -778,18 +778,26 @@
        <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-7">
        RFC 4648 Section 7</ulink>.  It uses the extended hex alphabet
        (<literal>0</literal>-<literal>9</literal> and
-       <literal>A</literal>-<literal>V</literal>) which preserves the lexicographical
-       sort order of the encoded data. The <function>encode</function> function
+       <literal>A</literal>-<literal>V</literal>) which preserves the sort order of
+       the encoded data when compared byte-wise. The <function>encode</function> function
        produces output padded with <literal>'='</literal>, while <function>decode</function>
        accepts both padded and unpadded input. Decoding is case-insensitive and ignores
        whitespace characters.
       </para>
       <para>
-       This format is useful for encoding UUIDs in a compact, sortable format:
+       This format is useful for encoding UUIDs in a compact, byte-wise sortable format:
        <literal>rtrim(encode(uuid_value::bytea, 'base32hex'), '=')</literal>
        produces a 26-character string compared to the standard 36-character
        UUID representation.
       </para>
+      <note>
+       <para>
+        To maintain the lexicographical sort order of the encoded data,
+        ensure that the text is sorted using the C collation
+        (e.g., using <literal>COLLATE "C"</literal>). Natural language
+        collations may sort characters differently and break the ordering.
+       </para>
+      </note>
      </listitem>
     </varlistentry>
 
-- 
2.53.0
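
For illustration, here is a minimal C sketch of the property the note
above describes. It is not the committed PostgreSQL code; the names
b32hex_encode_unpadded() and order_preserved() are hypothetical. It
encodes two 16-byte values with the RFC 4648 Section 7 alphabet and
checks that strcmp(), which compares byte-wise much like COLLATE "C",
orders the encoded strings the same way memcmp() orders the raw bytes.
A locale-aware comparison such as strcoll() gives no such guarantee.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 4648 Section 7 "extended hex" alphabet: ASCII order matches value order. */
static const char b32hex_alphabet[] = "0123456789ABCDEFGHIJKLMNOPQRSTUV";

/*
 * Encode len bytes from src as unpadded base32hex into dst.
 * dst must have room for (len * 8 + 4) / 5 characters plus a NUL.
 */
static void
b32hex_encode_unpadded(const uint8_t *src, size_t len, char *dst)
{
    uint32_t    buf = 0;        /* bit accumulator */
    int         bits = 0;       /* number of pending bits in buf */
    size_t      out = 0;

    for (size_t i = 0; i < len; i++)
    {
        buf = (buf << 8) | src[i];
        bits += 8;
        while (bits >= 5)
        {
            dst[out++] = b32hex_alphabet[(buf >> (bits - 5)) & 0x1F];
            bits -= 5;
        }
    }
    /* Left-align any remaining bits in a final 5-bit group. */
    if (bits > 0)
        dst[out++] = b32hex_alphabet[(buf << (5 - bits)) & 0x1F];
    dst[out] = '\0';
}

static int
sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Returns 1 if the encoded strings compare the same way as the raw bytes. */
static int
order_preserved(const uint8_t a[16], const uint8_t b[16])
{
    char        ea[27];
    char        eb[27];

    b32hex_encode_unpadded(a, 16, ea);
    b32hex_encode_unpadded(b, 16, eb);
    return sign(memcmp(a, b, 16)) == sign(strcmp(ea, eb));
}
```

A 16-byte UUID encodes to 26 characters here, which matches the length
quoted in the documentation text.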




* Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions

From: Masahiko Sawada @ 2026-03-27 19:16 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: Tom Lane <[email protected]>; Aleksander Alekseev <[email protected]>; pgsql-hackers; Chao Li <[email protected]>; Dagfinn Ilmari Mannsåker <[email protected]>

On Thu, Mar 26, 2026 at 3:01 PM Masahiko Sawada <[email protected]> wrote:
>
> On Thu, Mar 26, 2026 at 10:59 AM Andrey Borodin <[email protected]> wrote:
> >
> >
> >
> > > On 26 Mar 2026, at 22:30, Masahiko Sawada <[email protected]> wrote:
> > >
> > >  Feedback is very welcome.
> >
> > The patch is fine from my POV.
> >
> > Please consider these small improvements to the patch. Basically, we refer to the wording used by the RFC where possible.
> > 0001 is unchanged.
>
> Thank you for the suggestion. It looks good to me.
>
> I've merged these patches and am going to push barring any objections.
>

Pushed.

As for the original base32hex encoding commit, I'd like to note that
I changed the patch to use a lookup table for decoding before
pushing, as it's more efficient.
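
A rough sketch of what lookup-table decoding can look like, in C. This
is an illustration under stated assumptions, not the code that was
pushed: the names b32hex_lookup, b32hex_init_lookup() and
b32hex_decode() are hypothetical, ASCII is assumed, and padding and
whitespace handling are left out. Each input byte indexes a 256-entry
table directly, so no per-character range checks or alphabet searches
are needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: maps an input byte to its 5-bit value, or -1 if invalid. */
static int8_t b32hex_lookup[256];

/* Fill the table once before decoding; accepts upper and lower case. */
static void
b32hex_init_lookup(void)
{
    for (int i = 0; i < 256; i++)
        b32hex_lookup[i] = -1;
    for (int i = 0; i < 10; i++)
        b32hex_lookup['0' + i] = (int8_t) i;
    for (int i = 0; i < 22; i++)
    {
        b32hex_lookup['A' + i] = (int8_t) (10 + i);
        b32hex_lookup['a' + i] = (int8_t) (10 + i);
    }
}

/*
 * Decode len characters of unpadded base32hex from src into dst.
 * Returns the number of bytes written, or -1 on an invalid character.
 * Leftover bits at the end are dropped without being checked.
 */
static long
b32hex_decode(const char *src, size_t len, uint8_t *dst)
{
    uint32_t    buf = 0;        /* bit accumulator */
    int         bits = 0;       /* number of pending bits in buf */
    long        out = 0;

    for (size_t i = 0; i < len; i++)
    {
        int8_t      v = b32hex_lookup[(unsigned char) src[i]];

        if (v < 0)
            return -1;
        buf = (buf << 5) | (uint32_t) v;
        bits += 5;
        if (bits >= 8)
        {
            dst[out++] = (uint8_t) (buf >> (bits - 8));
            bits -= 8;
        }
    }
    return out;
}
```

The caller is expected to run b32hex_init_lookup() once and to size dst
to at least len * 5 / 8 bytes.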

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com







end of thread

Thread overview: 26+ messages
2026-02-09 12:19 Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions Aleksander Alekseev <[email protected]>
2026-02-18 14:57 ` Aleksander Alekseev <[email protected]>
2026-03-14 04:10   ` Masahiko Sawada <[email protected]>
2026-03-18 11:14     ` Aleksander Alekseev <[email protected]>
2026-03-18 17:52       ` Masahiko Sawada <[email protected]>
2026-03-19 11:18         ` Aleksander Alekseev <[email protected]>
2026-03-19 12:12           ` Chengxi Sun <[email protected]>
2026-03-19 12:18             ` Aleksander Alekseev <[email protected]>
2026-03-19 14:14               ` Chengxi Sun <[email protected]>
2026-03-19 19:24               ` Masahiko Sawada <[email protected]>
2026-03-19 14:36         ` Dagfinn Ilmari Mannsåker <[email protected]>
2026-03-19 21:33           ` Masahiko Sawada <[email protected]>
2026-03-20 03:06             ` Chengxi Sun <[email protected]>
2026-03-20 13:02             ` Aleksander Alekseev <[email protected]>
2026-03-20 14:24               ` Aleksander Alekseev <[email protected]>
2026-03-24 01:17                 ` Masahiko Sawada <[email protected]>
2026-03-24 15:31                   ` Aleksander Alekseev <[email protected]>
2026-03-24 17:26                     ` Masahiko Sawada <[email protected]>
2026-03-26 01:09                       ` Masahiko Sawada <[email protected]>
2026-03-26 01:21                         ` Masahiko Sawada <[email protected]>
2026-03-26 03:13                           ` Masahiko Sawada <[email protected]>
2026-03-26 16:37                       ` Andrey Borodin <[email protected]>
2026-03-26 17:30                       ` Masahiko Sawada <[email protected]>
2026-03-26 17:59                         ` Andrey Borodin <[email protected]>
2026-03-26 22:01                           ` Masahiko Sawada <[email protected]>
2026-03-27 19:16                             ` Masahiko Sawada <[email protected]>
