Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vDVpu-00AdNp-Ic for pgsql-hackers@arkaria.postgresql.org; Mon, 27 Oct 2025 22:37:10 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vDVps-009d2K-PK for pgsql-hackers@arkaria.postgresql.org; Mon, 27 Oct 2025 22:37:07 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vDVpr-009d2A-HF for pgsql-hackers@lists.postgresql.org; Mon, 27 Oct 2025 22:37:07 +0000 Received: from sonic303-25.consmr.mail.gq1.yahoo.com ([98.137.64.206]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vDVpm-004aiU-25 for pgsql-hackers@postgresql.org; Mon, 27 Oct 2025 22:37:06 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com.au; s=s2048; t=1761604617; bh=CoJRqTKtf665EVA5yDLU/USG1SFEd6wBQYvOXe0RpqA=; h=Date:From:To:Cc:In-Reply-To:References:Subject:From:Subject:Reply-To; b=IjqF2AH+92YDjbAYdkA+J9RsXJpakZWC+NvK7BOtejBW5uNm5ihhYJn21nkBd6SQ0+K1G244HiOqIJk5biUEGhYplJhm7TKbRqGyKziyxVtVuvK1C+YroY1Oj2+IeCc8ljBy1b/4eRhBaJirY25boK9hRZn63FH5AfavmPKmHhweq9W4anZouuatyx0xTvSqyEIiA6mNGlwxOiMhjyyN6nSxr0xq0Apcn9hS8doM2/nQd6+nddeIIi8oEim0aAZTmxPR61pDH3mc/yWmg155qXNUbbdpS6ElM/VLseCEK2u1kZOkpqkBM5EAhJxFj+w639S0EmlvanYJceC0mVMCmg== X-SONIC-DKIM-SIGN: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1761604617; bh=OswA3zwXDjjmWiSclZZ193WfVTbMItUvC3u7kFjFzFX=; h=X-Sonic-MF:Date:From:To:Subject:From:Subject; b=Uq3mbKKSKF4Q2otwcXADfRRT99pyNNgDSr2drg9P2Ur9AifJ6rmBf3usZzeudmuXxuHN4C4KRatKq7SukqV5PSViCYTQ284jnMcBmpSlgFF75HMUOe4WZrhWw7XGc/vj/tdmfcfzYCbfhGm0+0SNIYO40oZyd0pV40GzW0hPjrvPYLPOFLowZx4tDuhWmRsPb09najN+DAmSk3TeEJfRjjCBYOTota6meOXGXKZC8dFOQdLa9NmcRae4wy8eWHZmAhQ+1tmNaWZ0t59/RfWAznSrJE/rXm2BNppv9cDy+WKCVCD20f8y40aewzoz69L0hYZwZ8qlVltsB1CwSszRvQ== X-YMail-OSG: p4ngRz8VM1k4x_bpoUpPlxLANjcdSnGo9xUJ7.rFmYerqgl9nBz3oTeYrSV0Kt. ot7SGD2mI2bBgXsOyGVkASjcxG6QCpTuR74kG_C7rQUzsaDFv1pJJMjcJUyVA64nwoHYJBB_c7N5 pa2GlC2pdElPgE_cufnMXzLWccTSTfmBQlYugRQjYxqAtXcXOBEKi5IDtyeISTL0wTLQPEuFj0iY zel3Kgz6Yk4lU.DRPLlxQED4vuUlnXD2_wkhQ8CNM99gDR4b8VevBrrBK4d1nVM2OfTV4iUaSn95 U5Yxo0qCLv8nImdjDJxDQnxnhTuI5rqmgh6dfxi_DiRnolROftbQ6kub0JQooL2LNfUGsHPXGIo5 BISd8gKz7wqBc7YO8rjgMAypOk0YBKSVh8qc6vJ1Gxa9dAO_YlAlU3woB3EMBft4h8I7FkBx2qvG iAhfGPby3rC5r2cX4vg16iLjyTjVVb.1r6WEITXEst1VSzxt.5CHFsXF8GtWujYM3E69gra_JKrb _mE5vlfKSVAUaUZvwiEAGlGhAaKvpOCJMn9kPLD5dNnm2dqALZPu7p9ygv7TDlG1Zu8oOzcafRfq 5kdnS5vyMasq9ifMTnHIYzuFmoS9az46TVVFyyaL9IVSiFsnUq_Q_nCzng1R9I6k9gTxRkN.qOzJ I6XmMPbZ5EeQ07JDl.SDQzc8Spbf1e4TfeU8fDq2y80zRFXond2Jlgd6VlSQYnyaRVF4cbgXUTdG ICkYj4x4iRTnd5X3MoZT7wMB8NLue3bzvX.0VZdexWhnmVBStOnkDvpZ1A3fW5HrkF6BXZPX44ml MxOJuCLdSt67gDUNJ4jKLUH01Q8vSD5JTZPUGJMLHKUSjmwCaBOIFTJGkStxI7kMLTZ82OXQxYiO NnRhFmIKPDPgJCbQsU2AO.1yuqdcmioeA8j52FmlH1cP5wFYwoSZ.13CO9gEKKeOXQlrR7rRCcJo HywYSJEldzD.kVD1Hu6TTGsbc4fiMkF4GRVMKO6aHa8vfi.SQVB8moiX0kLV2eAmyMEXHiC9H8ub B4hgs5gY6_TSKCfQPYbv3eAGQpQQ74WLC0LxjI8f6wr71ai.jXal2tgpB96uJ0m6IO6H4OhQFek3 xwaRG3xm0EzP889LdY69AX7CotJZUYmceSGrrpj0ZPcOqgjdNh9QWoiWImax6i3L5l5Xn8Oas1Xs U8kGwLcUtI2gQLaZVHHdBhuAStvB.O2wsYdWMt0Ybx9eQLW26CasLESDQFD7acuahYQ1jCnqz2mT Dg006TfPwvQC.u.J1gqHqWvN8cPCpssAYbhA57aR3FDIH26GdRJ4YBwmfC3Nz9WUUWqFjfYVhuu3 n7fwLozdmcMAWKd38gKFm_iy2KkXXIEbEdZoMVC.0qQkpG5lnklFGWFLTsPJ1GUrMZGidQhIRdfi Lzq.W7lsb9viSONpb4M1u4E6hfgKKF40c5KuqaLHbvRogpqgDR._lRmgq6.xggEiCSs5ex.cpEik ikazCs_LNLJ_1AsgS_v0khz_UiBshkP0HlMsZtbdNyvc_ATUHkpALYlFwxOBF.cDK8g0EJ6SpBnC zTKeLeja0bBJ7dd7hcuPWalpvxBQ7FXK1dH8tDSPY7SKTI03w2xdaA5knQXeC9sPyfGPtkpaPUp1 y4GfxOxsK_PxPDl.asbrRKKpNo3X2LLsHOv2DId3Aj_lkrd27Qj9D7gV58ZjbdYxyKcVF.Oy5dnL PF6ycf8ypV1slNRIXEfIeLEBuVwq4rA_puAugbi6glzRcaWVwxS10mitO.wl8Z9iECe7nKhukfy. j2tNcP22LeEyON1Ri8lpwrG7n4Dgu3MEZtptIMGHu7LFBUX30R.cjRficgeeAwnRE9UE9vVXcQJM ZNtdoACc1wBo9UVNexvf8Sq1PrlMLgsHLo2Bwcq5g7UOZc3ZjoUvYzzZvs3B6X3R.F.ya4fH4Pbp THjR483RnKBSzOaJ8obfpN0nb5qovHB0QwWAKmgSAQi.ZLOAAKl50syz9tH2VJWlWLm4lOcNHHys 08Ffl4vHDsHe2wKNk2xhVLEc6PwZM1BkiTvWiBdxqtP_xkWuCqqrwthUctIltd2.e9Jx.qIowedA 5BAL8THoRNqxrpdbz9CsXVtByl73yYddV_sMROxj.pmgaZcS3rvCmxKahQ2yiQIo5.QlocwuUm_W JhRccLwjdpfRYxcSi3A5aEuet0mHeWjl1u0PlPkIMFW3WKE7ClSx3KdR22Z2yprR8XhYb_QsEKAy yR4hqK6B6Z1zbF6jQnr3s6sPXjv9KKKj4Z6zlzD_Id.F0zPtdTD3h2NaLNfENHcOfl0ppBwFduz6 hWZXPUQA4oes4Rx1.C7YUzlzNpubDkfTxLd2S X-Sonic-MF: X-Sonic-ID: c9725a77-99b3-4785-bf67-417a3a8ed2e4 Received: from sonic.gate.mail.ne1.yahoo.com by sonic303.consmr.mail.gq1.yahoo.com with HTTP; Mon, 27 Oct 2025 22:36:57 +0000 Date: Mon, 27 Oct 2025 22:36:51 +0000 (UTC) From: Sergey Prokhorenko To: Andrey Borodin , Masahiko Sawada Cc: pgsql-hackers Message-ID: <1154454839.957923.1761604611424@mail.yahoo.com> In-Reply-To: References: <1791665551.452444.1761209220211.ref@mail.yahoo.com> <1791665551.452444.1761209220211@mail.yahoo.com> <18022523-0F8F-4C07-AFF5-57DC9086D78E@yandex-team.ru> <1895971769.8343.1761240853939@mail.yahoo.com> <574624399.175025.1761290201491@mail.yahoo.com> <953203149.383019.1761345585325@mail.yahoo.com> <6F76FA61-E2DC-44EF-9504-889D9BDB4EBD@yandex-team.ru> Subject: Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_957922_930109192.1761604611422" X-Mailer: WebService/1.1.24652 YMailNorrin Content-Length: 12658 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk ------=_Part_957922_930109192.1761604611422 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable =20 =20 On Sat, Oct 25, 2025 at 11:07=E2=80=AFAM Andrey Borodin wrote: > > > > > On 25 Oct 2025, at 04:31, Masahiko Sawada wrote= : > > > > Or providing > > 'uuid_encode(uuid, format text) -> text' and 'uuid_decode(text, format > > text) -> uuid' might make sense too, but I'm not sure. > > I like the idea, so I drafted a prototype for discussion. > Though I do not see what else methods should be provided along with added= one... Thank you for drafting the patch! But I find it potentially confusing to have different encoding methods for bytea and UUID types. I don't see a compelling reason why the core should support base32hex exclusively for the UUID data type, nor why base32hex should be the only encoding method that the core provides for UUIDs (while we can use it by default). If we implement uuid_encode() and uuid_decode(), we might end up creating similar encoding and decoding functions for other data types as well, which doesn't seem like the best approach. I still believe that extending the existing encode() and decode() functions is a better starting point. Regards, --=20 Masahiko Sawada Amazon Web Services: https://aws.amazon.com ________________________________________________ =20 Masahiko, I wanted to highlight an important discussion=C2=A0among the authors and c= ontributors of RFC 9562 regarding UUID text encoding: https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/discussion= s/17#discussioncomment-10614817 The RFC 9562 authors and contributors=C2=A0reached consensus that standard= izing an alternate short text format for UUIDs is important. While the comm= unity debated between base32hex (RFC 4648) and Crockford's Base32, both wer= e recognized for preserving lexicographical sort order, a critical property= for database primary keys and URL-safe identifiers. Time constraints preve= nted inclusion in RFC 9562, but the discussion established that base32hex i= s the existing standard format already defined in RFC 4648, Section 7, spec= ifically designed for sort-preserving encoding. This context is crucial because it underscores that the uuid type, as a fir= st-class concept, deserves its own standardized text encoding. Regarding the proposal to couple UUID encoding with the bytea type through = encode()/decode() functions: I understand the appeal of reusing existing in= frastructure, but this creates a conceptual mismatch. UUID is a distinct se= mantic type in PostgreSQL, not merely binary data. The bytea type has exist= ed for decades without base32hex encoding, and that's worked fine, because = bytea represents arbitrary binary data, not universally unique identifiers = with specific structural properties and needs. Consider PostgreSQL's own design philosophy. The documentation states: "9.5. Binary String Functions and Operators=C2=A0=C2=A0This section describ= es functions and operators for examining and manipulating binary strings, t= hat is values of type bytea. Many of these are equivalent, in purpose and s= yntax, to the text-string functions described in the previous section." PostgreSQL maintains parallel function sets for text strings and bytea pre= cisely because they serve different purposes, despite the implementation ov= erhead. The uuid type deserves the same treatment: it's not just another bi= nary blob, but a type with specific semantics (uniqueness, version bits, va= riant encoding) and use cases (distributed identifiers, sortable keys, URL-= safe representations). Why should uuid be treated as a second-class citizen and forced through byt= ea conversion, when text and bytea each have their own dedicated function f= amilies? You've been very careful in your previous arguments to separate data type c= onversion from encoding/decoding operations. I appreciate that rigor. Howev= er, the current proposal to route UUID encoding through bytea contradicts t= hat principle. It merges two fundamentally different data types for conveni= ence rather than correctness. If someone wants to add base32hex encoding/decoding to bytea for general b= inary data operations, that's a worthwhile but separate discussion. The uui= d type, however, needs native base32hex support to fulfill its role as a fi= rst-class PostgreSQL type with a standardized compact text representation, = as recommended by the RFC 9562 community. I would value your thoughts on these arguments. Best regards, Sergey Prokhorenko =20 ------=_Part_957922_930109192.1761604611422 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable



=20 =20
On Sat, Oct 25, 2025 at 11:07=E2=80= =AFAM Andrey Borodin <x4mmm@yandex-team.ru> wrote:=
>
>
>
> > On 25 Oct 2025, at 04:31, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >= ;
> > Or providing
> > 'uui= d_encode(uuid, format text) -> text' and 'uuid_decode(text, format
> > text) -> uuid' might make sense too, but I'm not = sure.
>
> I like the idea, so I d= rafted a prototype for discussion.
> Though I do not s= ee what else methods should be provided along with added one...

Thank you for drafting the patch! But I find it p= otentially confusing
to have different encoding methods f= or bytea and UUID types. I don't
see a compelling reason = why the core should support base32hex
exclusively for the= UUID data type, nor why base32hex should be the
only enc= oding method that the core provides for UUIDs (while we can
use it by default).

If we implement= uuid_encode() and uuid_decode(), we might end up
creatin= g similar encoding and decoding functions for other data types
as well, which doesn't seem like the best approach. I still believethat extending the existing encode() and decode() function= s is a
better starting point.

Regards,


--
Mas= ahiko Sawada

Amazon Web Services: https:/= /aws.amazon.com

___________________________________________= _____




Masahiko,=

I wanted= to highlight an important discussion among the authors and con= tributors of RFC 9562 regarding UUID text encoding:

https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/disc= ussions/17#discussioncomment-10614817

The = RFC 9562 authors and contributors reached consensus that standardizing an alternate short text format= for UUIDs is important. While the community debated between base32hex (RFC= 4648) and Crockford's Base32, both were recognized for preserving lexicogr= aphical sort order, a critical property for database primary keys and URL-s= afe identifiers. Time constraints prevented inclusion in RFC 9562, but the = discussion established that base32hex is the existing standard format alrea= dy defined in RFC 4648, Section 7, specifically designed for sort-preservin= g encoding.

This context is crucial because it underscores tha= t the uuid type, as a first-class concept, deserves its own standardized te= xt encoding.

Regarding the proposal to couple UUI= D encoding with the bytea type through encode()/decode() functions: I under= stand the appeal of reusing existing infrastructure, but this creates a con= ceptual mismatch. UUID is a distinct semantic type in PostgreSQL, not merel= y binary data. The bytea type has existed for decades without base32hex enc= oding, and that's worked fine, because bytea represents arbitrary binary da= ta, not universally unique identifiers with specific structural properties = and needs.

Consider PostgreSQL's own design philos= ophy. The documentation states:

"9.5. Binary Strin= g Functions and Operators  
This section describes func= tions and operators for examining and manipulating binary strings, that is = values of type bytea. Many of these are equivalent, in purpose and synta= x, to the text-string functions described in the previous section."

PostgreS= QL maintains parallel function sets for text strings and bytea precisely be= cause they serve different purposes, despite the implementation overhead. T= he uuid type deserves the same treatment: it's not just another binary blob= , but a type with specific semantics (uniqueness, version bits, variant enc= oding) and use cases (distributed identifiers, sortable keys, URL-safe repr= esentations).

Why should uuid be treated as a seco= nd-class citizen and forced through bytea conversion, when text and bytea e= ach have their own dedicated function families?

<= /div>
You've been very careful in your previous arguments to separate d= ata type conversion from encoding/decoding operations. I appreciate that ri= gor. However, the current proposal to route UUID encoding through bytea con= tradicts that principle. It merges two fundamentally different data types f= or convenience rather than correctness.

If someone wants to add base32hex encoding/d= ecoding to bytea for general binary data operations, that's a worthwhile bu= t separate discussion. The uuid type, however, needs native base32hex suppo= rt to fulfill its role as a first-class PostgreSQL type with a standardized= compact text representation, as recommended by the RFC 9562 community.

I would = value your thoughts on these arguments.


Best= regards,
Sergey Prokhorenko



















------=_Part_957922_930109192.1761604611422--