public inbox for [email protected]  
From: Henson Choi <[email protected]>
To: Thomas Munro <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Tatsuo Ishii <[email protected]>
Subject: Re: Experimenting with wider Unicode storage
Date: Thu, 16 Apr 2026 10:23:32 +0900
Message-ID: <CAAAe_zANMo3o280YU96Nt=JK=mq=PfygvgT1GnG=7Wuh+Es1GQ@mail.gmail.com>
In-Reply-To: <CA+hUKG+VEg7OsbRNbRcakp2k+078PCDhZ6HUJjvGvJ839ivxDQ@mail.gmail.com>
References: <CA+hUKG+VEg7OsbRNbRcakp2k+078PCDhZ6HUJjvGvJ839ivxDQ@mail.gmail.com>

Hi Thomas,

Thank you for sharing this very interesting and creative approach.
Encoding is indeed a crucial factor in capacity planning and
performance benchmarking — I find this direction quite compelling.

I'm currently working on a few other things, so my responses may not
always be quick, but I wanted to let you know I'm genuinely
interested in following this work.

As it happens, I'm collaborating with Ishii-san (who, as you know,
is one of the original architects of multibyte/CJK support in
PostgreSQL) on Row Pattern Recognition, so that thread might also be
worth keeping an eye on.

It also strikes me that this topic deserves attention given the
rapid growth of social media (SNS) and AI-generated data. Emoji,
which cannot be represented at all in legacy encodings such as
EUC-KR, are now pervasive, and that is accelerating the migration
toward Unicode in Korea and other Asian markets. This makes the
storage efficiency of Unicode for CJK characters an increasingly
practical concern, not just a theoretical one.
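
To make the size trade-off concrete, here is a minimal Python sketch
that compares byte counts for a Hangul string and an emoji. The sample
strings and the helper names (encoded_size, size_table) are
hypothetical and chosen only for illustration. UTF-16 is included as
one example of a wider code unit; that choice is an assumption and
does not describe the attached patches.

```python
# Illustrative only: sample strings and helper names are hypothetical,
# not taken from the thread or from the patches under discussion.
SAMPLES = {"hangul": "한글", "emoji": "😀"}
ENCODINGS = ("euc_kr", "utf-8", "utf-16-le")


def encoded_size(text, encoding):
    """Return the byte length of text in encoding, or None if it cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        # The target character set has no code for at least one character.
        return None


def size_table(samples=SAMPLES, encodings=ENCODINGS):
    """Map each sample name to its byte length per encoding."""
    return {
        name: {enc: encoded_size(text, enc) for enc in encodings}
        for name, text in samples.items()
    }


for name, sizes in size_table().items():
    print(name, sizes)
```

With these samples, the two Hangul syllables take 4 bytes in EUC-KR,
6 bytes in UTF-8, and 4 bytes in UTF-16, while the emoji has no EUC-KR
representation at all and takes 4 bytes in both UTF-8 and UTF-16.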

I'd like to take some time to analyze the current situation around
character encoding in Korea — where both EUC-KR legacy systems and
UTF-8 coexist in complex ways — review the patches you've attached,
and then share some thoughts and feedback.

Best regards,
Henson

