From: Henson Choi
Reply-To: assam258@gmail.com
Date: Thu, 16 Apr 2026 10:23:32 +0900
Subject: Re: Experimenting with wider Unicode storage
To: Thomas Munro
Cc: PostgreSQL Hackers, Tatsuo Ishii

Hi Thomas,

Thank you for sharing this very interesting and creative approach.
Encoding is indeed a crucial factor in capacity planning and
performance benchmarking, and I find this direction quite compelling.

I'm currently working on a few other things, so my responses may not
always be quick, but I wanted to let you know I'm genuinely
interested in following this work.

As it happens, I'm currently collaborating with Ishii-san (who, as
you know, is one of the original architects of multibyte/CJK support
in PostgreSQL) on Row Pattern Recognition; that might also be a
thread worth keeping an eye on.

It also strikes me that this is a topic worth considering in the
context of the rapid growth of SNS and AI-generated data. The
pervasive use of emoji, which cannot be represented at all in legacy
encodings like EUC-KR, is in fact accelerating the migration
toward Unicode in Korea and other Asian markets. This makes the
storage efficiency of Unicode for CJK characters an increasingly
practical concern, not just a theoretical one.

I'd like to take some time to analyze the current situation around
character encoding in Korea (where EUC-KR legacy systems and
UTF-8 coexist in complex ways), review the patches you've attached,
and then share some thoughts and feedback.

Best regards,
Henson