From: Thomas Munro
Date: Tue, 17 Feb 2026 15:38:05 +1300
Subject: Re: Questionable description about character sets
To: Nico Williams
Cc: Tatsuo Ishii, andreas@proxel.se, pgsql-hackers@lists.postgresql.org

On Mon, Feb 16, 2026 at 6:07 PM Nico Williams wrote:
> On Mon, Feb 16, 2026 at 05:35:41PM +1300, Thomas Munro wrote:
> > [...]. UTF-16 is
> > apparently sometimes preferred to save space in other RDBMSs that can
> > do it, but I suppose you could achieve the same size most of the time
> > with a scheme like that. [...]
>
> [Off-topic] I think UTF-16 yielding smaller encodings is a truism. It
> really depends on what language the text is mostly written in, but
> mostly it's a truism that's not true. Anyways, UTF-16 has to go away,
> and the sooner the better.

But when it's true for your language and that's what your database holds, then it's true all the time, and it's not just outliers: we're talking about nearly all of Asia's languages. That's ... a lot of NAND gates being wasted due to arbitrary choices made probably before UTF-8 even existed.

I do agree with you that UTF-16 has turned out to be an odd beast, though: not big enough but also too big. Maybe it's only just right for CJK (or CJ?).

I don't see much chance at all of anyone retro-fitting UTF-16 into PostgreSQL anyway, so I wouldn't worry about that. I could more easily see us figuring out how to drop the requirement for high bits in multi-byte sequence tails so that GB18030 could be used to store two-byte Chinese (while also retaining full access to all of Unicode, as it does). I was basically wondering out loud whether Japan might be hiding something like that somewhere, and imagining what it might look like.
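
For what it's worth, the size argument is easy to check. Below is a minimal sketch using Python's built-in codecs; the sample strings and the helper name `encoded_sizes` are made up for illustration and are not from this thread, and a few short strings obviously say nothing about real-world data.

```python
# Illustrative only: compare how many bytes the same text needs in
# UTF-8, UTF-16 and GB18030. The sample strings are arbitrary choices.
SAMPLES = {
    "English": "character sets",
    "Japanese": "文字集合",
    "Chinese": "字符集",
}


def encoded_sizes(text):
    """Return the encoded byte length of text for each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so the 2-byte BOM is not counted.
        "utf-16": len(text.encode("utf-16-le")),
        "gb18030": len(text.encode("gb18030")),
    }


for name, text in SAMPLES.items():
    print(name, encoded_sizes(text))

# GB18030 can still represent characters outside its two-byte range,
# using four-byte sequences, so it round-trips all of Unicode.
emoji = "\U0001F600"
assert emoji.encode("gb18030").decode("gb18030") == emoji
```

For these samples the CJK strings take three bytes per character in UTF-8 but two in UTF-16 and GB18030, while the ASCII string takes one byte per character in UTF-8 and GB18030 and two in UTF-16.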