From: Thomas Munro
Date: Wed, 15 Apr 2026 21:26:43 +1200
Subject: Re: Questionable description about character sets
To: Tatsuo Ishii
Cc: andreas@proxel.se, pgsql-hackers@lists.postgresql.org, Henson Choi
References: <20260211.185847.1679085676298121526.ishii@postgresql.org> <29fd7c6b-b3cd-4d45-977c-d9ef2f88378a@proxel.se> <20260214.192033.705419152780150580.ishii@postgresql.org>

On Mon, Feb 16, 2026 at 5:35 PM Thomas Munro wrote:
> On Sat, Feb 14, 2026 at 11:20 PM Tatsuo Ishii wrote:
> > > Wouldn't that make the table very wide?
> >
> > I don't think it would make the table very wide but a little bit
> > wider. So I think adding the character sets information to
> > "Description" column is better. Some of encodings already have the
> > info. See attached patch.

If we wanted to follow the SQL standard's terminology, I think we'd call this the "character repertoire". In the standard, a "character set" is the database object representing a repertoire and an encoding of it, or its identifier.
But if we put it in the description column, we wouldn't have to name it.

Researching the standard led me to src/backend/catalog/information_schema.sql[1]. It currently reports the encoding name as the character set and the repertoire, except s/UTF8/UCS/ for the repertoire. That's the same information as you want to document here. For the character set (in the SQL standard sense), the current view definition seems reasonable given that we don't support CREATE CHARACTER SET or CHARACTER SET generally, and for the character repertoire, the s/UTF8/UCS/ translation makes sense, but you chose to call it "Unicode". Shouldn't those agree?

If GB18030 were a valid server encoding, it would surely have to report UCS, like UTF8, since it is also a "Unicode transformation format"[2] (its purpose is to be backwards compatible with legacy 2-byte-per-common-Chinese-character formats while also covering all of Unicode 100% systematically, i.e. booting stuff they don't often encode into the 3- and 4-byte zone to make room for efficient encoding of stuff they do often encode). So I think that means your new documentation should say UCS (or UNICODE) for that one too. I don't know how other encodings should spell their repertoire though...

(CC Henson Choi who might be interested in this topic especially WRT Korean.)

[1] https://www.postgresql.org/docs/current/infoschema-character-sets.html
[2] https://en.wikipedia.org/wiki/GB_18030
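
To make the mapping discussed above concrete, here is a minimal Python sketch of what the view is described as reporting: the encoding name as the character set, and the same name as the repertoire except that UTF8 becomes UCS. It is only an illustration, not the actual view definition; the function names and the `ucs_encodings` parameter are made up for this example, and the GB18030 case is purely hypothetical since it is not a valid server encoding.

```python
# Illustrative sketch only: mirrors the behaviour described in the mail,
# not the real definition in information_schema.sql.

# Encodings whose repertoire would be reported as UCS. Today only UTF8
# qualifies; the parameter exists so the hypothetical GB18030 case can be
# tried without claiming it is supported.
DEFAULT_UCS_ENCODINGS = frozenset({"UTF8"})


def character_set_name(server_encoding: str) -> str:
    """The character set (SQL standard sense) is just the encoding name."""
    return server_encoding


def character_repertoire(server_encoding: str,
                         ucs_encodings=DEFAULT_UCS_ENCODINGS) -> str:
    """The repertoire is the encoding name, except s/UTF8/UCS/."""
    if server_encoding in ucs_encodings:
        return "UCS"
    return server_encoding


if __name__ == "__main__":
    for enc in ("UTF8", "EUC_JP", "LATIN1"):
        print(enc, character_set_name(enc), character_repertoire(enc))
    # Hypothetical: if GB18030 were a valid server encoding, it would
    # presumably join the UCS group as another Unicode transformation format.
    print(character_repertoire("GB18030", ucs_encodings={"UTF8", "GB18030"}))
```

The open question from the mail stays open in the sketch: for every other encoding the repertoire simply echoes the encoding name, which is what the documentation would need to decide how to spell.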