Date: Fri, 17 Apr 2026 10:28:24 +0900 (JST)
Message-Id:
<20260417.102824.927096962510122248.ishii@postgresql.org>
To: thomas.munro@gmail.com
Cc: andreas@proxel.se, pgsql-hackers@lists.postgresql.org, assam258@gmail.com
Subject: Re: Questionable description about character sets
From: Tatsuo Ishii
References: <20260214.192033.705419152780150580.ishii@postgresql.org>

> If we wanted to follow the SQL standard's terminology, I think we'd
> call this the "character repertoire".

Calling it "character repertoire" works for me. Fortunately, the
meaning of "character repertoire" in the SQL standard and in other
standards (ISO/IEC 2022 or 10646) appears to be the same.

> In the standard, a "character
> set" is the database object representing a repertoire and an encoding
> of it, or its identifier.

Yes. Unlike ISO/IEC 2022 or 10646, the SQL standard makes no clear
distinction between a character set (in the sense of ISO/IEC 10646)
and an encoding. (To me this is quite confusing.)

> But if we put it in the description column,
> we wouldn't have to name it.

Why?

> Researching the standard led me to
> src/backend/catalog/information_schema.sql[1]. It currently reports
> the encoding name as the character set and the repertoire, except
> s/UTF8/UCS/ for the repertoire. That's the same information as you
> want to document here. For the character set (in the SQL standard
> sense), the current view definition seems reasonable given that we
> don't support CREATE CHARACTER SET or CHARACTER SET generally,

Why? For example, shouldn't EUC_JP have JIS X 0201, JIS X 0208 and
JIS X 0212 as its character repertoire?

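To make the difference concrete, here is a minimal Python sketch (not
PostgreSQL code) that contrasts the rule quoted above, where the
repertoire is the encoding name except for s/UTF8/UCS/, with a
per-encoding list of underlying character sets. The function names
and the lookup table are hypothetical; only the EUC_JP entry and the
UTF8 naming question come from this thread.

```python
# Illustrative model only; names and table contents are assumptions,
# not taken from src/backend/catalog/information_schema.sql.

def current_repertoire(encoding: str) -> str:
    """Mimic the quoted rule: encoding name, except s/UTF8/UCS/."""
    return "UCS" if encoding == "UTF8" else encoding


# Hypothetical alternative: name the character sets behind each encoding.
# "Unicode" is the name this mail prefers over "UCS" for UTF8.
PROPOSED_REPERTOIRE = {
    "UTF8": ["Unicode"],
    "EUC_JP": ["JIS X 0201", "JIS X 0208", "JIS X 0212"],
}


def proposed_repertoire(encoding: str) -> list[str]:
    # Fall back to the bare encoding name where no research exists yet.
    return PROPOSED_REPERTOIRE.get(encoding, [encoding])


if __name__ == "__main__":
    for enc in ("UTF8", "EUC_JP", "LATIN1"):
        print(enc, "|", current_repertoire(enc), "|",
              ", ".join(proposed_repertoire(enc)))
```

Entries for other encodings are deliberately left out, since their
repertoires still need research.
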
> and for
> the character repertoire, the s/UTF8/UCS/ translation makes sense, but
> you chose to call it "Unicode". Shouldn't those agree?

I think "UCS" is not a repertoire but a coded character set.
"Unicode" or "Unicode repertoire" [1] is more appropriate.

[1] https://www.unicode.org/reports/tr17/tr17-3.html

> If GB18030 were a valid server encoding, it would surely have to
> report UCS, like UTF8, since it is also a "Unicode transformation
> format"[2] (its purpose is to be backwards compatible with legacy
> 2-byte-per-common-Chinese-character formats while also covering all of
> Unicode 100% systematically, ie booting stuff they don't often encode
> into the 3- and 4-byte zone to make room for efficient encoding of
> stuff they do often encode). So I think that means your new
> documentation should say UCS (or UNICODE) for that one too.

I'm not sure. I have heard that the latest GB18030 (GB18030-2022, at
this point) does not cover some newer Unicode characters.

> I don't
> know how other encodings should spell their repertoire though...

I need to research that too.

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp