Date: Sat, 14 Feb 2026 19:20:33 +0900 (JST)
Message-Id:
<20260214.192033.705419152780150580.ishii@postgresql.org>
To: andreas@proxel.se
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Questionable description about character sets
From: Tatsuo Ishii
In-Reply-To: <29fd7c6b-b3cd-4d45-977c-d9ef2f88378a@proxel.se>
References: <20260211.185847.1679085676298121526.ishii@postgresql.org> <29fd7c6b-b3cd-4d45-977c-d9ef2f88378a@proxel.se>

> Wouldn't that make the table very wide?

I don't think it would make the table very wide, just a little bit
wider. So I think adding the character set information to the
"Description" column is better. Some of the encodings already have
this info. See the attached patch.

> And for e.g. European
> character encodings I am not sure it is that useful since most or
> maybe even all of them are subsets of unicode, it mostly gets
> interesting for encodings which support characters not in unicode,
> right?

Choosing UTF8 or not is just one of the use cases. I am thinking about
the use case in which the user wants to keep using another encoding
(e.g. to avoid conversion to UTF8).

Example: suppose the user has a legacy system that uses EUC_JP. The
data in the system includes JIS X 0201, JIS X 0208 and JIS X 0212
characters, and he wants to make sure that PostgreSQL supports all of
those character sets in EUC_JP, because some tools do not support JIS
X 0212 (they support only JIS X 0201 and JIS X 0208). Currently the
info (whether JIS X 0212 is supported or not) does not exist anywhere
in our docs.
It's only in the source code. I think it's better to have the info in
our docs so that users do not need to look into the source code.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp

[Attachment: v1-0001-doc-Enhance-PostgreSQL-Character-Sets-table.patch]

From 98c97f670ce647003ce467a84f81cec0cb463c18 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii
Date: Sat, 14 Feb 2026 16:26:01 +0900
Subject: [PATCH v1] doc: Enhance "PostgreSQL Character Sets" table.

Previously some encodings lacked a description of the coded character
sets used in the encoding. For most European encodings this is obvious
because there is only one or a few character sets per encoding, but
this is not true for some Asian encodings. For example, the EUC_JP
encoding corresponds to multiple character sets, namely JIS X 0201,
JIS X 0208 and JIS X 0212. This commit adds this information to the
"Description" column.

Discussion: https://postgr.es/m/20260211.185847.1679085676298121526.ishii%40postgresql.org
---
 doc/src/sgml/charset.sgml | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 3aabc798012..32c6280489b 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1831,7 +1831,7 @@ ORDER BY c COLLATE ebcdic;
 EUC_CN
-Extended UNIX Code-CN
+Extended UNIX Code-CN, GB 2312
 Simplified Chinese
 Yes
 Yes
@@ -1840,7 +1840,7 @@ ORDER BY c COLLATE ebcdic;
 EUC_JP
-Extended UNIX Code-JP
+Extended UNIX Code-JP, JIS X 0201, JIS X 0208, JIS X 0212
 Japanese
 Yes
 Yes
@@ -1849,7 +1849,7 @@ ORDER BY c COLLATE ebcdic;
 EUC_JIS_2004
-Extended UNIX Code-JP, JIS X 0213
+Extended UNIX Code-JP, JIS X 0201, JIS X 0213
 Japanese
 Yes
 No
@@ -1858,7 +1858,7 @@ ORDER BY c COLLATE ebcdic;
 EUC_KR
-Extended UNIX Code-KR
+Extended UNIX Code-KR, KS X 1001
 Korean
 Yes
 Yes
@@ -1867,7 +1867,7 @@ ORDER BY c COLLATE ebcdic;
 EUC_TW
-Extended UNIX Code-TW
+Extended UNIX Code-TW, CNS 11643
 Traditional Chinese, Taiwanese
 Yes
 Yes
@@ -2056,7 +2056,7 @@ ORDER BY c COLLATE ebcdic;
 SJIS
-Shift JIS
+Shift JIS, JIS X 0201, JIS X 0208
 Japanese
 No
 No
--
2.43.0
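
To make the EUC_JP example in the message concrete: a single EUC_JP byte
stream mixes several coded character sets, and they are distinguished by
their lead bytes. The C sketch below assumes the standard EUC-JP layout:
ASCII as 0x00-0x7F, JIS X 0208 as two bytes in 0xA1-0xFE, JIS X 0201
katakana as SS2 (0x8E) plus one byte in 0xA1-0xDF, and JIS X 0212 as SS3
(0x8F) plus two bytes in 0xA1-0xFE. The names eucjp_classify() and
eucjp_contains_jisx0212() are hypothetical helpers for illustration, not
PostgreSQL functions, and they check byte ranges only, not whether a code
point is actually assigned. A check of this kind is roughly what the user
in the example would need to find out whether the legacy data uses JIS X
0212 at all.

```c
#include <assert.h>
#include <stddef.h>

/* Coded character sets that can appear in an EUC_JP byte stream. */
typedef enum
{
    CS_ASCII,         /* G0: one byte, 0x00-0x7F */
    CS_JISX0208,      /* G1: two bytes, each 0xA1-0xFE */
    CS_JISX0201_KANA, /* G2: SS2 (0x8E) + one byte 0xA1-0xDF */
    CS_JISX0212,      /* G3: SS3 (0x8F) + two bytes, each 0xA1-0xFE */
    CS_INVALID
} EucJpCodeSet;

/* True if b is in the 0xA1-0xFE range used by the multibyte code sets. */
static int
is_gr94(unsigned char b)
{
    return b >= 0xA1 && b <= 0xFE;
}

/*
 * Hypothetical helper: classify the character starting at s, given that
 * len (> 0) bytes are available, and store its byte length in *mblen.
 * Only byte ranges are checked, not whether the code point is assigned.
 */
static EucJpCodeSet
eucjp_classify(const unsigned char *s, size_t len, size_t *mblen)
{
    assert(len > 0);
    *mblen = 1;

    if (s[0] < 0x80)
        return CS_ASCII;

    if (s[0] == 0x8E)           /* SS2: JIS X 0201 katakana */
    {
        if (len >= 2 && s[1] >= 0xA1 && s[1] <= 0xDF)
        {
            *mblen = 2;
            return CS_JISX0201_KANA;
        }
        return CS_INVALID;
    }

    if (s[0] == 0x8F)           /* SS3: JIS X 0212 */
    {
        if (len >= 3 && is_gr94(s[1]) && is_gr94(s[2]))
        {
            *mblen = 3;
            return CS_JISX0212;
        }
        return CS_INVALID;
    }

    if (is_gr94(s[0]))          /* JIS X 0208 */
    {
        if (len >= 2 && is_gr94(s[1]))
        {
            *mblen = 2;
            return CS_JISX0208;
        }
        return CS_INVALID;
    }

    return CS_INVALID;          /* 0x80-0x8D, 0x90-0xA0, 0xFF */
}

/*
 * Hypothetical helper: return 1 if buf contains at least one JIS X 0212
 * character, 0 if it contains none, or -1 if buf is not valid EUC-JP.
 */
static int
eucjp_contains_jisx0212(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    int    found = 0;

    while (i < len)
    {
        size_t       mblen;
        EucJpCodeSet cs = eucjp_classify(buf + i, len - i, &mblen);

        if (cs == CS_INVALID)
            return -1;
        if (cs == CS_JISX0212)
            found = 1;
        i += mblen;
    }
    return found;
}
```

For example, the bytes 41 8E B1 A4 A2 8F B0 A1 (ASCII "A", a half-width
katakana from JIS X 0201, a hiragana from JIS X 0208, and a code point in
the JIS X 0212 range) make eucjp_contains_jisx0212() return 1, while the
first five bytes alone make it return 0.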