Subject: Re: Change initdb default to the builtin collation provider
From: "Daniel Verite"
To: "Robert Haas"
Cc: "Jeff Davis", pgsql-hackers@postgresql.org
Date: Wed, 11 Mar 2026 17:28:05 +0100

Robert Haas wrote:

> To be honest, I'd probably be ready to support making the default
> encoding UTF8 regardless of the environment, and you have to use -E
> if you want anything else. I think there are still people using
> other encodings, but I believe it to be a small minority at this
> point.

It would be interesting to have the point of view of Asian users
about this. Recently, the suggestion to retire GB18030 in favor of
UTF-8 was met with the objection that GB18030 was likely preferred
by users from China [1]. Another example against UTF-8 that I found
notable is Tatsuo Ishii mentioning that Japanese users tend to use
--no-locale rather than UTF-8 locales [2].

Also, it's not obvious how initdb could choose a UTF-8 locale
regardless of the environment. For instance, let's say it finds
LC_ALL="fr_FR.iso885915@euro", what would it do? Maybe look at the
UTF-8 locales on the system. Here's a subset of what it would find
on my system:

C.utf8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
...
tr_TR.utf8

From that kind of list, which locale should it pick and why?

Personally I think that ignoring the environment's LC_* for the
collations would be fine if we went for builtin/C.UTF-8 by default,
as $subject suggests. But the level of enthusiasm for that from the
community seems much lower than it would need to be for that kind of
change to be acceptable.

[1] https://www.postgresql.org/message-id/45b4b689-0e78-4d30-a5f9-1a39d01ab2b7%40ww-it.cn

[2] https://www.postgresql.org/message-id/20230608.104535.2171011311090815110.t-ishii%40sranhm.sra.co.jp

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/