From: Adrian Klaver
To: Troels Arvin, pgsql-general@lists.postgresql.org
Cc: Tom Lane
Subject: Re: utf8 vs UTF-8
Date: Sat, 18 May 2024 08:01:17 -0700
In-Reply-To: <89165125-54b6-46a2-9b2c-0a7e275596bf@arvin.dk>
References: <2388205.1715953909@sss.pgh.pa.us> <89165125-54b6-46a2-9b2c-0a7e275596bf@arvin.dk>

On 5/18/24 07:48, Troels Arvin wrote:
> Hello,
>
> Tom Lane wrote:
> >>  test1  | loc_test | UTF8   | libc     | en_US.UTF-8 | en_US.UTF-8
> >>  test3  | troels   | UTF8   | libc     | en_US.utf8  | en_US.utf8
> >
> > On most if not all platforms, both those spellings of the locale names
> > will be taken as valid.  You might try running "locale -a" to get an
> > idea of which one is preferred according to your current libc
> > installation
>
> "locale -a" on the Ubuntu system outputs this:
>
>   C
>   C.utf8
>   en_US.utf8
>   POSIX

If you expand that to locale -v -a you get:

locale: en_US.utf8      archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
    title | English locale for the USA
   source | Free Software Foundation, Inc.
  address | https://www.gnu.org/software/libc/
    email | bug-glibc-locales@gnu.org
 language | American English
territory | United States
 revision | 1.0
     date | 2000-06-24
  codeset | UTF-8

> So at first, I thought en_US.utf8 would be the most correct locale
> identifier. However, when I look at Postgres' own databases, they have
> the slightly different locale string:
>
>   psql --list | grep -E 'postgres|template'
>   postgres  | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>   template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>   template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>
> Also, when I try to create a database with "en_US.utf8" as locale
> without specifying a template:
>
> troels=# create database test4 locale 'en_US.utf8';
> ERROR:  new collation (en_US.utf8) is incompatible with the collation of
> the template database (en_US.UTF-8)
> HINT:  Use the same collation as in the template database, or use
> template0 as template.
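
To illustrate the mismatch in the error quoted above, here is a rough Python sketch. It is not taken from the Postgres or glibc sources. It assumes that the C library treats codeset spellings as equivalent once case and punctuation are ignored, and that the check behind the error compares the two names as plain strings. The helper names normalize_codeset and same_locale are made up for this example.

```python
def normalize_codeset(locale_name: str) -> str:
    """Lowercase the codeset part and drop its punctuation.

    Hypothetical helper: it only mimics the idea that "UTF-8" and "utf8"
    name the same codeset; it does not call into the C library.
    """
    base, dot, rest = locale_name.partition(".")
    if not dot:
        # No codeset part at all, e.g. "C" or "POSIX".
        return locale_name
    codeset, at, modifier = rest.partition("@")
    normalized = "".join(ch for ch in codeset if ch.isalnum()).lower()
    return f"{base}.{normalized}{at}{modifier}"


def same_locale(a: str, b: str) -> bool:
    """Compare two locale names while ignoring codeset spelling."""
    return normalize_codeset(a) == normalize_codeset(b)


template_locale = "en_US.UTF-8"   # what template1 reports in psql --list
requested_locale = "en_US.utf8"   # what the CREATE DATABASE command asked for

# An exact string comparison sees two different names ...
print(template_locale == requested_locale)
# ... while a comparison that ignores codeset spelling sees one locale.
print(same_locale(template_locale, requested_locale))
```

Under those assumptions the first print shows False and the second True, which is the gap between what the error message complains about and what the locale actually is.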

I'm going to say that is Postgres being exact to a fault.

>
> Given the locale of Postgres' own databases and Postgres' error message,
> I'm leaning to en_US.UTF-8 being the most correct locale to use. Because
> why would Postgres care about it, if utf8/UTF-8 doesn't matter?
>
> >> but TBH, I doubt it's worth worrying about.
>
> But couldn't there be an issue, if for example the client's locale and
> the server's locale aren't exactly the same? I'm thinking maybe the
> client library has to perform unneeded translation of the stream of data
> to/from the database?

-- 
Adrian Klaver
adrian.klaver@aklaver.com