From: Troels Arvin
Date: Sat, 18 May 2024 16:48:36 +0200
Subject: Re: utf8 vs UTF-8
To: pgsql-general@lists.postgresql.org
Cc: Tom Lane

Hello,

Tom Lane wrote:
>>  test1  | loc_test | UTF8   | libc     | en_US.UTF-8 | en_US.UTF-8
>>  test3  | troels   | UTF8   | libc     | en_US.utf8  | en_US.utf8
>
> On most if not all platforms, both those spellings of the locale names
> will be taken as valid.  You might try running "locale -a" to get an
> idea of which one is preferred according to your current libc
> installation

"locale -a" on the Ubuntu system outputs this:

  C
  C.utf8
  en_US.utf8
  POSIX

On a CentOS7 system, it's sort-of the same:

  locale -a | grep -i en_us
  en_US
  en_US.iso88591
  en_US.iso885915
  en_US.utf8

So at first, I thought en_US.utf8 would be the most correct locale identifier. However, when I look at Postgres' own databases, they have the slightly different locale string:

  psql --list | grep -E 'postgres|template'
  postgres  | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
  template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
  template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
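For illustration only, here is a minimal Python sketch of how two locale spellings could be compared. It assumes the C library treats the codeset part of a locale name case-insensitively and ignores punctuation in it. That is an assumption of this sketch, not something the listings above prove, and it says nothing about how Postgres compares the strings. The helper names normalize_codeset, locale_key and same_locale are hypothetical.

```python
def normalize_codeset(codeset):
    """Lower-case a codeset name and drop non-alphanumeric characters.

    Assumption: this mirrors how a libc might fold "UTF-8" and "utf8"
    into one name; it is not taken from any Postgres or libc source.
    """
    return "".join(ch for ch in codeset if ch.isalnum()).lower()


def locale_key(locale_name):
    """Split 'lang_TERRITORY.codeset@modifier' into comparable parts."""
    name, _, modifier = locale_name.partition("@")
    base, _, codeset = name.partition(".")
    return (base, normalize_codeset(codeset), modifier)


def same_locale(a, b):
    """True if both names differ only in the spelling of the codeset."""
    return locale_key(a) == locale_key(b)


if __name__ == "__main__":
    for other in ("en_US.UTF-8", "en_US.iso88591"):
        print("en_US.utf8", "vs", other, "->", same_locale("en_US.utf8", other))
```

Under that assumption the two spellings from the database listing compare as equal, while en_US.utf8 and en_US.iso88591 do not. A plain string comparison, by contrast, would treat en_US.utf8 and en_US.UTF-8 as different.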
Also, when I try to create a database with "en_US.utf8" as locale without specifying a template:

  troels=# create database test4 locale 'en_US.utf8';
  ERROR:  new collation (en_US.utf8) is incompatible with the collation of the template database (en_US.UTF-8)
  HINT:  Use the same collation as in the template database, or use template0 as template.

Given the locale of Postgres' own databases and Postgres' error message, I'm leaning toward en_US.UTF-8 being the most correct locale to use. Because why would Postgres care about it, if utf8/UTF-8 doesn't matter?

> but TBH, I doubt it's worth worrying about.

But couldn't there be an issue if, for example, the client's locale and the server's locale aren't exactly the same? I'm thinking maybe the client library has to perform unneeded translation of the stream of data to/from the database?

-- 
Kind regards,
Troels