public inbox for [email protected]  
From: Adrian Klaver <[email protected]>
To: Troels Arvin <[email protected]>
To: [email protected]
Cc: Tom Lane <[email protected]>
Subject: Re: utf8 vs UTF-8
Date: Sat, 18 May 2024 08:01:17 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>

On 5/18/24 07:48, Troels Arvin wrote:
> Hello,
> 
> Tom Lane wrote:
>  >>  test1  | loc_test | UTF8   | libc     | en_US.UTF-8 | en_US.UTF-8
>  >>  test3  | troels   | UTF8   | libc     | en_US.utf8  | en_US.utf8
>  >
>  > On most if not all platforms, both those spellings of the locale names
>  > will be taken as valid.  You might try running "locale -a" to get an
>  > idea of which one is preferred according to your current libc
>  > installation
> 
> "locale -a" on the Ubuntu system outputs this:
> 
>    C
>    C.utf8
>    en_US.utf8
>    POSIX

If you expand that to "locale -v -a" you get:

locale: en_US.utf8      archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
     title | English locale for the USA
    source | Free Software Foundation, Inc.
   address | https://www.gnu.org/software/libc/
     email | [email protected]
  language | American English
 territory | United States
  revision | 1.0
      date | 2000-06-24
   codeset | UTF-8



> So at first, I thought en_US.utf8 would be the most correct locale 
> identifier. However, when I look at Postgres' own databases, they have 
> the slightly different locale string:
> 
>    psql --list | grep -E 'postgres|template'
>    postgres  | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>    template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>    template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
> 
> Also, when I try to create a database with "en_US.utf8" as locale 
> without specifying a template:
> 
> troels=# create database test4 locale 'en_US.utf8';
> ERROR:  new collation (en_US.utf8) is incompatible with the collation of 
> the template database (en_US.UTF-8)
> HINT:  Use the same collation as in the template database, or use 
> template0 as template.

I'm going to say that is Postgres being exact to a fault.
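
To illustrate what I mean by "exact", here is a minimal Python sketch. It 
is only an illustration: the helper names (split_locale, 
equivalent_locale) are made up for this example, and it is not how 
Postgres or glibc actually compare locale names. It assumes that Python's 
codec aliases are a fair stand-in for "these two codeset spellings mean 
the same thing", and it ignores locale modifiers such as "@euro".

```python
import codecs


def split_locale(name):
    """Split 'en_US.utf8' into ('en_US', 'utf8'); codeset is '' if absent."""
    base, _, codeset = name.partition(".")
    return base, codeset


def equivalent_locale(a, b):
    """True if both names have the same language/territory part and their
    codeset spellings resolve to the same Python codec (an assumption used
    here as a proxy for 'same codeset')."""
    base_a, cs_a = split_locale(a)
    base_b, cs_b = split_locale(b)
    if base_a != base_b:
        return False
    if not cs_a or not cs_b:
        # No codeset on at least one side: only equal if both lack one.
        return cs_a == cs_b
    return codecs.lookup(cs_a).name == codecs.lookup(cs_b).name


# Exact string comparison, which is what the error message above suggests:
print("en_US.utf8" == "en_US.UTF-8")                    # False
# Comparison after resolving the codeset spelling:
print(equivalent_locale("en_US.utf8", "en_US.UTF-8"))   # True
```

The first comparison treats the two spellings as different names, which 
matches the error you got; the second treats them as the same locale, 
which matches the "codeset | UTF-8" line in the locale -v -a output above.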

> 
> Given the locale of Postgres' own databases and Postgres' error message, 
> I'm leaning to en_US.UTF-8 being the most correct locale to use. Because 
> why would Postgres care about it, if utf8/UTF-8 doesn't matter?
> 
> 
>> but TBH, I doubt it's worth worrying about.
> 
> But couldn't there be an issue, if for example the client's locale and 
> the server's locale aren't exactly the same? I'm thinking maybe the 
> client library has to perform unneeded translation of the stream of data 
> to/from the database?



-- 
Adrian Klaver
[email protected]






