public inbox for [email protected]  
From: Troels Arvin <[email protected]>
To: [email protected]
Cc: Tom Lane <[email protected]>
Subject: Re: utf8 vs UTF-8
Date: Sat, 18 May 2024 16:48:36 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

Hello,

Tom Lane wrote:
 >>  test1  | loc_test | UTF8   | libc     | en_US.UTF-8 | en_US.UTF-8
 >>  test3  | troels   | UTF8   | libc     | en_US.utf8  | en_US.utf8
 >
 > On most if not all platforms, both those spellings of the locale names
 > will be taken as valid.  You might try running "locale -a" to get an
 > idea of which one is preferred according to your current libc
 > installation

"locale -a" on the Ubuntu system outputs this:

   C
   C.utf8
   en_US.utf8
   POSIX

On a CentOS 7 system, it's much the same:

   locale -a | grep -i en_us
   en_US
   en_US.iso88591
   en_US.iso885915
   en_US.utf8
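
As an aside, here is a rough sketch of why both spellings might be accepted. It assumes (my understanding, not something confirmed in this thread) that glibc normalizes the codeset part of a locale name by dropping non-alphanumeric characters and lowercasing the rest, which would also explain why "locale -a" prints "utf8" and "iso88591". The helper name normalize_locale is made up for illustration:

```shell
# Sketch only: mimics, as an assumption, how glibc is understood to
# normalize the codeset part of a locale name (drop non-alphanumerics,
# lowercase the rest). normalize_locale is a hypothetical helper.
normalize_locale() {
    name=$1
    modifier=
    case $name in
        *@*)
            modifier=@${name#*@}    # keep a modifier such as @euro as-is
            name=${name%%@*}
            ;;
    esac
    case $name in
        *.*)
            base=${name%%.*}        # language_TERRITORY part, e.g. en_US
            codeset=${name#*.}      # codeset part, e.g. UTF-8
            codeset=$(printf '%s' "$codeset" \
                | LC_ALL=C tr -cd '[:alnum:]' \
                | LC_ALL=C tr '[:upper:]' '[:lower:]')
            printf '%s.%s%s\n' "$base" "$codeset" "$modifier"
            ;;
        *)
            # No codeset given (e.g. C, POSIX, en_US): leave untouched.
            printf '%s%s\n' "$name" "$modifier"
            ;;
    esac
}

normalize_locale en_US.UTF-8    # prints en_US.utf8
normalize_locale en_US.utf8     # prints en_US.utf8
```

If that assumption holds, the two spellings name the same libc locale, and the difference is only in how the string is written.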

So at first, I thought en_US.utf8 would be the most correct locale
identifier. However, when I look at Postgres' own databases, they have
a slightly different locale string:

   psql --list | grep -E 'postgres|template'
   postgres  | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
   template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
   template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...

Also, when I try to create a database with "en_US.utf8" as locale 
without specifying a template:

troels=# create database test4 locale 'en_US.utf8';
ERROR:  new collation (en_US.utf8) is incompatible with the collation of the template database (en_US.UTF-8)
HINT:  Use the same collation as in the template database, or use template0 as template.
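
For completeness, a minimal sketch of the workaround the HINT points to, i.e. naming template0 explicitly. The helper create_db_sql is hypothetical, and it assumes the database name and locale contain no quote characters:

```shell
# Hypothetical helper: prints a CREATE DATABASE statement that uses
# template0, as the HINT above suggests. Assumes the database name and
# locale contain no quote characters (no escaping is done here).
create_db_sql() {
    printf "CREATE DATABASE %s TEMPLATE template0 LOCALE '%s';\n" "$1" "$2"
}

create_db_sql test4 en_US.utf8
# To run it against a server, one might pipe the statement into psql:
#   create_db_sql test4 en_US.utf8 | psql
```

This only sidesteps the template check; it does not say which of the two spellings is preferable.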

Given the locale of Postgres' own databases and Postgres' error message,
I'm leaning towards en_US.UTF-8 being the most correct locale to use.
Why else would Postgres care about the spelling, if utf8 vs. UTF-8
doesn't matter?


> but TBH, I doubt it's worth worrying about.

But couldn't there be an issue if, for example, the client's locale and
the server's locale aren't spelled exactly the same? I'm wondering
whether the client library might then have to perform unneeded
translation of the data stream to/from the database.

-- 
Kind regards,
Troels







