From: Adrian Klaver <[email protected]>
To: Troels Arvin <[email protected]>
To: [email protected]
Cc: Tom Lane <[email protected]>
Subject: Re: utf8 vs UTF-8
Date: Sat, 18 May 2024 08:01:17 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<[email protected]>
On 5/18/24 07:48, Troels Arvin wrote:
> Hello,
>
> Tom Lane wrote:
> >> test1 | loc_test | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8
> >> test3 | troels | UTF8 | libc | en_US.utf8 | en_US.utf8
> >
> > On most if not all platforms, both those spellings of the locale names
> > will be taken as valid. You might try running "locale -a" to get an
> > idea of which one is preferred according to your current libc
> > installation
>
> "locale -a" on the Ubuntu system outputs this:
>
> C
> C.utf8
> en_US.utf8
> POSIX
If you expand that to "locale -v -a" you get:
locale: en_US.utf8 archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
title | English locale for the USA
source | Free Software Foundation, Inc.
address | https://www.gnu.org/software/libc/
email | [email protected]
language | American English
territory | United States
revision | 1.0
date | 2000-06-24
codeset | UTF-8
> So at first, I thought en_US.utf8 would be the most correct locale
> identifier. However, when I look at Postgres' own databases, they have
> the slightly different locale string:
>
> psql --list | grep -E 'postgres|template'
> postgres | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
> template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
> template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>
> Also, when I try to create a database with "en_US.utf8" as locale
> without specifying a template:
>
> troels=# create database test4 locale 'en_US.utf8';
> ERROR: new collation (en_US.utf8) is incompatible with the collation of
> the template database (en_US.UTF-8)
> HINT: Use the same collation as in the template database, or use
> template0 as template.
I'm going to say that is Postgres being exact to a fault.
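To illustrate what I mean, here is a rough Python sketch of the difference between an exact string comparison (which is what the error message suggests is happening) and a comparison that folds the codeset spelling first. The folding rule (drop non-alphanumeric characters, lowercase the rest) is my assumption about how libc treats the part after the dot; the function names are made up for this example and are not Postgres or glibc code.

```python
def normalize_codeset(codeset: str) -> str:
    """Fold a codeset spelling: keep only alphanumerics, lowercased.

    Assumption: this approximates how libc matches codeset names,
    so "UTF-8" and "utf8" end up as the same string.
    """
    return "".join(ch for ch in codeset if ch.isalnum()).lower()


def split_locale(name: str) -> tuple[str, str, str]:
    """Split 'lang_TERRITORY.codeset@modifier' into its three parts."""
    base, _, modifier = name.partition("@")
    lang_territory, _, codeset = base.partition(".")
    return lang_territory, normalize_codeset(codeset), modifier


def same_locale(a: str, b: str) -> bool:
    """Compare two locale names, ignoring codeset spelling differences."""
    return split_locale(a) == split_locale(b)


if __name__ == "__main__":
    # Exact comparison: the two spellings differ.
    print("en_US.utf8" == "en_US.UTF-8")             # False
    # Folded comparison: the two spellings match.
    print(same_locale("en_US.utf8", "en_US.UTF-8"))  # True
```

Under that assumption the two names refer to the same locale, and only the literal comparison tells them apart.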
>
> Given the locale of Postgres' own databases and Postgres' error message,
> I'm leaning to en_US.UTF-8 being the most correct locale to use. Because
> why would Postgres care about it, if utf8/UTF-8 doesn't matter?
>
>
>> but TBH, I doubt it's worth worrying about.
>
> But couldn't there be an issue, if for example the client's locale and
> the server's locale aren't exactly the same? I'm thinking maybe the
> client library has to perform unneeded translation of the stream of data
> to/from the database?
--
Adrian Klaver
[email protected]