From: Daniel Verite <[email protected]>
To: Dominique Devienne <[email protected]>
Cc: Laurenz Albe <[email protected]>
Cc: [email protected]
Subject: Re: LOCALE C.UTF-8 on EDB Windows v17 server
Date: Thu, 05 Jun 2025 22:57:24 +0200
Message-ID: <[email protected]>
In-Reply-To: <CAFCRh-8KoLDppV0K21rLZc=dR9_hgvzgoYf=g5PszbARKZ9RjQ@mail.gmail.com>
Dominique Devienne wrote:
> So you're saying datcollate and datctype from pg_database are
> irrelevant to PostgreSQL itself, and only extensions might be affected?
Almost. One exception that still exists in v18, as far as I can see [1],
is the default full text search parser, which still uses libc functions
such as iswdigit(), iswpunct() and iswspace() that depend on LC_CTYPE.
So tsvector contents can differ between OSes even in a database using
the builtin provider, unless LC_CTYPE=C is used. But then the parsing is
suboptimal, since the parser does not recognize Unicode punctuation or
space characters as such.
Personally, I would still take care to set LC_CTYPE to a reasonable UTF-8
locale with v17 or v18.
[1]
https://doxygen.postgresql.org/wparser__def_8c.html#a420ea398a8a11db92412a2af7bf45e40
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/