public inbox for [email protected]
Re: LOCALE C.UTF-8 on EDB Windows v17 server (8+ messages / 4 participants)

* Re: LOCALE C.UTF-8 on EDB Windows v17 server
From: Dominique Devienne <[email protected]> @ 2025-06-05 08:53 UTC
To: Laurenz Albe <[email protected]>; +Cc: pgsql-general

On Thu, Jun 5, 2025 at 3:01 AM Laurenz Albe <[email protected]> wrote:
> On Wed, 2025-06-04 at 14:23 +0200, Dominique Devienne wrote:
> > The command I'm using (from a libpq trace) is:
> >
> > create database "dd_v168" encoding 'UTF8' locale 'C.UTF-8'
> > locale_provider 'builtin' template template0
> >
> > On Windows, I'm getting
> >
> > 2025-06-04 14:07:41.227419 B 155 ErrorResponse S "ERROR" V "ERROR" C
> > "42809" M "invalid LC_COLLATE locale name: "C.UTF-8"" H "If the locale
> > name is specific to ICU, use ICU_LOCALE." F "dbcommands.c" L "1057" R
> > "createdb" \x00
>
> Pilot error. If you use "LOCALE_PROVIDER builtin", you have to specify
> BUILTIN_LOCALE too:
>
> CREATE DATABASE b
> TEMPLATE template0
> LOCALE_PROVIDER builtin
> BUILTIN_LOCALE 'C.UTF-8'
> /* used for aspects other than collation and character type */
> LOCALE 'C';

Thanks Laurenz. Indeed, Using LOCALE vs BUILTIN_LOCALE matters.

On Linux, no error unlike on Windows (still inconsistent there IMHO),
but the result is slightly different for datcollate and datctype (C vs en_US),
while the same for datlocprovider and datlocale, what I looked at.

Thus I kinda persist that there *is* a portability issue here.

Also, note what the doc says:

If locale_provider is builtin, then locale or builtin_locale must be
specified and set to either C or C.UTF-8.

It clearly says "locale or builtin_locale", emphasis on the OR.

So two issues here.
1) the doc is wrong or misleading on this.
2) the same command works on Linux, but not Windows.

FWIW.
--DD

C:\Users\ddevienne>psql service=pau17
psql (17.4, server 17.5)

ddevienne=> select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 17.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-26), 64-bit
(1 row)

ddevienne=> create database "dd_v168" encoding 'UTF8' locale 'C.UTF-8'
ddevienne-> locale_provider 'builtin' template template0;
CREATE DATABASE
ddevienne=> select datlocprovider, datlocale, datcollate, datctype from pg_database where datname = 'dd_v168';
 datlocprovider | datlocale | datcollate | datctype
----------------+-----------+------------+----------
 b              | C.UTF-8   | C.UTF-8    | C.UTF-8
(1 row)

ddevienne=> create database "dd_v168b" encoding 'UTF8' builtin_locale 'C.UTF-8'
ddevienne-> locale_provider 'builtin' template template0;
CREATE DATABASE
ddevienne=> select datlocprovider, datlocale, datcollate, datctype from pg_database where datname = 'dd_v168b';
 datlocprovider | datlocale | datcollate  |  datctype
----------------+-----------+-------------+-------------
 b              | C.UTF-8   | en_US.UTF-8 | en_US.UTF-8
(1 row)

* Re: LOCALE C.UTF-8 on EDB Windows v17 server
From: Laurenz Albe <[email protected]> @ 2025-06-05 11:40 UTC
To: Dominique Devienne <[email protected]>; +Cc: pgsql-general

On Thu, 2025-06-05 at 10:53 +0200, Dominique Devienne wrote:
> Thanks Laurenz. Indeed, Using LOCALE vs BUILTIN_LOCALE matters.
>
> On Linux, no error unlike on Windows (still inconsistent there IMHO),
> but the result is slightly different for datcollate and datctype (C vs en_US),
> while the same for datlocprovider and datlocale, what I looked at.
>
> Thus I kinda persist that there *is* a portability issue here.

Perhaps, if omitting BUILTIN_LOCALE actually fails on Windows
(I cannot test it, no Windows nearby).

> Also, note what the doc says:
>
> If locale_provider is builtin, then locale or builtin_locale must be
> specified and set to either C or C.UTF-8.
>
> It clearly says "locale or builtin_locale", emphasis on the OR.

You are right, and that's how it works on Linux.
BUILTIN_LOCALE is not required.

> So two issues here.
> 1) the doc is wrong or misleading on this.

Perhaps the problem is in the implementation, not the documentation.

> 2) the same command works on Linux, but not Windows.

Unfortunately I am not in a position to get to the bottom of that.
In principle, it is acceptable for commands to fail on Windows and
work elsewhere, if operating system things like collations are involved.
But I agree that the "builtin" locale provider should work the same
everywhere.

Yours,
Laurenz Albe

* Re: LOCALE C.UTF-8 on EDB Windows v17 server
From: Dominique Devienne <[email protected]> @ 2025-06-05 11:54 UTC
To: Laurenz Albe <[email protected]>; +Cc: pgsql-general

On Thu, Jun 5, 2025 at 1:40 PM Laurenz Albe <[email protected]> wrote:
> On Thu, 2025-06-05 at 10:53 +0200, Dominique Devienne wrote:
> > It clearly says "locale or builtin_locale", emphasis on the OR.
>
> You are right, and that's how it works on Linux.
> BUILTIN_LOCALE is not required.

Still, required or not, they yield different results
(for datcollate and datctype), which is hardly expected.

* Re: LOCALE C.UTF-8 on EDB Windows v17 server
From: Daniel Verite <[email protected]> @ 2025-06-05 12:40 UTC
To: Dominique Devienne <[email protected]>; +Cc: Laurenz Albe <[email protected]>; pgsql-general

Dominique Devienne wrote:

> On Linux, no error unlike on Windows (still inconsistent there IMHO),
> but the result is slightly different for datcollate and datctype (C vs en_US),
> while the same for datlocprovider and datlocale, what I looked at.
>
> Thus I kinda persist that there *is* a portability issue here.

"datcollate" and "datctype" refer to operating system locale names.

locale 'C.UTF-8' or lc_collate 'C.UTF-8' lc_ctype 'C.UTF-8'
cannot work on Windows because Windows does not have a locale
named C.UTF-8, whereas a Linux system does (well at least recent
Linuxes. Some old Linuxes don't).

What you are seeing is the effect of OS locales not being portable
across systems. That's confusing but not a Postgres bug.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
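
Editor's sketch: the point that OS locale names differ between systems can be probed from any program that links against the C library. The snippet below is only an illustration, not what Postgres does internally; `os_has_locale` is a hypothetical helper name, and it reports on the C runtime that the Python interpreter uses, which may not be the one a given Postgres server build uses.

```python
import locale


def os_has_locale(name: str) -> bool:
    """Return True if the C library accepts `name` for LC_COLLATE.

    Hypothetical helper for illustration. The previous setting is
    restored afterwards, so the probe has no lasting effect.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        return True
    except locale.Error:
        # setlocale() rejected the name: this OS has no such locale.
        return False
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


# The answers for the last two names depend on the machine this runs on.
for candidate in ("C", "C.UTF-8", "en_US.UTF-8"):
    print(candidate, os_has_locale(candidate))
```

Only "C" is guaranteed by the C standard to be accepted everywhere; the other results vary by operating system and installed locales, which is the non-portability described above.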

* Re: LOCALE C.UTF-8 on EDB Windows v17 server
From: Dominique Devienne <[email protected]> @ 2025-06-05 13:07 UTC
To: Daniel Verite <[email protected]>; +Cc: Laurenz Albe <[email protected]>; pgsql-general

On Thu, Jun 5, 2025 at 2:40 PM Daniel Verite <[email protected]> wrote:
> Dominique Devienne wrote:
> > On Linux, no error unlike on Windows (still inconsistent there IMHO),
> > but the result is slightly different for datcollate and datctype (C vs en_US),
> > while the same for datlocprovider and datlocale, what I looked at.
> >
> > Thus I kinda persist that there *is* a portability issue here.
>
> "datcollate" and "datctype" refer to operating system locale names.
>
> locale 'C.UTF-8' or lc_collate 'C.UTF-8' lc_ctype 'C.UTF-8'
> cannot work on Windows because Windows does not have a locale
> named C.UTF-8, whereas a Linux system does (well at least recent
> Linuxes. Some old Linuxes don't).

But isn't the point of the new-in-v17 builtin provider to be
system independent???

> What you are seeing is the effect of OS locales not being portable
> across systems. That's confusing but not a Postgres bug.

Thus builtin SHOULD be portable IMHO. --DD

* Re: LOCALE C.UTF-8 on EDB Windows v17 server
From: Daniel Verite <[email protected]> @ 2025-06-05 15:01 UTC
To: Dominique Devienne <[email protected]>; +Cc: Laurenz Albe <[email protected]>; pgsql-general

Dominique Devienne wrote:

> > locale 'C.UTF-8' or lc_collate 'C.UTF-8' lc_ctype 'C.UTF-8'
> > cannot work on Windows because Windows does not have a locale
> > named C.UTF-8, whereas a Linux system does (well at least recent
> > Linuxes. Some old Linuxes don't).
>
> But isn't the point of the new-in-v17 builtin provider to be system
> independent???

Yes, definitely. But suppose your database has an extension that
calls locale-dependent code, such as strxfrm() [1] for instance.

The linked MSVC doc says:

"The transformation is made using the locale's LC_COLLATE category
setting. For more information on LC_COLLATE, see setlocale. strxfrm
uses the current locale for its locale-dependent behavior"

But what will be the value in LC_COLLATE when this extension code is
running in a database using the builtin provider?
It's the value found in pg_database.datcollate that was specified
when creating the database with the lc_collate or locale option.

The builtin provider routines are used for code inside Postgres core,
but code outside its perimeter can still call libc functions that
depend on lc_collate and lc_ctype.

[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/strxfrm-wcsxfrm-strxfrm-l-wcsxfrm-...

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
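
Editor's sketch: the dependence of strxfrm() on the process-wide LC_COLLATE setting can be seen outside Postgres as well. The example below uses Python's `locale.strxfrm`, which wraps the C library transformation; `sort_under_lc_collate` is a hypothetical helper, and the snippet stands in for "code outside the Postgres core" only by analogy.

```python
import locale


def sort_under_lc_collate(words, locale_name):
    """Sort `words` using strxfrm() keys under the given LC_COLLATE.

    Hypothetical helper for illustration. The setting is process-wide,
    so it is restored before returning.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm() consults whatever LC_COLLATE is active right now.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


words = ["banana", "Apple", "cherry", "apple"]

# Under "C", ordering is by code point, so uppercase sorts before lowercase.
print(sort_under_lc_collate(words, "C"))
```

Passing a different locale name (if the OS has one installed) can produce a different order from the same call, which is why the value stored in datcollate can still matter to libc callers even when the builtin provider handles collation inside the core.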

* Re: LOCALE C.UTF-8 on EDB Windows v17 server
From: Jeff Davis <[email protected]> @ 2025-06-05 21:53 UTC
To: Dominique Devienne <[email protected]>; Daniel Verite <[email protected]>; +Cc: Laurenz Albe <[email protected]>; pgsql-general

On Thu, 2025-06-05 at 15:07 +0200, Dominique Devienne wrote:
> But isn't the point of the new-in-v17 builtin provider to be
> system independent???

Yes, a major part of the builtin provider is complete consistency
across platforms for the entire collation system -- anything affected
by the database default collation or a COLLATE clause, including
comparisons, casing behavior, pattern matching, etc.

New major versions of Postgres may update Unicode, but those updates
will never affect comparisons in the builtin C.UTF-8 locale; and will
only affect other behaviors (like casing) subject to the (rather
strict) Unicode stability policy[1].

Regarding datcollate and datctype: those affect the LC_COLLATE and
LC_CTYPE environment variables, and Postgres does a setlocale() upon a
new database connection. That only affects libc functions like
strcoll(), so it won't affect the builtin provider or ICU which don't
use strcoll().

You're right to ask why those matter at all, then. It's hard for me to
guarantee that datcollate/datctype won't affect some other part of the
system or an extension (I see that Daniel offered some more details).
I'd like to force LC_COLLATE=C and LC_CTYPE=C, and then there'd be no
question, but I won't promise when that will happen.

I'd suggest just forcing those to "C" in your database.

Regards,
Jeff Davis

[1] https://www.unicode.org/policies/stability_policy.html

* Re: LOCALE C.UTF-8 on EDB Windows v17 server
From: Jeff Davis <[email protected]> @ 2025-06-05 22:09 UTC
To: Dominique Devienne <[email protected]>; Laurenz Albe <[email protected]>; +Cc: pgsql-general

On Thu, 2025-06-05 at 10:53 +0200, Dominique Devienne wrote:
> If locale_provider is builtin, then locale or builtin_locale must be
> specified and set to either C or C.UTF-8.
>
> It clearly says "locale or builtin_locale", emphasis on the OR.
>
> So two issues here.
> 1) the doc is wrong or misleading on this.

The code in dbcommands.c:createdb():

    if (localeEl && localeEl->arg)
    {
        dbcollate = defGetString(localeEl);
        dbctype = defGetString(localeEl);
        dblocale = defGetString(localeEl);
    }
    if (builtinlocaleEl && builtinlocaleEl->arg)
        dblocale = defGetString(builtinlocaleEl);
    if (collateEl && collateEl->arg)
        dbcollate = defGetString(collateEl);
    if (ctypeEl && ctypeEl->arg)
        dbctype = defGetString(ctypeEl);
    if (iculocaleEl && iculocaleEl->arg)
        dblocale = defGetString(iculocaleEl);

So LC_COLLATE, LC_CTYPE, and BUILTIN_LOCALE all fall back to LOCALE if
they aren't set.

On Windows, it fails because LC_COLLATE and LC_CTYPE fall back to
LOCALE, which is "C.UTF-8", which doesn't exist.

(I know the CREATE DATABASE command is confusing, but it grew
historically and we needed to support previously-working commands.)

If you have specific doc suggestions to clarify this, please let me
know.

> 2) the same command works on Linux, but not Windows.

As long as we accept the libc provider, or allow the user to set
LC_COLLATE/LC_CTYPE, then there will be some commands that succeed on
some platforms and fail on others. Even with ICU, there may be
versions that accept a locale and versions that don't.

I'd like to make libc less "special" so that users who don't want to
use it aren't confronted with errors about things that don't matter to
them.
I welcome suggestions that can move us closer to that goal without
breaking previously-working commands.

Regards,
Jeff Davis
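
Editor's sketch: the fallback order in the createdb() excerpt quoted in this message can be modelled in a few lines to see why the two commands in the thread end up with different catalog values. This is a simplified model of the quoted excerpt only; `resolve_locale_options` and `Resolved` are hypothetical names, and `None` here means "not set by this command" (whatever Postgres fills in afterwards is outside the quoted code and is not modelled).

```python
from typing import NamedTuple, Optional


class Resolved(NamedTuple):
    datcollate: Optional[str]  # libc LC_COLLATE name
    datctype: Optional[str]    # libc LC_CTYPE name
    datlocale: Optional[str]   # builtin or ICU locale name


def resolve_locale_options(locale=None, builtin_locale=None,
                           lc_collate=None, lc_ctype=None,
                           icu_locale=None) -> Resolved:
    """Mirror the order of assignments in the quoted createdb() excerpt."""
    dbcollate = dbctype = dblocale = None
    if locale:
        # LOCALE seeds all three values.
        dbcollate = dbctype = dblocale = locale
    if builtin_locale:
        dblocale = builtin_locale
    if lc_collate:
        dbcollate = lc_collate
    if lc_ctype:
        dbctype = lc_ctype
    if icu_locale:
        dblocale = icu_locale
    return Resolved(dbcollate, dbctype, dblocale)


# The command from the thread that fails on Windows: the libc names
# become C.UTF-8, which Windows does not have.
print(resolve_locale_options(locale="C.UTF-8"))

# The form suggested earlier in the thread: libc names are plain C,
# while the builtin provider still gets C.UTF-8.
print(resolve_locale_options(locale="C", builtin_locale="C.UTF-8"))
```

In this model, only the second call keeps the OS-dependent name C.UTF-8 out of datcollate and datctype, which matches the suggestion in the thread to force those to "C".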