public inbox for [email protected]  
ICU Collations and Collation Updates
3+ messages / 3 participants

* ICU Collations and Collation Updates

From: Thomas Michael Engelke @ 2025-04-14 08:28 UTC
  To: [email protected]

Good morning,

long time reader, first time writer.

At my current workplace, my colleagues were using libc collations before
I arrived. With libc collations, they ran into the collation update
problem after a SLES update (15.4 to 15.5): a warning about a collation
version mismatch between the database and the operating system
(paraphrased; I don't have the English message at hand).

As an easy solution, I suggested switching to ICU collations. While
documenting the problem for older systems, I realized that I did not
know enough about it to explain why ICU collations would solve it.

After reading https://www.postgresql.org/docs/17/collation.html, this is
how I understand it:

When initdb creates a cluster, the collations available in the OS are
copied into the database as database objects, which can be listed using

select * from pg_collation;

Now, a collation update that comes with an OS update will change the
collations available at the OS level, but not the collations that the
database uses.

Is my understanding correct, then, that the database collations never
change unless a manual intervention reinitialises the collations and
reindexes the database (or the affected indexes)? How does that process
compare to other RDBMSs?

Are regular collation updates considered unnecessary for long-running
database installations? Or do any of you have maintenance workflows
that incorporate regular collation updates?

Thanks,

Thomas

* Re: ICU Collations and Collation Updates

From: Laurenz Albe @ 2025-04-14 11:05 UTC
  To: Thomas Michael Engelke <[email protected]>; [email protected]

On Mon, 2025-04-14 at 08:28 +0000, Thomas Michael Engelke wrote:
> At my current workplace, my colleagues were using libc collations before
> I arrived. With libc collations, they ran into the collation update
> problem after a SLES update (15.4 to 15.5): a warning about a collation
> version mismatch between the database and the operating system
> (paraphrased; I don't have the English message at hand).
> 
> As an easy solution, I suggested switching to ICU collations. While
> documenting the problem for older systems, I realized that I did not
> know enough about it to explain why ICU collations would solve it.
> 
> After reading https://www.postgresql.org/docs/17/collation.html, this is
> how I understand it:
> 
> When initdb creates a cluster, the collations available in the OS are
> copied into the database as database objects, which can be listed using
> 
> select * from pg_collation;
> 
> Now, a collation update that comes with an OS update will change the
> collations available at the OS level, but not the collations that the
> database uses.
> 
> Is my understanding correct, then, that the database collations never
> change unless a manual intervention reinitialises the collations and
> reindexes the database (or the affected indexes)? How does that process
> compare to other RDBMSs?
> 
> Are regular collation updates considered unnecessary for long-running
> database installations? Or do any of you have maintenance workflows
> that incorporate regular collation updates?

PostgreSQL just copies the names and versions of the collations to the
catalog.  The actual collating is done by the C or ICU library.
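
A toy Python model of this point, under made-up assumptions: the names
`CollationLibrary`, `CatalogEntry` and `collate`, the locale name "en_US",
the versions "1.0"/"2.0" and their sort rules are all hypothetical. The
catalog entry holds only a name and a recorded version, so swapping the
installed library changes the ordering while the catalog row stays the same.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CollationLibrary:
    """Hypothetical stand-in for the C library or ICU: a version string
    plus the sort rules that ship with that version."""
    version: str
    sort_key: Callable[[str], object]


@dataclass(frozen=True)
class CatalogEntry:
    """Rough analogue of a pg_collation row: a name and the library
    version seen at creation time, but no sort rules of its own."""
    name: str
    recorded_version: str


def collate(entry: CatalogEntry,
            installed: Dict[str, CollationLibrary],
            words: List[str]) -> List[str]:
    # The catalog entry is only used to find the library by name; the
    # ordering comes from whichever library version is installed now.
    lib = installed[entry.name]
    return sorted(words, key=lib.sort_key)


# Two made-up library versions whose rules differ in how case is handled.
OLD = CollationLibrary("1.0", lambda s: s)               # code point order
NEW = CollationLibrary("2.0", lambda s: (s.lower(), s))  # case-insensitive first

entry = CatalogEntry("en_US", recorded_version="1.0")
words = ["b", "B", "a", "A"]

before = collate(entry, {"en_US": OLD}, words)
after = collate(entry, {"en_US": NEW}, words)  # same catalog row, new library
print(before)  # ['A', 'B', 'a', 'b']
print(after)   # ['A', 'a', 'B', 'b']
```

Nothing in `entry` changed between the two calls; only the installed
library did, which is how an OS or ICU upgrade can change sort order
underneath an unchanged catalog.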

When you update the C library or ICU library and the version changes,
you get warned by PostgreSQL and have to rebuild indexes.
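
Why a version change forces an index rebuild can be sketched with a toy
sorted list standing in for a B-tree. This is a hypothetical model, not
PostgreSQL code: `ToyIndex`, `version_ok`, the `RULES` table and its
version strings are invented, and the warning text is not PostgreSQL's.
The index is sorted under the old rules, but lookups compare with the new
ones, so binary search misses a value that is present until the index is
rebuilt.

```python
import bisect
from typing import List

# Hypothetical sort rules shipped by two versions of the same collation.
RULES = {
    "1.0": lambda s: s,               # code point order
    "2.0": lambda s: (s.lower(), s),  # case-insensitive first
}


class ToyIndex:
    """Stand-in for a B-tree index: values kept sorted under the rules of
    the library version that was installed when the index was built."""

    def __init__(self, values: List[str], version: str):
        self.built_with = version
        self.entries = sorted(values, key=RULES[version])

    def lookup(self, value: str, current_version: str) -> bool:
        # A search compares using the rules of the library installed *now*;
        # binary search is only correct if entries are sorted by those rules.
        key = RULES[current_version]
        keys = [key(v) for v in self.entries]
        i = bisect.bisect_left(keys, key(value))
        return i < len(keys) and self.entries[i] == value


def version_ok(recorded: str, actual: str) -> bool:
    # Made-up warning text; PostgreSQL's real message is worded differently.
    if recorded != actual:
        print(f"WARNING: collation version mismatch "
              f"(recorded {recorded}, library provides {actual}); "
              f"rebuild dependent indexes")
        return False
    return True


index = ToyIndex(["a", "B", "c"], version="1.0")
installed = "2.0"                          # the library was upgraded

stale_hit = index.lookup("a", installed)   # "a" is present but missed
if not version_ok(index.built_with, installed):
    index = ToyIndex(index.entries, version=installed)  # the "REINDEX"
fresh_hit = index.lookup("a", installed)   # found after the rebuild
print(stale_hit, fresh_hit)                # False True
```

In PostgreSQL itself the rebuild is done with REINDEX; afterwards the
recorded version can be updated with ALTER COLLATION ... REFRESH VERSION,
or, for a database's default collation on PostgreSQL 15 and later, with
ALTER DATABASE ... REFRESH COLLATION VERSION.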

So the collations can change whenever you update the respective libraries.
You would have to build PostgreSQL yourself with a fixed version of ICU
that you never upgrade if you want to avoid the problem.

Or you start using the POSIX collation.
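
The POSIX collation (which behaves the same as the C collation) sorts by
byte value instead of consulting locale data, so library upgrades cannot
change its ordering. A minimal sketch, assuming a UTF-8 database encoding,
where byte order matches Unicode code point order; `posix_sort` is a
hypothetical helper, not a PostgreSQL function.

```python
from typing import List


def posix_sort(words: List[str]) -> List[str]:
    # Order strings by their UTF-8 bytes, as a C/POSIX collation would in a
    # UTF-8 database. No locale tables are involved, so the result does not
    # depend on which libc or ICU version is installed.
    return sorted(words, key=lambda s: s.encode("utf-8"))


words = ["apple", "Zebra", "Äpfel", "zebra"]
print(posix_sort(words))  # ['Zebra', 'apple', 'zebra', 'Äpfel']
```

The price is linguistic: uppercase sorts before lowercase, and "Äpfel"
lands after every ASCII word instead of next to "apple".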

Yours,
Laurenz Albe

* Re: ICU Collations and Collation Updates

From: Tom Lane @ 2025-04-14 14:36 UTC
  To: Laurenz Albe <[email protected]>
  Cc: Thomas Michael Engelke <[email protected]>; [email protected]

Laurenz Albe <[email protected]> writes:
> On Mon, 2025-04-14 at 08:28 +0000, Thomas Michael Engelke wrote:
>> Is my understanding correct, then, that the database collations never
>> change unless a manual intervention reinitialises the collations and
>> reindexes the database (or the affected indexes)? How does that process
>> compare to other RDBMSs?

> When you update the C library or ICU library and the version changes,
> you get warned by PostgreSQL and have to rebuild indexes.
> So the collations can change whenever you update the respective libraries.
> You would have to build PostgreSQL yourself with a fixed version of ICU
> that you never upgrade if you want to avoid the problem.

Yeah.  AIUI there are two things that ICU does better than libc here:

1. ICU has a fairly well-defined scheme for identifying collation
versions, glibc not so much.  So the collation-changed warnings that
Laurenz mentions are a lot more trustworthy for ICU collations.

2. It's at least *possible* to use your own fixed-version ICU
library if you're desperate enough.  I don't think that would work
too well for libc; you're stuck with what the platform provides.

			regards, tom lane
