From: Daniel Verite <[email protected]>
To: [email protected]
Subject: Re: Supporting non-deterministic collations with tailoring rules.
Date: Sun, 15 Mar 2026 19:52:24 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

[resent for the list]

       Peter Eisentraut wrote:

> > To me, the most plausible fix on the Postgres side would be to pass
> > UCOL_DEFAULT instead of UCOL_DEFAULT_STRENGTH as in the attached,
> > which lets the user specify the strength in the rule, as the OP did in [1].
> 
> With this change, I don't see that the bug reported in ICU-22456 is 
> fixed.  See attached my test case.
> 
> What change of behavior are you expecting from your patch?  Should there 
> be test cases?

Indeed, it does not fix the behavior reported in ICU-22456
(collation properties being "reset" when rules are added).
What it does fix is that a strength specified in the rule itself
is now taken into account, instead of the strength being
forced to tertiary.

PFA a new patch with a specific test case added.

With regard to reports on pgsql-bugs, it fixes #18771
and the second part of #19425.
It does not fix #19045, which looks the same as ICU-22456.
It also does not fix the first part of #19425 (which looks the
same as #19045). We cannot fix that in Postgres, AFAIU.
However, adding the strength or other collation properties to
the rules can be used as a workaround for the ICU-22456
issue, provided the attached change is made.


Best regards,
-- 
Daniel Vérité 
https://postgresql.verite.pro/


Attachments:

  [text/x-patch] v2-rules-ucol-default-strength.diff (2.1K, 2-v2-rules-ucol-default-strength.diff)
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 352b4c3885f..5ad05fcd016 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -587,7 +587,7 @@ make_icu_collator(const char *iculocstr, const char *icurules)
 
 		status = U_ZERO_ERROR;
 		collator_all_rules = ucol_openRules(all_rules, u_strlen(all_rules),
-											UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH,
+											UCOL_DEFAULT, UCOL_DEFAULT,
 											NULL, &status);
 		if (U_FAILURE(status))
 		{
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 1325e123877..e4eff391946 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1297,6 +1297,16 @@ DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
 NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
+-- strength specified in the rules
+CREATE COLLATION strength_in_rule ( provider = icu, locale='und',
+ deterministic=false, rules='[strength 1]');
+SELECT 'a'='à' COLLATE strength_in_rule; -- true because of the rule
+ ?column? 
+----------
+ t
+(1 row)
+
+--
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
 NOTICE:  using standard form "und" for ICU locale ""
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index b6c54503d21..c0613b238b8 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -513,6 +513,12 @@ DROP TABLE test7;
 
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
 
+-- strength specified in the rules
+CREATE COLLATION strength_in_rule ( provider = icu, locale='und',
+ deterministic=false, rules='[strength 1]');
+SELECT 'a'='à' COLLATE strength_in_rule; -- true because of the rule
+
+--
 
 -- nondeterministic collations
 