Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ufFIr-004RNb-Qd for pgsql-general@arkaria.postgresql.org; Fri, 25 Jul 2025 10:05:27 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1ufFIq-00Gpmo-Nq for pgsql-general@arkaria.postgresql.org; Fri, 25 Jul 2025 10:05:25 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ufFIq-00Gpmg-96 for pgsql-general@lists.postgresql.org; Fri, 25 Jul 2025 10:05:24 +0000 Received: from shark1.inbox.lv ([194.152.32.81]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1ufFIk-000l5H-1k for pgsql-general@lists.postgresql.org; Fri, 25 Jul 2025 10:05:21 +0000 Received: by shark1.inbox.lv (Postfix, from userid 2004) id 4bpNly0wXmz2d6Kq; Fri, 25 Jul 2025 13:05:18 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=inbox.lv; s=p20220324; t=1753437918; x=1753439718; bh=L3wf2cCgJKu1+ZP3bXHY+h7tGgLFSyKWzb/yV9DxOfU=; h=Date:From:To:Message-ID:Subject:Content-Type:X-ESPOL:From:Date:To: Cc:Message-ID:Subject:Reply-To; b=QVJnp2mcOR2eyOEKNvTBtxTNZ9PIQ3XzuqlzMvXWTdKmAytwy0UAFmL3dnHqB/B1G mxOKJTqbXiSbP8dQWhoAydLgIrv8wPEH6i5cYh+qx99fiMbKB2sKj3/BEDoasGJkSR V8+OJ+2dYthDcHLEc6+o3a9lF3bdmHtEWQonkap8= Received: from shark1.inbox.lv (localhost [127.0.0.1]) by shark1-in.inbox.lv (Postfix) with ESMTP id 4bpNly0Ht9z2d6KT for ; Fri, 25 Jul 2025 13:05:18 +0300 (EEST) Received: from w8 (w8 [127.0.0.1]) by shark1-in.inbox.lv (Postfix) with ESMTP id 4bpNlx5n1cz2d6Kj for ; Fri, 25 Jul 2025 13:05:17 +0300 (EEST) Date: Fri, 25 Jul 2025 13:05:17 +0300 (EEST) From: =?UTF-8?Q?J=C4=81nisE?= To: "pgsql-general@lists.postgresql.org" Message-ID: <1814742357.238678.1753437917809@w8> Subject: Sorting by respecting diacritics/accents MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_238677_595599243.1753437917809" X-ESPOL: EZeEHyhdwHFay7OzV+JwnoPpx9+2SFgnuTn7y7ku63FFwMCyvqITbA3mFoXzb3vZf2LW/d/fagNU587qY0AvbA== List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk ------=_Part_238677_595599243.1753437917809 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hello! I seem to not be able to get PostgreSQL to sort rows by a string col= umn respecting the diacritics. I read [1] that it's possible to define a cu= stom collation having collation strength "ks" set to "level2", which would = mean that it's accent-sensitive. However, when I try to actually sort using= that collation, the order seem to be accent-insensitive. For example: CREA= TE TABLE test (string text); INSERT INTO test VALUES ('bar'), ('bat'), ('b= =C3=A4r'); CREATE COLLATION "und1" (provider =3D icu, deterministic =3D fal= se, locale =3D 'und-u-ks-level1'); CREATE COLLATION "und2" (provider =3D ic= u, deterministic =3D false, locale =3D 'und-u-ks-level2'); CREATE COLLATION= "und3" (provider =3D icu, deterministic =3D false, locale =3D 'und-u-ks-le= vel3'); SELECT * FROM test ORDER BY string collate "und1"; SELECT * FROM te= st ORDER BY string collate "und2"; SELECT * FROM test ORDER BY string colla= te "und3"; All three collations give me the same order: bar < b=C3=A4r < ba= t, although an accent-sensitive order would be bar < bat < b=C3=A4r This do= es lose "b=C3=A4r", meaning that those strength levels do have some kind of= an effect on "DISTINCT": SELECT DISTINCT string COLLATE "und1" FROM test; = But it's not working on "ORDER BY". Do I misunderstand the collation capabi= lities? Is there a way to actually get an accent-sensitive order? Also, is = there a way to see what options are there for the default built-in collatio= ns? I don't see, for example, the used "ks" level in the "pg_collation" tab= le data. Best regards, Janis [1]=C2=A0 https://www.postgresql.org/docs/curr= ent/collation.html#ICU-COLLATION-COMPARISON-LEVELS ------=_Part_238677_595599243.1753437917809 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hello!
=20
I seem to not be able to get PostgreSQL to sort rows by a string col= umn respecting the diacritics.
=20
I read [1] that it's possible to define a custom collation having co= llation strength "ks" set to "level2", which would mean that it's accent-se= nsitive.
=20
However, when I try to actually sort using that collation, the order= seem to be accent-insensitive.
=20
For example:
=20
CREATE TABLE test (string text);
INSERT INTO test VALUES ('bar'), ('bat'), ('b=C3=A4r');
CREATE COLLATION "und1" (provider =3D icu, deterministic =3D false, = locale =3D 'und-u-ks-level1');
CREATE COLLATION "und2" (provider =3D icu, deterministic =3D false, = locale =3D 'und-u-ks-level2');
CREATE COLLATION "und3" (provider =3D icu, deterministic =3D false, = locale =3D 'und-u-ks-level3');
SELECT * FROM test ORDER BY string collate "und1";
SELECT * FROM test ORDER BY string collate "und2";
SELECT * FROM test ORDER BY string collate "und3";
=20
All three collations give me the same order: bar < b=C3=A4r < = bat, although an accent-sensitive order would be bar < bat < b=C3=A4r
=20
This does lose "b=C3=A4r", meaning that those strength levels do hav= e some kind of an effect on "DISTINCT":
SELECT DISTINCT string COLLATE "und1" FROM test;
=20
But it's not working on "ORDER BY".
=20
Do I misunderstand the collation capabilities? Is there a way to act= ually get an accent-sensitive order?
=20
Also, is there a way to see what options are there for the default b= uilt-in collations? I don't see, for example, the used "ks" level in the "p= g_collation" table data.
=20
Best regards,
Janis
=20
[1]  htt= ps://www.postgresql.org/docs/current/collation.html#ICU-COLLATION-COMPARISO= N-LEVELS ------=_Part_238677_595599243.1753437917809--