From: Tom Lane
To: Rahman Duran
cc: pgsql-general@lists.postgresql.org
Subject: Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance
Comments: In-reply-to Rahman Duran message dated "Tue, 30 Dec 2025 09:30:27 +0300"
Date: Tue, 30 Dec 2025 01:51:40 -0500
Message-ID: <3287814.1767077500@sss.pgh.pa.us>

Rahman Duran writes:
> After the release of the PostgreSQL 18 version, I am trying non
> deterministic collation with LIKE pattern matching support. I am mostly
> searching with "LIKE %search_term%" on about 10 text columns.
> As I use wildcard prefix and suffix, I can't use btree index anyways.
> So I decided to try non deterministic collation support so I can simplify
> application code. I am testing this on a table with ~60K rows. With this
> row count and search pattern, non deterministic collation seems at least
> 10 times slower than LOWER LIKE and ILIKE.

This is not terribly surprising: non-deterministic collations
disable a lot of lower-level optimizations in pattern matching.
I think the particular one that is biting you is probably this
bit in src/backend/utils/adt/like_match.c:

     * ... With a nondeterministic collation, we can't
     * rely on the first bytes being equal, so we have to recurse in
     * any case.

or possibly the later bit

     * For nondeterministic locales, we find the next substring of the
     * pattern that does not contain wildcards and try to find a
     * matching substring in the text.  Crucially, we cannot do this
     * character by character, as in the normal case, but must do it
     * substring by substring, partitioned by the wildcard characters.
     * (This is per SQL standard.)

The fundamental problem here is not wanting to make assumptions
about which character strings a non-deterministic collation will
consider equal to which other character strings.  If you have
concrete ideas about how to improve that, let's hear them.

			regards, tom lane
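
Illustration of the "substring by substring" point above. This is a
minimal Python sketch, not the code in like_match.c. It assumes a
stand-in equality rule (case-insensitive plus Unicode-normalization-
insensitive) in place of a real nondeterministic collation; the names
collation_equal and like_contains are hypothetical. It shows why a
"%needle%" search cannot compare character by character: two strings the
collation treats as equal may differ in length and in their first
character, so every candidate substring of the text has to be tried.

```python
import unicodedata


def collation_equal(a: str, b: str) -> bool:
    """Assumed stand-in for a nondeterministic collation's equality test.

    Two strings are "equal" if they match after Unicode decomposition
    and case folding. A real collation is defined by the provider (for
    example ICU); this rule is only for illustration.
    """
    def key(s: str) -> str:
        return unicodedata.normalize("NFKD", s).casefold()

    return key(a) == key(b)


def like_contains(text: str, needle: str) -> bool:
    """Naive equivalent of: text LIKE '%' || needle || '%'.

    Equal strings need not have equal lengths, so the length of the
    matching substring is unknown in advance. Every (start, end) pair
    is tested with a whole-substring comparison.
    """
    n = len(text)
    for start in range(n + 1):
        for end in range(start, n + 1):
            if collation_equal(text[start:end], needle):
                return True
    return False


# "CAFE" + combining acute accent: five code points for the word.
text = "CAFE\u0301 society"
# Precomposed e-acute: four code points for the same word.
needle = "caf\u00e9"

found_by_collation = like_contains(text, needle)
found_by_bytes = needle.lower() in text.lower()
```

With these inputs the collation-aware search finds a match (the five
code point substring equals the four code point needle under the assumed
rule), while the plain lowercase substring test does not. The nested
loops over start and end positions are the kind of extra work that a
deterministic comparison, such as LOWER(...) LIKE, avoids.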