Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance
2+ messages / 2 participants
[nested] [flat]

* Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance
@ 2025-12-30 06:51 Tom Lane <[email protected]>
  2025-12-30 07:07 ` Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance Rahman Duran <[email protected]>
  0 siblings, 1 reply; 2+ messages in thread

From: Tom Lane @ 2025-12-30 06:51 UTC (permalink / raw)
  To: Rahman Duran <[email protected]>; +Cc: [email protected]

Rahman Duran <[email protected]> writes:
> After the release of the PostgreSQL 18 version, I am trying non
> determisinstic collation with LIKE pattern matching support. I am mostly
> searching with "LIKE %search_term%" on about 10 text columns. As I use
> wildcard prefix and suffix, I can't use btree index anyways. So I decided
> to try non deterministic collation support so I can simplify application
> code. I am testing this on a table with ~60K rows. With this row count and
> search pattern, non deterministic collation seems at least 10 times slower
> than LOWER LIKE and ILIKE.

This is not terribly surprising: non-deterministic collations disable
a lot of lower-level optimizations in pattern matching.  I think the
particular one that is biting you is probably this bit in
src/backend/utils/adt/like_match.c:

             * ... With a nondeterministic collation, we can't
             * rely on the first bytes being equal, so we have to recurse in
             * any case.

or possibly the later bit

             * For nondeterministic locales, we find the next substring of the
             * pattern that does not contain wildcards and try to find a
             * matching substring in the text.  Crucially, we cannot do this
             * character by character, as in the normal case, but must do it
             * substring by substring, partitioned by the wildcard characters.
             * (This is per SQL standard.)

The fundamental problem here is not wanting to make assumptions about
which character strings a non-deterministic collation will consider
equal to which other character strings.  If you have concrete ideas
about how to improve that, let's hear them.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 2+ messages in thread

* Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance
  2025-12-30 06:51 Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance Tom Lane <[email protected]>
@ 2025-12-30 07:07 ` Rahman Duran <[email protected]>
  0 siblings, 0 replies; 2+ messages in thread

From: Rahman Duran @ 2025-12-30 07:07 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]

Tom Lane <[email protected]>, 30 Ara 2025 Sal, 09:51 tarihinde şunu yazdı:
>The fundamental problem here is not wanting to make assumptions about
>which character strings a non-deterministic collation will consider
>equal to which other character strings.  If you have concrete ideas
>about how to improve that, let's hear them.

Thanks for the explanations. I am just a user, not a DB developer :) So no
ideas I am afraid. I was just surprised that running the lower() function
on every column of every row is way faster than using nondeterministic
collation. I will test GIN index with trigram to see if it helps. If not I
will leave the code as is with lower() functions.

Regards,
Rahman Duran


^ permalink  raw  reply  [nested|flat] 2+ messages in thread

end of thread, other threads:[~2025-12-30 07:07 UTC | newest]

Thread overview: 2+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-12-30 06:51 Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance Tom Lane <[email protected]>
2025-12-30 07:07 ` Rahman Duran <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox