From: jian he <[email protected]>
To: Peter Eisentraut <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Jacob Champion <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Daniel Verite <[email protected]>
Cc: Paul A Jungwirth <[email protected]>
Subject: Re: Support LIKE with nondeterministic collations
Date: Fri, 15 Nov 2024 12:26:24 +0800
Message-ID: <CACJufxFeOuBbkHfp=0-0rwamydjYY4ky1A+CPr6s3WUABC9_Rg@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<CA+renyWd-_sAj3YqBRaQVOOMr5uQoeBcA3tjCSyQFzvnbGrMYA@mail.gmail.com>
<[email protected]>
<[email protected]>
<CAOYmi+nqr4xCe9-g4BAupnu2rZmvLy1T3qq3ejOUWOCsoJ4ZdA@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
On Tue, Nov 12, 2024 at 3:45 PM Peter Eisentraut <[email protected]> wrote:
>
> On 11.11.24 14:25, Heikki Linnakangas wrote:
> > Sadly the algorithm is O(n^2) with non-deterministic collations. Is there
> > any way this could be optimized? We make no claims on how expensive any
> > functions or operators are, so I suppose a slow implementation is
> > nevertheless better than throwing an error.
>
> Yeah, maybe someone comes up with new ideas in the future.
>
    /*
     * Now build a substring of the text and try to match it against
     * the subpattern. t is the start of the text, t1 is one past the
     * last byte. We start with a zero-length string.
     */
    t1 = t;
    t1len = tlen;
    for (;;)
    {
        int cmp;

        CHECK_FOR_INTERRUPTS();

        cmp = pg_strncoll(subpat, subpatlen, t, (t1 - t), locale);

For this query:
select '.foo.' LIKE '_oo' COLLATE ign_punct;
the first four argument values (subpat, subpatlen, t, t1 - t) of each
pg_strncoll call in the loop are:
oo 2 foo. 0
oo 2 foo. 1
oo 2 foo. 2
oo 2 foo. 3
oo 2 foo. 4
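
To make the trace above concrete, here is a minimal, self-contained sketch of the
prefix-growing loop. It is not the PostgreSQL code: toy_strncoll_ign_punct() is a
hypothetical stand-in for pg_strncoll() that only skips ASCII punctuation, and
find_matching_prefix() is a hypothetical name for the loop. It advances one byte at
a time and stops at the first equal prefix, which is a simplification. Each
comparison is linear in the prefix length and there is one comparison per prefix,
which is where the O(n^2) behavior comes from.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for pg_strncoll() with a punctuation-ignoring
 * collation: compares two byte ranges, skipping ASCII punctuation.
 * Returns <0, 0 or >0 like strcmp().
 */
static int
toy_strncoll_ign_punct(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t i = 0, j = 0;

    assert(a != NULL && b != NULL);
    for (;;)
    {
        while (i < alen && ispunct((unsigned char) a[i]))
            i++;
        while (j < blen && ispunct((unsigned char) b[j]))
            j++;
        if (i == alen || j == blen)
            break;
        if (a[i] != b[j])
            return (unsigned char) a[i] < (unsigned char) b[j] ? -1 : 1;
        i++;
        j++;
    }
    if (i == alen && j == blen)
        return 0;
    return (i == alen) ? -1 : 1;
}

/*
 * Try every prefix of text, starting with the zero-length one, until a
 * prefix compares equal to subpat.  Returns the matching prefix length,
 * or -1 if none matches.  *ncalls receives the number of comparisons.
 * If trace is nonzero, print the four arguments of each comparison.
 */
static int
find_matching_prefix(const char *subpat, size_t subpatlen,
                     const char *text, size_t textlen,
                     int *ncalls, int trace)
{
    *ncalls = 0;
    for (size_t len = 0; len <= textlen; len++)
    {
        (*ncalls)++;
        if (trace)
            printf("%.*s %zu %.*s %zu\n",
                   (int) subpatlen, subpat, subpatlen,
                   (int) textlen, text, len);
        if (toy_strncoll_ign_punct(subpat, subpatlen, text, len) == 0)
            return (int) len;
    }
    return -1;
}
```

Calling find_matching_prefix("oo", 2, "foo.", 4, &n, 1) walks the same five prefix
lengths (0 through 4) as the trace, and no prefix matches.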
There seems to be room for a shortcut/optimization here:
if subpat doesn't contain a wildcard (percent sign or underscore),
could we make fewer pg_strncoll calls?
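
One possible reading of this idea, as a rough sketch only: scan the pattern once for
an unescaped percent sign or underscore, and if there is none, a single whole-string
comparison would be enough, since LIKE then acts like the equals operator. The helper
name like_pattern_has_wildcard() is hypothetical, and the sketch assumes backslash as
the escape character and an ASCII-compatible encoding so that a byte-wise scan is
safe. Whether this fits into the patch's loop is not something I have checked.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): returns true if the
 * LIKE pattern contains an unescaped '%' or '_'.  Assumes backslash is
 * the escape character and that the encoding is ASCII-compatible.
 */
static bool
like_pattern_has_wildcard(const char *pat, size_t patlen)
{
    for (size_t i = 0; i < patlen; i++)
    {
        if (pat[i] == '\\')
        {
            /* Skip the escaped character; it is matched literally. */
            i++;
            continue;
        }
        if (pat[i] == '%' || pat[i] == '_')
            return true;
    }
    return false;
}
```

A caller could test the pattern up front and fall back to one comparison of the
whole text against the whole pattern when the function returns false.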

Here is a minimal case that triggers the error within GenericMatchText.
I'm including it because there are no related tests:
create table t1(a text collate case_insensitive, b text collate "C");
insert into t1 values ('a','a');
select a like b from t1;

In section 9.7.1 (LIKE), we still don't say what a "wildcard" is,
although we mention the term in 9.7.2.
Maybe we can add a sentence at the end of:
<para>
If <replaceable>pattern</replaceable> does not contain percent
signs or underscores, then the pattern only represents the string
itself; in that case <function>LIKE</function> acts like the
equals operator. An underscore (<literal>_</literal>) in
<replaceable>pattern</replaceable> stands for (matches) any single
character; a percent sign (<literal>%</literal>) matches any sequence
of zero or more characters.
</para>
saying that underscore and percent sign are the wildcards in LIKE.
Other than that, I can understand the doc.