public inbox for [email protected]
help / color / mirror / Atom feedFrom: jian he <[email protected]>
To: Peter Eisentraut <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Jacob Champion <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Daniel Verite <[email protected]>
Cc: Paul A Jungwirth <[email protected]>
Subject: Re: Support LIKE with nondeterministic collations
Date: Mon, 18 Nov 2024 11:30:20 +0800
Message-ID: <CACJufxHVcgt6ybYLX+R6YYcK=Hc0ctTD_wFfJvrR37yrjYyrww@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<CA+renyWd-_sAj3YqBRaQVOOMr5uQoeBcA3tjCSyQFzvnbGrMYA@mail.gmail.com>
<[email protected]>
<[email protected]>
<CAOYmi+nqr4xCe9-g4BAupnu2rZmvLy1T3qq3ejOUWOCsoJ4ZdA@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
<CACJufxFeOuBbkHfp=0-0rwamydjYY4ky1A+CPr6s3WUABC9_Rg@mail.gmail.com>
<[email protected]>
On Fri, Nov 15, 2024 at 11:42 PM Peter Eisentraut <[email protected]> wrote:
>
> On 15.11.24 05:26, jian he wrote:
> > /*
> > * Now build a substring of the text and try to match it against
> > * the subpattern. t is the start of the text, t1 is one past the
> > * last byte. We start with a zero-length string.
> > */
> > t1 = t
> > t1len = tlen;
> > for (;;)
> > {
> > int cmp;
> > CHECK_FOR_INTERRUPTS();
> > cmp = pg_strncoll(subpat, subpatlen, t, (t1 - t), locale);
> >
> > select '.foo.' LIKE '_oo' COLLATE ign_punct;
> > pg_strncoll's iteration of the first 4 argument values.
> > oo 2 foo. 0
> > oo 2 foo. 1
> > oo 2 foo. 2
> > oo 2 foo. 3
> > oo 2 foo. 4
> >
> > seems there is a shortcut/optimization.
> > if subpat don't have wildcard(percent sign, underscore)
> > then we can have less pg_strncoll calls?
>
> How would you do that? You need to try all combinations to find one
> that matches.
>
we can optimize when trailing (last character) is not wildcards.
SELECT 'Ha12foo' LIKE '%foo' COLLATE ignore_accents;
within the for loop
for(;;)
{
int cmp;
CHECK_FOR_INTERRUPTS();
....
}
pg_strncoll comparison will become
Ha12foo foo
a12foo foo
12foo foo
2foo foo
foo foo
it's safe because in MatchText we have:
else if (*p == '%')
{
while (tlen > 0)
{
if (GETCHAR(*t, locale) == firstpat || (locale && !locale->deterministic))
{
int matched = MatchText(t, tlen, p, plen, locale);
if (matched != LIKE_FALSE)
return matched; /* TRUE or ABORT */
}
NextChar(t, tlen);
}
}
please check attached.
> > minimum case to trigger error within GenericMatchText
> > since no related tests.
> > create table t1(a text collate case_insensitive, b text collate "C");
> > insert into t1 values ('a','a');
> > select a like b from t1;
>
> This results in
>
> ERROR: 42P22: could not determine which collation to use for LIKE
> HINT: Use the COLLATE clause to set the collation explicitly.
>
> which is the expected behavior.
>
sorry, didn't mention it clearly, i mean we can add it to the regress test.
Attachments:
[application/octet-stream] v8-0001-LIKE-with-nondeterministic-collations-no-trail.no-cfbot (2.6K, 2-v8-0001-LIKE-with-nondeterministic-collations-no-trail.no-cfbot)
download
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Support LIKE with nondeterministic collations
In-Reply-To: <CACJufxHVcgt6ybYLX+R6YYcK=Hc0ctTD_wFfJvrR37yrjYyrww@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox