From: jian he <[email protected]>
To: Peter Eisentraut <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Jacob Champion <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Daniel Verite <[email protected]>
Cc: Paul A Jungwirth <[email protected]>
Subject: Re: Support LIKE with nondeterministic collations
Date: Wed, 20 Nov 2024 15:29:22 +0800
Message-ID: <CACJufxGuBNQzx1LFBpEP01A1SuSndfzMWXHR9vr9bV3A6dB84g@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<CA+renyWd-_sAj3YqBRaQVOOMr5uQoeBcA3tjCSyQFzvnbGrMYA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CAOYmi+nqr4xCe9-g4BAupnu2rZmvLy1T3qq3ejOUWOCsoJ4ZdA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CACJufxFeOuBbkHfp=0-0rwamydjYY4ky1A+CPr6s3WUABC9_Rg@mail.gmail.com>
	<[email protected]>
	<CACJufxHVcgt6ybYLX+R6YYcK=Hc0ctTD_wFfJvrR37yrjYyrww@mail.gmail.com>
	<[email protected]>

On Tue, Nov 19, 2024 at 9:51 PM Peter Eisentraut <[email protected]> wrote:
>
> On 18.11.24 04:30, jian he wrote:
> > we can optimize when trailing (last character) is not  wildcards.
> >
> > SELECT 'Ha12foo' LIKE '%foo' COLLATE ignore_accents;
> > within the for loop
> > for(;;)
> > {
> > int            cmp;
> > CHECK_FOR_INTERRUPTS();
> > ....
> > }
> >
> > pg_strncoll comparison will become
> > Ha12foo    foo
> > a12foo      foo
> > 12foo        foo
> > 2foo          foo
> > foo            foo
> >
> > it's safe because in MatchText we have:
> > else if (*p == '%')
> > {
> > while (tlen > 0)
> > {
> >      if (GETCHAR(*t, locale) == firstpat || (locale && !locale->deterministic))
> >      {
> >          int            matched = MatchText(t, tlen, p, plen, locale);
> >          if (matched != LIKE_FALSE)
> >              return matched; /* TRUE or ABORT */
> >      }
> >      NextChar(t, tlen);
> > }
> > }
> >
> > please check attached.
>
> I see, good idea.  I implemented it a bit differently.  See "Shortcut:
> If this is the end of the pattern ..." in this patch.  Please check if
> this is what you had in mind.

Your implementation is far simpler than mine.
I think I understand it.
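
A minimal sketch of how the '%foo' case can be read, not the code from the patch. It assumes a toy comparison (toy_equal, ASCII case-insensitive) as a stand-in for pg_strncoll with a nondeterministic collation; match_percent_literal and ncmp are hypothetical names. Because the literal ends the pattern, each start position needs only one comparison of the whole remaining text against the literal:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a nondeterministic collation: ASCII case-insensitive equality. */
static bool
toy_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    if (alen != blen)
        return false;
    for (size_t i = 0; i < alen; i++)
    {
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
            return false;
    }
    return true;
}

/*
 * Match text against '%' followed by a wildcard-free literal that ends the
 * pattern.  For each start offset, the whole remainder of the text is
 * compared against the literal exactly once; *ncmp counts the comparisons.
 */
static bool
match_percent_literal(const char *t, size_t tlen,
                      const char *lit, size_t litlen, int *ncmp)
{
    *ncmp = 0;
    for (size_t start = 0; start <= tlen; start++)
    {
        (*ncmp)++;
        if (toy_equal(t + start, tlen - start, lit, litlen))
            return true;
    }
    return false;
}
```

For the text 'Ha12FOO' and the literal 'foo', the comparisons follow the sequence listed in the quoted mail (whole string first, then one character shorter each time). The remainder is passed whole, rather than only its last litlen bytes, because a real nondeterministic collation can treat strings of different byte lengths as equal.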

I am trying to optimize the case where the pattern is a prefix match, like `pattern%`,
but my attempt fails on cases like:
SELECT U&'\0061\0308bc' LIKE U&'\00E4bc%' COLLATE ignore_accents;
Basically, for the_string LIKE the_pattern%, the part of the_string that
matches and the_pattern itself can have different lengths, so we cannot
just do a single pg_strncoll.
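
The failure can be reproduced outside the server with a small sketch. It assumes a toy folding function (toy_fold) in place of the real ignore_accents collation: it only drops U+0308 and maps U+00E4 to 'a'. All function names here are hypothetical. The naive version compares the first plen bytes of the text with the pattern; the scanning version tries every prefix length of the text:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy stand-in for an accent-insensitive collation (not ICU, not
 * pg_strncoll): drop U+0308 (bytes CC 88) and map U+00E4 (bytes C3 A4)
 * to 'a'.  Returns the key length; out needs room for len bytes.
 */
static size_t
toy_fold(const char *s, size_t len, char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
    {
        unsigned char c = (unsigned char) s[i];

        if (c == 0xCC && i + 1 < len && (unsigned char) s[i + 1] == 0x88)
            i++;                /* skip the combining diaeresis */
        else if (c == 0xC3 && i + 1 < len && (unsigned char) s[i + 1] == 0xA4)
        {
            out[n++] = 'a';     /* fold a-diaeresis to plain a */
            i++;
        }
        else
            out[n++] = (char) c;
    }
    return n;
}

/* Toy equality under the folding above; inputs limited to 63 bytes. */
static bool
toy_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    char   ka[64], kb[64];
    size_t na, nb;

    if (alen >= sizeof(ka) || blen >= sizeof(kb))
        return false;
    na = toy_fold(a, alen, ka);
    nb = toy_fold(b, blen, kb);
    return na == nb && memcmp(ka, kb, na) == 0;
}

/* Naive idea: one comparison of the first plen bytes of the text. */
static bool
prefix_match_naive(const char *t, size_t tlen, const char *p, size_t plen)
{
    return tlen >= plen && toy_equal(t, plen, p, plen);
}

/* Try every text prefix length, since the matching length can differ. */
static bool
prefix_match_scan(const char *t, size_t tlen, const char *p, size_t plen)
{
    for (size_t n = 0; n <= tlen; n++)
    {
        if (toy_equal(t, n, p, plen))
            return true;
    }
    return false;
}
```

With the text "a" U+0308 "bc" (5 bytes in UTF-8) and the pattern prefix U+00E4 "bc" (4 bytes), the first 4 bytes of the text fold to "ab", so the naive check fails, while the 5-byte prefix folds to "abc" and matches. The sketch steps byte by byte for brevity; real code would advance by whole characters.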


In match_pattern_prefix, maybe change
    if (expr_coll && !get_collation_isdeterministic(expr_coll))
        return NIL;
to
    if (OidIsValid(expr_coll) && !get_collation_isdeterministic(expr_coll))
        return NIL;
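
The two forms behave the same; the suggestion is about making the "is an OID set" test explicit. A standalone sketch, assuming local copies of the Oid, InvalidOid and OidIsValid definitions so it compiles without the PostgreSQL headers, and a hypothetical stub (stub_collation_isdeterministic, with 12345 as a made-up nondeterministic collation OID) in place of get_collation_isdeterministic:

```c
#include <stdbool.h>

/* Local copies so the sketch compiles outside the PostgreSQL tree. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)
#define OidIsValid(objectId) ((bool) ((objectId) != InvalidOid))

/* Hypothetical stub: pretend OID 12345 is the only nondeterministic collation. */
static bool
stub_collation_isdeterministic(Oid colloid)
{
    return colloid != 12345;
}

/*
 * Returns true when the prefix optimization has to be skipped, which
 * corresponds to the "return NIL" branch in the snippet above.
 */
static bool
skip_prefix_optimization(Oid expr_coll)
{
    if (OidIsValid(expr_coll) && !stub_collation_isdeterministic(expr_coll))
        return true;
    return false;
}
```

An unset collation (InvalidOid) never reaches the lookup, exactly as with the plain `expr_coll &&` test; OidIsValid only states the intent more directly.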

Other than that, I didn't find any issues.





