From: Heikki Linnakangas <[email protected]>
To: Peter Eisentraut <[email protected]>
To: Jacob Champion <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Daniel Verite <[email protected]>
Cc: Paul A Jungwirth <[email protected]>
Subject: Re: Support LIKE with nondeterministic collations
Date: Mon, 11 Nov 2024 15:25:29 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<CA+renyWd-_sAj3YqBRaQVOOMr5uQoeBcA3tjCSyQFzvnbGrMYA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CAOYmi+nqr4xCe9-g4BAupnu2rZmvLy1T3qq3ejOUWOCsoJ4ZdA@mail.gmail.com>
	<[email protected]>

On 04/11/2024 10:26, Peter Eisentraut wrote:
> On 29.10.24 18:15, Jacob Champion wrote:
>> libfuzzer is unhappy about the following code in MatchText:
>>
>>> +            while (p1len > 0)
>>> +            {
>>> +                if (*p1 == '\\')
>>> +                {
>>> +                    found_escape = true;
>>> +                    NextByte(p1, p1len);
>>> +                }
>>> +                else if (*p1 == '_' || *p1 == '%')
>>> +                    break;
>>> +                NextByte(p1, p1len);
>>> +            }
>>
>> If the pattern ends with a backslash, we'll call NextByte() twice,
>> p1len will wrap around to INT_MAX, and we'll walk off the end of the
>> buffer. (I fixed it locally by duplicating the ERROR case that's
>> directly above this.)
> 
> Thanks.  Here is an updated patch with that fixed.
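
For anyone following along, the shape of that guard is roughly as below.
This is only a standalone sketch, not the code from the patch:
literal_prefix_len is a made-up name, it walks plain bytes with an index
instead of using NextByte(), and it returns -1 where the real code would
raise the ERROR for a pattern that ends with the escape character.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the patch's code: return the number of bytes
 * at the start of pattern p that precede the first unescaped '_' or '%',
 * or -1 if the pattern ends with a dangling backslash.
 */
static long
literal_prefix_len(const char *p, size_t plen, bool *found_escape)
{
    size_t      i = 0;

    *found_escape = false;
    while (i < plen)
    {
        if (p[i] == '\\')
        {
            *found_escape = true;
            i++;                /* skip the backslash itself */
            if (i >= plen)
                return -1;      /* nothing left to escape */
        }
        else if (p[i] == '_' || p[i] == '%')
            break;
        i++;                    /* consume the literal or escaped byte */
    }
    return (long) i;
}
```

The point is just that the second advance only happens after checking
that there is still a byte left to advance over.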

Sadly the algorithm is O(n^2) with non-deterministic collations. Is there 
any way this could be optimized? We make no claims on how expensive any 
functions or operators are, so I suppose a slow implementation is 
nevertheless better than throwing an error.

Let's at least add some CHECK_FOR_INTERRUPTS(). For example, this takes 
a very long time and is uninterruptible:

  SELECT repeat('x', 100000) LIKE '%xxxy%' COLLATE ignore_accents;
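
To illustrate where the quadratic behavior comes from and where a check
could go, here is a standalone sketch. It is not the patch's MatchText
code and all the names are made up: coll_equal stands in for a
collation-aware comparison (plain byte equality here), and
check_for_interrupts stands in for CHECK_FOR_INTERRUPTS(), returning a
flag instead of erroring out the way the real macro does. The assumption
is that under a non-deterministic collation a match can have a different
length than the literal, so every (start, length) pair has to be tried.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a pending query-cancel request. */
static volatile bool interrupt_pending = false;

/* Hypothetical stand-in for CHECK_FOR_INTERRUPTS(): true means "stop". */
static bool
check_for_interrupts(void)
{
    return interrupt_pending;
}

/* Hypothetical stand-in for a collation-aware equality test. */
static bool
coll_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

/*
 * Does any substring of t compare equal to lit?
 * Returns 1 if yes, 0 if no, -1 if interrupted.
 */
static int
contains_literal(const char *t, size_t tlen,
                 const char *lit, size_t litlen)
{
    for (size_t start = 0; start <= tlen; start++)
    {
        /* One check per start position keeps the loop cancellable. */
        if (check_for_interrupts())
            return -1;

        /* Try every candidate length: this is the O(n^2) part. */
        for (size_t len = 0; start + len <= tlen; len++)
        {
            if (coll_equal(t + start, len, lit, litlen))
                return 1;
        }
    }
    return 0;
}
```

With a text of 100000 x's and no match, that is on the order of 5 * 10^9
comparisons, each of which costs more than a memcmp once a real
collation is involved. Checking once per outer iteration seems like
enough, since each inner loop is only linear.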

-- 
Heikki Linnakangas
Neon (https://neon.tech)






