public inbox for [email protected]  
From: Jacob Champion <[email protected]>
To: Peter Eisentraut <[email protected]>
Cc: pgsql-hackers <[email protected]>
Cc: Daniel Verite <[email protected]>
Cc: Paul A Jungwirth <[email protected]>
Subject: Re: Support LIKE with nondeterministic collations
Date: Tue, 29 Oct 2024 10:15:20 -0700
Message-ID: <CAOYmi+nqr4xCe9-g4BAupnu2rZmvLy1T3qq3ejOUWOCsoJ4ZdA@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<CA+renyWd-_sAj3YqBRaQVOOMr5uQoeBcA3tjCSyQFzvnbGrMYA@mail.gmail.com>
	<[email protected]>
	<[email protected]>

On Sun, Sep 15, 2024 at 11:26 PM Peter Eisentraut <[email protected]> wrote:
>
> Here is an updated patch.  It is rebased over the various recent changes
> in the locale APIs.  No other changes.

libFuzzer is unhappy about the following code in MatchText:

> +            while (p1len > 0)
> +            {
> +                if (*p1 == '\\')
> +                {
> +                    found_escape = true;
> +                    NextByte(p1, p1len);
> +                }
> +                else if (*p1 == '_' || *p1 == '%')
> +                    break;
> +                NextByte(p1, p1len);
> +            }

If the pattern ends with a backslash, we'll call NextByte() twice,
p1len will wrap around to INT_MAX, and we'll walk off the end of the
buffer. (I fixed it locally by duplicating the ERROR case that's
directly above this.)
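For anyone following along, here is a minimal standalone sketch of that loop with an end-of-pattern check added after the escape is consumed. It assumes NextByte() just advances the pointer and decrements the remaining length. The helper name scan_literal_run() is hypothetical, and it returns -1 where the backend would raise ERROR, so that it compiles outside the server. It is an illustration of the idea, not the actual local fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the backend macro: advance one byte, shrink the length. */
#define NextByte(p, plen) ((p)++, (plen)--)

/*
 * Hypothetical helper: scan the literal run at the start of a LIKE pattern,
 * stopping at the first unescaped '_' or '%'.  Returns the number of bytes
 * consumed, or -1 if the pattern ends with a lone escape character (the
 * backend would report an ERROR in that case instead).
 */
static int
scan_literal_run(const char *p1, int p1len, bool *found_escape)
{
    const char *start = p1;

    assert(p1len >= 0);
    *found_escape = false;
    while (p1len > 0)
    {
        if (*p1 == '\\')
        {
            *found_escape = true;
            NextByte(p1, p1len);
            /* The fix: a trailing backslash must not be stepped over twice. */
            if (p1len == 0)
                return -1;
        }
        else if (*p1 == '_' || *p1 == '%')
            break;
        NextByte(p1, p1len);
    }
    return (int) (p1 - start);
}
```

For example, scan_literal_run("abc\\", 4, &esc) returns -1 instead of advancing past the end of the buffer, while scan_literal_run("a\\%b_", 5, &esc) treats the escaped '%' as a literal and stops at the '_'.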

So far that's the only thing reported, but fuzzing is slow. The fuzzer
is incentivized to find more and more horrible call stacks, which in
this case means it's finding inefficient patterns with a lot of
backtracking. (Performance quickly drops from 25000+ iterations per
second to roughly 50 per second, and that's not fast enough to make
good progress.) I haven't dug in yet to see whether there are
optimizations that would avoid the worst cases.
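To make the backtracking cost concrete, here is a toy illustration. It is not the patch's matcher. It is a naive recursive byte-wise LIKE matcher with a hypothetical step counter, like_steps. Every '%' tries every possible run length, so a pattern such as %a%a%a%b that ultimately fails against a long run of 'a's revisits the same suffixes many times. The assumption is that the slow inputs the fuzzer converges on have this general shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instrumentation: counts calls made by the matcher. */
static unsigned long like_steps;

/*
 * Naive backtracking LIKE matcher over bytes ('%' = any run, '_' = any one
 * byte, no escape handling).  Illustrative only; NOT the patch's algorithm.
 */
static bool
naive_like(const char *t, size_t tlen, const char *p, size_t plen)
{
    assert(t != NULL && p != NULL);
    like_steps++;
    while (plen > 0)
    {
        if (*p == '%')
        {
            /* Try every possible length for the run matched by '%'. */
            for (size_t skip = 0; skip <= tlen; skip++)
                if (naive_like(t + skip, tlen - skip, p + 1, plen - 1))
                    return true;
            return false;
        }
        if (tlen == 0 || (*p != '_' && *p != *t))
            return false;
        t++, tlen--;
        p++, plen--;
    }
    return tlen == 0;
}
```

Against a 20-byte string of 'a's, "%b" fails after 22 calls, while "%a%a%a%b" needs hundreds more because each extra '%' multiplies the number of split points tried. As far as I recall, the existing deterministic code path cuts this off with an early-abort result (LIKE_ABORT in like_match.c). Whether something similar can apply to the nondeterministic path is the open question above.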

Thanks,
--Jacob
