public inbox for [email protected]
Re: [PATCH] Fix for bug #19474: LIKE fails to match literal backslashes with nondeterministic collations
4+ messages / 2 participants
* Re: [PATCH] Fix for bug #19474: LIKE fails to match literal backslashes with nondeterministic collations
@ 2026-06-05 09:28 Nitin Motiani <[email protected]>
2026-06-05 11:26 ` Re: [PATCH] Fix for bug #19474: LIKE fails to match literal backslashes with nondeterministic collations Ewan Young <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Nitin Motiani @ 2026-06-05 09:28 UTC
To: Zsolt Parragi <[email protected]>; +Cc: [email protected]
Hi,
I have created a commitfest entry for this patch here
https://commitfest.postgresql.org/patch/6844/. Please take a look.
Thanks,
Nitin Motiani
Google
* Re: [PATCH] Fix for bug #19474: LIKE fails to match literal backslashes with nondeterministic collations
From: Ewan Young @ 2026-06-05 11:26 UTC
To: Nitin Motiani <[email protected]>; +Cc: Zsolt Parragi <[email protected]>; [email protected]
Hi,
I reviewed the v2 patch.
The patch applies cleanly on the current master (4cb2a9863d8). I built with
--enable-cassert and ICU, and confirmed that the three backslash test
cases give wrong results on unpatched master and the expected results
with the patch applied. The full regression suite passes (245/245),
including the updated collate.icu.utf8 test.
Two comments:
1. The commit message describes the symptom as "an incorrect match
failure", but the bug also causes incorrect matches in the other
direction. Since the unescaping logic dropped the literal backslash
from the pattern, a text *without* a backslash could wrongly match a
pattern that requires one:
SELECT 'backslash' COLLATE ignore_accents LIKE 'back\\slash%';
-- unpatched: t (wrong), patched: f (correct)
I think it's worth mentioning this false-positive side of the bug in
the commit message, since silently-too-permissive LIKE filters are
arguably the more dangerous symptom for applications.
2. A small typo in the new comment in like_match.c:
"occurences" should be "occurrences".
Best regards
* Re: [PATCH] Fix for bug #19474: LIKE fails to match literal backslashes with nondeterministic collations
From: Nitin Motiani @ 2026-06-06 12:25 UTC
To: Ewan Young <[email protected]>; +Cc: Zsolt Parragi <[email protected]>; [email protected]
> I reviewed the v2 patch.
>
Thanks for the feedback.
>
> 1. The commit message describes the symptom as "an incorrect match
> failure", but the bug also causes incorrect matches in the other
> direction. Since the unescaping logic dropped the literal backslash
> from the pattern, a text *without* a backslash could wrongly match a
> pattern that requires one:
>
> SELECT 'backslash' COLLATE ignore_accents LIKE 'back\\slash%';
> -- unpatched: t (wrong), patched: f (correct)
>
> I think it's worth mentioning this false-positive side of the bug in
> the commit message, since silently-too-permissive LIKE filters are
> arguably the more dangerous symptom for applications.
>
I have updated the commit message. I also added another test for this
scenario in v3.
> 2. A small typo in the new comment in like_match.c:
> "occurences" should be "occurrences".
>
Fixed the typo.
Thanks,
Nitin Motiani
Google
Attachments:
[application/x-patch] v3-0001-Fix-LIKE-matching-with-nondeterministic-collation.patch (5.1K, 2-v3-0001-Fix-LIKE-matching-with-nondeterministic-collation.patch)
From 7e951783602e1d0e50ee06c3be4e142691a7af94 Mon Sep 17 00:00:00 2001
From: Nitin Motiani <[email protected]>
Date: Thu, 14 May 2026 10:49:54 +0000
Subject: [PATCH v3] Fix LIKE matching with nondeterministic collations and
backslashes
Commit 85b7efa1cd added support for LIKE with nondeterministic
collations, but it included a bug in the unescaping logic for pattern
partitions. When the pattern contained a literal backslash (which is
represented as '\\' in the internal pattern), the code would skip both
backslashes, resulting in an incorrect match failure against the
original text.
The same logic can also cause a false-positive match: if the pattern
has a literal backslash but the text does not, dropping both
backslashes lets the text match anyway.
This fix ensures that an escape backslash correctly causes the following
character to be copied literally into the subpattern before comparison.
A few regression tests are added to verify the fix and prevent future
regressions.
Reported-by: b/19474 on pgsql-bugs
---
src/backend/utils/adt/like_match.c | 37 ++++++++++++++++---
.../regress/expected/collate.icu.utf8.out | 31 ++++++++++++++++
src/test/regress/sql/collate.icu.utf8.sql | 7 ++++
3 files changed, 69 insertions(+), 6 deletions(-)
diff --git a/src/backend/utils/adt/like_match.c b/src/backend/utils/adt/like_match.c
index f5f72b82e21..5eae2b8a03e 100644
--- a/src/backend/utils/adt/like_match.c
+++ b/src/backend/utils/adt/like_match.c
@@ -252,14 +252,39 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
if (found_escape)
{
char *b;
+ const char *c = p;
+ const char *start; /* used in the loop whenever we are copying a
+ * multibyte character */
+ int clen = p1 - p;
+ bool afterescape = false;
- b = buf = palloc(p1 - p);
- for (const char *c = p; c < p1; c++)
+ b = buf = palloc(clen);
+
+ /*
+ * Remove occurrences of a single '\'. And if we have a '\\',
+ * keep one '\'.
+ */
+ while (clen > 0)
{
- if (*c == '\\')
- ;
- else
- *(b++) = *c;
+ if (*c == '\\' && !afterescape)
+ {
+ afterescape = true;
+ NextByte(c, clen);
+ continue;
+ }
+
+ /*
+ * Copy the entire character (1-4 bytes) and advance. This
+ * ensures we stay aligned on character boundaries for
+ * multibyte encodings.
+ */
+ start = c;
+
+ NextChar(c, clen);
+ while (start < c)
+ *(b++) = *(start++);
+
+ afterescape = false;
}
subpat = buf;
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 04e2f6df037..2f00fdb9b52 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2741,6 +2741,37 @@ SELECT U&'\0061\0308bc' LIKE U&'_\00e4bc' COLLATE ignore_accents;
-- escape character at end of pattern
SELECT 'foox' LIKE 'foo\' COLLATE ignore_accents;
ERROR: LIKE pattern must not end with escape character
+-- literal backslash with nondeterministic collation (bug #19474)
+SELECT 'back\slash' COLLATE ignore_accents LIKE 'back\slash%' ESCAPE '#';
+ ?column?
+----------
+ t
+(1 row)
+
+SELECT 'aäb' COLLATE ignore_accents LIKE 'a#äb' ESCAPE '#' AS multibyte_escape;
+ multibyte_escape
+------------------
+ t
+(1 row)
+
+SELECT 'a\äb' COLLATE ignore_accents LIKE 'a\äb%' ESCAPE '#' AS backslash_multibyte;
+ backslash_multibyte
+---------------------
+ t
+(1 row)
+
+SELECT 'a\b%c' COLLATE ignore_accents LIKE 'a#\b#%%c' ESCAPE '#' AS mixed_escapes;
+ mixed_escapes
+---------------
+ t
+(1 row)
+
+SELECT 'backslash' COLLATE ignore_accents LIKE 'back\\slash%';
+ ?column?
+----------
+ f
+(1 row)
+
-- foreign keys (mixing different nondeterministic collations not allowed)
CREATE TABLE test10pk (x text COLLATE case_sensitive PRIMARY KEY);
CREATE TABLE test10fk (x text COLLATE case_insensitive REFERENCES test10pk (x) ON UPDATE CASCADE ON DELETE CASCADE); -- error
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 18c47e6e05a..9f0ab98cf66 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -960,6 +960,13 @@ SELECT U&'\0061\0308bc' LIKE U&'_\00e4bc' COLLATE ignore_accents;
-- escape character at end of pattern
SELECT 'foox' LIKE 'foo\' COLLATE ignore_accents;
+-- literal backslash with nondeterministic collation (bug #19474)
+SELECT 'back\slash' COLLATE ignore_accents LIKE 'back\slash%' ESCAPE '#';
+SELECT 'aäb' COLLATE ignore_accents LIKE 'a#äb' ESCAPE '#' AS multibyte_escape;
+SELECT 'a\äb' COLLATE ignore_accents LIKE 'a\äb%' ESCAPE '#' AS backslash_multibyte;
+SELECT 'a\b%c' COLLATE ignore_accents LIKE 'a#\b#%%c' ESCAPE '#' AS mixed_escapes;
+SELECT 'backslash' COLLATE ignore_accents LIKE 'back\\slash%';
+
-- foreign keys (mixing different nondeterministic collations not allowed)
CREATE TABLE test10pk (x text COLLATE case_sensitive PRIMARY KEY);
CREATE TABLE test10fk (x text COLLATE case_insensitive REFERENCES test10pk (x) ON UPDATE CASCADE ON DELETE CASCADE); -- error
--
2.54.0.1032.g2f8565e1d1-goog
* Re: [PATCH] Fix for bug #19474: LIKE fails to match literal backslashes with nondeterministic collations
From: Ewan Young @ 2026-06-08 07:59 UTC
To: Nitin Motiani <[email protected]>; +Cc: Zsolt Parragi <[email protected]>; [email protected]
Hi Nitin,
Thanks for v3.
I re-applied it on top of current master (4cb2a986) and re-ran the
regression suite: all 245 tests pass, and the collate.icu.utf8 set now
covers both directions of the bug, including the false-positive case
('backslash' LIKE 'back\\slash%' -> f).
LGTM. I've changed the commitfest status to "Ready for Committer".
Regards,
Ewan Young