public inbox for [email protected]
help / color / mirror / Atom feedRe: Latest patches break one of our unit-test, related to RLS
9+ messages / 5 participants
[nested] [flat]
* Re: Latest patches break one of our unit-test, related to RLS
@ 2025-09-12 13:34 Dominique Devienne <[email protected]>
2025-09-12 13:53 ` Re: Latest patches break one of our unit-test, related to RLS jian he <[email protected]>
2025-09-12 14:07 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
0 siblings, 2 replies; 9+ messages in thread
From: Dominique Devienne @ 2025-09-12 13:34 UTC (permalink / raw)
To: Laurenz Albe <[email protected]>; +Cc: pgsql-general
On Fri, Sep 12, 2025 at 3:29 PM Dominique Devienne <[email protected]> wrote:
> On Fri, Sep 12, 2025 at 3:24 PM Dominique Devienne <[email protected]> wrote:
> > On Fri, Sep 12, 2025 at 3:11 PM Dominique Devienne <[email protected]> wrote:
>
> > So I don't see how my `... where v similar to 'foo[\d\w]_%'` is incorrect.
> > So again, is this a bug / regression or not? Thanks, --DD
>
> If I use (x|y) instead of [xy] it seems to behave correctly.
> Whether x is the full-length POSIX class, or the shorthand notation.
> This DOES look like a bug, no? I've done regexes for a long time,
> and these two forms should be equivalent IMHO. --DD
>
> postgres=# show server_version;
> server_version
> ----------------
> 18rc1
> (1 row)
>
>
> postgres=# with t(v) as (values ('foo:bar'), ('foo/bar'), ('foo0bar'))
> select v from t where v similar to 'foo[\d\w]_%';
> v
> ---
> (0 rows)
>
>
> postgres=# with t(v) as (values ('foo:bar'), ('foo/bar'), ('foo0bar'))
> select v from t where v similar to 'foo[[[:digit:]][[:word:]]]_%';
> v
> ---
> (0 rows)
>
>
> postgres=# with t(v) as (values ('foo:bar'), ('foo/bar'), ('foo0bar'))
> select v from t where v similar to 'foo([[:digit:]]|[[:word:]])_%';
> v
> ---------
> foo0bar
> (1 row)
>
>
> postgres=# with t(v) as (values ('foo:bar'), ('foo/bar'), ('foo0bar'))
> select v from t where v similar to 'foo(\d|\w)_%';
> v
> ---------
> foo0bar
> (1 row)
For completeness:
postgres=# with t(v) as (values ('foo:bar'), ('foo/bar'), ('foo0bar'))
select v from t where v similar to 'foo[0-9a-zA-Z]_%';
v
---------
foo0bar
(1 row)
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Latest patches break one of our unit-test, related to RLS
2025-09-12 13:34 Re: Latest patches break one of our unit-test, related to RLS Dominique Devienne <[email protected]>
@ 2025-09-12 13:53 ` jian he <[email protected]>
2025-09-12 13:57 ` Re: Latest patches break one of our unit-test, related to RLS Dominique Devienne <[email protected]>
1 sibling, 1 reply; 9+ messages in thread
From: jian he @ 2025-09-12 13:53 UTC (permalink / raw)
To: Dominique Devienne <[email protected]>; +Cc: Laurenz Albe <[email protected]>; pgsql-general
On Fri, Sep 12, 2025 at 9:35 PM Dominique Devienne <[email protected]> wrote:
>
> On Fri, Sep 12, 2025 at 3:29 PM Dominique Devienne <[email protected]> wrote:
> > On Fri, Sep 12, 2025 at 3:24 PM Dominique Devienne <[email protected]> wrote:
> > > On Fri, Sep 12, 2025 at 3:11 PM Dominique Devienne <[email protected]> wrote:
> >
> > > So I don't see how my `... where v similar to 'foo[\d\w]_%'` is incorrect.
> > > So again, is this a bug / regression or not? Thanks, --DD
> >
> > If I use (x|y) instead of [xy] it seems to behave correctly.
> > Whether x is the full-length POSIX class, or the shorthand notation.
> > This DOES look like a bug, no? I've done regexes for a long time,
> > and these two forms should be equivalent IMHO. --DD
> >
> > postgres=# show server_version;
> > server_version
> > ----------------
> > 18rc1
> > (1 row)
> >
> >
> > postgres=# with t(v) as (values ('foo:bar'), ('foo/bar'), ('foo0bar'))
> > select v from t where v similar to 'foo[\d\w]_%';
> > v
> > ---
> > (0 rows)
> >
> >
> > postgres=# with t(v) as (values ('foo:bar'), ('foo/bar'), ('foo0bar'))
> > select v from t where v similar to 'foo[[[:digit:]][[:word:]]]_%';
> > v
> > ---
> > (0 rows)
> >
The above two examples are the same, per
(Table 9.21. Regular Expression Class-Shorthand Escapes)
https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP
my guess why 'foo0bar' will fail for 'foo[[[:digit:]][[:word:]]]_%';
1. process character 0, it does meet [[:digits]] character class.
2. process character b, it does not meet [[:digits]], then fails,
it won't check again whether character b is satisfied with [[:word:]] or not.
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Latest patches break one of our unit-test, related to RLS
2025-09-12 13:34 Re: Latest patches break one of our unit-test, related to RLS Dominique Devienne <[email protected]>
2025-09-12 13:53 ` Re: Latest patches break one of our unit-test, related to RLS jian he <[email protected]>
@ 2025-09-12 13:57 ` Dominique Devienne <[email protected]>
0 siblings, 0 replies; 9+ messages in thread
From: Dominique Devienne @ 2025-09-12 13:57 UTC (permalink / raw)
To: jian he <[email protected]>; +Cc: Laurenz Albe <[email protected]>; pgsql-general
On Fri, Sep 12, 2025 at 3:54 PM jian he <[email protected]> wrote:
> > > select v from t where v similar to 'foo[\d\w]_%';
> > > select v from t where v similar to 'foo[[[:digit:]][[:word:]]]_%';
> The above two examples are the same, per
> (Table 9.21. Regular Expression Class-Shorthand Escapes)
Of course.
> my guess why 'foo0bar' will fail for 'foo[[[:digit:]][[:word:]]]_%';
> 1. process character 0, it does meet [[:digits]] character class.
> 2. process character b, it does not meet [[:digits]], then fails,
> it won't check again whether character b is satisfied with [[:word:]] or not.
Then you don't know what [...] means in regexes I'm afraid. --DD
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Latest patches break one of our unit-test, related to RLS
2025-09-12 13:34 Re: Latest patches break one of our unit-test, related to RLS Dominique Devienne <[email protected]>
@ 2025-09-12 14:07 ` Tom Lane <[email protected]>
2025-09-12 16:32 ` Re: Latest patches break one of our unit-test, related to RLS Álvaro Herrera <[email protected]>
2025-09-12 23:34 ` Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]>
1 sibling, 2 replies; 9+ messages in thread
From: Tom Lane @ 2025-09-12 14:07 UTC (permalink / raw)
To: Dominique Devienne <[email protected]>; +Cc: Laurenz Albe <[email protected]>; pgsql-general
Dominique Devienne <[email protected]> writes:
>> This DOES look like a bug, no? I've done regexes for a long time,
>> and these two forms should be equivalent IMHO. --DD
Yeah, I agree it's busted. You can use EXPLAIN VERBOSE to see the
translated-to-POSIX pattern, and it's wrong:
regression=# explain verbose with t(v) as (values ('foo:bar'), ('foo/bar'), ('foo0bar'))
select v from t where v similar to 'foo[\d\w]_%';
QUERY PLAN
--------------------------------------------------------------
Values Scan on "*VALUES*" (cost=0.00..0.05 rows=1 width=32)
Output: "*VALUES*".column1
Filter: ("*VALUES*".column1 ~ '^(?:foo[\d\w]_%)$'::text)
(3 rows)
The _ and % are not getting converted to their POSIX equivalents
("." and ".*"). Your other example still does that correctly:
regression=# explain verbose with t(v) as (values ('foo:bar'), ('foo/bar'), ('foo0bar'))
select v from t where v similar to 'foo[0-9a-zA-Z]_%';
QUERY PLAN
------------------------------------------------------------------
Values Scan on "*VALUES*" (cost=0.00..0.05 rows=1 width=32)
Output: "*VALUES*".column1
Filter: ("*VALUES*".column1 ~ '^(?:foo[0-9a-zA-Z]..*)$'::text)
(3 rows)
So e3ffc3e91 was at least one brick shy of a load.
regards, tom lane
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Latest patches break one of our unit-test, related to RLS
2025-09-12 13:34 Re: Latest patches break one of our unit-test, related to RLS Dominique Devienne <[email protected]>
2025-09-12 14:07 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
@ 2025-09-12 16:32 ` Álvaro Herrera <[email protected]>
1 sibling, 0 replies; 9+ messages in thread
From: Álvaro Herrera @ 2025-09-12 16:32 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Dominique Devienne <[email protected]>; Laurenz Albe <[email protected]>; pgsql-general; Michael Paquier <[email protected]>
On 2025-Sep-12, Tom Lane wrote:
> Dominique Devienne <[email protected]> writes:
> >> This DOES look like a bug, no? I've done regexes for a long time,
> >> and these two forms should be equivalent IMHO. --DD
>
> Yeah, I agree it's busted. You can use EXPLAIN VERBOSE to see the
> translated-to-POSIX pattern, and it's wrong:
And now we also have
https://postgr.es/m/[email protected]
CC'ed the committer here.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Latest patches break one of our unit-test, related to RLS
2025-09-12 13:34 Re: Latest patches break one of our unit-test, related to RLS Dominique Devienne <[email protected]>
2025-09-12 14:07 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
@ 2025-09-12 23:34 ` Laurenz Albe <[email protected]>
2025-09-13 00:12 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
1 sibling, 1 reply; 9+ messages in thread
From: Laurenz Albe @ 2025-09-12 23:34 UTC (permalink / raw)
To: Tom Lane <[email protected]>; Dominique Devienne <[email protected]>; +Cc: pgsql-general
On Fri, 2025-09-12 at 10:07 -0400, Tom Lane wrote:
> Dominique Devienne <[email protected]> writes:
> > > This DOES look like a bug, no? I've done regexes for a long time,
> > > and these two forms should be equivalent IMHO. --DD
>
> Yeah, I agree it's busted. You can use EXPLAIN VERBOSE to see the
> translated-to-POSIX pattern, and it's wrong:
>
> regression=# explain verbose with t(v) as (values ('foo:bar'), ('foo/bar'), ('foo0bar'))
> select v from t where v similar to 'foo[\d\w]_%';
> QUERY PLAN
> --------------------------------------------------------------
> Values Scan on "*VALUES*" (cost=0.00..0.05 rows=1 width=32)
> Output: "*VALUES*".column1
> Filter: ("*VALUES*".column1 ~ '^(?:foo[\d\w]_%)$'::text)
> (3 rows)
>
> The _ and % are not getting converted to their POSIX equivalents
> ("." and ".*").
Indeed, and I have to take the blame for introducing a bug in a minor
release :^(
The attached patch should fix the problem.
Yours,
Laurenz Albe
Attachments:
[text/x-patch] v1-0001-Amend-recent-fix-for-SIMILAR-TO-regex-conversion.patch (3.1K, 2-v1-0001-Amend-recent-fix-for-SIMILAR-TO-regex-conversion.patch)
download | inline diff:
From 8d0f8ebac6c42fe7da36ec8c30ee091d20270068 Mon Sep 17 00:00:00 2001
From: Laurenz Albe <[email protected]>
Date: Sat, 13 Sep 2025 01:32:55 +0200
Subject: [PATCH v1] Amend recent fix for SIMILAR TO regex conversion
Commit e3ffc3e91d fixed the translation of character classes in
SIMILAR TO regular expressions. Unfortunately the fix broke a corner
case: if there is an escape character right after the opening bracket,
(for example in "[\q]") a closing bracket right after the escape
sequence would not be seen as closing the character class.
Author: Laurenz Albe <[email protected]>
Reported-By: Dominique Devienne <[email protected]>
Reported-By: Stephan Springl <[email protected]>
Backpatch-through: 13
---
src/backend/utils/adt/regexp.c | 6 ++++++
src/test/regress/expected/strings.out | 9 +++++++++
src/test/regress/sql/strings.sql | 3 +++
3 files changed, 18 insertions(+)
diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 6e2864cbbda..29692fb1a9d 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -948,6 +948,12 @@ similar_escape_internal(text *pat_text, text *esc_text)
*/
*r++ = '\\';
*r++ = pchar;
+ /*
+ * If we encounter an escaped character in a character class,
+ * we must be past an initial ^ or ].
+ */
+ if (charclass_depth > 0)
+ charclass_start = 3;
}
afterescape = false;
}
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index ba302da51e7..2d6cb02ad60 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -693,6 +693,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[^^]^';
Filter: (f1 ~ '^(?:[^^]\^)$'::text)
(2 rows)
+-- Closing square bracket after an escape sequence at the beginning of
+-- a character closes the character class
+EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[|a]%' ESCAPE '|';
+ QUERY PLAN
+---------------------------------------
+ Seq Scan on text_tbl
+ Filter: (f1 ~ '^(?:[\a].*)$'::text)
+(2 rows)
+
-- Test backslash escapes in regexp_replace's replacement string
SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})', E'(\\1) \\2-\\3');
regexp_replace
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index b94004cc08c..5ed421d6205 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -218,6 +218,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[]%][^]%][^%]%';
-- Closing square bracket effective after two carets at the beginning
-- of character class.
EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[^^]^';
+-- Closing square bracket after an escape sequence at the beginning of
+-- a character closes the character class
+EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[|a]%' ESCAPE '|';
-- Test backslash escapes in regexp_replace's replacement string
SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})', E'(\\1) \\2-\\3');
--
2.51.0
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Latest patches break one of our unit-test, related to RLS
2025-09-12 13:34 Re: Latest patches break one of our unit-test, related to RLS Dominique Devienne <[email protected]>
2025-09-12 14:07 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
2025-09-12 23:34 ` Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]>
@ 2025-09-13 00:12 ` Tom Lane <[email protected]>
2025-09-13 01:17 ` Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]>
0 siblings, 1 reply; 9+ messages in thread
From: Tom Lane @ 2025-09-13 00:12 UTC (permalink / raw)
To: Laurenz Albe <[email protected]>; +Cc: Dominique Devienne <[email protected]>; pgsql-general
Laurenz Albe <[email protected]> writes:
> On Fri, 2025-09-12 at 10:07 -0400, Tom Lane wrote:
>> The _ and % are not getting converted to their POSIX equivalents
>> ("." and ".*").
> Indeed, and I have to take the blame for introducing a bug in a minor
> release :^(
> The attached patch should fix the problem.
I had not particularly studied the new charclass-parsing logic.
Looking at it now, this bit further down (lines 871ff) looks
fishy:
if (pchar == ']' && charclass_start > 2)
charclass_depth--;
else if (pchar == '[')
charclass_depth++;
/*
* If there is a caret right after the opening bracket, it negates
* the character class, but a following closing bracket should
* still be treated as a normal character. That holds only for
* the first caret, so only the values 1 and 2 mean that closing
* brackets should be taken literally.
*/
if (pchar == '^')
charclass_start++;
else
charclass_start = 3; /* definitely past the start */
Should not we be setting charclass_start to 1 after incrementing
charclass_depth? That is, I'd be more comfortable if this logic
looked like
if (pchar == ']' && charclass_start > 2)
charclass_depth--;
else if (pchar == '[')
{
/* start of a nested character class */
charclass_depth++;
charclass_start = 1;
}
else if (pchar == '^')
charclass_start++;
else
charclass_start = 3; /* definitely past the start */
I haven't experimented, but it looks like this might misprocess
^ or ] at the start of a nested character class.
regards, tom lane
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Latest patches break one of our unit-test, related to RLS
2025-09-12 13:34 Re: Latest patches break one of our unit-test, related to RLS Dominique Devienne <[email protected]>
2025-09-12 14:07 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
2025-09-12 23:34 ` Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]>
2025-09-13 00:12 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
@ 2025-09-13 01:17 ` Laurenz Albe <[email protected]>
2025-09-13 01:53 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
0 siblings, 1 reply; 9+ messages in thread
From: Laurenz Albe @ 2025-09-13 01:17 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Dominique Devienne <[email protected]>; pgsql-general
On Fri, 2025-09-12 at 20:12 -0400, Tom Lane wrote:
> I had not particularly studied the new charclass-parsing logic.
> Looking at it now, this bit further down (lines 871ff) looks
> fishy:
>
> if (pchar == ']' && charclass_start > 2)
> charclass_depth--;
> else if (pchar == '[')
> charclass_depth++;
>
> /*
> * If there is a caret right after the opening bracket, it negates
> * the character class, but a following closing bracket should
> * still be treated as a normal character. That holds only for
> * the first caret, so only the values 1 and 2 mean that closing
> * brackets should be taken literally.
> */
> if (pchar == '^')
> charclass_start++;
> else
> charclass_start = 3; /* definitely past the start */
>
> Should not we be setting charclass_start to 1 after incrementing
> charclass_depth?
What I call "charclass depth" is misleading, I am afraid.
Really, it should be "bracket depth". Only the outermost pair of brackets
starts an actual character class. Example:
[]abc[:digit:]]
A caret or closing bracket right after the inner opening bracket wouldn't
be a special character, and I think it would never be legal.
Unfortunately, this is all pretty complicated.
Perhaps s/charclass_depth/bracket_depth/ would be a good idea.
Yours,
Laurenz Albe
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Latest patches break one of our unit-test, related to RLS
2025-09-12 13:34 Re: Latest patches break one of our unit-test, related to RLS Dominique Devienne <[email protected]>
2025-09-12 14:07 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
2025-09-12 23:34 ` Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]>
2025-09-13 00:12 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
2025-09-13 01:17 ` Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]>
@ 2025-09-13 01:53 ` Tom Lane <[email protected]>
0 siblings, 0 replies; 9+ messages in thread
From: Tom Lane @ 2025-09-13 01:53 UTC (permalink / raw)
To: Laurenz Albe <[email protected]>; +Cc: Dominique Devienne <[email protected]>; pgsql-general
Laurenz Albe <[email protected]> writes:
> On Fri, 2025-09-12 at 20:12 -0400, Tom Lane wrote:
>> Should not we be setting charclass_start to 1 after incrementing
>> charclass_depth?
> What I call "charclass depth" is misleading, I am afraid.
> Really, it should be "bracket depth". Only the outermost pair of brackets
> starts an actual character class. Example:
> []abc[:digit:]]
> A caret or closing bracket right after the inner opening bracket wouldn't
> be a special character, and I think it would never be legal.
Ah, got it. But this logic definitely deserves more comments.
What do you think of something like
if (pchar == ']' && charclass_start > 2)
{
/* found the real end of a bracket pair */
charclass_depth--;
/* past start of outer brackets, so keep charclass_start > 2 */
}
else if (pchar == '[')
{
/* start of a nested bracket pair */
charclass_depth++;
/* leading ^ or ] in this context is not special */
charclass_start = 3;
}
else if (pchar == '^')
{
/* okay to increment charclass_start even if already > 1 */
charclass_start++;
}
else
{
/* otherwise (including case of leading ']') */
charclass_start = 3; /* definitely past the start */
}
> Perhaps s/charclass_depth/bracket_depth/ would be a good idea.
Wouldn't object to that. Maybe we can think of a better name for
"charclass_start" too --- that sounds like a boolean condition.
The best I'm coming up with right now is "charclass_count", but
that's not that helpful.
regards, tom lane
^ permalink raw reply [nested|flat] 9+ messages in thread
end of thread, other threads:[~2025-09-13 01:53 UTC | newest]
Thread overview: 9+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-09-12 13:34 Re: Latest patches break one of our unit-test, related to RLS Dominique Devienne <[email protected]>
2025-09-12 13:53 ` jian he <[email protected]>
2025-09-12 13:57 ` Dominique Devienne <[email protected]>
2025-09-12 14:07 ` Tom Lane <[email protected]>
2025-09-12 16:32 ` Álvaro Herrera <[email protected]>
2025-09-12 23:34 ` Laurenz Albe <[email protected]>
2025-09-13 00:12 ` Tom Lane <[email protected]>
2025-09-13 01:17 ` Laurenz Albe <[email protected]>
2025-09-13 01:53 ` Tom Lane <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox