public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Latest patches break one of our unit-test, related to RLS
4+ messages / 3 participants
[nested] [flat]

* Re: Latest patches break one of our unit-test, related to RLS
@ 2025-09-13 09:11 Laurenz Albe <[email protected]>
  2025-09-13 21:00 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Laurenz Albe @ 2025-09-13 09:11 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Dominique Devienne <[email protected]>; pgsql-general

On Fri, 2025-09-12 at 21:53 -0400, Tom Lane wrote:
> 
> Ah, got it.  But this logic definitely deserves more comments.
> What do you think of something like
> 
>             if (pchar == ']' && charclass_start > 2)
>             {
>                 /* found the real end of a bracket pair */
>                 charclass_depth--;
>                 /* past start of outer brackets, so keep charclass_start > 2 */
>             }
>             else if (pchar == '[')
>             {
>                 /* start of a nested bracket pair */
>                 charclass_depth++;
>                 /* leading ^ or ] in this context is not special */
>                 charclass_start = 3;
>             }
>             else if (pchar == '^')
>             {
>                 /* okay to increment charclass_start even if already > 1 */
>                 charclass_start++;
>             }
>             else
>             {
>                 /* otherwise (including case of leading ']') */
>                 charclass_start = 3;    /* definitely past the start */
>             }
> 
> > Perhaps s/charclass_depth/bracket_depth/ would be a good idea.
> 
> Wouldn't object to that.  Maybe we can think of a better name for
> "charclass_start" too --- that sounds like a boolean condition.
> The best I'm coming up with right now is "charclass_count", but
> that's not that helpful.

I came up with the attached patch set.

0001 is the actual bug fix: in addition to my previous patch, I fixed two
more cases in which a closing bracket might not be recognized as ending
the character class (one is from your patch above).  I think that these
could not lead to bad query results, but I am not certain.

0002 is the cosmetic improvement: I renamed "charclass_depth" to "bracket_depth"
and "bracket_depth" to "bracket_pos", rewrote the "else if" cascade as you
suggested above and put some love into additional comments.

I used two separate patches for clarity and ease of review, but both
should get backpatched.

Yours,
Laurenz Albe


Attachments:

  [text/x-patch] v2-0001-Amend-recent-fix-for-SIMILAR-TO-regex-conversion.patch (4.1K, 2-v2-0001-Amend-recent-fix-for-SIMILAR-TO-regex-conversion.patch)
  download | inline diff:
From dd37acb30b637c306f2ae7f041c164e892e92a22 Mon Sep 17 00:00:00 2001
From: Laurenz Albe <[email protected]>
Date: Sat, 13 Sep 2025 09:59:47 +0200
Subject: [PATCH v2 1/2] Amend recent fix for SIMILAR TO regex conversion

Commit e3ffc3e91d fixed the translation of character classes in
SIMILAR TO regular expressions.  Unfortunately the fix broke a corner
case: if there is an escape character right after the opening bracket,
(for example in "[\q]") a closing bracket right after the escape
sequence would not be seen as closing the character class.

There were two more oversights: a backslash or a nested opening bracket
right at the beginning of a character class should remove the special
meaning from any following caret or closing bracket.

Author: Laurenz Albe <[email protected]>
Reported-By: Dominique Devienne <[email protected]>
Reported-By: Stephan Springl <[email protected]>
Discussion: https://postgr.es/m/41a37137-f8bb-8fc5-2948-31b528f166dc%40bfw-online.de
Discussion: https://postgr.es/m/CAFCRh-8NwJd0jq6P%3DR3qhHyqU7hw0BTor3W0SvUcii24et%2BzAw%40mail.gmail.com
Backpatch-through: 13
---
 src/backend/utils/adt/regexp.c        | 14 ++++++++++++++
 src/test/regress/expected/strings.out |  9 +++++++++
 src/test/regress/sql/strings.sql      |  3 +++
 3 files changed, 26 insertions(+)

diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index 6e2864cbbda..b62d67a5a98 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -948,6 +948,12 @@ similar_escape_internal(text *pat_text, text *esc_text)
 				 */
 				*r++ = '\\';
 				*r++ = pchar;
+				/*
+				 * If we encounter an escaped character in a character class,
+				 * we are no longer at the beginning.
+				 */
+				if (charclass_depth > 0)
+					charclass_start = 3;
 			}
 			afterescape = false;
 		}
@@ -959,7 +965,11 @@ similar_escape_internal(text *pat_text, text *esc_text)
 		else if (charclass_depth > 0)
 		{
 			if (pchar == '\\')
+			{
 				*r++ = '\\';
+				/* we are no longer at the beginning of a character class */
+				charclass_start = 3;
+			}
 			*r++ = pchar;
 
 			/*
@@ -971,7 +981,11 @@ similar_escape_internal(text *pat_text, text *esc_text)
 			if (pchar == ']' && charclass_start > 2)
 				charclass_depth--;
 			else if (pchar == '[')
+			{
 				charclass_depth++;
+				/* we are no longer at the beginning of a character class */
+				charclass_start = 3;
+			}
 
 			/*
 			 * If there is a caret right after the opening bracket, it negates
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index ba302da51e7..2d6cb02ad60 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -693,6 +693,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[^^]^';
    Filter: (f1 ~ '^(?:[^^]\^)$'::text)
 (2 rows)
 
+-- Closing square bracket after an escape sequence at the beginning of
+-- a character closes the character class
+EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[|a]%' ESCAPE '|';
+              QUERY PLAN               
+---------------------------------------
+ Seq Scan on text_tbl
+   Filter: (f1 ~ '^(?:[\a].*)$'::text)
+(2 rows)
+
 -- Test backslash escapes in regexp_replace's replacement string
 SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})', E'(\\1) \\2-\\3');
  regexp_replace 
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index b94004cc08c..5ed421d6205 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -218,6 +218,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[]%][^]%][^%]%';
 -- Closing square bracket effective after two carets at the beginning
 -- of character class.
 EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[^^]^';
+-- Closing square bracket after an escape sequence at the beginning of
+-- a character closes the character class
+EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[|a]%' ESCAPE '|';
 
 -- Test backslash escapes in regexp_replace's replacement string
 SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})', E'(\\1) \\2-\\3');
-- 
2.51.0



  [text/x-patch] v2-0002-Cosmetic-improvements-for-SIMILAR-TO-regex-conver.patch (5.1K, 3-v2-0002-Cosmetic-improvements-for-SIMILAR-TO-regex-conver.patch)
  download | inline diff:
From 3c18418ccc3a8fe2e0276d229ff51d3f29e766a4 Mon Sep 17 00:00:00 2001
From: Laurenz Albe <[email protected]>
Date: Sat, 13 Sep 2025 10:51:35 +0200
Subject: [PATCH v2 2/2] Cosmetic improvements for SIMILAR TO regex conversion

The bug introduced in e3ffc3e91d indicates that the code in that area
should be more readable.  This patch renames the variables
"charclass_depth" and "charclass_start" to something more meaningful,
adds and improves comments and rewrites an "if" cascade to be more
consistent.

This patch is purely cosmetic and should not change the logic.

Author: Laurenz Albe <[email protected]>
Discussion: https://postgr.es/m/flat/2673230.1757722323%40sss.pgh.pa.us#831fd545c0faeebbfb08714eb4cc307d
Backpatch-through: 13
---
 src/backend/utils/adt/regexp.c | 76 ++++++++++++++++++++--------------
 1 file changed, 45 insertions(+), 31 deletions(-)

diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index b62d67a5a98..dcdb2e53f67 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -774,10 +774,8 @@ similar_escape_internal(text *pat_text, text *esc_text)
 				elen;
 	bool		afterescape = false;
 	int			nquotes = 0;
-	int			charclass_depth = 0;	/* Nesting level of character classes,
-										 * encompassed by square brackets */
-	int			charclass_start = 0;	/* State of the character class start,
-										 * for carets */
+	int			bracket_depth = 0;	/* square bracket nesting level */
+	int			charclass_pos = 0;	/* position inside a character class */
 
 	p = VARDATA_ANY(pat_text);
 	plen = VARSIZE_ANY_EXHDR(pat_text);
@@ -907,7 +905,7 @@ similar_escape_internal(text *pat_text, text *esc_text)
 		/* fast path */
 		if (afterescape)
 		{
-			if (pchar == '"' && charclass_depth < 1)	/* escape-double-quote? */
+			if (pchar == '"' && bracket_depth < 1)	/* escape-double-quote? */
 			{
 				/* emit appropriate part separator, per notes above */
 				if (nquotes == 0)
@@ -948,12 +946,13 @@ similar_escape_internal(text *pat_text, text *esc_text)
 				 */
 				*r++ = '\\';
 				*r++ = pchar;
+
 				/*
 				 * If we encounter an escaped character in a character class,
 				 * we are no longer at the beginning.
 				 */
-				if (charclass_depth > 0)
-					charclass_start = 3;
+				if (bracket_depth > 0)
+					charclass_pos = 3;
 			}
 			afterescape = false;
 		}
@@ -962,49 +961,64 @@ similar_escape_internal(text *pat_text, text *esc_text)
 			/* SQL escape character; do not send to output */
 			afterescape = true;
 		}
-		else if (charclass_depth > 0)
+		else if (bracket_depth > 0)
 		{
 			if (pchar == '\\')
 			{
+				/* if backslash is no SIMILAR TO escape character, double it */
 				*r++ = '\\';
 				/* we are no longer at the beginning of a character class */
-				charclass_start = 3;
+				charclass_pos = 3;
 			}
 			*r++ = pchar;
 
-			/*
-			 * Ignore a closing bracket at the start of a character class.
-			 * Such a bracket is taken literally rather than closing the
-			 * class.  "charclass_start" is 1 right at the beginning of a
-			 * class and 2 after an initial caret.
+			/*----------
+			 * "charclass_pos" describes where in a character class we are:
+			 *   0: not in a character class
+			 *   1: right after the opening '[' (a following '^' will negate
+			 *      the class, and ']' is a normal character)
+			 *   2: right after a '^' after the opening '[' (']' is a normal
+			 *      character)
+			 *   3 or more: further inside the character class
 			 */
-			if (pchar == ']' && charclass_start > 2)
-				charclass_depth--;
+			if (pchar == ']' && charclass_pos > 2)
+			{
+				/* found the real end of a bracket pair */
+				bracket_depth--;
+				/* don't reset charclass_pos, this may be an inner bracket */
+			}
 			else if (pchar == '[')
 			{
-				charclass_depth++;
+				/* start of a nested bracket pair */
+				bracket_depth++;
 				/* we are no longer at the beginning of a character class */
-				charclass_start = 3;
+				charclass_pos = 3;
+			}
+			else if (pchar == '^')
+			{
+				/*
+				 * A caret right after the opening bracket negates the
+				 * character class.  In that case, the following will
+				 * increment charclass_pos from 1 to 2, so that a following
+				 * ']' is still a normal character and does not close the
+				 * character class. If we are further inside a character
+				 * class, charclass_pos might get incremented past 3, which is
+				 * fine.
+				 */
+				charclass_pos++;
 			}
-
-			/*
-			 * If there is a caret right after the opening bracket, it negates
-			 * the character class, but a following closing bracket should
-			 * still be treated as a normal character.  That holds only for
-			 * the first caret, so only the values 1 and 2 mean that closing
-			 * brackets should be taken literally.
-			 */
-			if (pchar == '^')
-				charclass_start++;
 			else
-				charclass_start = 3;	/* definitely past the start */
+			{
+				/* a leading ']' or anything else take us past the beginning */
+				charclass_pos = 3;
+			}
 		}
 		else if (pchar == '[')
 		{
 			/* start of a character class */
 			*r++ = pchar;
-			charclass_depth++;
-			charclass_start = 1;
+			bracket_depth = 1;
+			charclass_pos = 1;
 		}
 		else if (pchar == '%')
 		{
-- 
2.51.0



^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Latest patches break one of our unit-test, related to RLS
  2025-09-13 09:11 Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]>
@ 2025-09-13 21:00 ` Tom Lane <[email protected]>
  2025-09-13 22:38   ` Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]>
  2025-09-15 23:59   ` Re: Latest patches break one of our unit-test, related to RLS Michael Paquier <[email protected]>
  0 siblings, 2 replies; 4+ messages in thread

From: Tom Lane @ 2025-09-13 21:00 UTC (permalink / raw)
  To: Laurenz Albe <[email protected]>; +Cc: Dominique Devienne <[email protected]>; pgsql-general

Laurenz Albe <[email protected]> writes:
> I came up with the attached patch set.

I did some more work on the comments, adjusted a couple of places that
could be simplified, and pushed it.

> I used two separate patches for clarity and ease of review, but both
> should get backpatched.

I didn't really love the "fix it and then explain it afterward"
approach.  It's hard to review a patch if you don't understand the
logic.  I considered swapping the order of the two patches, but
eventually just merged them into one.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Latest patches break one of our unit-test, related to RLS
  2025-09-13 09:11 Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]>
  2025-09-13 21:00 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
@ 2025-09-13 22:38   ` Laurenz Albe <[email protected]>
  1 sibling, 0 replies; 4+ messages in thread

From: Laurenz Albe @ 2025-09-13 22:38 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Dominique Devienne <[email protected]>; pgsql-general

On Sat, 2025-09-13 at 17:00 -0400, Tom Lane wrote:
> Laurenz Albe <[email protected]> writes:
> > I came up with the attached patch set.
> 
> I did some more work on the comments, adjusted a couple of places that
> could be simplified, and pushed it.

Thank you!

> > I used two separate patches for clarity and ease of review, but both
> > should get backpatched.
> 
> I didn't really love the "fix it and then explain it afterward"
> approach.  It's hard to review a patch if you don't understand the
> logic.  I considered swapping the order of the two patches, but
> eventually just merged them into one.

Yes, having the refactoring patch first might have been better.

Yours,
Laurenz Albe






^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Latest patches break one of our unit-test, related to RLS
  2025-09-13 09:11 Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]>
  2025-09-13 21:00 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]>
@ 2025-09-15 23:59   ` Michael Paquier <[email protected]>
  1 sibling, 0 replies; 4+ messages in thread

From: Michael Paquier @ 2025-09-15 23:59 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Laurenz Albe <[email protected]>; Dominique Devienne <[email protected]>; pgsql-general

On Sat, Sep 13, 2025 at 05:00:20PM -0400, Tom Lane wrote:
> Laurenz Albe <[email protected]> writes:
> > I came up with the attached patch set.
> 
> I did some more work on the comments, adjusted a couple of places that
> could be simplified, and pushed it.

My apologies for the silence here.

The timing of the events is interesting.  I have been pinged about
this issue on Friday evening/night time here:
https://www.postgresql.org/message-id/[email protected]

For a fix applied as of cdf7feb96562 roughly 24 hours after this ping.
The timing was a bit bad for me, because I was already gone for what
was a long weekend in Japan, just back today.  So it was a bit hard
for me to look at anything

> I didn't really love the "fix it and then explain it afterward"
> approach.  It's hard to review a patch if you don't understand the
> logic.  I considered swapping the order of the two patches, but
> eventually just merged them into one.

Merging both things makes sense as well here.

Ugh, yes.  That was wrong.  With the fix, '%' is translated:
-   Filter: (f1 ~ '^(?:[\a].*)$'::text)
Before the fix, not translated.
+   Filter: (f1 ~ '^(?:[\a]%)$'::text)

Thanks for the report, the analysis, and the commit.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2025-09-15 23:59 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-09-13 09:11 Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]>
2025-09-13 21:00 ` Tom Lane <[email protected]>
2025-09-13 22:38   ` Laurenz Albe <[email protected]>
2025-09-15 23:59   ` Michael Paquier <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox