public inbox for [email protected]help / color / mirror / Atom feed
Re: Latest patches break one of our unit-test, related to RLS 4+ messages / 3 participants [nested] [flat]
* Re: Latest patches break one of our unit-test, related to RLS @ 2025-09-13 09:11 Laurenz Albe <[email protected]> 2025-09-13 21:00 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Laurenz Albe @ 2025-09-13 09:11 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Dominique Devienne <[email protected]>; pgsql-general On Fri, 2025-09-12 at 21:53 -0400, Tom Lane wrote: > > Ah, got it. But this logic definitely deserves more comments. > What do you think of something like > > if (pchar == ']' && charclass_start > 2) > { > /* found the real end of a bracket pair */ > charclass_depth--; > /* past start of outer brackets, so keep charclass_start > 2 */ > } > else if (pchar == '[') > { > /* start of a nested bracket pair */ > charclass_depth++; > /* leading ^ or ] in this context is not special */ > charclass_start = 3; > } > else if (pchar == '^') > { > /* okay to increment charclass_start even if already > 1 */ > charclass_start++; > } > else > { > /* otherwise (including case of leading ']') */ > charclass_start = 3; /* definitely past the start */ > } > > > Perhaps s/charclass_depth/bracket_depth/ would be a good idea. > > Wouldn't object to that. Maybe we can think of a better name for > "charclass_start" too --- that sounds like a boolean condition. > The best I'm coming up with right now is "charclass_count", but > that's not that helpful. I came up with the attached patch set. 0001 is the actual bug fix: in addition to my previous patch, I fixed two more cases in which a closing bracket might not be recognized as ending the character class (one is from your patch above). I think that these could not lead to bad query results, but I am not certain. 0002 is the cosmetic improvement: I renamed "charclass_depth" to "bracket_depth" and "bracket_depth" to "bracket_pos", rewrote the "else if" cascade as you suggested above and put some love into additional comments. I used two separate patches for clarity and ease of review, but both should get backpatched. Yours, Laurenz Albe Attachments: [text/x-patch] v2-0001-Amend-recent-fix-for-SIMILAR-TO-regex-conversion.patch (4.1K, 2-v2-0001-Amend-recent-fix-for-SIMILAR-TO-regex-conversion.patch) download | inline diff: From dd37acb30b637c306f2ae7f041c164e892e92a22 Mon Sep 17 00:00:00 2001 From: Laurenz Albe <[email protected]> Date: Sat, 13 Sep 2025 09:59:47 +0200 Subject: [PATCH v2 1/2] Amend recent fix for SIMILAR TO regex conversion Commit e3ffc3e91d fixed the translation of character classes in SIMILAR TO regular expressions. Unfortunately the fix broke a corner case: if there is an escape character right after the opening bracket, (for example in "[\q]") a closing bracket right after the escape sequence would not be seen as closing the character class. There were two more oversights: a backslash or a nested opening bracket right at the beginning of a character class should remove the special meaning from any following caret or closing bracket. Author: Laurenz Albe <[email protected]> Reported-By: Dominique Devienne <[email protected]> Reported-By: Stephan Springl <[email protected]> Discussion: https://postgr.es/m/41a37137-f8bb-8fc5-2948-31b528f166dc%40bfw-online.de Discussion: https://postgr.es/m/CAFCRh-8NwJd0jq6P%3DR3qhHyqU7hw0BTor3W0SvUcii24et%2BzAw%40mail.gmail.com Backpatch-through: 13 --- src/backend/utils/adt/regexp.c | 14 ++++++++++++++ src/test/regress/expected/strings.out | 9 +++++++++ src/test/regress/sql/strings.sql | 3 +++ 3 files changed, 26 insertions(+) diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c index 6e2864cbbda..b62d67a5a98 100644 --- a/src/backend/utils/adt/regexp.c +++ b/src/backend/utils/adt/regexp.c @@ -948,6 +948,12 @@ similar_escape_internal(text *pat_text, text *esc_text) */ *r++ = '\\'; *r++ = pchar; + /* + * If we encounter an escaped character in a character class, + * we are no longer at the beginning. + */ + if (charclass_depth > 0) + charclass_start = 3; } afterescape = false; } @@ -959,7 +965,11 @@ similar_escape_internal(text *pat_text, text *esc_text) else if (charclass_depth > 0) { if (pchar == '\\') + { *r++ = '\\'; + /* we are no longer at the beginning of a character class */ + charclass_start = 3; + } *r++ = pchar; /* @@ -971,7 +981,11 @@ similar_escape_internal(text *pat_text, text *esc_text) if (pchar == ']' && charclass_start > 2) charclass_depth--; else if (pchar == '[') + { charclass_depth++; + /* we are no longer at the beginning of a character class */ + charclass_start = 3; + } /* * If there is a caret right after the opening bracket, it negates diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out index ba302da51e7..2d6cb02ad60 100644 --- a/src/test/regress/expected/strings.out +++ b/src/test/regress/expected/strings.out @@ -693,6 +693,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[^^]^'; Filter: (f1 ~ '^(?:[^^]\^)$'::text) (2 rows) +-- Closing square bracket after an escape sequence at the beginning of +-- a character closes the character class +EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[|a]%' ESCAPE '|'; + QUERY PLAN +--------------------------------------- + Seq Scan on text_tbl + Filter: (f1 ~ '^(?:[\a].*)$'::text) +(2 rows) + -- Test backslash escapes in regexp_replace's replacement string SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})', E'(\\1) \\2-\\3'); regexp_replace diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql index b94004cc08c..5ed421d6205 100644 --- a/src/test/regress/sql/strings.sql +++ b/src/test/regress/sql/strings.sql @@ -218,6 +218,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[]%][^]%][^%]%'; -- Closing square bracket effective after two carets at the beginning -- of character class. EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[^^]^'; +-- Closing square bracket after an escape sequence at the beginning of +-- a character closes the character class +EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[|a]%' ESCAPE '|'; -- Test backslash escapes in regexp_replace's replacement string SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})', E'(\\1) \\2-\\3'); -- 2.51.0 [text/x-patch] v2-0002-Cosmetic-improvements-for-SIMILAR-TO-regex-conver.patch (5.1K, 3-v2-0002-Cosmetic-improvements-for-SIMILAR-TO-regex-conver.patch) download | inline diff: From 3c18418ccc3a8fe2e0276d229ff51d3f29e766a4 Mon Sep 17 00:00:00 2001 From: Laurenz Albe <[email protected]> Date: Sat, 13 Sep 2025 10:51:35 +0200 Subject: [PATCH v2 2/2] Cosmetic improvements for SIMILAR TO regex conversion The bug introduced in e3ffc3e91d indicates that the code in that area should be more readable. This patch renames the variables "charclass_depth" and "charclass_start" to something more meaningful, adds and improves comments and rewrites an "if" cascade to be more consistent. This patch is purely cosmetic and should not change the logic. Author: Laurenz Albe <[email protected]> Discussion: https://postgr.es/m/flat/2673230.1757722323%40sss.pgh.pa.us#831fd545c0faeebbfb08714eb4cc307d Backpatch-through: 13 --- src/backend/utils/adt/regexp.c | 76 ++++++++++++++++++++-------------- 1 file changed, 45 insertions(+), 31 deletions(-) diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c index b62d67a5a98..dcdb2e53f67 100644 --- a/src/backend/utils/adt/regexp.c +++ b/src/backend/utils/adt/regexp.c @@ -774,10 +774,8 @@ similar_escape_internal(text *pat_text, text *esc_text) elen; bool afterescape = false; int nquotes = 0; - int charclass_depth = 0; /* Nesting level of character classes, - * encompassed by square brackets */ - int charclass_start = 0; /* State of the character class start, - * for carets */ + int bracket_depth = 0; /* square bracket nesting level */ + int charclass_pos = 0; /* position inside a character class */ p = VARDATA_ANY(pat_text); plen = VARSIZE_ANY_EXHDR(pat_text); @@ -907,7 +905,7 @@ similar_escape_internal(text *pat_text, text *esc_text) /* fast path */ if (afterescape) { - if (pchar == '"' && charclass_depth < 1) /* escape-double-quote? */ + if (pchar == '"' && bracket_depth < 1) /* escape-double-quote? */ { /* emit appropriate part separator, per notes above */ if (nquotes == 0) @@ -948,12 +946,13 @@ similar_escape_internal(text *pat_text, text *esc_text) */ *r++ = '\\'; *r++ = pchar; + /* * If we encounter an escaped character in a character class, * we are no longer at the beginning. */ - if (charclass_depth > 0) - charclass_start = 3; + if (bracket_depth > 0) + charclass_pos = 3; } afterescape = false; } @@ -962,49 +961,64 @@ similar_escape_internal(text *pat_text, text *esc_text) /* SQL escape character; do not send to output */ afterescape = true; } - else if (charclass_depth > 0) + else if (bracket_depth > 0) { if (pchar == '\\') { + /* if backslash is no SIMILAR TO escape character, double it */ *r++ = '\\'; /* we are no longer at the beginning of a character class */ - charclass_start = 3; + charclass_pos = 3; } *r++ = pchar; - /* - * Ignore a closing bracket at the start of a character class. - * Such a bracket is taken literally rather than closing the - * class. "charclass_start" is 1 right at the beginning of a - * class and 2 after an initial caret. + /*---------- + * "charclass_pos" describes where in a character class we are: + * 0: not in a character class + * 1: right after the opening '[' (a following '^' will negate + * the class, and ']' is a normal character) + * 2: right after a '^' after the opening '[' (']' is a normal + * character) + * 3 or more: further inside the character class */ - if (pchar == ']' && charclass_start > 2) - charclass_depth--; + if (pchar == ']' && charclass_pos > 2) + { + /* found the real end of a bracket pair */ + bracket_depth--; + /* don't reset charclass_pos, this may be an inner bracket */ + } else if (pchar == '[') { - charclass_depth++; + /* start of a nested bracket pair */ + bracket_depth++; /* we are no longer at the beginning of a character class */ - charclass_start = 3; + charclass_pos = 3; + } + else if (pchar == '^') + { + /* + * A caret right after the opening bracket negates the + * character class. In that case, the following will + * increment charclass_pos from 1 to 2, so that a following + * ']' is still a normal character and does not close the + * character class. If we are further inside a character + * class, charclass_pos might get incremented past 3, which is + * fine. + */ + charclass_pos++; } - - /* - * If there is a caret right after the opening bracket, it negates - * the character class, but a following closing bracket should - * still be treated as a normal character. That holds only for - * the first caret, so only the values 1 and 2 mean that closing - * brackets should be taken literally. - */ - if (pchar == '^') - charclass_start++; else - charclass_start = 3; /* definitely past the start */ + { + /* a leading ']' or anything else take us past the beginning */ + charclass_pos = 3; + } } else if (pchar == '[') { /* start of a character class */ *r++ = pchar; - charclass_depth++; - charclass_start = 1; + bracket_depth = 1; + charclass_pos = 1; } else if (pchar == '%') { -- 2.51.0 ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Latest patches break one of our unit-test, related to RLS 2025-09-13 09:11 Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]> @ 2025-09-13 21:00 ` Tom Lane <[email protected]> 2025-09-13 22:38 ` Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]> 2025-09-15 23:59 ` Re: Latest patches break one of our unit-test, related to RLS Michael Paquier <[email protected]> 0 siblings, 2 replies; 4+ messages in thread From: Tom Lane @ 2025-09-13 21:00 UTC (permalink / raw) To: Laurenz Albe <[email protected]>; +Cc: Dominique Devienne <[email protected]>; pgsql-general Laurenz Albe <[email protected]> writes: > I came up with the attached patch set. I did some more work on the comments, adjusted a couple of places that could be simplified, and pushed it. > I used two separate patches for clarity and ease of review, but both > should get backpatched. I didn't really love the "fix it and then explain it afterward" approach. It's hard to review a patch if you don't understand the logic. I considered swapping the order of the two patches, but eventually just merged them into one. regards, tom lane ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Latest patches break one of our unit-test, related to RLS 2025-09-13 09:11 Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]> 2025-09-13 21:00 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]> @ 2025-09-13 22:38 ` Laurenz Albe <[email protected]> 1 sibling, 0 replies; 4+ messages in thread From: Laurenz Albe @ 2025-09-13 22:38 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Dominique Devienne <[email protected]>; pgsql-general On Sat, 2025-09-13 at 17:00 -0400, Tom Lane wrote: > Laurenz Albe <[email protected]> writes: > > I came up with the attached patch set. > > I did some more work on the comments, adjusted a couple of places that > could be simplified, and pushed it. Thank you! > > I used two separate patches for clarity and ease of review, but both > > should get backpatched. > > I didn't really love the "fix it and then explain it afterward" > approach. It's hard to review a patch if you don't understand the > logic. I considered swapping the order of the two patches, but > eventually just merged them into one. Yes, having the refactoring patch first might have been better. Yours, Laurenz Albe ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Latest patches break one of our unit-test, related to RLS 2025-09-13 09:11 Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]> 2025-09-13 21:00 ` Re: Latest patches break one of our unit-test, related to RLS Tom Lane <[email protected]> @ 2025-09-15 23:59 ` Michael Paquier <[email protected]> 1 sibling, 0 replies; 4+ messages in thread From: Michael Paquier @ 2025-09-15 23:59 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Laurenz Albe <[email protected]>; Dominique Devienne <[email protected]>; pgsql-general On Sat, Sep 13, 2025 at 05:00:20PM -0400, Tom Lane wrote: > Laurenz Albe <[email protected]> writes: > > I came up with the attached patch set. > > I did some more work on the comments, adjusted a couple of places that > could be simplified, and pushed it. My apologies for the silence here. The timing of the events is interesting. I have been pinged about this issue on Friday evening/night time here: https://www.postgresql.org/message-id/[email protected] For a fix applied as of cdf7feb96562 roughly 24 hours after this ping. The timing was a bit bad for me, because I was already gone for what was a long weekend in Japan, just back today. So it was a bit hard for me to look at anything > I didn't really love the "fix it and then explain it afterward" > approach. It's hard to review a patch if you don't understand the > logic. I considered swapping the order of the two patches, but > eventually just merged them into one. Merging both things makes sense as well here. Ugh, yes. That was wrong. With the fix, '%' is translated: - Filter: (f1 ~ '^(?:[\a].*)$'::text) Before the fix, not translated. + Filter: (f1 ~ '^(?:[\a]%)$'::text) Thanks for the report, the analysis, and the commit. -- Michael Attachments: [application/pgp-signature] signature.asc (833B, 2-signature.asc) download ^ permalink raw reply [nested|flat] 4+ messages in thread
end of thread, other threads:[~2025-09-15 23:59 UTC | newest] Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2025-09-13 09:11 Re: Latest patches break one of our unit-test, related to RLS Laurenz Albe <[email protected]> 2025-09-13 21:00 ` Tom Lane <[email protected]> 2025-09-13 22:38 ` Laurenz Albe <[email protected]> 2025-09-15 23:59 ` Michael Paquier <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox