From 160b6367df968c3247ad8a35a26175b01acd8a9d Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii Date: Sun, 15 Feb 2026 17:47:49 +0900 Subject: [PATCH v43 6/8] Row pattern recognition patch (docs). --- doc/src/sgml/advanced.sgml | 143 ++++++++++++++++++++++++++++- doc/src/sgml/func/func-window.sgml | 53 +++++++++++ doc/src/sgml/ref/select.sgml | 88 +++++++++++++++++- 3 files changed, 277 insertions(+), 7 deletions(-) diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml index 451bcb202ec..a76fb263a94 100644 --- a/doc/src/sgml/advanced.sgml +++ b/doc/src/sgml/advanced.sgml @@ -540,13 +540,148 @@ WHERE pos < 3; two rows for each department). + + Row pattern common syntax can be used to perform row pattern recognition + in a query. The row pattern common syntax includes two sub + clauses: DEFINE + and PATTERN. DEFINE defines + definition variables along with an expression. The expression must be a + logical expression, which means it must + return TRUE, FALSE + or NULL. The expression may comprise column references + and functions. Window functions, aggregate functions and subqueries are + not allowed. An example of DEFINE is as follows. + + +DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) + + + Note that PREV returns the price + column in the previous row if it's called in a context of row pattern + recognition. Thus in the second line the definition variable "UP" + is TRUE when the price column in the current row is + greater than the price column in the previous row. Likewise, "DOWN" + is TRUE when the + price column in the current row is lower than + the price column in the previous row. + + + Once DEFINE exists, PATTERN can be + used. PATTERN defines a sequence of rows that satisfies + conditions defined in the DEFINE clause. 
For example, the + following PATTERN defines a sequence of rows starting + with a row satisfying "LOWPRICE", then one or more rows satisfying + "UP" and finally one or more rows satisfying "DOWN". Pattern variables + can be followed by quantifiers: "+" means one or more matches, + "*" means zero or more matches, "?" means zero or one match (optional), + "{n}" means exactly n matches, "{n,}" means at least n matches, + "{,m}" means at most m matches, and "{n,m}" means between n and m matches. + Patterns can be grouped using parentheses and combined using alternation + (the vertical bar "|" for OR). For example, "(UP DOWN)+" matches one or + more repetitions of UP followed by DOWN. + If a sequence of rows that satisfies the PATTERN is found, all + columns and functions in the target list are shown for the starting + row. Note that aggregations only look into the matched rows, rather than + the whole frame. On the second and subsequent rows of a match all window functions are + shown as NULL. Aggregates are NULL or 0 depending on the aggregate's + definition. A count() aggregate shows 0. For rows that do not match the + PATTERN, columns are shown as NULL too. An example of + a SELECT using the DEFINE + and PATTERN clauses is as follows. 
+ + +SELECT company, tdate, price, + first_value(price) OVER w, + max(price) OVER w, + count(price) OVER w +FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + + + company | tdate | price | first_value | max | count +----------+------------+-------+-------------+-----+------- + company1 | 2023-07-01 | 100 | 100 | 200 | 4 + company1 | 2023-07-02 | 200 | | | 0 + company1 | 2023-07-03 | 150 | | | 0 + company1 | 2023-07-04 | 140 | | | 0 + company1 | 2023-07-05 | 150 | | | 0 + company1 | 2023-07-06 | 90 | 90 | 130 | 4 + company1 | 2023-07-07 | 110 | | | 0 + company1 | 2023-07-08 | 130 | | | 0 + company1 | 2023-07-09 | 120 | | | 0 + company1 | 2023-07-10 | 130 | | | 0 +(10 rows) + + + + + Row pattern recognition internally uses a nondeterministic finite + automaton (NFA) to match patterns. For patterns with unbounded + quantifiers (e.g., A+ or (A B)+), + the NFA may need to track many active matching contexts simultaneously, + which can lead to O(n^2) + complexity as the number of rows increases. + + + + Before execution, PostgreSQL automatically + optimizes patterns to simplify their structure. This includes flattening + nested sequences and alternations, merging consecutive identical variables + (e.g., A{2,3} A{1,2} becomes A{3,5}), + removing duplicate alternatives + (e.g., (A | B | A) becomes (A | B)), + and simplifying nested quantifiers + (e.g., (A*)* becomes A*). + These optimizations reduce pattern complexity and also decrease + nesting depth, making the 253-level depth limit rarely encountered. + They are applied transparently and can be observed + in EXPLAIN output. + + + + To mitigate the potential O(n^2) cost of tracking many matching + contexts, PostgreSQL employs + a context absorption optimization. 
When a pattern starts with a greedy + unbounded element, newer matching contexts cannot produce longer matches + than older contexts. By detecting and eliminating these redundant + contexts, the matching complexity is reduced from + O(n^2) to O(n) for many common patterns. + + + + When examining query plans for row pattern recognition with + EXPLAIN, the pattern output may include special + markers that indicate optimization opportunities. A double quote + " marks where pattern absorption can occur, + and a single quote ' marks absorbable elements + within a branch. For example, a+" indicates that + repeated matches of a can be absorbed, while + (a' b')+" shows that both a + and b within the group are absorbable. + These markers are primarily useful for understanding internal + optimization behavior. + + When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is - duplicative and error-prone if the same windowing behavior is wanted - for several functions. Instead, each windowing behavior can be named - in a WINDOW clause and then referenced in OVER. - For example: + duplicative and error-prone if the same windowing behavior is wanted for + several functions. Instead, each windowing behavior can be named in + a WINDOW clause and then referenced + in OVER. For example: SELECT sum(salary) OVER w, avg(salary) OVER w diff --git a/doc/src/sgml/func/func-window.sgml b/doc/src/sgml/func/func-window.sgml index bcf755c9ebc..ae36e0f3135 100644 --- a/doc/src/sgml/func/func-window.sgml +++ b/doc/src/sgml/func/func-window.sgml @@ -278,6 +278,59 @@ nth_value. + + Row pattern recognition navigation functions are listed in + . These functions + can be used in the DEFINE clause of row pattern recognition. 
+ + + + Row Pattern Navigation Functions + + + + + Function + + + Description + + + + + + + + + prev + + prev ( value anyelement ) + anyelement + + + Returns the column value at the previous row; + returns NULL if there is no previous row in the window frame. + + + + + + + next + + next ( value anyelement ) + anyelement + + + Returns the column value at the next row; + returns NULL if there is no next row in the window frame. + + + + + +
+ The SQL standard defines a FROM FIRST or FROM LAST diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index ca5dd14d627..f0676bf6f2c 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -979,8 +979,8 @@ WINDOW window_name AS ( frame_clause can be one of -{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ] -{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ] +{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ] [ row_pattern_common_syntax ] +{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ] [ row_pattern_common_syntax ] where frame_start @@ -1087,9 +1087,91 @@ EXCLUDE NO OTHERS a given peer group will be in the frame or excluded from it. + + The + optional row_pattern_common_syntax + defines the row pattern recognition condition for + this + window. row_pattern_common_syntax + includes the following subclauses. + + +[ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ] +[ INITIAL | SEEK ] +PATTERN ( pattern_variable_name [ quantifier ] [ ... ] ) +DEFINE definition_variable_name AS expression [, ...] + + AFTER MATCH SKIP PAST LAST ROW or AFTER MATCH + SKIP TO NEXT ROW controls where pattern matching resumes + after a match is found. With AFTER MATCH SKIP PAST LAST + ROW (the default), matching resumes at the row following the last row of + the previous match. With AFTER MATCH SKIP TO NEXT + ROW, matching resumes at the row following the first row of the previous + match. INITIAL or SEEK specifies the + row in the frame from which a successful pattern match may + start. If INITIAL is specified, the match must start + from the first row in the frame. If SEEK is specified, + the set of matching rows does not necessarily start from the first row. The + default is INITIAL. Currently + only INITIAL is supported. DEFINE + defines definition variables along with a boolean + expression. 
PATTERN defines a sequence of rows that + satisfies certain conditions using variables defined + in the DEFINE clause. Each pattern variable can be + followed by a quantifier to specify how many times it should match: + * (zero or more), + + (one or more), + ? (zero or one), + {n} (exactly n times), + {n,} (at least n times), + {,m} (at most m times), or + {n,m} + (between n and m times). + Reluctant quantifiers (e.g., *?, +?, + ??, {n,m}?) + are not supported. + Patterns can be grouped using parentheses, and alternation (OR) can be + expressed using the vertical bar |. + For example, (A B)+ matches one or more repetitions + of the sequence A followed by B, and A | B matches + either A or B. + If a pattern variable is not defined in + the DEFINE clause, it is not automatically added + to the DEFINE clause. Instead, the executor evaluates + the variable as TRUE at execution time, behaving as if + the following definition existed. + + +variable_name AS TRUE + + + Conversely, variables defined in the DEFINE clause + but not used in the PATTERN clause are filtered out + during query planning. + + + + Note that the maximum number of unique pattern variables + used in the PATTERN clause is 251. + If this limit is exceeded, an error will be raised. + Additionally, the maximum nesting depth of pattern groups + (parentheses) is 253 levels. + However, pattern optimizations such as flattening nested sequences + and simplifying nested quantifiers may reduce the effective depth, + so this limit is rarely reached in practice. + + + + The SQL standard defines more subclauses: MEASURES + and SUBSET. They are not currently supported + in PostgreSQL. The standard also defines + more variations of the AFTER MATCH clause. + + The purpose of a WINDOW clause is to specify the - behavior of window functions appearing in the query's + behavior of window functions appearing in the + query's SELECT list or ORDER BY clause. These functions -- 2.43.0
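The matching behavior documented in the advanced.sgml hunk can be checked outside the database with a small sketch. The Python below is not the patch's NFA implementation. It is a hand-written greedy matcher, hard-coded for PATTERN (LOWPRICE UP+ DOWN+) with AFTER MATCH SKIP PAST LAST ROW, and the names match_length and annotate are hypothetical helpers introduced only for illustration. It assumes a single partition already ordered by tdate and reproduces the first_value, max and count columns of the documented example.

```python
from typing import List, Optional, Tuple

# (first_value, max, count) as shown per row in the documented output
Row = Tuple[Optional[int], Optional[int], int]


def match_length(prices: List[int], start: int) -> int:
    """Rows matched by LOWPRICE UP+ DOWN+ beginning at `start`, or 0 if none.

    Greedy matching is sufficient for this particular pattern because
    UP and DOWN are mutually exclusive conditions on adjacent rows.
    """
    n = len(prices)
    if prices[start] > 100:                      # LOWPRICE AS price <= 100
        return 0
    i = start + 1
    ups = 0
    while i < n and prices[i] > prices[i - 1]:   # UP AS price > PREV(price)
        i += 1
        ups += 1
    if ups == 0:
        return 0
    downs = 0
    while i < n and prices[i] < prices[i - 1]:   # DOWN AS price < PREV(price)
        i += 1
        downs += 1
    if downs == 0:
        return 0
    return i - start


def annotate(prices: List[int]) -> List[Row]:
    """Emulate AFTER MATCH SKIP PAST LAST ROW over one ordered partition."""
    out: List[Row] = []
    i = 0
    while i < len(prices):
        length = match_length(prices, i)
        if length == 0:
            out.append((None, None, 0))          # unmatched row: NULLs, count 0
            i += 1
            continue
        matched = prices[i:i + length]
        # Only the starting row of a match carries the aggregate values.
        out.append((matched[0], max(matched), length))
        out.extend((None, None, 0) for _ in range(length - 1))
        i += length                              # resume after the last matched row
    return out


# Prices for company1 from the documented example, in tdate order.
prices = [100, 200, 150, 140, 150, 90, 110, 130, 120, 130]
rows = annotate(prices)
```

With the documented prices, rows[0] is (100, 200, 4) and rows[5] is (90, 130, 4); every other row is (None, None, 0), matching the example table. Other patterns, SKIP TO NEXT ROW, or alternation would need a general matcher rather than this hard-coded one.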