From 9c7e5bc9dde1adafcff995370340975661890d3c Mon Sep 17 00:00:00 2001 From: Henson Choi Date: Fri, 19 Jun 2026 13:43:58 +0900 Subject: [PATCH 12/13] Improve comments, documentation, and naming for row pattern recognition A batch of clarity cleanups for row pattern recognition, none of which change behavior or query output: - Call the parser-built PATTERN structure a "parse tree" rather than an "AST", which is not PostgreSQL's usual term (parsenodes.h, parse_rpr.c, plan/rpr.c, and the executor README). - Trim the transformDefineClause header comment, whose step list merely duplicated the inline comments. - Drop a duplicated standard-citation sentence from the contain_rpr_walker header; transformWithClause already carries it. - Fix a stale "ALT_START" reference to name the ALT marker, and normalize an "or -1" ParseLoc comment to the usual "or -1 if unknown". - Move the EMPTY-alternative explanation in row_pattern_quantifier_opt into the action block, keeping the /*EMPTY*/ marker. - Reword the Run Condition pushdown test comment to explain in SQL terms why count(*) is DECREASING over the required frame. - Rename RPR_COUNT_MAX to RPR_COUNT_INF, defined from RPR_QUANTITY_INF. The old "MAX" name suggested a configurable repetition limit -- that is elem->max, a different thing -- which led to questions about erroring when a count reaches it. The value is the int32 saturation ceiling, and a saturated count reads as "unbounded". --- src/backend/executor/README.rpr | 44 +++++++++++------------ src/backend/executor/execRPR.c | 23 ++++++------ src/backend/optimizer/plan/rpr.c | 23 ++++++------ src/backend/parser/gram.y | 7 ++-- src/backend/parser/parse_cte.c | 4 +-- src/backend/parser/parse_rpr.c | 19 ++-------- src/include/nodes/parsenodes.h | 12 +++---- src/include/optimizer/rpr.h | 9 ++++- src/test/regress/expected/rpr_explain.out | 12 ++++--- src/test/regress/sql/rpr_explain.sql | 12 ++++--- 10 files changed, 84 insertions(+), 81 deletions(-) diff --git a/src/backend/executor/README.rpr b/src/backend/executor/README.rpr index 50d1ff87f7e..9275e265d4b 100644 --- a/src/backend/executor/README.rpr +++ b/src/backend/executor/README.rpr @@ -96,16 +96,16 @@ Chapter II Overall Processing Pipeline RPR processing is divided into three phases: - +------------------------------------------------------------+ - | 1. Parsing (Parser) | - | SQL text -> PATTERN AST + DEFINE expression tree | - | | - | 2. Compilation (Optimizer/Planner) | - | PATTERN AST -> optimization -> flat NFA element array | - | | - | 3. Execution (Executor) | - | Row-by-row matching via NFA simulation | - +------------------------------------------------------------+ + +--------------------------------------------------------------+ + | 1. Parsing (Parser) | + | SQL text -> PATTERN parse tree + DEFINE expression tree | + | | + | 2. Compilation (Optimizer/Planner) | + | PATTERN parse tree -> optimization -> flat NFA elements | + | | + | 3. Execution (Executor) | + | Row-by-row matching via NFA simulation | + +--------------------------------------------------------------+ Each phase uses independent data structures, and the interfaces between phases are well-defined: @@ -137,7 +137,7 @@ following: (3) DEFINE clause transformation (transformDefineClause) -III-2. PATTERN AST (Abstract Syntax Tree) +III-2. PATTERN parse tree The parser transforms the PATTERN clause into an RPRPatternNode tree. Each node has one of the following four types: @@ -192,16 +192,16 @@ IV-1. Entry Point IV-2. The 6 Phases of buildRPRPattern() - Phase 1: AST optimization (optimizeRPRPattern) + Phase 1: parse tree optimization (optimizeRPRPattern) Phase 2: Statistics collection (scanRPRPattern) Phase 3: Memory allocation (makeRPRPattern) Phase 4: NFA element fill (fillRPRPattern) Phase 5: Finalization (finalizeRPRPattern) Phase 6: Absorbability analysis (computeAbsorbability) -IV-3. Phase 1: AST Optimization +IV-3. Phase 1: Parse Tree Optimization -After copying the parser-generated AST, the following optimizations are +After copying the parser-generated parse tree, the following optimizations are applied: (a) SEQ flattening: Unwrap nested SEQ nodes @@ -236,7 +236,7 @@ applied: IV-4. Phase 4: NFA Element Array Generation -Transforms the optimized AST into a flat array of RPRPatternElement. +Transforms the optimized parse tree into a flat array of RPRPatternElement. This is the core data structure used for NFA simulation at runtime. RPRPatternElement struct (16 bytes): @@ -298,7 +298,7 @@ Element flags (1 byte, bitmask): Example: PATTERN (A+ B | C) - AST: ALT(SEQ(VAR(A,1,INF), VAR(B,1,1)), VAR(C,1,1)) + Parse tree: ALT(SEQ(VAR(A,1,INF), VAR(B,1,1)), VAR(C,1,1)) Compilation result: @@ -345,9 +345,9 @@ Example: PATTERN ((A B)+) IV-4a. Reluctant Flag (RPR_ELEM_RELUCTANT) -The reluctant flag is set during Phase 4 (fillRPRPattern) when the AST node -has reluctant == true. It reverses the priority of quantifier expansion at -runtime: +The reluctant flag is set during Phase 4 (fillRPRPattern) when the parse +tree node has reluctant == true. It reverses the priority of quantifier +expansion at runtime: Greedy (default): try loop-back first, then exit (prefer longer match) Reluctant: try exit first, then loop-back (prefer shorter match) @@ -1363,9 +1363,9 @@ XII-5. Execution Optimization Summary -- Compile-time -- - (1) AST Optimization (IV-3) + (1) Parse Tree Optimization (IV-3) - Simplifies the AST before converting the pattern to an NFA. + Simplifies the parse tree before converting the pattern to an NFA. Reduces the number of NFA elements through consecutive variable merging (A A -> A{2}), SEQ flattening, quantifier multiplication, and other transformations. @@ -1468,7 +1468,7 @@ Appendix A. Key Function Index transformRPR parse_rpr.c Parser entry point transformDefineClause parse_rpr.c DEFINE transformation buildRPRPattern rpr.c NFA compilation main - optimizeRPRPattern rpr.c AST optimization + optimizeRPRPattern rpr.c parse tree optimization fillRPRPattern rpr.c NFA element generation finalizeRPRPattern rpr.c Finalization computeAbsorbability rpr.c Absorption analysis diff --git a/src/backend/executor/execRPR.c b/src/backend/executor/execRPR.c index 90e3a068f04..e5e30d79f01 100644 --- a/src/backend/executor/execRPR.c +++ b/src/backend/executor/execRPR.c @@ -821,8 +821,11 @@ nfa_match(WindowAggState *winstate, RPRNFAContext *ctx, bool *varMatched) if (matched) { - /* Increment count */ - if (count < RPR_COUNT_MAX) + /* + * Increment count, saturating at RPR_COUNT_INF to avoid int32 + * overflow; a saturated count then compares as "unbounded". + */ + if (count < RPR_COUNT_INF) count++; /* Max constraint should not be exceeded */ @@ -857,7 +860,7 @@ nfa_match(WindowAggState *winstate, RPRNFAContext *ctx, bool *varMatched) int32 endCount = state->counts[endDepth]; /* Increment group count */ - if (endCount < RPR_COUNT_MAX) + if (endCount < RPR_COUNT_INF) endCount++; Assert(endElem->max == RPR_QUANTITY_INF || endCount <= endElem->max); @@ -900,7 +903,7 @@ nfa_match(WindowAggState *winstate, RPRNFAContext *ctx, bool *varMatched) state->counts[endDepth] = 0; /* Increment outer group count */ - if (outerCount < RPR_COUNT_MAX) + if (outerCount < RPR_COUNT_INF) outerCount++; Assert(outerEnd->max == RPR_QUANTITY_INF || outerCount <= outerEnd->max); @@ -1180,7 +1183,7 @@ nfa_advance_end(WindowAggState *winstate, RPRNFAContext *ctx, /* END->END: increment outer END's count */ if (RPRElemIsEnd(nextElem) && - ffState->counts[nextElem->depth] < RPR_COUNT_MAX) + ffState->counts[nextElem->depth] < RPR_COUNT_INF) ffState->counts[nextElem->depth]++; } @@ -1235,7 +1238,7 @@ nfa_advance_end(WindowAggState *winstate, RPRNFAContext *ctx, RPRElemIsAbsorbableBranch(nextElem); /* END->END: increment outer END's count */ - if (RPRElemIsEnd(nextElem) && state->counts[nextElem->depth] < RPR_COUNT_MAX) + if (RPRElemIsEnd(nextElem) && state->counts[nextElem->depth] < RPR_COUNT_INF) state->counts[nextElem->depth]++; nfa_route_to_elem(winstate, ctx, state, nextElem, currentPos); @@ -1262,7 +1265,7 @@ nfa_advance_end(WindowAggState *winstate, RPRNFAContext *ctx, nextElem = &elements[exitState->elemIdx]; /* END->END: increment outer END's count */ - if (RPRElemIsEnd(nextElem) && exitState->counts[nextElem->depth] < RPR_COUNT_MAX) + if (RPRElemIsEnd(nextElem) && exitState->counts[nextElem->depth] < RPR_COUNT_INF) exitState->counts[nextElem->depth]++; /* Prepare loop state */ @@ -1355,7 +1358,7 @@ nfa_advance_var(WindowAggState *winstate, RPRNFAContext *ctx, /* When exiting directly to an outer END, increment its count */ if (RPRElemIsEnd(nextElem)) { - if (cloneState->counts[nextElem->depth] < RPR_COUNT_MAX) + if (cloneState->counts[nextElem->depth] < RPR_COUNT_INF) cloneState->counts[nextElem->depth]++; } @@ -1404,7 +1407,7 @@ nfa_advance_var(WindowAggState *winstate, RPRNFAContext *ctx, */ if (RPRElemIsEnd(nextElem)) { - if (state->counts[nextElem->depth] < RPR_COUNT_MAX) + if (state->counts[nextElem->depth] < RPR_COUNT_INF) state->counts[nextElem->depth]++; } @@ -1438,7 +1441,7 @@ nfa_advance_var(WindowAggState *winstate, RPRNFAContext *ctx, /* See comment above: increment outer END count for quantified VARs */ if (RPRElemIsEnd(nextElem)) { - if (state->counts[nextElem->depth] < RPR_COUNT_MAX) + if (state->counts[nextElem->depth] < RPR_COUNT_INF) state->counts[nextElem->depth]++; } diff --git a/src/backend/optimizer/plan/rpr.c b/src/backend/optimizer/plan/rpr.c index ebba8e50b1d..62292508aad 100644 --- a/src/backend/optimizer/plan/rpr.c +++ b/src/backend/optimizer/plan/rpr.c @@ -3,13 +3,13 @@ * rpr.c * Row Pattern Recognition pattern compilation for planner * - * This file contains functions for optimizing RPR pattern AST and + * This file contains functions for optimizing the RPR pattern parse tree and * compiling it to a flat element array for NFA execution by WindowAgg. * * Key components: * 1. Pattern Optimization: Simplifies patterns before compilation * (e.g., flatten nested SEQ/ALT, merge consecutive vars) - * 2. Pattern Compilation: Converts AST to flat element array for NFA + * 2. Pattern Compilation: Converts parse tree to flat element array for NFA * 3. Absorption Analysis: Computes flags for O(n^2)->O(n) optimization * * Context Absorption Optimization: @@ -995,7 +995,7 @@ collectDefineVariables(List *defineVariableList, char **varNames) /* * scanRPRPatternRecursive - * Recursively scan pattern AST (pass 1 internal). + * Recursively scan pattern parse tree (pass 1 internal). * * Collects unique variable names and counts elements while tracking depth. * Variables from DEFINE clause are already in varNames; this adds any @@ -1092,7 +1092,7 @@ scanRPRPatternRecursive(RPRPatternNode *node, char **varNames, int *numVars, /* * scanRPRPattern - * Scan pattern AST (pass 1 entry point). + * Scan pattern parse tree (pass 1 entry point). * * Collects unique variable names (appending to those from DEFINE clause), * counts total elements (including FIN marker), and tracks maximum depth. @@ -1297,7 +1297,7 @@ fillRPRPatternGroup(RPRPatternNode *node, RPRPattern *pat, int *idx, RPRDepth de * fillRPRPatternAlt * Fill an ALT pattern and its alternatives. * - * Creates ALT_START marker, fills each alternative at increased depth, + * Creates the ALT marker, fills each alternative at increased depth, * sets jump pointers for backtracking, and next pointers for successful paths. * * Returns true if any branch is nullable (OR semantics: one nullable @@ -1383,9 +1383,10 @@ fillRPRPatternAlt(RPRPatternNode *node, RPRPattern *pat, int *idx, RPRDepth dept /* * fillRPRPattern - * Fill pattern elements array from AST (pass 2). + * Fill pattern elements array from parse tree (pass 2). * - * Recursively traverses AST and populates pre-allocated elements array. + * Recursively traverses the parse tree and populates pre-allocated elements + * array. * Dispatches to type-specific fill functions. * * Returns true if the pattern is nullable (can match zero rows). @@ -1767,7 +1768,7 @@ computeAbsorbabilityRecursive(RPRPattern *pattern, RPRElemIdx startIdx, } else { - /* Should never reach END - structural invariant of pattern AST */ + /* Should never reach END - structural invariant of pattern parse tree */ Assert(!RPRElemIsEnd(elem)); /* Non-ALT, non-BEGIN: check if unbounded start */ @@ -1894,13 +1895,13 @@ validate_rpr_define_volatility(List *defineClause) /* * buildRPRPattern - * Compile pattern AST to flat bytecode array. + * Compile pattern parse tree to flat bytecode array. * * Compilation phases: - * 1. Optimize AST (flatten, merge, deduplicate) + * 1. Optimize parse tree (flatten, merge, deduplicate) * 2. Scan: collect variables, count elements (pass 1) * 3. Allocate result structure - * 4. Fill elements from AST (pass 2) + * 4. Fill elements from parse tree (pass 2) * 5. Finalize pattern structure * 6. Compute context absorption eligibility * diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 4ae951aaeba..6eb01ea7f0b 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -17734,9 +17734,12 @@ row_pattern_primary: ; row_pattern_quantifier_opt: - /* EMPTY - no quantifier means exactly once; @$ is unused since - * min=max=1 never produces an error */ + /*EMPTY*/ { + /* + * no quantifier means exactly once; @$ is unused since + * min=max=1 never produces an error + */ $$ = (Node *) makeRPRQuantifier(1, 1, false, @$); } | '*' diff --git a/src/backend/parser/parse_cte.c b/src/backend/parser/parse_cte.c index 3e493beba0b..c35feeae6fd 100644 --- a/src/backend/parser/parse_cte.c +++ b/src/backend/parser/parse_cte.c @@ -1303,9 +1303,7 @@ checkWellFormedSelectStmt(SelectStmt *stmt, CteState *cstate) /* * contain_rpr_walker * Returns true if the raw parse tree contains any -- i.e., any WindowDef with PATTERN/DEFINE attached. Used - * by transformWithClause() to enforce ISO/IEC 9075-2:2016 7.17 SR 3)f) - * on WITH RECURSIVE elements. + * syntax> -- i.e., any WindowDef with PATTERN/DEFINE attached. */ static bool contain_rpr_walker(Node *node, void *context) diff --git a/src/backend/parser/parse_rpr.c b/src/backend/parser/parse_rpr.c index 7c201d55164..ed12190cb06 100644 --- a/src/backend/parser/parse_rpr.c +++ b/src/backend/parser/parse_rpr.c @@ -8,7 +8,7 @@ * - Validates frame options (must start at CURRENT ROW, no EXCLUDE) * - Validates PATTERN variable count (max RPR_VARID_MAX + 1) * - Transforms DEFINE clause - * - Stores the PATTERN AST and the SKIP TO/INITIAL flags + * - Stores the PATTERN parse tree and the SKIP TO/INITIAL flags * * Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California @@ -67,7 +67,7 @@ static bool define_walker(Node *node, void *context); * - Set AFTER MATCH SKIP TO flag * - Set SEEK/INITIAL flag * - Transforms DEFINE clause into TargetEntry list - * - Stores PATTERN AST for deparsing (optimization happens in planner) + * - Stores PATTERN parse tree for deparsing (optimization happens in planner) * * Returns early if windef has no rpCommonSyntax (non-RPR window). */ @@ -180,7 +180,7 @@ transformRPR(ParseState *pstate, WindowClause *wc, WindowDef *windef, /* Transform DEFINE clause into list of TargetEntry's */ wc->defineClause = transformDefineClause(pstate, windef, targetlist); - /* Store PATTERN AST for deparsing */ + /* Store PATTERN parse tree for deparsing */ wc->rpPattern = windef->rpCommonSyntax->rpPattern; } @@ -264,19 +264,6 @@ validateRPRPatternVarCount(ParseState *pstate, RPRPatternNode *node, * transformDefineClause * Process DEFINE clause and transform ResTarget into list of TargetEntry. * - * First: - * 1. Validates PATTERN variable count and collects RPR variable names - * 2. Rejects DEFINE variables not used in PATTERN - * 3. Checks for duplicate variable names in DEFINE clause - * - * Then for each DEFINE variable: - * 4. Transforms expression via transformExpr() and coerces it to boolean - * 5. Creates defineClause entry with proper resname (pattern variable name) - * 6. Ensures referenced Var nodes are present in the query targetlist (via - * pull_var_clause) - * - * Finally marks column origins and assigns collation information. - * * Note: Variables not in DEFINE are evaluated as TRUE by the executor. * Variables in DEFINE but not in PATTERN are rejected as an error. * diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 947e668020e..f28215b8e83 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -621,7 +621,7 @@ typedef enum RPRPatternNodeType } RPRPatternNodeType; /* - * RPRPatternNode - Row Pattern Recognition pattern AST node + * RPRPatternNode - Row Pattern Recognition pattern parse tree node */ typedef struct RPRPatternNode { @@ -630,7 +630,7 @@ typedef struct RPRPatternNode int32 min; /* minimum repetitions (0 for *, ?) */ int32 max; /* maximum repetitions (PG_INT32_MAX for *, +) */ bool reluctant; /* true for reluctant (non-greedy) */ - ParseLoc location; /* token location, or -1 */ + ParseLoc location; /* token location, or -1 if unknown */ char *varName; /* VAR: variable name */ List *children; /* SEQ, ALT, GROUP: child nodes */ @@ -652,10 +652,10 @@ typedef struct RPCommonSyntax RPSkipTo rpSkipTo; /* Row Pattern AFTER MATCH SKIP type */ bool initial; /* true if is * initial */ - RPRPatternNode *rpPattern; /* PATTERN clause AST */ + RPRPatternNode *rpPattern; /* PATTERN parse tree */ List *rpDefs; /* row pattern definitions clause (list of * ResTarget) */ - ParseLoc location; /* PATTERN keyword location, or -1 */ + ParseLoc location; /* PATTERN keyword location, or -1 if unknown */ } RPCommonSyntax; /* @@ -1727,7 +1727,7 @@ typedef struct GroupingSet * options are never copied, per spec. * "defineClause" is Row Pattern Recognition DEFINE clause (list of * TargetEntry). TargetEntry.resname represents row pattern definition - * variable name. "rpPattern" represents PATTERN clause as an AST tree + * variable name. "rpPattern" represents the PATTERN clause as a parse tree * (RPRPatternNode). * */ @@ -1763,7 +1763,7 @@ typedef struct WindowClause * initial */ /* Row Pattern DEFINE clause (list of TargetEntry) */ List *defineClause; - /* Row Pattern PATTERN clause AST */ + /* Row Pattern PATTERN parse tree */ RPRPatternNode *rpPattern; } WindowClause; diff --git a/src/include/optimizer/rpr.h b/src/include/optimizer/rpr.h index 83d3cd29b07..f5462ab2a7c 100644 --- a/src/include/optimizer/rpr.h +++ b/src/include/optimizer/rpr.h @@ -27,8 +27,15 @@ * before release. */ #define RPR_VARID_MAX 0xEF /* pattern variables are 0 to 0xEF */ + +/* + * RPR_COUNT_INF is the value a runtime repetition count saturates at to avoid + * int32 overflow (see the count++ guard in nfa_match). It is defined as + * RPR_QUANTITY_INF so that a saturated count compares as "unbounded", just + * like an unbounded quantifier's max. + */ #define RPR_QUANTITY_INF PG_INT32_MAX /* unbounded quantifier */ -#define RPR_COUNT_MAX PG_INT32_MAX /* max runtime count (NFA state) */ +#define RPR_COUNT_INF RPR_QUANTITY_INF #define RPR_ELEMIDX_MAX PG_INT16_MAX /* max pattern elements */ #define RPR_ELEMIDX_INVALID ((RPRElemIdx) -1) /* invalid index */ #define RPR_DEPTH_MAX (PG_UINT8_MAX - 1) /* max pattern nesting depth: diff --git a/src/test/regress/expected/rpr_explain.out b/src/test/regress/expected/rpr_explain.out index cc86d0aae30..8672b4c3055 100644 --- a/src/test/regress/expected/rpr_explain.out +++ b/src/test/regress/expected/rpr_explain.out @@ -5598,13 +5598,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM rpr_ev_opt_mixed; -- find_window_run_conditions() pushes WHERE filters on monotonic window -- functions into WindowAgg as Run Conditions for early termination. -- With RPR's required frame (ROWS BETWEEN CURRENT ROW AND UNBOUNDED --- FOLLOWING), the monotonic direction determines which operators trigger --- Run Condition pushdown: +-- FOLLOWING), the monotonic direction of the window function determines +-- which comparison operators allow pushdown: -- INCREASING (<=): row_number, rank, dense_rank, percent_rank, -- cume_dist, ntile --- DECREASING (>): count(*) (via int8inc, END_UNBOUNDED_FOLLOWING) --- RPR window function results are match-dependent, not monotonic. --- Test with count(*) > 0 as representative case. +-- DECREASING (>): count(*). As the current row advances, the frame +-- (which ends at UNBOUNDED FOLLOWING) shrinks, so the +-- count decreases. +-- RPR window function results are match-dependent, not monotonic, so this +-- pushdown does not apply. Test with count(*) > 0 as a representative case. -- -- Without RPR: count(*) > 0 is pushed down as Run Condition EXPLAIN (COSTS OFF) diff --git a/src/test/regress/sql/rpr_explain.sql b/src/test/regress/sql/rpr_explain.sql index 297ca78d54b..115402e304d 100644 --- a/src/test/regress/sql/rpr_explain.sql +++ b/src/test/regress/sql/rpr_explain.sql @@ -3174,13 +3174,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM rpr_ev_opt_mixed; -- find_window_run_conditions() pushes WHERE filters on monotonic window -- functions into WindowAgg as Run Conditions for early termination. -- With RPR's required frame (ROWS BETWEEN CURRENT ROW AND UNBOUNDED --- FOLLOWING), the monotonic direction determines which operators trigger --- Run Condition pushdown: +-- FOLLOWING), the monotonic direction of the window function determines +-- which comparison operators allow pushdown: -- INCREASING (<=): row_number, rank, dense_rank, percent_rank, -- cume_dist, ntile --- DECREASING (>): count(*) (via int8inc, END_UNBOUNDED_FOLLOWING) --- RPR window function results are match-dependent, not monotonic. --- Test with count(*) > 0 as representative case. +-- DECREASING (>): count(*). As the current row advances, the frame +-- (which ends at UNBOUNDED FOLLOWING) shrinks, so the +-- count decreases. +-- RPR window function results are match-dependent, not monotonic, so this +-- pushdown does not apply. Test with count(*) > 0 as a representative case. -- -- Without RPR: count(*) > 0 is pushed down as Run Condition -- 2.50.1 (Apple Git-155)