public inbox for [email protected]  
help / color / mirror / Atom feed
From: Henson Choi <[email protected]>
To: Tatsuo Ishii <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: Row pattern recognition
Date: Tue, 5 May 2026 13:26:56 +0900
Message-ID: <CAAAe_zCwY4Lt_OS44DMuqua-szeDVVTgJDeH=c-d5Qco5v2sOQ@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAAAe_zBdwAUDNs_WFdLkFF=ewhkDkv-AqizVEVzhsfremGFb4w@mail.gmail.com>
	<[email protected]>
	<[email protected]>

Hi Tatsuo,

Thanks for the close look.  Inline below, then a sketch of what I
plan to put in v48 so you can pick what to pull first.

v47 starts to check whether range variable qualified expressions are
> used in DEFINE clause. If used, raise an error. This is good because
> we don't support the syntax yet (pattern variable range var), or it's
> prohibited (from clause range var). However, the error message may not
> be appropreate for the case when complex data type is involved.
>
> CREATE TEMP TABLE item (name TEXT, amount INT);
> CREATE TABLE
> CREATE TEMP TABLE sales(items item);
> CREATE TABLE
> SELECT (items).name, (items).amount, count(*) OVER w
> FROM sales WINDOW w AS (
>     ORDER BY (items).name
>     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>     AFTER MATCH SKIP PAST LAST ROW
>     PATTERN (A+)
>     DEFINE A AS (A.items).amount > 10
> );
> ERROR:  pattern variable qualified column reference "a.items" is not
> supported in DEFINE clause
> LINE 7:     DEFINE A AS (A.items).amount > 10
>                          ^
> If I change the DEFINE clause to:
>
>     DEFINE A AS (sales.items).amount > 10
>
> I get:
>
> ERROR:  range variable qualified column reference "sales.items" is not
> allowed in DEFINE clause
> LINE 8:     DEFINE A AS (sales.items).amount > 10
>                          ^
>
> In both cases, "a.items" or "sales.items" in the error messages are
> not column names, therefore the wording "column reference" in the
> error messages are not appropriate.
>

Agreed.  Even for the simple non-composite cases the noun is fuzzy
once A_Indirection enters the picture.  "expression" reads naturally
in both cases.


> In order to fix the issue, I think we need to add code to understand
> range var qualification form "A.item" or "sales.items" in complex data
> types case. But is it worth the trouble? The reword is just nicer
> error messages. If we support MEASURES, the code is useful. But so far
> we have decided to not support it in the first cut of RPR.
>

Not now.  Teaching the pre-check about the surrounding A_Indirection
duplicates work that MEASURES will need anyway, and the only visible
gain before MEASURES is a more accurate echo of the source text.

Instead I'll mark the limitation in three places so future MEASURES
work can't miss it: a block comment above the pre-check that names
the limitation and the revisit point, an XXX cross-reference in
parse_rpr.c pointing back to the pre-check, and composite-type cases
in rpr_base whose expected output quotes only the ColumnRef portion
-- so when indirection-aware quoting lands, those outputs churn as
a tripwire.


> Maybe we should just change the error messages something like below
> for now?
>
> ERROR:  pattern variable qualified expression "a.items" is not supported
> in DEFINE clause
> ERROR:  range variable qualified expression "sales.items" is not allowed
> in DEFINE clause


Yes -- planned for v48.  While I'm in that path I'll also clean up
two adjacent issues:

  - the pre-check is binary, so an unknown qualifier (typo,
    undefined name) is misreported as a range-variable reference
    with SYNTAX_ERROR instead of falling through to the standard
    "column does not exist" / "missing FROM-clause entry"
    diagnostic;

  - parse_rpr.c and the SELECT docs claim DEFINE-name "collection"
    and "filtering during planning"; the actual behavior is
    validate-and-reject at parse analysis.


------------------------------------------------------------------
v48 follow-up plan
------------------------------------------------------------------

The items below are independent, so they can ship as separate
patches or as a single batched posting -- whichever you prefer.
Order is just for narrative; nothing depends on anything else.

  [A] Sizable refactor: collapse the four DEFINE walkers across
      parser/planner/executor into a single phase-tagged
      traversal.  On that base, reject volatile / NextValueExpr
      in DEFINE (the NFA may re-evaluate predicates during
      backtracking; STABLE / IMMUTABLE remain accepted), and
      bundle a STABLE/IMMUTABLE baseline test as a guard against
      accidental over-rejection.

  [B] Make the empty-match path observable in tests: replace the
      stale XXX comments in rpr_nfa with the actual behavior, and
      add EXPLAIN ANALYZE coverage in rpr_explain that surfaces
      "NFA: N matched (len 0/0/0.0)" so the NFA-found-but-window-
      empty case is regression-visible.

  [C] Trim the per-advance NFA visited-bitmap reset to a high-water
      mark range instead of the full bitmap.  Tradeoff: two int16
      comparisons added per visit, paying off for larger NFAs but
      added overhead for single-word bitmaps; semantics unchanged.
      I'll leave the decision to apply to your judgment.

  [D] DEFINE qualifier diagnostic: tri-classify (pattern var /
      range var / fall-through), reword to "expression", add
      unknown-qualifier and composite-type tests, and sync the
      adjacent stale comments and SELECT doc.

If any of [A]..[D] looks misjudged or you'd prefer a different
slicing, I'll reshape before posting.

Best,
Henson


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Row pattern recognition
  In-Reply-To: <CAAAe_zCwY4Lt_OS44DMuqua-szeDVVTgJDeH=c-d5Qco5v2sOQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox