public inbox for [email protected]
help / color / mirror / Atom feedFrom: Henson Choi <[email protected]>
To: Tatsuo Ishii <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: Row pattern recognition
Date: Tue, 5 May 2026 13:26:56 +0900
Message-ID: <CAAAe_zCwY4Lt_OS44DMuqua-szeDVVTgJDeH=c-d5Qco5v2sOQ@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<CAAAe_zBdwAUDNs_WFdLkFF=ewhkDkv-AqizVEVzhsfremGFb4w@mail.gmail.com>
<[email protected]>
<[email protected]>
Hi Tatsuo,
Thanks for the close look. Inline below, then a sketch of what I
plan to put in v48 so you can pick what to pull first.
v47 starts to check whether range variable qualified expressions are
> used in DEFINE clause. If used, raise an error. This is good because
> we don't support the syntax yet (pattern variable range var), or it's
> prohibited (from clause range var). However, the error message may not
> be appropreate for the case when complex data type is involved.
>
> CREATE TEMP TABLE item (name TEXT, amount INT);
> CREATE TABLE
> CREATE TEMP TABLE sales(items item);
> CREATE TABLE
> SELECT (items).name, (items).amount, count(*) OVER w
> FROM sales WINDOW w AS (
> ORDER BY (items).name
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> PATTERN (A+)
> DEFINE A AS (A.items).amount > 10
> );
> ERROR: pattern variable qualified column reference "a.items" is not
> supported in DEFINE clause
> LINE 7: DEFINE A AS (A.items).amount > 10
> ^
> If I change the DEFINE clause to:
>
> DEFINE A AS (sales.items).amount > 10
>
> I get:
>
> ERROR: range variable qualified column reference "sales.items" is not
> allowed in DEFINE clause
> LINE 8: DEFINE A AS (sales.items).amount > 10
> ^
>
> In both cases, "a.items" or "sales.items" in the error messages are
> not column names, therefore the wording "column reference" in the
> error messages are not appropriate.
>
Agreed. Even for the simple non-composite cases the noun is fuzzy
once A_Indirection enters the picture. "expression" reads naturally
in both cases.
> In order to fix the issue, I think we need to add code to understand
> range var qualification form "A.item" or "sales.items" in complex data
> types case. But is it worth the trouble? The reword is just nicer
> error messages. If we support MEASURES, the code is useful. But so far
> we have decided to not support it in the first cut of RPR.
>
Not now. Teaching the pre-check about the surrounding A_Indirection
duplicates work that MEASURES will need anyway, and the only visible
gain before MEASURES is a more accurate echo of the source text.
Instead I'll mark the limitation in three places so future MEASURES
work can't miss it: a block comment above the pre-check that names
the limitation and the revisit point, an XXX cross-reference in
parse_rpr.c pointing back to the pre-check, and composite-type cases
in rpr_base whose expected output quotes only the ColumnRef portion
-- so when indirection-aware quoting lands, those outputs churn as
a tripwire.
> Maybe we should just change the error messages something like below
> for now?
>
> ERROR: pattern variable qualified expression "a.items" is not supported
> in DEFINE clause
> ERROR: range variable qualified expression "sales.items" is not allowed
> in DEFINE clause
Yes -- planned for v48. While I'm in that path I'll also clean up
two adjacent issues:
- the pre-check is binary, so an unknown qualifier (typo,
undefined name) is misreported as a range-variable reference
with SYNTAX_ERROR instead of falling through to the standard
"column does not exist" / "missing FROM-clause entry"
diagnostic;
- parse_rpr.c and the SELECT docs claim DEFINE-name "collection"
and "filtering during planning"; the actual behavior is
validate-and-reject at parse analysis.
------------------------------------------------------------------
v48 follow-up plan
------------------------------------------------------------------
The items below are independent, so they can ship as separate
patches or as a single batched posting -- whichever you prefer.
Order is just for narrative; nothing depends on anything else.
[A] Sizable refactor: collapse the four DEFINE walkers across
parser/planner/executor into a single phase-tagged
traversal. On that base, reject volatile / NextValueExpr
in DEFINE (the NFA may re-evaluate predicates during
backtracking; STABLE / IMMUTABLE remain accepted), and
bundle a STABLE/IMMUTABLE baseline test as a guard against
accidental over-rejection.
[B] Make the empty-match path observable in tests: replace the
stale XXX comments in rpr_nfa with the actual behavior, and
add EXPLAIN ANALYZE coverage in rpr_explain that surfaces
"NFA: N matched (len 0/0/0.0)" so the NFA-found-but-window-
empty case is regression-visible.
[C] Trim the per-advance NFA visited-bitmap reset to a high-water
mark range instead of the full bitmap. Tradeoff: two int16
comparisons added per visit, paying off for larger NFAs but
added overhead for single-word bitmaps; semantics unchanged.
I'll leave the decision to apply to your judgment.
[D] DEFINE qualifier diagnostic: tri-classify (pattern var /
range var / fall-through), reword to "expression", add
unknown-qualifier and composite-type tests, and sync the
adjacent stale comments and SELECT doc.
If any of [A]..[D] looks misjudged or you'd prefer a different
slicing, I'll reshape before posting.
Best,
Henson
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Row pattern recognition
In-Reply-To: <CAAAe_zCwY4Lt_OS44DMuqua-szeDVVTgJDeH=c-d5Qco5v2sOQ@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox