Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wK7NL-000cAe-2I for pgsql-hackers@arkaria.postgresql.org; Tue, 05 May 2026 04:27:15 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wK7NJ-00A7Zn-0p for pgsql-hackers@arkaria.postgresql.org; Tue, 05 May 2026 04:27:13 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wK7NI-00A7Ze-2y for pgsql-hackers@lists.postgresql.org; Tue, 05 May 2026 04:27:12 +0000 Received: from mail-ed1-x52d.google.com ([2a00:1450:4864:20::52d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wK7NH-00000000Zes-00EW for pgsql-hackers@postgresql.org; Tue, 05 May 2026 04:27:12 +0000 Received: by mail-ed1-x52d.google.com with SMTP id 4fb4d7f45d1cf-67b6da5a618so8117888a12.2 for ; Mon, 04 May 2026 21:27:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1777955228; cv=none; d=google.com; s=arc-20240605; b=I8WULaYdMR/y0nSffXaLNAAR8AJCwMurKbWqxTYaW7YtaXy7JELoGIbrLL5wuaSiI3 WtQFHtJcBRO0ZgESrEcjObkU7/wSuERaROyn4Y/k0cnNaUWvyxdetLvuiiCS33BD0HWR CB7SJXiVvfJGi4BA7TwauzAOP1rZKLWsqZhH/HBBoxF6yoNw+C/9YaZay36+rfHai+sX tQh/IBWoj9zA+ozMhVRZ9wHvYVXQ3AhfID3SFh2OXAEigmzfJQLzvCNQjQJ5LQIkvybx cTVbnGau57mFYd+tDMWb2+kGAFGaWHpduKFQzoZSxziKIPJL8MZwpD01t8ZTNcE0eTOR 3crA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:dkim-signature; bh=hvJeDvdyDOM8yvBQVcS7kEtKFQEEGlcnhSM8l9qKlI8=; fh=4zoeN+3HSQhZvky60dsgb+rabFvRJ3fHqOgM4d3VN/w=; b=PtVBr6rLdwiyXhx2ahSp6CgHAfdMz8n1lRBoQiJ3hgGdpx1Wt9x2lv8cIusoLNQvO4 b1gJ4GvWchMY23scObW+4vXr2nJdYg+wuP7gdpBWf9mEdj/sMc+oV6lrZPAXKC7lsgbi 8cbwm8SR20fFcuvfGeAhob+pBhNL3hAbMyi7DCqvzcZ3OngOfk9kPuEjD0NZddk+9N+Q FJ5Ym4AXKIxiM9d6et6LU3BhWAMCwjXDVTpFi2kBeQyR5EH4HREZcoQFsOPjqEzBmpjd EFd4hZQRMedpI6g9/3vkrog2fBgi1+97vTyJLtzYKIZLGvtBW+FrknfmqL0OiRZCQktc dwtA==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1777955228; x=1778560028; darn=postgresql.org; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=hvJeDvdyDOM8yvBQVcS7kEtKFQEEGlcnhSM8l9qKlI8=; b=klCgTI/AHDLVb6SgGbNu3ZV/ItDDSk4V3eLvibS+0z4jKJv5NFRCWEsEOP1A+tRom0 xVU1cIDjluxOs9/FrHqN0yao8NbPDCGqYHxjH/tYvhpPD8xML6AcNSNjdLTycVMbcm1L Re26jon4+diUePGdX/QfBd+rcPSH5bPP/+d7Gw9JiN6AYzeO1PVP++gkb9jiPXgZN9zO KMZ6wAzMeAzF1ozrH77BndDDldVpimUg6VxBUsX5BFvwYqgK8Vw++U72rvWnasLoWV0+ BO9GYk0DlCwRaeGyGno7GiUDnbfZtONFhFELGdL+gCr4vNNCGO85y1pYKNolss3cEMYz EVIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777955228; x=1778560028; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=hvJeDvdyDOM8yvBQVcS7kEtKFQEEGlcnhSM8l9qKlI8=; b=Zt7/QPR2Z7u93sDq0b7CFt+upBVLYwkwPwxi0NhjYly+WYfnRcdk/NpMJ8GOyy8/C5 vZ4mKGmiSlbBE8G7umJd11GhtE3Q8MtZH7VM8Mal+VQo92ozeUeMjUW7TVAyoqm8g9NQ HejxNJ4Y90yfiNHiqj5wXHvUFemAo7k1CGr/ryRdyNpXfjVhooDXi5B6YOl+L6jl5orK LCreqsIMeS4G1FjTNqeS/Zbkhdv/e7NmeRuVWMjEhxPZtPyKKiC8weXJXz7+colbNrEc 17IuZmt5YuqnEu5zNylrvwqGeELcSVzgMEMEwqR9otwm8C7ooGsrqBHM4gsE4Tp4qC4+ 4cIA== X-Forwarded-Encrypted: i=1; AFNElJ9+HMLa7R5d4VFcDg55+/f7Sby/VqvpaAE6pRD8E1a4xKpgzGel2/lqqzskrm2r/T9RaTxSvfzxAgFLuPdB@postgresql.org X-Gm-Message-State: AOJu0YxS3NZHx/mN5xO0vXHFFRV1aY/Yg41WEmxiBTA4zVL1N0yfS12G 9kZud/hWz6w9VIDW3p5X269s5/cIX5VxeryUNqofO/uN7nz++REGbRnoXfd8rxyfuLAVqQDFPXT Ml1lsLnoZJRRBaJxWuYwXXScFWtqW5To= X-Gm-Gg: AeBDieulAPKdMvra3qosvd5NXwCMohlEmyG824Vkz6aUbgtIrvG9ZQN0tSnflY6DpGg rKhOcO7jydQfyB3rA8D9QhYriievu2wjnyKV9bbfz7PMVCMfiscRD1Ws6cwwBqjbDvt/9h9cO43 V0o3nRyUjauObn3uDnN8ba/zcVNGQFKBVV/YmHuPx7xjJjoWCckf6WWsr/wN7O5wJmpVuNTtx1A 5HD5GapPqsX0Rh/tn8rjqrjDdJfSTv6vqbnqCjk0+c43f605o3buh437jnJgy0u+8tHAVDC3Kmy 1u5dU6INDEIq0tbaKgmbBKyfKTwdbbdciO3gvPzoDwMlWimNHrA= X-Received: by 2002:a05:6402:1515:b0:670:b72b:4044 with SMTP id 4fb4d7f45d1cf-67c1ada377bmr3878902a12.15.1777955227761; Mon, 04 May 2026 21:27:07 -0700 (PDT) MIME-Version: 1.0 References: <20260427.174220.1939160662649810289.ishii@postgresql.org> <20260502.140304.670813149418899420.ishii@postgresql.org> <20260505.090124.365339750969814137.ishii@postgresql.org> In-Reply-To: <20260505.090124.365339750969814137.ishii@postgresql.org> Reply-To: assam258@gmail.com From: Henson Choi Date: Tue, 5 May 2026 13:26:56 +0900 X-Gm-Features: AVHnY4KMQK2OhQyn0b5xMrVhLae08uJFSATLNxE59_QTxUSbHGyP8AK1IDp6B3M Message-ID: Subject: Re: Row pattern recognition To: Tatsuo Ishii Cc: zsolt.parragi@percona.com, sjjang112233@gmail.com, vik@postgresfriends.org, er@xs4all.nl, jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, peter@eisentraut.org, li.evan.chao@gmail.com, pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="000000000000c7027706510a75dc" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000c7027706510a75dc Content-Type: text/plain; charset="UTF-8" Hi Tatsuo, Thanks for the close look. Inline below, then a sketch of what I plan to put in v48 so you can pick what to pull first. v47 starts to check whether range variable qualified expressions are > used in DEFINE clause. If used, raise an error. This is good because > we don't support the syntax yet (pattern variable range var), or it's > prohibited (from clause range var). However, the error message may not > be appropreate for the case when complex data type is involved. > > CREATE TEMP TABLE item (name TEXT, amount INT); > CREATE TABLE > CREATE TEMP TABLE sales(items item); > CREATE TABLE > SELECT (items).name, (items).amount, count(*) OVER w > FROM sales WINDOW w AS ( > ORDER BY (items).name > ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING > AFTER MATCH SKIP PAST LAST ROW > PATTERN (A+) > DEFINE A AS (A.items).amount > 10 > ); > ERROR: pattern variable qualified column reference "a.items" is not > supported in DEFINE clause > LINE 7: DEFINE A AS (A.items).amount > 10 > ^ > If I change the DEFINE clause to: > > DEFINE A AS (sales.items).amount > 10 > > I get: > > ERROR: range variable qualified column reference "sales.items" is not > allowed in DEFINE clause > LINE 8: DEFINE A AS (sales.items).amount > 10 > ^ > > In both cases, "a.items" or "sales.items" in the error messages are > not column names, therefore the wording "column reference" in the > error messages are not appropriate. > Agreed. Even for the simple non-composite cases the noun is fuzzy once A_Indirection enters the picture. "expression" reads naturally in both cases. > In order to fix the issue, I think we need to add code to understand > range var qualification form "A.item" or "sales.items" in complex data > types case. But is it worth the trouble? The reword is just nicer > error messages. If we support MEASURES, the code is useful. But so far > we have decided to not support it in the first cut of RPR. > Not now. Teaching the pre-check about the surrounding A_Indirection duplicates work that MEASURES will need anyway, and the only visible gain before MEASURES is a more accurate echo of the source text. Instead I'll mark the limitation in three places so future MEASURES work can't miss it: a block comment above the pre-check that names the limitation and the revisit point, an XXX cross-reference in parse_rpr.c pointing back to the pre-check, and composite-type cases in rpr_base whose expected output quotes only the ColumnRef portion -- so when indirection-aware quoting lands, those outputs churn as a tripwire. > Maybe we should just change the error messages something like below > for now? > > ERROR: pattern variable qualified expression "a.items" is not supported > in DEFINE clause > ERROR: range variable qualified expression "sales.items" is not allowed > in DEFINE clause Yes -- planned for v48. While I'm in that path I'll also clean up two adjacent issues: - the pre-check is binary, so an unknown qualifier (typo, undefined name) is misreported as a range-variable reference with SYNTAX_ERROR instead of falling through to the standard "column does not exist" / "missing FROM-clause entry" diagnostic; - parse_rpr.c and the SELECT docs claim DEFINE-name "collection" and "filtering during planning"; the actual behavior is validate-and-reject at parse analysis. ------------------------------------------------------------------ v48 follow-up plan ------------------------------------------------------------------ The items below are independent, so they can ship as separate patches or as a single batched posting -- whichever you prefer. Order is just for narrative; nothing depends on anything else. [A] Sizable refactor: collapse the four DEFINE walkers across parser/planner/executor into a single phase-tagged traversal. On that base, reject volatile / NextValueExpr in DEFINE (the NFA may re-evaluate predicates during backtracking; STABLE / IMMUTABLE remain accepted), and bundle a STABLE/IMMUTABLE baseline test as a guard against accidental over-rejection. [B] Make the empty-match path observable in tests: replace the stale XXX comments in rpr_nfa with the actual behavior, and add EXPLAIN ANALYZE coverage in rpr_explain that surfaces "NFA: N matched (len 0/0/0.0)" so the NFA-found-but-window- empty case is regression-visible. [C] Trim the per-advance NFA visited-bitmap reset to a high-water mark range instead of the full bitmap. Tradeoff: two int16 comparisons added per visit, paying off for larger NFAs but added overhead for single-word bitmaps; semantics unchanged. I'll leave the decision to apply to your judgment. [D] DEFINE qualifier diagnostic: tri-classify (pattern var / range var / fall-through), reword to "expression", add unknown-qualifier and composite-type tests, and sync the adjacent stale comments and SELECT doc. If any of [A]..[D] looks misjudged or you'd prefer a different slicing, I'll reshape before posting. Best, Henson --000000000000c7027706510a75dc Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Tatsuo,

Thanks for the close lo= ok.=C2=A0 Inline below, then a sketch of what I
plan to put in v48 so yo= u can pick what to pull first.

v47 sta= rts to check whether range variable qualified expressions are
used in DEFINE clause. If used, raise an error. This is good because
we don't support the syntax yet (pattern variable range var), or it'= ;s
prohibited (from clause range var). However, the error message may not
be appropreate for the case when complex data type is involved.

CREATE TEMP TABLE item (name TEXT, amount INT);
CREATE TABLE
CREATE TEMP TABLE sales(items item);
CREATE TABLE
SELECT (items).name, (items).amount, count(*) OVER w
FROM sales WINDOW w AS (
=C2=A0 =C2=A0 ORDER BY (items).name
=C2=A0 =C2=A0 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
=C2=A0 =C2=A0 AFTER MATCH SKIP PAST LAST ROW
=C2=A0 =C2=A0 PATTERN (A+)
=C2=A0 =C2=A0 DEFINE A AS (A.items).amount > 10
);
ERROR:=C2=A0 pattern variable qualified column reference "a.items"= ; is not supported in DEFINE clause
LINE 7:=C2=A0 =C2=A0 =C2=A0DEFINE A AS (A.items).amount > 10
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0^
If I change the DEFINE clause to:

=C2=A0 =C2=A0 DEFINE A AS (sales.items).amount > 10

I get:

ERROR:=C2=A0 range variable qualified column reference "sales.items&qu= ot; is not allowed in DEFINE clause
LINE 8:=C2=A0 =C2=A0 =C2=A0DEFINE A AS (sales.items).amount > 10
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0^

In both cases, "a.items" or "sales.items" in the error = messages are
not column names, therefore the wording "column reference" in the=
error messages are not appropriate.

Agr= eed.=C2=A0 Even for the simple non-composite cases the noun is fuzzy
onc= e A_Indirection enters the picture. =C2=A0"expression" reads natu= rally
in both cases.
=C2=A0
In order to fix the issue, I think we need to add = code to understand
range var qualification form "A.item" or "sales.items" = in complex data
types case. But is it worth the trouble? The reword is just nicer
error messages. If we support MEASURES, the code is useful. But so far
we have decided to not support it in the first cut of RPR.
=

Not now.=C2=A0 Teaching the pre-check about the surrounding = A_Indirection
duplicates work that MEASURES will need anyway, and the on= ly visible
gain before MEASURES is a more accurate echo of the source te= xt.

Instead I'll mark the limitation in three places so future M= EASURES
work can't miss it: a block comment above the pre-check that= names
the limitation and the revisit point, an XXX cross-reference inparse_rpr.c pointing back to the pre-check, and composite-type cases
i= n rpr_base whose expected output quotes only the ColumnRef portion
-- so= when indirection-aware quoting lands, those outputs churn as
a tripwire= .=C2=A0
=C2=A0
Maybe we should just change the error messages something like below
for now?

ERROR:=C2=A0 pattern variable qualified expression "a.items" is n= ot supported in DEFINE clause
ERROR:=C2=A0 range variable qualified expression "sales.items" is= not allowed in DEFINE clause

Yes -- planned for v48.=C2=A0= While I'm in that path I'll also clean up
two adjacent issues:<= br>
=C2=A0 - the pre-check is binary, so an unknown qualifier (typo,
= =C2=A0 =C2=A0 undefined name) is misreported as a range-variable reference<= br>=C2=A0 =C2=A0 with SYNTAX_ERROR instead of falling through to the standa= rd
=C2=A0 =C2=A0 "column does not exist" / "missing FROM-= clause entry"
=C2=A0 =C2=A0 diagnostic;

=C2=A0 - parse_rpr.c= and the SELECT docs claim DEFINE-name "collection"
=C2=A0 =C2= =A0 and "filtering during planning"; the actual behavior is
= =C2=A0 =C2=A0 validate-and-reject at parse analysis.


-----------= -------------------------------------------------------
v48 follow-up pl= an
------------------------------------------------------------------
The items below are independent, so they can ship as separate
patch= es or as a single batched posting -- whichever you prefer.
Order is just= for narrative; nothing depends on anything else.

=C2=A0 [A] Sizable= refactor: collapse the four DEFINE walkers across
=C2=A0 =C2=A0 =C2=A0 = parser/planner/executor into a single phase-tagged
=C2=A0 =C2=A0 =C2=A0 = traversal.=C2=A0 On that base, reject volatile / NextValueExpr
=C2=A0 = =C2=A0 =C2=A0 in DEFINE (the NFA may re-evaluate predicates during
=C2= =A0 =C2=A0 =C2=A0 backtracking; STABLE / IMMUTABLE remain accepted), and=C2=A0 =C2=A0 =C2=A0 bundle a STABLE/IMMUTABLE baseline test as a guard ag= ainst
=C2=A0 =C2=A0 =C2=A0 accidental over-rejection.

=C2=A0 [B] = Make the empty-match path observable in tests: replace the
=C2=A0 =C2=A0= =C2=A0 stale XXX comments in rpr_nfa with the actual behavior, and
=C2= =A0 =C2=A0 =C2=A0 add EXPLAIN ANALYZE coverage in rpr_explain that surfaces=
=C2=A0 =C2=A0 =C2=A0 "NFA: N matched (len 0/0/0.0)" so the NF= A-found-but-window-
=C2=A0 =C2=A0 =C2=A0 empty case is regression-visibl= e.

=C2=A0 [C] Trim the per-advance NFA visited-bitmap reset to a hig= h-water
=C2=A0 =C2=A0 =C2=A0 mark range instead of the full bitmap.=C2= =A0 Tradeoff: two int16
=C2=A0 =C2=A0 =C2=A0 comparisons added per visit= , paying off for larger NFAs but
=C2=A0 =C2=A0 =C2=A0 added overhead for= single-word bitmaps; semantics unchanged.
=C2=A0 =C2=A0 =C2=A0 I'll= leave the decision to apply to your judgment.

=C2=A0 [D] DEFINE qua= lifier diagnostic: tri-classify (pattern var /
=C2=A0 =C2=A0 =C2=A0 rang= e var / fall-through), reword to "expression", add
=C2=A0 =C2= =A0 =C2=A0 unknown-qualifier and composite-type tests, and sync the
=C2= =A0 =C2=A0 =C2=A0 adjacent stale comments and SELECT doc.

If any of = [A]..[D] looks misjudged or you'd prefer a different
slicing, I'= ll reshape before posting.

Best,
Henson=C2=A0
--000000000000c7027706510a75dc--