Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wUYsh-001Fy7-05 for pgsql-hackers@arkaria.postgresql.org; Tue, 02 Jun 2026 23:50:47 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wUYsf-00GWmX-2u for pgsql-hackers@arkaria.postgresql.org; Tue, 02 Jun 2026 23:50:45 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wUYsf-00GWmP-1A for pgsql-hackers@lists.postgresql.org; Tue, 02 Jun 2026 23:50:45 +0000 Received: from mail-ej1-x636.google.com ([2a00:1450:4864:20::636]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wUYsc-00000000pCU-3ENw for pgsql-hackers@postgresql.org; Tue, 02 Jun 2026 23:50:44 +0000 Received: by mail-ej1-x636.google.com with SMTP id a640c23a62f3a-beb2a97cc9aso725212766b.2 for ; Tue, 02 Jun 2026 16:50:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1780444241; cv=none; d=google.com; s=arc-20240605; b=E6N2sfhJDvLQrYlC3QkdWtlpGeATQShCmn3wKG7av5UKhimDeOGDSVN6TgN8l/7N2d pV/Y4fL/pALEaQAmIMPHVur3CxRz4gD2ONSsCWjfkhnWp1mKaC7xpdn3oSaAVon/Hhbm C4QjAT7XRj4CFsnmXj43K/8QnYyJK5cHXdzhjOPFrTCZet0YJbzrSMyaXkcHtjbncYpz zQpYdGbnqIOaMwFzAGnyccsCxsK0vqV1vRLrnaKTRarEvYer+4fqjCPN/0bDHipCsWNP /x/J+10qdAeKMAgA3H+TNiOgH0fm7tj9E7UBTr1QljzMVkzjq99IsPfVLFqVuiNjsLyB KzzQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:dkim-signature; bh=E6DsyNENRp1V01ZyDB1B0mzpM+kBxMgO16i8wJaqKoI=; fh=8a8xluKn/dnI1CzUSCiCZqFRnDx+fqPWa3+l7HyFqSE=; b=ZOP1U2l+NecaSGX3hWArvZ9gfqT044iszE9xEnRYmEaJ0V0fC3f9Ga95F0cA3+jV/u 5J7eEuD3RHbkt5qTJJMm46Fo2vcwn3RuSov5FWHxR8WMVxOIJu8tQtCyBfzOfWRjS6lK Jkz9/12SKavLwJ4RwpJZv7nW796TQjZ9UdvTSYPru4SdL5fRw39ztAFdf6jndeCx7V7Q cwqjZXrbpyso52R7I6Au9G0a+2XTMT7d4lanxsOGLEtzTMWtd90gA856xAx0DeBNtUwd 3+jRhYvwUnJNGqcaxT97zF0uGobU/u4V6sEB5hqnQVkAKxK921mYewRlO3ehw+pLkBd4 9HRw==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1780444241; x=1781049041; darn=postgresql.org; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=E6DsyNENRp1V01ZyDB1B0mzpM+kBxMgO16i8wJaqKoI=; b=dPQ4x2C/mqda6FyH1MadWLwUUZz+KFg1d+Jj5DLqhTe5WoyftcQbu6F2J5M7qsqvFh 6CmOW6C13UJvuvqG8sg+8Fgsx3QXJDo8zVLsy58F949yjB1XpxywW//TjoXrHOydqMkr u9dOIEhokaZRPCUMIrOuskllB/t6tYxp8eYFGtBQTRIcSmYv68rRAw25+XcXP9svGCf9 D4sTm4ld/859ND8doN42CdYzNFfUg62uY5uD4LB/TLL8tVkahddD7zPDrUGsCCbydFnK upApMTo5sEsUSLTbDt5mSKyQ3vyWBvJ9uIo63hyn1Bz873VtczggmigRnUxeonTpixm/ HDLg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1780444241; x=1781049041; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=E6DsyNENRp1V01ZyDB1B0mzpM+kBxMgO16i8wJaqKoI=; b=ELTuLvqQidhD9fIa5F+yLiaq7m82+KlIFRxP2ty2/Oai9Tmlu2HcLNhlYfJd7/ZD8X DoGwjwVVEoJ9stG9+wEVork39G8QprLhsekfQ7xNXrsdKBdnRbs5vMuvsLqO2ZZlz0Fx rQSjYWdbnJuirAvf+f8cKeLyKFz++zpvgVKL1yy7teSzF8kwPQi9ExzP225rHamb68cT WLjZ1RhOmDtl1OrbFdhBhTQSTN2Ijxaq5GfuRxXIHrdpiOyrUeJUM+ZCWnROW/6oOVNZ FUOSp4mdvok1UghHe0fq2hOkgTWsw4RwAIzxQS/y2yTOmomsboI8fS1jVl+WOlLLsBmj +1zg== X-Forwarded-Encrypted: i=1; AFNElJ+7/EAs0j8UPsczTl3OHpucmH0EuFXrG0Z3mHcTjIzw6RuwtQ30i69agNvOZQUo9QdIRmTnZ1ATrQMkebi8@postgresql.org X-Gm-Message-State: AOJu0YxcDxTL01dKL/SlFydyiV7fk26DV462PdOha3qFSSbBeGJTbRku GpmT8msp78SuwX9YuCxBCT2n2suG4z8+LcvR4QXmeDF4aWdWt1wnelLQghzSdFxa3xxLgzE5sq3 p8G7nCQF9uCKraOWArrfgzJgHCMxlzw4= X-Gm-Gg: Acq92OHjHLooCIT9WVTKd4dtYVZXYJwyRkzLTHwjMjaj13JuHJf8D29AK5MJhvPxX7f 2rQjHcnWdzQMX2odeA4AGGuZ2rjQvNtDOrXW7NAoTyX3SZ+xEz2xtaaqmpaO2y3y0MNXYUSb/YU X5f6GhHSGvj1kdEvNr1XtoyKbtkgzCFqNrb00StD1DGkiz3TmtQ83aRHDgJKKR3HTNBA1xSl+7+ HrUL1sGWGTVZo9q/V/8g9l0ghx0b1/zO1/FY2HfaEcFhN1B/SWEdb7GD/rEALqmg3QRqK2iThEz m+EV7giz3JEM4jylpizZihWtAvL2q6lXEn5sque7cDJ/WgQFtCk= X-Received: by 2002:a17:907:3d02:b0:be8:93c2:cbae with SMTP id a640c23a62f3a-bf0b1cdd758mr24937866b.19.1780444240869; Tue, 02 Jun 2026 16:50:40 -0700 (PDT) MIME-Version: 1.0 References: <20260601.111119.1029884790276077667.ishii@postgresql.org> <20260601.114703.1561993497414705173.ishii@postgresql.org> In-Reply-To: Reply-To: assam258@gmail.com From: Henson Choi Date: Wed, 3 Jun 2026 08:50:28 +0900 X-Gm-Features: AVHnY4JHJKYxVyZS89dnvA26qFJOUdxGPskT_6Tn3UNKaKpkWk-50bx1BurQW4k Message-ID: Subject: Re: Row pattern recognition To: Tatsuo Ishii , jian he Cc: zsolt.parragi@percona.com, sjjang112233@gmail.com, vik@postgresfriends.org, er@xs4all.nl, jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, peter@eisentraut.org, li.evan.chao@gmail.com, pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="00000000000084fc6806534dfa58" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000084fc6806534dfa58 Content-Type: text/plain; charset="UTF-8" Hi all, While going over the row pattern grammar I want to put on record why the empty pattern -- PATTERN (()) and the like -- is left unsupported, and why I think that is the right call for this series rather than an oversight. Today it is simply a syntax error. row_pattern_primary is either a variable (ColId) or a parenthesized group '(' row_pattern ')', and row_pattern has no empty production, so '()' never parses. The standard's pattern syntax does allow an empty row pattern -- it matches the empty sequence, i.e. it produces an empty match -- so the question is whether we should grow the grammar, plus an NFA "empty" element, to accept it. My claim is that we do not need to: every way an empty pattern can appear reduces to something we already handle, so a dedicated empty element in the executor would be dead weight. There are two cases. 1. The empty pattern is the whole pattern: PATTERN (()) This pattern has zero pattern variables. But DEFINE is mandatory (per ISO/IEC 19075-5, Table 18), an empty DEFINE list is rejected, and every DEFINE variable must appear in PATTERN -- otherwise we already error with "DEFINE variable \"%s\" is not used in PATTERN". A pattern with no variables cannot satisfy any of that: there is nothing for DEFINE to define, yet DEFINE can be neither omitted nor left empty. So an all-empty pattern is rejected by the existing rules, with no new check needed. (It is degenerate in any case. An empty match at every row maps to "no reduced frame" -- row_is_in_reduced_frame() returns -1 for RF_EMPTY_MATCH exactly as it does for RF_UNMATCHED -- so every row would simply be unmatched.) 2. The empty pattern is embedded: A () B Here '()' consumes no rows; it is the identity element under concatenation, so A () B is equivalent to A B. This stays entirely on the parser/optimizer side: the empty-pattern production would carry an empty AST node, and an identity fold in the SEQ simplification (next to the prefix/suffix and consecutive merges it already does) drops the '()' before the executor ever sees it. The runtime is untouched -- which is the whole point. And if the fold removes everything, the pattern has collapsed to case 1 and is rejected there. The AST rewrites it would add: Concatenation (identity element): A () B -> A B () A -> A A () -> A A () () B -> A B Group unwrap (single non-empty child): (A ()) -> (A) -> A (() A) -> (A) -> A Quantified empty (empty repeated is still empty): ()* ()+ ()? (){3} -> (removed) A ()* B -> A B Collapses to case 1, then rejected: (()) (() ()) ((())) -> () -> rejected as case 1 The one form that is not pure deletion is an empty alternative. Here the empty branch is optionality, not nothing, but it is exactly the '?' quantifier applied to the rest -- X | () is (X)?: X | () -> (X)? (and () | X -> (X)??, reluctant) A | () -> A? (single var: group is redundant) A{2} | () -> (A{2})? (the group is required here -- A{2}? would parse as a reluctant A{2}, not an optional one) A B | () -> (A B)? (X may be a sequence) A | B | () -> (A | B)? (... or an alternation) (A | ()) B -> A? B The branch order maps to greediness ('X | ()' greedy, '() | X' reluctant). Either way it rewrites to a group plus '?', both of which we already support, so it still needs no new machinery. The rewrites also compose across nesting. A nested empty alternation applies the rule at each level, producing a nested nullable quantifier: (A | ()) | () -> A? | () (inner A | () -> A?) -> (A?)? (outer X | () -> (X)?, X = A?) and (A?)? = A?, (A+)? = A*, (A*)? = A* are all ordinary nullable constructs, never a new shape: A? | () -> (A?)? (= A?) A+ | () -> (A+)? (= A*) A* | () -> (A*)? (= A*) Whether the optimizer collapses these to a single quantifier or leaves the group as-is doesn't matter here -- both are nullable groups the runtime already handles, so still no executor-level element. (The optimizer leaves these unfolded: a non-exact outer over a nullable child falls outside tryMultiplyQuantifiers' conservative guard, which skips the whole class that can have gaps (e.g. (A{2}){2,3} = {4,6}). That is an over-rejection, not a correctness defect -- folding the nullable cases would be a safe optional extension; see the aside below.) (Duplicate empty branches would be merged first by the alternation dedup, so A | () | () reduces through A | () to A? just the same.) Putting it together: across the board an empty pattern is either (a) rejected by the existing DEFINE rules (case 1), (b) the concatenation identity, removable by the AST simplification (case 2), or (c) an existing nullable construct ('?', the empty alternative), and none of these would ever need an executor-level empty element. Given that, and that the feature has little practical use while the current series is already sizable, I am leaving the empty pattern out of scope: it stays a syntax error. Concretely, accepting it would take a grammar production, an empty AST node, the concatenation-identity deletion in the SEQ pass, and (optionally) an extension of the quantifier composition to fold the nullable cases -- all new work, none of it in this series. We can revisit it as that small follow-up if anyone actually wants it. Thanks, Henson ------------------------------------------------------------------------ Aside (separable from the post above): quantifier-composition gaps, unrelated to the empty pattern ------------------------------------------------------------------------ Independent of the empty pattern, the AST-level quantifier-composition pass (tryMultiplyQuantifiers, rpr.c) over-rejects: it collapses a nested quantifier (X{p,q}){m,n} into a single X{...} only when the outer is exact (m == n), the child is {1,1}, or both are unbounded. Those are the always-safe sufficient conditions. But the full foldable class is larger: (X{p,q}){m,n} is foldable to X{m*p, n*q} whenever the reachable repetition counts -- the union over k in [m,n] of [k*p, k*q] -- are contiguous (no gaps). That holds for every nullable child (p == 0) and, more generally, whenever the child span q-p is wide enough to bridge the jump between k and k+1 iterations. The guard thus rejects a strict superset of the genuinely unsafe (gap-producing) patterns: a class of provably-safe folds is left undone, and those patterns reach the executor as nested groups -- handled correctly, just not normalized. This is an optimization over-rejection, not a correctness defect: the output is identical either way. The two rules side by side (outer {m,n} over child {p,q}, INF = unbounded): current guard -- folds iff: m = n (outer exact) OR (p = 1 AND q = 1) (child {1,1}) OR (q = INF AND n = INF AND NOT(m = 0 AND p >= 2)) (both unbounded) true safe set -- foldable iff the reachable counts are contiguous: m = n (single interval) OR p = 0 (nullable child) OR ( p <= max(m,1)*(q - p) + 1 (child span bridges the AND (m >= 1 OR p <= 1) ) k -> k+1 jump; child {1,1} is the p=q=1 instance) over-rejected = safe AND NOT current: the non-exact-outer (m != n) folds over a gap-free child -- p = 0 (nullable) or a span-bridging child. e.g. (A?)?, (A*)?, (A+)?, (A?){2,3}, (A+){0,2}, (A{1,3}){2,4} (all in the table below). The boundaries (A{2}){2,3} and (A{2,})* are NOT in the safe set (p = 2 fails p <= max(m,1)*(q-p)+1), so both rules reject them -- correctly; widening the guard must keep doing so. Verified on the branch -- EXPLAIN's "Pattern:" line shows the optimized form (a nested group means it was not folded; a flat quantifier means it was): folds today (correct): (A){2,3} -> a{2,3} (child {1,1}) (A{2,3}){2} -> a{4,6} (outer exact) (A*)* -> a* (both unbounded) gap-free but NOT folded (stays nested; "want" is the safe normal form): (A?)? -> (a?)? want a? (A*)? -> (a*)? want a* (A+)? -> (a+)? want a* (A?){2,3} -> (a?){2,3} want a{0,3} (A+){0,2} -> (a+){0,2} want a* (A{1,3}){2,4} -> (a{1,3}){2,4} want a{2,12} correctly NOT foldable (real gap -- guard is right here): (A{2}){2,3} -> (a{2}){2,3} reaches {4,6}, not 4..6 The first three "not folded" rows are the nullable forms the empty pattern would lean on; the next three show the gap is broader (a range outer over a non-{1,1} child can still be gap-free). The win is only normalization / a smaller NFA, so this is low priority. And this exact pass has a history of subtle gap bugs (the (A{2,})* over-flatten and the INF-sum merge, both recently fixed), so widening the guard toward the safe set above should be its own patch, with boundary tests covering the foldable cases and the two gaps -- (A{2}){2,3} and (A{2,})* -- that must stay un-folded. --00000000000084fc6806534dfa58 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi all,

While going over the row pattern grammar I = want to put on record why the
empty pattern -- PATTERN (()) and the like= -- is left unsupported, and why
I think that is the right call for this= series rather than an oversight.

Today it is simply a syntax error.= =C2=A0row_pattern_primary is either a
variable (ColId) or a parenthesiz= ed group '(' row_pattern ')', and
row_pattern has no emp= ty production, so '()' never parses.=C2=A0 The standard's
pa= ttern syntax does allow an empty row pattern -- it matches the empty
seq= uence, i.e. it produces an empty match -- so the question is whether we
= should grow the grammar, plus an NFA "empty" element, to accept i= t.

My claim is that we do not need to: every way an empty pattern ca= n appear
reduces to something we already handle, so a dedicated empty el= ement in
the executor would be dead weight.=C2=A0 There are two cases.

1. The empty pattern is the whole pattern: PATTERN (())

Th= is pattern has zero pattern variables.=C2=A0 But DEFINE is mandatory (perISO/IEC 19075-5, Table 18), an empty DEFINE list is rejected, and everyDEFINE variable must appear in PATTERN -- otherwise we already error with=
"DEFINE variable \"%s\" is not used in PATTERN".=C2= =A0 A pattern with no
variables cannot satisfy any of that: there is not= hing for DEFINE to
define, yet DEFINE can be neither omitted nor left em= pty.=C2=A0 So an all-empty
pattern is rejected by the existing rules, wi= th no new check needed.

(It is degenerate in any case.=C2=A0 An empt= y match at every row maps to "no
reduced frame" -- row_is_in_r= educed_frame() returns -1 for RF_EMPTY_MATCH
exactly as it does for RF_U= NMATCHED -- so every row would simply be
unmatched.)


2. The e= mpty pattern is embedded: A () B

Here '()' consumes no rows;= it is the identity element under
concatenation, so A () B is equivalent= to A B.=C2=A0 This stays entirely on the
parser/optimizer side: the emp= ty-pattern production would carry an empty
AST node, and an identity fol= d in the SEQ simplification (next to the
prefix/suffix and consecutive m= erges it already does) drops the '()'
before the executor ever s= ees it.=C2=A0 The runtime is untouched -- which is
the whole point.=C2= =A0 And if the fold removes everything, the pattern has
collapsed to cas= e 1 and is rejected there.

The AST rewrites= it would add:

=C2=A0 Concatenation (identity element):
=C2=A0 = =C2=A0 A () B =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0A B
=C2=A0 = =C2=A0 () A =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0A
=C2= =A0 =C2=A0 A () =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0A
= =C2=A0 =C2=A0 A () () B =C2=A0 =C2=A0 =C2=A0 -> =C2=A0A B

=C2=A0 = Group unwrap (single non-empty child):
=C2=A0 =C2=A0 (A ()) =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0(A) =C2=A0-> =C2=A0A
=C2=A0 =C2= =A0 (() A) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0(A) =C2=A0-> = =C2=A0A

=C2=A0 Quantified empty (empty repeated is still empty):
= =C2=A0 =C2=A0 ()* =C2=A0()+ =C2=A0()? =C2=A0(){3} =C2=A0 =C2=A0-> =C2=A0= (removed)
=C2=A0 =C2=A0 A ()* B =C2=A0 =C2=A0 =C2=A0 =C2=A0 -> =C2=A0= A B

=C2=A0 Collapses to case 1, then rejected:
=C2=A0 =C2=A0 (())= =C2=A0 =C2=A0(() ()) =C2=A0 =C2=A0((())) =C2=A0 =C2=A0-> =C2=A0() =C2= =A0-> rejected as case 1


The one form that is not pure del= etion is an empty alternative.=C2=A0 Here the
empty branch is optionalit= y, not nothing, but it is exactly the '?'
quantifier applied to = the rest -- X | () is (X)?:

=C2=A0 =C2=A0 X= | () =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0(X)? =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0(and =C2=A0() | X =C2=A0-> =C2=A0(X)??, reluctant)
= =C2=A0 =C2=A0 =C2=A0 A | () =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0A? =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(single var: group is redundant)
= =C2=A0 =C2=A0 =C2=A0 A{2} | () =C2=A0 =C2=A0 -> =C2=A0(A{2})? =C2=A0 =C2= =A0 =C2=A0 (the group is required here -- A{2}?
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0would parse as a reluctant A{2}, n= ot
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0an op= tional one)
=C2=A0 =C2=A0 =C2=A0 A B | () =C2=A0 =C2=A0 =C2=A0-> =C2= =A0(A B)? =C2=A0 =C2=A0 =C2=A0 =C2=A0 (X may be a sequence)
=C2=A0 =C2= =A0 =C2=A0 A | B | () =C2=A0 =C2=A0-> =C2=A0(A | B)? =C2=A0 =C2=A0 =C2= =A0 (... or an alternation)
=C2=A0 =C2=A0 (A | ()) B =C2=A0 =C2=A0 =C2= =A0-> =C2=A0A? B


The branch order maps to greediness ('= ;X | ()' greedy, '() | X' reluctant).
Either way it rewrites= to a group plus '?', both of which we already
support, so it st= ill needs no new machinery.

The rewrites also compose across nesting= .=C2=A0 A nested empty alternation
applies the rule at each level, produ= cing a nested nullable quantifier:

=C2=A0 = =C2=A0 (A | ()) | ()
=C2=A0 =C2=A0 =C2=A0 =C2=A0 -> =C2=A0A? | () =C2= =A0 =C2=A0 =C2=A0(inner =C2=A0 A | () =C2=A0-> =C2=A0A?)
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 -> =C2=A0(A?)? =C2=A0 =C2=A0 =C2=A0 =C2=A0(outer =C2= =A0 X | () =C2=A0-> =C2=A0(X)?, =C2=A0X =3D A?)


and (A?)? = =3D A?, (A+)? =3D A*, (A*)? =3D A* are all ordinary nullable
constructs,= never a new shape:

=C2=A0 =C2=A0 A? | () = =C2=A0 -> =C2=A0(A?)? =C2=A0 =C2=A0(=3D A?)
=C2=A0 =C2=A0 A+ | () =C2= =A0 -> =C2=A0(A+)? =C2=A0 =C2=A0(=3D A*)
=C2=A0 =C2=A0 A* | () =C2=A0= -> =C2=A0(A*)? =C2=A0 =C2=A0(=3D A*)

Whether the optimize= r collapses these to a single quantifier or leaves the
group as-is doesn= 't matter here -- both are nullable groups the runtime
already handl= es, so still no executor-level element. =C2=A0(The optimizer
leaves thes= e unfolded: a non-exact outer over a nullable child falls
outside tryMul= tiplyQuantifiers' conservative guard, which skips the whole
class th= at can have gaps (e.g. (A{2}){2,3} =3D {4,6}).=C2=A0 That is an
over-rej= ection, not a correctness defect -- folding the nullable cases
would be = a safe optional extension; see the aside below.)

(Duplicate empty br= anches would be merged first by the alternation dedup,
so A | () | () re= duces through A | () to A? just the same.)


Putting it together: = across the board an empty pattern is either

=C2=A0 (a) rejected by the existing DEFINE rules (case 1),
=C2=A0 (b) t= he concatenation identity, removable by the AST simplification
=C2=A0 = =C2=A0 =C2=A0 (case 2), or
=C2=A0 (c) an existing nullable construct (&#= 39;?', the empty alternative),

and none of these would ev= er need an executor-level empty element.

Given that, and that the fe= ature has little practical use while the
current series is already sizab= le, I am leaving the empty pattern out of
scope: it stays a syntax error= .=C2=A0 Concretely, accepting it would take a
grammar production, an emp= ty AST node, the concatenation-identity deletion
in the SEQ pass, and (o= ptionally) an extension of the quantifier
composition to fold the nullab= le cases -- all new work, none of it in this
series.=C2=A0 We can revisi= t it as that small follow-up if anyone actually
wants it.

Thanks,=
Henson


----------------------------= --------------------------------------------
Aside (separable from the p= ost above): quantifier-composition gaps,
unrelated to the empty pattern<= br>------------------------------------------------------------------------=

Independent of the empty pattern, the AST-level quantifier-c= omposition
pass (tryMultiplyQuantifiers, rpr.c) over-rejects: it collaps= es a nested
quantifier (X{p,q}){m,n} into a single X{...} only when the = outer is exact
(m =3D=3D n), the child is {1,1}, or both are unbounded.= =C2=A0 Those are the
always-safe sufficient conditions.=C2=A0 But the fu= ll foldable class is larger:
(X{p,q}){m,n} is foldable to X{m*p, n*q} wh= enever the reachable repetition
counts -- the union over k in [m,n] of [= k*p, k*q] -- are contiguous (no
gaps).=C2=A0 That holds for every nullab= le child (p =3D=3D 0) and, more generally,
whenever the child span q-p i= s wide enough to bridge the jump between k and
k+1 iterations.=C2=A0 The= guard thus rejects a strict superset of the genuinely
unsafe (gap-produ= cing) patterns: a class of provably-safe folds is left
undone, and those= patterns reach the executor as nested groups -- handled
correctly, just= not normalized.=C2=A0 This is an optimization over-rejection,
not a cor= rectness defect: the output is identical either way.

The two rules side by side (outer {m,n} over child {p,q}, INF =3D= unbounded):

=C2=A0 current guard -- folds iff:
=C2=A0 =C2=A0 =C2= =A0 m =3D n =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (oute= r exact)
=C2=A0 =C2=A0 OR (p =3D 1 AND q =3D 1) =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(child {1= ,1})
=C2=A0 =C2=A0 OR (q =3D INF AND n =3D INF AND NOT(m =3D 0 AND p >= ;=3D 2)) =C2=A0 (both unbounded)

=C2=A0 true safe set -- foldable if= f the reachable counts are contiguous:
=C2=A0 =C2=A0 =C2=A0 m =3D n =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 (single interval)=C2=A0 =C2=A0 OR p =3D 0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0(nullable child)
=C2=A0 =C2=A0 OR ( p <=3D max(m,1)*(q - p)= + 1 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(child span bri= dges the
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0AND (m >=3D 1 OR p <=3D= 1) ) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0k ->= k+1 jump; child {1,1}
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0is the p=3Dq=3D1= instance)

=C2=A0 over-rejected =3D safe AND NOT current:
=C2=A0 = =C2=A0 the non-exact-outer (m !=3D n) folds over a gap-free child -- p =3D = 0
=C2=A0 =C2=A0 (nullable) or a span-bridging child. =C2=A0e.g. (A?)?, (= A*)?, (A+)?,
=C2=A0 =C2=A0 (A?){2,3}, (A+){0,2}, (A{1,3}){2,4} (all in t= he table below).
=C2=A0 =C2=A0 The boundaries (A{2}){2,3} and (A{2,})* a= re NOT in the safe set
=C2=A0 =C2=A0 (p =3D 2 fails p <=3D max(m,1)*(= q-p)+1), so both rules reject them --
=C2=A0 =C2=A0 correctly; widening = the guard must keep doing so.

Verified on the branch -- EXPLAIN'= s "Pattern:" line shows the optimized
form (a nested group mea= ns it was not folded; a flat quantifier means it
was):

=C2=A0 fol= ds today (correct):
=C2=A0 =C2=A0 (A){2,3} =C2=A0 =C2=A0 =C2=A0 =C2=A0-&= gt; a{2,3} =C2=A0 =C2=A0 =C2=A0 =C2=A0(child {1,1})
=C2=A0 =C2=A0 (A{2,3= }){2} =C2=A0 =C2=A0 -> a{4,6} =C2=A0 =C2=A0 =C2=A0 =C2=A0(outer exact)=C2=A0 =C2=A0 (A*)* =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 -> a* =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(both unbounded)

=C2=A0 gap-free b= ut NOT folded (stays nested; "want" is the safe normal form):
= =C2=A0 =C2=A0 (A?)? =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 -> (a?)? =C2=A0 = =C2=A0 =C2=A0 =C2=A0 want a?
=C2=A0 =C2=A0 (A*)? =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 -> (a*)? =C2=A0 =C2=A0 =C2=A0 =C2=A0 want a*
=C2=A0 =C2= =A0 (A+)? =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 -> (a+)? =C2=A0 =C2=A0 =C2= =A0 =C2=A0 want a*
=C2=A0 =C2=A0 (A?){2,3} =C2=A0 =C2=A0 =C2=A0 -> (a= ?){2,3} =C2=A0 =C2=A0 want a{0,3}
=C2=A0 =C2=A0 (A+){0,2} =C2=A0 =C2=A0 = =C2=A0 -> (a+){0,2} =C2=A0 =C2=A0 want a*
=C2=A0 =C2=A0 (A{1,3}){2,4}= =C2=A0 -> (a{1,3}){2,4} want a{2,12}

=C2=A0 correctly NOT foldab= le (real gap -- guard is right here):
=C2=A0 =C2=A0 (A{2}){2,3} =C2=A0 = =C2=A0 -> (a{2}){2,3} =C2=A0 reaches {4,6}, not 4..6

The first three "not folded" rows are= the nullable forms the empty pattern
would lean on; the next three show= the gap is broader (a range outer over
a non-{1,1} child can still be g= ap-free).=C2=A0 The win is only normalization /
a smaller NFA, so this i= s low priority.=C2=A0 And this exact pass has a history
of subtle gap bu= gs (the (A{2,})* over-flatten and the INF-sum merge, both
recently fixed= ), so widening the guard toward the safe set above should be
its own pat= ch, with boundary tests covering the foldable cases and the two
gaps -- = (A{2}){2,3} and (A{2,})* -- that must stay un-folded.
--00000000000084fc6806534dfa58--