MIME-Version: 1.0
References: 
 <CAAAe_zDDuJofafXyhggPPPLzUXeejH-19NfcLR7jNNQxZchtog@mail.gmail.com>
 <20260601.111119.1029884790276077667.ishii@postgresql.org>
 <CAAAe_zBhzHyazC_Ot+HXBe_nToKc7AHs4r-s0nNcaBPw0L17wA@mail.gmail.com>
 <20260601.114703.1561993497414705173.ishii@postgresql.org>
 <CAAAe_zCZc8s4zWfmVVUOt0y_FU=v7YTcJJJ4UL2gBzJ2_KkUmQ@mail.gmail.com>
 <CACJufxHVADC8e77pnQxSZRk7SYHCZFk6ZCM2HfTsKyD_kUji0A@mail.gmail.com>
 <CAAAe_zAZDuHSiVGvz9c6h=Pe=aN+FKZOrdNPfbTOk3XV+WFKYQ@mail.gmail.com>
In-Reply-To: 
 <CAAAe_zAZDuHSiVGvz9c6h=Pe=aN+FKZOrdNPfbTOk3XV+WFKYQ@mail.gmail.com>
Reply-To: assam258@gmail.com
From: Henson Choi <assam258@gmail.com>
Date: Wed, 3 Jun 2026 08:50:28 +0900
Message-ID: 
 <CAAAe_zDz3z2Paidk3jHOm9S3eMVLoXRxK0Lyo=5i_9-EfSH7fA@mail.gmail.com>
Subject: Re: Row pattern recognition
To: Tatsuo Ishii <ishii@postgresql.org>, jian he <jian.universality@gmail.com>
Cc: zsolt.parragi@percona.com, sjjang112233@gmail.com,
 vik@postgresfriends.org,
	er@xs4all.nl, jacob.champion@enterprisedb.com, david.g.johnston@gmail.com,
	peter@eisentraut.org, li.evan.chao@gmail.com, pgsql-hackers@postgresql.org
Content-Type: multipart/alternative; boundary="00000000000084fc6806534dfa58"
Archived-At: 
 <https://www.postgresql.org/message-id/CAAAe_zDz3z2Paidk3jHOm9S3eMVLoXRxK0Lyo%3D5i_9-EfSH7fA%40mail.gmail.com>
Precedence: bulk

--00000000000084fc6806534dfa58
Content-Type: text/plain; charset="UTF-8"

Hi all,

While going over the row pattern grammar I want to put on record why the
empty pattern -- PATTERN (()) and the like -- is left unsupported, and why
I think that is the right call for this series rather than an oversight.

Today it is simply a syntax error.  row_pattern_primary is either a
variable (ColId) or a parenthesized group '(' row_pattern ')', and
row_pattern has no empty production, so '()' never parses.  The standard's
pattern syntax does allow an empty row pattern -- it matches the empty
sequence, i.e. it produces an empty match -- so the question is whether we
should grow the grammar, plus an NFA "empty" element, to accept it.

My claim is that we do not need to: every way an empty pattern can appear
reduces to something we already handle, so a dedicated empty element in
the executor would be dead weight.  There are two cases.


1. The empty pattern is the whole pattern: PATTERN (())

This pattern has zero pattern variables.  But DEFINE is mandatory (per
ISO/IEC 19075-5, Table 18), an empty DEFINE list is rejected, and every
DEFINE variable must appear in PATTERN -- otherwise we already error with
"DEFINE variable \"%s\" is not used in PATTERN".  A pattern with no
variables cannot satisfy any of that: there is nothing for DEFINE to
define, yet DEFINE can be neither omitted nor left empty.  So an all-empty
pattern is rejected by the existing rules, with no new check needed.
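
The three rules can be sketched as a small check.  This is an
illustration in Python, not the parser's code: validate_define and its
arguments are hypothetical names, and only the last error message is taken
from the existing check; the other two message texts are made up.

```python
def validate_define(pattern_vars, define_vars):
    """Apply the three DEFINE rules to the variable names of a pattern.

    pattern_vars: set of variable names used in PATTERN
    define_vars:  list of names in DEFINE, or None if the clause is omitted
    """
    if define_vars is None:
        raise ValueError("DEFINE clause is required")       # hypothetical text
    if not define_vars:
        raise ValueError("DEFINE list must not be empty")   # hypothetical text
    for name in define_vars:
        if name not in pattern_vars:
            # message quoted from the existing check
            raise ValueError(f'DEFINE variable "{name}" is not used in PATTERN')


# PATTERN (()) has no variables, so every possible DEFINE is rejected.
for define in (None, [], ["a"]):
    try:
        validate_define(set(), define)
    except ValueError as err:
        print(err)
```

With an empty variable set there is no DEFINE input that passes, which is
the whole of case 1.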

(It is degenerate in any case.  An empty match at every row maps to "no
reduced frame" -- row_is_in_reduced_frame() returns -1 for RF_EMPTY_MATCH
exactly as it does for RF_UNMATCHED -- so every row would simply be
unmatched.)


2. The empty pattern is embedded: A () B

Here '()' consumes no rows; it is the identity element under
concatenation, so A () B is equivalent to A B.  This stays entirely on the
parser/optimizer side: the empty-pattern production would carry an empty
AST node, and an identity fold in the SEQ simplification (next to the
prefix/suffix and consecutive merges it already does) drops the '()'
before the executor ever sees it.  The runtime is untouched -- which is
the whole point.  And if the fold removes everything, the pattern has
collapsed to case 1 and is rejected there.

The AST rewrites it would add:

  Concatenation (identity element):
    A () B          ->  A B
    () A            ->  A
    A ()            ->  A
    A () () B       ->  A B

  Group unwrap (single non-empty child):
    (A ())          ->  (A)  ->  A
    (() A)          ->  (A)  ->  A

  Quantified empty (empty repeated is still empty):
    ()*  ()+  ()?  (){3}    ->  (removed)
    A ()* B         ->  A B

  Collapses to case 1, then rejected:
    (())    (() ())    ((()))    ->  ()  -> rejected as case 1
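
A minimal sketch of the deletion rewrites above, in Python rather than
the C of the real SEQ pass.  The tuple-based AST, EMPTY, simplify and
check_pattern are all hypothetical names chosen for illustration; the
sketch only assumes that '()' parses to a distinct empty node.

```python
EMPTY = ("empty",)                  # hypothetical AST node for '()'


def var(name):
    return ("var", name)


def simplify(node):
    """Drop '()' wherever it acts as the concatenation identity."""
    kind = node[0]
    if kind in ("var", "empty"):
        return node
    if kind == "seq":                               # ("seq", [children])
        kids = [simplify(k) for k in node[1]]
        kids = [k for k in kids if k != EMPTY]      # A () B -> A B
        if not kids:
            return EMPTY                            # everything folded away
        return kids[0] if len(kids) == 1 else ("seq", kids)
    if kind == "group":                             # ("group", child)
        # Grouping is already explicit in the tree, so the sketch unwraps.
        return simplify(node[1])
    if kind == "quant":                             # ("quant", child, min, max)
        child = simplify(node[1])
        # Empty repeated any number of times is still empty.
        return EMPTY if child == EMPTY else ("quant", child, node[2], node[3])
    raise ValueError(f"unknown node kind: {kind}")


def check_pattern(node):
    """Simplify, then reject a pattern that collapsed to '()' (case 1)."""
    result = simplify(node)
    if result == EMPTY:
        raise ValueError("pattern has no variables")    # hypothetical text
    return result


A, B = var("A"), var("B")
print(check_pattern(("seq", [A, EMPTY, B])))                     # A () B
print(check_pattern(("seq", [A, ("quant", EMPTY, 0, None), B])))  # A ()* B
```

The executor never sees EMPTY: either it is filtered out of a sequence or
the whole pattern is rejected before planning.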

The one form that is not pure deletion is an empty alternative.  Here the
empty branch means optionality rather than nothing, and that is exactly
the '?' quantifier applied to the remaining branches -- X | () is (X)?:

    X | ()          ->  (X)?          (and  () | X  ->  (X)??, reluctant)
      A | ()        ->  A?            (single var: group is redundant)
      A{2} | ()     ->  (A{2})?       (the group is required here -- A{2}?
                                       would parse as a reluctant A{2}, not
                                       an optional one)
      A B | ()      ->  (A B)?         (X may be a sequence)
      A | B | ()    ->  (A | B)?       (... or an alternation)
    (A | ()) B      ->  A? B

The branch order maps to greediness ('X | ()' greedy, '() | X' reluctant).
Either way it rewrites to a group plus '?', both of which we already
support, so it still needs no new machinery.
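
The empty-alternative rule can be sketched the same way.  Again this is
hypothetical Python, not the optimizer's code: rewrite_empty_alt and the
tuple shapes are invented for illustration.  It covers only the two
placements discussed above, a single '()' as the first or the last branch,
and refuses anything else instead of guessing.

```python
EMPTY = ("empty",)                  # hypothetical AST node for '()'


def rewrite_empty_alt(branches):
    """Rewrite an alternation whose first or last branch is '()'.

    branches: list of AST nodes in source order.
    Returns ("quant", body, 0, 1, reluctant), or the untouched alternation
    when no branch is empty.
    """
    if EMPTY not in branches:
        return ("alt", branches)
    rest = [b for b in branches if b != EMPTY]
    if not rest or len(rest) != len(branches) - 1:
        raise ValueError("sketch handles exactly one empty branch")
    if branches[-1] == EMPTY:
        reluctant = False           # X | ()  ->  (X)?   (greedy)
    elif branches[0] == EMPTY:
        reluctant = True            # () | X  ->  (X)??  (reluctant)
    else:
        raise ValueError("empty branch in the middle is not covered here")
    # The body stays one node, which is what the explicit group expresses
    # in pattern text: (A{2})? rather than A{2}?.
    body = rest[0] if len(rest) == 1 else ("alt", rest)
    return ("quant", body, 0, 1, reluctant)


A, B = ("var", "A"), ("var", "B")
print(rewrite_empty_alt([A, EMPTY]))        # A | ()
print(rewrite_empty_alt([EMPTY, A]))        # () | A
print(rewrite_empty_alt([A, B, EMPTY]))     # A | B | ()
```

Because the result is an ordinary {0,1} quantifier node, applying the
function to its own output handles the nested form (A | ()) | () without
any extra case.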

The rewrites also compose across nesting.  A nested empty alternation
applies the rule at each level, producing a nested nullable quantifier:

    (A | ()) | ()
        ->  A? | ()      (inner   A | ()  ->  A?)
        ->  (A?)?        (outer   X | ()  ->  (X)?,  X = A?)

and (A?)? = A?, (A+)? = A*, (A*)? = A* are all ordinary nullable
constructs, never a new shape:

    A? | ()   ->  (A?)?    (= A?)
    A+ | ()   ->  (A+)?    (= A*)
    A* | ()   ->  (A*)?    (= A*)

Whether the optimizer collapses these to a single quantifier or leaves the
group as-is doesn't matter here -- either form is a nullable construct the
runtime already handles, so still no executor-level element.  (Today the
optimizer leaves them unfolded: a bounded, non-exact outer over a child
other than {1,1} falls outside tryMultiplyQuantifiers' conservative guard,
which skips the whole class that can have gaps (e.g. (A{2}){2,3} reaches
only the counts 4 and 6, never 5).  That is an over-rejection, not a
correctness defect -- folding the gap-free cases would be a safe optional
extension; see the aside below.)

(Duplicate empty branches would be merged first by the alternation dedup,
so A | () | () reduces through A | () to A? just the same.)


Putting it together: across the board an empty pattern is either

  (a) rejected by the existing DEFINE rules (case 1),
  (b) the concatenation identity, removable by the AST simplification
      (case 2), or
  (c) an existing nullable construct ('?', the empty alternative),

and none of these would ever need an executor-level empty element.

Given that, and that the feature has little practical use while the
current series is already sizable, I am leaving the empty pattern out of
scope: it stays a syntax error.  Concretely, accepting it would take a
grammar production, an empty AST node, the concatenation-identity deletion
in the SEQ pass, and (optionally) an extension of the quantifier
composition to fold the nullable cases -- all new work, none of it in this
series.  We can revisit it as a small follow-up patch if anyone actually
wants it.

Thanks,
Henson


------------------------------------------------------------------------
Aside (separable from the post above): quantifier-composition gaps,
unrelated to the empty pattern
------------------------------------------------------------------------

Independent of the empty pattern, the AST-level quantifier-composition
pass (tryMultiplyQuantifiers, rpr.c) over-rejects: it collapses a nested
quantifier (X{p,q}){m,n} into a single X{...} only when the outer is exact
(m == n), the child is {1,1}, or both are unbounded.  Those are the
always-safe sufficient conditions.  But the full foldable class is larger:
(X{p,q}){m,n} is foldable to X{m*p, n*q} whenever the reachable repetition
counts -- the union over k in [m,n] of [k*p, k*q] -- are contiguous (no
gaps).  That holds for every nullable child (p == 0) and, more generally,
whenever the child span q-p is wide enough to bridge the jump between k and
k+1 iterations.  The guard thus rejects a strict superset of the genuinely
unsafe (gap-producing) patterns: a class of provably-safe folds is left
undone, and those patterns reach the executor as nested groups -- handled
correctly, just not normalized.  This is an optimization over-rejection,
not a correctness defect: the output is identical either way.
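
The contiguity criterion is easy to check by brute force for bounded
quantifiers.  A small Python sketch (reachable_counts, is_contiguous and
fold are hypothetical helper names, and unbounded quantifiers are left out
to keep the enumeration finite):

```python
def reachable_counts(p, q, m, n):
    """Repetition counts of (X{p,q}){m,n}: union over k in [m,n] of [k*p, k*q]."""
    counts = set()
    for k in range(m, n + 1):
        counts.update(range(k * p, k * q + 1))
    return counts


def is_contiguous(counts):
    """True when the set is one unbroken integer interval."""
    return max(counts) - min(counts) + 1 == len(counts)


def fold(p, q, m, n):
    """Return the folded bounds (m*p, n*q), or None when there is a gap."""
    if is_contiguous(reachable_counts(p, q, m, n)):
        return (m * p, n * q)
    return None


print(fold(1, 3, 2, 4))                         # (A{1,3}){2,4}
print(fold(0, 1, 2, 3))                         # (A?){2,3}
print(fold(2, 2, 2, 3))                         # (A{2}){2,3}
print(sorted(reachable_counts(2, 2, 2, 3)))     # the counts behind the gap
```

The first two calls yield folded bounds, while (A{2}){2,3} yields None
because its reachable set skips 5.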

The two rules side by side (outer {m,n} over child {p,q}, INF = unbounded):

  current guard -- folds iff:
      m = n                                       (outer exact)
    OR (p = 1 AND q = 1)                          (child {1,1})
    OR (q = INF AND n = INF AND NOT(m = 0 AND p >= 2))   (both unbounded)

  true safe set -- foldable iff the reachable counts are contiguous:
      m = n                                       (single interval)
    OR p = 0                                      (nullable child)
    OR ( p <= max(m,1)*(q - p) + 1                (child span bridges the
         AND (m >= 1 OR p <= 1) )                  k -> k+1 jump; child {1,1}
                                                   is the p=q=1 instance)

  over-rejected = safe AND NOT current:
    the non-exact-outer (m != n) folds over a gap-free child -- p = 0
    (nullable) or a span-bridging child.  e.g. (A?)?, (A*)?, (A+)?,
    (A?){2,3}, (A+){0,2}, (A{1,3}){2,4} (all in the table below).
    The boundaries (A{2}){2,3} and (A{2,})* are NOT in the safe set:
    (A{2}){2,3} fails p <= max(m,1)*(q-p)+1 (p = 2, q-p = 0), and
    (A{2,})* fails (m >= 1 OR p <= 1) (m = 0, p = 2).  Both rules reject
    them -- correctly; widening the guard must keep doing so.
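
The two rules transcribe directly into predicates, which makes the
over-rejected set easy to enumerate.  This is a Python sketch of the rules
as written above, not the C code of tryMultiplyQuantifiers; the function
names are hypothetical and math.inf stands in for INF.

```python
import math

INF = math.inf


def current_guard(p, q, m, n):
    """The conservative rule: outer exact, child {1,1}, or both unbounded."""
    return (m == n
            or (p == 1 and q == 1)
            or (q == INF and n == INF and not (m == 0 and p >= 2)))


def safe(p, q, m, n):
    """The contiguity rule: single interval, nullable child, or bridging span."""
    if m == n or p == 0:
        return True
    bridges = p <= max(m, 1) * (q - p) + 1      # q - p is INF when q is INF
    return bridges and (m >= 1 or p <= 1)


# (child p, child q, outer m, outer n) for the examples in this aside
examples = {
    "(A){2,3}":      (1, 1, 2, 3),
    "(A{2,3}){2}":   (2, 3, 2, 2),
    "(A*)*":         (0, INF, 0, INF),
    "(A?)?":         (0, 1, 0, 1),
    "(A*)?":         (0, INF, 0, 1),
    "(A+)?":         (1, INF, 0, 1),
    "(A?){2,3}":     (0, 1, 2, 3),
    "(A+){0,2}":     (1, INF, 0, 2),
    "(A{1,3}){2,4}": (1, 3, 2, 4),
    "(A{2}){2,3}":   (2, 2, 2, 3),
    "(A{2,})*":      (2, INF, 0, INF),
}

for text, args in examples.items():
    folds, ok = current_guard(*args), safe(*args)
    status = "folds" if folds else ("over-rejected" if ok else "unsafe")
    print(f"{text:15} {status}")
```

A widened guard could be tested against exactly this kind of table: every
"over-rejected" row should start folding, and every "unsafe" row must stay
nested.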

Verified on the branch -- EXPLAIN's "Pattern:" line shows the optimized
form (a nested group means it was not folded; a flat quantifier means it
was):

  folds today (correct):
    (A){2,3}        -> a{2,3}        (child {1,1})
    (A{2,3}){2}     -> a{4,6}        (outer exact)
    (A*)*           -> a*            (both unbounded)

  gap-free but NOT folded (stays nested; "want" is the safe normal form):
    (A?)?           -> (a?)?         want a?
    (A*)?           -> (a*)?         want a*
    (A+)?           -> (a+)?         want a*
    (A?){2,3}       -> (a?){2,3}     want a{0,3}
    (A+){0,2}       -> (a+){0,2}     want a*
    (A{1,3}){2,4}   -> (a{1,3}){2,4} want a{2,12}

  correctly NOT foldable (real gap -- guard is right here):
    (A{2}){2,3}     -> (a{2}){2,3}   reaches {4,6}, not 4..6

The first three "not folded" rows are the nullable forms the empty pattern
would lean on; the next three show that the over-rejection is broader (a
range outer over a non-{1,1} child can still be gap-free).  The win is only
normalization / a smaller NFA, so this is low priority.  And this exact
pass has a history of subtle gap bugs (the (A{2,})* over-flatten and the
INF-sum merge, both recently fixed), so widening the guard toward the safe
set above should be its own patch, with boundary tests covering the
foldable cases and the two gaps -- (A{2}){2,3} and (A{2,})* -- that must
stay un-folded.


--00000000000084fc6806534dfa58--