Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vwNfs-003Ds3-1i for pgsql-hackers@arkaria.postgresql.org; Sat, 28 Feb 2026 17:00:16 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vwNfp-00B5ki-22 for pgsql-hackers@arkaria.postgresql.org; Sat, 28 Feb 2026 17:00:13 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vwNfp-00B5kZ-0c for pgsql-hackers@lists.postgresql.org; Sat, 28 Feb 2026 17:00:13 +0000 Received: from mail-pl1-x635.google.com ([2607:f8b0:4864:20::635]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vwNfl-00000001kjI-3ILI for pgsql-hackers@postgresql.org; Sat, 28 Feb 2026 17:00:12 +0000 Received: by mail-pl1-x635.google.com with SMTP id d9443c01a7336-2adc1d9ec56so21248315ad.0 for ; Sat, 28 Feb 2026 09:00:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1772298008; cv=none; d=google.com; s=arc-20240605; b=FH2tK6xvH/LZortWm9yGp8V4d4+/ZYC/3Dt/VqM8pKXjhlUOKYdtekPQ4T5KItcz0y MEFG/wu/d5ugJD9OdbzP6oMrsGpHJ78HRzQLXK/MylklRohOPgBxCzCooHc8W1E+i0nW GuKGgnszYhDZQ0ETE7FJOL+d/5LmIzT1jtd6plnn96hmtn6X5CDZgQwX3dVwHO5SzqLH 2r1NwsCIGC5sQUCwd4MAJm3MmVjbdxpY1GzTgq4Q3ckUylSbqcUxYjvx5nEDh2vS0qPc ShFg7Zug4z6tmVGwDTiuAtyOr37JRUwMbg9bM07pb1fxysTX7BIgslUfWz//ib5/OgT3 EoxA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:dkim-signature; bh=cyAbC78/bnDsD9zXuNXQpqMNmo0g2ZgrzIdHgvOVG8E=; fh=mJP5WgmpQQdAGqBs9hVqjzNyiz4j8pJWc1YULUfb51o=; b=NnhoDIJkTt2BvFSACVGpnroF0YWXkoCfJC76v5JNO8xJOjAGld2/SI9wpWppmNOLXH gTN+YP1LQ2tqhGOqLbVHXoDL76OJqT/oxB1pwS4TsMHC2nYN6tdyK5FdBCc3RJuEsRM5 eIC7/STGY3ce7SwjQb5/UrwKHRxE1SeFYYo5YuJP9IASi9jPd1H3nwfRvWBMal8qNC5C EkNXgSaXpS9t2W3Dvb5B/1MME6pcC879hWNcJuP1hC1S8YWQHmYGj2McXzV795SxpzLB i7VMcr8FVWUnb2V+O4SGalImLJxHDNFCm32aUG4P113tJ7a/6eDR5s9nNrfeTLI03OmR 7tkQ==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772298008; x=1772902808; darn=postgresql.org; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=cyAbC78/bnDsD9zXuNXQpqMNmo0g2ZgrzIdHgvOVG8E=; b=Uw1cRFxd73uAB6S7wSoYpaRHREWh78/9uUnfrSKolZozKCCwUnquFs/b+CvrY8DM3L QjpeBLv+tA1klLqk+YCoJ69quzEF/KKcF6Ew7hiFrLH2TcGLHf3CzCWBtv5Dm6WmpVfc Y5zuk5d5OEbjj4Qas0B+YhwbRiKB9LxKmlNTVtznoFU7Qd96En1cqUzUkiCdZcWjyhH+ PGYLaxMZ/55RFFM3WwJa1fAlVliNl5QTa0n9Dx4pQ/LrqmVWS9q0LPI7xCAgJNF6K6af kY68HeKoqPzc7VgDuUbqH11AQd0aGHUCiftiW228aT6hBS2XQTmkyTvt9FO9EnWbLl+Q 9F0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772298008; x=1772902808; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=cyAbC78/bnDsD9zXuNXQpqMNmo0g2ZgrzIdHgvOVG8E=; b=iD6Wf4TBgcmgMGzabaIUyQSBFBmivZnYidMpsM6xlcHl+7HpD4V6Bmod4ZaVeiMiUm jsyvlzi8ORV5vS/TjTSt0rpCdi4plRKboX6DmtsC6UiA47s13NtXyTrRzpCMQ4AmuXHR hH4ZCPh+hZrYk95gk+qCkYuCEVdbLnKHRtium/bUnYS7uEmYo5/CLZxJKwVndgZAEUAT wUCshoSu0d/qLXgjXvTUe2OIcjKapECDDDDWrnYlQMWiwE3mZUW133Wh0BzopOqQ5nAy Sfz/b40LRhtKlqSsxOEPX4IhVHW5vpR+AkXoONGWr1emfliP3gMfTfdlFjSqkjdueiFj FxXg== X-Forwarded-Encrypted: i=1; AJvYcCWRoZFWvoMn4OAD/6F9pCy2dLWWaoX7ezjO0vsIH77WEMKi2Mkl0QUW9hWrhy0Wngp1U1Xj+kS/q5z1RykV@postgresql.org X-Gm-Message-State: AOJu0YxcE55Yc1yL0LQajadl+IgJlSnvE/WtbWy70mCMs+om5oQI+9Wj VKhJkrwYc76y3DovSLs+Gl9QISsHsP8SuAAeFlXSp9YEuY24FGkxy2PLwqGhm61/sx/ADoKyLTK E6VLxc11tAZgbowkZn5dzvck0Dq6BAUU= X-Gm-Gg: ATEYQzxM1y+ZSivPO5VAaU6szMvb++rx+ijQX1MpUU8iI+WWEvFh9jqZ6ZPtfbqikhY A6vxTrEj9b/CFWY/mJd9Rm96DgZcddKRxCuy4RGw0fGPvhffzoA3vHQbsYiLYIatVUQW8w/fS8q DD9Z5OL/YHqqRsabxeZ9PjbtJUInQc0oHCvWc0kJc5+GprENax8pmuNmEQKSUcal3DAI+ak2S0J rp1vsy4y+IGmAmU3I67pxE+r7t7/1m62duZqE6N3Adh2/VrCKD51KS+Sub3M0rHYSUZ97nLVx8D 6GzRS4I8yEr3WFLUCqJPdhngFEJSEXF6VQUcHR6d X-Received: by 2002:a17:903:320b:b0:2aa:e387:b83b with SMTP id d9443c01a7336-2ae2e4bbe3emr63163575ad.43.1772298007932; Sat, 28 Feb 2026 09:00:07 -0800 (PST) MIME-Version: 1.0 References: <20260224.140927.1828965853586507533.ishii@postgresql.org> <20260224.204417.914253025244188300.ishii@postgresql.org> <20260227.225456.33226875991025537.ishii@postgresql.org> In-Reply-To: <20260227.225456.33226875991025537.ishii@postgresql.org> Reply-To: assam258@gmail.com From: Henson Choi Date: Sun, 1 Mar 2026 01:59:56 +0900 X-Gm-Features: AaiRm51hKUDqfQq6v2qyFu7bBVXywxbpW-GoTTjvPF_H8PM4P6rLZeRzurNvzUw Message-ID: Subject: Re: Row pattern recognition To: Tatsuo Ishii Cc: vik@postgresfriends.org, er@xs4all.nl, jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, peter@eisentraut.org, pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="00000000000032f260064be549c7" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000032f260064be549c7 Content-Type: text/plain; charset="UTF-8" Hi hackers, I'd like to propose an extension to the context absorption optimization for Row Pattern Recognition in the window clause: PREFIX pattern absorption. This refines the "anchored pattern absorption" concept I posted earlier [1], renaming it to "PREFIX pattern absorption" per Ishii-san's feedback [2], and formalizing the absorption conditions with count-dominance analysis. It builds on the existing absorption framework to handle patterns where a fixed-length prefix precedes the unbounded greedy repetition, using a "shadow path" design. [1] https://www.postgresql.org/message-id/CAAAe_zAEg7sVM=WDwXMyE-odGmQyXSVi5ZzWgye6SupSjdMKpg@mail.gmail.com [2] https://www.postgresql.org/message-id/20260217.180053.64368434967157940.ishii@postgresql.org 1. Problem Definition ===================== 1.1 Current Absorption Scope Context absorption currently applies to patterns with an unbounded-start structure -- where an unbounded greedy repetition appears at the very beginning of the pattern. Example: A+ B -- the first element A+ is an unbounded repetition, so an older context always has a repeat count greater than or equal to any newer context, making immediate absorption possible. 1.2 The PREFIX Pattern Problem For patterns like START SECOND A+ B+, where a finite-length prefix (PREFIX) precedes the unbounded repetition (A+), absorption does not currently work. Why: - A new context starts at the START element, so it is not at the same pattern position as an older context in the A+ region. - The allStatesAbsorbable condition is not satisfied, so absorption never fires. - Contexts accumulate while traversing the PREFIX, causing O(n^2) regression. 2. Relationship to the Existing Framework ========================================== 2.1 Position in the Three-Phase Execution Model The Match -> Absorb -> Advance model is preserved as-is. PREFIX absorption is an extension of the Absorb phase's eligibility criteria, not a new phase. 2.2 Prerequisites The same four prerequisites as existing absorption apply: 1. Greedy quantifier -- A+ must be greedy. 2. SKIP TO PAST LAST ROW -- prior matches suppress subsequent start points. 3. UNBOUNDED FOLLOWING -- all contexts see the same future rows. 4. Match-start-independent DEFINE predicates -- including predicates of PREFIX elements. 2.3 Extension of the Dominance Order In existing absorption, the dominance condition is: C_old and C_new are in the absorbable region of the same element, and C_old.counts >= C_new.counts. In PREFIX patterns this condition cannot be directly satisfied: C_new is in the PREFIX region while C_old is in the BODY region. We therefore need a mechanism to track what C_new's state *would be* if it had already reached the BODY region. 3. Design: Shadow Path ======================= 3.1 Core Idea Each context logically executes two parallel paths: - Original path: executes the full pattern in order (PREFIX -> BODY). This is the path that produces the actual match result and evaluates PREFIX predicates. - Shadow path: skips the PREFIX and starts directly in the BODY region (A+). - Used solely for absorption eligibility decisions. - Does not produce match results. - Actually evaluates the BODY element's DEFINE predicates to track counts accurately. - The shadow is placed at the BODY start state upon creation and immediately advances, so its initial count is 1. - Not created if there is no preceding context (nothing to be absorbed into). - Discarded when the preceding context dies. - Discarded when the shadow leaves the absorbable region (BODY->TAIL transition or match failure). 3.2 Absorption Conditions For C_old to absorb C_new, all of the following must hold: 1. C_old is past the PREFIX and in the absorbable region (BODY). - hasAbsorbableState = true (existing flag) 2. C_new was created at or after the row where C_old entered the absorbable region. - "Same row" means: in the Match phase C_old enters BODY and C_new is created; the subsequent Absorb phase of that same row can then perform the absorption check. 3. C_new's shadow path is alive in the absorbable region. - allStatesAbsorbable = true (evaluated on the shadow) 4. C_old.counts[depth] >= C_new.shadow.counts[depth] - Same count-dominance condition as existing absorption (equality counts as dominance). - The shadow's count increases as it processes rows, so the comparison uses actual counts at absorption time. Condition 2 is the key insight. Only contexts created after C_old has passed through the PREFIX into the absorbable region are eligible for absorption. Contexts created while C_old is still in the PREFIX cannot be absorbed. 3.3 Common Fate Soundness argument for absorption: C_old's original path and C_new's shadow path are at the same BODY element (A+). By match-start independence, the DEFINE predicates of both paths produce identical results on the same row. If C_new's shadow can match at some future row r, then C_old can also complete a match at row r (it has matched at least as many times, sees the same future rows, and DEFINE evaluations are identical). Under SKIP TO PAST LAST ROW, C_old's match includes C_new's start point, so removing C_new does not lose any reported match. 3.4 Complexity With PREFIX absorption, the maximum number of simultaneously active contexts is bounded by PREFIX_length + 1. Every C_new created after C_old enters BODY is absorbed immediately, so the only contexts that can coexist are those still traversing the PREFIX plus the single C_old in BODY. Since PREFIX length is a per-pattern constant, overall execution complexity is O(n). 3.5 Removal of Non-Absorbable Contexts Contexts that do not satisfy condition 2 -- i.e., those created before C_old entered BODY -- cannot be removed by PREFIX absorption. These contexts are removed by the existing SKIP TO PAST LAST ROW mechanism, which is separate from PREFIX absorption. When C_old successfully matches, all subsequent contexts whose start points fall within C_old's match range are removed as overlaps. Since contexts created during the PREFIX region have start points within C_old's match range, they are automatically removed upon C_old's match completion. The number of such contexts is at most PREFIX_length, which does not affect the O(n) complexity (see Section 3.4). 4. Compile-Time Analysis ========================= 4.1 Pattern Structure Recognition At compile time, traverse the pattern elements from the start until the first unbounded greedy repetition is found. All preceding fixed-length elements form the PREFIX; the unbounded element is the BODY; everything after it is the TAIL. If the first element is already unbounded, the pattern has no PREFIX and existing absorption applies directly. Pattern: E1 E2 ... Ek A+ B+ ... \__________/ \__/ \___/ PREFIX BODY TAIL (absorbable) (non-absorbable) 5. Worked Example ================== Pattern: START SECOND A+ B+ PREFIX: START, SECOND (length 2) BODY: A+ (absorbable) TAIL: B+ (non-absorbable) Data: r0-r3 satisfy A only, r4-r5 satisfy B only, r6 satisfies neither A nor B. Row C1(orig) C2(orig) C2(shadow) C3(orig) C3(shadow) Active r0 START - - - - 1 r1 SECOND START A(cnt=1) - - 2 r2 A(cnt=1) SECOND A(cnt=2) START A(cnt=1) 3 ^ C1 reaches BODY -> C3: shadow A(cnt=1), created at C1's BODY entry -> absorbed (1 >= 1) -> C2: created before C1's BODY entry -> not absorbable 2 r3 A(cnt=2) A(cnt=1) A(cnt=3) - - 2 r4 B(cnt=1) B(cnt=1) (discard) - - 2 ^ TAIL ^ TAIL ^ shadow A+->B+ -> TAIL -> discard r5 B(cnt=2) B(cnt=2) - - - 2 r6 neither A nor B -> B+ ends -> C1 match (r0-r5) 0 C2 removed by SKIP TO PAST LAST ROW overlap (C2 start r1 falls within C1 match range r0-r5) Max concurrent contexts: 3 = PREFIX_length(2) + 1 Key observations: - C3 is immediately removed at r2 by PREFIX absorption (shadow-path- based). - C2's shadow is discarded at r4 when A+->B+ transition moves it into TAIL. - C2 is not absorbable (condition 2 not met), survives until r6, then removed by SKIP TO PAST LAST ROW overlap when C1 matches (Section 3.5). This is a design proposal at this stage. Thoughts and feedback welcome. Best regards, Henson --00000000000032f260064be549c7 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi hackers,

I'd like to propose an extension to= the context absorption optimization
for Row Pattern Recognition in the = window clause: PREFIX pattern
absorption. This refines the "anchore= d pattern absorption" concept I
posted earlier [1], renaming it to = "PREFIX pattern absorption" per
Ishii-san's feedback [2], = and formalizing the absorption conditions
with count-dominance analysis.=

It builds on the existing absorption framework to handle patterns w= here
a fixed-length prefix precedes the unbounded greedy repetition, usi= ng a
"shadow path" design.

[1] https://www.postgresql.org/message-id/CAAAe_zAEg7sVM=3D= WDwXMyE-odGmQyXSVi5ZzWgye6SupSjdMKpg@mail.gmail.com
[2] https://www.postgresql.org/message-id/20260217.180053.643= 68434967157940.ishii@postgresql.org


1. Problem Definition
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

1.1 Current Absorption Scope

C= ontext absorption currently applies to patterns with an unbounded-start
= structure -- where an unbounded greedy repetition appears at the very
be= ginning of the pattern.

Example: A+ B -- the first element A+ is an = unbounded repetition, so an
older context always has a repeat count grea= ter than or equal to any
newer context, making immediate absorption poss= ible.

1.2 The PREFIX Pattern Problem

For patterns like START = SECOND A+ B+, where a finite-length prefix
(PREFIX) precedes the unbound= ed repetition (A+), absorption does not
currently work.

Why:
-= A new context starts at the START element, so it is not at the same
=C2= =A0 pattern position as an older context in the A+ region.
- The allStat= esAbsorbable condition is not satisfied, so absorption
=C2=A0 never fire= s.
- Contexts accumulate while traversing the PREFIX, causing O(n^2)
= =C2=A0 regression.


2. Relationship to the Existing Framework
= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

2.1 Position in = the Three-Phase Execution Model

The Match -> Absorb -> Advance= model is preserved as-is. PREFIX
absorption is an extension of the Abso= rb phase's eligibility criteria,
not a new phase.

2.2 Prerequ= isites

The same four prerequisites as existing absorption apply:
=
1. Greedy quantifier -- A+ must be greedy.
2. SKIP TO PAST LAST ROW = -- prior matches suppress subsequent start
=C2=A0 =C2=A0points.
3. UN= BOUNDED FOLLOWING -- all contexts see the same future rows.
4. Match-sta= rt-independent DEFINE predicates -- including predicates of
=C2=A0 =C2= =A0PREFIX elements.

2.3 Extension of the Dominance Order

In e= xisting absorption, the dominance condition is:

=C2=A0 =C2=A0 C_old = and C_new are in the absorbable region of the same element,
=C2=A0 =C2= =A0 and C_old.counts >=3D C_new.counts.

In PREFIX patterns this c= ondition cannot be directly satisfied: C_new is
in the PREFIX region whi= le C_old is in the BODY region. We therefore
need a mechanism to track w= hat C_new's state *would be* if it had
already reached the BODY regi= on.


3. Design: Shadow Path
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

3.1 Core Idea

Each conte= xt logically executes two parallel paths:

- Original path: executes = the full pattern in order (PREFIX -> BODY).
=C2=A0 This is the path t= hat produces the actual match result and evaluates
=C2=A0 PREFIX predica= tes.

- Shadow path: skips the PREFIX and starts directly in the BODY= region
=C2=A0 (A+).
=C2=A0 - Used solely for absorption eligibility = decisions.
=C2=A0 - Does not produce match results.
=C2=A0 - Actually= evaluates the BODY element's DEFINE predicates to track
=C2=A0 =C2= =A0 counts accurately.
=C2=A0 - The shadow is placed at the BODY start s= tate upon creation and
=C2=A0 =C2=A0 immediately advances, so its initia= l count is 1.
=C2=A0 - Not created if there is no preceding context (not= hing to be
=C2=A0 =C2=A0 absorbed into).
=C2=A0 - Discarded when the = preceding context dies.
=C2=A0 - Discarded when the shadow leaves the ab= sorbable region (BODY->TAIL
=C2=A0 =C2=A0 transition or match failure= ).

3.2 Absorption Conditions

For C_old to absorb C_new, all o= f the following must hold:

1. C_old is past the PREFIX and in the ab= sorbable region (BODY).
=C2=A0 =C2=A0- hasAbsorbableState =3D true (exis= ting flag)

2. C_new was created at or after the row where C_old ente= red the
=C2=A0 =C2=A0absorbable region.
=C2=A0 =C2=A0- "Same row= " means: in the Match phase C_old enters BODY and C_new
=C2=A0 =C2= =A0 =C2=A0is created; the subsequent Absorb phase of that same row can then=
=C2=A0 =C2=A0 =C2=A0perform the absorption check.

3. C_new's= shadow path is alive in the absorbable region.
=C2=A0 =C2=A0- allStates= Absorbable =3D true (evaluated on the shadow)

4. C_old.counts[depth]= >=3D C_new.shadow.counts[depth]
=C2=A0 =C2=A0- Same count-dominance = condition as existing absorption (equality
=C2=A0 =C2=A0 =C2=A0counts as= dominance).
=C2=A0 =C2=A0- The shadow's count increases as it proce= sses rows, so the
=C2=A0 =C2=A0 =C2=A0comparison uses actual counts at a= bsorption time.

Condition 2 is the key insight. Only contexts create= d after C_old has
passed through the PREFIX into the absorbable region a= re eligible for
absorption. Contexts created while C_old is still in the= PREFIX cannot
be absorbed.

3.3 Common Fate

Soundness argu= ment for absorption:

=C2=A0 =C2=A0 C_old's original path and C_n= ew's shadow path are at the same BODY
=C2=A0 =C2=A0 element (A+). By= match-start independence, the DEFINE predicates of
=C2=A0 =C2=A0 both p= aths produce identical results on the same row. If C_new's
=C2=A0 = =C2=A0 shadow can match at some future row r, then C_old can also complete<= br>=C2=A0 =C2=A0 a match at row r (it has matched at least as many times, s= ees the
=C2=A0 =C2=A0 same future rows, and DEFINE evaluations are ident= ical). Under SKIP
=C2=A0 =C2=A0 TO PAST LAST ROW, C_old's match incl= udes C_new's start point, so
=C2=A0 =C2=A0 removing C_new does not l= ose any reported match.

3.4 Complexity

With PREFIX absorption= , the maximum number of simultaneously active
contexts is bounded by PRE= FIX_length + 1. Every C_new created after
C_old enters BODY is absorbed = immediately, so the only contexts that can
coexist are those still trave= rsing the PREFIX plus the single C_old in
BODY. Since PREFIX length is a= per-pattern constant, overall execution
complexity is O(n).

3.5 = Removal of Non-Absorbable Contexts

Contexts that do not satisfy cond= ition 2 -- i.e., those created before
C_old entered BODY -- cannot be re= moved by PREFIX absorption.

These contexts are removed by the existi= ng SKIP TO PAST LAST ROW
mechanism, which is separate from PREFIX absorp= tion. When C_old
successfully matches, all subsequent contexts whose sta= rt points fall
within C_old's match range are removed as overlaps. S= ince contexts
created during the PREFIX region have start points within = C_old's match
range, they are automatically removed upon C_old's= match completion.

The number of such contexts is at most PREFIX_len= gth, which does not
affect the O(n) complexity (see Section 3.4).

4. Compile-Time Analysis
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

4.1 Pattern Structure Recognition<= br>
At compile time, traverse the pattern elements from the start until = the
first unbounded greedy repetition is found. All preceding fixed-leng= th
elements form the PREFIX; the unbounded element is the BODY; everythi= ng
after it is the TAIL. If the first element is already unbounded, the<= br>pattern has no PREFIX and existing absorption applies directly.

= =C2=A0 =C2=A0 Pattern: E1 E2 ... Ek =C2=A0A+ =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0B+ ...
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0\__________= / =C2=A0\__/ =C2=A0 =C2=A0 =C2=A0 =C2=A0 \___/
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 PREFIX =C2=A0 =C2=A0 =C2=A0 =C2=A0BODY =C2=A0 =C2= =A0 =C2=A0 =C2=A0 TAIL
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0(absorbable) =C2=A0(non-abs= orbable)


5. Worked Example
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D

Pattern: START SECOND A+ B+
PREFIX: START, = SECOND (length 2)
BODY: A+ (absorbable)
TAIL: B+ (non-absorbable)
=
Data: r0-r3 satisfy A only, r4-r5 satisfy B only, r6 satisfies neither<= br>A nor B.

Row =C2=A0C1(orig) =C2=A0 C2(orig) =C2=A0 C2(shadow) =C2= =A0C3(orig) =C2=A0 C3(shadow) =C2=A0Active
r0 =C2=A0 START =C2=A0 =C2=A0= =C2=A0 - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 - =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 - =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A01
r1 =C2=A0 SECOND =C2=A0 =C2=A0 =C2=A0START =C2=A0 =C2=A0 =C2=A0 = A(cnt=3D1) =C2=A0 =C2=A0- =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 - =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A02
r2 =C2=A0 A(cnt=3D1) =C2=A0 =C2=A0SECOND =C2= =A0 =C2=A0 =C2=A0A(cnt=3D2) =C2=A0 =C2=A0START =C2=A0 =C2=A0 =C2=A0 A(cnt= =3D1) =C2=A0 3
=C2=A0 =C2=A0 =C2=A0^ C1 reaches BODY
=C2=A0 =C2=A0 = =C2=A0-> C3: shadow A(cnt=3D1), created at C1's BODY entry
=C2=A0= =C2=A0 =C2=A0 =C2=A0 -> absorbed (1 >=3D 1)
=C2=A0 =C2=A0 =C2=A0-= > C2: created before C1's BODY entry
=C2=A0 =C2=A0 =C2=A0 =C2=A0 = -> not absorbable =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 2
r3 =C2=A0 A(cnt=3D2) =C2=A0 =C2=A0A(cnt=3D1) =C2=A0 =C2=A0A(cnt= =3D3) =C2=A0 =C2=A0- =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 - =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A02
r4 =C2=A0 B(cnt=3D1) =C2=A0 =C2=A0B(cnt=3D1) =C2=A0 = =C2=A0(discard) =C2=A0 - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 - =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A02
=C2=A0 =C2=A0 =C2=A0^ TAIL =C2=A0 =C2=A0 =C2=A0^ = TAIL =C2=A0 =C2=A0 =C2=A0^ shadow A+->B+ -> TAIL -> discard
r5 = =C2=A0 B(cnt=3D2) =C2=A0 =C2=A0B(cnt=3D2) =C2=A0 =C2=A0- =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 - =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 - =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A02
r6 =C2=A0 neither A nor B -> B+ ends -> C1 matc= h (r0-r5) =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 0
=C2=A0 =C2=A0 =C2= =A0C2 removed by SKIP TO PAST LAST ROW overlap
=C2=A0 =C2=A0 =C2=A0(C2 s= tart r1 falls within C1 match range r0-r5)

Max concurrent contexts: = 3 =3D PREFIX_length(2) + 1

Key observations:
- C3 is immediately = removed at r2 by PREFIX absorption (shadow-path-
=C2=A0 based).
- C2&= #39;s shadow is discarded at r4 when A+->B+ transition moves it into
= =C2=A0 TAIL.
- C2 is not absorbable (condition 2 not met), survives unti= l r6, then
=C2=A0 removed by SKIP TO PAST LAST ROW overlap when C1 match= es
=C2=A0 (Section 3.5).

This is a design proposal at this= stage. Thoughts and feedback welcome.

Best regards,
Henson
--00000000000032f260064be549c7--