public inbox for [email protected]  
help / color / mirror / Atom feed
From: Henson Choi <[email protected]>
To: SungJun Jang <[email protected]>
Cc: Tatsuo Ishii <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: Row pattern recognition
Date: Wed, 11 Mar 2026 19:32:15 +0900
Message-ID: <CAAAe_zBY+Eqk_DQpS0cy1Eb67H9v92tyaf+U+SKKuLGUs_z+EA@mail.gmail.com> (raw)
In-Reply-To: <CAE+cgNgVWChqF-f-s4zT18V+oK1y9UOvORq9JX7jFKj4=r_taw@mail.gmail.com>
References: <CAAAe_zCvHz=tHkggP37OoQH8R0ux4j_CJdBTiPUg-L=6cVMyWg@mail.gmail.com>
	<[email protected]>
	<CAAAe_zAn2nFgM_gfsEDYu+MXCArRFoP6s9bRz2bP4X5HNmnYww@mail.gmail.com>
	<[email protected]>
	<CAE+cgNgVWChqF-f-s4zT18V+oK1y9UOvORq9JX7jFKj4=r_taw@mail.gmail.com>

Hi SungJun,

Thanks for the cross-validation report.

This appears to be due to its NFA implementation combined with Context
> Absorption,
>
which discards redundant matching contexts when they reach the same NFA
> state
>
but start later in the input.
>
> Example:
>
> Ctx1 start=0 length=2
> Ctx2 start=1 length=1
>
> Ctx1 subsumes Ctx2, so Ctx2 is discarded.
>
> This keeps the number of active contexts small and prevents quadratic
> growth.
>

Your description is accurate. An earlier-started context subsumes
all matches of a later-started context (monotonicity principle),
so the newer one can be safely removed.

However, Context Absorption is not universally applicable. It
requires careful analysis of several factors, and we continue to
refine the conditions under which absorption can be safely applied.

The current scope of absorption is determined by three factors:

1. Pattern structure:
   Currently supported:
   - Simple unbounded quantifiers: A+, (A B)+
   - Leading unbounded quantifier in group: (A+ B)+
   - First iteration of nested unbounded alternation (e.g., ((A+)|(B+))+)
   Planned improvements:
   - Patterns with a prefix before the repeating group (e.g., A (B+) C)
   - Fixed-length iteration groups (e.g., (A B{2})+ where each
     iteration consumes exactly 3 rows)

2. Frame specification:
   - AFTER MATCH SKIP TO NEXT ROW is not eligible (only
     SKIP PAST LAST ROW)
   - Frame end must be UNBOUNDED FOLLOWING

3. DEFINE clause evaluation:
   - Absorption relies on the monotonicity assumption that DEFINE
     conditions produce the same result regardless of starting
     position. Currently DEFINE uses simple boolean conditions
     where this holds. However, future additions such as
     CLASSIFIER() or running aggregates (e.g., RUNNING SUM())
     may introduce context-dependent evaluation that breaks
     this invariance, requiring absorption to be disabled for
     those cases.

That said, many practically useful patterns fall within these
conditions. Here are examples from our regression tests using
synthetic stock data:

  V-shape recovery (decline then rise):
    PATTERN (DECLINE{4,} RISE{4,})
    DEFINE
      DECLINE AS price < PREV(price),
      RISE AS price > PREV(price)

  W-bottom (decline, bounce, re-decline, recovery):
    PATTERN (DECLINE{3,} BOUNCE{3,} DIP{3,} RECOVER{3,})
    DEFINE
      DECLINE AS price < PREV(price),
      BOUNCE AS price > PREV(price),
      DIP AS price < PREV(price),
      RECOVER AS price > PREV(price)

  Consolidation then breakout:
    PATTERN (FLAT{5,} BREAKOUT)
    DEFINE
      FLAT AS price BETWEEN PREV(price) * 0.98 AND PREV(price) * 1.02,
      BREAKOUT AS price > PREV(price) * 1.05

  Price-volume divergence (bearish signal):
    PATTERN (INIT DIVERGE{3,})
    DEFINE
      DIVERGE AS price > PREV(price) AND volume < PREV(volume)

  Volume climax reversal:
    PATTERN (RALLY{3,} CLIMAX SELLOFF{2,})
    DEFINE
      RALLY AS price > PREV(price),
      CLIMAX AS volume > PREV(volume) * 1.5,
      SELLOFF AS price < PREV(price)

All of these use unbounded quantifiers with position-invariant
DEFINE conditions -- exactly the cases where Context Absorption
guarantees O(n) scaling. So despite the constraints, it remains
a powerful optimization for the workloads where RPR is most
commonly applied.

Best regards,
Henson


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Row pattern recognition
  In-Reply-To: <CAAAe_zBY+Eqk_DQpS0cy1Eb67H9v92tyaf+U+SKKuLGUs_z+EA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox