public inbox for [email protected]
help / color / mirror / Atom feedFrom: Henson Choi <[email protected]>
To: SungJun Jang <[email protected]>
Cc: Tatsuo Ishii <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: Row pattern recognition
Date: Wed, 11 Mar 2026 19:32:15 +0900
Message-ID: <CAAAe_zBY+Eqk_DQpS0cy1Eb67H9v92tyaf+U+SKKuLGUs_z+EA@mail.gmail.com> (raw)
In-Reply-To: <CAE+cgNgVWChqF-f-s4zT18V+oK1y9UOvORq9JX7jFKj4=r_taw@mail.gmail.com>
References: <CAAAe_zCvHz=tHkggP37OoQH8R0ux4j_CJdBTiPUg-L=6cVMyWg@mail.gmail.com>
<[email protected]>
<CAAAe_zAn2nFgM_gfsEDYu+MXCArRFoP6s9bRz2bP4X5HNmnYww@mail.gmail.com>
<[email protected]>
<CAE+cgNgVWChqF-f-s4zT18V+oK1y9UOvORq9JX7jFKj4=r_taw@mail.gmail.com>
Hi SungJun,
Thanks for the cross-validation report.
This appears to be due to its NFA implementation combined with Context
> Absorption,
>
which discards redundant matching contexts when they reach the same NFA
> state
>
but start later in the input.
>
> Example:
>
> Ctx1 start=0 length=2
> Ctx2 start=1 length=1
>
> Ctx1 subsumes Ctx2, so Ctx2 is discarded.
>
> This keeps the number of active contexts small and prevents quadratic
> growth.
>
Your description is accurate. An earlier-started context subsumes
all matches of a later-started context (monotonicity principle),
so the newer one can be safely removed.
However, Context Absorption is not universally applicable. It
requires careful analysis of several factors, and we continue to
refine the conditions under which absorption can be safely applied.
The current scope of absorption is determined by three factors:
1. Pattern structure:
Currently supported:
- Simple unbounded quantifiers: A+, (A B)+
- Leading unbounded quantifier in group: (A+ B)+
- First iteration of nested unbounded alternation (e.g., ((A+)|(B+))+)
Planned improvements:
- Patterns with a prefix before the repeating group (e.g., A (B+) C)
- Fixed-length iteration groups (e.g., (A B{2})+ where each
iteration consumes exactly 3 rows)
2. Frame specification:
- AFTER MATCH SKIP TO NEXT ROW is not eligible (only
SKIP PAST LAST ROW)
- Frame end must be UNBOUNDED FOLLOWING
3. DEFINE clause evaluation:
- Absorption relies on the monotonicity assumption that DEFINE
conditions produce the same result regardless of starting
position. Currently DEFINE uses simple boolean conditions
where this holds. However, future additions such as
CLASSIFIER() or running aggregates (e.g., RUNNING SUM())
may introduce context-dependent evaluation that breaks
this invariance, requiring absorption to be disabled for
those cases.
That said, many practically useful patterns fall within these
conditions. Here are examples from our regression tests using
synthetic stock data:
V-shape recovery (decline then rise):
PATTERN (DECLINE{4,} RISE{4,})
DEFINE
DECLINE AS price < PREV(price),
RISE AS price > PREV(price)
W-bottom (decline, bounce, re-decline, recovery):
PATTERN (DECLINE{3,} BOUNCE{3,} DIP{3,} RECOVER{3,})
DEFINE
DECLINE AS price < PREV(price),
BOUNCE AS price > PREV(price),
DIP AS price < PREV(price),
RECOVER AS price > PREV(price)
Consolidation then breakout:
PATTERN (FLAT{5,} BREAKOUT)
DEFINE
FLAT AS price BETWEEN PREV(price) * 0.98 AND PREV(price) * 1.02,
BREAKOUT AS price > PREV(price) * 1.05
Price-volume divergence (bearish signal):
PATTERN (INIT DIVERGE{3,})
DEFINE
DIVERGE AS price > PREV(price) AND volume < PREV(volume)
Volume climax reversal:
PATTERN (RALLY{3,} CLIMAX SELLOFF{2,})
DEFINE
RALLY AS price > PREV(price),
CLIMAX AS volume > PREV(volume) * 1.5,
SELLOFF AS price < PREV(price)
All of these use unbounded quantifiers with position-invariant
DEFINE conditions -- exactly the cases where Context Absorption
guarantees O(n) scaling. So despite the constraints, it remains
a powerful optimization for the workloads where RPR is most
commonly applied.
Best regards,
Henson
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Row pattern recognition
In-Reply-To: <CAAAe_zBY+Eqk_DQpS0cy1Eb67H9v92tyaf+U+SKKuLGUs_z+EA@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox