Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0Grg-001lXb-1T for pgsql-hackers@arkaria.postgresql.org; Wed, 11 Mar 2026 10:32:32 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w0Gre-008TjN-09 for pgsql-hackers@arkaria.postgresql.org; Wed, 11 Mar 2026 10:32:30 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0Grd-008TjF-24 for pgsql-hackers@lists.postgresql.org; Wed, 11 Mar 2026 10:32:30 +0000 Received: from mail-pl1-x632.google.com ([2607:f8b0:4864:20::632]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w0Grc-00000001au8-17Sm for pgsql-hackers@postgresql.org; Wed, 11 Mar 2026 10:32:29 +0000 Received: by mail-pl1-x632.google.com with SMTP id d9443c01a7336-2ae50a33ff8so74728595ad.3 for ; Wed, 11 Mar 2026 03:32:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1773225147; cv=none; d=google.com; s=arc-20240605; b=TO+L4tYxhISWp7z5V6N5+m2T3Vion1iXJP1yzlzfobPrtaRlJjt4F+0KYS8uQvDCOd TUgGl9lRIx5/5+79CoQhHvKEoRnz25cmhMc/uGu2BA5b5NpCudcupAtnGzx6SB4rDnwV j/plFLXH1FRklFS/ezqr2uCDvI9ZoN8FtqVrkSrZSZCZKJfnXbVCTxQpQM/aoUDolmSN mNkuxLogjXvxL6CTxJsh0SxyIiXKGnLE7xKf/N08EETJRZFfBkqfO5igEadetV1jKtOW Rhf0/edVhVqMyVOzs7PKt+jHMHjHZ6TaSh3YII1YPXvItX3S4gL1kbntQv4QDl9cTTu7 rczQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:dkim-signature; bh=oPAnLRv4TpkGTHj01LVEWGlOXz+SChl0EEZa93ENM9A=; fh=F2MFbjrnbMU4B0JV/DPraLz8gUTMX82ywvS7iFQX2sE=; b=B1WNODkwEiHNbqGdFfav4yIzZBLlqrsQPCBTsIuP2r8kmy6l3F08wAelDSSdzmXfFG jevfNBX/Ptuiteqn11uoITqPcu+tHFywU8T/H5a0E24WoY+v1hHFzIcFXSzS9j0zpcOg rnGvVZ3nq2426EMjdAdImnKaYQiqhpHTNrnaB4KBUx117pjn9eIR12o53TnKkcWWEx7V pPCF5FUKGNY01nMe3T89BOIFcNA3mo71TCQf/pgj0DUNWWW+oBqM6JeN3nCyGie4Zw4s Xtn5p+cUf8AWVxq/1D5LTKOT/a66w+fBYFZaVuX9ZUtoq6af+nXQ9Y/hkjEOC/ajcTjt iYSw==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773225147; x=1773829947; darn=postgresql.org; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=oPAnLRv4TpkGTHj01LVEWGlOXz+SChl0EEZa93ENM9A=; b=miU23CCOR2MWv+zDv0QnONz0HDv9ECx83Ub8bR31rpevwJ56EJmwtDMRkClZ1cSN2E qod45qSfo2Muk0dVTvfcrSAV+cvKMQVae2P93cT+s9nkKO2AI+B/Q4pbC7YCCi1S6qAX 29BKtSxYDNIBOAj7AuIhNbCngKeB3ew41XImgJ6JGK/iYfX7I6MknIf3ImVzJ1LEJRck Zgx+VF6p6BrEOE8W0bTLK6PyxHPDtrw3Qrr3Fj0p3+cKLyJQ081+YmoLyG1/LWeH9z/N OoHPIVuldSl5uXFMEDSbR4Uwkq1Dh7+OVSVXgoL9rdPbsMp5zfDnQSL2WhvFR8dZly4l V+kA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773225147; x=1773829947; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=oPAnLRv4TpkGTHj01LVEWGlOXz+SChl0EEZa93ENM9A=; b=ZKGKCV4Nr2kbVs6jd6sqlnCpxcYourjtYVcYFoHNNv8XK/v3vL5Z0yDNec5eWE+/+b ka6ZogT8ZjcLWPyUaS7PZf1bA5ZVQGvdQNrC5qRJAxAiPRrgGdNguMCyJ6pW9VvQa5OY RPtFcPmz6cYgvr2KYwSa/ZxBqnwMnl3/br7PAKUsDxpOrkx/ALo4Cc4PHYsBveWvJTVs S5u4ftBrSA7Xb+KlgAljyL3bNYsBdDnPmItNrCvVi3BFvYACiCoY2Oya3c1cc5xlbhTe nA4AGLtl2/nJsW14wk9VTs55D5aPcFvTlGcmoj3Nr2JGaTVFKMhc2DQhjQoUEfHCEdeb YxmA== X-Forwarded-Encrypted: i=1; AJvYcCVddsYEgrGrv8QQTKivsye7jPl7IhaRpK+3tq/yeW5OcPivZ1+J8ap+67zYw281gyaLGGydiIS38lJTcqux@postgresql.org X-Gm-Message-State: AOJu0YyoTqgLy6886Jgxs+nz8kR/k/meoyELh1TjgMBmwU4n/3w3ypiw VDw39YoDdRqdGyDE3XH0IZkAdcRxaGNawZhpsIeJiJPvwYQsmFAxkPOKCQsmirEyeGVGOIpmD9N 5LD7gsBef78+k3lxc00oSorqckCpdLPQ= X-Gm-Gg: ATEYQzy+FgkD+jfT3ahINqiSNG63Rv0S1L8494PK2sEaFeEPsW08jGJy6rlEtgu83k0 n2XL0Niu4b6sBEPvH8GnalBzy9EB5co4qeL4D94Oo+3EoEanDUU9rRaxDzjcj+c7K8hhDjbkDFj CwROY5ckrpCrMsOIYWX0XBLBrMu2Yo6aOO38VJ61acsxtELGFREHJGY5dEG7qMECdLEgxdmn0df tc4Ougylv+USCwdgqe+VGi7OTJ+vDJpIaxZaTvclFX/NZODWOYjyy3rVEnihf85FM9tOpTjNEmY tT474sZkg0jcL1oCsZDb2msRTGUvFgmo3r4pcMLT X-Received: by 2002:a17:903:1b4c:b0:2ad:e975:4735 with SMTP id d9443c01a7336-2aeae7f4b9amr22234705ad.20.1773225147341; Wed, 11 Mar 2026 03:32:27 -0700 (PDT) MIME-Version: 1.0 References: <20260309.120802.1845076903520202301.ishii@postgresql.org> <20260309.142202.1739855502263731478.ishii@postgresql.org> In-Reply-To: Reply-To: assam258@gmail.com From: Henson Choi Date: Wed, 11 Mar 2026 19:32:15 +0900 X-Gm-Features: AaiRm52GHiIy2V3XhGLK2GGkiZSC_1ruYJ3TN7LlVh6OT_kzAhxAS5oiHMrYxMg Message-ID: Subject: Re: Row pattern recognition To: SungJun Jang Cc: Tatsuo Ishii , vik@postgresfriends.org, er@xs4all.nl, jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, peter@eisentraut.org, pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="000000000000039e2a064cbd27eb" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000039e2a064cbd27eb Content-Type: text/plain; charset="UTF-8" Hi SungJun, Thanks for the cross-validation report. This appears to be due to its NFA implementation combined with Context > Absorption, > which discards redundant matching contexts when they reach the same NFA > state > but start later in the input. > > Example: > > Ctx1 start=0 length=2 > Ctx2 start=1 length=1 > > Ctx1 subsumes Ctx2, so Ctx2 is discarded. > > This keeps the number of active contexts small and prevents quadratic > growth. > Your description is accurate. An earlier-started context subsumes all matches of a later-started context (monotonicity principle), so the newer one can be safely removed. However, Context Absorption is not universally applicable. It requires careful analysis of several factors, and we continue to refine the conditions under which absorption can be safely applied. The current scope of absorption is determined by three factors: 1. Pattern structure: Currently supported: - Simple unbounded quantifiers: A+, (A B)+ - Leading unbounded quantifier in group: (A+ B)+ - First iteration of nested unbounded alternation (e.g., ((A+)|(B+))+) Planned improvements: - Patterns with a prefix before the repeating group (e.g., A (B+) C) - Fixed-length iteration groups (e.g., (A B{2})+ where each iteration consumes exactly 3 rows) 2. Frame specification: - AFTER MATCH SKIP TO NEXT ROW is not eligible (only SKIP PAST LAST ROW) - Frame end must be UNBOUNDED FOLLOWING 3. DEFINE clause evaluation: - Absorption relies on the monotonicity assumption that DEFINE conditions produce the same result regardless of starting position. Currently DEFINE uses simple boolean conditions where this holds. However, future additions such as CLASSIFIER() or running aggregates (e.g., RUNNING SUM()) may introduce context-dependent evaluation that breaks this invariance, requiring absorption to be disabled for those cases. That said, many practically useful patterns fall within these conditions. Here are examples from our regression tests using synthetic stock data: V-shape recovery (decline then rise): PATTERN (DECLINE{4,} RISE{4,}) DEFINE DECLINE AS price < PREV(price), RISE AS price > PREV(price) W-bottom (decline, bounce, re-decline, recovery): PATTERN (DECLINE{3,} BOUNCE{3,} DIP{3,} RECOVER{3,}) DEFINE DECLINE AS price < PREV(price), BOUNCE AS price > PREV(price), DIP AS price < PREV(price), RECOVER AS price > PREV(price) Consolidation then breakout: PATTERN (FLAT{5,} BREAKOUT) DEFINE FLAT AS price BETWEEN PREV(price) * 0.98 AND PREV(price) * 1.02, BREAKOUT AS price > PREV(price) * 1.05 Price-volume divergence (bearish signal): PATTERN (INIT DIVERGE{3,}) DEFINE DIVERGE AS price > PREV(price) AND volume < PREV(volume) Volume climax reversal: PATTERN (RALLY{3,} CLIMAX SELLOFF{2,}) DEFINE RALLY AS price > PREV(price), CLIMAX AS volume > PREV(volume) * 1.5, SELLOFF AS price < PREV(price) All of these use unbounded quantifiers with position-invariant DEFINE conditions -- exactly the cases where Context Absorption guarantees O(n) scaling. So despite the constraints, it remains a powerful optimization for the workloads where RPR is most commonly applied. Best regards, Henson --000000000000039e2a064cbd27eb Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi SungJun,

Thanks for the cross-v= alidation report.

This appears to be due to its NFA im= plementation combined with Context Absorption,=C2=A0
which discards redundant matching contexts w= hen they reach the same NFA state=C2=A0
=
but start later in the input.

Example:

Ctx1 = start=3D0 length=3D2
Ctx2 start=3D1 length=3D1

Ctx1 subsumes Ctx2= , so Ctx2 is discarded.

This keeps the number of active contexts sma= ll and prevents quadratic growth.

Your des= cription is accurate. An earlier-started context subsumes
all matches of= a later-started context (monotonicity principle),
so the newer one can = be safely removed.

However, Context Absorption is not universally ap= plicable. It
requires careful analysis of several factors, and we contin= ue to
refine the conditions under which absorption can be safely applied= .

The current scope of absorption is determined by three factors:
1. Pattern structure:
=C2=A0 =C2=A0Currently supported:
=C2=A0 = =C2=A0- Simple unbounded quantifiers: A+, (A B)+
=C2=A0 =C2=A0- Leading = unbounded quantifier in group: (A+ B)+
=C2=A0 =C2=A0- First iteration of= nested unbounded alternation (e.g., ((A+)|(B+))+)
=C2=A0 =C2=A0Planned = improvements:
=C2=A0 =C2=A0- Patterns with a prefix before the repeating= group (e.g., A (B+) C)
=C2=A0 =C2=A0- Fixed-length iteration groups (e.= g., (A B{2})+ where each
=C2=A0 =C2=A0 =C2=A0iteration consumes exactly = 3 rows)

2. Frame specification:
=C2=A0 =C2=A0- AFTER MATCH SKIP T= O NEXT ROW is not eligible (only
=C2=A0 =C2=A0 =C2=A0SKIP PAST LAST ROW)=
=C2=A0 =C2=A0- Frame end must be UNBOUNDED FOLLOWING

3. DEFINE c= lause evaluation:
=C2=A0 =C2=A0- Absorption relies on the monotonicity a= ssumption that DEFINE
=C2=A0 =C2=A0 =C2=A0conditions produce the same re= sult regardless of starting
=C2=A0 =C2=A0 =C2=A0position. Currently DEFI= NE uses simple boolean conditions
=C2=A0 =C2=A0 =C2=A0where this holds. = However, future additions such as
=C2=A0 =C2=A0 =C2=A0CLASSIFIER() or ru= nning aggregates (e.g., RUNNING SUM())
=C2=A0 =C2=A0 =C2=A0may introduce= context-dependent evaluation that breaks
=C2=A0 =C2=A0 =C2=A0this invar= iance, requiring absorption to be disabled for
=C2=A0 =C2=A0 =C2=A0those= cases.

That said, many practically useful patterns fall within thes= e
conditions. Here are examples from our regression tests using
synth= etic stock data:

=C2=A0 V-shape recovery (decline then rise):
=C2= =A0 =C2=A0 PATTERN (DECLINE{4,} RISE{4,})
=C2=A0 =C2=A0 DEFINE
=C2=A0= =C2=A0 =C2=A0 DECLINE AS price < PREV(price),
=C2=A0 =C2=A0 =C2=A0 R= ISE AS price > PREV(price)

=C2=A0 W-bottom (decline, bounce, re-d= ecline, recovery):
=C2=A0 =C2=A0 PATTERN (DECLINE{3,} BOUNCE{3,} DIP{3,}= RECOVER{3,})
=C2=A0 =C2=A0 DEFINE
=C2=A0 =C2=A0 =C2=A0 DECLINE AS pr= ice < PREV(price),
=C2=A0 =C2=A0 =C2=A0 BOUNCE AS price > PREV(pri= ce),
=C2=A0 =C2=A0 =C2=A0 DIP AS price < PREV(price),
=C2=A0 =C2= =A0 =C2=A0 RECOVER AS price > PREV(price)

=C2=A0 Consolidation th= en breakout:
=C2=A0 =C2=A0 PATTERN (FLAT{5,} BREAKOUT)
=C2=A0 =C2=A0 = DEFINE
=C2=A0 =C2=A0 =C2=A0 FLAT AS price BETWEEN PREV(price) * 0.98 AND= PREV(price) * 1.02,
=C2=A0 =C2=A0 =C2=A0 BREAKOUT AS price > PREV(pr= ice) * 1.05

=C2=A0 Price-volume divergence (bearish signal):
=C2= =A0 =C2=A0 PATTERN (INIT DIVERGE{3,})
=C2=A0 =C2=A0 DEFINE
=C2=A0 =C2= =A0 =C2=A0 DIVERGE AS price > PREV(price) AND volume < PREV(volume)
=C2=A0 Volume climax reversal:
=C2=A0 =C2=A0 PATTERN (RALLY{3,} CL= IMAX SELLOFF{2,})
=C2=A0 =C2=A0 DEFINE
=C2=A0 =C2=A0 =C2=A0 RALLY AS = price > PREV(price),
=C2=A0 =C2=A0 =C2=A0 CLIMAX AS volume > PREV(= volume) * 1.5,
=C2=A0 =C2=A0 =C2=A0 SELLOFF AS price < PREV(price)
All of these use unbounded quantifiers with position-invariant
DEFI= NE conditions -- exactly the cases where Context Absorption
guarantees O= (n) scaling. So despite the constraints, it remains
a powerful optimizat= ion for the workloads where RPR is most
commonly applied.

Best re= gards,
Henson=C2=A0
--000000000000039e2a064cbd27eb--