MIME-Version: 1.0
References: <20260207.133143.1951390390459539453.ishii@postgresql.org>
 <CAAAe_zDoO47FzUBHQU-0CgYD63boZNi-BJSG5Q87Jr5w5bGcdw@mail.gmail.com>
 <CAAAe_zBpgH_06Utd9r1rVYDLaF-UuTtT1_W69bVeEP4MBzvDBQ@mail.gmail.com> <20260209.160520.873483599710391856.ishii@postgresql.org>
In-Reply-To: <20260209.160520.873483599710391856.ishii@postgresql.org>
Reply-To: assam258@gmail.com
From: Henson Choi <assam258@gmail.com>
Date: Mon, 9 Feb 2026 17:03:23 +0900
Message-ID: <CAAAe_zALWNDOcTMCOk71d3FD5MHQLiCsL-US5ok=7ZoX=xtHFg@mail.gmail.com>
Subject: Re: Row pattern recognition
To: Tatsuo Ishii <ishii@postgresql.org>, jacob.champion@enterprisedb.com
Cc: david.g.johnston@gmail.com, vik@postgresfriends.org, er@xs4all.nl, 
	peter@eisentraut.org, pgsql-hackers@postgresql.org
Content-Type: multipart/alternative; boundary="00000000000063b1e1064a5f93ae"
Archived-At: <https://www.postgresql.org/message-id/CAAAe_zALWNDOcTMCOk71d3FD5MHQLiCsL-US5ok%3D7ZoX%3DxtHFg%40mail.gmail.com>
Precedence: bulk

--00000000000063b1e1064a5f93ae
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Tatsuo,

Thank you for the thorough testing and performance profiling!

> BTW, I have tested RPR with large partition/reduced frame (up to 100k
> rows). Following query took over 3 minutes on my laptop.
>

The profiling results are very insightful and help us understand where
the time is spent in extreme cases.


> I don't try to say we should enhance nfa modules in the case when
> there is a large reduced frame, because I don't think this is a
> realistic use case and I doubt it's worth to fight against unrealistic
> scenario.
>

I agree with your assessment. Let me clarify:

The pattern itself (START UP+) is realistic - it's a common pattern for
detecting upward trends. However, a run of 100k+ consecutive matches
without interruption is not realistic. Your UP{,3} example (94ms for a
100k-row partition) demonstrates that the current implementation performs
very well for realistic use cases.


> I think this is more realistic use case than former. If I were
> correct, we don't need to work on current nfa code to squeeze better
> performance for unrealistic 100k reduced frame case. I maybe wrong
> and I would love to hear from others especially, Henson, Vik, Jacob.


I agree that we should prioritize realistic use cases. That said, there are
potential optimizations that could help:

1. Anchored pattern absorption (see my earlier message:
   https://www.postgresql.org/message-id/CAAAe_zAEg7sVM%3DWDwXMyE-odGmQyXSVi5ZzWgye6SupSjdMKpg%40mail.gmail.com)

2. Alt-pruning: In patterns like "A* | B*", once the higher-priority A
   branch has a confirmed match, the B branch can be pruned immediately.
   Even if B could continue extending to a longer match, it can never be
   selected due to lexical order semantics: A will always win. This
   proactive pruning respects SQL standard semantics while reducing
   unnecessary state expansion.
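
To make the alt-pruning idea concrete, here is a toy model in Python. It is
only a sketch under simplifying assumptions, not the patch's NFA code: each
ALT branch is reduced to a greedy repetition of a single row predicate, and
the names (match_alt, Branch, min_reps) are hypothetical. It counts predicate
evaluations with and without pruning to show that the selected match stays
the same while the work drops.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy model: a branch is (name, row predicate, minimum
# repetitions), so ("A", pred, 0) stands for A* and ("B", pred, 1) for B+.
Branch = Tuple[str, Callable[[int], bool], int]


def match_alt(rows: List[int], branches: List[Branch],
              prune: bool) -> Tuple[Optional[str], int, int]:
    """Return (winning branch name, match length, predicate evaluations)."""
    consumed = [0] * len(branches)   # rows matched so far by each branch
    alive = [True] * len(branches)   # can the branch still be extended?
    evaluations = 0

    for row in rows:
        if prune:
            # Alt-pruning: as soon as a branch has a confirmed match, no
            # lower-priority branch can ever be selected, so drop them all.
            for i, (_, _, min_reps) in enumerate(branches):
                if consumed[i] >= min_reps:
                    for j in range(i + 1, len(branches)):
                        alive[j] = False
                    break
        if not any(alive):
            break
        for i, (_, pred, _) in enumerate(branches):
            if not alive[i]:
                continue
            evaluations += 1
            if pred(row):
                consumed[i] += 1
            else:
                alive[i] = False

    # Lexical order: the first branch with a confirmed match wins, even if
    # a later branch matched more rows.
    for i, (name, _, min_reps) in enumerate(branches):
        if consumed[i] >= min_reps:
            return name, consumed[i], evaluations
    return None, 0, evaluations


# "A* | B*": A matches values below 3, B matches every row.
rows = [1, 2, 3, 4, 5, 6]
branches: List[Branch] = [
    ("A", lambda v: v < 3, 0),
    ("B", lambda v: True, 0),
]
without_pruning = match_alt(rows, branches, prune=False)
with_pruning = match_alt(rows, branches, prune=True)
```

In this toy run both calls select A with a two-row match, but the unpruned
run keeps extending B to the end of the input (9 predicate evaluations)
while the pruned run stops after 3. The real NFA tracks far more state than
this, so the sketch only illustrates the selection rule, not the actual
savings.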

However, given the complexity of NFA internals, I believe we should take a
step-by-step approach:

1. First, stabilize the current RPR patch and prepare it for review
2. Then, consider optimizations as separate follow-up patches
3. Each optimization should be well-tested and reviewed independently

This approach reduces risk and makes review more manageable. The fact that
the current implementation handles even unrealistic 100k-row cases without
crashing (just slowly) shows it's already robust.

What do you think about this phased approach?

Best regards,
Henson

P.S. I discovered a crash bug that was introduced in the latest patch
refactoring. The issue occurs with nested alternation patterns like
(A+ | (A | B)+)*, where infinite recursion happens in nfa_advance_alt when
the inner BEGIN(+)'s skip jump is followed as an ALT branch pointer. I will
include the fix in the next patch update.

--00000000000063b1e1064a5f93ae--