Re: Row pattern recognition

public inbox for [email protected]  

From: Henson Choi <[email protected]>
To: Tatsuo Ishii <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: Row pattern recognition
Date: Sat, 7 Mar 2026 08:18:27 +0900
Message-ID: <CAAAe_zDgbDQf=RMtGw_axaNm1d_HY-e4=0ddDy809p7kTuUwHQ@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <CAAAe_zBFKp7bn9YUamzNiy7s2LQ3C9VXsFLRTyVTbk+ETLfZUQ@mail.gmail.com>
	<[email protected]>
	<CAAAe_zBH=6nh6Yg9EuohthTgzHgtbhtvLYAnabg6mCHQPLLpqQ@mail.gmail.com>
	<[email protected]>

Hi, Tatsuo

> Does "a zero-length match" mean "an empty match"?

Yes, they refer to the same thing.  "Zero-length match" is the more
common term in general-purpose regex engines (PCRE2, Perl, Python,
Java, etc.[1]), but the RPR standard (ISO/IEC 19075-5, Section 4.12.2)
uses "empty match" exclusively.

[1] https://www.regular-expressions.info/zerolength.html

> BTW, currently we place all nfa_* functions at the bottom of
> nodeWindowAgg.c.  However nodeWindowAgg.c in master branch places "API
> exposed to window functions" at the bottom of the file. Do you think
> we should follow the way?

Yes, we should follow master's convention.  I see three options:

  (a) Reorder within nodeWindowAgg.c: move the nfa_* functions up and
      keep the "API exposed to window functions" section at the bottom,
      matching master's layout.

  (b) A separate file under src/backend/executor/, keeping it close to
      nodeWindowAgg.c while making the module boundary explicit.

  (c) A dedicated src/backend/rpr/ directory modeled on
      src/backend/regex/, giving the NFA engine its own namespace.
      This could also be an opportunity to consolidate the existing
      src/backend/optimizer/plan/rpr.c into the same directory.

For now, (a) is the safest change.  Longer term, (b) or (c) would make
more sense, especially once we extend to MATCH_RECOGNIZE (R010),
where the NFA engine will need to be shared by both code paths.
With either (b) or (c), the NFA engine can be exposed via a header so
that R010 can share it without further restructuring.

Since the NFA algorithm is not familiar territory for most DBMS
developers, it would also be worth preserving the detailed algorithm
description posted earlier in this thread -- either as structured
comments or as a dedicated README alongside the code.

What do you think?  Should we start with (a) now and revisit the
broader restructuring, (b) or (c), later, or would you prefer to
discuss it first?  Either (b) or (c) would also resolve the file
layout convention issue naturally, since the new files would follow
master's conventions from the start.

One more thing: there are no ECPG example programs or regression tests
for RPR yet.  I'd like to propose adding them.  Shall I draft an
initial set, or would you prefer to coordinate with the ECPG
maintainers first?

Best regards,
Henson
