public inbox for [email protected]
help / color / mirror / Atom feedFrom: Henson Choi <[email protected]>
To: Tatsuo Ishii <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: Row pattern recognition
Date: Fri, 13 Feb 2026 10:14:26 +0900
Message-ID: <CAAAe_zAzOgB2KR9ACDD2o3QNP_gaCKnUd2bRGgbhcD=og50XXA@mail.gmail.com> (raw)
In-Reply-To: <CAAAe_zDPaPiRg2gn=kRtrh+VTis5EDUeCftBK46vKsZjUHAEtA@mail.gmail.com>
References: <[email protected]>
<CAAAe_zDX24SnaQFwTNVUx_vK5887mpD+rXTaedjzdfFrGROFGA@mail.gmail.com>
<[email protected]>
<[email protected]>
<CAAAe_zDPaPiRg2gn=kRtrh+VTis5EDUeCftBK46vKsZjUHAEtA@mail.gmail.com>
Hi Tatsuo,
However, this raises interesting questions: should we optimize patterns
> by removing {0} quantifiers or simplifying them? And if so, how should
> we handle patterns that become empty after such optimization?
>
> For example:
> - PATTERN (A{0}) → empty pattern
> - PATTERN (A{0} B{0}) → empty pattern
> - PATTERN (A{0} B) → PATTERN (B) after optimization
>
> Empty patterns would result in zero-length matches, which our current
> implementation explicitly treats as invalid (see initialAdvance flag
> logic in nodeWindowAgg.c).
>
> More importantly, I recall that zero-length matches caused serious
> issues during development, which is why we added logic to explicitly
> avoid them.
>
> The reason I cannot immediately provide a concrete plan for A{0}
> support is that I need to deeply understand the semantic meaning of
> zero-length matches in the SQL standard first. Without this
> understanding, any implementation approach could be fundamentally
> flawed.
>
> Specifically, I need to investigate:
> - What zero-length matches mean semantically in RPR
> - How to handle empty patterns according to the standard
> - The correct behavior when a pattern optimizes to nothing
>
> After the current code review phase is complete, I'm also considering
> setting up an Oracle test environment to observe how it handles these
> edge cases. This could provide valuable insights into the expected
> behavior, especially for zero-length matches and empty patterns.
>
Our current implementation cannot support A{0} due to a structural
limitation.
The reduced_frame_map uses row-based representation (reduced_frame_map[pos]
= val),
which can only express matches consuming at least one row. It cannot
represent
zero-length matches that occur between rows without consuming any row
position.
Patterns like A{0}, A*, or A? can produce zero-length matches with no row
to mark
as RF_FRAME_HEAD and no position to register in the frame map.
We currently prevent this using the initialAdvance flag (nodeWindowAgg.c),
which skips FIN recording during initial epsilon transitions.
Supporting A{0} requires either restructuring reduced_frame_map to handle
virtual positions, or separate handling for zero-length matches. Before
choosing an approach, we need clarity on what the SQL standard expects for
zero-length match semantics (output generation, aggregate behavior, etc.).
Given this structural limitation, I'd like to ask: should we keep the
current
initialAdvance mechanism (which prevents zero-length matches) and handle
A{0}
separately?
Best regards,
Henson
view thread (7+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Row pattern recognition
In-Reply-To: <CAAAe_zAzOgB2KR9ACDD2o3QNP_gaCKnUd2bRGgbhcD=og50XXA@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox