public inbox for [email protected]
help / color / mirror / Atom feedFrom: jian he <[email protected]>
To: [email protected]
Cc: Tatsuo Ishii <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: Row pattern recognition
Date: Tue, 2 Jun 2026 13:46:22 +0800
Message-ID: <CACJufxHVADC8e77pnQxSZRk7SYHCZFk6ZCM2HfTsKyD_kUji0A@mail.gmail.com> (raw)
In-Reply-To: <CAAAe_zCZc8s4zWfmVVUOt0y_FU=v7YTcJJJ4UL2gBzJ2_KkUmQ@mail.gmail.com>
References: <CAAAe_zDDuJofafXyhggPPPLzUXeejH-19NfcLR7jNNQxZchtog@mail.gmail.com>
<[email protected]>
<CAAAe_zBhzHyazC_Ot+HXBe_nToKc7AHs4r-s0nNcaBPw0L17wA@mail.gmail.com>
<[email protected]>
<CAAAe_zCZc8s4zWfmVVUOt0y_FU=v7YTcJJJ4UL2gBzJ2_KkUmQ@mail.gmail.com>
On Mon, Jun 1, 2026 at 1:35 PM Henson Choi <[email protected]> wrote:
>
> Hi Tatsuo, Jian,
>
> While tidying RPR comments I found a small inconsistency in the varId bounds.
> The comment/README side I'm already fixing in the in-progress series; whether
> to also change the bounds is a separate follow-up. As lead author that one is
> ultimately your call, Tatsuo, but I'd welcome Jian's and the list's input on
> it first.
>
> The current state, in src/include/optimizer/rpr.h:
>
> #define RPR_VARID_MAX 251
> #define RPR_VARID_BEGIN 252 /* control codes 252..255 */
> ... END 253, ALT 254, FIN 255
>
> RPRElemIsVar(e) == ((e)->varId <= RPR_VARID_MAX) /* 0..251 */
>
> and the limit enforced in parse_rpr.c:
>
> if (list_length(*varNames) >= RPR_VARID_MAX) /* reject the 252nd */
> ereport(ERROR, "too many pattern variables", "Maximum is 251");
>
> So 251 variables are accepted as varId 0..250, leaving 251 a hole: never
> assigned, yet the macro still classifies it as a variable -- one wider than
> the comment's own "0 to RPR_VARID_MAX - 1".
>
> RPRVarId is a uint8, kept small on purpose: varId is the likely per-row
> match-history key, and since a match can run arbitrarily long the history
> grows with it -- so one byte per row, not two, is what keeps that footprint
> in check.
>
> The catch of staying in uint8: the four control codes already fill 252..255,
> so 251 is the only free slot for any future sentinel (anchor ^/$, exclusion
> {- -}) short of widening to uint16. So the hole is really the last reserve.
>
> Three ways, by what the gap is spent on:
>
> (1) Leave it -- just the doc alignment already underway: 251 stays a documented
> reserve, macro unchanged. No follow-up commit. The one free slot is then
> on hand for a single future control code, should one ever be needed.
>
> (2) Fill it as a 252nd variable (0..251). Compatible and doable anytime; a few
> lines in parse_rpr.c / rpr.h plus the boundary test. But it spends the
> last free slot, so a future control code would then force either a
> compatibility-breaking narrow of RPR_VARID_MAX or a widen to two bytes
> (doubling history). Maximal variables now, the control question deferred.
>
> (3) Reserve 16 control codes now (4 used + 12 spare) at the 0xF0 boundary:
> vars 0..239, control 240..255, existing sentinels unmoved, macro becomes
> (varId & 0xF0) != 0xF0. Buys 12-code headroom inside the byte, so history
> stays 1 byte and (2)'s fork never arises. Same edit shape as (2); costs
> only the nominal drop to 240 variables -- but it is a narrowing, so free
> only pre-release.
>
> Which would you prefer?
>
3.
240 variables is enough, as each variable supports multiple complex AND/OR
conditions. Additionally, since PostgreSQL regular expressions use 14 special
characters, reserving the remaining ones in advance is a future-proof approach.
----------------------------------------------------
src/backend/executor/README.rpr
XII-4. Memory Pool Management
Choice: Custom free list
Rationale:
- NFA states are created and destroyed in large numbers per row
- Avoids palloc/pfree overhead
- State size is variable (counts[] array), but within a single query
maxDepth is fixed, so all states have the same size
It would be better simply to mention that:
RPRNFAState and RPRNFAContext are allocated in a partition-lifespan
memory context; they will be destroyed in release_partition.
--------------------------
in ExecRPRFreeContext:
{
if (ctx->states != NULL)
nfa_state_free_list(winstate, ctx->states);
if (ctx->matchedState != NULL)
nfa_state_free(winstate, ctx->matchedState);
}
If ctx->matchedState points to one of the states already in ctx->states, will
nfa_state_free() be called on the same RPRNFAState twice? Is this double-free
permitted, or do we have a mechanism in place to guard against it?
--------------------------
In ExecRPRProcessRow
if (currentPos == ctxFrameEnd)
{
/* Frame boundary reached: force mismatch */
nfa_match(winstate, ctx, NULL);
continue;
}
If I comment out the CONTINUE, the entire regression still succeeds.
--
jian
https://www.enterprisedb.com/
view thread (109+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Row pattern recognition
In-Reply-To: <CACJufxHVADC8e77pnQxSZRk7SYHCZFk6ZCM2HfTsKyD_kUji0A@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox