From: Henson Choi <[email protected]>
To: Tatsuo Ishii <[email protected]>
To: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: Row pattern recognition
Date: Mon, 1 Jun 2026 14:35:24 +0900
Message-ID: <CAAAe_zCZc8s4zWfmVVUOt0y_FU=v7YTcJJJ4UL2gBzJ2_KkUmQ@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <CAAAe_zDDuJofafXyhggPPPLzUXeejH-19NfcLR7jNNQxZchtog@mail.gmail.com>
<[email protected]>
<CAAAe_zBhzHyazC_Ot+HXBe_nToKc7AHs4r-s0nNcaBPw0L17wA@mail.gmail.com>
<[email protected]>
Hi Tatsuo, Jian,
While tidying RPR comments I found a small inconsistency in the varId
bounds.
The comment/README side I'm already fixing in the in-progress series;
whether to also change the bounds is a separate follow-up. As lead author
that one is ultimately your call, Tatsuo, but I'd welcome Jian's and the
list's input on it first.
The current state, in src/include/optimizer/rpr.h:
    #define RPR_VARID_MAX    251
    #define RPR_VARID_BEGIN  252    /* control codes 252..255 */
    ... END 253, ALT 254, FIN 255
    RPRElemIsVar(e) == ((e)->varId <= RPR_VARID_MAX)    /* 0..251 */
and the limit enforced in parse_rpr.c:
    if (list_length(*varNames) >= RPR_VARID_MAX)    /* reject the 252nd */
        ereport(ERROR, "too many pattern variables", "Maximum is 251");
So 251 variables are accepted as varId 0..250, leaving 251 a hole: never
assigned, yet the macro still classifies it as a variable -- one wider than
the comment's own "0 to RPR_VARID_MAX - 1".
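
To make the off-by-one concrete, here is a minimal standalone sketch. It
is not the code in the tree: the sketch_* helpers are hypothetical names,
and it assumes varIds are handed out densely from 0 in the order the names
are added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t RPRVarId;

#define RPR_VARID_MAX   251
#define RPR_VARID_BEGIN 252     /* first control code */

/* Same comparison as the RPRElemIsVar() quoted above, on a bare varId. */
static bool
sketch_is_var(RPRVarId v)
{
    return v <= RPR_VARID_MAX;
}

/*
 * Mimics the parse_rpr.c check: a new name is rejected once the list
 * already holds RPR_VARID_MAX entries.  Returns the largest varId that
 * can ever be assigned, assuming dense assignment from 0.
 */
static int
sketch_highest_assignable_varid(void)
{
    int count = 0;              /* stands in for list_length(*varNames) */

    while (!(count >= RPR_VARID_MAX))
        count++;                /* one more variable accepted */
    return count - 1;
}

/* Byte values the macro calls a variable but the parser never assigns. */
static int
sketch_unassigned_var_ids(void)
{
    int holes = 0;

    for (int v = sketch_highest_assignable_varid() + 1; v <= 255; v++)
        if (sketch_is_var((RPRVarId) v))
            holes++;
    return holes;
}
```

Under those assumptions the highest assignable id is 250 and exactly one
value, 251, is classified as a variable without ever being assigned.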
RPRVarId is a uint8, kept small on purpose: varId is the likely per-row
match-history key, and since a match can run arbitrarily long the history
grows with it -- so one byte per row, not two, is what keeps that footprint
in check.
The catch of staying in uint8: the four control codes already fill 252..255,
so 251 is the only free slot for any future sentinel (anchor ^/$, exclusion
{- -}) short of widening to uint16. So the hole is really the last reserve.
Three ways, by what the gap is spent on:
(1) Leave it -- just the doc alignment already underway: 251 stays a
    documented reserve, macro unchanged. No follow-up commit. The one free
    slot is then on hand for a single future control code, should one ever
    be needed.
(2) Fill it as a 252nd variable (0..251). Compatible and doable anytime; a
    few lines in parse_rpr.c / rpr.h plus the boundary test. But it spends
    the last free slot, so a future control code would then force either a
    compatibility-breaking narrowing of RPR_VARID_MAX or a widening to two
    bytes (doubling history). Maximal variables now, the control question
    deferred.
(3) Reserve 16 control codes now (4 used + 12 spare) at the 0xF0 boundary:
    vars 0..239, control 240..255, existing sentinels unmoved, macro becomes
    (varId & 0xF0) != 0xF0. Buys 12-code headroom inside the byte, so
    history stays 1 byte and (2)'s fork never arises. Same edit shape as
    (2); costs only the nominal drop to 240 variables -- but it is a
    narrowing, so free only pre-release.
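
For comparison, a small sketch of the two classifiers over the full byte
range. The function names are hypothetical and it works on a bare varId
rather than an element pointer; the expression for (3) is the one proposed
above, and the `<= 251` test is the current macro, which (1) and (2) both
keep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t RPRVarId;

/* Options (1) and (2): every value up to 251 is classified as a variable. */
static bool
is_var_current(RPRVarId v)
{
    return v <= 251;
}

/* Option (3): a high nibble of 0xF marks a control code (240..255). */
static bool
is_var_masked(RPRVarId v)
{
    return (v & 0xF0) != 0xF0;
}

/* Count how many of the 256 byte values a classifier treats as variables. */
static int
count_vars(bool (*is_var) (RPRVarId))
{
    int n = 0;

    for (int v = 0; v <= 255; v++)
        if (is_var((RPRVarId) v))
            n++;
    return n;
}
```

count_vars(is_var_current) gives 252 and count_vars(is_var_masked) gives
240, so (3) leaves 16 control values, of which 252..255 are the four in
use today and 240..251 are the 12 spare. Under (1) the classifier still
counts 252 values even though the parser only ever assigns 251 of them.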
The asymmetry: (3) is the only one with a deadline -- a narrowing is
compatible only before release, while (1)/(2) stay open forever. So the
question is whether to spend this one free moment to lock in 1-byte control
headroom (3), or stay minimal now (1)/(2) and take the narrow-or-widen later
if it is ever needed. My own lean is toward (3): 240 variables is already
far more than any real pattern will use, so the capacity we give up is
nominal, while the 12-code buffer closes the narrow-or-widen fork for good
and keeps match history at one byte -- and it is the one choice that is free
only now. That said, I'd like the decision to rest on everyone's input --
Jian's and the list's as much as mine -- with you, Tatsuo, weighing it all
and making the final call.
Either way, once the feature matures and the final control-code count is
known, the space can be repacked gap-free -- so none of these is the last
word.
Which would you prefer?
Thanks,
Henson