Date: Tue, 02 Jun 2026 11:50:39 +0900 (JST)
Message-Id:
<20260602.115039.1897923276330429432.ishii@postgresql.org>
To: assam258@gmail.com
Cc: jian.universality@gmail.com, zsolt.parragi@percona.com,
 sjjang112233@gmail.com, vik@postgresfriends.org, er@xs4all.nl,
 jacob.champion@enterprisedb.com, david.g.johnston@gmail.com,
 peter@eisentraut.org, li.evan.chao@gmail.com, pgsql-hackers@postgresql.org
Subject: Re: Row pattern recognition
From: Tatsuo Ishii
References: <20260601.114703.1561993497414705173.ishii@postgresql.org>

Hi Henson, Jian,

> Hi Tatsuo, Jian,
>
> While tidying RPR comments I found a small inconsistency in the varId bounds.
> The comment/README side I'm already fixing in the in-progress series; whether
> to also change the bounds is a separate follow-up. As lead author that one is
> ultimately your call, Tatsuo, but I'd welcome Jian's and the list's input on
> it first.
>
> The current state, in src/include/optimizer/rpr.h:
>
>     #define RPR_VARID_MAX    251
>     #define RPR_VARID_BEGIN  252   /* control codes 252..255 */
>     ... END 253, ALT 254, FIN 255
>
>     RPRElemIsVar(e) == ((e)->varId <= RPR_VARID_MAX)   /* 0..251 */
>
> and the limit enforced in parse_rpr.c:
>
>     if (list_length(*varNames) >= RPR_VARID_MAX)   /* reject the 252nd */
>         ereport(ERROR, "too many pattern variables", "Maximum is 251");
>
> So 251 variables are accepted as varId 0..250, leaving 251 a hole: never
> assigned, yet the macro still classifies it as a variable -- one wider than
> the comment's own "0 to RPR_VARID_MAX - 1".
>
> RPRVarId is a uint8, kept small on purpose: varId is the likely per-row
> match-history key, and since a match can run arbitrarily long the history
> grows with it -- so one byte per row, not two, is what keeps that footprint
> in check.
>
> The catch of staying in uint8: the four control codes already fill 252..255,
> so 251 is the only free slot for any future sentinel (anchor ^/$, exclusion
> {- -}) short of widening to uint16. So the hole is really the last reserve.
>
> Three ways, by what the gap is spent on:
>
> (1) Leave it -- just the doc alignment already underway: 251 stays a
>     documented reserve, macro unchanged. No follow-up commit. The one free
>     slot is then on hand for a single future control code, should one ever
>     be needed.
>
> (2) Fill it as a 252nd variable (0..251). Compatible and doable anytime; a
>     few lines in parse_rpr.c / rpr.h plus the boundary test. But it spends
>     the last free slot, so a future control code would then force either a
>     compatibility-breaking narrow of RPR_VARID_MAX or a widen to two bytes
>     (doubling history). Maximal variables now, the control question
>     deferred.
>
> (3) Reserve 16 control codes now (4 used + 12 spare) at the 0xF0 boundary:
>     vars 0..239, control 240..255, existing sentinels unmoved, macro becomes
>     (varId & 0xF0) != 0xF0. Buys 12-code headroom inside the byte, so
>     history stays 1 byte and (2)'s fork never arises. Same edit shape as
>     (2); costs only the nominal drop to 240 variables -- but it is a
>     narrowing, so free only pre-release.
>
> The asymmetry: (3) is the only one with a deadline -- a narrowing is
> compatible only before release, while (1)/(2) stay open forever. So the
> question is whether to spend this one free moment to lock in 1-byte control
> headroom (3), or stay minimal now (1)/(2) and take the narrow-or-widen later
> if it is ever needed.
> My own lean is toward (3): 240 variables is already far more than any real
> pattern will use, so the capacity we give up is nominal, while the 12-code
> buffer closes the narrow-or-widen fork for good and keeps match history at
> one byte -- and it is the one choice that is free only now. That said, I'd
> like the decision to rest on everyone's input -- Jian's and the list's as
> much as mine -- with you, Tatsuo, weighing it all and making the final call.
>
> Either way, once the feature matures and the final control-code count is
> known, the space can be repacked gap-free -- so none of these is the last
> word.
>
> Which would you prefer?

I'd prefer (3). Yes, I agree that 240 pattern variables is enough.

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
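
For reference, the two classification rules discussed in the quoted message can be compared side by side. The following is only a minimal sketch, not the patch itself: the control-code values (252..255) and the two predicates come from the thread, while every SKETCH_/sketch_ name is hypothetical and chosen just for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One byte per row, as described in the thread. */
typedef uint8_t RPRVarId;

/* Control-code values quoted in the thread; the SKETCH_ names are assumed. */
#define SKETCH_VARID_BEGIN 252
#define SKETCH_VARID_END   253
#define SKETCH_VARID_ALT   254
#define SKETCH_VARID_FIN   255

/* Current upper bound used by the macro (0..251 classified as variables). */
#define SKETCH_VARID_MAX_CURRENT 251

/* Option (3): everything with the high nibble 0xF is a control code. */
#define SKETCH_CONTROL_MASK 0xF0

/* The existing sentinels already sit inside the 0xF0 range, so they stay put. */
static_assert((SKETCH_VARID_BEGIN & SKETCH_CONTROL_MASK) == SKETCH_CONTROL_MASK,
              "BEGIN must fall in the reserved control range");
static_assert((SKETCH_VARID_FIN & SKETCH_CONTROL_MASK) == SKETCH_CONTROL_MASK,
              "FIN must fall in the reserved control range");

/* Current rule: varId <= 251 is a variable, so the unassigned 251 counts too. */
static inline bool
sketch_is_var_current(RPRVarId id)
{
    return id <= SKETCH_VARID_MAX_CURRENT;
}

/* Option (3) rule: variables are 0..239, control codes are 240..255. */
static inline bool
sketch_is_var_option3(RPRVarId id)
{
    return (id & SKETCH_CONTROL_MASK) != SKETCH_CONTROL_MASK;
}

/* Count how many of the 256 byte values a rule classifies as variables. */
static inline int
sketch_count_vars(bool (*is_var)(RPRVarId))
{
    int count = 0;

    for (int v = 0; v <= 255; v++)
    {
        if (is_var((RPRVarId) v))
            count++;
    }
    return count;
}
```

Under these assumptions the current rule classifies 252 values (0..251) as variables although only 0..250 are ever assigned, while the option (3) rule classifies exactly 240 and leaves 16 control values, 4 of them in use.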