From: Henson Choi
Date: Mon, 1 Jun 2026 14:35:24 +0900
Subject: Re: Row pattern recognition
To: Tatsuo Ishii, jian.universality@gmail.com
Cc: zsolt.parragi@percona.com, sjjang112233@gmail.com, vik@postgresfriends.org, er@xs4all.nl, jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, peter@eisentraut.org, li.evan.chao@gmail.com, pgsql-hackers@postgresql.org
Reply-To: assam258@gmail.com
In-Reply-To: <20260601.114703.1561993497414705173.ishii@postgresql.org>
References: <20260601.111119.1029884790276077667.ishii@postgresql.org> <20260601.114703.1561993497414705173.ishii@postgresql.org>

Hi Tatsuo, Jian,

While tidying RPR comments I found a small inconsistency in the varId bounds.
The comment/README side I'm already fixing in the in-progress series; whether
to also change the bounds is a separate follow-up.  As lead author that one is
ultimately your call, Tatsuo, but I'd welcome Jian's and the list's input on
it first.

The current state, in src/include/optimizer/rpr.h:

  #define RPR_VARID_MAX   251
  #define RPR_VARID_BEGIN 252   /* control codes 252..255 */
  ... END 253, ALT 254, FIN 255

  RPRElemIsVar(e)  ==  ((e)->varId <= RPR_VARID_MAX)   /* 0..251 */

and the limit enforced in parse_rpr.c:

  if (list_length(*varNames) >= RPR_VARID_MAX)   /* reject the 252nd */
      ereport(ERROR, "too many pattern variables", "Maximum is 251");

So 251 variables are accepted as varId 0..250, leaving 251 a hole: never
assigned, yet the macro still classifies it as a variable -- one wider than
the comment's own "0 to RPR_VARID_MAX - 1".
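
The hole can be checked mechanically.  What follows is a minimal standalone
sketch, not the actual source: it assumes only the two definitions quoted
above and that varIds are handed out sequentially from 0.  VARID_IS_VAR,
accepts_new_var and highest_assigned_varid are hypothetical names standing in
for RPRElemIsVar and the parse_rpr.c check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t RPRVarId;

#define RPR_VARID_MAX   251     /* as quoted from rpr.h */
#define RPR_VARID_BEGIN 252     /* first control code */

/* The control range is assumed to start right after the variable range. */
static_assert(RPR_VARID_BEGIN == RPR_VARID_MAX + 1, "ranges must be adjacent");

/* Current classification, reduced to a bare varId (hypothetical name). */
#define VARID_IS_VAR(id)    ((id) <= RPR_VARID_MAX)

/*
 * Hypothetical stand-in for the parse_rpr.c limit check: a new variable is
 * accepted only while fewer than RPR_VARID_MAX names are already defined.
 */
static bool
accepts_new_var(int nvars_defined)
{
    return nvars_defined < RPR_VARID_MAX;
}

/* Highest varId handed out, assuming ids are assigned 0, 1, 2, ... in order. */
static int
highest_assigned_varid(void)
{
    int         n = 0;

    while (accepts_new_var(n))
        n++;
    return n - 1;
}
```

Under those assumptions highest_assigned_varid() stops at 250 while
VARID_IS_VAR(251) still holds, which is the one-slot gap described above.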

RPRVarId is a uint8, kept small on purpose: varId is the likely per-row
match-history key, and since a match can run arbitrarily long the history
grows with it -- so one byte per row, not two, is what keeps that footprint
in check.

The catch of staying in uint8: the four control codes already fill 252..255,
so 251 is the only free slot for any future sentinel (anchor ^/$, exclusion
{- -}) short of widening to uint16.  So the hole is really the last reserve.
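
To put rough numbers on the footprint argument, here is a small sketch.  It
assumes exactly one varId is recorded per matched row, which is only the
likely history layout mentioned above and not a committed design;
history_bytes is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widening the id from one byte to two is what doubles the history. */
static_assert(sizeof(uint16_t) == 2 * sizeof(uint8_t), "uint16 is two bytes");

/* Bytes needed to record one varId of the given width per matched row. */
static size_t
history_bytes(size_t nrows, size_t varid_width)
{
    return nrows * varid_width;
}
```

For a match spanning a million rows, history_bytes(1000000, sizeof(uint8_t))
is 1000000 bytes, against 2000000 for sizeof(uint16_t).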

Three ways, by what the gap is spent on:

(1) Leave it -- just the doc alignment already underway: 251 stays a documented
    reserve, macro unchanged.  No follow-up commit.  The one free slot is then
    on hand for a single future control code, should one ever be needed.

(2) Fill it as a 252nd variable (0..251).  Compatible and doable anytime; a few
    lines in parse_rpr.c / rpr.h plus the boundary test.  But it spends the
    last free slot, so a future control code would then force either a
    compatibility-breaking narrow of RPR_VARID_MAX or a widen to two bytes
    (doubling history).  Maximal variables now, the control question deferred.

(3) Reserve 16 control codes now (4 used + 12 spare) at the 0xF0 boundary:
    vars 0..239, control 240..255, existing sentinels unmoved, macro becomes
    (varId & 0xF0) != 0xF0.  Buys 12-code headroom inside the byte, so history
    stays 1 byte and (2)'s fork never arises.  Same edit shape as (2); costs
    only the nominal drop to 240 variables -- but it is a narrowing, so free
    only pre-release.
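
For concreteness, a sketch of what the option (3) test would classify.  Only
the expression (varId & 0xF0) != 0xF0 comes from the proposal above;
RPR_CONTROL_MASK, varid_is_var_opt3 and count_vars_opt3 are hypothetical
names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t RPRVarId;

/* Hypothetical name: a top nibble of 0xF marks a control code. */
#define RPR_CONTROL_MASK    0xF0

/* The control range is meant to begin at 240. */
static_assert(RPR_CONTROL_MASK == 240, "control codes start at 240");

/* Option (3) classification: anything below 0xF0 is a variable. */
static bool
varid_is_var_opt3(RPRVarId id)
{
    return (id & RPR_CONTROL_MASK) != RPR_CONTROL_MASK;
}

/* Count how many of the 256 possible byte values classify as variables. */
static int
count_vars_opt3(void)
{
    int         count = 0;

    for (int id = 0; id <= 255; id++)
    {
        if (varid_is_var_opt3((RPRVarId) id))
            count++;
    }
    return count;
}
```

With this test 0..239 are variables and 240..255 are control codes, so the
existing sentinels at 252..255 keep their values and 240..251 become the 12
spare codes.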

The asymmetry: (3) is the only one with a deadline -- a narrowing is compatible
only before release, while (1)/(2) stay open forever.  So the question is
whether to spend this one free moment to lock in 1-byte control headroom (3),
or stay minimal now (1)/(2) and take the narrow-or-widen later if it is ever
needed.  My own lean is toward (3): 240 variables is already far more than any
real pattern will use, so the capacity we give up is nominal, while the 12-code
buffer closes the narrow-or-widen fork for good and keeps match history at one
byte -- and it is the one choice that is free only now.  That said, I'd like
the decision to rest on everyone's input -- Jian's and the list's as much as
mine -- with you, Tatsuo, weighing it all and making the final call.

Either way, once the feature matures and the final control-code count is known,
the space can be repacked gap-free -- so none of these is the last word.

Which would you prefer?

Thanks,
Henson