From: jian he
Date: Tue, 2 Jun 2026 13:46:22 +0800
Subject: Re: Row pattern recognition
To: assam258@gmail.com
Cc: Tatsuo Ishii, zsolt.parragi@percona.com, sjjang112233@gmail.com,
    vik@postgresfriends.org, er@xs4all.nl, jacob.champion@enterprisedb.com,
    david.g.johnston@gmail.com, peter@eisentraut.org, li.evan.chao@gmail.com,
    pgsql-hackers@postgresql.org
References: <20260601.111119.1029884790276077667.ishii@postgresql.org>
    <20260601.114703.1561993497414705173.ishii@postgresql.org>

On Mon, Jun 1, 2026 at 1:35 PM Henson Choi wrote:
>
> Hi Tatsuo, Jian,
>
> While tidying RPR comments I found a small inconsistency in the varId bounds.
> The comment/README side I'm already fixing in the in-progress series; whether
> to also change the bounds is a separate follow-up. As lead author that one is
> ultimately your call, Tatsuo, but I'd welcome Jian's and the list's input on
> it first.
>
> The current state, in src/include/optimizer/rpr.h:
>
>     #define RPR_VARID_MAX     251
>     #define RPR_VARID_BEGIN   252     /* control codes 252..255 */
>     ... END 253, ALT 254, FIN 255
>
>     RPRElemIsVar(e) == ((e)->varId <= RPR_VARID_MAX)    /* 0..251 */
>
> and the limit enforced in parse_rpr.c:
>
>     if (list_length(*varNames) >= RPR_VARID_MAX)    /* reject the 252nd */
>         ereport(ERROR, "too many pattern variables", "Maximum is 251");
>
> So 251 variables are accepted as varId 0..250, leaving 251 a hole: never
> assigned, yet the macro still classifies it as a variable -- one wider than
> the comment's own "0 to RPR_VARID_MAX - 1".
>
> RPRVarId is a uint8, kept small on purpose: varId is the likely per-row
> match-history key, and since a match can run arbitrarily long the history
> grows with it -- so one byte per row, not two, is what keeps that footprint
> in check.
>
> The catch of staying in uint8: the four control codes already fill 252..255,
> so 251 is the only free slot for any future sentinel (anchor ^/$, exclusion
> {- -}) short of widening to uint16. So the hole is really the last reserve.
>
> Three ways, by what the gap is spent on:
>
> (1) Leave it -- just the doc alignment already underway: 251 stays a documented
>     reserve, macro unchanged. No follow-up commit. The one free slot is then
>     on hand for a single future control code, should one ever be needed.
>
> (2) Fill it as a 252nd variable (0..251). Compatible and doable anytime; a few
>     lines in parse_rpr.c / rpr.h plus the boundary test. But it spends the
>     last free slot, so a future control code would then force either a
>     compatibility-breaking narrow of RPR_VARID_MAX or a widen to two bytes
>     (doubling history). Maximal variables now, the control question deferred.
>
> (3) Reserve 16 control codes now (4 used + 12 spare) at the 0xF0 boundary:
>     vars 0..239, control 240..255, existing sentinels unmoved, macro becomes
>     (varId & 0xF0) != 0xF0.
>     Buys 12-code headroom inside the byte, so history
>     stays 1 byte and (2)'s fork never arises. Same edit shape as (2); costs
>     only the nominal drop to 240 variables -- but it is a narrowing, so free
>     only pre-release.
>
> Which would you prefer?
>

I prefer (3). 240 variables is enough, since each variable supports multiple
complex AND/OR conditions. Additionally, since PostgreSQL regular expressions
use 14 special characters, reserving the remaining ones in advance is a
future-proof approach.

----------------------------------------------------
src/backend/executor/README.rpr

XII-4. Memory Pool Management

  Choice: Custom free list

  Rationale:
  - NFA states are created and destroyed in large numbers per row
  - Avoids palloc/pfree overhead
  - State size is variable (counts[] array), but within a single query
    maxDepth is fixed, so all states have the same size

It would be better to simply mention that RPRNFAState and RPRNFAContext are
allocated in a partition-lifespan memory context and are destroyed in
release_partition.

--------------------------
In ExecRPRFreeContext:

    {
        if (ctx->states != NULL)
            nfa_state_free_list(winstate, ctx->states);
        if (ctx->matchedState != NULL)
            nfa_state_free(winstate, ctx->matchedState);
    }

If ctx->matchedState points to one of the states already in ctx->states, will
nfa_state_free() be called on the same RPRNFAState twice? Is this double free
permitted, or do we have a mechanism in place to guard against it?

--------------------------
In ExecRPRProcessRow:

    if (currentPos == ctxFrameEnd)
    {
        /* Frame boundary reached: force mismatch */
        nfa_match(winstate, ctx, NULL);
        continue;
    }

If I comment out the "continue", the entire regression test suite still passes.

--
jian
https://www.enterprisedb.com/
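
For reference, the varId classification rules quoted above can be compared in
a small standalone sketch. This is not code from the patch: is_var_current(),
is_var_option3() and count_vars() are hypothetical names used only for
illustration. Only RPR_VARID_MAX = 251, the uint8 width and the
(varId & 0xF0) != 0xF0 test are taken from the quoted mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One byte per element, as described in the quoted mail. */
typedef uint8_t RPRVarId;

/* Value quoted from rpr.h; control codes currently occupy 252..255. */
#define RPR_VARID_MAX 251

/* Hypothetical helper: the current rule, anything <= RPR_VARID_MAX is a variable. */
static bool
is_var_current(RPRVarId v)
{
    return v <= RPR_VARID_MAX;
}

/* Hypothetical helper: option (3), codes 0xF0..0xFF are reserved for control. */
static bool
is_var_option3(RPRVarId v)
{
    return (v & 0xF0) != 0xF0;
}

/* Hypothetical helper: count the byte values a rule classifies as variables. */
static int
count_vars(bool (*is_var) (RPRVarId))
{
    int n = 0;

    for (int v = 0; v <= 255; v++)
    {
        if (is_var((RPRVarId) v))
            n++;
    }
    return n;
}
```

Under these assumptions the current rule classifies 252 byte values (0..251)
as variables although the parser assigns only 0..250, while the option (3)
rule classifies exactly 240 (0..239) and leaves 252..255 as control codes in
both cases.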