Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vob5m-00AuAk-10 for pgsql-hackers@arkaria.postgresql.org; Sat, 07 Feb 2026 05:42:50 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vob4k-006Ej3-18 for pgsql-hackers@arkaria.postgresql.org; Sat, 07 Feb 2026 05:41:46 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vob4j-006Eiv-1t for pgsql-hackers@lists.postgresql.org; Sat, 07 Feb 2026 05:41:45 +0000 Received: from mail-pj1-x1031.google.com ([2607:f8b0:4864:20::1031]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vob4g-00000000wu3-3J7p for pgsql-hackers@postgresql.org; Sat, 07 Feb 2026 05:41:44 +0000 Received: by mail-pj1-x1031.google.com with SMTP id 98e67ed59e1d1-3545d66bb3aso653564a91.0 for ; Fri, 06 Feb 2026 21:41:42 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1770442901; cv=none; d=google.com; s=arc-20240605; b=VIM4Ys4QbRYEkvgES8a1AoBm7WcL4e7jdpHPW3E+w9btadVRanOeCGyi2xkuJtZiqH Bsza6u6bBLzSRPrJIryWosQq1EHm86vhRbHAd8qjNykNGac/zhZqMsB/IUbR8hghXfLi gg4exJ9Z4w1kJVL4pCqVdoCRG/oNNQorrBLPyFmMZohyfwlKrwxJzf/tlSjOELdLTkZ/ G57NOjQNaGNyj8MzOGMLjlf7azR5qlogvIYaZ3DeuWOIg3iiy4ojQa2ZtcQEA9APw8W1 a+RiKKCoqnLYGe5ro3BasXM8S5k0pnonUMPdnLYSDIvXEKGuxcwWVXrUSjtbt+K1QyYJ 6ZNg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:dkim-signature; bh=k//pzMxqbprm/zLBRV6oNXO6lKcysahnBQ96kwFZuhs=; fh=ZoTEGd6dQEANhEBXytOnChkPoBBGFzAe4FSHLU/ZQY4=; b=O0YeRA5ihKd7CIvJiWr/lipqvnx8Gg989VzmiGwUwywtcchKmlJxuzPmThs811/24K aW/QQamsldFmVZKHi58FwKCXUBa52bGA7n/Jl0JHk5LVeq8W1FbmCvFfHOSBfE3rZVtO zLBS7ezJbkiYn4r1I4q/d8QSGxO8vDsyoXW49aVrCnN5QOEf+aRcWy1+sbTRvK/4RPhL buLKPVIjzR2oaSM8SX+CTWN4znxlDwS64IwB0OMKKCzhMf3X0j/YBT0ej++exzZQVtv5 iiaIDy4fA+GqKdjB0SXbBab9qziJWis2bf9hLftoOJdY/h+S4jj462tIEuSjCFQdXsB6 E1vg==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1770442901; x=1771047701; darn=postgresql.org; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=k//pzMxqbprm/zLBRV6oNXO6lKcysahnBQ96kwFZuhs=; b=WLtsAhDX1SjnwlTXRaS3VyXX41t28KxecMHmuh5Jm23F/s3zp3ruVo6T1w2NvbEpTZ Wuxr4WEuIac4lxHJnjmZ1xiVM6xz2WTOIK8lORXQjA1zRegQdNh+ERkJBtngVMooulUZ aakSsOIHMEE90n9cmRseVQDsGiVle2QrfJ+5fURfDaxf2d5gnkCqkSucqoalP+tpDMZz mu0ULp77h4O+RN4HmXlY5kHFJ5ZDZ2mbnm90TreYQdldNw4vLDg5JIWiI3e/5Or+430k iHKth07eeA9l0IMvi7VNLqaj6TACqLkm7/0Tnoh73TFukTUbMuj24G6Fdyh1fJtDU82T OIwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770442901; x=1771047701; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=k//pzMxqbprm/zLBRV6oNXO6lKcysahnBQ96kwFZuhs=; b=H1R4ZiXFNltSpsgGgbug2tgsbZhdoRbpu9MozYGgFF9SUp1QhZc0F6GOnGUCWIrw7t qwgs6fyOPQHE1xguC/7hU2sCSTq90fMx/xUcNGU1ODkWJ7FxB3WIg4LLCNkAwy7lLmgh X8VZvjb7TuVahMQ0sRLK9Z0cxrtCCvFT3t22kASFSnIdWI2K/Q2MUIw2r0rek6biMguJ j5xX9Uah93ccgDBmh7E/2NxgCFlsDTf2nbRoDTd532aCy63Jmg01rokKxMQK2CKKv1bW HjWtiqbEO8EClxquSmUcSGGax7FMsELXSTN4DvD+Gzj3tX1JFWQyxkwopJElTlwv5x7a dmtQ== X-Forwarded-Encrypted: i=1; AJvYcCUguE3RB8eb6fvspatVsI+dBZJV4h9oxbL21CvErnI/BjzcQtX/Latfs23eL7bFaTMW9xvKyrubijcJIkwA@postgresql.org X-Gm-Message-State: AOJu0YwpSkOiyxXow4XgOkXApC4tj+YC9rFOSA7BjKUXWae0ievgJuF6 g3l7+sdi+aiZ3qbmM/9KbMhuZDhRPpz+bSeohnnxCEKFB/NoK7pw5VwwH4mezPBN4sk/4+DaNwS /9HvSMaNzmmHrYjgnv8+hHbwwmuesVfM= X-Gm-Gg: AZuq6aL8hK4LROiF3/DkwM3EmfQkPALnK6JKOLO7AeP0yuK+t/JdQ7chwgVJZHoSvuQ /hmOpXhDPygExFHCWB+cY5mzboYixMwkZ3g8thgo8AhpoUrRREzoVcqHDdz+cAmnOyegsMDd7PN puESKlgqt5tmHcRSOQVJqgOP1gh+VLR88wr1Korw7SR6qjQZjPufXGM0WH+7hMoHEq5hncvaBUJ PSoOyBV822YG2xB0o1oAnslmtABt1ONqdau+QRlGvM+fJ6ybSP3FDw94NjZptTVICbCZBFl57Mo eqpK4+vFNyEHCflckPK0Ws9clXIE X-Received: by 2002:a17:90b:3c4d:b0:352:f2a6:334 with SMTP id 98e67ed59e1d1-354b3c91161mr4853044a91.16.1770442901207; Fri, 06 Feb 2026 21:41:41 -0800 (PST) MIME-Version: 1.0 References: <20260205.102519.1298921431937419793.ishii@postgresql.org> <20260207.133143.1951390390459539453.ishii@postgresql.org> In-Reply-To: <20260207.133143.1951390390459539453.ishii@postgresql.org> Reply-To: assam258@gmail.com From: Henson Choi Date: Sat, 7 Feb 2026 14:41:26 +0900 X-Gm-Features: AZwV_QjGUz39W17LrO95kx--cOFVTEYlpuU5IAi_CVwqkQINo8jyZro-2ir-068 Message-ID: Subject: Re: Row pattern recognition To: Tatsuo Ishii Cc: jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, vik@postgresfriends.org, er@xs4all.nl, peter@eisentraut.org, pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="00000000000038c24e064a355c84" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000038c24e064a355c84 Content-Type: text/plain; charset="UTF-8" Hi Tatsuo, +-- Filter function to normalize Storage memory values only (not NFA > statistics) > +-- Works for text, JSON, and XML formats > > What about add reasoning why only storage memory values are normalized > something like: > > -- NFA statistics should not change between platforms. Done. I've updated the comment to: -- Filter function to normalize Storage memory values only (not NFA statistics). -- NFA statistics should not change between platforms; if they do, it could -- indicate issues such as uninitialized memory access. -- Works for text, JSON, and XML formats. This round focused primarily on reviewing explain.c. I believe we're getting close to the final version now. Only the nodeWindowAgg.c code review remains on my side. For the next submission, would it be helpful if I resend all the incremental patches I've already sent since v42 together as separate per-patch files? I've also made some updates to the documentation (advanced.sgml and ref/select.sgml). Could you take a look and let me know if there are any mistakes or if the changes look good to you? I've sent several incremental patches on top of v42, and the attached patch is the next one in that series. Here are the changes: Bugs fixed: - Fix server crash in fillRPRPatternAlt() when an inner ALT is the last element of an outer ALT's branch. Both alternations tried to set the 'next' pointer on the same endpoint element, triggering Assert(pat->elements[endPos].next == RPR_ELEMIDX_INVALID). Fix by redirecting all elements in the branch that share the old target to point to the outer ALT's afterAlt instead. (Test: PATTERN (C (A | B) | D) -- inner ALT at end of outer branch) - Add BEGIN element to fillRPRPatternGroup(). Previously only END was emitted for quantified groups, so groups with min=0 had no skip path to bypass the group content. Change to BEGIN/END pairs where BEGIN.jump points past END (skip) and END.jump points to the first child (loop-back). Add nfa_advance_begin() to the NFA executor for group entry and skip handling. (Test: PATTERN (A ((B | C) (D | E))* F?) -- * group matching 0 times) - Fix deparse output producing wrong parenthesization for nested ALT patterns. The flat-loop approach could not correctly suppress double parentheses for single-ALT groups or handle depth transitions around alternation separators. Refactor into mutual recursion: deparse_rpr_elements, deparse_rpr_group, deparse_rpr_alt, and deparse_rpr_var. - Fix deparse separator ordering for nested ALT: the ' | ' separator was emitted before closing inner parentheses, producing output like '(c (a | b | )d)' instead of '(c (a | b) | d)'. Close parens to match separator depth before emitting ' | '. - Fix missing ALT separator registration in deparse_rpr_alt() when an ALT is the first element of an outer ALT's branch. The code only checked elem->next (always -1 for ALT markers) but not elem->jump, which carries the outer branch's separator position. - Fix altEndPositions/altBranchStarts length mismatch caused by the 'if (*idx > branchStart)' guard that skipped empty branches. Remove the guard so both lists always have matching entries. - Fix RPRPatternNode.reluctant initialization in gram.y. ALT, SEQ, VAR, and GROUP primary productions used false (0) or left it uninitialized (defaulting to 0 from palloc0), but 0 is a valid ParseLoc meaning "location at offset 0". Change all four creation sites to use -1 (no location), matching the convention used by ParseLoc throughout the parser. - Fix reluctant quantifier display in both explain.c and ruleutils.c. A bare variable with reluctant ? (e.g. A?) was displayed as just 'a' since there was no quantifier to attach ? to. Now emit '{1}?' to make the reluctant marker unambiguous. New optimization: - Add mergeConsecutiveAlts() to the SEQ optimization pipeline. After GROUP{1,1} unwrap, bare alternations like (A | B) become ALT nodes in the SEQ. This step detects consecutive identical ALT nodes and wraps them in a GROUP with the appropriate quantifier, e.g. (A | B) (A | B) (A | B) -> (A | B){3}. Combined with the existing mergeGroupPrefixSuffix, patterns like (A | B) (A | B)+ (A | B) further reduce to (A | B){3,}. - Extend tryMultiplyQuantifiers() to fold nested GROUP quantifiers (e.g. ((A B)+)* -> (A B)*) in addition to VAR quantifiers. Other changes: - Add RPR_VARID_BEGIN (252) to rpr.h for the new control element type. Reduce RPR_VARID_MAX from 252 to 251 accordingly and update the maximum-variables boundary tests. - Add RPR_DEPTH_NONE (255) as sentinel for top-level elements that have no enclosing group. Reduce RPR_DEPTH_MAX to 254 accordingly. - Add BEGIN handling to computeAbsorbabilityRecursive() so that absorbability flags propagate correctly through group boundaries. - Extract show_rpr_nfa_stats() from show_windowagg_info() for NFA statistics display. - Replace defensive NULL check in collectPatternVariables() with Assert, since the caller guarantees a non-NULL pattern. - Pair each EXPLAIN test query in rpr_explain.sql with a pg_get_viewdef() check via CREATE TEMP VIEW, verifying that both the ruleutils (parse tree) and explain (bytecode) deparse paths produce consistent PATTERN output. - Add comment to rpr_explain_filter() explaining why only Storage memory values are normalized: NFA statistics should not change between platforms, and variation would indicate issues such as uninitialized memory access. Also add EXPLAIN test cases for nested ALT patterns, consecutive ALT merging, and ALT prefix/suffix merging. On a side note, I feel that the automata characteristics of RPR -- state transitions, context management, absorbability, etc. -- tend to require a much larger volume of test cases than typical RDBMS/SQL features. This seems unavoidable given the combinatorial nature of NFA execution. Best regards, Henson --00000000000038c24e064a355c84 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Tatsuo,

+-- Filter function to normalize Storage memory values only (not NFA statis= tics)
+-- Works for text, JSON, and XML formats

What about add reasoning why only storage memory values are normalized
something like:

-- NFA statistics should not change between platforms.
Done. I've updated the comment to:

-- Filter function to = normalize Storage memory values only (not NFA statistics).
-- NFA statis= tics should not change between platforms; if they do, it could
-- indica= te issues such as uninitialized memory access.
-- Works for text, JSON, = and XML formats.

This round focused primarily on reviewing explain.c= .=C2=A0 I believe we're
getting close to the final version now.=C2= =A0 Only the nodeWindowAgg.c
code review remains on my side.

For = the next submission, would it be helpful if I resend all the
incremental= patches I've already sent since v42 together as separate
per-patch = files?

I've also made some updates to the documentation (advance= d.sgml and
ref/select.sgml).=C2=A0 Could you take a look and let me know= if there are
any mistakes or if the changes look good to you?

I&= #39;ve sent several incremental patches on top of v42, and the attached
= patch is the next one in that series.=C2=A0 Here are the changes:

Bu= gs fixed:

- Fix server crash in fillRPRPatternAlt() when an inner AL= T is the
=C2=A0 last element of an outer ALT's branch.=C2=A0 Both al= ternations tried to
=C2=A0 set the 'next' pointer on the same en= dpoint element, triggering
=C2=A0 Assert(pat->elements[endPos].next = =3D=3D RPR_ELEMIDX_INVALID).=C2=A0 Fix by
=C2=A0 redirecting all element= s in the branch that share the old target to
=C2=A0 point to the outer A= LT's afterAlt instead.
=C2=A0 (Test: PATTERN (C (A | B) | D) -- inne= r ALT at end of outer branch)

- Add BEGIN element to fillRPRPatternG= roup().=C2=A0 Previously only END was
=C2=A0 emitted for quantified grou= ps, so groups with min=3D0 had no skip path
=C2=A0 to bypass the group c= ontent.=C2=A0 Change to BEGIN/END pairs where
=C2=A0 BEGIN.jump points p= ast END (skip) and END.jump points to the first
=C2=A0 child (loop-back)= .=C2=A0 Add nfa_advance_begin() to the NFA executor for
=C2=A0 group ent= ry and skip handling.
=C2=A0 (Test: PATTERN (A ((B | C) (D | E))* F?) --= * group matching 0 times)

- Fix deparse output producing wrong pare= nthesization for nested ALT
=C2=A0 patterns.=C2=A0 The flat-loop approac= h could not correctly suppress double
=C2=A0 parentheses for single-ALT = groups or handle depth transitions around
=C2=A0 alternation separators.= =C2=A0 Refactor into mutual recursion:
=C2=A0 deparse_rpr_elements, depa= rse_rpr_group, deparse_rpr_alt, and
=C2=A0 deparse_rpr_var.

- Fix= deparse separator ordering for nested ALT: the ' | ' separator
= =C2=A0 was emitted before closing inner parentheses, producing output like<= br>=C2=A0 '(c (a | b | )d)' instead of '(c (a | b) | d)'.= =C2=A0 Close parens to
=C2=A0 match separator depth before emitting '= ; | '.

- Fix missing ALT separator registration in deparse_rpr_a= lt() when an
=C2=A0 ALT is the first element of an outer ALT's branc= h.=C2=A0 The code only
=C2=A0 checked elem->next (always -1 for ALT m= arkers) but not elem->jump,
=C2=A0 which carries the outer branch'= ;s separator position.

- Fix altEndPositions/altBranchStarts length = mismatch caused by the
=C2=A0 'if (*idx > branchStart)' guard= that skipped empty branches.=C2=A0 Remove
=C2=A0 the guard so both list= s always have matching entries.

- Fix RPRPatternNode.reluctant initi= alization in gram.y.=C2=A0 ALT, SEQ,
=C2=A0 VAR, and GROUP primary produ= ctions used false (0) or left it
=C2=A0 uninitialized (defaulting to 0 f= rom palloc0), but 0 is a valid
=C2=A0 ParseLoc meaning "location at= offset 0".=C2=A0 Change all four creation
=C2=A0 sites to use -1 (= no location), matching the convention used by
=C2=A0 ParseLoc throughout= the parser.

- Fix reluctant quantifier display in both explain.c an= d ruleutils.c.
=C2=A0 A bare variable with reluctant ? (e.g. A?) was dis= played as just 'a'
=C2=A0 since there was no quantifier to attac= h ? to.=C2=A0 Now emit '{1}?' to
=C2=A0 make the reluctant marke= r unambiguous.

New optimization:

- Add mergeConsecutiveAlts()= to the SEQ optimization pipeline.=C2=A0 After
=C2=A0 GROUP{1,1} unwrap,= bare alternations like (A | B) become ALT nodes
=C2=A0 in the SEQ.=C2= =A0 This step detects consecutive identical ALT nodes and
=C2=A0 wraps t= hem in a GROUP with the appropriate quantifier, e.g.
=C2=A0 (A | B) (A |= B) (A | B) -> (A | B){3}.=C2=A0 Combined with the existing
=C2=A0 me= rgeGroupPrefixSuffix, patterns like (A | B) (A | B)+ (A | B)
=C2=A0 furt= her reduce to (A | B){3,}.

- Extend tryMultiplyQuantifiers() to fold= nested GROUP quantifiers
=C2=A0 (e.g. ((A B)+)* -> (A B)*) in additi= on to VAR quantifiers.

Other changes:

- Add RPR_VARID_BEGIN (= 252) to rpr.h for the new control element type.
=C2=A0 Reduce RPR_VARID_= MAX from 252 to 251 accordingly and update the
=C2=A0 maximum-variables = boundary tests.

- Add RPR_DEPTH_NONE (255) as sentinel for top-level= elements that
=C2=A0 have no enclosing group.=C2=A0 Reduce RPR_DEPTH_MA= X to 254 accordingly.

- Add BEGIN handling to computeAbsorbabilityRe= cursive() so that
=C2=A0 absorbability flags propagate correctly through= group boundaries.

- Extract show_rpr_nfa_stats() from show_windowag= g_info() for NFA
=C2=A0 statistics display.

- Replace defensive N= ULL check in collectPatternVariables() with
=C2=A0 Assert, since the cal= ler guarantees a non-NULL pattern.

- Pair each EXPLAIN test query in= rpr_explain.sql with a
=C2=A0 pg_get_viewdef() check via CREATE TEMP VI= EW, verifying that both
=C2=A0 the ruleutils (parse tree) and explain (b= ytecode) deparse paths
=C2=A0 produce consistent PATTERN output.

=
- Add comment to rpr= _explain_filter() explaining why only Storage
=C2=A0 memory values are n= ormalized: NFA statistics should not change
=C2=A0 between platforms, an= d variation would indicate issues such as
=C2=A0 uninitialized memory ac= cess.

Also add EXPLAIN test cases for nested ALT patterns, consecuti= ve ALT
merging, and ALT prefix/suffix merging.

On a side not= e, I feel that the automata characteristics of RPR --
state transitions,= context management, absorbability, etc. -- tend to
require a much large= r volume of test cases than typical RDBMS/SQL
features.=C2=A0 This seems= unavoidable given the combinatorial nature of
NFA execution.
<= div>
Best regards,
Henson=C2=A0
--00000000000038c24e064a355c84--