Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wRiMY-002ck7-2d for pgsql-hackers@arkaria.postgresql.org; Tue, 26 May 2026 03:21:50 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wRiLV-002e6r-1v for pgsql-hackers@arkaria.postgresql.org; Tue, 26 May 2026 03:20:46 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wRiLV-002e6j-0P for pgsql-hackers@lists.postgresql.org; Tue, 26 May 2026 03:20:46 +0000 Received: from mail-ua1-x934.google.com ([2607:f8b0:4864:20::934]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wRiLT-00000000nDB-249C for pgsql-hackers@postgresql.org; Tue, 26 May 2026 03:20:44 +0000 Received: by mail-ua1-x934.google.com with SMTP id a1e0cc1a2514c-9567aa1a047so3018645241.0 for ; Mon, 25 May 2026 20:20:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1779765641; cv=none; d=google.com; s=arc-20240605; b=b20LsziuTaF5eOLk3EyuF8djhyMNs16pn5ywmO7FAIqyc+x8sPInH3EKUTbfeRleHg ePtk6bk78ue4wdeY+Aspy2w1YZ9SXR9y7vz1Wzci96SYBJnjVx4PfYIMTKrlnYWTVMqq GRqZ/ORznaTqJDqmkZgxQbFPwrow05kBKSCftomQtF4fX6cGyIz+M5mQQ2pwIAFLkmqN OZfGMyd4GuCFgR3OG9BGGUPAPydwLGCg3orN3Gm4funU6PKwX6L47TtbwASztxlP+u6j WtmTL3ZBsg3Xs+Lu7jsolYvdTlQ6lVM9ciOhOQAzell7ahfCJIxwq2oCzaI/iTfNCKrC 7/JQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=di89l4yq0E31U+TvygbXbjWyej7mfHtiHcOKgdnx4ic=; fh=H3xWCbvTwMPXi2z4M8IVBNAHwJT5NcojyESWgMlwmYs=; b=I+kDiSLsL9vS1y6ws+q65uYdg64APCm1bHAPyb1+ynxxwE4lM/m9ybTjhdCqI/xB/h g9bXFZG+2V2aPsyjzgNPIllXtX8oH/eO/AMLjG1MUEEC2fH15v6dLKKjoPSOt1plLwbS GyK1nKYiiOUYGWTZl319rwtzjZK5o3EV2f8Kg3FXse+b6EVYUxIRKGRT/DPFCDVqVki+ DpbL/4H0/NKlFNQHHobmObPCpUM32oIyHw/6eZIYpUgrFZlzjv37zpgD+shNzcKY8gMH d1sev1BuGIdGiBOlMN97JAdg7vRk6DYQRbz7M2W5n9vXX7CD6gp1bvvDU64chXg4IB2x TF1g==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1779765641; x=1780370441; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=di89l4yq0E31U+TvygbXbjWyej7mfHtiHcOKgdnx4ic=; b=slebkC2WnDO4yNhe9trzbiurd9hwI37cFtVnHEGqRyZeyK9BhJo7N0iEcEMK+dIcEL 1pQyYiSg9xmAaXS1meKzZxvUaQGz1tjk1/lpRvOAzalpyjJaUmAWR6/ULMlBEBQigj9c Z2x8U4HoZQ2BT98pRyB8q+lrmu8vbgseak+h+wmza2I+buKg/lKX1HSTj9DCmWyqOhz4 GQHd7EABVdxR2DuWk/vJ3kRXpRwLd6NnwZt6VUwYTCzCGZn3gCflSXxKNIQzHfnWp4za ObkCtsz/BZcJhQgkA4N1a5T4H2KuuZwff77QJdw/r2I8mOMAq7TRFXXA+JXoYJP3Lf2P GJFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779765641; x=1780370441; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=di89l4yq0E31U+TvygbXbjWyej7mfHtiHcOKgdnx4ic=; b=To10DCWkW+mRrIjBhqD1cRWjPDykEPSXhP9ZoAHfyhwoxGRuz4l1CShZSqmOHRbdmm vCCTsaY//FLq0IkdNQGv9ZLiQHlIBJ/UZX2Q/H23RJfwk4pc6BrC/k7rnzfj9GU2JsvB eIMVJ3AsCggfINQRTyu4x+jy96CBw0BzN0xQrDf92HBuScXRWmjMkY6V5Jygw46kYrSw LE/V6GRqKlqi1Jn/iHYap34LkdN5y6erk1oR2+I3wO1aJ710mVWUYSbYKySoRa8Tf9Jt 7k2WMh/h3UjzNPWjk4oudMMxliPwdmOpkD0CTY/JszlcwtDWodBLO0trLhToNgqG2otX jjMw== X-Forwarded-Encrypted: i=1; AFNElJ/qc4jA4X8c+TurC7V9gob/QW1e5hfr0Oyy/3IM/vebpOSWfw6QMW3i1OojbyVaEha85VvPLFnxXDQ7zsz3@postgresql.org X-Gm-Message-State: AOJu0YxJHfnlsAMxzcG9ih7//APV02BDjeo7bQt/fRqWDGVTtSGrbOrk 5/1v05R+FPmJEmzD2hDVkrg+fW0EIn1vFvnWmI+slRp2KEi4PZVQu7ageDRDRqFLV3LF+co/b8z CgBcSOI4Xam1q6LM+1jshpvG6SszGnj0= X-Gm-Gg: Acq92OFbnY3pjjAt+R08VvfngDswIJkct99fqq9LNrja0vfq3T+GeX7wfaPnILuip+P 8TXqGM3LAO8TxSWfTIB7fN8pryDe3+PDLn5wceVXVORXDiKk0B0oTvAYVOPsMF1zT/K0bhVMVjc 9Yw7q0ZE7OHayDQFoPdQXfNWXU363L9LLkVPQtg26cZP7S4fsQzOEM4iaHeydeKJ8/FtsICxtr5 JhOyHrTdmdpRDksMI3vVjw55K47Ez3U3jz+Jfs1TeIQ6K1J592MGzUzg6AgaqP/vJYDBcmYYFQd +L4fB+UxLFHqX17c28kReLWheaguPOAVfdiSdGBhsLsv9VCfXA70IyPuTFti+9ExmdM+Zd3ArMh LzXDVF36SXd5DoecePnKnQ/ZQbjbiIYWvwXsRPWUMt90U4Qijw39SdnrHURrSbJl7Fak5frow70 mpU59iW3Sjcdi5kFJ6GKdGn8c7D2xDknQZ X-Received: by 2002:a05:6102:1621:b0:633:f2bf:7de4 with SMTP id ada2fe7eead31-67c74ec7d35mr8158718137.10.1779765641070; Mon, 25 May 2026 20:20:41 -0700 (PDT) MIME-Version: 1.0 References: <20260502.140304.670813149418899420.ishii@postgresql.org> <20260505.090124.365339750969814137.ishii@postgresql.org> <20260517.190023.159085681032648582.ishii@postgresql.org> In-Reply-To: <20260517.190023.159085681032648582.ishii@postgresql.org> From: jian he Date: Tue, 26 May 2026 11:20:04 +0800 X-Gm-Features: AVHnY4JMUHR2q1WTrwM4Acz1KL9gDRGZKkURsvnn7lKdacwBe7K6UKi2lt6PiFo Message-ID: Subject: Re: Row pattern recognition To: Tatsuo Ishii Cc: assam258@gmail.com, zsolt.parragi@percona.com, sjjang112233@gmail.com, vik@postgresfriends.org, er@xs4all.nl, jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, peter@eisentraut.org, li.evan.chao@gmail.com, pgsql-hackers@postgresql.org Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi. Disclaimer: I had some private off-list discussions with Henson Choi regarding execRPR.c, and the NFA. The following are more formal comments regarding execRPR.c and the NFA algorithm, based on v47-0001 to v47-0009, I haven't applied the incremental diff yet. (Since the incremental diffs are scattered across several threads, a unified v48 would be better). (I'm still wrapping my head around the NFA in execRPR.c, so take the comments below with a grain of salt.) PATTERN (A B C D) We can short-circuit and exit early if any of the evaluations (A, B, or C) fail in nfa_match. This is necessary since the chance of a pattern element evaluation returning false is not rare, I think. For src/backend/executor/README.rpr: We should explicitly explain 'absorbable' and 'absorption' somewhere in README.rpr, as the text currently just assumes the reader knows what they mean. Using some example illustrate "absorption" meaning, put it on README.rpr would be great. We can also mention that 'DFS' refers to Depth-First Search". `````` (4) Call nfa_advance(initialAdvance=true) `````` In V47, the variable `initialAdvance` does not exist. In nfa_advance_var, after the first Assert, we can add: Assert(elem->next < pattern->numElements); ExecRPRFinalizeAllContexts seems unnecessary; I commented it out, rerun the regress tests (TESTS='test_setup rpr_base rpr_nfa rpr_explain rpr_integration rpr' meson test -C $BUILD3 --num-processes 20 --suite regress --verbose) Only two SQL tests in rpr_explain.sql failed. SELECT first_value(id) OVER w AS match_start FROM stock_ticks WINDOW w AS ( ORDER BY id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING AFTER MATCH SKIP PAST LAST ROW PATTERN ((A B) {2}) DEFINE A AS price < 100, B AS price < 100); The query above invokes the following code. Since the PATTERN above is not greedy, is the comment below incorrect? `````` else { /* Greedy: enter first, skip second */ state->elemIdx = elem->next; nfa_route_to_elem(winstate, ctx, state, &elements[state->elemIdx], currentPos); if (skipState != NULL) { nfa_route_to_elem(winstate, ctx, skipState, &elements[elem->jump], currentPos); } } `````` nfa_advance_var ``` else if (canExit) { state->counts[depth] = 0; state->elemIdx = elem->next; nextElem = &elements[state->elemIdx]; /* * Update isAbsorbable for target element (monotonic: AND preserves * false) */ state->isAbsorbable = state->isAbsorbable && RPRElemIsAbsorbableBranch(nextElem); /* See comment above: increment outer END count for quantified VARs */ if (RPRElemIsEnd(nextElem)) { if (state->counts[nextElem->depth] < RPR_COUNT_MAX) state->counts[nextElem->depth]++; } nfa_route_to_elem(winstate, ctx, state, nextElem, currentPos); } ``` The above ELSE IF overrides all RPRNFAState field values except RPRNFAState->next. Should we set RPRNFAState->next to NULL? (If I add ``state->next = NULL;`` in the above ELSE IF branch, all the regress tests still pass) Some comments about the situation of (RPRNFAState) state->next would be great. This also happens in `` if (canLoop && canExit)``, ``if (reluctant)`` branch. For function nfa_advance_var, I don't understand the meaning of the variable "count", after the first Assert I have added below: if (count > 2 && !state->isAbsorbable && RPRElemIsReluctant(elem)) elog(INFO, "count is %d and elem is reluctant and isAbsorbable is false XXXXX", count); if (count > 2 && state->isAbsorbable && RPRElemIsReluctant(elem)) elog(INFO, "count is %d and elem is reluctant and isAbsorbable is true XXXXX", count); Rerunning the regress tests shows that count >= 3 occurs very infrequently. I'm don't understand isAbsorbable, so I am not sure if the second ``elog(INFO`` is actually reachable or not. Can we add more complex queries (more count >= 3) to check if the "count" variable is working correctly? In function nfa_add_state_unique: /* Mark VAR in visited before duplicate check to prevent DFS loops */ winstate->nfaVisitedElems[WORDNUM(state->elemIdx)] |= ((bitmapword) 1 << BITNUM(state->elemIdx)); I honestly don't understand the purpose of the code block above. But it doesn't seem to influence the subsequent FOR LOOP; It only modifies its own variables without any other side effects within nfa_add_state_unique. Could we add some comments explaining which external functions rely on this code and why it belongs in nfa_add_state_unique? nfa_states_equal compareDepth = elem->depth + 1; /* depth 0 needs 1 count, etc. */ The comment above isn't helpful, IMHO, and I don't understand it. We should focus on why compareDepth should be ```elem->depth + 1```. function nfa_add_state_unique return value is not being used? Do we need to do something with the return value, or is this expected? (I don't have an opinion on it, I guess it would be better to raise this issue) In nfa_advance_alt, during the main WHILE loop, I think altElem->depth must be larger than elem->depth. Therefore we can do `````` if (altElem->depth == elem->depth) elog(ERROR, "nfa_advance_alt altElem->depth should not be the same as elem->depth reached"); if (altElem->depth < elem->depth) break; `````` -- jian https://www.enterprisedb.com/