From: Henson Choi
Reply-To: assam258@gmail.com
Date: Sat, 7 Mar 2026 08:18:27 +0900
Subject: Re: Row pattern recognition
To: Tatsuo Ishii
Cc: sjjang112233@gmail.com, vik@postgresfriends.org, er@xs4all.nl,
    jacob.champion@enterprisedb.com, david.g.johnston@gmail.com,
    peter@eisentraut.org, pgsql-hackers@postgresql.org

Hi, Tatsuo

> Does "a zero-length match" mean "an empty match"?

Yes, they refer to the same thing. "Zero-length match" is the more
common term in general regex implementations (PCRE2, Perl, Python,
Java, etc. [1]), but the RPR standard (ISO/IEC 19075-5, Section 4.12.2)
uses "empty match" exclusively.

[1] https://www.regular-expressions.info/zerolength.html
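
To make the terminology concrete, here is a small illustration using Python's re module. Python is chosen only because it is one of the regex implementations mentioned above; this is not RPR and not code from the patch, and the helper name is_zero_length is made up for this sketch. A match is zero-length (or "empty", in the standard's wording) when it succeeds without consuming any input:

```python
import re


def is_zero_length(match):
    """Return True if the match succeeded but consumed no input."""
    return match is not None and match.start() == match.end()


# "a*" allows zero repetitions, so it matches at position 0 of "bbb"
# without consuming any character: a zero-length (empty) match.
empty = re.match(r"a*", "bbb")

# Against "aab" the same pattern consumes "aa": a non-empty match.
non_empty = re.match(r"a*", "aab")

# "a+" needs at least one "a", so on "bbb" there is no match at all,
# which is not the same thing as an empty match.
no_match = re.match(r"a+", "bbb")

print(is_zero_length(empty), is_zero_length(non_empty), is_zero_length(no_match))
```

Note that an empty match is still a successful match, unlike the third case, where the pattern does not match at all.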

> BTW, currently we place all nfa_* functions at the bottom of
> nodeWindowAgg.c. However nodeWindowAgg.c in master branch places "API
> exposed to window functions" at the bottom of the file. Do you think
> we should follow the way?

Yes, we should follow master's convention. I see three options:

  (a) Reorder within nodeWindowAgg.c: move the nfa_* functions up and
      keep the "API exposed to window functions" section at the bottom,
      matching master's layout.

  (b) A separate file under src/backend/executor/, keeping it close to
      nodeWindowAgg.c while making the boundary explicit.

  (c) A dedicated src/backend/rpr/ directory modeled on
      src/backend/regex/, giving the NFA engine its own namespace.
      This could also be an opportunity to consolidate the existing
      src/backend/optimizer/plan/rpr.c into the same directory.

For now (a) is the safest change. Longer term, (b) or (c) would make
more sense -- especially when we extend to MATCH_RECOGNIZE (R010),
where the NFA engine will need to be shared across both code paths.
Either way, the NFA engine can be exposed via a header so that R010
can share it without further restructuring.

Since the NFA algorithm is not familiar territory for most DBMS
developers, it would also be worth preserving the detailed algorithm
description posted earlier in this thread -- either as structured
comments or as a dedicated README alongside the code.

What do you think? Should we start with (a) now and revisit the
broader restructuring approaches -- (b) or (c) -- later, or would you
prefer to discuss them first? Either (b) or (c) would also resolve the
file layout convention issue naturally, since new files would follow
proper conventions from the start.


One more thing: there are no ECPG example programs or regression tests
for RPR yet. I'd like to propose adding them. Shall I draft an
initial set, or would you prefer to coordinate with the ECPG
maintainers first?


Best regards,
Henson