Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vr3Ri-005cwn-0o for pgsql-hackers@arkaria.postgresql.org; Sat, 14 Feb 2026 00:23:38 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vr3Rh-00Gl6Z-0h for pgsql-hackers@arkaria.postgresql.org; Sat, 14 Feb 2026 00:23:37 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vr3Rg-00Gl6R-2f for pgsql-hackers@lists.postgresql.org; Sat, 14 Feb 2026 00:23:36 +0000 Received: from meldrar.postgresql.org ([2a02:c0:301:0:ffff::31]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vr3Rd-00000000WlT-2sdl for pgsql-hackers@postgresql.org; Sat, 14 Feb 2026 00:23:35 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=postgresql.org; s=20171124; h=Content-Transfer-Encoding:Content-Type: Mime-Version:References:In-Reply-To:From:Subject:Cc:To:Message-Id:Date:Sender :Reply-To:Content-ID:Content-Description; bh=LEnaJhZtbksKHRjZXy5Y5iPqKwJJn5X212DuH91BJU4=; b=V0+FYzJSnBV5POiMGAFZ87oAzS hqOMA8+WcnlsA+Ft9SCq5hQyy1gl+f7ngWMSdT4IihVNNSOEUFtnzxYcWlSeNVxOxWkWmIuVos25Q fIOUBhlWbW79XYq/zVgKeE/ReeDRJkatuBTgDfu4dI5X+/1TBAQZBCr4hsTv/fjmqS6x9I9h6xdXf HXkfNRUCjPguL1CRJFPNDJyup59R9Ke2NJcNlCUkt01t9vGnRsaLRhvsnN3gIYRq+lwhVzsP7mR/2 GQKdMAmFPUEpjjB8lZ6esRJ0w8R+LbAlcAg4H8MGupNUsT/NjeEFKtmKEbJUsYCD52SMiay03jnpc I0+2RoJw==; Received: from [2409:11:4120:300:988e:9d8f:fb2a:6767] (helo=localhost) by meldrar.postgresql.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vr3RW-001mEq-1b; Sat, 14 Feb 2026 00:23:28 +0000 Date: Sat, 14 Feb 2026 09:23:10 +0900 (JST) Message-Id: <20260214.092310.1353606038244245395.ishii@postgresql.org> To: assam258@gmail.com Cc: vik@postgresfriends.org, er@xs4all.nl, jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, peter@eisentraut.org, pgsql-hackers@postgresql.org Subject: Re: Row pattern recognition From: Tatsuo Ishii In-Reply-To: References: <20260212.101701.1434868071304923886.ishii@postgresql.org> X-Mailer: Mew version 6.8 on Emacs 29.3 Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-Host-Lookup-Failed: Reverse DNS lookup failed for 2409:11:4120:300:988e:9d8f:fb2a:6767 (failed) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > Hi Tatsuo, > > However, this raises interesting questions: should we optimize patterns >> by removing {0} quantifiers or simplifying them? And if so, how should >> we handle patterns that become empty after such optimization? >> >> For example: >> - PATTERN (A{0}) → empty pattern >> - PATTERN (A{0} B{0}) → empty pattern >> - PATTERN (A{0} B) → PATTERN (B) after optimization Those optimizations are valid for now because they will produce the same output tables. However, if we implement MEASURES, we cannot do such that optimization. For example, PATTERN (A{0} B) allows to inquire count(A.*), while PATTERN (B) will not. >> Empty patterns would result in zero-length matches, which our current >> implementation explicitly treats as invalid (see initialAdvance flag >> logic in nodeWindowAgg.c). >> >> More importantly, I recall that zero-length matches caused serious >> issues during development, which is why we added logic to explicitly >> avoid them. >> >> The reason I cannot immediately provide a concrete plan for A{0} >> support is that I need to deeply understand the semantic meaning of >> zero-length matches in the SQL standard first. Without this >> understanding, any implementation approach could be fundamentally >> flawed. >> >> Specifically, I need to investigate: >> - What zero-length matches mean semantically in RPR There are four kinds of pattern matching results in RPR. 1. match: Pattern successfully matches and maps to a sequence of rows count aggregate returns > 0 count aggregate as a pattern measure[1] returns > 0 other aggregates return non null classifier() returns non null result ordinary column references are null/non null 2. unmatch: Pattern fails to match a sequence of rows. count aggregate returns 0 count aggregate as a pattern measure returns 0 other aggregates return null classifier() returns null ordinary column references are null/non null 3. no match: Pattern matching skipped due to AFTER MATCH clause count aggregate returns 0 count aggregate as a pattern measure returns null other aggregates return null classifier() returns null ordinary column references are null/non null 4. empty match: Pattern successfully matches but failed to map to a row count aggregate returns 0 count aggregate as a pattern measure returns 0 other aggregates return null classifier() returns null ordinary column references are null/non null As you can see there's no difference among 2, 3 and 4 in terms of the output table except "count aggregate as a pattern measure". However we have not implemented MEASURES, and we can ignore the difference for now. [1] e.g. "MEASURES count(*) AS mcount" >> - How to handle empty patterns according to the standard >> - The correct behavior when a pattern optimizes to nothing >> >> After the current code review phase is complete, I'm also considering >> setting up an Oracle test environment to observe how it handles these >> edge cases. This could provide valuable insights into the expected >> behavior, especially for zero-length matches and empty patterns. >> > > Our current implementation cannot support A{0} due to a structural > limitation. > > The reduced_frame_map uses row-based representation (reduced_frame_map[pos] > = val), > which can only express matches consuming at least one row. It cannot > represent > zero-length matches that occur between rows without consuming any row > position. I think we don't need to worry about this ("It cannot represent zero-length matches that occur between rows without consuming any row position." problem) In Window RPR (R020) if an empty match is found, the row position immediately moves to next row. This is because Window clause requires that number of rows in the input table and the output table exactly matches. If we insert unmatched rows between the rows, the number of rows in the output table is different from the number of rows in the input table, which is not allowed. > Patterns like A{0}, A*, or A? can produce zero-length matches with no row > to mark > as RF_FRAME_HEAD and no position to register in the frame map. For empty matching rows we just record the starting row position in reduced_frame_map. We may need to introduce new DEFINE something like RF_EMPTY though. And row_is_in_reduced_frame returns -1 if RF_EMPTY is detected. This makes an empty match case behaves like an unmatch case but it should be ok (see above). > We currently prevent this using the initialAdvance flag (nodeWindowAgg.c), > which skips FIN recording during initial epsilon transitions. > > Supporting A{0} requires either restructuring reduced_frame_map to handle > virtual positions, or separate handling for zero-length matches. Before > choosing an approach, we need clarity on what the SQL standard expects for > zero-length match semantics (output generation, aggregate behavior, etc.). See above. In summary I think we don't need to modify the structure of reduced_frame_map. > Given this structural limitation, I'd like to ask: should we keep the > current > initialAdvance mechanism (which prevents zero-length matches) and handle > A{0} > separately? > > Best regards, > Henson Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp