Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vqhlb-00GF4m-1o for pgsql-hackers@arkaria.postgresql.org; Fri, 13 Feb 2026 01:14:44 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vqhlZ-00BuoN-0y for pgsql-hackers@arkaria.postgresql.org; Fri, 13 Feb 2026 01:14:42 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vqhlY-00BuoF-2k for pgsql-hackers@lists.postgresql.org; Fri, 13 Feb 2026 01:14:41 +0000 Received: from mail-pl1-x636.google.com ([2607:f8b0:4864:20::636]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vqhlW-00000000N3q-1W3k for pgsql-hackers@postgresql.org; Fri, 13 Feb 2026 01:14:40 +0000 Received: by mail-pl1-x636.google.com with SMTP id d9443c01a7336-2a9296b3926so2625405ad.1 for ; Thu, 12 Feb 2026 17:14:39 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1770945278; cv=none; d=google.com; s=arc-20240605; b=Ln+8fcZf27Jxex6XNvZ1YogMPEe2pOnp0AAChNVwjv9Xg1/FyCmZC3arNSRQjFMAOg v0s5B4Bp8LONcLAtJ6RxsMMCYnZlQen8jAMgBUHTUAuKT2HLKL4uG4Do/qn1X1ORbWzB lPjLW2yFvV9EUxeOhaHs1nHjXVNexX/RU5oe1imgLcACFoPyGyr/FuN+qDm1PrP/OJ3/ iCFJr+VNiOyeuBUx/pU3F+OPIalU1LTCv5QYAGDvPf1rpO1SxEjJ0XVA+HeVpc8m7ZO0 Do6smHdg923qlQkXPIx6msd/lHGz7bu1rEZtaHJr2NY+pt/bA9AkveaXr+lqY18DRbVt vc+A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:dkim-signature; bh=inPrvBK9aE1T7zLL90kewlyA15UmtThZOVK4SQwNFgE=; fh=l4xxSZy3S3m+a96C3jLWCJy+c2+DqtzgqCPc/44vEl0=; b=RnH41XymZIRxNcHF/hVZcWajl/Rnd8P7LDfVAcE40COyir2ZCg2sOqcoe7zrywd6Hg C++jsQ81gvvr2qZZ0Lt8y6pDwd9dwqsGugI2Pv9tBiueBUVKBCj1x0SWRMD2qeumE1Kc 6s8X8W/x8Wb0mJ0F3uCbjE0VLHdsol6zvfbAOjVsCzDt2JTVQBcumKIdkj/0g774Zi16 gCvKm2cstngfGQkb3TexSo2Pn+rd18d/aSQkiMxftUxTsexVrxeoqoHIv4Fpvb6nYDmN JcS33a8TJnstoROotkcDytbGxL477ihiCpPyOZF5yC/oMFQ9rNC4ht7EV3+9qC/cK4bh miGA==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1770945278; x=1771550078; darn=postgresql.org; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=inPrvBK9aE1T7zLL90kewlyA15UmtThZOVK4SQwNFgE=; b=cD52KHIQsIhCrHG6JpCoqjreb9IHZghfFW8vWK6C8qKCofI2LGE7+Ysvv2A45GfFyF gfV7M11vmEj09N8qhLYwQQDst2OQRReA3vLAIgv4OuqmJCteYR2pzqfZSwTQaT6qb90U Vbqo5GqFDqnYFmqzgVXSuigqL/g0htwRoOFHMmt1HtPfePV0phH1GgeHZpELVqMC/x73 PZovRqYOhapITYkksFV2bQ/b4Wn74sWimolJwl1wXGmUwu1K/3xfbPCX1p2KurWlhJt8 teCfZ7/uKBhZlvCHQkj8sHzTghBZctBIq1/ktFBp5L490v4jOkdCfjzHeMBQm8fUFo+A jkog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770945278; x=1771550078; h=cc:to:subject:message-id:date:from:reply-to:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=inPrvBK9aE1T7zLL90kewlyA15UmtThZOVK4SQwNFgE=; b=ZRm+vx+jpJHCZVRp65XV94bR+2B2m5F3dw90yA08PAWSLwlmB/3lp1qFl49Q8irvMc em/wl9JJL3ZRXuAfgA5iYlOaFo84lf79Jaarz+gEZUZikfDHj/BXwM4bpdgKu9HKGhFA RIEDNSZ2h7F+9j0aVzTjeBfNzn++msbvr95a9TDxKbn1iYBhv/BvEEXZjuUxqvXFni+M hMlyDD/6viHy1AxaGFWVfgd+mT8bp/x6OPgEPbk4fYgOjF5soBGQv2S5eV/O6BIDTknR QPQFKL2K/oOCOmcjGOKTsIljcD6CujFqRElq6XqHV/+6hOJ+hkqmNt671RJfVdHupTbo sOJw== X-Forwarded-Encrypted: i=1; AJvYcCWSLsitWa6LmI8qj3m/IoWlDljRcvH91LH/3iQZ/UIKss/qCOiTlOqtBuBN3DWbtij1SFfAW04iJFnCm8C1@postgresql.org X-Gm-Message-State: AOJu0Yw8qJePLdwK13YL9GSSBmBdX7wmSwyL5tgenXAgsCCejjZNNk+f xjteH8RPbtjcynBBLQWPKfuGASxEyt6qlRTAPqEOoPGUhKl/jyIsV7l1rX4Fk210guI5gTG5MQm +zu5y9xpSHNPQN5txNgwIQnlRIflqyZ0= X-Gm-Gg: AZuq6aIhSL0i2kOcCTFJeznofJBOAaDjCwdNUEYDYJBIurZu5jhAu9m1diPWyCldx8E dzI8tipuP7KVYgGjo4zuddEEk01pVFoP6hIp/lydwcgq3qylwOpocwDDFer04p8RFMwsLv56+O4 TOda44Jh93Tp59nt9GaMARP6iqBpf6zRbYiKeFYQLGIeOGdutpBABf1p+I0Uc+Wttzit6CO29uf ASFFDdynGNE50v3sUk0JvzHAeZnzIbDonkq3nui74nWeA1W6Pik++YjzV2RZ/8jungJJQhUnLcC oHMMmJKdp4J0Jr2AEb1C4ixxtf6lONGy4L/j/4o= X-Received: by 2002:a17:903:f8b:b0:2a9:62ce:1c0c with SMTP id d9443c01a7336-2ab50519c00mr1751585ad.6.1770945278056; Thu, 12 Feb 2026 17:14:38 -0800 (PST) MIME-Version: 1.0 References: <20260209.190209.470246357440525054.ishii@postgresql.org> <20260211.125032.1207789314761104094.ishii@postgresql.org> <20260212.101701.1434868071304923886.ishii@postgresql.org> In-Reply-To: Reply-To: assam258@gmail.com From: Henson Choi Date: Fri, 13 Feb 2026 10:14:26 +0900 X-Gm-Features: AZwV_QiF8Fi2DLudmNaEwmz6BvXOZO9IUBiCtoJYav36S-h7zWvXnCjkgPJpZ84 Message-ID: Subject: Re: Row pattern recognition To: Tatsuo Ishii Cc: ssam258@gmail.com, vik@postgresfriends.org, er@xs4all.nl, jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, peter@eisentraut.org, pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="000000000000371b5c064aaa54a5" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000371b5c064aaa54a5 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Tatsuo, However, this raises interesting questions: should we optimize patterns > by removing {0} quantifiers or simplifying them? And if so, how should > we handle patterns that become empty after such optimization? > > For example: > - PATTERN (A{0}) =E2=86=92 empty pattern > - PATTERN (A{0} B{0}) =E2=86=92 empty pattern > - PATTERN (A{0} B) =E2=86=92 PATTERN (B) after optimization > > Empty patterns would result in zero-length matches, which our current > implementation explicitly treats as invalid (see initialAdvance flag > logic in nodeWindowAgg.c). > > More importantly, I recall that zero-length matches caused serious > issues during development, which is why we added logic to explicitly > avoid them. > > The reason I cannot immediately provide a concrete plan for A{0} > support is that I need to deeply understand the semantic meaning of > zero-length matches in the SQL standard first. Without this > understanding, any implementation approach could be fundamentally > flawed. > > Specifically, I need to investigate: > - What zero-length matches mean semantically in RPR > - How to handle empty patterns according to the standard > - The correct behavior when a pattern optimizes to nothing > > After the current code review phase is complete, I'm also considering > setting up an Oracle test environment to observe how it handles these > edge cases. This could provide valuable insights into the expected > behavior, especially for zero-length matches and empty patterns. > Our current implementation cannot support A{0} due to a structural limitation. The reduced_frame_map uses row-based representation (reduced_frame_map[pos] =3D val), which can only express matches consuming at least one row. It cannot represent zero-length matches that occur between rows without consuming any row position. Patterns like A{0}, A*, or A? can produce zero-length matches with no row to mark as RF_FRAME_HEAD and no position to register in the frame map. We currently prevent this using the initialAdvance flag (nodeWindowAgg.c), which skips FIN recording during initial epsilon transitions. Supporting A{0} requires either restructuring reduced_frame_map to handle virtual positions, or separate handling for zero-length matches. Before choosing an approach, we need clarity on what the SQL standard expects for zero-length match semantics (output generation, aggregate behavior, etc.). Given this structural limitation, I'd like to ask: should we keep the current initialAdvance mechanism (which prevents zero-length matches) and handle A{0} separately? Best regards, Henson --000000000000371b5c064aaa54a5 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Tatsuo,

<= /div>
However, this raises interesting q= uestions: should we optimize patterns
by removing {0} quantifiers or sim= plifying them? And if so, how should
we handle patterns that become empt= y after such optimization?

For example:
- PATTERN (A{0}) =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0=E2=86=92 empty pattern
- PATTERN (A{0} B{0})= =C2=A0 =C2=A0 =E2=86=92 empty pattern
- PATTERN (A{0} B) =C2=A0 =C2=A0 = =C2=A0 =C2=A0=E2=86=92 PATTERN (B) after optimization

Empty patterns= would result in zero-length matches, which our current
implementation e= xplicitly treats as invalid (see initialAdvance flag
logic in nodeWindow= Agg.c).

More importantly, I recall that zero-length matches caused s= erious
issues during development, which is why we added logic to explici= tly
avoid them.

The reason I cannot immediately provide a concret= e plan for A{0}
support is that I need to deeply understand the semantic= meaning of
zero-length matches in the SQL standard first. Without this<= br>understanding, any implementation approach could be fundamentally
fla= wed.

Specifically, I need to investigate:
- What zero-length matc= hes mean semantically in RPR
- How to handle empty patterns according to= the standard
- The correct behavior when a pattern optimizes to nothing=

After the current code review phase is complete, I'm also consi= dering
setting up an Oracle test environment to observe how it handles t= hese
edge cases. This could provide valuable insights into the expected<= br>behavior, especially for zero-length matches and empty patterns.

Our current implementation cannot support A{0} due to a st= ructural limitation.

The reduced_frame_map uses row-based representa= tion (reduced_frame_map[pos] =3D val),
which can only express matches co= nsuming at least one row. It cannot represent
zero-length matches that o= ccur between rows without consuming any row position.

Patterns like = A{0}, A*, or A? can produce zero-length matches with no row to mark
as R= F_FRAME_HEAD and no position to register in the frame map.

We curren= tly prevent this using the initialAdvance flag (nodeWindowAgg.c),
which = skips FIN recording during initial epsilon transitions.

Supporting A= {0} requires either restructuring reduced_frame_map to handle
virtual po= sitions, or separate handling for zero-length matches. Before
choosing a= n approach, we need clarity on what the SQL standard expects for
zero-le= ngth match semantics (output generation, aggregate behavior, etc.).

Given this structural limitation, I&= #39;d like to ask: should we keep the current
initialAdvance mechanism (= which prevents zero-length matches) and handle A{0}
separately?

Best regards,
Henson=C2=A0
--000000000000371b5c064aaa54a5--