Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0BKu-001gdQ-2Z for pgsql-hackers@arkaria.postgresql.org; Wed, 11 Mar 2026 04:38:21 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w0BKq-007SPr-2o for pgsql-hackers@arkaria.postgresql.org; Wed, 11 Mar 2026 04:38:17 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w0BKq-007SPj-1f for pgsql-hackers@lists.postgresql.org; Wed, 11 Mar 2026 04:38:17 +0000 Received: from mail-pj1-x102f.google.com ([2607:f8b0:4864:20::102f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w0BKo-000000024y1-0XDN for pgsql-hackers@postgresql.org; Wed, 11 Mar 2026 04:38:16 +0000 Received: by mail-pj1-x102f.google.com with SMTP id 98e67ed59e1d1-35a03da7bd6so178574a91.1 for ; Tue, 10 Mar 2026 21:38:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1773203892; cv=none; d=google.com; s=arc-20240605; b=jxcKJlwjU+Wd8yA/m0ESANTjIDJX1EwG/AVY2cSaLthcyDiD2EHdWWR2lioJeCzPzf ojMpPuvuyUIbN+IGQsZ8FYO1wcbveCbf2pLeN7LuLmqqdhK9uti13tcRRFwh9iLZ1bpH d9mSRgjQXEfC4q4U4AqtzoQFSupdvjs0DK8eWFy3jVDesVVxM+oy90ONFOhx3WOa83WL h+nZOyO4yT590prgjGyuWrAnNwZ9UCbQ7h7eMIdGh+LUhDzKPAmsRWVMDK1blDJayrCI zj2tgHZLCOlOIce7iWOKl/U2h6mjRS8AS/apNdfcNHtSoJ/L9ca3u7WLmocIGnilHJMP TGYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=JcyDaXZEP7bsvQ+FsHvebOnybyZGjVQ+G4gkEo5cOeA=; fh=KG2Jm2X0Iy4QjrqLMPcMl8dzLi4rpLMDhpsNKDL2axk=; b=lN2meEr/ufokVTaT1ayzkVNwdDUqSBkgCWpfDQjf/6DY+CmXY1kGEt+M1FF4+Idzum 9KvKLN1nmBQ963xzTj1cxs48t6J9vW2M0HmP+qUYvlxyVKxN0H+AX1XsCyDONXST8SoB gM1b74MT96J5gqx7zdv02agr3xQts1zYDcTtXuadHmBWs6D+EE+hLbIfyWTMZZM5wOhg HFflDzJ823ZVQPmyZWhnWOhlNkLmoagrM0Xj0IuMXhvStPjp3ziHsAcHLaxSsCy0iEbV PsZrZZ61VFdESUrsp93dH6QVOP/4BMcFhHpbDZR3M6UsIN5SgNKecod/Hzcxt5qenIvT cCYg==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773203892; x=1773808692; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=JcyDaXZEP7bsvQ+FsHvebOnybyZGjVQ+G4gkEo5cOeA=; b=CouV2VA7KYepw5qTCgM2qgX8P34I6y3AU0OLysGhJoPfoiKUkTYHmO7Vho2src9Bor hsiw8Q4j9JBHRfxedt2bErB279wqNIF0gWYAlwUQ9Qcd0+WKlESsZJuWdqjKwt//ce9X OrwDf6X8rglNA4lQZJyOdpkyvFfo6hfA7BR+4EnBnCSXkmAEYKeQYkXek6JmqI1xoHNi ALAxNKSN5hQmJMFm3KA/mabqNGgRSQeUbWE9lmPzKA4NvuVTgmjRzE0JPXr/VBgGRlDb X0m9oslrq3zBkbM+vobWeOPwutm3JGBIb4PzU+/6o+nfWeoLqMlcHAEsYHIXnkWTJiRS GsDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773203892; x=1773808692; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=JcyDaXZEP7bsvQ+FsHvebOnybyZGjVQ+G4gkEo5cOeA=; b=ZNtP2lKuWUOVK0srHpJgpQr0uohvd5Vm1oe8//sOUs6yKpNxNBgex9WNnPVFE/DiQZ 3WjV7U5NGuiLliuqJq4iMlcoDXWq0G/oHOLKYGvAktKS7JUBQHqHGNuUxCtTMb3qWWW9 WN2Mu9T/bRk320SdVYKW8tQHHtGqc8uWdHVKcWF6Vlsn6CBKE7xHQTfwP0EZiEZrH5UL QSwBXNEGBg5XcC7Oy7+AiGWf+hDuBvybqQIb0Hh5NDh/Iu8QQGsZSCGWomLCSUGS9+Vz Ii3wLfr/cNqmJ/jgEXcdhQyS0PXDd7bvjDUeSlhiQJ46TmUHHl7LQrInvyiuHOQWASs8 vu6A== X-Forwarded-Encrypted: i=1; AJvYcCUDDMfRZkzH8Hn97GMwcXSBrAYbvSx6uA1RrPWgELyIvGi6+CzZFVv6+/rTMs5+x8oGKdO+yakeEtnsn+FQ@postgresql.org X-Gm-Message-State: AOJu0YzeKNcA+NUxYpzurHASFYbHf8zOmi0GG8M4HChDghpueqAXkYEd d18Exae+LsUajd6AhgVAS2w7LpOAb6q6lySj3hspXzj1DvJ+Nxaa42BXs8yqWptDY/D9ac+Q94b bHhnModOtMk4T+pc7TNwEnxb1P+i+k5I= X-Gm-Gg: ATEYQzyY9sfoXbVhzy+DqHd1uIwlXEnCpCbOEuf+IMw1EnzAJJcWqaQgLw1lBAxFOH8 eNve33VbNldpXQvkGDy/5DF84nBlMmURvUm+y4htSEqReG5rCjZc0S87uuEjAC1EXGMC66I9aE1 rruTg73hHBVihJgz1tlY/vCsmF3JSTdI4PSqed15NXoyH02tbK2yMs9VpY2YASg5oUGjv/y4h1l vqDVqO+RlRHPDhR0JYETJ3SQpwD317dXi5eFjaPvsZYFGyHGQoQYBUh0J2D3mqMF/nmTYUmHjnA sbQYZms52cJqP61qIw== X-Received: by 2002:a17:90b:5385:b0:356:8719:f5b8 with SMTP id 98e67ed59e1d1-35a0134d22amr1246136a91.29.1773203891857; Tue, 10 Mar 2026 21:38:11 -0700 (PDT) MIME-Version: 1.0 References: <20260309.120802.1845076903520202301.ishii@postgresql.org> <20260309.142202.1739855502263731478.ishii@postgresql.org> In-Reply-To: <20260309.142202.1739855502263731478.ishii@postgresql.org> From: SungJun Jang Date: Wed, 11 Mar 2026 13:37:59 +0900 X-Gm-Features: AaiRm51HiI4ZAFzROZ4fO8tWcwTCdNGckzG2bOMk-UTns67X27vATWghq82GXpk Message-ID: Subject: Re: Row pattern recognition To: Tatsuo Ishii , assam258@gmail.com Cc: vik@postgresfriends.org, er@xs4all.nl, jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, peter@eisentraut.org, pgsql-hackers@postgresql.org Content-Type: multipart/alternative; boundary="00000000000016ac39064cb83402" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000016ac39064cb83402 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi hackers, I ran a cross-validation test comparing PostgreSQL RPR (current patch) and Trino MATCH_RECOGNIZE to verify both result correctness and performance characteristics across different scales. The dataset consists of two segments where rows are arranged sequentially as: A =E2=86=92 B =E2=86=92 C =E2=86=92 D Each segment contains a fixed distribution of categories A/B/C/D. Category E does not exist in the dataset and is used to test match failure behavior. Test scales ranged from 20,000 to 100,000 rows. All tests were executed using: AFTER MATCH SKIP PAST LAST ROW Test Environment Item | PostgreSQL | Trino ------------+-------------------------+----------------------- Version | 19devel (RPR patch) | 471 Runtime | Local | Local Docker container Platform | Linux x86_64 | Linux x86_64 RPR syntax | WINDOW clause | MATCH_RECOGNIZE Dataset Structure Each segment contains sequential categories: A : ~1/3 of rows B : ~1/3 of rows C : ~1/3 of rows D : 1 row Example at 1x scale: Segment size : 10,000 rows Total rows : 20,000 Test Cases Test 1 Pattern: PATTERN (A+ B+ C+ D) Expected result: 1 match per segment (2 total) Reason: Data is arranged exactly as A =E2=86=92 B =E2=86=92 C =E2=86=92 D. Test 2 Pattern: PATTERN (A+ B+ C+ E) Expected result: 0 rows (no match) Reason: Category E does not exist in the dataset. Correctness Results Across all scales both systems returned identical results. Test 1 =E2=86=92 1 match per segment (2 total) Test 2 =E2=86=92 0 rows Performance Results Scale | Total Rows | PG Test1 | PG Test2 | Trino Test1 | Trino Test2 ------+------------+----------+----------+-------------+------------- 1x | 20,000 | 19 ms | 17 ms | ~0.3 s | ~358 s 2x | 40,000 | 37 ms | 37 ms | ~0.8 s | ~1,364 s 3x | 60,000 | 57 ms | 51 ms | ~1.5 s | ~4,424 s 4x | 80,000 | 73 ms | 68 ms | ~2.3 s | ~9,989 s 5x | 100,000 | 99 ms | 92 ms | ~3.3 s | ~20,014 s Note: Trino measurements are adjusted to remove JVM startup overhead (~2.4s measured via SELECT 1). Observations Match success (Test 1) Both systems scale approximately linearly because the entire segment is consumed in a single pass after a successful match. PostgreSQL shows lower absolute latency. Match failure (Test 2) PostgreSQL maintains near-linear scaling. This appears to be due to its NFA implementation combined with Context Absorption, which discards redundant matching contexts when they reach the same NFA state but start later in the input. Example: Ctx1 start=3D0 length=3D2 Ctx2 start=3D1 length=3D1 Ctx1 subsumes Ctx2, so Ctx2 is discarded. This keeps the number of active contexts small and prevents quadratic growth. Measured scaling: 17 ms =E2=86=92 92 ms (1x =E2=86=92 5x) which is consistent with O(n). Trino also uses an NFA approach but appears to lack Context Absorption or a similar optimization. As a result, it explores matching contexts from all possible start positions, leading to rapidly increasing backtracking cost. Measured scaling: 358 s =E2=86=92 20,014 s (1x =E2=86=92 5x) This exceeds theoretical O(n=C2=B2) growth and appears closer to O(n=C2=B2)= =E2=80=93O(n=C2=B3) in practice, likely compounded by JVM memory and GC overhead. Summary Correctness PostgreSQL RPR and Trino MATCH_RECOGNIZE produce identical matching results across all tested scales. Performance Match success: Both systems scale roughly linearly. Match failure: PostgreSQL maintains O(n) scaling while Trino shows O(n=C2=B2) or worse beh= avior. At the largest tested scale: PostgreSQL : ~92 ms Trino : ~20,014 s (=E2=89=88 5.6 hours) PostgreSQL is therefore approximately 217,000=C3=97 faster in this scenario= . Conclusion Both systems use NFA-based pattern matching. The key difference appears to be Context Absorption in PostgreSQL, which removes redundant matching contexts and guarantees linear scaling even when patterns fail to match. This optimization prevents the non-linear performance degradation typically associated with row pattern recognition. Best regards SungJun 2026=EB=85=84 3=EC=9B=94 9=EC=9D=BC (=EC=9B=94) PM 2:22, Tatsuo Ishii =EB=8B=98=EC=9D=B4 =EC=9E=91=EC=84=B1: > Hi Henson, > > >> Excellnt findings! BTW, I realized that we cannot use $1 of function > >> in PATTERN clause like: A{$1}. > >> > >> ERROR: 42601: syntax error at or near "$1" > >> LINE 10: PATTERN (A{$1}) > >> ^ > >> LOCATION: scanner_yyerror, scan.l:1211 > >> > >> Should we document somewhere? > >> > > > > The PATTERN quantifier {n} only accepts Iconst (integer literal) in the > > grammar. When a host variable or function parameter is used (e.g., > > A{$1}), the user gets a generic syntax error. > > Ok. > > > Oracle accepts broader syntax and validates later, producing an error > > at a later stage rather than a syntax error at parse time. > > > > PostgreSQL itself already has precedent for this pattern -- in fact, > > within the same window clause, frame offset (ROWS/RANGE/GROUPS) accepts > > a_expr in the grammar and then rejects variables in parse analysis via > > transformFrameOffset() -> checkExprIsVarFree(). > > > > I'd lean against documenting this. The SQL standard already defines > > the quantifier bound as , so there is nothing > > beyond the standard to call out, and documenting what is *not* allowed > > tends to raise questions that wouldn't otherwise occur to users. > > > > Rather, I think accepting a broader grammar and validating later would > > be the more appropriate response, producing a descriptive error like: > > > > "argument of bounded quantifier must be an integer literal" > > > > I can either include this in the current patch set or handle it as a > > separate follow-up after the main series is committed. What do you > > think? > > I think handing it as a separate follow-up after the commit is enough > unless other developers complain. > > Best regards, > -- > Tatsuo Ishii > SRA OSS K.K. > English: http://www.sraoss.co.jp/index_en/ > Japanese:http://www.sraoss.co.jp > --00000000000016ac39064cb83402 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi hackers,

I ran a cross-validation test comparing= PostgreSQL RPR (current patch) and Trino MATCH_RECOGNIZE to verify both re= sult correctness and performance characteristics across different scales.
The dataset consists of two segments where rows are arranged sequenti= ally as:

A =E2=86=92 B =E2=86=92 C =E2=86=92 D

Each segment c= ontains a fixed distribution of categories A/B/C/D. Category E does not exi= st in the dataset and is used to test match failure behavior.

Test s= cales ranged from 20,000 to 100,000 rows.

All tests were executed us= ing:

AFTER MATCH SKIP PAST LAST ROW

Test Environment

I= tem =C2=A0 =C2=A0 =C2=A0 =C2=A0| PostgreSQL =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0| Trino
------------+-------------------------+--------= ---------------
Version =C2=A0 =C2=A0 | 19devel (RPR patch) =C2=A0 =C2= =A0| 471
Runtime =C2=A0 =C2=A0 | Local =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| Local Docker container
Platform =C2=A0 = =C2=A0| Linux x86_64 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | Linux x86_64
R= PR syntax =C2=A0| WINDOW clause =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| MATCH_R= ECOGNIZE

Dataset Structure

Each segment contains sequential c= ategories:

A : ~1/3 of rows
B : ~1/3 of rows
C : ~1/3 of rows<= br>D : 1 row

Example at 1x scale:

Segment size : 10,000 rows<= br>Total rows : 20,000

Test Cases

Test 1
Pattern:
PATTE= RN (A+ B+ C+ D)

Expected result:
1 match per segment (2 total)
Reason:
Data is arranged exactly as A =E2=86=92 B =E2=86=92 C =E2= =86=92 D.

Test 2
Pattern:
PATTERN (A+ B+ C+ E)

Expected= result:
0 rows (no match)

Reason:
Category E does not exist i= n the dataset.

Correctness Results

Across all scales both sys= tems returned identical results.

Test 1 =E2=86=92 1 match per segmen= t (2 total)
Test 2 =E2=86=92 0 rows

Performance Results

Sc= ale | Total Rows | PG Test1 | PG Test2 | Trino Test1 | Trino Test2
-----= -+------------+----------+----------+-------------+-------------
1x =C2= =A0 =C2=A0| =C2=A020,000 =C2=A0 =C2=A0| 19 ms =C2=A0 =C2=A0| 17 ms =C2=A0 = =C2=A0| ~0.3 s =C2=A0 =C2=A0 =C2=A0| ~358 s
2x =C2=A0 =C2=A0| =C2=A040,0= 00 =C2=A0 =C2=A0| 37 ms =C2=A0 =C2=A0| 37 ms =C2=A0 =C2=A0| ~0.8 s =C2=A0 = =C2=A0 =C2=A0| ~1,364 s
3x =C2=A0 =C2=A0| =C2=A060,000 =C2=A0 =C2=A0| 57= ms =C2=A0 =C2=A0| 51 ms =C2=A0 =C2=A0| ~1.5 s =C2=A0 =C2=A0 =C2=A0| ~4,424= s
4x =C2=A0 =C2=A0| =C2=A080,000 =C2=A0 =C2=A0| 73 ms =C2=A0 =C2=A0| 68= ms =C2=A0 =C2=A0| ~2.3 s =C2=A0 =C2=A0 =C2=A0| ~9,989 s
5x =C2=A0 =C2= =A0| 100,000 =C2=A0 =C2=A0| 99 ms =C2=A0 =C2=A0| 92 ms =C2=A0 =C2=A0| ~3.3 = s =C2=A0 =C2=A0 =C2=A0| ~20,014 s

Note:
Trino measurements are ad= justed to remove JVM startup overhead (~2.4s measured via SELECT 1).
Observations

Match success (Test 1)

Both systems scale appro= ximately linearly because the entire segment is consumed in a single pass a= fter a successful match.

PostgreSQL shows lower absolute latency.
Match failure (Test 2)

PostgreSQL maintains near-linear scalin= g.

This appears to be due to its NFA implementation combined with Co= ntext Absorption, which discards redundant matching contexts when they reac= h the same NFA state but start later in the input.

Example:

C= tx1 start=3D0 length=3D2
Ctx2 start=3D1 length=3D1

Ctx1 subsumes = Ctx2, so Ctx2 is discarded.

This keeps the number of active contexts= small and prevents quadratic growth.

Measured scaling:

17 ms= =E2=86=92 92 ms (1x =E2=86=92 5x)

which is consistent with O(n).
Trino also uses an NFA approach but appears to lack Context Absorption= or a similar optimization.

As a result, it explores matching contex= ts from all possible start positions, leading to rapidly increasing backtra= cking cost.

Measured scaling:

358 s =E2=86=92 20,014 s (1x = =E2=86=92 5x)

This exceeds theoretical O(n=C2=B2) growth and appears= closer to O(n=C2=B2)=E2=80=93O(n=C2=B3) in practice, likely compounded by = JVM memory and GC overhead.

Summary

Correctness

Postgr= eSQL RPR and Trino MATCH_RECOGNIZE produce identical matching results acros= s all tested scales.

Performance

Match success:
Both syste= ms scale roughly linearly.

Match failure:
PostgreSQL maintains O(= n) scaling while Trino shows O(n=C2=B2) or worse behavior.

At the la= rgest tested scale:

PostgreSQL : ~92 ms
Trino : ~20,014 s (=E2=89= =88 5.6 hours)

PostgreSQL is therefore approximately 217,000=C3=97 f= aster in this scenario.

Conclusion

Both systems use NFA-based= pattern matching.

The key difference appears to be Context Absorpti= on in PostgreSQL, which removes redundant matching contexts and guarantees = linear scaling even when patterns fail to match.

This optimization p= revents the non-linear performance degradation typically associated with ro= w pattern recognition.

Best regards

SungJun

2026=EB=85=84 3=EC=9B=94 9=EC=9D=BC (=EC=9B=94) PM 2:22, Tatsuo Ishii &= lt;ishii@postgresql.org>=EB= =8B=98=EC=9D=B4 =EC=9E=91=EC=84=B1:
Hi Henson,

>> Excellnt findings!=C2=A0 BTW, I realized that we cannot use $1 of = function
>> in PATTERN clause like: A{$1}.
>>
>> ERROR:=C2=A0 42601: syntax error at or near "$1"
>> LINE 10:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0PATTERN (A{$1})
>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0^
>> LOCATION:=C2=A0 scanner_yyerror, scan.l:1211
>>
>> Should we document somewhere?
>>
>
> The PATTERN quantifier {n} only accepts Iconst (integer literal) in th= e
> grammar.=C2=A0 When a host variable or function parameter is used (e.g= .,
> A{$1}), the user gets a generic syntax error.

Ok.

> Oracle accepts broader syntax and validates later, producing an error<= br> > at a later stage rather than a syntax error at parse time.
>
> PostgreSQL itself already has precedent for this pattern -- in fact, > within the same window clause, frame offset (ROWS/RANGE/GROUPS) accept= s
> a_expr in the grammar and then rejects variables in parse analysis via=
> transformFrameOffset() -> checkExprIsVarFree().
>
> I'd lean against documenting this.=C2=A0 The SQL standard already = defines
> the quantifier bound as <unsigned integer literal>, so there is = nothing
> beyond the standard to call out, and documenting what is *not* allowed=
> tends to raise questions that wouldn't otherwise occur to users. >
> Rather, I think accepting a broader grammar and validating later would=
> be the more appropriate response, producing a descriptive error like:<= br> >
>=C2=A0 =C2=A0"argument of bounded quantifier must be an integer li= teral"
>
> I can either include this in the current patch set or handle it as a > separate follow-up after the main series is committed.=C2=A0 What do y= ou
> think?

I think handing it as a separate follow-up after the commit is enough
unless other developers complain.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
--00000000000016ac39064cb83402--