Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vpLL8-0068h1-0m for pgsql-hackers@arkaria.postgresql.org; Mon, 09 Feb 2026 07:05:46 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vpLL7-009Wq7-0G for pgsql-hackers@arkaria.postgresql.org; Mon, 09 Feb 2026 07:05:44 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vpLL6-009Wpz-2X for pgsql-hackers@lists.postgresql.org; Mon, 09 Feb 2026 07:05:44 +0000 Received: from meldrar.postgresql.org ([2a02:c0:301:0:ffff::31]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vpLL3-00000001Enl-3zkB for pgsql-hackers@postgresql.org; Mon, 09 Feb 2026 07:05:43 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=postgresql.org; s=20171124; h=Content-Transfer-Encoding:Content-Type: Mime-Version:References:In-Reply-To:From:Subject:Cc:To:Message-Id:Date:Sender :Reply-To:Content-ID:Content-Description; bh=ddWpY0Acscj38vaLIrBrzk5xIidRaqUn2g3ldivtMSc=; b=Z7GxK9yvfwEheahmAPvfBfeXld Uoj39LbcZSE0KtFRfgpUEp5vuBRzrIbmSSAkwlV04IKQ/Su8soFNDshv6clqYuIni2DQXt61RKCzN 7wj+X7tLz45v5dpUxNoFa013NsIzLILxtIHZKMffi1AegFw/rjV1GVlBJJMZalCV2xGiX1AoJSAmQ aXk8IX3PJqw4XRJ9IzoJG5yHvn54WWWfGMY1jZRK9LTW8yQ1l92XXLakpBMeMZI4ICeRYcphnovGU y1FV/TiXHxrvfl4Q4h8WdrdtXuT4LXTIjCHP9PSaYTFHu78Wk4hdbFSvK7EEUScaet90C+g+wE6+I +ceAqDpw==; Received: from [2409:11:4120:300:7c1a:c224:6df3:53a5] (helo=localhost) by meldrar.postgresql.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vpLKv-004Ukm-22; Mon, 09 Feb 2026 07:05:35 +0000 Date: Mon, 09 Feb 2026 16:05:20 +0900 (JST) Message-Id: <20260209.160520.873483599710391856.ishii@postgresql.org> To: assam258@gmail.com Cc: jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, vik@postgresfriends.org, er@xs4all.nl, peter@eisentraut.org, pgsql-hackers@postgresql.org Subject: Re: Row pattern recognition From: Tatsuo Ishii In-Reply-To: References: <20260207.133143.1951390390459539453.ishii@postgresql.org> X-Mailer: Mew version 6.8 on Emacs 29.3 Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Host-Lookup-Failed: Reverse DNS lookup failed for 2409:11:4120:300:7c1a:c224:6df3:53a5 (failed) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi Henson, > Sorry, I forgot to attach the patch file in my previous email. Thanks for the patch! Patch cleanly applied and regression test passed here. >> On a side note, I feel that the automata characteristics of RPR -- >> state transitions, context management, absorbability, etc. -- tend to >> require a much larger volume of test cases than typical RDBMS/SQL >> features. This seems unavoidable given the combinatorial nature of >> NFA execution. I agree. A complex feature requires complex tests. We cannot avoid it. BTW, I have tested RPR with large partition/reduced frame (up to 100k rows). Following query took over 3 minutes on my laptop. EXPLAIN (ANALYZE) SELECT aid, count(*) OVER w FROM generate_series(1,100000) AS g(aid) WHERE aid <= 100000 WINDOW w AS ( ORDER BY aid ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING AFTER MATCH SKIP PAST LAST ROW INITIAL PATTERN (START UP+) DEFINE START AS TRUE, UP AS aid > PREV(aid) ); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- WindowAgg (cost=4337.40..4504.08 rows=33333 width=14) (actual time=224171.589..224320.055 rows=100000.00 loops=1) Window: w AS (ORDER BY aid ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) Pattern: start up+ Storage: Disk Maximum Storage: 4096kB NFA States: 200000 peak, 5000050001 total, 0 merged NFA Contexts: 100001 peak, 100001 total, 1 pruned NFA: 1 matched (len 100000/100000/100000.0), 0 mismatched NFA: 0 absorbed, 99998 skipped (len 2/99999/50000.5) Buffers: shared hit=3, temp read=400968 written=391 -> Sort (cost=3754.09..3837.42 rows=33333 width=4) (actual time=14.938..38.911 rows=100000.00 loops=1) Sort Key: aid Sort Method: quicksort Memory: 3073kB Buffers: shared hit=3, temp read=171 written=171 -> Function Scan on generate_series g (cost=0.00..1250.00 rows=33333 width=4) (actual time=5.799..11.672 rows=100000.00 loops=1) Filter: (aid <= 100000) Buffers: temp read=171 written=171 Planning: Buffers: shared hit=6 read=7 Planning Time: 0.185 ms Execution Time: 224324.532 ms It ran without eating all memory on my box (according to top command, it was around 200MB). Great! However it took nearly 4 minutes to complete the query. I think this is because there are 100k rows partition/reduced frame, and RPR needs to keep 100k contexts plus 200k states at the peak. I also ran "perf top" and took following profile. 19.87% postgres [.] nfa_match 18.92% postgres [.] nfa_advance_state 13.60% postgres [.] nfa_cleanup_dead_contexts 11.25% postgres [.] row_is_in_reduced_frame 11.06% postgres [.] nfa_state_clone 5.52% postgres [.] nfa_route_to_elem 4.39% postgres [.] nfa_add_state_unique.isra.0 4.32% postgres [.] nfa_state_alloc 2.36% libc.so.6 [.] __memset_avx2_unaligned_erms 0.63% postgres [.] memset@plt The reason for nfa_match is called so many times is, it's in a a loop which iterate over all the contexts: for (ctx = winstate->nfaContext; ctx != NULL; ctx = ctx->next) I don't try to say we should enhance nfa modules in the case when there is a large reduced frame, because I don't think this is a realistic use case and I doubt it's worth to fight against unrealistic scenario. Note that even if the partition size is large, if we change the PATTERN to: PATTERN (START UP{,3}) which generates only 4 rows reduced frames, we get very good performance: test=# test=# test=# EXPLAIN (ANALYZE) SELECT aid, count(*) OVER w FROM generate_series(1,100000) AS g(aid) WHERE aid <= 100000 WINDOW w AS ( ORDER BY aid ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING AFTER MATCH SKIP PAST LAST ROW INITIAL PATTERN (START UP{,3}) DEFINE START AS TRUE, UP AS aid > PREV(aid) ); --------------------------------------------------------------------------------------------------------------------------------------------- WindowAgg (cost=4337.40..4504.08 rows=33333 width=14) (actual time=36.586..91.988 rows=100000.00 loops=1) Window: w AS (ORDER BY aid ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) Pattern: start up{0,3} Storage: Memory Maximum Storage: 17kB NFA States: 8 peak, 325001 total, 0 merged NFA Contexts: 5 peak, 100001 total, 0 pruned NFA: 25000 matched (len 4/4/4.0), 0 mismatched NFA: 0 absorbed, 75000 skipped (len 1/3/2.0) Buffers: temp read=171 written=171 -> Sort (cost=3754.09..3837.42 rows=33333 width=4) (actual time=36.561..40.933 rows=100000.00 loops=1) Sort Key: aid Sort Method: quicksort Memory: 3073kB Buffers: temp read=171 written=171 -> Function Scan on generate_series g (cost=0.00..1250.00 rows=33333 width=4) (actual time=14.297..28.692 rows=100000.00 loops=1) Filter: (aid <= 100000) Buffers: temp read=171 written=171 Planning Time: 0.087 ms Execution Time: 94.810 ms (18 rows) I think this is more realistic use case than former. If I were correct, we don't need to work on current nfa code to squeeze better performance for unrealistic 100k reduced frame case. I maybe wrong and I would love to hear from others especially, Henson, Vik, Jacob. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp