Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vpO73-007Bcy-0Q for pgsql-hackers@arkaria.postgresql.org; Mon, 09 Feb 2026 10:03:25 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vpO71-00A3Rp-2q for pgsql-hackers@arkaria.postgresql.org; Mon, 09 Feb 2026 10:03:23 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vpO71-00A3Rh-0z for pgsql-hackers@lists.postgresql.org; Mon, 09 Feb 2026 10:03:23 +0000 Received: from meldrar.postgresql.org ([2a02:c0:301:0:ffff::31]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vpO6z-00000001FxR-1Y4g for pgsql-hackers@postgresql.org; Mon, 09 Feb 2026 10:03:22 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=postgresql.org; s=20171124; h=Content-Transfer-Encoding:Content-Type: Mime-Version:References:In-Reply-To:From:Subject:Cc:To:Message-Id:Date:Sender :Reply-To:Content-ID:Content-Description; bh=JDOm9e7UM6pwhpnWv4Qg4nBWxcazFrkW7IsYGcDjUiM=; b=T+kQTKuxg4ZCVVv0XfILRNEqzb jEV5rduj33vN6FIEso+wtH8gGjyMF6LSoIWlMo+zxk4a8boVQNEEC9Ag6iBFAyQE2Zg5qi/0CBKjI YpVhe3DneLDMGhZXNS10bmBuvKwyQlReRcXDGcPx7mw95zl9JydN58FvRvyRB44FetO4zA833DCbO yVLcuBjWqOeUyGi+HsPkoVplmPtYTlyuRakE0zIMyrtYnwMbE+zpXk8X5oExYIs1RMt20xb0peXvm f1F2Zc/oONEDTmWOBNkWoQOubtVMAEK+0E0HryNLMe+LGmR/YHlpD0Hkl5Kwvi9tMT5UW5OqyCB3G G0/TShxQ==; Received: from [2409:11:4120:300:868e:fbe2:fc59:7840] (helo=localhost) by meldrar.postgresql.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vpO6r-004XK9-0f; Mon, 09 Feb 2026 10:03:15 +0000 Date: Mon, 09 Feb 2026 19:02:09 +0900 (JST) Message-Id: <20260209.190209.470246357440525054.ishii@postgresql.org> To: assam258@gmail.com Cc: jacob.champion@enterprisedb.com, david.g.johnston@gmail.com, vik@postgresfriends.org, er@xs4all.nl, peter@eisentraut.org, pgsql-hackers@postgresql.org Subject: Re: Row pattern recognition From: Tatsuo Ishii In-Reply-To: References: <20260209.160520.873483599710391856.ishii@postgresql.org> X-Mailer: Mew version 6.8 on Emacs 29.3 Mime-Version: 1.0 Content-Type: Text/Plain; charset=iso-2022-jp Content-Transfer-Encoding: 7bit X-Host-Lookup-Failed: Reverse DNS lookup failed for 2409:11:4120:300:868e:fbe2:fc59:7840 (failed) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi Henson, > Hi Tatsuo, > > Thank you for the thorough testing and performance profiling! > > BTW, I have tested RPR with large partition/reduced frame (up to 100k >> rows). Following query took over 3 minutes on my laptop. >> > > The profiling results are very insightful and help us understand where > the time is spent in extreme cases. Glad to hear that the data is useful. > I agree with your assessment. Let me clarify: > > The pattern itself (START UP+) is realistic - it's a common pattern for > detecting upward trends. However, 100k+ consecutive matches without > interruption is not realistic. Your UP{,3} example (94ms for 100k partition) > demonstrates that the current implementation performs very well for > realistic > use cases. Yes, I agree. > I agree that we should prioritize realistic use cases. That said, there are > potential optimizations that could help: > > 1. Anchored pattern absorption (see my earlier message: > > https://www.postgresql.org/message-id/CAAAe_zAEg7sVM%3DWDwXMyE-odGmQyXSVi5ZzWgye6SupSjdMKpg%40mail.gmail.com > ) > > 2. Alt-pruning: In patterns like "A* | B*", once the higher-priority A > branch > has a confirmed match, the B branch can be pruned immediately. Even if B > could continue extending to a longer match, it can never be selected due > to > lexical order semantics―A will always win. This proactive pruning > respects > SQL standard semantics while reducing unnecessary state expansion. > > However, given the complexity of NFA internals, I believe we should take a > step-by-step approach: > > 1. First, stabilize the current RPR patch and prepare it for review > 2. Then, consider optimizations as separate follow-up patches > 3. Each optimization should be well-tested and reviewed independently > > This approach reduces risk and makes review more manageable. The fact that > the current implementation handles even unrealistic 100k-row cases without > crashing (just slowly) shows it's already robust. > > What do you think about this phased approach? I totally agree. > Best regards, > Henson > > P.S. I discovered a crash bug that was introduced in the latest patch > refactoring. The issue occurs with nested alternation patterns like > (A+ | (A | B)+)*, where infinite recursion happens in nfa_advance_alt when > the inner BEGIN(+)'s skip jump is followed as an ALT branch pointer. I will > include the fix in the next patch update. Thanks for fixing the issue. Best regards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp