Date: Wed, 11 Mar 2026 19:01:12 -0300
Subject: Re: [PATCH] llvmjit: always add the simplifycfg pass
From: "Matheus Alcantara"
To: "Pierre Ducroquet", "Andres Freund", "pgsql-hackers@lists.postgresql.org"

On 30/01/26 12:01, Pierre Ducroquet wrote:
> On Thursday, 29 January 2026 at 12:19 AM, Andres Freund wrote:
>
>> Hi,
>>
>> On 2026-01-28 07:56:46 +0000, Pierre Ducroquet wrote:
>>
>>> Here is a rebased version of the patch with a rewrite of the comment. Thank
>>> you again for your previous review. FYI, I've tried adding other passes but
>>> none had a similar benefits over cost ratio. The benefits could rather be in
>>> changing from O3 to an extensive list of passes.
>>
>>
>> I agree that we should have a better list of passes. I'm a bit worried that
>> having an explicit list of passes that we manage ourselves is going to be
>> somewhat of a pain to maintain across llvm versions, but ...
>>
>> WRT passes that might be worth having even with -O0 - running duplicate
>> function merging early on could be quite useful, particularly because we won't
>> inline the deform routines anyway.
>>
>>>> I did some benchmarks on some TPCH queries (1 and 4) and I got these
>>>> results. Note that for these tests I set jit_optimize_above_cost=1000000
>>>> so that it force to use the default pass with simplifycfg.
>>
>> ...
>>
>> These queries are all simple enough that I'm not sure this is a particularly
>> good benchmark for optimization speed. In particular, the deform routines
>> don't have to deal with a lot of columns and there aren't a lot of functions
>> (although I guess that shouldn't really matter WRT simplifycfg).
>>
>
> simplifycfg seems to do more things on the deforming functions than I anticipated initially, explaining the performance benefits. I've written patches to our C code to generate better IR, but I discovered quite a puzzle.
> The biggest gain I see on the generated amd64 code for a very simple query (SELECT * FROM demo WHERE a = 42) with simplifycfg is that it prevents spilling on the stack and it does what mem2reg was supposed to be doing.
>
>
> Running opt -debug-pass-manager on a deform function, I get:
> - with default,mem2reg
>
> Running pass: AnnotationRemarksPass on deform_0_1 (56 instructions)
> Running analysis: TargetLibraryAnalysis on deform_0_1
> Running pass: PromotePass on deform_0_1 (56 instructions)
> Running analysis: DominatorTreeAnalysis on deform_0_1
> Running analysis: AssumptionAnalysis on deform_0_1
> Running analysis: TargetIRAnalysis on deform_0_1
>
> deform_0_1:                             # @deform_0_1
>         .cfi_startproc
> # %bb.0:                                # %entry
>         movq    24(%rdi), %rax
>         movq    %rax, -48(%rsp)         # 8-byte Spill
>         movq    32(%rdi), %rax
>         movq    %rax, -40(%rsp)         # 8-byte Spill
>         movq    %rdi, %rax
>         addq    $4, %rax
>         movq    %rax, -32(%rsp)         # 8-byte Spill
>         movq    %rdi, %rax
>         addq    $6, %rax
>         movq    %rax, -24(%rsp)         # 8-byte Spill
>         movq    %rdi, %rax
>         addq    $72, %rax
>         movq    %rax, -16(%rsp)         # 8-byte Spill
> ...
>
>
>
> - with default,simplifycfg
>
> Running pass: AnnotationRemarksPass on deform_0_1 (56 instructions)
> Running analysis: TargetLibraryAnalysis on deform_0_1
> Running pass: SimplifyCFGPass on deform_0_1 (56 instructions)
> Running analysis: TargetIRAnalysis on deform_0_1
> Running analysis: AssumptionAnalysis on deform_0_1
>
> deform_0_1:                             # @deform_0_1
>         .cfi_startproc
> # %bb.0:                                # %entry
>         movq    24(%rdi), %rax
>         movq    32(%rdi), %rsi
>         movq    64(%rdi), %rcx
>         movq    16(%rcx), %rcx
>         movzbl  22(%rcx), %edx
>         movslq  %edx, %rdx
>         addq    %rdx, %rcx
>         movl    72(%rdi), %edx
> ...
>
> - with default,simplifycfg,mem2reg
>
> Running pass: SimplifyCFGPass on deform_0_1 (56 instructions)
> Running analysis: TargetIRAnalysis on deform_0_1
> Running analysis: AssumptionAnalysis on deform_0_1
> Running pass: PromotePass on deform_0_1 (46 instructions)
> Running analysis: DominatorTreeAnalysis on deform_0_1
>
> deform_0_1:                             # @deform_0_1
>         .cfi_startproc
> # %bb.0:                                # %entry
>         movq    24(%rdi), %rax
>         movq    32(%rdi), %rsi
>         movq    64(%rdi), %rcx
>         movq    16(%rcx), %rcx
>         movzbl  22(%rcx), %edx
>         movb    $0, (%rsi)
> ...
>
>
> So even when running only simplifycfg, the stack allocation goes away.
> I am trying to figure that one out, but I suspect we are no longer doing the optimizations we thought we were doing with mem2reg only, hence the (surprising) speed gains with simplifycfg.
>

I did some tests to compare the IR output with different pass
combinations. Using a query that deforms 6 columns, the raw IR generates
trivial empty blocks like this:

block.attr.0.attcheckalign:                       ; preds = %block.attr.0.start
  br label %block.attr.0.align

block.attr.0.align:                               ; preds = %block.attr.0.attcheckalign
  br label %block.attr.0.store

block.attr.0.store:                               ; preds = %block.attr.0.align
  %26 = load i64, ptr %v_offp, align 8
  %27 = getelementptr i8, ptr %v_tupdata_base, i64 %26
  ...

With mem2reg only, the alloca is promoted but these empty blocks remain:

block.attr.0.attcheckalign:                       ; preds = %block.attr.0.start
  br label %block.attr.0.align

block.attr.0.align:                               ; preds = %block.attr.0.attcheckalign
  br label %block.attr.0.store

block.attr.0.store:                               ; preds = %block.attr.0.align
  %25 = getelementptr i8, ptr %v_tupdata_base, i64 0
  ...
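
To make the empty-block case concrete, here is a toy Python model of the
blocks above. It is only a sketch under stated assumptions: the block
names come from the IR, but the instruction strings, the dictionary
layout and the helper names (is_forwarder, resolve, thread_forwarders)
are made up for illustration, and the successors of the attisnull and
store blocks are left out. LLVM's real SimplifyCFGPass does far more
than this and has to deal with PHI nodes, which the model ignores.

```python
def is_forwarder(cfg, name):
    """A block with no instructions and a single unconditional successor."""
    block = cfg[name]
    return not block["insts"] and len(block["succs"]) == 1


def resolve(cfg, name):
    """Follow a chain of forwarding blocks to the first non-trivial block."""
    seen = set()
    while is_forwarder(cfg, name) and name not in seen:  # 'seen' guards against cycles
        seen.add(name)
        name = cfg[name]["succs"][0]
    return name


def thread_forwarders(cfg, entry):
    """Retarget every edge past forwarding blocks, then drop unreachable blocks."""
    threaded = {
        name: {"insts": list(b["insts"]),
               "succs": [resolve(cfg, s) for s in b["succs"]]}
        for name, b in cfg.items()
    }
    reachable, work = set(), [entry]
    while work:
        name = work.pop()
        if name not in reachable:
            reachable.add(name)
            work.extend(threaded[name]["succs"])
    return {n: b for n, b in threaded.items() if n in reachable}


# Hypothetical, heavily simplified model of the raw IR for attribute 0.
cfg = {
    "block.attr.0.start": {
        "insts": ["load attnullbyte", "icmp attisnull"],
        "succs": ["block.attr.0.attisnull", "block.attr.0.attcheckalign"],
    },
    "block.attr.0.attisnull": {"insts": ["store isnull"], "succs": []},
    "block.attr.0.attcheckalign": {"insts": [], "succs": ["block.attr.0.align"]},
    "block.attr.0.align": {"insts": [], "succs": ["block.attr.0.store"]},
    "block.attr.0.store": {"insts": ["load v_offp", "getelementptr"], "succs": []},
}

simplified = thread_forwarders(cfg, "block.attr.0.start")
for name, block in simplified.items():
    print(name, "->", block["succs"], block["insts"])
```

In this model the two empty blocks disappear and block.attr.0.start
branches straight to block.attr.0.store, while the load from v_offp
stays where it was: nothing in a pure CFG cleanup removes it.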

With simplifycfg only, the trivial blocks are merged but the alloca is
not promoted:

block.attr.0.start:                               ; preds = %block.attr.0.attcheckattno
  %21 = getelementptr i8, ptr %8, i32 0
  %attnullbyte = load i8, ptr %21, align 1
  %22 = and i8 %attnullbyte, 1
  %attisnull = icmp eq i8 %22, 0
  %23 = and i1 %hasnulls, %attisnull
  br i1 %23, label %block.attr.0.attisnull, label %block.attr.0.store

block.attr.0.store:                               ; preds = %block.attr.0.start
  %26 = load i64, ptr %v_offp, align 8
  %27 = getelementptr i8, ptr %v_tupdata_base, i64 %26
  ...

With mem2reg,simplifycfg the alloca is promoted, the trivial blocks are
merged, and block.attr.0.start branches directly to block.attr.0.store:

block.attr.0.start:                               ; preds = %block.attr.0.attcheckattno
  %20 = getelementptr i8, ptr %8, i32 0
  %attnullbyte = load i8, ptr %20, align 1
  %21 = and i8 %attnullbyte, 1
  %attisnull = icmp eq i8 %21, 0
  %22 = and i1 %hasnulls, %attisnull
  br i1 %22, label %block.attr.0.attisnull, label %block.attr.0.store

block.attr.0.store:                               ; preds = %block.attr.0.start
  %25 = getelementptr i8, ptr %v_tupdata_base, i64 0
  ...

Since simplifycfg [1] may remove basic blocks and eliminate PHI nodes,
perhaps it lets more values stay in registers and so avoids the stack
allocations? It seems to me that the stack allocation going away in your
example may be a side effect of the simpler CFG allowing better register
allocation. However, I think mem2reg is still needed: simplifycfg alone
doesn't promote allocas, so the two passes complement each other.

What do you think?

[1] https://llvm.org/docs/Passes.html#simplifycfg-simplify-the-cfg

--
Matheus Alcantara
EDB: https://www.enterprisedb.com