Re: More speedups for tuple deformation

public inbox for [email protected]  

From: Andres Freund <[email protected]>
To: David Rowley <[email protected]>
Cc: Chao Li <[email protected]>
Cc: PostgreSQL Developers <[email protected]>
Subject: Re: More speedups for tuple deformation
Date: Fri, 23 Jan 2026 11:33:26 -0500
Message-ID: <rvlc7pb6zn4kydqovcqh72lf2qfcgs3qkj2seq7tcpvxyqwtqt@nrvv6lpehwwa>
In-Reply-To: <pmik622adey6fnddivkt4uvkulvnc6rasmq3tcbrzeglx4hsn7@f3x6e2eph3w5>
References: <CAApHDvpoFjaj3+w_jD5uPnGazaw41A71tVJokLDJg2zfcigpMQ@mail.gmail.com>
	<CAApHDvrF6DG7=xD8JGo2HoQKN0LRFNF0ysVt6cKSNPiqbdQOSA@mail.gmail.com>
	<CAApHDvoh3Q413szd-zsUTCpQPWNdpUYvx-fvsB8DP8zOja+ckg@mail.gmail.com>
	<[email protected]>
	<CAApHDvqhbJU_-yF3Hbf4VhX33qXtpeYv3MsvMPDMcDwGGLr9ZQ@mail.gmail.com>
	<rbskhk7scqbxqnaw4o6nh6na2ffcclg3cxn4d4cn5jfr2z7vv3@kadtz65meesb>
	<CAApHDvpDxDFatUskuOfuM7A3VESrx8U7MtYnU_HiB0QLAg94zg@mail.gmail.com>
	<pmik622adey6fnddivkt4uvkulvnc6rasmq3tcbrzeglx4hsn7@f3x6e2eph3w5>

Hi,

On 2026-01-22 20:18:21 -0500, Andres Freund wrote:
> I haven't yet looked at the new version of the patch, but I ran your benchmark
> from upthread (fwiw, I removed the sleep 10 to reduce runtimes, the results
> seem stable enough anyway) on two Intel machines, as you mentioned that you
> saw a lot of variation in Azure.
>
> For both, I disabled turbo boost and CPU idling, and pinned the backend to a
> single CPU core.
>
> There's a bit of noise on "awork3" (basically an editor and an idle browser
> window), but everything is pinned to the other socket. "awork4" is entirely
> idle.
>
>
> Looks like overall the results are quite impressive!  Some of the extra_cols=0
> runs on Sapphire Rapids are a bit slower, but the losses are much smaller than the
> gains in other cases.
>
>
> I think it'd be good to add a few test cases of "incremental deforming" to the
> benchmark. E.g. a qual that accesses column 10, but projection then deforms up
> to 20.  I'm a bit worried that e.g. the repeated first_null_attr()
> computations could cause regressions.
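
To make the incremental-deforming concern concrete, here is a heavily simplified, hypothetical model, not PostgreSQL's slot code: every column is an int32, the null bitmap follows the heap convention (bit set means "not null"), and sketch_first_null_attr() only stands in for the patch's first_null_attr(), whose real signature isn't shown here. It assumes, as the concern implies, that the first-null position is recomputed on every getsomeattrs call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of incremental deformation (all columns int32). */
#define SKETCH_MAX_ATTS 64

typedef struct SketchTuple
{
    int         natts;
    uint8_t     nullbits[(SKETCH_MAX_ATTS + 7) / 8];   /* bit set = NOT null */
    int32_t     data[SKETCH_MAX_ATTS];  /* only non-null columns are stored */
} SketchTuple;

typedef struct SketchSlot
{
    int         nvalid;         /* leading attributes already deformed */
    int         off;            /* resume position in tup->data */
    int32_t     values[SKETCH_MAX_ATTS];
    bool        isnull[SKETCH_MAX_ATTS];
} SketchSlot;

/* Counts how often the first-null scan runs. */
static int  first_null_calls = 0;

/* Hypothetical stand-in: index of the first null attribute, or natts if none. */
static int
sketch_first_null_attr(const SketchTuple *tup)
{
    first_null_calls++;
    for (int i = 0; i < tup->natts; i++)
        if (!(tup->nullbits[i >> 3] & (1 << (i & 7))))
            return i;
    return tup->natts;
}

/* Deform attributes [slot->nvalid, natts), resuming where the last call stopped. */
static void
sketch_getsomeattrs(SketchSlot *slot, const SketchTuple *tup, int natts)
{
    assert(natts <= tup->natts);
    if (slot->nvalid >= natts)
        return;

    /* Recomputed on each incremental call: the cost being worried about. */
    int         first_null = sketch_first_null_attr(tup);

    for (int i = slot->nvalid; i < natts; i++)
    {
        /* Attributes before the first null need no bitmap check. */
        if (i >= first_null && !(tup->nullbits[i >> 3] & (1 << (i & 7))))
        {
            slot->isnull[i] = true;
            slot->values[i] = 0;
            continue;
        }
        slot->isnull[i] = false;
        slot->values[i] = tup->data[slot->off++];
    }
    slot->nvalid = natts;
}
```

In this model, a qual that calls sketch_getsomeattrs(&slot, &tup, 10) followed by a projection calling sketch_getsomeattrs(&slot, &tup, 20) resumes at attribute 10 but scans the null bitmap twice (first_null_calls ends at 2); a benchmark case shaped like that would show whether the repeated work is measurable.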

The overhead of the aggregation etc. makes it harder to see changes in
deformation speed.

I think it'd be worth replacing the SUM(a) with WHERE a < 0 (which filters out
all rows), to reduce the cost of the executor dispatch.

Here's a profile of the SUM(a) query:

-   99.90%     0.00%  postgres         postgres           [.] standard_ExecutorRun
   - standard_ExecutorRun
      - 96.83% ExecAgg
         - 49.86% ExecInterpExpr
            - 28.30% slot_getsomeattrs_int
                 tts_buffer_heap_getsomeattrs
              0.67% tts_buffer_heap_getsomeattrs
            + 0.02% asm_sysvec_apic_timer_interrupt
         - 37.44% fetch_input_tuple
            - 31.42% ExecSeqScan
               + 20.58% heap_getnextslot
                 3.58% MemoryContextReset
                 0.52% heapgettup_pagemode
                 0.32% ExecStoreBufferHeapTuple
              0.99% heap_getnextslot
              0.79% MemoryContextReset
           2.81% int4_sum
           1.39% MemoryContextReset

That query takes ~93ms on average for the first generated bench.sql.

Here's the profile of the same query with WHERE a < 0 instead:

-   99.88%     0.00%  postgres  postgres           [.] standard_ExecutorRun
   - standard_ExecutorRun
      - 95.78% ExecSeqScanWithQual
         - 57.65% ExecInterpExpr
            - 29.08% slot_getsomeattrs_int
                 tts_buffer_heap_getsomeattrs
              0.49% tts_buffer_heap_getsomeattrs
         - 25.40% heap_getnextslot
            + 15.00% heapgettup_pagemode
            + 4.71% ExecStoreBufferHeapTuple
              0.05% UnlockBuffer
           1.80% MemoryContextReset
           0.77% int4lt
           0.52% heapgettup_pagemode
           0.47% ExecStoreBufferHeapTuple
           0.37% slot_getsomeattrs_int
        2.11% heap_getnextslot
        1.49% ExecInterpExpr
        0.50% MemoryContextReset

With the same data, the WHERE a < 0 variant takes ~74ms on average.
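
Taking the two averages at face value, the WHERE a < 0 variant removes roughly a fifth of the runtime. Assuming the deformation work itself costs the same in both queries, a fixed saving of Delta ms in deformation then shows up as a relative improvement about 1.26 times larger:

```latex
\frac{93\,\mathrm{ms} - 74\,\mathrm{ms}}{93\,\mathrm{ms}} = \frac{19}{93} \approx 0.20,
\qquad
\frac{\Delta / 74\,\mathrm{ms}}{\Delta / 93\,\mathrm{ms}} = \frac{93}{74} \approx 1.26
```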


I wonder if it's worth writing a C helper to test deformation in a more
targeted way.
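
One possible shape for such a helper, shown as a standalone sketch rather than a PostgreSQL extension: time a deform loop directly so that aggregation and executor dispatch don't dilute the measurement. sketch_deform() and bench_deform() are hypothetical names, and the stand-in deformer just copies fixed-width int32 columns; a real helper would presumably call the server's own deforming code on heap tuples instead.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical standalone harness; sketch_deform() is NOT PostgreSQL's code. */
#define NATTS 20

/* Stand-in deformer: copy natts fixed-width columns, return their sum. */
static int64_t
sketch_deform(const int32_t *tupdata, int natts, int32_t *values)
{
    int64_t     sum = 0;

    for (int i = 0; i < natts; i++)
    {
        values[i] = tupdata[i];
        sum += values[i];
    }
    return sum;
}

static double
elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1000.0 +
        (end->tv_nsec - start->tv_nsec) / 1e6;
}

/*
 * Deform ntuples tuples of natts columns each, store the wall-clock time in
 * *ms, and return a checksum so the compiler cannot discard the loop.
 */
static int64_t
bench_deform(const int32_t *tuples, int ntuples, int natts, double *ms)
{
    int32_t     values[NATTS];
    int64_t     checksum = 0;
    struct timespec start, end;

    assert(natts <= NATTS);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int t = 0; t < ntuples; t++)
        checksum += sketch_deform(&tuples[(size_t) t * natts], natts, values);
    clock_gettime(CLOCK_MONOTONIC, &end);
    *ms = elapsed_ms(&start, &end);
    return checksum;
}
```

Swapping different deformer variants into bench_deform() and comparing the reported times would isolate deformation cost from the executor overhead visible in the profiles above.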


Looking at the profile of ExecSeqScanWithQual() made me a bit sad: it turns out
that some of the generated code isn't great :(. I'll start a separate thread
about that.

Greetings,

Andres Freund
