public inbox for [email protected]
From: Andres Freund <[email protected]>
To: David Rowley <[email protected]>
Cc: John Naylor <[email protected]>
Cc: Chao Li <[email protected]>
Cc: PostgreSQL Developers <[email protected]>
Subject: Re: More speedups for tuple deformation
Date: Wed, 25 Feb 2026 15:29:01 -0500
Message-ID: <uhqul2ryci4tyg5ylddjrmf4kybzwb7m5z7rmurhhjp37vrn5f@zgxil7egr62n>
In-Reply-To: <mq6ddpgctt42srolsvo5kph2s6shfg62meb7i5fbg6n3s73zju@2n7gviiyga3h>
References: <CAApHDvpbntG7V3_EsZ+w-V=jU-y8rFmv9RB1EDJm4sxKno-4UA@mail.gmail.com>
<e7sto7tk5dk5hfyvoocaddnxcngemcmfvbuh23l32w5cssaizy@znuphjqug7qe>
<CAApHDvpuEbhvH1ViCZRz5vks+_bGbEnPoEdZYAZXK76_isb_+Q@mail.gmail.com>
<v6z545yozjtywghn5glujemu72z4i4ynadsc2xks4ejotdg7yl@4rry7ixwr4us>
<CANWCAZabO1oj+khF+YNVpmkTQwRRyNJesbsBhRFL5emZJh3tow@mail.gmail.com>
<lzgoxzbh2gel5w362revuwaecrsbjr44kjdzrewuejugcodkeq@ixymojwnylsy>
<CAApHDvodSVBj3ypOYbYUCJX+NWL=VZs63RNBQ_FxB_F+6QXF-A@mail.gmail.com>
<rbxc2qqhsvzxpukgd36caoa4ydgn5r22fxktxanrkn6nobg7j6@27b4vogohgu2>
<CAApHDvpWQn8sXDYpSNNpieJW-UTG4Nf4TVjT8ew64L073hz-Fw@mail.gmail.com>
<mq6ddpgctt42srolsvo5kph2s6shfg62meb7i5fbg6n3s73zju@2n7gviiyga3h>
Hi,
On 2026-02-25 13:05:14 -0500, Andres Freund wrote:
> At least gcc is doing some truly weird shit in the
> firstNonGuaranteed/firstNonCachedOffsetAttr loop "header" (i.e. just before
> the first entrance to the loop), which leads to high register pressure,
> which leads to spilling onto the stack, making the few-tuples case slower:
>
> [ lots of stuff trimmed ]
>
> I.e. the compiler creates an offset version of tts_values[tts_nvalid] and
> tts_isnull[tts_nvalid], which then creates register allocation pressure,
> because later the original tts_values/tts_isnull etc. are accessed again and
> thus the underlying registers are preserved. And this is all for zero gain,
> from what I can tell, because the accesses are still done with indexed
> addressing (like mov %rdi,(%r12,%rcx,8)), which would work just as
> well if rcx were indexed based on attnum rather than zero-indexed within the loop.
>
> I see about a 10% improvement if I dissuade the compiler from doing that by
> adding
> __asm__ volatile ("" : "+r"(attnum) : :);
>
> in the loop body.
>
>
> I'm getting to the point where I'd like to just hand-write the assembler for
> this stupid function. Gah.
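A minimal sketch of where the empty `asm volatile` statement from the quoted message goes. The function and parameter names (`deform_sketch`, `src`, `values`, `isnull`, `nvalid`, `natts`) are hypothetical; this is not the actual PostgreSQL deforming code, and whether the barrier helps depends on the compiler version and flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified stand-in for a deforming loop (not
 * PostgreSQL code): copy 8-byte attributes nvalid .. natts-1 from src into
 * values[] and mark them non-null.
 */
static void
deform_sketch(const int64_t *src, int64_t *values, bool *isnull,
              int nvalid, int natts)
{
    assert(nvalid >= 0 && nvalid <= natts);

    for (int attnum = nvalid; attnum < natts; attnum++)
    {
#if defined(__GNUC__)
        /*
         * Empty asm statement with attnum as a read-write register operand.
         * It emits no instructions, but the compiler must assume attnum may
         * have changed here, which discourages it from rewriting the loop
         * around pointers pre-offset by nvalid (GCC/Clang extension).
         */
        __asm__ volatile ("" : "+r"(attnum) : :);
#endif
        values[attnum] = src[attnum];
        isnull[attnum] = false;
    }
}
```

The `#if defined(__GNUC__)` guard keeps the sketch compiling with compilers that lack GNU-style inline asm; Clang defines `__GNUC__` as well, so it gets the barrier too.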
Huh. The bad codegen, at least partially, seems to be related to using an int
for attnum et al. Because we compile with -fwrapv, the compiler can't assume
that an attnum++ won't overflow, and an overflow would make the loop trip
counts a lot more complicated. Even with that, I don't understand how it ends
up generating such crappy code, but since using size_t fixes it...
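The message doesn't include the patched code, so this is only a minimal sketch of the kind of change described: the loop counter and its bounds declared as `size_t` instead of `int`. The names (`deform_sketch_sizet`, `src`, `values`, `isnull`, `nvalid`, `natts`) are hypothetical, and the actual effect on code generation should be checked in the compiler output rather than assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified deforming loop (not PostgreSQL code) using size_t
 * counters.  size_t is pointer-width on common 64-bit targets, so attnum can
 * feed indexed addressing directly: no sign extension, and no 32-bit signed
 * wraparound (which -fwrapv makes well-defined) for the compiler to honor.
 */
static void
deform_sketch_sizet(const int64_t *src, int64_t *values, bool *isnull,
                    size_t nvalid, size_t natts)
{
    assert(nvalid <= natts);

    for (size_t attnum = nvalid; attnum < natts; attnum++)
    {
        values[attnum] = src[attnum];
        isnull[attnum] = false;
    }
}
```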
Greetings,
Andres Freund