From: David Geier <[email protected]>
To: John Naylor <[email protected]>
To: Heikki Linnakangas <[email protected]>
Cc: Matthias van de Meent <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: Reduce build times of pg_trgm GIN indexes
Date: Tue, 14 Apr 2026 15:05:51 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAEze2WiUL9idZBbuUN+MuWqr6DcPr_-C91E9MTx=H62Xx5fHaQ@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CANWCAZa-AUG=aK7UGGa+NY0sZ2o09Ko=z_N-irkC7Ura_c3uJg@mail.gmail.com>
	<[email protected]>

On 13.04.2026 17:05, David Geier wrote:
> On 08.04.2026 04:15, John Naylor wrote:
>> On Tue, Apr 7, 2026 at 6:27 PM Heikki Linnakangas <[email protected]> wrote:
>>> But the comments on the pg_cmp functions say:
>>>
>>>>  * NB: If the comparator function is inlined, some compilers may produce
>>>>  * worse code with these helper functions than with code with the
>>>>  * following form:
>>>>  *
>>>>  *     if (a < b)
>>>>  *         return -1;
>>>>  *     if (a > b)
>>>>  *         return 1;
>>>>  *     return 0;
>>>>  *
>>>
>>> So, uh, is that really a universal improvement? Is that comment about
>>> producing worse code outdated?
> 
> Well spotted. Thanks!
> 
>>
>> No, it's quite recent:
>>
>> https://www.postgresql.org/message-id/20240212230423.GA3519%40nathanxps13

FWICS, this only matters if btint4cmp() gets inlined at a call site
where the compiler can prove that parts of the if-cascade are not
needed. Andres' example was

return DO_COMPARE(a, b) < 0 ?
	(DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a))
	: (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a : c));

In the case of btint4cmp(), it's only ever invoked from the function
manager, where it cannot be inlined.

Or are there ways to invoke btint4cmp() that can be inlined, which I'm
unaware of?
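
To make the inlining argument concrete, here is a minimal standalone sketch. It is not PostgreSQL code: cmp_inline(), cmp_outofline(), cmp_fn, med3_direct() and med3_indirect() are hypothetical names, and the function pointer only stands in for the indirection the function manager adds. In med3_direct() the compiler sees the if-cascade and can, in principle, reduce each "DO_COMPARE(x, y) < 0" to a plain "x < y". In med3_indirect() the comparator is generally opaque at the call site, so its internal form cannot be simplified there.

```c
#include <stdint.h>

/* If-cascade form, visible to the compiler at the call site. */
static inline int
cmp_inline(int32_t a, int32_t b)
{
	if (a < b)
		return -1;
	if (a > b)
		return 1;
	return 0;
}

/* Out-of-line comparator, reached only through a pointer below. */
int
cmp_outofline(int32_t a, int32_t b)
{
	return (a > b) - (a < b);
}

/* Hypothetical stand-in for a call that goes through the function manager. */
typedef int (*cmp_fn) (int32_t, int32_t);

/* Median of three with an inlinable comparator (same shape as the example above). */
int32_t
med3_direct(int32_t a, int32_t b, int32_t c)
{
#define DO_COMPARE(x, y) cmp_inline((x), (y))
	return DO_COMPARE(a, b) < 0 ?
		(DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a))
		: (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a : c));
#undef DO_COMPARE
}

/* Same logic, but every comparison is an indirect call. */
int32_t
med3_indirect(cmp_fn cmp, int32_t a, int32_t b, int32_t c)
{
	return cmp(a, b) < 0 ?
		(cmp(b, c) < 0 ? b : (cmp(a, c) < 0 ? c : a))
		: (cmp(b, c) > 0 ? b : (cmp(a, c) < 0 ? a : c));
}
```

Both variants return the same median; the difference is only in what the optimizer can see. A compiler may still inline through a pointer if it can prove the target is constant, which is why this is a sketch of the typical case rather than a guarantee.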

> In my original benchmarks it was faster. I'll rebase the remaining
> commits and do some more analysis.

Here are the disassembly and the perf top output for master vs. patched,
both compiled with GCC 15.2.0.

The unpatched version of btint4cmp() contains a conditional jump, which
is mispredicted frequently in the sort. The patched version is
completely branchless.
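
For reference, the two source-level shapes being compared look roughly like the sketch below. The function names cmp_branchy() and cmp_branchless() are made up for illustration. The second form is the "(a > b) - (a < b)" idiom that, as far as I know, the pg_cmp helpers use, and it matches the setl/setg/sub sequence in the patched disassembly. Whether a given compiler emits a branch for the first form depends on the compiler version and flags, so treat this as an illustration rather than a prediction.

```c
#include <stdint.h>

/*
 * If-cascade form. A compiler may turn one of the tests into a
 * conditional jump, as in the master disassembly.
 */
int
cmp_branchy(int32_t a, int32_t b)
{
	if (a > b)
		return 1;
	if (a == b)
		return 0;
	return -1;
}

/*
 * Arithmetic form. Each comparison yields 0 or 1, so the difference
 * is -1, 0 or 1 and can be computed without a jump.
 */
int
cmp_branchless(int32_t a, int32_t b)
{
	return (a > b) - (a < b);
}
```

Both functions return the same result for every pair of inputs, including the extremes of the int32 range, because neither of them subtracts a from b directly.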

master
======

Dump of assembler code for function btint4cmp:
   0x00005aa9e33ccdb0 <+0>:	endbr64
   0x00005aa9e33ccdb4 <+4>:	mov    0x20(%rdi),%edx
   0x00005aa9e33ccdb7 <+7>:	mov    $0x1,%eax
   0x00005aa9e33ccdbc <+12>:	cmp    %edx,0x30(%rdi)
   0x00005aa9e33ccdbf <+15>:	jl     0x5aa9e33ccdca <btint4cmp+26>
   0x00005aa9e33ccdc1 <+17>:	setne  %al
   0x00005aa9e33ccdc4 <+20>:	movzbl %al,%eax
   0x00005aa9e33ccdc7 <+23>:	neg    %rax
   0x00005aa9e33ccdca <+26>:	ret

  37.22%  pg_trgm.so  [.] trigram_qsort_signed.constprop.0
   7.99%  postgres    [.] cmpEntryAccumulator
   6.60%  postgres    [.] ginCombineData
   6.03%  postgres    [.] FunctionCall2Coll
   3.19%  postgres    [.] btint4cmp
   2.30%  postgres    [.] rbt_insert
   2.29%  pg_trgm.so  [.] generate_trgm
   2.24%  postgres    [.] pg_mblen_range
   1.77%  libc.so.6   [.] __towlower_l
   1.73%  pg_trgm.so  [.] trigram_qsort_signed_med3
   1.56%  postgres    [.] pg_utf2wchar_with_len

Patched
=======

Dump of assembler code for function btint4cmp:
   0x000055a69e87bdb0 <+0>:	endbr64
   0x000055a69e87bdb4 <+4>:	mov    0x20(%rdi),%eax
   0x000055a69e87bdb7 <+7>:	cmp    %eax,0x30(%rdi)
   0x000055a69e87bdba <+10>:	setl   %al
   0x000055a69e87bdbd <+13>:	setg   %dl
   0x000055a69e87bdc0 <+16>:	movzbl %dl,%edx
   0x000055a69e87bdc3 <+19>:	movzbl %al,%eax
   0x000055a69e87bdc6 <+22>:	sub    %edx,%eax
   0x000055a69e87bdc8 <+24>:	cltq
   0x000055a69e87bdca <+26>:	ret

  38.07%  pg_trgm.so        [.] trigram_qsort_signed.constprop.0
   7.69%  postgres          [.] cmpEntryAccumulator
   6.96%  postgres          [.] ginCombineData
   3.90%  postgres          [.] FunctionCall2Coll
   2.54%  postgres          [.] pg_mblen_range
   2.40%  postgres          [.] btint4cmp
   2.38%  pg_trgm.so        [.] generate_trgm
   1.86%  postgres          [.] rbt_insert
   1.80%  libc.so.6         [.] __towlower_l
   1.73%  pg_trgm.so        [.] trigram_qsort_signed_med3
   1.66%  postgres          [.] pg_utf2wchar_with_len

--
David Geier




