public inbox for [email protected]  
help / color / mirror / Atom feed
From: Chao Li <[email protected]>
To: John Naylor <[email protected]>
Cc: Chengpeng Yan <[email protected]>
Cc: [email protected] <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Subject: Re: tuple radix sort
Date: Thu, 4 Dec 2025 16:40:18 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <CANWCAZaBeDPfF0sCdSssRXv6nBZeRP8Z=OO1731zsoUC58U5_Q@mail.gmail.com>
References: <CANWCAZYzx7a7E9AY16Jt_U3+GVKDADfgApZ-42SYNiig8dTnFA@mail.gmail.com>
	<CANWCAZbsx5VY-F5RVvZ5H1abBWcinMY90gNra6EbdGS1CTx=7g@mail.gmail.com>
	<CANWCAZaN8kW6o7Ymc_jzPO_Z5SqekQTTNra3xTGYTH+cjrVp8g@mail.gmail.com>
	<[email protected]>
	<CANWCAZZiMGj6QuHfyBPv9at7xn23NPAz4nit=G6fr26V+h8MKg@mail.gmail.com>
	<[email protected]>
	<CANWCAZaKRzsYPn9YWRR8g5dkX-ep_jgyUb4r3KFekK5ZTNo7ew@mail.gmail.com>
	<[email protected]>
	<CANWCAZaBeDPfF0sCdSssRXv6nBZeRP8Z=OO1731zsoUC58U5_Q@mail.gmail.com>



> On Dec 4, 2025, at 13:30, John Naylor <[email protected]> wrote:
> 
> On Wed, Dec 3, 2025 at 3:22 PM Chao Li <[email protected]> wrote:
>> I played with this again today and found an optimization that seems to dramatically improve the performance:
>> 
>> ```
>> +static void
>> +radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *state)
>> +{
>> +       RadixPartitionInfo partitions[256] = {0};
>> +       uint8_t         remaining_partitions[256] = {0};
>> ```
>> 
>> Here partitions and remaining_partitions are just temporary buffers, allocating memory from stack and initialize them seems slow. By passing them as function parameters are much faster. See attached diff for my change.
> 
> The lesson here is: you can make it as fast as you like if you
> accidentally blow away the state that we needed for this to work
> correctly.
> 

Yeah, I quickly realized I was wrong after I clicked “send". I was trying the firs two optimizations as I suggested in my previous email, but the first didn’t help much, and the second just didn’t work. After several hours debugging, I guess my brain got stuck and came out the weird idea.

I continued playing with this again today. I think the execution time is mainly spent on the in-place element switching, which uses 3 levels of loops (while->for->for). If we can use an extra temp array to hold the sorted result, then the 3-level loop can be optimized to 1-level, but that will cost a lot of extra memory which I am afraid not affordable.

Anyway, it’s a fun of playing with this optimization thing. I may play with it again once I get some time.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









view thread (39+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: tuple radix sort
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox