public inbox for [email protected]
help / color / mirror / Atom feedFrom: Chao Li <[email protected]>
To: John Naylor <[email protected]>
Cc: Chengpeng Yan <[email protected]>
Cc: [email protected] <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Subject: Re: tuple radix sort
Date: Thu, 27 Nov 2025 15:48:58 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <CANWCAZZiMGj6QuHfyBPv9at7xn23NPAz4nit=G6fr26V+h8MKg@mail.gmail.com>
References: <CANWCAZYzx7a7E9AY16Jt_U3+GVKDADfgApZ-42SYNiig8dTnFA@mail.gmail.com>
<CANWCAZbsx5VY-F5RVvZ5H1abBWcinMY90gNra6EbdGS1CTx=7g@mail.gmail.com>
<CANWCAZaN8kW6o7Ymc_jzPO_Z5SqekQTTNra3xTGYTH+cjrVp8g@mail.gmail.com>
<[email protected]>
<CANWCAZZiMGj6QuHfyBPv9at7xn23NPAz4nit=G6fr26V+h8MKg@mail.gmail.com>
Hi John,
I did an initial test before, but I didn’t read the code at the time. Today, I spent time reviewing 0001. Overall, I believe the implementation is solid. I just got a few comments/suggestions.
> On Nov 26, 2025, at 21:11, John Naylor <[email protected]> wrote:
>
>
> For v5 I've also added CHECK_FOR_INTERRUPTS and rewrote some comments.
>
>
> --
> John Naylor
> Amazon Web Services
> <v5-0001-Use-radix-sort-when-SortTuple-contains-a-pass-by-.patch><v5-0004-Detect-common-prefix-to-avoid-wasted-work-during-.patch><v5-0003-WIP-make-some-regression-tests-sort-order-more-de.patch><v5-0002-WIP-Adjust-regression-tests.patch>
1 - 0001
```
+ /* extract the byte for this level from the normalized datum */
+ current_byte = extract_byte(normalize_datum(tup->datum1, ssup),
+ level);
+
+ /* save it for the permutation step */
+ tup->current_byte = current_byte;
```
We recompute normalize_datum(tup->datum1, ssup) for every tuple in every level, why don’t cache the result in SortTuple. As we have cached current_byte in SortTuple, it shouldn’t be a big deal to add one more field to it.
2 - 0001
```
+ while (num_remaining > 1)
+ {
+ /* start the count over */
+ num_remaining = num_partitions;
```
The swap loop always start the count over, so that sorted partitions are re-scanned as well. I think we can do an optimization like:
```
num_active = num_partitions;
while (num_active > 1)
{
for (int i = 0; i < num_active; )
{
uint8 idx = remaining_partitions[i];
// do the swaps for the partition …
if (partitions[idx].offset == partitions[idx].next_offset)
{
remaining_partitions[i] = remaining_partitions[num_active - 1];
num_active--;
}
else
i++;
}
}
```
This way we move out sorted partitions, so they will not be re-scanned.
3 - 0001
In sort_byvalue_datum, we can add a fast-path for all NULL and all NOT NULL cases, so that they won’t need to run the branchless cyclic permutation and the two “for” loops of assertions. Something like:
```
while (d1 < state->memtupcount && data[d1].isnull1 == nulls_first)
d1++;
null_count = d1;
not_null_count = state->memtupcount - d1;
/* fast paths: all on one side */
if (null_count == 0 || not_null_count == 0)
{
if (nulls_first)
{
null_start = data;
not_null_start = data + null_count;
}
else
{
not_null_start = data;
null_start = data + not_null_count;
}
/* only one partition is non-empty; sort it and return */
if (not_null_count > 1)
{
if (not_null_count < QSORT_THRESHOLD)
qsort_tuple(not_null_start, not_null_count, state->base.comparetup, state);
else
radix_sort_tuple(not_null_start, not_null_count, 0, state);
}
else if (null_count > 1 && state->base.onlyKey == NULL)
{
qsort_tuple(null_start, null_count, state->base.comparetup_tiebreak, state);
}
return;
}
/* ... existing branchless cyclic permutation ... */
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
view thread (39+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: tuple radix sort
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox