public inbox for [email protected]
help / color / mirror / Atom feedFrom: Chao Li <[email protected]>
To: John Naylor <[email protected]>
Cc: Chengpeng Yan <[email protected]>
Cc: [email protected] <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Subject: Re: tuple radix sort
Date: Mon, 8 Dec 2025 10:52:02 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <CANWCAZYzx7a7E9AY16Jt_U3+GVKDADfgApZ-42SYNiig8dTnFA@mail.gmail.com>
<CANWCAZbsx5VY-F5RVvZ5H1abBWcinMY90gNra6EbdGS1CTx=7g@mail.gmail.com>
<CANWCAZaN8kW6o7Ymc_jzPO_Z5SqekQTTNra3xTGYTH+cjrVp8g@mail.gmail.com>
<[email protected]>
<CANWCAZZiMGj6QuHfyBPv9at7xn23NPAz4nit=G6fr26V+h8MKg@mail.gmail.com>
<[email protected]>
<CANWCAZaKRzsYPn9YWRR8g5dkX-ep_jgyUb4r3KFekK5ZTNo7ew@mail.gmail.com>
<[email protected]>
<CANWCAZaBeDPfF0sCdSssRXv6nBZeRP8Z=OO1731zsoUC58U5_Q@mail.gmail.com>
<[email protected]>
Hi John,
I played the radix sort again during the weekend.
First, I changed my direction and implemented the in-place switching in the other way, where I did a way like chained-switching. Say starting from item0, for example, switching item0 to item5, then check where item5 should be switched to, and makes the switch, till an item is switch to position 0. See my implementation in other-implemation.diff if you are interested in it. This time, I eyeball checked the sort result and confirmed the correction. But my implementation is slightly slower than your implementation, based on the same test procedure I described in my previous email, my implementation is roughly ~3% slower your implementation. So I think that at least proves your current implementation in v5 has been perfectly fine tuned.
Then I went back to read your implementation again, I found a tiny optimization, where we can move “count sorted partitions” to before the “for” loop, which avoid sorted partition to go through the “for” loop, and my test shows that the movement may lead to ~1% improvement. See the change in radixsort_tiny_optimizeation.diff.
I also noticed that, there could be cases where target element is already in the right partition, so that switching could be unnecessary. However if we want to avoid such unnecessary switches, then we will need to add a “if” check. Given the total number of such cases is tiny, the “if” check would be more expensive than performing blindly switching. I tried to add such a check like:
```
if (offset == (size_t) (st - begin))
continue; /* already in correct position */
```
With my test, it just makes the query ~3% slower.
So, now I think I can wrap up this round of playing. My only suggestion is radixsort_tiny_optimizeation.diff. I may revisit this patch again once you make the entire patch set ready for review.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
[application/octet-stream] other-implemation.diff (4.7K, 2-other-implemation.diff)
download | inline diff:
commit ee29bccc4540f61dca283e4404ff140705628552
Author: Chao Li (Evan) <[email protected]>
Date: Sat Dec 6 07:03:14 2025 +0800
the other implmentation
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 028c5b71c27..5e3e49ba22e 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -632,6 +632,7 @@ typedef struct RadixPartitionInfo
size_t count;
size_t offset;
};
+ size_t begin_offset;
size_t next_offset;
} RadixPartitionInfo;
@@ -2763,13 +2764,14 @@ static void
radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *state)
{
RadixPartitionInfo partitions[256] = {0};
- uint8_t remaining_partitions[256] = {0};
+ uint8_t remaining_partitions[256];
size_t total = 0;
int num_partitions = 0;
- int num_remaining;
+ //int num_remaining;
SortSupport ssup = &state->base.sortKeys[0];
- size_t start_offset = 0;
- SortTuple *partition_begin = begin;
+ SortTuple pending, tmp;
+ //size_t start_offset = 0;
+ //SortTuple *partition_begin = begin;
/* count number of occurrences of each byte */
for (SortTuple *tup = begin; tup < begin + n_elems; tup++)
@@ -2796,6 +2798,7 @@ radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *st
if (count != 0)
{
partitions[i].offset = total;
+ partitions[i].begin_offset = total;
total += count;
remaining_partitions[num_partitions] = i;
num_partitions++;
@@ -2803,7 +2806,7 @@ radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *st
partitions[i].next_offset = total;
}
- num_remaining = num_partitions;
+ //num_remaining = num_partitions;
/*
* Swap tuples to correct partition.
@@ -2825,45 +2828,75 @@ radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *st
* previous loop iteration results in only one partition that hasn't been
* counted as sorted, we know it's actually sorted and can exit the loop.
*/
- while (num_remaining > 1)
+ // while (num_remaining > 1)
+ // {
+ // /* start the count over */
+ // num_remaining = num_partitions;
+
+ // for (int i = 0; i < num_partitions; i++)
+ // {
+ // uint8 idx = remaining_partitions[i];
+
+ // for (SortTuple *st = begin + partitions[idx].offset;
+ // st < begin + partitions[idx].next_offset;
+ // st++)
+ // {
+ // size_t offset = partitions[st->current_byte].offset++;
+ // SortTuple tmp;
+
+ // /* swap current tuple with destination position */
+ // Assert(offset < n_elems);
+ // tmp = *st;
+ // *st = begin[offset];
+ // begin[offset] = tmp;
+
+ // CHECK_FOR_INTERRUPTS();
+ // };
+
+ // /* count sorted partitions */
+ // if (partitions[idx].offset == partitions[idx].next_offset)
+ // num_remaining--;
+ // }
+ // }
+ if (num_partitions > 1)
{
- /* start the count over */
- num_remaining = num_partitions;
-
- for (int i = 0; i < num_partitions; i++)
+ for (int i = 0; i < n_elems; i++)
{
- uint8 idx = remaining_partitions[i];
+ SortTuple *st = begin + i;
+
+ if (i >= partitions[st->current_byte].begin_offset &&
+ i < partitions[st->current_byte].next_offset)
+ continue;
- for (SortTuple *st = begin + partitions[idx].offset;
- st < begin + partitions[idx].next_offset;
- st++)
+ while (true)
{
- size_t offset = partitions[st->current_byte].offset++;
- SortTuple tmp;
+ size_t offset = partitions[st->current_byte].offset;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* target is the empty position, we are done */
+ if (offset == i)
+ {
+ begin[i] = pending;
+ break;
+ }
- /* swap current tuple with destination position */
Assert(offset < n_elems);
tmp = *st;
- *st = begin[offset];
+ pending = begin[offset];
begin[offset] = tmp;
-
- CHECK_FOR_INTERRUPTS();
+ st = &pending;
+ partitions[begin[offset].current_byte].offset ++;
};
-
- /* count sorted partitions */
- if (partitions[idx].offset == partitions[idx].next_offset)
- num_remaining--;
}
}
/* recurse */
- for (uint8_t *rp = remaining_partitions;
- rp < remaining_partitions + num_partitions;
- rp++)
+ for (int i = 0; i < num_partitions; i++)
{
- size_t end_offset = partitions[*rp].next_offset;
- SortTuple *partition_end = begin + end_offset;
- ptrdiff_t num_elements = end_offset - start_offset;
+ uint8_t rp = remaining_partitions[i];
+ SortTuple *partition_begin = begin + partitions[rp].begin_offset;
+ size_t num_elements = partitions[rp].next_offset - partitions[rp].begin_offset;
if (num_elements > 1)
{
@@ -2897,9 +2930,6 @@ radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *st
state);
}
}
-
- start_offset = end_offset;
- partition_begin = partition_end;
}
}
[application/octet-stream] radixsort_tiny_optimizeation.diff (1.3K, 3-radixsort_tiny_optimizeation.diff)
download | inline diff:
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 028c5b71c27..5185d8403d5 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2834,6 +2834,13 @@ radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *st
{
uint8 idx = remaining_partitions[i];
+ /* count sorted partitions */
+ if (partitions[idx].offset == partitions[idx].next_offset)
+ {
+ num_remaining--;
+ continue;
+ }
+
for (SortTuple *st = begin + partitions[idx].offset;
st < begin + partitions[idx].next_offset;
st++)
@@ -2841,6 +2848,9 @@ radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *st
size_t offset = partitions[st->current_byte].offset++;
SortTuple tmp;
+ //if (offset == (size_t) (st - begin))
+ // continue; /* already in correct position */
+
/* swap current tuple with destination position */
Assert(offset < n_elems);
tmp = *st;
@@ -2849,10 +2859,6 @@ radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *st
CHECK_FOR_INTERRUPTS();
};
-
- /* count sorted partitions */
- if (partitions[idx].offset == partitions[idx].next_offset)
- num_remaining--;
}
}
view thread (39+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: tuple radix sort
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox