public inbox for [email protected]  
help / color / mirror / Atom feed
Re: tuple radix sort
15+ messages / 4 participants
[nested] [flat]

* Re: tuple radix sort
@ 2026-01-21 06:40  John Naylor <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: John Naylor @ 2026-01-21 06:40 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Chengpeng Yan <[email protected]>; [email protected] <[email protected]>; Peter Geoghegan <[email protected]>

Attached is v6, which seems pretty close to committable. The temporary
GUC is gone, as are all the integer-like qsort specializations. There
are a few cosmetic rearrangements, but the algorithm is pretty much
the same as v5. The only visible behavior difference should be the
addition of a presorted check just like qsort has.

* The only things I have doubts about are the user-visible messages
mentioning "qsort". For trace_sort, it seemed logical to re-use the
term "internal sort" from other such messages, so I've done that for
v6. EXPLAIN ANALYZE is a bit harder: Do we want to even change
anything here? When things like this come up, there is the question of
tools that parse the output. It doesn't seem particularly important to
try to display exact info here, especially since radix sort must
divert to qsort for multiple keys etc.

Other updates:

1) I wanted a better test for small input, in case the threshold
needed adjusting. I lost the test during a fumbled rebase, but it's
easy to describe: instead of calling tuplesort_sort_memtuples(), call
both radix sort and qsort on copies of the input, with RDTSC calls
around each. I've attached the results, which are average ticks per
element sorted, averaging a million runs. The threshold seems to be
fine at 40, even down to a cardinality of 6. It's important to be as
low as we can get away with, since "qsort" here is qsort_tuple, not
the former specialized variants for integer comparators.

2) We don't need to try harder in the NULL partitioning step

For example, the case below with all NULL in the first key. The
underlying behavioral difference is, with qsort the comparator tries
the NULL first key (which ties every time), and then tries the second
key. With radix sort, the first key NULLs are partitioned first, doing
no useful work since they're all NULL, then qsort is called with the
tiebreak comparator on the whole input.

By this test, there is only noise difference with low cardinality
(before I removed the GUC):

drop table if exists test;
create unlogged table test (a bigint, b bigint not null);
insert into test select
  NULL,
  (1_000_000_000 * random())::bigint % 8
from generate_series(1,1_000_000,1) i;
vacuum freeze test;
select pg_prewarm('test');

set work_mem = '500MB';
\timing

set wip_radix_sort = 'off'; select * from test order by a, b offset
1_000_000_000;
280ms

set wip_radix_sort = 'on'; select * from test order by a, b offset
1_000_000_000;
277ms

Speaking of the NULL partitioning step, I've tried to add some
comments that make sense, but the pictures in the linked blog are more
helpful than any words I can come up with. It's easier to understand
than to describe.


--
John Naylor
Amazon Web Services

- num elem is actual size of input array sorted

- measurements are in RDTSC ticks per element sorted

*** means smallest input that radix sort measured equal or faster

select i % 2 as res from generate_series(1,100,1) i order by res offset 1000;
SET
NOTICE:  num elem:  30   qsort: 26.6 radix: 49.3
NOTICE:  num elem:  40   qsort: 29.4 radix: 43.2
NOTICE:  num elem:  50   qsort: 28.3 radix: 37.7
NOTICE:  num elem:  60   qsort: 27.6 radix: 35.1
NOTICE:  num elem:  70   qsort: 27.0 radix: 32.9
NOTICE:  num elem:  80   qsort: 30.0 radix: 30.5 ***
NOTICE:  num elem:  90   qsort: 29.7 radix: 29.9

select i % 4 as res from generate_series(1,100,1) i order by res offset 1000;
NOTICE:  num elem:  30   qsort: 38.1 radix: 54.4
NOTICE:  num elem:  40   qsort: 36.5 radix: 46.3
NOTICE:  num elem:  50   qsort: 37.0 radix: 40.4
NOTICE:  num elem:  60   qsort: 35.6 radix: 37.9
NOTICE:  num elem:  70   qsort: 35.4 radix: 34.8 ***
NOTICE:  num elem:  80   qsort: 36.8 radix: 32.4
NOTICE:  num elem:  90   qsort: 38.3 radix: 30.4

select i % 6 as res from generate_series(1,100,1) i order by res offset 1000;
SET
NOTICE:  num elem:  30   qsort: 48.6 radix: 54.6
NOTICE:  num elem:  40   qsort: 50.6 radix: 46.8 ***
NOTICE:  num elem:  50   qsort: 50.2 radix: 41.9
NOTICE:  num elem:  60   qsort: 43.6 radix: 37.2
NOTICE:  num elem:  70   qsort: 48.6 radix: 36.4
NOTICE:  num elem:  80   qsort: 43.2 radix: 34.3
NOTICE:  num elem:  90   qsort: 45.4 radix: 32.2

select i % 10 as res from generate_series(1,100,1) i order by res offset 1000;
NOTICE:  num elem:  30   qsort: 66.6 radix: 60.6 ***
NOTICE:  num elem:  40   qsort: 66.9 radix: 52.7
NOTICE:  num elem:  50   qsort: 60.4 radix: 45.9
NOTICE:  num elem:  60   qsort: 58.4 radix: 41.8
NOTICE:  num elem:  70   qsort: 63.9 radix: 37.8
NOTICE:  num elem:  80   qsort: 56.8 radix: 37.1
NOTICE:  num elem:  90   qsort: 58.7 radix: 33.5

select (250*random())::int  as res from generate_series(1,100,1) i order by res offset 1000;
SET
NOTICE:  num elem:  30   qsort: 77.9 radix: 79.7
NOTICE:  num elem:  40   qsort: 74.2 radix: 72.2 ***
NOTICE:  num elem:  50   qsort: 88.0 radix: 64.6
NOTICE:  num elem:  60   qsort: 90.7 radix: 64.6
NOTICE:  num elem:  70   qsort: 97.2 radix: 61.2
NOTICE:  num elem:  80   qsort: 96.8 radix: 60.3
NOTICE:  num elem:  90   qsort: 94.6 radix: 63.6

select (64000*random())::int  as res from generate_series(1,100,1) i order by res offset 1000;
NOTICE:  num elem:  30   qsort: 82.6 radix: 83.9
NOTICE:  num elem:  40   qsort: 91.2 radix: 76.9 ***
NOTICE:  num elem:  50   qsort: 86.7 radix: 70.9
NOTICE:  num elem:  60   qsort: 91.4 radix: 75.0
NOTICE:  num elem:  70   qsort: 91.1 radix: 74.5
NOTICE:  num elem:  80   qsort: 96.8 radix: 63.6
NOTICE:  num elem:  90   qsort: 107.9 radix: 61.3

select (1_000_000_000*random())::int  as res from generate_series(1,100,1) i order by res offset 1000;
SET
NOTICE:  num elem:  30   qsort: 76.0 radix: 84.3
NOTICE:  num elem:  40   qsort: 78.7 radix: 74.8 ***
NOTICE:  num elem:  50   qsort: 91.7 radix: 74.6
NOTICE:  num elem:  60   qsort: 92.7 radix: 70.3
NOTICE:  num elem:  70   qsort: 98.2 radix: 68.1
NOTICE:  num elem:  80   qsort: 104.8 radix: 70.5
NOTICE:  num elem:  90   qsort: 111.0 radix: 70.6



Attachments:

  [text/plain] 20260120-radix-sort-threshold-test.txt (3.2K, 2-20260120-radix-sort-threshold-test.txt)
  download | inline:
- num elem is actual size of input array sorted

- measurements are in RDTSC ticks per element sorted

*** means smallest input that radix sort measured equal or faster

select i % 2 as res from generate_series(1,100,1) i order by res offset 1000;
SET
NOTICE:  num elem:  30   qsort: 26.6 radix: 49.3
NOTICE:  num elem:  40   qsort: 29.4 radix: 43.2
NOTICE:  num elem:  50   qsort: 28.3 radix: 37.7
NOTICE:  num elem:  60   qsort: 27.6 radix: 35.1
NOTICE:  num elem:  70   qsort: 27.0 radix: 32.9
NOTICE:  num elem:  80   qsort: 30.0 radix: 30.5 ***
NOTICE:  num elem:  90   qsort: 29.7 radix: 29.9

select i % 4 as res from generate_series(1,100,1) i order by res offset 1000;
NOTICE:  num elem:  30   qsort: 38.1 radix: 54.4
NOTICE:  num elem:  40   qsort: 36.5 radix: 46.3
NOTICE:  num elem:  50   qsort: 37.0 radix: 40.4
NOTICE:  num elem:  60   qsort: 35.6 radix: 37.9
NOTICE:  num elem:  70   qsort: 35.4 radix: 34.8 ***
NOTICE:  num elem:  80   qsort: 36.8 radix: 32.4
NOTICE:  num elem:  90   qsort: 38.3 radix: 30.4

select i % 6 as res from generate_series(1,100,1) i order by res offset 1000;
SET
NOTICE:  num elem:  30   qsort: 48.6 radix: 54.6
NOTICE:  num elem:  40   qsort: 50.6 radix: 46.8 ***
NOTICE:  num elem:  50   qsort: 50.2 radix: 41.9
NOTICE:  num elem:  60   qsort: 43.6 radix: 37.2
NOTICE:  num elem:  70   qsort: 48.6 radix: 36.4
NOTICE:  num elem:  80   qsort: 43.2 radix: 34.3
NOTICE:  num elem:  90   qsort: 45.4 radix: 32.2

select i % 10 as res from generate_series(1,100,1) i order by res offset 1000;
NOTICE:  num elem:  30   qsort: 66.6 radix: 60.6 ***
NOTICE:  num elem:  40   qsort: 66.9 radix: 52.7
NOTICE:  num elem:  50   qsort: 60.4 radix: 45.9
NOTICE:  num elem:  60   qsort: 58.4 radix: 41.8
NOTICE:  num elem:  70   qsort: 63.9 radix: 37.8
NOTICE:  num elem:  80   qsort: 56.8 radix: 37.1
NOTICE:  num elem:  90   qsort: 58.7 radix: 33.5

select (250*random())::int  as res from generate_series(1,100,1) i order by res offset 1000;
SET
NOTICE:  num elem:  30   qsort: 77.9 radix: 79.7
NOTICE:  num elem:  40   qsort: 74.2 radix: 72.2 ***
NOTICE:  num elem:  50   qsort: 88.0 radix: 64.6
NOTICE:  num elem:  60   qsort: 90.7 radix: 64.6
NOTICE:  num elem:  70   qsort: 97.2 radix: 61.2
NOTICE:  num elem:  80   qsort: 96.8 radix: 60.3
NOTICE:  num elem:  90   qsort: 94.6 radix: 63.6

select (64000*random())::int  as res from generate_series(1,100,1) i order by res offset 1000;
NOTICE:  num elem:  30   qsort: 82.6 radix: 83.9
NOTICE:  num elem:  40   qsort: 91.2 radix: 76.9 ***
NOTICE:  num elem:  50   qsort: 86.7 radix: 70.9
NOTICE:  num elem:  60   qsort: 91.4 radix: 75.0
NOTICE:  num elem:  70   qsort: 91.1 radix: 74.5
NOTICE:  num elem:  80   qsort: 96.8 radix: 63.6
NOTICE:  num elem:  90   qsort: 107.9 radix: 61.3

select (1_000_000_000*random())::int  as res from generate_series(1,100,1) i order by res offset 1000;
SET
NOTICE:  num elem:  30   qsort: 76.0 radix: 84.3
NOTICE:  num elem:  40   qsort: 78.7 radix: 74.8 ***
NOTICE:  num elem:  50   qsort: 91.7 radix: 74.6
NOTICE:  num elem:  60   qsort: 92.7 radix: 70.3
NOTICE:  num elem:  70   qsort: 98.2 radix: 68.1
NOTICE:  num elem:  80   qsort: 104.8 radix: 70.5
NOTICE:  num elem:  90   qsort: 111.0 radix: 70.6


  [application/x-patch] v6-0002-Detect-common-prefix-to-avoid-wasted-work-during-.patch (2.7K, 3-v6-0002-Detect-common-prefix-to-avoid-wasted-work-during-.patch)
  download | inline diff:
From 414c3323b8f0fbe845a5f0a43d761bc272eb71a6 Mon Sep 17 00:00:00 2001
From: John Naylor <[email protected]>
Date: Wed, 12 Nov 2025 14:31:24 +0700
Subject: [PATCH v6 2/2] Detect common prefix to avoid wasted work during radix
 sort

This is particularly useful for integers, since they commonly
have some zero upper bytes.

Reviewed-by: Chengpeng Yan <[email protected]>
---
 src/backend/utils/sort/tuplesort.c | 66 ++++++++++++++++++++++++++++--
 1 file changed, 62 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index f87235b90d7..8b573d48d1d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -104,6 +104,7 @@
 #include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "port/pg_bitutils.h"
 #include "storage/shmem.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2915,10 +2916,67 @@ sort_byvalue_datum(SortTuple *data, size_t n, Tuplesortstate *state)
 	{
 		if (not_null_count > RADIX_SORT_THRESHOLD)
 		{
-			radix_sort_tuple(not_null_start,
-							 not_null_count,
-							 0,
-							 state);
+			int			common_prefix;
+			Datum		first_datum = 0;
+			Datum		common_upper_bits = 0;
+
+			/*
+			 * Compute the common prefix to skip unproductive recursion steps
+			 * during radix sort.
+			 */
+			for (SortTuple *st = not_null_start;
+				 st < not_null_start + not_null_count;
+				 st++)
+			{
+				Datum		this_datum = st->datum1;
+
+				if (st == not_null_start)
+				{
+					/*
+					 * Need to start with some value, may as well be the first
+					 * one.
+					 */
+					first_datum = this_datum;
+				}
+				else
+				{
+					/*
+					 * Accumulate bits that represent a difference from the
+					 * reference datum.
+					 */
+					common_upper_bits |= first_datum ^ this_datum;
+				}
+			}
+
+			if (common_upper_bits == 0)
+			{
+				/*
+				 * All NOT NULL tuples have the same datum, so we can skip
+				 * radix sort. Sort using the tiebreak comparator if necessary.
+				 */
+				if (state->base.onlyKey == NULL)
+				{
+					qsort_tuple(not_null_start,
+								not_null_count,
+								state->base.comparetup_tiebreak,
+								state);
+				}
+			}
+			else
+			{
+				/*
+				 * The upper bits of common_upper_bits are zero where all
+				 * datums have the same bits. The byte position of the
+				 * leftmost one bit is the byte where radix sort should start.
+				 */
+				common_prefix = SIZEOF_DATUM - 1 -
+					(pg_leftmost_one_pos64(common_upper_bits) / BITS_PER_BYTE);
+
+				radix_sort_tuple(not_null_start,
+								 not_null_count,
+								 common_prefix,
+								 state);
+			}
 		}
 		else
 		{
-- 
2.52.0



  [application/x-patch] v6-0001-Use-radix-sort-when-SortTuple-contains-a-pass-by-.patch (26.4K, 4-v6-0001-Use-radix-sort-when-SortTuple-contains-a-pass-by-.patch)
  download | inline diff:
From ab05807d661fdf460b313cc74a8788551571aa0b Mon Sep 17 00:00:00 2001
From: John Naylor <[email protected]>
Date: Fri, 17 Oct 2025 09:57:43 +0700
Subject: [PATCH v6 1/2] Use radix sort when SortTuple contains a pass-by-value
 datum
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

For now this only works for signed and unsigned ints
with the usual comparison semantics, the same types
for which we previously had separate qsort
specializations.

[TODO...]

Reviewed-by: Chengpeng Yan <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Tested-by: Chao Li <[email protected]> (earlier version)
---
 src/backend/utils/sort/tuplesort.c      | 547 ++++++++++++++++++------
 src/include/utils/guc.h                 |   1 +
 src/include/utils/sortsupport.h         | 101 -----
 src/include/utils/tuplesort.h           |   1 +
 src/test/regress/expected/tuplesort.out |   6 +-
 src/tools/pgindent/typedefs.list        |   1 +
 6 files changed, 420 insertions(+), 237 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 9e4cd816137..f87235b90d7 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -7,8 +7,8 @@
  * applied to different kinds of sortable objects.  Implementation of
  * the particular sorting variants is given in tuplesortvariants.c.
  * This module works efficiently for both small and large amounts
- * of data.  Small amounts are sorted in-memory using qsort().  Large
- * amounts are sorted using temporary files and a standard external sort
+ * of data.  Small amounts are sorted in-memory.  Large amounts are
+ * sorted using temporary files and a standard external sort
  * algorithm.
  *
  * See Knuth, volume 3, for more than you want to know about external
@@ -32,7 +32,7 @@
  * is specified in kilobytes by the caller (most pass work_mem).  Initially,
  * we absorb tuples and simply store them in an unsorted array as long as
  * we haven't exceeded workMem.  If we reach the end of the input without
- * exceeding workMem, we sort the array using qsort() and subsequently return
+ * exceeding workMem, we sort the array in memory and subsequently return
  * tuples just by scanning the tuple array sequentially.  If we do exceed
  * workMem, we begin to emit tuples into sorted runs in temporary tapes.
  * When tuples are dumped in batch after quicksorting, we begin a new run
@@ -477,121 +477,15 @@ static void free_sort_tuple(Tuplesortstate *state, SortTuple *stup);
 static void tuplesort_free(Tuplesortstate *state);
 static void tuplesort_updatemax(Tuplesortstate *state);
 
-/*
- * Specialized comparators that we can inline into specialized sorts.  The goal
- * is to try to sort two tuples without having to follow the pointers to the
- * comparator or the tuple.
- *
- * XXX: For now, there is no specialization for cases where datum1 is
- * authoritative and we don't even need to fall back to a callback at all (that
- * would be true for types like int4/int8/timestamp/date, but not true for
- * abbreviations of text or multi-key sorts.  There could be!  Is it worth it?
- */
-
-/* Used if first key's comparator is ssup_datum_unsigned_cmp */
-static pg_attribute_always_inline int
-qsort_tuple_unsigned_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
-{
-	int			compare;
-
-	compare = ApplyUnsignedSortComparator(a->datum1, a->isnull1,
-										  b->datum1, b->isnull1,
-										  &state->base.sortKeys[0]);
-	if (compare != 0)
-		return compare;
-
-	/*
-	 * No need to waste effort calling the tiebreak function when there are no
-	 * other keys to sort on.
-	 */
-	if (state->base.onlyKey != NULL)
-		return 0;
-
-	return state->base.comparetup_tiebreak(a, b, state);
-}
-
-/* Used if first key's comparator is ssup_datum_signed_cmp */
-static pg_attribute_always_inline int
-qsort_tuple_signed_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
-{
-	int			compare;
-
-	compare = ApplySignedSortComparator(a->datum1, a->isnull1,
-										b->datum1, b->isnull1,
-										&state->base.sortKeys[0]);
-
-	if (compare != 0)
-		return compare;
-
-	/*
-	 * No need to waste effort calling the tiebreak function when there are no
-	 * other keys to sort on.
-	 */
-	if (state->base.onlyKey != NULL)
-		return 0;
-
-	return state->base.comparetup_tiebreak(a, b, state);
-}
-
-/* Used if first key's comparator is ssup_datum_int32_cmp */
-static pg_attribute_always_inline int
-qsort_tuple_int32_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
-{
-	int			compare;
-
-	compare = ApplyInt32SortComparator(a->datum1, a->isnull1,
-									   b->datum1, b->isnull1,
-									   &state->base.sortKeys[0]);
-
-	if (compare != 0)
-		return compare;
-
-	/*
-	 * No need to waste effort calling the tiebreak function when there are no
-	 * other keys to sort on.
-	 */
-	if (state->base.onlyKey != NULL)
-		return 0;
-
-	return state->base.comparetup_tiebreak(a, b, state);
-}
 
 /*
  * Special versions of qsort just for SortTuple objects.  qsort_tuple() sorts
  * any variant of SortTuples, using the appropriate comparetup function.
  * qsort_ssup() is specialized for the case where the comparetup function
  * reduces to ApplySortComparator(), that is single-key MinimalTuple sorts
- * and Datum sorts.  qsort_tuple_{unsigned,signed,int32} are specialized for
- * common comparison functions on pass-by-value leading datums.
+ * and Datum sorts.
  */
 
-#define ST_SORT qsort_tuple_unsigned
-#define ST_ELEMENT_TYPE SortTuple
-#define ST_COMPARE(a, b, state) qsort_tuple_unsigned_compare(a, b, state)
-#define ST_COMPARE_ARG_TYPE Tuplesortstate
-#define ST_CHECK_FOR_INTERRUPTS
-#define ST_SCOPE static
-#define ST_DEFINE
-#include "lib/sort_template.h"
-
-#define ST_SORT qsort_tuple_signed
-#define ST_ELEMENT_TYPE SortTuple
-#define ST_COMPARE(a, b, state) qsort_tuple_signed_compare(a, b, state)
-#define ST_COMPARE_ARG_TYPE Tuplesortstate
-#define ST_CHECK_FOR_INTERRUPTS
-#define ST_SCOPE static
-#define ST_DEFINE
-#include "lib/sort_template.h"
-
-#define ST_SORT qsort_tuple_int32
-#define ST_ELEMENT_TYPE SortTuple
-#define ST_COMPARE(a, b, state) qsort_tuple_int32_compare(a, b, state)
-#define ST_COMPARE_ARG_TYPE Tuplesortstate
-#define ST_CHECK_FOR_INTERRUPTS
-#define ST_SCOPE static
-#define ST_DEFINE
-#include "lib/sort_template.h"
-
 #define ST_SORT qsort_tuple
 #define ST_ELEMENT_TYPE SortTuple
 #define ST_COMPARE_RUNTIME_POINTER
@@ -613,6 +507,20 @@ qsort_tuple_int32_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 #define ST_DEFINE
 #include "lib/sort_template.h"
 
+/* state for radix sort */
+typedef struct RadixPartitionInfo
+{
+	union
+	{
+		size_t		count;
+		size_t		offset;
+	};
+	size_t		next_offset;
+} RadixPartitionInfo;
+
+#define RADIX_SORT_THRESHOLD 40
+
+
 /*
  *		tuplesort_begin_xxx
  *
@@ -1364,7 +1272,7 @@ tuplesort_performsort(Tuplesortstate *state)
 			 */
 			if (SERIAL(state))
 			{
-				/* Just qsort 'em and we're done */
+				/* Sort in memory and we're done */
 				tuplesort_sort_memtuples(state);
 				state->status = TSS_SORTEDINMEM;
 			}
@@ -2332,7 +2240,7 @@ dumptuples(Tuplesortstate *state, bool alltuples)
 	state->currentRun++;
 
 	if (trace_sort)
-		elog(LOG, "worker %d starting quicksort of run %d: %s",
+		elog(LOG, "worker %d starting internal sort of run %d: %s",
 			 state->worker, state->currentRun,
 			 pg_rusage_show(&state->ru_start));
 
@@ -2343,7 +2251,7 @@ dumptuples(Tuplesortstate *state, bool alltuples)
 	tuplesort_sort_memtuples(state);
 
 	if (trace_sort)
-		elog(LOG, "worker %d finished quicksort of run %d: %s",
+		elog(LOG, "worker %d finished internal sort of run %d: %s",
 			 state->worker, state->currentRun,
 			 pg_rusage_show(&state->ru_start));
 
@@ -2653,10 +2561,391 @@ sort_bounded_heap(Tuplesortstate *state)
 	state->boundUsed = true;
 }
 
+
+/* radix sort routines */
+
+/*
+ * Retrieve byte from datum, indexed by 'level': 0 for LSB, 7 for MSB
+ */
+static inline uint8
+current_byte(Datum key, int level)
+{
+	int			shift = (SIZEOF_DATUM - 1 - level) * BITS_PER_BYTE;
+
+	return (key >> shift) & 0xFF;
+}
+
+/*
+ * Normalize datum such that unsigned comparison is order-preserving,
+ * taking ASC/DESC into account as well.
+ */
+static inline Datum
+normalize_datum(Datum orig, SortSupport ssup)
+{
+	Datum		norm_datum1;
+
+	if (ssup->comparator == ssup_datum_signed_cmp)
+	{
+		norm_datum1 = orig + ((uint64) PG_INT64_MAX) + 1;
+	}
+	else if (ssup->comparator == ssup_datum_int32_cmp)
+	{
+		/*
+		 * First truncate to uint32. Technically, we don't need to do this,
+		 * but it forces the upper half of the datum to be zero regardless of
+		 * sign.
+		 */
+		uint32		u32 = DatumGetUInt32(orig) + ((uint32) PG_INT32_MAX) + 1;
+
+		norm_datum1 = UInt32GetDatum(u32);
+	}
+	else
+	{
+		Assert(ssup->comparator == ssup_datum_unsigned_cmp);
+		norm_datum1 = orig;
+	}
+
+	if (ssup->ssup_reverse)
+		norm_datum1 = ~norm_datum1;
+
+	return norm_datum1;
+}
+
+/*
+ * radix_sort_tuple
+ *
+ * Radix sort by the pass-by-value datum in datum1. This is a modification of
+ * ska_byte_sort() from https://github.com/skarupke/ska_sort
+ * The original copyright notice follows:
+ *
+ * Copyright Malte Skarupke 2016.
+ * Distributed under the Boost Software License, Version 1.0.
+ *
+ * Boost Software License - Version 1.0 - August 17th, 2003
+ *
+ * Permission is hereby granted, free of charge, to any person or organization
+ * obtaining a copy of the software and accompanying documentation covered by
+ * this license (the "Software") to use, reproduce, display, distribute,
+ * execute, and transmit the Software, and to prepare derivative works of the
+ * Software, and to permit third-parties to whom the Software is furnished to
+ * do so, all subject to the following:
+ *
+ * The copyright notices in the Software and this entire statement, including
+ * the above license grant, this restriction and the following disclaimer,
+ * must be included in all copies of the Software, in whole or in part, and
+ * all derivative works of the Software, unless such copies or derivative
+ * works are solely in the form of machine-executable object code generated by
+ * a source language processor.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
+ * SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
+ * FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+static void
+radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *state)
+{
+	RadixPartitionInfo partitions[256] = {0};
+	uint8		remaining_partitions[256];
+	size_t		total = 0;
+	int			num_partitions = 0;
+	int			num_remaining;
+	SortSupport ssup = &state->base.sortKeys[0];
+	size_t		start_offset = 0;
+	SortTuple  *partition_begin = begin;
+
+	/* count number of occurrences of each byte */
+	for (SortTuple *st = begin; st < begin + n_elems; st++)
+	{
+		uint8		this_partition;
+
+		/* extract the byte for this level from the normalized datum */
+		this_partition = current_byte(normalize_datum(st->datum1, ssup),
+									  level);
+
+		/* save it for the permutation step */
+		st->curbyte = this_partition;
+
+		partitions[this_partition].count++;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* compute partition offsets */
+	for (int i = 0; i < 256; i++)
+	{
+		size_t		count = partitions[i].count;
+
+		if (count != 0)
+		{
+			partitions[i].offset = total;
+			total += count;
+			remaining_partitions[num_partitions] = i;
+			num_partitions++;
+		}
+		partitions[i].next_offset = total;
+	}
+
+	/*
+	 * Swap tuples to correct partition.
+	 *
+	 * In traditional American flag sort, a swap sends the current element to
+	 * the correct partition, but the array pointer only advances if the
+	 * partner of the swap happens to be an element that belongs to the
+	 * current partition. That only requires one pass through the array, but
+	 * the disadvantage is we don't know if the pointer can advance until the
+	 * swap completes. Here lies the most interesting innovation from the
+	 * upstream ska_byte_sort: After initiating the swap, we immediately
+	 * proceed to the next element. This makes better use of CPU pipelining,
+	 * but also means that we will often need multiple iterations of this
+	 * loop. ska_byte_sort() maintains a separate list of which partitions
+	 * haven't finished, which is updated every loop iteration. Here we simply
+	 * check each partition during every iteration.
+	 *
+	 * If we started with a single partition, there is nothing to do. If a
+	 * previous loop iteration results in only one partition that hasn't been
+	 * counted as sorted, we know it's actually sorted and can exit the loop.
+	 */
+	num_remaining = num_partitions;
+	while (num_remaining > 1)
+	{
+		/* start the count over */
+		num_remaining = num_partitions;
+
+		for (int i = 0; i < num_partitions; i++)
+		{
+			uint8		idx = remaining_partitions[i];
+
+			for (SortTuple *st = begin + partitions[idx].offset;
+				 st < begin + partitions[idx].next_offset;
+				 st++)
+			{
+				size_t		offset = partitions[st->curbyte].offset++;
+				SortTuple	tmp;
+
+				/* swap current tuple with destination position */
+				Assert(offset < n_elems);
+				tmp = *st;
+				*st = begin[offset];
+				begin[offset] = tmp;
+
+				CHECK_FOR_INTERRUPTS();
+			};
+
+			/* count sorted partitions */
+			if (partitions[idx].offset == partitions[idx].next_offset)
+				num_remaining--;
+		}
+	}
+
+	/* recurse */
+	for (uint8 *rp = remaining_partitions;
+		 rp < remaining_partitions + num_partitions;
+		 rp++)
+	{
+		size_t		end_offset = partitions[*rp].next_offset;
+		SortTuple  *partition_end = begin + end_offset;
+		ptrdiff_t	num_elements = end_offset - start_offset;
+
+		if (num_elements > 1)
+		{
+			if (level < SIZEOF_DATUM - 1)
+			{
+				if (num_elements > RADIX_SORT_THRESHOLD)
+				{
+					radix_sort_tuple(partition_begin,
+									 num_elements,
+									 level + 1,
+									 state);
+				}
+				else
+				{
+					qsort_tuple(partition_begin,
+								num_elements,
+								state->base.comparetup,
+								state);
+				}
+			}
+			else if (state->base.onlyKey == NULL)
+			{
+				/*
+				 * We've finished radix sort on all bytes of the pass-by-value
+				 * datum (possibly abbreviated), now sort using the tiebreak
+				 * comparator.
+				 */
+				qsort_tuple(partition_begin,
+							num_elements,
+							state->base.comparetup_tiebreak,
+							state);
+			}
+		}
+
+		start_offset = end_offset;
+		partition_begin = partition_end;
+	}
+}
+
+/*
+ * Partition tuples by NULL and NOT NULL first sort key.
+ * Then dispatch to either radix sort or qsort.
+ */
+static void
+sort_byvalue_datum(SortTuple *data, size_t n, Tuplesortstate *state)
+{
+	bool		nulls_first = state->base.sortKeys[0].ssup_nulls_first;
+	SortTuple  *null_start;
+	SortTuple  *not_null_start;
+	size_t		d1 = 0,
+				d2,
+				null_count,
+				not_null_count;
+	bool		presorted = true;
+
+	/* presorted check */
+	for (SortTuple *st = data + 1; st < data + n; st++)
+	{
+		CHECK_FOR_INTERRUPTS();
+
+		if (COMPARETUP(state, st - 1, st) > 0)
+		{
+			presorted = false;
+			break;
+		}
+	}
+	if (presorted)
+		return;
+
+	/*
+	 * Partition by NULL-ness of the leading sort key, since we can only radix
+	 * sort on NOT NULL pass-by-value datums.
+	 */
+
+	/*
+	 * Find the first NOT NULL tuple if NULLS FIRST, or first NULL element if
+	 * NULLS LAST. This is a quick check for the common case where all tuples
+	 * are NOT NULL in the first sort key.
+	 */
+	while (d1 < n && data[d1].isnull1 == nulls_first)
+		d1++;
+
+	/*
+	 * If we have more than one tuple left after the quick check, partition
+	 * the remainder using branchless cyclic permutation, based on
+	 * https://orlp.net/blog/branchless-lomuto-partitioning/
+	 */
+	if (d1 < n - 1)
+	{
+		size_t		j = d1;		/* forward index */
+		SortTuple	save = data[d1];	/* create gap at front */
+
+		while (j < n - 1)
+		{
+			/* gap is at j, move d1's element to gap */
+			data[j] = data[d1];
+
+			/* advance j to first unknown element */
+			j += 1;
+
+			/* move j's element back to d1 */
+			data[d1] = data[j];
+
+			/* advance d1 if the element belongs in the left partition */
+			d1 += (data[d1].isnull1 == nulls_first);
+		}
+
+		/* place gap between left and right partitions */
+		data[j] = data[d1];
+
+		/* restore the saved element */
+		data[d1] = save;
+
+		/* assign it to the correct partition */
+		d1 += (data[d1].isnull1 == nulls_first);
+	}
+
+	/* d1 is now the number of elements in the left partition */
+	d2 = n - d1;
+
+	/* set pointers and counts for each partition */
+	if (nulls_first)
+	{
+		null_start = state->memtuples;
+		null_count = d1;
+		not_null_start = state->memtuples + d1;
+		not_null_count = d2;
+	}
+	else
+	{
+		not_null_start = state->memtuples;
+		not_null_count = d1;
+		null_start = state->memtuples + d1;
+		null_count = d2;
+	}
+
+	for (SortTuple *st = null_start;
+		 st < null_start + null_count;
+		 st++)
+		Assert(st->isnull1 == true);
+	for (SortTuple *st = not_null_start;
+		 st < not_null_start + not_null_count;
+		 st++)
+		Assert(st->isnull1 == false);
+
+	/*
+	 * Sort the NULL partition using tiebreak comparator, if necessary. This
+	 * will repeat the comparison on isnull1 for abbreviated keys, but it's
+	 * not worth adding complexity to avoid that.
+	 */
+	if (state->base.onlyKey == NULL && null_count > 1)
+	{
+		qsort_tuple(null_start,
+					null_count,
+					state->base.comparetup_tiebreak,
+					state);
+	}
+
+	/*
+	 * Sort the NOT NULL partition, using radix sort if large enough,
+	 * otherwise fall back to quicksort.
+	 */
+	if (not_null_count > 1)
+	{
+		if (not_null_count > RADIX_SORT_THRESHOLD)
+		{
+			radix_sort_tuple(not_null_start,
+							 not_null_count,
+							 0,
+							 state);
+		}
+		else
+		{
+			qsort_tuple(not_null_start,
+						not_null_count,
+						state->base.comparetup,
+						state);
+		}
+	}
+}
+
+/* Verify in-memory sort using standard comparator. */
+static void
+verify_memtuples_sorted(Tuplesortstate *state)
+{
+#ifdef USE_ASSERT_CHECKING
+	for (SortTuple *st = state->memtuples + 1;
+		 st < state->memtuples + state->memtupcount;
+		 st++)
+		Assert(COMPARETUP(state, st - 1, st) <= 0);
+#endif
+}
+
 /*
- * Sort all memtuples using specialized qsort() routines.
+ * Sort all memtuples using specialized routines.
  *
- * Quicksort is used for small in-memory sorts, and external sort runs.
+ * Quicksort or radix sort is used for small in-memory sorts, and external sort runs.
  */
 static void
 tuplesort_sort_memtuples(Tuplesortstate *state)
@@ -2666,30 +2955,22 @@ tuplesort_sort_memtuples(Tuplesortstate *state)
 	if (state->memtupcount > 1)
 	{
 		/*
-		 * Do we have the leading column's value or abbreviation in datum1,
-		 * and is there a specialization for its comparator?
+		 * Do we have the leading column's value or abbreviation in datum1?
 		 */
 		if (state->base.haveDatum1 && state->base.sortKeys)
 		{
-			if (state->base.sortKeys[0].comparator == ssup_datum_unsigned_cmp)
-			{
-				qsort_tuple_unsigned(state->memtuples,
-									 state->memtupcount,
-									 state);
-				return;
-			}
-			else if (state->base.sortKeys[0].comparator == ssup_datum_signed_cmp)
+			SortSupportData ssup = state->base.sortKeys[0];
+
+			/* Does it compare as an integer? */
+			if (state->memtupcount > RADIX_SORT_THRESHOLD &&
+				(ssup.comparator == ssup_datum_unsigned_cmp ||
+				 ssup.comparator == ssup_datum_signed_cmp ||
+				 ssup.comparator == ssup_datum_int32_cmp))
 			{
-				qsort_tuple_signed(state->memtuples,
+				sort_byvalue_datum(state->memtuples,
 								   state->memtupcount,
 								   state);
-				return;
-			}
-			else if (state->base.sortKeys[0].comparator == ssup_datum_int32_cmp)
-			{
-				qsort_tuple_int32(state->memtuples,
-								  state->memtupcount,
-								  state);
+				verify_memtuples_sorted(state);
 				return;
 			}
 		}
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index bf39878c43e..dae35dd3fc0 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -324,6 +324,7 @@ extern PGDLLIMPORT int tcp_user_timeout;
 extern PGDLLIMPORT char *role_string;
 extern PGDLLIMPORT bool in_hot_standby_guc;
 extern PGDLLIMPORT bool trace_sort;
+extern PGDLLIMPORT bool wip_radix_sort;
 
 #ifdef DEBUG_BOUNDED_SORT
 extern PGDLLIMPORT bool optimize_bounded_sort;
diff --git a/src/include/utils/sortsupport.h b/src/include/utils/sortsupport.h
index 0083756bbdb..a8f8f9f026a 100644
--- a/src/include/utils/sortsupport.h
+++ b/src/include/utils/sortsupport.h
@@ -229,107 +229,6 @@ ApplySortComparator(Datum datum1, bool isNull1,
 	return compare;
 }
 
-static inline int
-ApplyUnsignedSortComparator(Datum datum1, bool isNull1,
-							Datum datum2, bool isNull2,
-							SortSupport ssup)
-{
-	int			compare;
-
-	if (isNull1)
-	{
-		if (isNull2)
-			compare = 0;		/* NULL "=" NULL */
-		else if (ssup->ssup_nulls_first)
-			compare = -1;		/* NULL "<" NOT_NULL */
-		else
-			compare = 1;		/* NULL ">" NOT_NULL */
-	}
-	else if (isNull2)
-	{
-		if (ssup->ssup_nulls_first)
-			compare = 1;		/* NOT_NULL ">" NULL */
-		else
-			compare = -1;		/* NOT_NULL "<" NULL */
-	}
-	else
-	{
-		compare = datum1 < datum2 ? -1 : datum1 > datum2 ? 1 : 0;
-		if (ssup->ssup_reverse)
-			INVERT_COMPARE_RESULT(compare);
-	}
-
-	return compare;
-}
-
-static inline int
-ApplySignedSortComparator(Datum datum1, bool isNull1,
-						  Datum datum2, bool isNull2,
-						  SortSupport ssup)
-{
-	int			compare;
-
-	if (isNull1)
-	{
-		if (isNull2)
-			compare = 0;		/* NULL "=" NULL */
-		else if (ssup->ssup_nulls_first)
-			compare = -1;		/* NULL "<" NOT_NULL */
-		else
-			compare = 1;		/* NULL ">" NOT_NULL */
-	}
-	else if (isNull2)
-	{
-		if (ssup->ssup_nulls_first)
-			compare = 1;		/* NOT_NULL ">" NULL */
-		else
-			compare = -1;		/* NOT_NULL "<" NULL */
-	}
-	else
-	{
-		compare = DatumGetInt64(datum1) < DatumGetInt64(datum2) ? -1 :
-			DatumGetInt64(datum1) > DatumGetInt64(datum2) ? 1 : 0;
-		if (ssup->ssup_reverse)
-			INVERT_COMPARE_RESULT(compare);
-	}
-
-	return compare;
-}
-
-static inline int
-ApplyInt32SortComparator(Datum datum1, bool isNull1,
-						 Datum datum2, bool isNull2,
-						 SortSupport ssup)
-{
-	int			compare;
-
-	if (isNull1)
-	{
-		if (isNull2)
-			compare = 0;		/* NULL "=" NULL */
-		else if (ssup->ssup_nulls_first)
-			compare = -1;		/* NULL "<" NOT_NULL */
-		else
-			compare = 1;		/* NULL ">" NOT_NULL */
-	}
-	else if (isNull2)
-	{
-		if (ssup->ssup_nulls_first)
-			compare = 1;		/* NOT_NULL ">" NULL */
-		else
-			compare = -1;		/* NOT_NULL "<" NULL */
-	}
-	else
-	{
-		compare = DatumGetInt32(datum1) < DatumGetInt32(datum2) ? -1 :
-			DatumGetInt32(datum1) > DatumGetInt32(datum2) ? 1 : 0;
-		if (ssup->ssup_reverse)
-			INVERT_COMPARE_RESULT(compare);
-	}
-
-	return compare;
-}
-
 /*
  * Apply a sort comparator function and return a 3-way comparison using full,
  * authoritative comparator.  This takes care of handling reverse-sort and
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 5fe229e211b..da68f45acf2 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -116,6 +116,7 @@ typedef struct
 	void	   *tuple;			/* the tuple itself */
 	Datum		datum1;			/* value of first key column */
 	bool		isnull1;		/* is first key column NULL? */
+	uint8		curbyte;		/* chunk of datum1 for current radix sort pass */
 	int			srctape;		/* source tape number */
 } SortTuple;
 
diff --git a/src/test/regress/expected/tuplesort.out b/src/test/regress/expected/tuplesort.out
index 6dd97e7427a..fc1321bf443 100644
--- a/src/test/regress/expected/tuplesort.out
+++ b/src/test/regress/expected/tuplesort.out
@@ -304,9 +304,9 @@ FROM abbrev_abort_uuids
 ORDER BY ctid DESC LIMIT 5;
   id   |           abort_increasing           |           abort_decreasing           |          noabort_increasing          |          noabort_decreasing          
 -------+--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------
-     0 |                                      |                                      |                                      | 
  20002 |                                      |                                      |                                      | 
  20003 |                                      |                                      |                                      | 
+     0 |                                      |                                      |                                      | 
  10009 | 00000000-0000-0000-0000-000000010008 | 00000000-0000-0000-0000-000000009992 | 00010008-0000-0000-0000-000000010008 | 00009992-0000-0000-0000-000000009992
  10008 | 00000000-0000-0000-0000-000000010007 | 00000000-0000-0000-0000-000000009993 | 00010007-0000-0000-0000-000000010007 | 00009993-0000-0000-0000-000000009993
 (5 rows)
@@ -335,9 +335,9 @@ FROM abbrev_abort_uuids
 ORDER BY ctid DESC LIMIT 5;
   id   |           abort_increasing           |           abort_decreasing           |          noabort_increasing          |          noabort_decreasing          
 -------+--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------
-     0 |                                      |                                      |                                      | 
- 20003 |                                      |                                      |                                      | 
  20002 |                                      |                                      |                                      | 
+ 20003 |                                      |                                      |                                      | 
+     0 |                                      |                                      |                                      | 
   9993 | 00000000-0000-0000-0000-000000009992 | 00000000-0000-0000-0000-000000010008 | 00009992-0000-0000-0000-000000009992 | 00010008-0000-0000-0000-000000010008
   9994 | 00000000-0000-0000-0000-000000009993 | 00000000-0000-0000-0000-000000010007 | 00009993-0000-0000-0000-000000009993 | 00010007-0000-0000-0000-000000010007
 (5 rows)
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 14dec2d49c1..0c4e9511151 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4048,6 +4048,7 @@ qsort_comparator
 query_pathkeys_callback
 radius_attribute
 radius_packet
+RadixPartitionInfo
 rangeTableEntry_used_context
 rank_context
 rbt_allocfunc
-- 
2.52.0



^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-01-21 08:12  =?gb18030?B?emVuZ21hbg==?= <[email protected]>
  parent: John Naylor <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: =?gb18030?B?emVuZ21hbg==?= @ 2026-01-21 08:12 UTC (permalink / raw)
  To: =?gb18030?B?Sm9obiBOYXlsb3I=?= <[email protected]>; +Cc: =?gb18030?B?cGdzcWwtaGFja2Vycw==?= <[email protected]>

Hi,

I wanted to point out a small detail: the line `extern PGDLLIMPORT bool wip_radix_sort;`
 in v6-0001 appears to not have been fully cleaned up — I suspect it might have been overlooked.

--
Regards,
Man Zeng

^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-02-10 13:37  John Naylor <[email protected]>
  parent: =?gb18030?B?emVuZ21hbg==?= <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: John Naylor @ 2026-02-10 13:37 UTC (permalink / raw)
  To: zengman <[email protected]>; +Cc: pgsql-hackers <[email protected]>

On Wed, Jan 21, 2026 at 1:40 PM John Naylor <[email protected]> wrote:
> Attached is v6, which seems pretty close to committable. The temporary
> GUC is gone, as are all the integer-like qsort specializations. There
> are a few cosmetic rearrangements, but the algorithm is pretty much
> the same as v5. The only visible behavior difference should be the
> addition of a presorted check just like qsort has.

I did some more self-review after a couple weeks and made some more
minor cosmetic adjustments for v7. I will commit 0001 and 0002 in a
few days unless there are objections.

> * The only things I have doubts about are the user-visible messages
> mentioning "qsort". For trace_sort, it seemed logical to re-use the
> term "internal sort" from other such messages, so I've done that for
> v6. EXPLAIN ANALYZE is a bit harder: Do we want to even change
> anything here? When things like this come up, there is the question of
> tools that parse the output. It doesn't seem particularly important to
> try to display exact info here, especially since radix sort must
> divert to qsort for multiple keys etc.

I decided to split this out to a separate patch, 0003. It doesn't seem
important, and may not be desirable after all. There is also a 0004
which is just verifying sort order in assert builds on the existing
qsort calls and not just the new sort. Looks better this way.

> Speaking of the NULL partitioning step, I've tried to add some
> comments that make sense, but the pictures in the linked blog are more
> helpful than any words I can come up with. It's easier to understand
> than to describe.

I tried to improve readability here and added missing CHECK_FOR_INTERRUPTS().

Also ran UB sanitizer and Valgrind.

On Wed, Jan 21, 2026 at 3:12 PM zengman <[email protected]> wrote:
> I wanted to point out a small detail: the line `extern PGDLLIMPORT bool wip_radix_sort;`
>  in v6-0001 appears to not have been fully cleaned up — I suspect it might have been overlooked.

Fixed as well.

--
John Naylor
Amazon Web Services


Attachments:

  [text/x-patch] v7-0002-Detect-common-prefix-to-avoid-wasted-work-during-.patch (3.0K, 2-v7-0002-Detect-common-prefix-to-avoid-wasted-work-during-.patch)
  download | inline diff:
From dfefff9d2e28979dee92f3b61e954c3c8b1ee236 Mon Sep 17 00:00:00 2001
From: John Naylor <[email protected]>
Date: Wed, 12 Nov 2025 14:31:24 +0700
Subject: [PATCH v7 2/4] Detect common prefix to avoid wasted work during radix
 sort

Start radix sort at the most significant byte position that has more
than one distinct byte in the input. This skips passes where radix
sort would count the distinct bytes just to find only a single one,
in which case there is nothing further to do for that pass. This can
give a few percent speedup for integers that have some zero upper
bytes, which is common for those that didn't arrive via abbreviation.

Reviewed-by: Chengpeng Yan <[email protected]>
Discussion: https://postgr.es/m/CANWCAZYpGMDSSwAa18fOxJGXaPzVdyPsWpOkfCX32DWh3Qznzw@mail.gmail.com
---
 src/backend/utils/sort/tuplesort.c | 66 ++++++++++++++++++++++++++++--
 1 file changed, 62 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 07fa83c7944..7b22546a811 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -104,6 +104,7 @@
 #include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "port/pg_bitutils.h"
 #include "storage/shmem.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2928,10 +2929,67 @@ sort_byvalue_datum(SortTuple *data, size_t n, Tuplesortstate *state)
 	}
 	else
 	{
-		radix_sort_tuple(not_null_start,
-						 not_null_count,
-						 0,
-						 state);
+		int			common_prefix;
+		Datum		first_datum = 0;
+		Datum		common_upper_bits = 0;
+
+		/*
+		 * Compute the common prefix to skip unproductive recursion steps
+		 * during radix sort.
+		 */
+		for (SortTuple *st = not_null_start;
+			 st < not_null_start + not_null_count;
+			 st++)
+		{
+			Datum		this_datum = st->datum1;
+
+			if (st == not_null_start)
+			{
+				/*
+				 * Need to start with some value, may as well be the first
+				 * one.
+				 */
+				first_datum = this_datum;
+			}
+			else
+			{
+				/*
+				 * Accumulate bits that represent a difference from the
+				 * reference datum.
+				 */
+				common_upper_bits |= first_datum ^ this_datum;
+			}
+		}
+
+		if (common_upper_bits == 0)
+		{
+			/*
+			 * All NOT NULL tuples have the same datum, so we can skip radix
+			 * sort. Sort using the tiebreak comparator if necessary.
+			 */
+			if (state->base.onlyKey == NULL)
+			{
+				qsort_tuple(not_null_start,
+							not_null_count,
+							state->base.comparetup_tiebreak,
+							state);
+			}
+		}
+		else
+		{
+			/*
+			 * The upper bits of common_upper_bits are zero where all datums
+			 * have the same bits. The byte position of the leftmost one bit
+			 * is the byte where radix sort should start.
+			 */
+			common_prefix = SIZEOF_DATUM - 1 -
+				(pg_leftmost_one_pos64(common_upper_bits) / BITS_PER_BYTE);
+
+			radix_sort_tuple(not_null_start,
+							 not_null_count,
+							 common_prefix,
+							 state);
+		}
 	}
 }
 
-- 
2.53.0



  [text/x-patch] v7-0001-Perform-radix-sort-on-SortTuples-with-pass-by-val.patch (28.5K, 3-v7-0001-Perform-radix-sort-on-SortTuples-with-pass-by-val.patch)
  download | inline diff:
From 81be72e69208b155f5619630043690157871a755 Mon Sep 17 00:00:00 2001
From: John Naylor <[email protected]>
Date: Fri, 17 Oct 2025 09:57:43 +0700
Subject: [PATCH v7 1/4] Perform radix sort on SortTuples with pass-by-value
 Datums
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Radix sort can be much faster than quicksort, but for our purposes it
is limited to sequences of unsigned bytes. To make tuples with other
types amenable to this technique, several features of tuple comparison
must be accounted for, i.e. the sort keys must be "normalized":

1. Signedness -- It's possible to modify a signed integer such that
it can be compared as unsigned. For example, a signed char has range
-128 to 127. If we cast that to unsigned char and add 128, the range
of values becomes 0 to 255 while preserving order.

2. Direction -- SQL allows specification of ASC or DESC. The
descending case is easily handled by taking the complement of the
unsigned representation.

3. NULL values -- NULLS FIRST and NULLS LAST must work correctly.

This commmit only handles the case where datum1 is pass-by-value
Datum (possibly abbreviated) that compares like an ordinary
integer. (Abbreviations of values of type "numeric" are a convenient
counterexample.) First, tuples are partitioned by nullness in the
correct NULL ordering. Then the NOT NULL tuples are sorted with radix
sort on datum1. For tiebreaks on subsequent sortkeys (including the
first sort key if abbreviated), we divert to the usual qsort.

ORDER BY queries on pre-warmed buffers are up to 2x faster on high
cardinality inputs with radix sort than the sort specializations added
by commit 697492434, so get rid of them. It's sufficient to fall back
to qsort_tuple() for small arrays. Moderately low cardinality inputs
show more modest improvents. Our qsort is strongly optimized for very
low cardinality inputs, but radix sort is usually comparable there.

The changes to the regression tests are caused by under-specified sort
orders, e.g. "SELECT a, b from mytable order by a;". For unstable
sorts, such as our qsort and this in-place radix sort, there is no
guarantee of the order of "b" within each group of "a".

The implementation is taken from ska_byte_sort() (Boost licensed),
which is similar to American flag sort (an in-place radix sort) with
modifications to make it better suited for modern pipelined CPUs.

The technique of normalization described above can also be extended
to the case of multiple keys. That is left for future work (Thanks
to Peter Geoghegan for the suggestion to look into this area).

Reviewed-by: Chengpeng Yan <[email protected]>
Reviewed-by: Álvaro Herrera <[email protected]>
Reviewed-by: zengman <[email protected]>
Reviewed-by: Chao Li <[email protected]> (earlier version)
Discussion: https://postgr.es/m/CANWCAZYzx7a7E9AY16Jt_U3+GVKDADfgApZ-42SYNiig8dTnFA@mail.gmail.com
---
 src/backend/utils/sort/tuplesort.c      | 556 ++++++++++++++++++------
 src/include/utils/sortsupport.h         | 101 -----
 src/include/utils/tuplesort.h           |   1 +
 src/test/regress/expected/tuplesort.out |   6 +-
 src/tools/pgindent/typedefs.list        |   1 +
 5 files changed, 427 insertions(+), 238 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 1edcad89c88..07fa83c7944 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -7,8 +7,8 @@
  * applied to different kinds of sortable objects.  Implementation of
  * the particular sorting variants is given in tuplesortvariants.c.
  * This module works efficiently for both small and large amounts
- * of data.  Small amounts are sorted in-memory using qsort().  Large
- * amounts are sorted using temporary files and a standard external sort
+ * of data.  Small amounts are sorted in-memory.  Large amounts are
+ * sorted using temporary files and a standard external sort
  * algorithm.
  *
  * See Knuth, volume 3, for more than you want to know about external
@@ -26,16 +26,16 @@
  * Historically, we divided the input into sorted runs using replacement
  * selection, in the form of a priority tree implemented as a heap
  * (essentially Knuth's Algorithm 5.2.3H), but now we always use quicksort
- * for run generation.
+ * or radix sort for run generation.
  *
  * The approximate amount of memory allowed for any one sort operation
  * is specified in kilobytes by the caller (most pass work_mem).  Initially,
  * we absorb tuples and simply store them in an unsorted array as long as
  * we haven't exceeded workMem.  If we reach the end of the input without
- * exceeding workMem, we sort the array using qsort() and subsequently return
+ * exceeding workMem, we sort the array in memory and subsequently return
  * tuples just by scanning the tuple array sequentially.  If we do exceed
  * workMem, we begin to emit tuples into sorted runs in temporary tapes.
- * When tuples are dumped in batch after quicksorting, we begin a new run
+ * When tuples are dumped in batch after in-memory sorting, we begin a new run
  * with a new output tape.  If we reach the max number of tapes, we write
  * subsequent runs on the existing tapes in a round-robin fashion.  We will
  * need multiple merge passes to finish the merge in that case.  After the
@@ -476,121 +476,15 @@ static void free_sort_tuple(Tuplesortstate *state, SortTuple *stup);
 static void tuplesort_free(Tuplesortstate *state);
 static void tuplesort_updatemax(Tuplesortstate *state);
 
-/*
- * Specialized comparators that we can inline into specialized sorts.  The goal
- * is to try to sort two tuples without having to follow the pointers to the
- * comparator or the tuple.
- *
- * XXX: For now, there is no specialization for cases where datum1 is
- * authoritative and we don't even need to fall back to a callback at all (that
- * would be true for types like int4/int8/timestamp/date, but not true for
- * abbreviations of text or multi-key sorts.  There could be!  Is it worth it?
- */
-
-/* Used if first key's comparator is ssup_datum_unsigned_cmp */
-static pg_attribute_always_inline int
-qsort_tuple_unsigned_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
-{
-	int			compare;
-
-	compare = ApplyUnsignedSortComparator(a->datum1, a->isnull1,
-										  b->datum1, b->isnull1,
-										  &state->base.sortKeys[0]);
-	if (compare != 0)
-		return compare;
-
-	/*
-	 * No need to waste effort calling the tiebreak function when there are no
-	 * other keys to sort on.
-	 */
-	if (state->base.onlyKey != NULL)
-		return 0;
-
-	return state->base.comparetup_tiebreak(a, b, state);
-}
-
-/* Used if first key's comparator is ssup_datum_signed_cmp */
-static pg_attribute_always_inline int
-qsort_tuple_signed_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
-{
-	int			compare;
-
-	compare = ApplySignedSortComparator(a->datum1, a->isnull1,
-										b->datum1, b->isnull1,
-										&state->base.sortKeys[0]);
-
-	if (compare != 0)
-		return compare;
-
-	/*
-	 * No need to waste effort calling the tiebreak function when there are no
-	 * other keys to sort on.
-	 */
-	if (state->base.onlyKey != NULL)
-		return 0;
-
-	return state->base.comparetup_tiebreak(a, b, state);
-}
-
-/* Used if first key's comparator is ssup_datum_int32_cmp */
-static pg_attribute_always_inline int
-qsort_tuple_int32_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
-{
-	int			compare;
-
-	compare = ApplyInt32SortComparator(a->datum1, a->isnull1,
-									   b->datum1, b->isnull1,
-									   &state->base.sortKeys[0]);
-
-	if (compare != 0)
-		return compare;
-
-	/*
-	 * No need to waste effort calling the tiebreak function when there are no
-	 * other keys to sort on.
-	 */
-	if (state->base.onlyKey != NULL)
-		return 0;
-
-	return state->base.comparetup_tiebreak(a, b, state);
-}
 
 /*
  * Special versions of qsort just for SortTuple objects.  qsort_tuple() sorts
  * any variant of SortTuples, using the appropriate comparetup function.
  * qsort_ssup() is specialized for the case where the comparetup function
  * reduces to ApplySortComparator(), that is single-key MinimalTuple sorts
- * and Datum sorts.  qsort_tuple_{unsigned,signed,int32} are specialized for
- * common comparison functions on pass-by-value leading datums.
+ * and Datum sorts.
  */
 
-#define ST_SORT qsort_tuple_unsigned
-#define ST_ELEMENT_TYPE SortTuple
-#define ST_COMPARE(a, b, state) qsort_tuple_unsigned_compare(a, b, state)
-#define ST_COMPARE_ARG_TYPE Tuplesortstate
-#define ST_CHECK_FOR_INTERRUPTS
-#define ST_SCOPE static
-#define ST_DEFINE
-#include "lib/sort_template.h"
-
-#define ST_SORT qsort_tuple_signed
-#define ST_ELEMENT_TYPE SortTuple
-#define ST_COMPARE(a, b, state) qsort_tuple_signed_compare(a, b, state)
-#define ST_COMPARE_ARG_TYPE Tuplesortstate
-#define ST_CHECK_FOR_INTERRUPTS
-#define ST_SCOPE static
-#define ST_DEFINE
-#include "lib/sort_template.h"
-
-#define ST_SORT qsort_tuple_int32
-#define ST_ELEMENT_TYPE SortTuple
-#define ST_COMPARE(a, b, state) qsort_tuple_int32_compare(a, b, state)
-#define ST_COMPARE_ARG_TYPE Tuplesortstate
-#define ST_CHECK_FOR_INTERRUPTS
-#define ST_SCOPE static
-#define ST_DEFINE
-#include "lib/sort_template.h"
-
 #define ST_SORT qsort_tuple
 #define ST_ELEMENT_TYPE SortTuple
 #define ST_COMPARE_RUNTIME_POINTER
@@ -612,6 +506,23 @@ qsort_tuple_int32_compare(SortTuple *a, SortTuple *b, Tuplesortstate *state)
 #define ST_DEFINE
 #include "lib/sort_template.h"
 
+/* state for radix sort */
+typedef struct RadixSortInfo
+{
+	union
+	{
+		size_t		count;
+		size_t		offset;
+	};
+	size_t		next_offset;
+} RadixSortInfo;
+
+/*
+ * Threshold below which qsort_tuple() is generally faster than a radix sort.
+ */
+#define QSORT_THRESHOLD 40
+
+
 /*
  *		tuplesort_begin_xxx
  *
@@ -1363,7 +1274,7 @@ tuplesort_performsort(Tuplesortstate *state)
 			 */
 			if (SERIAL(state))
 			{
-				/* Just qsort 'em and we're done */
+				/* Sort in memory and we're done */
 				tuplesort_sort_memtuples(state);
 				state->status = TSS_SORTEDINMEM;
 			}
@@ -2337,7 +2248,7 @@ dumptuples(Tuplesortstate *state, bool alltuples)
 
 	/*
 	 * Sort all tuples accumulated within the allowed amount of memory for
-	 * this run using quicksort
+	 * this run.
 	 */
 	tuplesort_sort_memtuples(state);
 
@@ -2652,10 +2563,395 @@ sort_bounded_heap(Tuplesortstate *state)
 	state->boundUsed = true;
 }
 
+
+/* radix sort routines */
+
+/*
+ * Retrieve byte from datum, indexed by 'level': 0 for LSB, 7 for MSB
+ */
+static inline uint8
+current_byte(Datum key, int level)
+{
+	int			shift = (SIZEOF_DATUM - 1 - level) * BITS_PER_BYTE;
+
+	return (key >> shift) & 0xFF;
+}
+
 /*
- * Sort all memtuples using specialized qsort() routines.
+ * Normalize datum such that unsigned comparison is order-preserving,
+ * taking ASC/DESC into account as well.
+ */
+static inline Datum
+normalize_datum(Datum orig, SortSupport ssup)
+{
+	Datum		norm_datum1;
+
+	if (ssup->comparator == ssup_datum_signed_cmp)
+	{
+		norm_datum1 = orig + ((uint64) PG_INT64_MAX) + 1;
+	}
+	else if (ssup->comparator == ssup_datum_int32_cmp)
+	{
+		/*
+		 * First truncate to uint32. Technically, we don't need to do this,
+		 * but it forces the upper half of the datum to be zero regardless of
+		 * sign.
+		 */
+		uint32		u32 = DatumGetUInt32(orig) + ((uint32) PG_INT32_MAX) + 1;
+
+		norm_datum1 = UInt32GetDatum(u32);
+	}
+	else
+	{
+		Assert(ssup->comparator == ssup_datum_unsigned_cmp);
+		norm_datum1 = orig;
+	}
+
+	if (ssup->ssup_reverse)
+		norm_datum1 = ~norm_datum1;
+
+	return norm_datum1;
+}
+
+/*
+ * radix_sort_tuple
+ *
+ * Radix sort by the pass-by-value datum in datum1. This is a modification of
+ * ska_byte_sort() from https://github.com/skarupke/ska_sort
+ * The original copyright notice follows:
+ *
+ * Copyright Malte Skarupke 2016.
+ * Distributed under the Boost Software License, Version 1.0.
+ *
+ * Boost Software License - Version 1.0 - August 17th, 2003
+ *
+ * Permission is hereby granted, free of charge, to any person or organization
+ * obtaining a copy of the software and accompanying documentation covered by
+ * this license (the "Software") to use, reproduce, display, distribute,
+ * execute, and transmit the Software, and to prepare derivative works of the
+ * Software, and to permit third-parties to whom the Software is furnished to
+ * do so, all subject to the following:
+ *
+ * The copyright notices in the Software and this entire statement, including
+ * the above license grant, this restriction and the following disclaimer,
+ * must be included in all copies of the Software, in whole or in part, and
+ * all derivative works of the Software, unless such copies or derivative
+ * works are solely in the form of machine-executable object code generated by
+ * a source language processor.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
+ * SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
+ * FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+static void
+radix_sort_tuple(SortTuple *begin, size_t n_elems, int level, Tuplesortstate *state)
+{
+	RadixSortInfo partitions[256] = {0};
+	uint8		remaining_partitions[256];
+	size_t		total = 0;
+	int			num_partitions = 0;
+	int			num_remaining;
+	SortSupport ssup = &state->base.sortKeys[0];
+	size_t		start_offset = 0;
+	SortTuple  *partition_begin = begin;
+
+	/* count number of occurrences of each byte */
+	for (SortTuple *st = begin; st < begin + n_elems; st++)
+	{
+		uint8		this_partition;
+
+		/* extract the byte for this level from the normalized datum */
+		this_partition = current_byte(normalize_datum(st->datum1, ssup),
+									  level);
+
+		/* save it for the permutation step */
+		st->curbyte = this_partition;
+
+		partitions[this_partition].count++;
+
+		CHECK_FOR_INTERRUPTS();
+	}
+
+	/* compute partition offsets */
+	for (int i = 0; i < 256; i++)
+	{
+		size_t		count = partitions[i].count;
+
+		if (count != 0)
+		{
+			partitions[i].offset = total;
+			total += count;
+			remaining_partitions[num_partitions] = i;
+			num_partitions++;
+		}
+		partitions[i].next_offset = total;
+	}
+
+	/*
+	 * Swap tuples to correct partition.
+	 *
+	 * In traditional American flag sort, a swap sends the current element to
+	 * the correct partition, but the array pointer only advances if the
+	 * partner of the swap happens to be an element that belongs in the
+	 * current partition. That only requires one pass through the array, but
+	 * the disadvantage is we don't know if the pointer can advance until the
+	 * swap completes. Here lies the most interesting innovation from the
+	 * upstream ska_byte_sort: After initiating the swap, we immediately
+	 * proceed to the next element. This makes better use of CPU pipelining,
+	 * but also means that we will often need multiple iterations of this
+	 * loop. ska_byte_sort() maintains a separate list of which partitions
+	 * haven't finished, which is updated every loop iteration. Here we simply
+	 * check each partition during every iteration.
+	 *
+	 * If we started with a single partition, there is nothing to do. If a
+	 * previous loop iteration results in only one partition that hasn't been
+	 * counted as sorted, we know it's actually sorted and can exit the loop.
+	 */
+	num_remaining = num_partitions;
+	while (num_remaining > 1)
+	{
+		/* start the count over */
+		num_remaining = num_partitions;
+
+		for (int i = 0; i < num_partitions; i++)
+		{
+			uint8		idx = remaining_partitions[i];
+
+			for (SortTuple *st = begin + partitions[idx].offset;
+				 st < begin + partitions[idx].next_offset;
+				 st++)
+			{
+				size_t		offset = partitions[st->curbyte].offset++;
+				SortTuple	tmp;
+
+				/* swap current tuple with destination position */
+				Assert(offset < n_elems);
+				tmp = *st;
+				*st = begin[offset];
+				begin[offset] = tmp;
+
+				CHECK_FOR_INTERRUPTS();
+			};
+
+			/* Is this partition sorted? */
+			if (partitions[idx].offset == partitions[idx].next_offset)
+				num_remaining--;
+		}
+	}
+
+	/* recurse */
+	for (uint8 *rp = remaining_partitions;
+		 rp < remaining_partitions + num_partitions;
+		 rp++)
+	{
+		size_t		end_offset = partitions[*rp].next_offset;
+		SortTuple  *partition_end = begin + end_offset;
+		ptrdiff_t	num_elements = end_offset - start_offset;
+
+		if (num_elements > 1)
+		{
+			if (level < SIZEOF_DATUM - 1)
+			{
+				if (num_elements < QSORT_THRESHOLD)
+				{
+					qsort_tuple(partition_begin,
+								num_elements,
+								state->base.comparetup,
+								state);
+				}
+				else
+				{
+					radix_sort_tuple(partition_begin,
+									 num_elements,
+									 level + 1,
+									 state);
+				}
+			}
+			else if (state->base.onlyKey == NULL)
+			{
+				/*
+				 * We've finished radix sort on all bytes of the pass-by-value
+				 * datum (possibly abbreviated), now sort using the tiebreak
+				 * comparator.
+				 */
+				qsort_tuple(partition_begin,
+							num_elements,
+							state->base.comparetup_tiebreak,
+							state);
+			}
+		}
+
+		start_offset = end_offset;
+		partition_begin = partition_end;
+	}
+}
+
+/*
+ * Sort tuples having a pass-by-value Datum in datum1
+ *
+ * Partition tuples by NULL and NOT NULL first sort key.
+ * Then dispatch to either radix sort or quicksort.
+ */
+static void
+sort_byvalue_datum(SortTuple *data, size_t n, Tuplesortstate *state)
+{
+	bool		nulls_first = state->base.sortKeys[0].ssup_nulls_first;
+	SortTuple  *null_start;
+	SortTuple  *not_null_start;
+	size_t		d1 = 0,
+				d2,
+				null_count,
+				not_null_count;
+	bool		presorted = true;
+
+	/* presorted check */
+	for (SortTuple *st = data + 1; st < data + n; st++)
+	{
+		CHECK_FOR_INTERRUPTS();
+
+		if (COMPARETUP(state, st - 1, st) > 0)
+		{
+			presorted = false;
+			break;
+		}
+	}
+	if (presorted)
+		return;
+
+	/*
+	 * Partition by NULL-ness of the leading sort key, since we can only radix
+	 * sort on NOT NULL pass-by-value datums.
+	 */
+
+	/*
+	 * Find the first NOT NULL if NULLS FIRST, or first NULL if NULLS LAST.
+	 * This also serves as a quick check for the common case where all tuples
+	 * are NOT NULL in the first sort key.
+	 */
+	while (d1 < n && data[d1].isnull1 == nulls_first)
+	{
+		CHECK_FOR_INTERRUPTS();
+		d1++;
+	}
+
+	/*
+	 * If we have more than one tuple left after the quick check, partition
+	 * the remainder using branchless cyclic permutation, based on
+	 * https://orlp.net/blog/branchless-lomuto-partitioning/
+	 */
+	Assert(n > 0);
+	if (d1 < n - 1)
+	{
+		size_t		i = d1,
+					j = d1;
+		SortTuple	tmp = data[d1]; /* create gap at front */
+
+		while (j < n - 1)
+		{
+			CHECK_FOR_INTERRUPTS();
+
+			/* gap is at j, move i's element to gap */
+			data[j] = data[i];
+			/* advance j to the first unknown element */
+			j += 1;
+			/* move the first unknown element back to i */
+			data[i] = data[j];
+			/* advance i if this element belongs in the left partition */
+			i += (data[i].isnull1 == nulls_first);
+		}
+
+		/* place gap between left and right partitions */
+		data[j] = data[i];
+		/* restore the saved element */
+		data[i] = tmp;
+		/* assign it to the correct partition */
+		i += (data[i].isnull1 == nulls_first);
+
+		/* d1 is now the number of elements in the left partition */
+		d1 = i;
+	}
+
+	d2 = n - d1;
+
+	/* set pointers and counts for each partition */
+	if (nulls_first)
+	{
+		null_start = state->memtuples;
+		null_count = d1;
+		not_null_start = state->memtuples + d1;
+		not_null_count = d2;
+	}
+	else
+	{
+		not_null_start = state->memtuples;
+		not_null_count = d1;
+		null_start = state->memtuples + d1;
+		null_count = d2;
+	}
+
+	for (SortTuple *st = null_start;
+		 st < null_start + null_count;
+		 st++)
+		Assert(st->isnull1 == true);
+	for (SortTuple *st = not_null_start;
+		 st < not_null_start + not_null_count;
+		 st++)
+		Assert(st->isnull1 == false);
+
+	/*
+	 * Sort the NULL partition using tiebreak comparator, if necessary. This
+	 * will repeat the comparison on isnull1 if the first sort key was
+	 * abbreviated, but it's not worth adding complexity to avoid that.
+	 */
+	if (state->base.onlyKey == NULL && null_count > 1)
+	{
+		qsort_tuple(null_start,
+					null_count,
+					state->base.comparetup_tiebreak,
+					state);
+	}
+
+	/*
+	 * Sort the NOT NULL partition, using radix sort if large enough,
+	 * otherwise fall back to quicksort.
+	 */
+	if (not_null_count < QSORT_THRESHOLD)
+	{
+		qsort_tuple(not_null_start,
+					not_null_count,
+					state->base.comparetup,
+					state);
+	}
+	else
+	{
+		radix_sort_tuple(not_null_start,
+						 not_null_count,
+						 0,
+						 state);
+	}
+}
+
+/* Verify in-memory sort using standard comparator. */
+static void
+verify_memtuples_sorted(Tuplesortstate *state)
+{
+#ifdef USE_ASSERT_CHECKING
+	for (SortTuple *st = state->memtuples + 1;
+		 st < state->memtuples + state->memtupcount;
+		 st++)
+		Assert(COMPARETUP(state, st - 1, st) <= 0);
+#endif
+}
+
+/*
+ * Sort all memtuples using specialized routines.
  *
- * Quicksort is used for small in-memory sorts, and external sort runs.
+ * Quicksort or radix sort is used for small in-memory sorts,
+ * and external sort runs.
  */
 static void
 tuplesort_sort_memtuples(Tuplesortstate *state)
@@ -2665,30 +2961,22 @@ tuplesort_sort_memtuples(Tuplesortstate *state)
 	if (state->memtupcount > 1)
 	{
 		/*
-		 * Do we have the leading column's value or abbreviation in datum1,
-		 * and is there a specialization for its comparator?
+		 * Do we have the leading column's value or abbreviation in datum1?
 		 */
 		if (state->base.haveDatum1 && state->base.sortKeys)
 		{
-			if (state->base.sortKeys[0].comparator == ssup_datum_unsigned_cmp)
-			{
-				qsort_tuple_unsigned(state->memtuples,
-									 state->memtupcount,
-									 state);
-				return;
-			}
-			else if (state->base.sortKeys[0].comparator == ssup_datum_signed_cmp)
+			SortSupportData ssup = state->base.sortKeys[0];
+
+			/* Does it compare as an integer? */
+			if (state->memtupcount >= QSORT_THRESHOLD &&
+				(ssup.comparator == ssup_datum_unsigned_cmp ||
+				 ssup.comparator == ssup_datum_signed_cmp ||
+				 ssup.comparator == ssup_datum_int32_cmp))
 			{
-				qsort_tuple_signed(state->memtuples,
+				sort_byvalue_datum(state->memtuples,
 								   state->memtupcount,
 								   state);
-				return;
-			}
-			else if (state->base.sortKeys[0].comparator == ssup_datum_int32_cmp)
-			{
-				qsort_tuple_int32(state->memtuples,
-								  state->memtupcount,
-								  state);
+				verify_memtuples_sorted(state);
 				return;
 			}
 		}
diff --git a/src/include/utils/sortsupport.h b/src/include/utils/sortsupport.h
index 0083756bbdb..a8f8f9f026a 100644
--- a/src/include/utils/sortsupport.h
+++ b/src/include/utils/sortsupport.h
@@ -229,107 +229,6 @@ ApplySortComparator(Datum datum1, bool isNull1,
 	return compare;
 }
 
-static inline int
-ApplyUnsignedSortComparator(Datum datum1, bool isNull1,
-							Datum datum2, bool isNull2,
-							SortSupport ssup)
-{
-	int			compare;
-
-	if (isNull1)
-	{
-		if (isNull2)
-			compare = 0;		/* NULL "=" NULL */
-		else if (ssup->ssup_nulls_first)
-			compare = -1;		/* NULL "<" NOT_NULL */
-		else
-			compare = 1;		/* NULL ">" NOT_NULL */
-	}
-	else if (isNull2)
-	{
-		if (ssup->ssup_nulls_first)
-			compare = 1;		/* NOT_NULL ">" NULL */
-		else
-			compare = -1;		/* NOT_NULL "<" NULL */
-	}
-	else
-	{
-		compare = datum1 < datum2 ? -1 : datum1 > datum2 ? 1 : 0;
-		if (ssup->ssup_reverse)
-			INVERT_COMPARE_RESULT(compare);
-	}
-
-	return compare;
-}
-
-static inline int
-ApplySignedSortComparator(Datum datum1, bool isNull1,
-						  Datum datum2, bool isNull2,
-						  SortSupport ssup)
-{
-	int			compare;
-
-	if (isNull1)
-	{
-		if (isNull2)
-			compare = 0;		/* NULL "=" NULL */
-		else if (ssup->ssup_nulls_first)
-			compare = -1;		/* NULL "<" NOT_NULL */
-		else
-			compare = 1;		/* NULL ">" NOT_NULL */
-	}
-	else if (isNull2)
-	{
-		if (ssup->ssup_nulls_first)
-			compare = 1;		/* NOT_NULL ">" NULL */
-		else
-			compare = -1;		/* NOT_NULL "<" NULL */
-	}
-	else
-	{
-		compare = DatumGetInt64(datum1) < DatumGetInt64(datum2) ? -1 :
-			DatumGetInt64(datum1) > DatumGetInt64(datum2) ? 1 : 0;
-		if (ssup->ssup_reverse)
-			INVERT_COMPARE_RESULT(compare);
-	}
-
-	return compare;
-}
-
-static inline int
-ApplyInt32SortComparator(Datum datum1, bool isNull1,
-						 Datum datum2, bool isNull2,
-						 SortSupport ssup)
-{
-	int			compare;
-
-	if (isNull1)
-	{
-		if (isNull2)
-			compare = 0;		/* NULL "=" NULL */
-		else if (ssup->ssup_nulls_first)
-			compare = -1;		/* NULL "<" NOT_NULL */
-		else
-			compare = 1;		/* NULL ">" NOT_NULL */
-	}
-	else if (isNull2)
-	{
-		if (ssup->ssup_nulls_first)
-			compare = 1;		/* NOT_NULL ">" NULL */
-		else
-			compare = -1;		/* NOT_NULL "<" NULL */
-	}
-	else
-	{
-		compare = DatumGetInt32(datum1) < DatumGetInt32(datum2) ? -1 :
-			DatumGetInt32(datum1) > DatumGetInt32(datum2) ? 1 : 0;
-		if (ssup->ssup_reverse)
-			INVERT_COMPARE_RESULT(compare);
-	}
-
-	return compare;
-}
-
 /*
  * Apply a sort comparator function and return a 3-way comparison using full,
  * authoritative comparator.  This takes care of handling reverse-sort and
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 5fe229e211b..da68f45acf2 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -116,6 +116,7 @@ typedef struct
 	void	   *tuple;			/* the tuple itself */
 	Datum		datum1;			/* value of first key column */
 	bool		isnull1;		/* is first key column NULL? */
+	uint8		curbyte;		/* chunk of datum1 for current radix sort pass */
 	int			srctape;		/* source tape number */
 } SortTuple;
 
diff --git a/src/test/regress/expected/tuplesort.out b/src/test/regress/expected/tuplesort.out
index 6dd97e7427a..fc1321bf443 100644
--- a/src/test/regress/expected/tuplesort.out
+++ b/src/test/regress/expected/tuplesort.out
@@ -304,9 +304,9 @@ FROM abbrev_abort_uuids
 ORDER BY ctid DESC LIMIT 5;
   id   |           abort_increasing           |           abort_decreasing           |          noabort_increasing          |          noabort_decreasing          
 -------+--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------
-     0 |                                      |                                      |                                      | 
  20002 |                                      |                                      |                                      | 
  20003 |                                      |                                      |                                      | 
+     0 |                                      |                                      |                                      | 
  10009 | 00000000-0000-0000-0000-000000010008 | 00000000-0000-0000-0000-000000009992 | 00010008-0000-0000-0000-000000010008 | 00009992-0000-0000-0000-000000009992
  10008 | 00000000-0000-0000-0000-000000010007 | 00000000-0000-0000-0000-000000009993 | 00010007-0000-0000-0000-000000010007 | 00009993-0000-0000-0000-000000009993
 (5 rows)
@@ -335,9 +335,9 @@ FROM abbrev_abort_uuids
 ORDER BY ctid DESC LIMIT 5;
   id   |           abort_increasing           |           abort_decreasing           |          noabort_increasing          |          noabort_decreasing          
 -------+--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------
-     0 |                                      |                                      |                                      | 
- 20003 |                                      |                                      |                                      | 
  20002 |                                      |                                      |                                      | 
+ 20003 |                                      |                                      |                                      | 
+     0 |                                      |                                      |                                      | 
   9993 | 00000000-0000-0000-0000-000000009992 | 00000000-0000-0000-0000-000000010008 | 00009992-0000-0000-0000-000000009992 | 00010008-0000-0000-0000-000000010008
   9994 | 00000000-0000-0000-0000-000000009993 | 00000000-0000-0000-0000-000000010007 | 00009993-0000-0000-0000-000000009993 | 00010007-0000-0000-0000-000000010007
 (5 rows)
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9f5ee8fd482..4e494bdc860 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -4058,6 +4058,7 @@ qsort_comparator
 query_pathkeys_callback
 radius_attribute
 radius_packet
+RadixSortInfo
 rangeTableEntry_used_context
 rank_context
 rbt_allocfunc
-- 
2.53.0



  [text/x-patch] v7-0004-WIP-Call-verify_memtuples_sorted-after-qsort-for-.patch (1.5K, 4-v7-0004-WIP-Call-verify_memtuples_sorted-after-qsort-for-.patch)
  download | inline diff:
From 872819673cfd85153cc5e3b8482321a6dfb98f32 Mon Sep 17 00:00:00 2001
From: John Naylor <[email protected]>
Date: Tue, 10 Feb 2026 19:10:20 +0700
Subject: [PATCH v7 4/4] WIP Call verify_memtuples_sorted() after qsort for
 consistency

The regression test changes are from a user defined function that
is now called a couple more times.
---
 src/backend/utils/sort/tuplesort.c      | 1 +
 src/test/regress/expected/namespace.out | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index deb02eb5ee5..4bb7bcf7814 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3052,6 +3052,7 @@ tuplesort_sort_memtuples(Tuplesortstate *state)
 						state->base.comparetup,
 						state);
 		}
+		verify_memtuples_sorted(state);
 	}
 }
 
diff --git a/src/test/regress/expected/namespace.out b/src/test/regress/expected/namespace.out
index dbbda72d395..b78abdf21ea 100644
--- a/src/test/regress/expected/namespace.out
+++ b/src/test/regress/expected/namespace.out
@@ -149,6 +149,8 @@ NOTICE:  current search_path: pg_catalog, pg_temp
 NOTICE:  current search_path: pg_catalog, pg_temp
 NOTICE:  current search_path: pg_catalog, pg_temp
 NOTICE:  current search_path: pg_catalog, pg_temp
+NOTICE:  current search_path: pg_catalog, pg_temp
+NOTICE:  current search_path: pg_catalog, pg_temp
 REFRESH MATERIALIZED VIEW test_maint_search_path.test_maint_mv;
 NOTICE:  current search_path: pg_catalog, pg_temp
 NOTICE:  current search_path: pg_catalog, pg_temp
-- 
2.53.0



  [text/x-patch] v7-0003-WIP-Add-possible-message-changes.patch (1.2K, 5-v7-0003-WIP-Add-possible-message-changes.patch)
  download | inline diff:
From 7f2e95ca483ce34e26f22d9bf466ccf9f4182241 Mon Sep 17 00:00:00 2001
From: John Naylor <[email protected]>
Date: Tue, 10 Feb 2026 20:02:19 +0700
Subject: [PATCH v7 3/4] WIP Add possible message changes

Not sure if necessary so kept out of main commit
---
 src/backend/utils/sort/tuplesort.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 7b22546a811..deb02eb5ee5 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2243,7 +2243,7 @@ dumptuples(Tuplesortstate *state, bool alltuples)
 	state->currentRun++;
 
 	if (trace_sort)
-		elog(LOG, "worker %d starting quicksort of run %d: %s",
+		elog(LOG, "worker %d starting internal sort of run %d: %s",
 			 state->worker, state->currentRun,
 			 pg_rusage_show(&state->ru_start));
 
@@ -2254,7 +2254,7 @@ dumptuples(Tuplesortstate *state, bool alltuples)
 	tuplesort_sort_memtuples(state);
 
 	if (trace_sort)
-		elog(LOG, "worker %d finished quicksort of run %d: %s",
+		elog(LOG, "worker %d finished internal sort of run %d: %s",
 			 state->worker, state->currentRun,
 			 pg_rusage_show(&state->ru_start));
 
-- 
2.53.0



^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-02-11 10:25  =?ISO-8859-1?B?emVuZ21hbg==?= <[email protected]>
  parent: John Naylor <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: =?ISO-8859-1?B?emVuZ21hbg==?= @ 2026-02-11 10:25 UTC (permalink / raw)
  To: =?ISO-8859-1?B?Sm9obiBOYXlsb3I=?= <[email protected]>; +Cc: =?ISO-8859-1?B?cGdzcWwtaGFja2Vycw==?= <[email protected]>

>I did some more self-review after a couple weeks and made some more
>minor cosmetic adjustments for v7. I will commit 0001 and 0002 in a
>few days unless there are objections.

Hi John,

I'm wondering if we should replace `state->memtuples` with `data` in the `sort_byvalue_datum()` function here.
```
	if (nulls_first)
	{
		null_start = state->memtuples;
		null_count = d1;
		not_null_start = state->memtuples + d1;
		not_null_count = d2;
	}
	else
	{
		not_null_start = state->memtuples;
		not_null_count = d1;
		null_start = state->memtuples + d1;
		null_count = d2;
	}
```
If my understanding is wrong, please ignore this.

--
regards,
Man Zeng

^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-02-11 11:46  =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
  parent: =?ISO-8859-1?B?emVuZ21hbg==?= <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: =?utf-8?B?Y2NhNTUwNw==?= @ 2026-02-11 11:46 UTC (permalink / raw)
  To: =?utf-8?B?emVuZ21hbg==?= <[email protected]>; =?utf-8?B?Sm9obiBOYXlsb3I=?= <[email protected]>; +Cc: =?utf-8?B?cGdzcWwtaGFja2Vycw==?= <[email protected]>

Hi,

> I'm wondering if we should replace `state->memtuples` with `data` in the `sort_byvalue_datum()` function here.

+1. Now data == state->memtuples, how about remove the parameter "data" and "n" and just
get them from "Tuplesortstate"?

Some comments for v7:

1)

```
/*
 * Retrieve byte from datum, indexed by 'level': 0 for LSB, 7 for MSB
 */
static inline uint8
current_byte(Datum key, int level)
{
	int			shift = (SIZEOF_DATUM - 1 - level) * BITS_PER_BYTE;

	return (key >> shift) & 0xFF;
}
```

Maybe "0 for MSB, 7 for LSB"? If level == 0, this function will return the Most Significant Byte.

2) radix_sort_tuple()

```
		size_t		end_offset = partitions[*rp].next_offset;
		SortTuple  *partition_end = begin + end_offset;
		ptrdiff_t	num_elements = end_offset - start_offset;
```

Why the type of "num_elements" is "ptrdiff_t"? Maybe just "size_t"?

3) tuplesort_sort_memtuples()

```
		/*
		 * Do we have the leading column's value or abbreviation in datum1?
		 */
		if (state->base.haveDatum1 && state->base.sortKeys)
		{
			SortSupportData ssup = state->base.sortKeys[0];
```

I think we should avoid the copy of SortSupportData. We can just use a pointer.

4)

Many places just consider "Datum" as integer, do we need to add a DatumGetUInt**() for them?

--
Regards,
ChangAo Chen


^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-02-13 02:32  John Naylor <[email protected]>
  parent: =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: John Naylor @ 2026-02-13 02:32 UTC (permalink / raw)
  To: cca5507 <[email protected]>; +Cc: zengman <[email protected]>; pgsql-hackers <[email protected]>

On Wed, Feb 11, 2026 at 5:25 PM zengman <[email protected]> wrote:
> I'm wondering if we should replace `state->memtuples` with `data` in the `sort_byvalue_datum()` function here.
> ```
>         if (nulls_first)
>         {
>                 null_start = state->memtuples;

Sounds good.

On Wed, Feb 11, 2026 at 6:47 PM cca5507 <[email protected]> wrote:
> +1. Now data == state->memtuples, how about remove the parameter "data" and "n" and just
> get them from "Tuplesortstate"?

The parameter list currently looks like a sort function. I'm not sure
why you think that's objectionable?

> Some comments for v7:
>
> 1)
>
> ```
> /*
>  * Retrieve byte from datum, indexed by 'level': 0 for LSB, 7 for MSB
>  */
> static inline uint8
> current_byte(Datum key, int level)
> {
>         int                     shift = (SIZEOF_DATUM - 1 - level) * BITS_PER_BYTE;
>
>         return (key >> shift) & 0xFF;
> }
> ```
>
> Maybe "0 for MSB, 7 for LSB"? If level == 0, this function will return the Most Significant Byte.

Will fix, thanks.

> 2) radix_sort_tuple()
>
> ```
>                 size_t          end_offset = partitions[*rp].next_offset;
>                 SortTuple  *partition_end = begin + end_offset;
>                 ptrdiff_t       num_elements = end_offset - start_offset;
> ```
>
> Why the type of "num_elements" is "ptrdiff_t"? Maybe just "size_t"?

That's from upstream. It's pretty rare in our codebase for in-core
code, so I'll go ahead and change it.

> 3) tuplesort_sort_memtuples()
>
> ```
>                 /*
>                  * Do we have the leading column's value or abbreviation in datum1?
>                  */
>                 if (state->base.haveDatum1 && state->base.sortKeys)
>                 {
>                         SortSupportData ssup = state->base.sortKeys[0];
> ```
>
> I think we should avoid the copy of SortSupportData. We can just use a pointer.

Will do, for consistency.

> 4)
>
> Many places just consider "Datum" as integer, do we need to add a DatumGetUInt**() for them?

It already does that for the case where it specifically needs uint32.
For the general case, I don't think so. We can treat a Datum as an
unsigned integer and don't care about the actual size or what it
contains. There is one exception that I can see: 0002 has a call to
pg_leftmost_one_pos64(), so a macro for that would be more appropriate
than an implicit cast.

Now I'm wondering if 0002 should normalize the datum as it determines
the common prefix. That should only matter for int32 where the input
has a mix of positive and negative, but it could be worth the
additional calculation. I'll hold off on committing 0002 for a bit
longer, but 0001 (plus the changes that I've agreed to) is still on
schedule.

Also, looking at commit 6aebedc38, it's preferred to use
"sizeof(Datum)" rather than the vestigial SIZEOF_DATUM, so I'll change
that in 0001/2.

--
John Naylor
Amazon Web Services






^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-02-13 03:25  =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
  parent: John Naylor <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: =?utf-8?B?Y2NhNTUwNw==?= @ 2026-02-13 03:25 UTC (permalink / raw)
  To: =?utf-8?B?Sm9obiBOYXlsb3I=?= <[email protected]>; +Cc: =?utf-8?B?emVuZ21hbg==?= <[email protected]>; =?utf-8?B?cGdzcWwtaGFja2Vycw==?= <[email protected]>

Hi John,

One additional comment:

```
	/* presorted check */
	for (SortTuple *st = data + 1; st < data + n; st++)
	{
		CHECK_FOR_INTERRUPTS();

		if (COMPARETUP(state, st - 1, st) > 0)
		{
			presorted = false;
			break;
		}
	}
	if (presorted)
		return;
```

I think we need to add a comment to explain why we do the
check. The cost of this check is not small.

--
Regards,
ChangAo Chen


^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-02-15 08:40  John Naylor <[email protected]>
  parent: =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: John Naylor @ 2026-02-15 08:40 UTC (permalink / raw)
  To: cca5507 <[email protected]>; +Cc: zengman <[email protected]>; pgsql-hackers <[email protected]>

I've committed v7-0001 with the above review and a couple more
cosmetic adjustments. Most notably, I've never really liked the name
sort_byvalue_datum(). It really is the entry point to radix sort, and
it only diverts to qsort if after the NULL partitioning phase there
aren't enough non-null tuples to justify the overhead of radix sort.
So for better symmetry with qsort_tuple, I've renamed
sort_byvalue_datum() to radix_sort_tuple(), which then calls out to
now-named radix_sort_recursive(), which will call itself until it
completes or diverts. Another change down below:

On Fri, Feb 13, 2026 at 10:25 AM cca5507 <[email protected]> wrote:
> I think we need to add a comment to explain why we do the
> check. The cost of this check is not small.

(presorted check) The cost should only matter for pathological inputs,
and I haven't found it to matter much overall. I've left it
uncommented for now. However, while rebasing 0002 to deal with recent
review comments, I had an idea: In yesterday's commit, I moved the
presorted check down to just before invoking the actual radix sort.
That way, with the attached v8-0002 the common prefix detection is
done at the same time as the presorted check. That makes 0002 a
smaller patch and by doing both in the same for-loop it's easier to
read and can reduce the number of memory reads. We can consider more
commentary here, but the motivation do avoid unnecessary work should
be fairly obvious.

Next step : Test whether it's worth it for the common prefix detection
to use the normalized datum.

--
John Naylor
Amazon Web Services


Attachments:

  [text/x-patch] v8-0004-WIP-Call-verify_memtuples_sorted-after-qsort-for-.patch (1.5K, 2-v8-0004-WIP-Call-verify_memtuples_sorted-after-qsort-for-.patch)
  download | inline diff:
From 14df5854de4b73da8792c015859e8d7ecf09d4d0 Mon Sep 17 00:00:00 2001
From: John Naylor <[email protected]>
Date: Tue, 10 Feb 2026 19:10:20 +0700
Subject: [PATCH v8 4/4] WIP Call verify_memtuples_sorted() after qsort for
 consistency

The regression test changes are from a user defined function that
is now called a couple more times.
---
 src/backend/utils/sort/tuplesort.c      | 1 +
 src/test/regress/expected/namespace.out | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 629d598d8b7..dea3347667f 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -3033,6 +3033,7 @@ tuplesort_sort_memtuples(Tuplesortstate *state)
 						state->base.comparetup,
 						state);
 		}
+		verify_memtuples_sorted(state);
 	}
 }
 
diff --git a/src/test/regress/expected/namespace.out b/src/test/regress/expected/namespace.out
index dbbda72d395..b78abdf21ea 100644
--- a/src/test/regress/expected/namespace.out
+++ b/src/test/regress/expected/namespace.out
@@ -149,6 +149,8 @@ NOTICE:  current search_path: pg_catalog, pg_temp
 NOTICE:  current search_path: pg_catalog, pg_temp
 NOTICE:  current search_path: pg_catalog, pg_temp
 NOTICE:  current search_path: pg_catalog, pg_temp
+NOTICE:  current search_path: pg_catalog, pg_temp
+NOTICE:  current search_path: pg_catalog, pg_temp
 REFRESH MATERIALIZED VIEW test_maint_search_path.test_maint_mv;
 NOTICE:  current search_path: pg_catalog, pg_temp
 NOTICE:  current search_path: pg_catalog, pg_temp
-- 
2.53.0



  [text/x-patch] v8-0003-WIP-Add-possible-message-changes.patch (1.2K, 3-v8-0003-WIP-Add-possible-message-changes.patch)
  download | inline diff:
From a7dfc1c9fa535dfceac48f8e91ad2cb516872855 Mon Sep 17 00:00:00 2001
From: John Naylor <[email protected]>
Date: Tue, 10 Feb 2026 20:02:19 +0700
Subject: [PATCH v8 3/4] WIP Add possible message changes

Not sure if necessary so kept out of main commit
---
 src/backend/utils/sort/tuplesort.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 2203cfb725d..629d598d8b7 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2243,7 +2243,7 @@ dumptuples(Tuplesortstate *state, bool alltuples)
 	state->currentRun++;
 
 	if (trace_sort)
-		elog(LOG, "worker %d starting quicksort of run %d: %s",
+		elog(LOG, "worker %d starting internal sort of run %d: %s",
 			 state->worker, state->currentRun,
 			 pg_rusage_show(&state->ru_start));
 
@@ -2254,7 +2254,7 @@ dumptuples(Tuplesortstate *state, bool alltuples)
 	tuplesort_sort_memtuples(state);
 
 	if (trace_sort)
-		elog(LOG, "worker %d finished quicksort of run %d: %s",
+		elog(LOG, "worker %d finished internal sort of run %d: %s",
 			 state->worker, state->currentRun,
 			 pg_rusage_show(&state->ru_start));
 
-- 
2.53.0



  [text/x-patch] v8-0002-Detect-common-prefix-to-avoid-wasted-work-during-.patch (3.0K, 4-v8-0002-Detect-common-prefix-to-avoid-wasted-work-during-.patch)
  download | inline diff:
From ab8ba71b018423779289c442f05a369e373dc41d Mon Sep 17 00:00:00 2001
From: John Naylor <[email protected]>
Date: Sat, 14 Feb 2026 11:41:34 +0700
Subject: [PATCH v8 2/4] Detect common prefix to avoid wasted work during radix
 sort

Start radix sort at the most significant byte position that has more
than one distinct byte in the input. This skips passes where radix
sort would count the distinct bytes just to find only a single one,
in which case there is nothing further to do for that pass. This can
give a few percent speedup for integers that have some zero upper
bytes, which is common for those that didn't arrive via abbreviation.

Reviewed-by: Chengpeng Yan <[email protected]>
Discussion: https://postgr.es/m/CANWCAZYpGMDSSwAa18fOxJGXaPzVdyPsWpOkfCX32DWh3Qznzw@mail.gmail.com
---
 src/backend/utils/sort/tuplesort.c | 48 ++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 1fc440ea6ca..2203cfb725d 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -104,6 +104,7 @@
 #include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "port/pg_bitutils.h"
 #include "storage/shmem.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2909,28 +2910,65 @@ radix_sort_tuple(SortTuple *data, size_t n, Tuplesortstate *state)
 	}
 	else
 	{
+		int			common_prefix;
+		Datum		ref_datum;
+		Datum		common_upper_bits = 0;
 		bool		presorted = true;
 
+		Assert(not_null_count > 0);
+		ref_datum = not_null_start[0].datum1;
+
+		/* compute the common prefix of all datums */
 		for (SortTuple *st = not_null_start + 1;
 			 st < not_null_start + not_null_count;
 			 st++)
 		{
-			if (COMPARETUP(state, st - 1, st) > 0)
-			{
+			Datum		this_datum = st->datum1;
+
+			/*
+			 * Accumulate bits that represent a difference from the reference
+			 * datum.
+			 */
+			common_upper_bits |= ref_datum ^ this_datum;
+
+			/* do a presorted check while we're at it */
+			if (presorted && COMPARETUP(state, st - 1, st) > 0)
 				presorted = false;
-				break;
-			}
 
 			CHECK_FOR_INTERRUPTS();
 		}
 
 		if (presorted)
 			return;
+		else if (common_upper_bits == 0)
+		{
+			/*
+			 * All NOT NULL tuples have the same datum, so we can skip radix
+			 * sort. Sort using the tiebreak comparator if necessary.
+			 */
+			if (state->base.onlyKey == NULL)
+			{
+				qsort_tuple(not_null_start,
+							not_null_count,
+							state->base.comparetup_tiebreak,
+							state);
+			}
+		}
 		else
 		{
+			int			diffpos;
+
+			/*
+			 * The upper bits of common_upper_bits are zero where all datums
+			 * have the same bits. The byte position of the leftmost one bit
+			 * is the byte where radix sort should start.
+			 */
+			diffpos = pg_leftmost_one_pos64(DatumGetUInt64(common_upper_bits));
+			common_prefix = sizeof(Datum) - 1 - (diffpos / BITS_PER_BYTE);
+
 			radix_sort_recursive(not_null_start,
 								 not_null_count,
-								 0,
+								 common_prefix,
 								 state);
 		}
 	}
-- 
2.53.0



^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-03-30 05:57  John Naylor <[email protected]>
  parent: John Naylor <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: John Naylor @ 2026-03-30 05:57 UTC (permalink / raw)
  To: cca5507 <[email protected]>; +Cc: zengman <[email protected]>; pgsql-hackers <[email protected]>

I wrote:
> Next step : Test whether it's worth it for the common prefix detection
> to use the normalized datum.

This turned out to be a loser, but in the course of trying it a better
idea occurred to me. v8's prefix detection was really a special-case
optimization where the sort key is all non-negative integers (or all
negative, but that's not common). It's wasted work when the input is
mixed in sign, and for abbreviated keys. It's not much of a waste, but
we can do better.

v9 computes the common prefix during every recursion at the same time
we populate the SortTuple's current byte. That should be practically
free given a modest amount of instruction-level parallelism.

As a demo, I slightly modified the "p5" test (an adversarial case for
radix sort) from

https://www.postgresql.org/message-id/CANWCAZb0STJRAnn8fcVccQmektizNkgqBD_cg%2B0cs1%3Db4WWajQ%40mail...

...so that all bytes differ only in the least significant byte or in
the sign bit.

select setseed(0.935349);
with r (rand, i) as (select random(-100, 100), i
from generate_series(1,1_000_000,1) i )
insert into tsort
select case
 when r.rand < -95 or r.rand > 95
 then r.rand
 else 0 end
 from r;

vacuum freeze tsort ;

cat bench.sql
select * from tsort order by i offset 100000000;

pgbench -n -t 500 -f bench.sql | grep latency
(on pre-warmed buffers with work_mem 64MB)

master:
latency average = 99.663 ms
v8:
latency average = 102.040 ms
v9:
latency average = 84.856 ms

Here we can see that common prefix detection is useless in v8. It adds
work, but it's not worse than noise in this benchmark, so could be
disregarded. In v9 we proceed directly to performing a radix sort pass
on the most significant byte, which effectively partitions into signed
and unsigned. The next recursion step detects that all bytes are the
same and simultaneously calculates how far to skip so that the next
recursion is productive.

For the case of all non-negative integers:

select setseed(0.935349);
insert into tsort
select random(0, 100)
from generate_series(1,1_000_000,1) i;
vacuum freeze tsort ;

...in v9 the first pass computes the current byte from the normalized
datum and stores it, but this pass is unproductive. There was a risk
that this is more expensive than v8's up-front common prefix detection
because of memory writes. I only see noise-level differences:

master:
latency average = 98.725 ms
v8:
latency average = 75.572 ms
v9:
latency average = 76.635 ms

I intend to commit v9-0001 this week.

I've decided not to commit v8-0003 (message changes) from my last
message since it doesn't really add any value IMO. That can always be
revisited. I'm on the fence about v8-0004 (assert the correct order
also for qsort), but it seems good for consistency. I'd like to at
least continue asserting this for radix sort since it's new this
cycle. To save CI cycles etc, maybe we can eventually hide the check
behind a debug symbol.

-- 
John Naylor
Amazon Web Services


Attachments:

  [text/x-patch] v9-0001-Detect-common-prefixes-during-radix-sort.patch (3.8K, 2-v9-0001-Detect-common-prefixes-during-radix-sort.patch)
  download | inline diff:
From 6e94b0a260bc81248375c7d45568a52f17274bf9 Mon Sep 17 00:00:00 2001
From: John Naylor <[email protected]>
Date: Mon, 30 Mar 2026 12:26:58 +0700
Subject: [PATCH v9] Detect common prefixes during radix sort

During the counting step, keep track of the bits that are the same
for the entire input.  If we counted only a single distinct byte,
the next recursion will start at the next byte position that has
more than one distinct byte in the input. This allows us to skip over
multiple passes where the byte is the same for the entire input.

This provides a significant speedup for integers that have some upper
bytes with all-zeros or all-ones, which is common.

Reviewed-by: Chengpeng Yan <[email protected]>
Discussion: https://postgr.es/m/CANWCAZYpGMDSSwAa18fOxJGXaPzVdyPsWpOkfCX32DWh3Qznzw@mail.gmail.com
---
 src/backend/utils/sort/tuplesort.c | 44 +++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 1fc440ea6ca..72c2c2995d8 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -104,6 +104,7 @@
 #include "commands/tablespace.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
+#include "port/pg_bitutils.h"
 #include "storage/shmem.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2659,17 +2660,25 @@ radix_sort_recursive(SortTuple *begin, size_t n_elems, int level, Tuplesortstate
 	int			num_partitions = 0;
 	int			num_remaining;
 	SortSupport ssup = &state->base.sortKeys[0];
+	Datum		ref_datum;
+	Datum		common_upper_bits = 0;
 	size_t		start_offset = 0;
 	SortTuple  *partition_begin = begin;
+	int			next_level;
 
 	/* count number of occurrences of each byte */
+	ref_datum = normalize_datum(begin[0].datum1, ssup);
 	for (SortTuple *st = begin; st < begin + n_elems; st++)
 	{
+		Datum		this_datum;
 		uint8		this_partition;
 
+		this_datum = normalize_datum(st->datum1, ssup);
+		/* accumulate bits different from the reference datum */
+		common_upper_bits |= ref_datum ^ this_datum;
+
 		/* extract the byte for this level from the normalized datum */
-		this_partition = current_byte(normalize_datum(st->datum1, ssup),
-									  level);
+		this_partition = current_byte(this_datum, level);
 
 		/* save it for the permutation step */
 		st->curbyte = this_partition;
@@ -2747,6 +2756,33 @@ radix_sort_recursive(SortTuple *begin, size_t n_elems, int level, Tuplesortstate
 	}
 
 	/* recurse */
+
+	if (num_partitions == 1)
+	{
+		/*
+		 * There is only one distinct byte at the current level. It can happen
+		 * that some subsequent bytes are also the same for all input values,
+		 * such as the upper bytes of small integers. To skip unproductive
+		 * passes for that case, we compute the level where the input has more
+		 * than one distinct byte, so that the next recursion can start there.
+		 */
+		if (common_upper_bits == 0)
+			next_level = sizeof(Datum);
+		else
+		{
+			int			diffpos;
+
+			/*
+			 * The upper bits of common_upper_bits are zero where all datums
+			 * have the same bits.
+			 */
+			diffpos = pg_leftmost_one_pos64(DatumGetUInt64(common_upper_bits));
+			next_level = sizeof(Datum) - 1 - (diffpos / BITS_PER_BYTE);
+		}
+	}
+	else
+		next_level = level + 1;
+
 	for (uint8 *rp = remaining_partitions;
 		 rp < remaining_partitions + num_partitions;
 		 rp++)
@@ -2757,7 +2793,7 @@ radix_sort_recursive(SortTuple *begin, size_t n_elems, int level, Tuplesortstate
 
 		if (num_elements > 1)
 		{
-			if (level < sizeof(Datum) - 1)
+			if (next_level < sizeof(Datum))
 			{
 				if (num_elements < QSORT_THRESHOLD)
 				{
@@ -2770,7 +2806,7 @@ radix_sort_recursive(SortTuple *begin, size_t n_elems, int level, Tuplesortstate
 				{
 					radix_sort_recursive(partition_begin,
 										 num_elements,
-										 level + 1,
+										 next_level,
 										 state);
 				}
 			}
-- 
2.53.0



^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-03-31 04:36  =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
  parent: John Naylor <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: =?utf-8?B?Y2NhNTUwNw==?= @ 2026-03-31 04:36 UTC (permalink / raw)
  To: =?utf-8?B?Sm9obiBOYXlsb3I=?= <[email protected]>; +Cc: =?utf-8?B?emVuZ21hbg==?= <[email protected]>; =?utf-8?B?cGdzcWwtaGFja2Vycw==?= <[email protected]>

Hi John,

> This turned out to be a loser, but in the course of trying it a better
> idea occurred to me. v8's prefix detection was really a special-case
> optimization where the sort key is all non-negative integers (or all
> negative, but that's not common). It's wasted work when the input is
> mixed in sign, and for abbreviated keys. It's not much of a waste, but
> we can do better.
> 
> v9 computes the common prefix during every recursion at the same time
> we populate the SortTuple's current byte. That should be practically
> free given a modest amount of instruction-level parallelism.

+1 and v9-0001 LGTM.

--
Regards,
ChangAo Chen


^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-04-01 07:20  John Naylor <[email protected]>
  parent: =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: John Naylor @ 2026-04-01 07:20 UTC (permalink / raw)
  To: cca5507 <[email protected]>; +Cc: zengman <[email protected]>; pgsql-hackers <[email protected]>

On Tue, Mar 31, 2026 at 11:36 AM cca5507 <[email protected]> wrote:
> > v9 computes the common prefix during every recursion at the same time
> > we populate the SortTuple's current byte. That should be practically
> > free given a modest amount of instruction-level parallelism.
>
> +1 and v9-0001 LGTM.

Pushed, thanks for looking!

-- 
John Naylor
Amazon Web Services





^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-04-01 09:27  =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
  parent: John Naylor <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: =?utf-8?B?Y2NhNTUwNw==?= @ 2026-04-01 09:27 UTC (permalink / raw)
  To: =?utf-8?B?Sm9obiBOYXlsb3I=?= <[email protected]>; +Cc: =?utf-8?B?emVuZ21hbg==?= <[email protected]>; =?utf-8?B?cGdzcWwtaGFja2Vycw==?= <[email protected]>

Hi John,

How about adding an assertion here:

```
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 72c2c2995d8..bcc534136d7 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -2783,6 +2783,8 @@ radix_sort_recursive(SortTuple *begin, size_t n_elems, int level, Tuplesortstate
        else
                next_level = level + 1;
 
+       Assert(next_level > level);
+
        for (uint8 *rp = remaining_partitions;
                 rp < remaining_partitions + num_partitions;
                 rp++)
```

It can help us catch some unexpected behaviors.

--
Regards,
ChangAo Chen


^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-04-08 07:17  John Naylor <[email protected]>
  parent: =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: John Naylor @ 2026-04-08 07:17 UTC (permalink / raw)
  To: cca5507 <[email protected]>; +Cc: zengman <[email protected]>; pgsql-hackers <[email protected]>

On Wed, Apr 1, 2026 at 4:27 PM cca5507 <[email protected]> wrote:
>
> Hi John,
>
> How about adding an assertion here:

> +       Assert(next_level > level);
> +

Good idea.

I also thought we should change this cast:

        if (ssup->comparator == ssup_datum_signed_cmp)
        {
-               norm_datum1 = orig + ((uint64) PG_INT64_MAX) + 1;
+               norm_datum1 = orig + (Int64GetDatum(PG_INT64_MAX)) + 1;
        }

Upthread you mention something else about treating Datum as an
integer, but I'm not sure if this is what you meant since you didn't
say. If you have concrete suggestions, feel free to share them.

--
John Naylor
Amazon Web Services





^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-04-08 08:20  =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
  parent: John Naylor <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: =?utf-8?B?Y2NhNTUwNw==?= @ 2026-04-08 08:20 UTC (permalink / raw)
  To: =?utf-8?B?Sm9obiBOYXlsb3I=?= <[email protected]>; +Cc: =?utf-8?B?emVuZ21hbg==?= <[email protected]>; =?utf-8?B?cGdzcWwtaGFja2Vycw==?= <[email protected]>

> I also thought we should change this cast:
> 
>         if (ssup->comparator == ssup_datum_signed_cmp)
>         {
> -               norm_datum1 = orig + ((uint64) PG_INT64_MAX) + 1;
> +               norm_datum1 = orig + (Int64GetDatum(PG_INT64_MAX)) + 1;
>         }
> 
> Upthread you mention something else about treating Datum as an
> integer, but I'm not sure if this is what you meant since you didn't
> say. If you have concrete suggestions, feel free to share them.

I think we can keep it as is now. The idea of don't treat datum as an integer comes from:

https://www.postgresql.org/message-id/flat/8246d7ff-f4b7-4363-913e-827dadfeb145%40eisentraut.org

--
Regards,
ChangAo Chen


^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: tuple radix sort
@ 2026-04-29 09:24  John Naylor <[email protected]>
  parent: =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
  0 siblings, 0 replies; 15+ messages in thread

From: John Naylor @ 2026-04-29 09:24 UTC (permalink / raw)
  To: cca5507 <[email protected]>; +Cc: zengman <[email protected]>; pgsql-hackers <[email protected]>

On Wed, Apr 8, 2026 at 3:20 PM cca5507 <[email protected]> wrote:
>
> > I also thought we should change this cast:
> >
> >         if (ssup->comparator == ssup_datum_signed_cmp)
> >         {
> > -               norm_datum1 = orig + ((uint64) PG_INT64_MAX) + 1;
> > +               norm_datum1 = orig + (Int64GetDatum(PG_INT64_MAX)) + 1;
> >         }
> >
> > Upthread you mention something else about treating Datum as an
> > integer, but I'm not sure if this is what you meant since you didn't
> > say. If you have concrete suggestions, feel free to share them.
>
> I think we can keep it as is now.

It's harmless, but see commit ff89e182d.

I included the assert along with the above and some small cosmetic
fixes I've been saving.

--
John Naylor
Amazon Web Services





^ permalink  raw  reply  [nested|flat] 15+ messages in thread


end of thread, other threads:[~2026-04-29 09:24 UTC | newest]

Thread overview: 15+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-01-21 06:40 Re: tuple radix sort John Naylor <[email protected]>
2026-01-21 08:12 ` =?gb18030?B?emVuZ21hbg==?= <[email protected]>
2026-02-10 13:37   ` John Naylor <[email protected]>
2026-02-11 10:25     ` =?ISO-8859-1?B?emVuZ21hbg==?= <[email protected]>
2026-02-11 11:46       ` =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
2026-02-13 02:32         ` John Naylor <[email protected]>
2026-02-13 03:25           ` =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
2026-02-15 08:40             ` John Naylor <[email protected]>
2026-03-30 05:57               ` John Naylor <[email protected]>
2026-03-31 04:36                 ` =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
2026-04-01 07:20                   ` John Naylor <[email protected]>
2026-04-01 09:27                     ` =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
2026-04-08 07:17                       ` John Naylor <[email protected]>
2026-04-08 08:20                         ` =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
2026-04-29 09:24                           ` John Naylor <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox