public inbox for [email protected]  
help / color / mirror / Atom feed
From: Chao Li <[email protected]>
To: John Naylor <[email protected]>
Cc: Chengpeng Yan <[email protected]>
Cc: [email protected] <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Subject: Re: tuple radix sort
Date: Thu, 27 Nov 2025 15:48:58 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <CANWCAZZiMGj6QuHfyBPv9at7xn23NPAz4nit=G6fr26V+h8MKg@mail.gmail.com>
References: <CANWCAZYzx7a7E9AY16Jt_U3+GVKDADfgApZ-42SYNiig8dTnFA@mail.gmail.com>
	<CANWCAZbsx5VY-F5RVvZ5H1abBWcinMY90gNra6EbdGS1CTx=7g@mail.gmail.com>
	<CANWCAZaN8kW6o7Ymc_jzPO_Z5SqekQTTNra3xTGYTH+cjrVp8g@mail.gmail.com>
	<[email protected]>
	<CANWCAZZiMGj6QuHfyBPv9at7xn23NPAz4nit=G6fr26V+h8MKg@mail.gmail.com>

Hi John,

I did an initial test before, but I didn’t read the code at the time. Today, I spent time reviewing 0001. Overall, I believe the implementation is solid. I just got a few comments/suggestions.

> On Nov 26, 2025, at 21:11, John Naylor <[email protected]> wrote:
> 
> 
> For v5 I've also added CHECK_FOR_INTERRUPTS and rewrote some comments.
> 
> 
> --
> John Naylor
> Amazon Web Services
> <v5-0001-Use-radix-sort-when-SortTuple-contains-a-pass-by-.patch><v5-0004-Detect-common-prefix-to-avoid-wasted-work-during-.patch><v5-0003-WIP-make-some-regression-tests-sort-order-more-de.patch><v5-0002-WIP-Adjust-regression-tests.patch>


1 - 0001
```
+		/* extract the byte for this level from the normalized datum */
+		current_byte = extract_byte(normalize_datum(tup->datum1, ssup),
+									level);
+
+		/* save it for the permutation step */
+		tup->current_byte = current_byte;
```

We recompute normalize_datum(tup->datum1, ssup) for every tuple in every level, why don’t cache the result in SortTuple. As we have cached current_byte in SortTuple, it shouldn’t be a big deal to add one more field to it.

2 - 0001
```
+	while (num_remaining > 1)
+	{
+		/* start the count over */
+		num_remaining = num_partitions;
```

The swap loop always start the count over, so that sorted partitions are re-scanned as well. I think we can do an optimization like:

```
num_active = num_partitions;
while (num_active > 1)
{
    for (int i = 0; i < num_active; )
    {
        uint8 idx = remaining_partitions[i];
        // do the swaps for the partition …

        if (partitions[idx].offset == partitions[idx].next_offset)
        {
            remaining_partitions[i] = remaining_partitions[num_active - 1];
            num_active--;
        }
        else
            i++;
    }
}
```

This way we move out sorted partitions, so they will not be re-scanned.

3 - 0001

In sort_byvalue_datum, we can add a fast-path for all NULL and all NOT NULL cases, so that they won’t need to run the branchless cyclic permutation and the two “for” loops of assertions. Something like:

```
while (d1 < state->memtupcount && data[d1].isnull1 == nulls_first)
    d1++;

null_count = d1;
not_null_count = state->memtupcount - d1;

/* fast paths: all on one side */
if (null_count == 0 || not_null_count == 0)
{
    if (nulls_first)
    {
        null_start = data;
        not_null_start = data + null_count;
    }
    else
    {
        not_null_start = data;
        null_start = data + not_null_count;
    }

    /* only one partition is non-empty; sort it and return */
    if (not_null_count > 1)
    {
        if (not_null_count < QSORT_THRESHOLD)
            qsort_tuple(not_null_start, not_null_count, state->base.comparetup, state);
        else
            radix_sort_tuple(not_null_start, not_null_count, 0, state);
    }
    else if (null_count > 1 && state->base.onlyKey == NULL)
    {
        qsort_tuple(null_start, null_count, state->base.comparetup_tiebreak, state);
    }
    return;
}

/* ... existing branchless cyclic permutation ... */

```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









view thread (39+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: tuple radix sort
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox