Re: tuple radix sort - David Geier

public inbox for [email protected]  
help / color / mirror / Atom feed

From: David Geier <[email protected]>
To: John Naylor <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Peter Geoghegan <[email protected]>
Subject: Re: tuple radix sort
Date: Fri, 14 Nov 2025 19:04:57 +0100
Message-ID: <[email protected]> (raw)
In-Reply-To: <CANWCAZbwX5wb+pCU8hTQYonDAqBhQt5ZH7EP8cQL7O107fYoKg@mail.gmail.com>
References: <CANWCAZYzx7a7E9AY16Jt_U3+GVKDADfgApZ-42SYNiig8dTnFA@mail.gmail.com>
	<[email protected]>
	<CANWCAZbwX5wb+pCU8hTQYonDAqBhQt5ZH7EP8cQL7O107fYoKg@mail.gmail.com>

Hi John!

On 13.11.2025 05:01, John Naylor wrote:
> If that's the case then I suggest first seeing if dfd8e6c73ee made
> things any worse. A simpler possible improvement is to use a similar
> normalization step for the chars, if needed, then do the sort and
> quinique with a specialization for unsigned chars. (We don't yet
> specialize qunique, but that can be remedied). If you're interested,
> please start a separate thread for that.

It did but only a bit. I worked around it by having two sort
specializations, one for signed and one for unsigned. I also wanted to
try to use a hash table to filter out duplicates and then only sort the
remaining unique trigams, which are, most of the times, a lot less.

Generally speaking, the GIN code is death by a thousand cuts. I've got a
patch coming up that cuts CREATE INDEX runtime in half for columns with
relatively short strings and yields even better results for columns with
longer strings. But that's not only changing the sort but requires a few
 changes in a couple of places. More details in the upcoming thread.

I thought qunique() is already pretty optimal because it's defined in a
header file. I believe that even the comparator gets inlined. What would
be useful though is if qunique() used an equality comparator which only
returns true/false instead of a sort comparator. In the GIN code this
also shaved off a few percent. I'll take a closer look at qunique() at
open a thread with the findings / ideas for changes.

Anyways. In this context GIN was just one example for where a generic
radix sort would be useful and there are certainly more.

> 
> That's moving the goalposts too far IMO. I want to get to a place
> where I feel comfortable with the decisions made, and that already
> requires a lot of testing. Also, I don't want to risk introducing
> abstractions that make future improvements to tuplesort more
> cumbersome.

On a quick glance it looks like you didn't specialize much. So the
testing seems related to if the new algo introduces regressions, not if
the abstraction would cause problems. So it should be possible to
extract out the code fairly easily without invalidating your existing
benchmark results.

I understand that you want to make progress with the use case at hand
but I feel like we're missing out on a lot of opportunity where the
introduced code would also be very beneficial. Beyond that we could
nicely test the new sort code in the spirit of test_rbtree.c and
friends. Maybe you want to give it a 2nd thought.

--
David Geier

view thread (39+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: tuple radix sort
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox