RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v9]

Fri Sep 8 14:08:39 UTC 2023

On Fri, 1 Sep 2023 06:12:57 GMT, Alan Bateman <alanb at openjdk.org> wrote:

>> Hi team,
>> @AlanBateman, @rose00, @mbreinhold 
>> 
>> There are a big efforts now to improve sorting with x86_64 AVX512
>> https://github.com/openjdk/jdk/pull/14227, no changes of
>> Dual-Pivot Quicksort logic.
>> 
>> But at the same time this PR offers algorithmic improvements,
>> like optimized insertion sort, merging sort, better partitioning
>> and Radix sort for fully random data. Radix sort has linear complexity
>> instead of n*ln(n), much faster!. Also we handle properly situation if
>> no enough memory for additional buffer, now we will face with OOM.
>> 
>> Alan, you mentioned that DualPivotQuicksort will need detailed review.
>> Can we go ahead and start reviewing? Laurent checked performance,
>> JMH results look fine.
>> 
>> I think it would be logical to integrate new DualPivotQuicksort first,
>> and then apply AVX512 changes.
>
>> Alan, you mentioned that DualPivotQuicksort will need detailed review. Can we go ahead and start reviewing? Laurent checked performance, JMH results look fine.
> 
> As before, I think the main question with this change is whether adding radix sort to the mix is worth the complexity and additional code to maintain. Also as we discussed in the previous PR, the additional memory needed for the radix sort may have an effect on other things that are going on concurrently. I know it has been updated to handle OOME but I think potential reviewers would need to be comfortable with that part.

Hi Alan (@AlanBateman) and team,

I understand your doubts about Radix sort, let's me to provide my thoughts.

Code of Radix sort is about 10% of all code and very clear for understanding: scan digits + then place elements according histogram. Other sorting algorithms have more complexity (like merging sort, quicksort partitioning, heap sort). Therefore, I don't expect problems with maintaining.

Why Radix sort is important? Even Quicksort is very fast, but it relies on comparability of elements only and knows nothing about nature of elements to be sorted. Radix sort knows the structure of elements (32 or 64 bits) and therefore works faster (on random data). I was asked many times why we don't use Radix sort, so the idea is not new. Introducing of Radix sort is like using of counting sort for byte/short/char, the same approach.

Existing version of sort() will request additional memory, if tryMergingSort() detects already sorted subsequences. In this case Radix sort or Quicksort will not be invoked at all. Radix sort is invoked, if merging sort is not completed (and therefore, no allocated memory). Additional memory is requested by either merging sort or Radix sort, not by both algorithms on the same input. It means these algorithms never request double memory. Memory effect (merging sort and Radix sort) is not summarized. In case of parallel sorting the additional buffer allocated for parallel merge sort will be reused by tryMergingSort() as well as by Radix sort.

**Summary:** the size of total requested memory always equals to the size of part to be sorted (not more), no any duplicates.

Need to say that Radix sort is invoked not on all inputs, many inputs are covered by Quicksort or merging sort. If Quicksort suggests O(n*ln(n)), Radix sort shows linear, much better, O(n) time! As it was mentioned before, we added check against OOM, sorting will be completed successfully in any case. Existing implementation will fail with error, if there is no enough memory.

**In few words,** Radix sort is simple and it impacts not more than already introduced algorithms in JDK.

I encourage reviewers to look at this PR.

Thank you,
Vladimir

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1711730647