RFR: 8266431: Dual-Pivot Quicksort improvements (Radix sort) [v2]
Vladimir Yaroslavskiy
duke at openjdk.org
Fri Sep 26 22:55:13 UTC 2025
On Fri, 26 Sep 2025 22:39:29 GMT, Vladimir Yaroslavskiy <duke at openjdk.org> wrote:
>> The goal of the PR is to improve both, sequential and parallel, sorting of primitives.
>>
>> **The main achievements**
>>
>> - introduced Radix in parallel sorting which shows several times boost of performance and has linear complexity instead of n*ln(n)
>> - improved mixed insertion sort (makes whole sorting faster)
>> - improved merging sort for almost sorted data
>> - optimized parallel sorting
>> - improved step for pivot candidates and pivot partitioning
>> - suggested better buffer allocation: if no memory, it is switched to in-place sorting with no OutOfMemoryError
>> - minor javadoc and comment changes
>>
>> - extended existing tests
>> - added tests for radix sort, heap sort, insertion sort
>> - added benchmarking JMH tests
>> - improved test coverage
>>
>> **The summary of benchmarking:**
>>
>> **Sequential sorting (Arrays.sort)**
>>
>> byte: up to 50% faster
>> char: 4-7 times faster
>> short: 2-6 times faster
>> int: 1.2-5 times faster
>> long: 1.2-5 times faster
>> float: 1.2-5 times faster
>> double: 1.2-4 times faster
>>
>> **Parallel sorting (Arrays.parallelSort)**
>>
>> int: 1.2-9 times faster
>> long: 1.2-9 times faster
>> float: 1.2-4 times faster
>> double: 1.2-4 times faster
>>
>> **AVX512 support**
>>
>> Vamsi Parasa suggested faster sort routines by taking advantage of AVX512 instructions, see https://github.com/openjdk/jdk/pull/14227, sources of sorting were modified. Therefore, I performed benchmarking of the final version (which includes optimizations by Vamsi Parasa and optimizations from my side) on a server with CPUs supported AVX512 instructions, no regression of performance was found, see detailed benchmarking results.
>>
>> **High-level chart showing how the actual sorting algorithm is selected
>> based on parallel/sequential and input size**
>>
>> **int/long/float/double**
>>
>> if size < MAX_INSERTION_SORT_SIZE(=51) => [ mixed ] insertion sort
>> if array is almost sorted and size > MIN_MERGING_SORT_SIZE(=512) => merging sort
>> if recursion depth > MAX_RECURSION_DEPTH(=64) => heap sort
>> if random data => two pivots quicksort, else => one pivot quicksort
>>
>> **byte**
>>
>> if size < MAX_INSERTION_SORT_SIZE(=51) => insertion sort
>> else => counting sort
>>
>> **char/short**
>>
>> if size > MIN_COUNTING_SORT_SIZE(640) => counting sort
>> if size < MAX_INSERTION_SORT_SIZE(=51) => insertion sort
>> if recursion depth > MAX_RECURSION_DEPTH(=64) => counting sort
>> if random data => two pivots quicksort, else => one pivot quicksort
>>
>> **parallel sorting (int/lo...
>
> Vladimir Yaroslavskiy has updated the pull request incrementally with one additional commit since the last revision:
>
> JDK-8266431: Dual-Pivot Quicksort improvements
>
> * Moved Radix sort out from sorting
I moved Radix sort out from sources and re-run benchmarking of parallel sorting.
We can see that new parallel sorting is still faster than existing version in JDK,
no degradation.
**Parallel sorting (without Radix sort)**
Benchmark | Data Type | Array Size | Baseline (us/op) | New version (us/op) | Speedup
-- | -- | -- | -- | -- | --
int | RANDOM | 600 | 12,852 | 8,507 | **1.51**
int | RANDOM | 3000 | 168,764 | 152,866 | **1.10**
int | RANDOM | 40000 | 987,272 | 647,195 | **1.53**
int | RANDOM | 800000 | 19120,539 | 13251,660 | **1.44**
int | RANDOM | 5000000 | 133598,983 | 90992,272 | **1.47**
int | SHUFFLE | 600 | 7,165 | 5,136 | **1.40**
int | SHUFFLE | 3000 | 56,363 | 39,100 | **1.44**
int | SHUFFLE | 40000 | 454,965 | 233,097 | **1.95**
int | SHUFFLE | 800000 | 7087,698 | 3153,941 | **2.25**
int | SHUFFLE | 5000000 | 32361,302 | 22618,240 | **1.43**
int | STAGGER | 600 | 12,499 | 2,351 | **5.32**
int | STAGGER | 3000 | 18,045 | 11,093 | **1.63**
int | STAGGER | 40000 | 177,705 | 91,101 | **1.95**
int | STAGGER | 800000 | 3735,779 | 1078,546 | **3.46**
int | STAGGER | 5000000 | 25432,573 | 9705,696 | **2.62**
int | REPEATED | 600 | 1,835 | 1,696 | **1.08**
int | REPEATED | 3000 | 19,594 | 16,612 | **1.18**
int | REPEATED | 40000 | 433,101 | 181,605 | **2.38**
int | REPEATED | 800000 | 8680,406 | 2206,406 | **3.93**
int | REPEATED | 5000000 | 46452,021 | 16940,200 | **2.74**
-- | -- | -- | -- | -- | --
long | RANDOM | 600 | 12,967 | 8,352 | **1.55**
long | RANDOM | 3000 | 170,926 | 150,999 | **1.13**
long | RANDOM | 40000 | 1048,627 | 666,233 | **1.57**
long | RANDOM | 800000 | 19057,260 | 13713,901 | **1.39**
long | RANDOM | 5000000 | 140089,309 | 93820,320 | **1.49**
long | SHUFFLE | 600 | 7,284 | 5,401 | **1.35**
long | SHUFFLE | 3000 | 54,013 | 39,516 | **1.37**
long | SHUFFLE | 40000 | 454,804 | 255,803 | **1.78**
long | SHUFFLE | 800000 | 7222,403 | 3670,639 | **1.97**
long | SHUFFLE | 5000000 | 33815,535 | 31767,738 | **1.06**
long | STAGGER | 600 | 12,821 | 2,311 | **5.55**
long | STAGGER | 3000 | 19,027 | 11,073 | **1.72**
long | STAGGER | 40000 | 188,118 | 111,978 | **1.68**
long | STAGGER | 800000 | 4763,922 | 1630,149 | **2.92**
long | STAGGER | 5000000 | 32556,841 | 19882,107 | **1.64**
long | REPEATED | 600 | 1,961 | 1,770 | **1.11**
long | REPEATED | 3000 | 19,678 | 9,766 | **2.01**
long | REPEATED | 40000 | 439,893 | 201,826 | **2.18**
long | REPEATED | 800000 | 8725,427 | 2817,883 | **3.10**
long | REPEATED | 5000000 | 47688,415 | 26917,131 | **1.77**
-- | -- | -- | -- | -- | --
float | RANDOM | 600 | 13,048 | 10,256 | **1.27**
float | RANDOM | 3000 | 175,047 | 167,967 | **1.04**
float | RANDOM | 40000 | 1089,859 | 740,659 | **1.47**
float | RANDOM | 800000 | 19919,657 | 15086,025 | **1.32**
float | RANDOM | 5000000 | 143704,590 | 105425,171 | **1.36**
float | SHUFFLE | 600 | 7,526 | 6,214 | **1.21**
float | SHUFFLE | 3000 | 54,785 | 49,150 | **1.11**
float | SHUFFLE | 40000 | 475,588 | 289,907 | **1.64**
float | SHUFFLE | 800000 | 7221,754 | 4124,313 | **1.75**
float | SHUFFLE | 5000000 | 35369,094 | 28281,396 | **1.25**
float | STAGGER | 600 | 13,956 | 2,936 | **4.75**
float | STAGGER | 3000 | 23,035 | 13,192 | **1.75**
float | STAGGER | 40000 | 230,380 | 133,386 | **1.73**
float | STAGGER | 800000 | 4614,573 | 1736,213 | **2.66**
float | STAGGER | 5000000 | 34053,504 | 14692,088 | **2.32**
float | REPEATED | 600 | 2,910 | 2,554 | **1.14**
float | REPEATED | 3000 | 40,965 | 16,641 | **2.46**
float | REPEATED | 40000 | 654,329 | 237,761 | **2.75**
float | REPEATED | 800000 | 13094,558 | 3234,352 | **4.05**
float | REPEATED | 5000000 | 71522,839 | 23208,431 | **3.08**
-- | -- | -- | -- | -- | --
double | RANDOM | 600 | 13,103 | 9,628 | **1.36**
double | RANDOM | 3000 | 174,936 | 168,750 | **1.04**
double | RANDOM | 40000 | 1094,699 | 780,827 | **1.40**
double | RANDOM | 800000 | 20205,605 | 15304,877 | **1.32**
double | RANDOM | 5000000 | 147620,455 | 107560,209 | **1.37**
double | SHUFFLE | 600 | 7,359 | 6,191 | **1.19**
double | SHUFFLE | 3000 | 55,208 | 48,141 | **1.15**
double | SHUFFLE | 40000 | 485,528 | 311,081 | **1.56**
double | SHUFFLE | 800000 | 7525,265 | 4723,622 | **1.59**
double | SHUFFLE | 5000000 | 38030,224 | 38159,739 | **1.00**
double | STAGGER | 600 | 14,019 | 3,120 | **4.49**
double | STAGGER | 3000 | 24,778 | 14,510 | **1.71**
double | STAGGER | 40000 | 241,340 | 151,728 | **1.59**
double | STAGGER | 800000 | 5367,318 | 2353,046 | **2.28**
double | STAGGER | 5000000 | 38446,590 | 25241,014 | **1.52**
double | REPEATED | 600 | 2,986 | 2,586 | **1.15**
double | REPEATED | 3000 | 32,133 | 16,018 | **2.01**
double | REPEATED | 40000 | 659,864 | 256,873 | **2.57**
double | REPEATED | 800000 | 13045,124 | 3823,682 | **3.41**
double | REPEATED | 5000000 | 75177,194 | 33156,316 | **2.27**
-------------
PR Comment: https://git.openjdk.org/jdk/pull/27411#issuecomment-3340719768
More information about the core-libs-dev
mailing list