RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]
Vladimir Yaroslavskiy
duke at openjdk.org
Sun Jan 28 22:26:40 UTC 2024
On Fri, 26 Jan 2024 17:19:25 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:
>> Hello Vamsi (@vamsi-parasa),
>>
>> Could you please run the benchmarking of new DQPS in your environment with AVX?
>>
>> Take all classes below and put them in the package org.openjdk.bench.java.util.
>> ArraysSort class contains all tests for the new versions and ready to use.
>> (it will run all tests in one execution).
>>
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20s.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20p.java
>>
>> Many thanks,
>> Vladimir
>
> Hi Vladimir (@iaroslavski),
>
> Please see the JMH data below.
>
> Thanks,
> Vamsi
>
> Benchmark (builder) (size) Mode Cnt Score Error Units
> ArraysSort.Int.a15 RANDOM 600 avgt 4 7.096 ± 0.081 us/op
> ArraysSort.Int.a15 RANDOM 2000 avgt 4 44.014 ± 1.717 us/op
> ArraysSort.Int.a15 RANDOM 90000 avgt 4 4451.444 ± 71.168 us/op
> ArraysSort.Int.a15 RANDOM 400000 avgt 4 22751.966 ± 683.597 us/op
> ArraysSort.Int.a15 RANDOM 3000000 avgt 4 190326.306 ± 8008.512 us/op
> ArraysSort.Int.a15 REPEATED 600 avgt 4 1.044 ± 0.016 us/op
> ArraysSort.Int.a15 REPEATED 2000 avgt 4 2.272 ± 0.287 us/op
> ArraysSort.Int.a15 REPEATED 90000 avgt 4 412.331 ± 11.656 us/op
> ArraysSort.Int.a15 REPEATED 400000 avgt 4 1908.978 ± 30.241 us/op
> ArraysSort.Int.a15 REPEATED 3000000 avgt 4 15163.443 ± 100.425 us/op
> ArraysSort.Int.a15 STAGGER 600 avgt 4 1.055 ± 0.057 us/op
> ArraysSort.Int.a15 STAGGER 2000 avgt 4 3.408 ± 0.096 us/op
> ArraysSort.Int.a15 STAGGER 90000 avgt 4 149.220 ± 4.022 us/op
> ArraysSort.Int.a15 STAGGER 400000 avgt 4 663.096 ± 30.252 us/op
> ArraysSort.Int.a15 STAGGER 3000000 avgt 4 5206.890 ± 234.857 us/op
> ArraysSort.Int.a15 SHUFFLE 600 avgt 4 4.611 ± 0.118 us/op
> ArraysSort.Int.a15 SHUFFLE 2000 avgt 4 17.955 ± 0.356 us/op
> ArraysSort.Int.a15 SHUFFLE 90000 avgt 4 1410.357 ± 41.128 us/op
> ArraysSort.Int.a15 SHUFFLE 400000 avgt 4 5739.311 ± 128.270 us/op
> ArraysSort.Int.a15 SHUFFLE 3000000 avgt 4 41501.980 ± 829.443 us/op
> ArraysSort.Int.jdk RANDOM 600 avgt 4 1.612 ± 0.088 us/op
> ArraysSort.Int.jdk RANDOM 2000 avgt 4 6.893 ± 0.375 us/op
> ArraysSort.Int.jdk RANDOM 90000 avgt 4 522.749 ± 19.386 us/op
> ArraysSort.Int.jdk RANDOM 400000 avgt 4 2424.204 ± 63.844 us/op
> ArraysSort.Int.jdk RANDOM 3000000 avgt 4 21000.434 ± 801.315 us/op
> ArraysSort.Int.jdk REPEATED 600 avgt 4 0.496 ± 0.030 us/op
> ArraysSort.Int.jdk REPEATED 2000 avgt 4 1.037 ± 0.083 us/op
> ArraysSort.Int.jdk REPEATED 90000 avgt 4 57.763 ± 1.955 us/op
> ArraysSort.Int.jd...
Hi Vamsi (@vamsi-parasa), Laurent(@bourgesl),
The latest benchmarking compares compares the following versions:
jdk - direct call of Arrays.sort();
a15 - the current source of DualPivotQuicksort from the latest build (except renaming)
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/DualPivotQuicksort.java
https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
r20s - new version without Radix sort
r20p - new version with Radix sort in parallel case only
It is expected that timing of jdk and a15 should be more or less the same, but please look at the results:
Benchmark | Data Type | Array Size | Arrays.sort() from jdk | Current source (a15)
-- | -- | -- | -- | --
ArraysSort.Int.testSort | RANDOM | 600 | 1.612 | 7.096
ArraysSort.Int.testSort | RANDOM | 2000 | 6.893 | 44.014
ArraysSort.Int.testSort | RANDOM | 90000 | 522.749 | 4451.444
ArraysSort.Int.testSort | RANDOM | 400000 | 2424.204 | 22751.966
ArraysSort.Int.testSort | RANDOM | 3000000 | 21000.434 | 190326.306
ArraysSort.Int.testSort | REPEATED | 600 | 0.496 | 1.044
ArraysSort.Int.testSort | REPEATED | 2000 | 1.037 | 2.272
ArraysSort.Int.testSort | REPEATED | 90000 | 57.763 | 412.331
ArraysSort.Int.testSort | REPEATED | 400000 | 182.238 | 1908.978
ArraysSort.Int.testSort | REPEATED | 3000000 | 1708.082 | 15163.443
ArraysSort.Int.testSort | STAGGER | 600 | 1.038 | 1.055
ArraysSort.Int.testSort | STAGGER | 2000 | 3.434 | 3.408
ArraysSort.Int.testSort | STAGGER | 90000 | 148.638 | 149.220
ArraysSort.Int.testSort | STAGGER | 400000 | 663.076 | 663.096
ArraysSort.Int.testSort | STAGGER | 3000000 | 5212.821 | 5206.890
ArraysSort.Int.testSort | SHUFFLE | 600 | 1.926 | 4.611
ArraysSort.Int.testSort | SHUFFLE | 2000 | 6.858 | 17.955
ArraysSort.Int.testSort | SHUFFLE | 90000 | 473.441 | 1410.357
ArraysSort.Int.testSort | SHUFFLE | 400000 | 2153.779 | 5739.311
ArraysSort.Int.testSort | SHUFFLE | 3000000 | 18180.141 | 41501.980
You can see that a15 (current source) works extremly slower than Arrays.sort(), but the code is the same
with minor differences: renaming and repackaging (I put the class to the test package org.openjdk.bench.java.util).
How does other package org.openjdk.bench.java.util effect so much?
I use this pom.xml file https://github.com/iaroslavski/sorting/blob/master/radixsort/pom.xml and run as following script:
`mvn clean package`
`java --add-exports=java.base/jdk.internal.vm.annotation=ALL-UNNAMED --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED -jar target/benchmarks.jar org.openjdk.bench.java.util.ArraysSort`
It looks like intrinsics have not been applied. Check STAGGER inputs: you can see the same results because
STAGGER switches to merging sort wich doesn't contain intrinsics, but other inputs have different behaviour.
Therefore, I compare a15 (current sources) and the new version (r20p).
we can see the new version faster (only one anomaly on REPEATED, 2000).
Benchmark | Data Type | Array Size | Current source (a15) | New version (r20p) | Speedup
-- | -- | -- | -- | -- | --
ArraysSort.Int.testSort | RANDOM | 600 | 7.096 | 4.343 | 1.63
ArraysSort.Int.testSort | RANDOM | 2000 | 44.014 | 20.323 | 2.17
ArraysSort.Int.testSort | RANDOM | 90000 | 4451.444 | 4073.793 | 1.09
ArraysSort.Int.testSort | RANDOM | 400000 | 22751.966 | 19768.316 | 1.15
ArraysSort.Int.testSort | RANDOM | 3000000 | 190326.306 | 165296.770 | 1.15
ArraysSort.Int.testSort | REPEATED | 600 | 1.044 | 1.048 | 1.00
ArraysSort.Int.testSort | REPEATED | 2000 | 2.272 | 3.212 | 0.71
ArraysSort.Int.testSort | REPEATED | 90000 | 412.331 | 410.775 | 1.00
ArraysSort.Int.testSort | REPEATED | 400000 | 1908.978 | 1869.529 | 1.02
ArraysSort.Int.testSort | REPEATED | 3000000 | 15163.443 | 14302.261 | 1.06
ArraysSort.Int.testSort | STAGGER | 600 | 1.055 | 0.808 | 1.31
ArraysSort.Int.testSort | STAGGER | 2000 | 3.408 | 2.683 | 1.27
ArraysSort.Int.testSort | STAGGER | 90000 | 149.220 | 121.519 | 1.23
ArraysSort.Int.testSort | STAGGER | 400000 | 663.096 | 548.277 | 1.21
ArraysSort.Int.testSort | STAGGER | 3000000 | 5206.890 | 4496.204 | 1.16
ArraysSort.Int.testSort | SHUFFLE | 600 | 4.611 | 3.383 | 1.36
ArraysSort.Int.testSort | SHUFFLE | 2000 | 17.955 | 11.400 | 1.57
ArraysSort.Int.testSort | SHUFFLE | 90000 | 1410.357 | 1001.561 | 1.41
ArraysSort.Int.testSort | SHUFFLE | 400000 | 5739.311 | 4667.466 | 1.23
ArraysSort.Int.testSort | SHUFFLE | 3000000 | 41501.980 | 36284.178 | 1.14
**Parallel sort**
Benchmark | Data Type | Array Size | Current source (a15) | New version (r20p) | Speedup
-- | -- | -- | -- | -- | --
ArraysSort.Int.testParallelSort | RANDOM | 600 | 7.102 | 4.315 | 1.65
ArraysSort.Int.testParallelSort | RANDOM | 2000 | 43.708 | 20.322 | 2.15
ArraysSort.Int.testParallelSort | RANDOM | 90000 | 743.725 | 409.892 | 1.81
ArraysSort.Int.testParallelSort | RANDOM | 400000 | 3099.628 | 1322.129 | 2.34
ArraysSort.Int.testParallelSort | RANDOM | 3000000 | 25830.847 | 9790.138 | 2.64
ArraysSort.Int.testParallelSort | REPEATED | 600 | 1.042 | 1.033 | 1.01
ArraysSort.Int.testParallelSort | REPEATED | 2000 | 2.273 | 3.247 | 0.70
ArraysSort.Int.testParallelSort | REPEATED | 90000 | 204.887 | 184.565 | 1.11
ArraysSort.Int.testParallelSort | REPEATED | 400000 | 758.837 | 745.817 | 1.02
ArraysSort.Int.testParallelSort | REPEATED | 3000000 | 7635.330 | 5881.615 | 1.30
ArraysSort.Int.testParallelSort | STAGGER | 600 | 1.045 | 0.807 | 1.29
ArraysSort.Int.testParallelSort | STAGGER | 2000 | 3.410 | 2.733 | 1.25
ArraysSort.Int.testParallelSort | STAGGER | 90000 | 88.093 | 73.869 | 1.19
ArraysSort.Int.testParallelSort | STAGGER | 400000 | 218.848 | 208.361 | 1.05
ArraysSort.Int.testParallelSort | STAGGER | 3000000 | 2245.299 | 2122.314 | 1.06
ArraysSort.Int.testParallelSort | SHUFFLE | 600 | 4.594 | 3.386 | 1.36
ArraysSort.Int.testParallelSort | SHUFFLE | 2000 | 17.759 | 11.570 | 1.53
ArraysSort.Int.testParallelSort | SHUFFLE | 90000 | 293.962 | 146.273 | 2.01
ArraysSort.Int.testParallelSort | SHUFFLE | 400000 | 1017.350 | 766.287 | 1.33
ArraysSort.Int.testParallelSort | SHUFFLE | 3000000 | 7419.434 | 5917.997 | 1.25
Vamsi (@vamsi-parasa),
Could you please check your pom.xml and script to be sure that we use the same environment?
Are intrinsics not applied in package org.openjdk.bench.java.util?
Can you rerun benchmarking when classes a15, r20s and r20p are under package java.util?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1913741195
More information about the core-libs-dev
mailing list