RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v11]

Vladimir Yaroslavskiy duke at openjdk.org
Sun Jan 28 22:26:40 UTC 2024


On Fri, 26 Jan 2024 17:19:25 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:

>> Hello Vamsi (@vamsi-parasa),
>> 
>> Could you please run the benchmarking of new DQPS in your environment with AVX?
>> 
>> Take all classes below and put them in the package org.openjdk.bench.java.util.
>> ArraysSort class contains all tests for the new versions and ready to use.
>> (it will run all tests in one execution).
>> 
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20s.java
>> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20p.java
>> 
>> Many thanks,
>> Vladimir
>
> Hi Vladimir (@iaroslavski),
> 
> Please see the JMH data below.
> 
> Thanks,
> Vamsi
> 
> Benchmark              (builder)   (size)  Mode  Cnt       Score      Error  Units
> ArraysSort.Int.a15        RANDOM      600  avgt    4       7.096 ±    0.081  us/op
> ArraysSort.Int.a15        RANDOM     2000  avgt    4      44.014 ±    1.717  us/op
> ArraysSort.Int.a15        RANDOM    90000  avgt    4    4451.444 ±   71.168  us/op
> ArraysSort.Int.a15        RANDOM   400000  avgt    4   22751.966 ±  683.597  us/op
> ArraysSort.Int.a15        RANDOM  3000000  avgt    4  190326.306 ± 8008.512  us/op
> ArraysSort.Int.a15      REPEATED      600  avgt    4       1.044 ±    0.016  us/op
> ArraysSort.Int.a15      REPEATED     2000  avgt    4       2.272 ±    0.287  us/op
> ArraysSort.Int.a15      REPEATED    90000  avgt    4     412.331 ±   11.656  us/op
> ArraysSort.Int.a15      REPEATED   400000  avgt    4    1908.978 ±   30.241  us/op
> ArraysSort.Int.a15      REPEATED  3000000  avgt    4   15163.443 ±  100.425  us/op
> ArraysSort.Int.a15       STAGGER      600  avgt    4       1.055 ±    0.057  us/op
> ArraysSort.Int.a15       STAGGER     2000  avgt    4       3.408 ±    0.096  us/op
> ArraysSort.Int.a15       STAGGER    90000  avgt    4     149.220 ±    4.022  us/op
> ArraysSort.Int.a15       STAGGER   400000  avgt    4     663.096 ±   30.252  us/op
> ArraysSort.Int.a15       STAGGER  3000000  avgt    4    5206.890 ±  234.857  us/op
> ArraysSort.Int.a15       SHUFFLE      600  avgt    4       4.611 ±    0.118  us/op
> ArraysSort.Int.a15       SHUFFLE     2000  avgt    4      17.955 ±    0.356  us/op
> ArraysSort.Int.a15       SHUFFLE    90000  avgt    4    1410.357 ±   41.128  us/op
> ArraysSort.Int.a15       SHUFFLE   400000  avgt    4    5739.311 ±  128.270  us/op
> ArraysSort.Int.a15       SHUFFLE  3000000  avgt    4   41501.980 ±  829.443  us/op
> ArraysSort.Int.jdk        RANDOM      600  avgt    4       1.612 ±    0.088  us/op
> ArraysSort.Int.jdk        RANDOM     2000  avgt    4       6.893 ±    0.375  us/op
> ArraysSort.Int.jdk        RANDOM    90000  avgt    4     522.749 ±   19.386  us/op
> ArraysSort.Int.jdk        RANDOM   400000  avgt    4    2424.204 ±   63.844  us/op
> ArraysSort.Int.jdk        RANDOM  3000000  avgt    4   21000.434 ±  801.315  us/op
> ArraysSort.Int.jdk      REPEATED      600  avgt    4       0.496 ±    0.030  us/op
> ArraysSort.Int.jdk      REPEATED     2000  avgt    4       1.037 ±    0.083  us/op
> ArraysSort.Int.jdk      REPEATED    90000  avgt    4      57.763 ±    1.955  us/op
> ArraysSort.Int.jd...

Hi Vamsi (@vamsi-parasa), Laurent(@bourgesl),

The latest benchmarking compares compares the following versions:
jdk - direct call of Arrays.sort();
a15 - the current source of DualPivotQuicksort from the latest build (except renaming)
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/DualPivotQuicksort.java
https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java
r20s - new version without Radix sort
r20p - new version with Radix sort in parallel case only

It is expected that timing of jdk and a15 should be more or less the same, but please look at the results:

Benchmark | Data Type | Array Size | Arrays.sort() from jdk | Current source (a15)
-- | -- | -- | -- | --
ArraysSort.Int.testSort | RANDOM   |     600 | 1.612     | 7.096
ArraysSort.Int.testSort | RANDOM   |    2000 | 6.893     | 44.014
ArraysSort.Int.testSort | RANDOM   |   90000 | 522.749   | 4451.444
ArraysSort.Int.testSort | RANDOM   |  400000 | 2424.204  | 22751.966
ArraysSort.Int.testSort | RANDOM   | 3000000 | 21000.434 | 190326.306
ArraysSort.Int.testSort | REPEATED |     600 | 0.496     | 1.044
ArraysSort.Int.testSort | REPEATED |    2000 | 1.037     | 2.272
ArraysSort.Int.testSort | REPEATED |   90000 | 57.763    | 412.331
ArraysSort.Int.testSort | REPEATED |  400000 | 182.238   | 1908.978
ArraysSort.Int.testSort | REPEATED | 3000000 | 1708.082  | 15163.443
ArraysSort.Int.testSort | STAGGER  |     600 | 1.038     | 1.055
ArraysSort.Int.testSort | STAGGER  |    2000 | 3.434     | 3.408
ArraysSort.Int.testSort | STAGGER  |   90000 | 148.638   | 149.220
ArraysSort.Int.testSort | STAGGER  |  400000 | 663.076   | 663.096
ArraysSort.Int.testSort | STAGGER  | 3000000 | 5212.821  | 5206.890
ArraysSort.Int.testSort | SHUFFLE  |     600 | 1.926     | 4.611
ArraysSort.Int.testSort | SHUFFLE  |    2000 | 6.858     | 17.955
ArraysSort.Int.testSort | SHUFFLE  |   90000 | 473.441   | 1410.357
ArraysSort.Int.testSort | SHUFFLE  |  400000 | 2153.779  | 5739.311
ArraysSort.Int.testSort | SHUFFLE  | 3000000 | 18180.141 | 41501.980

You can see that a15 (current source) works extremly slower than Arrays.sort(), but the code is the same
with minor differences: renaming and repackaging (I put the class to the test package org.openjdk.bench.java.util).
How does other package org.openjdk.bench.java.util effect so much?

I use this pom.xml file https://github.com/iaroslavski/sorting/blob/master/radixsort/pom.xml and run as following script:

`mvn clean package`
`java --add-exports=java.base/jdk.internal.vm.annotation=ALL-UNNAMED --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED -jar target/benchmarks.jar org.openjdk.bench.java.util.ArraysSort`

It looks like intrinsics have not been applied. Check STAGGER inputs: you can see the same results because
STAGGER switches to merging sort wich doesn't contain intrinsics, but other inputs have different behaviour.

Therefore, I compare a15 (current sources) and the new version (r20p).
we can see the new version faster (only one anomaly on REPEATED, 2000).

Benchmark | Data Type | Array Size | Current source (a15) | New version (r20p) | Speedup
-- | -- | -- | -- | -- | --
ArraysSort.Int.testSort | RANDOM   |     600 | 7.096      | 4.343      | 1.63
ArraysSort.Int.testSort | RANDOM   |    2000 | 44.014     | 20.323     | 2.17
ArraysSort.Int.testSort | RANDOM   |   90000 | 4451.444   | 4073.793   | 1.09
ArraysSort.Int.testSort | RANDOM   |  400000 | 22751.966  | 19768.316  | 1.15
ArraysSort.Int.testSort | RANDOM   | 3000000 | 190326.306 | 165296.770 | 1.15
ArraysSort.Int.testSort | REPEATED |     600 | 1.044      | 1.048      | 1.00
ArraysSort.Int.testSort | REPEATED |    2000 | 2.272      | 3.212      | 0.71
ArraysSort.Int.testSort | REPEATED |   90000 | 412.331    | 410.775    | 1.00
ArraysSort.Int.testSort | REPEATED |  400000 | 1908.978   | 1869.529   | 1.02
ArraysSort.Int.testSort | REPEATED | 3000000 | 15163.443  | 14302.261  | 1.06
ArraysSort.Int.testSort | STAGGER  |     600 | 1.055      | 0.808      | 1.31
ArraysSort.Int.testSort | STAGGER  |    2000 | 3.408      | 2.683      | 1.27
ArraysSort.Int.testSort | STAGGER  |   90000 | 149.220    | 121.519    | 1.23
ArraysSort.Int.testSort | STAGGER  |  400000 | 663.096    | 548.277    | 1.21
ArraysSort.Int.testSort | STAGGER  | 3000000 | 5206.890   | 4496.204   | 1.16
ArraysSort.Int.testSort | SHUFFLE  |     600 | 4.611      | 3.383      | 1.36
ArraysSort.Int.testSort | SHUFFLE  |    2000 | 17.955     | 11.400     | 1.57
ArraysSort.Int.testSort | SHUFFLE  |   90000 | 1410.357   | 1001.561   | 1.41
ArraysSort.Int.testSort | SHUFFLE  |  400000 | 5739.311   | 4667.466   | 1.23
ArraysSort.Int.testSort | SHUFFLE  | 3000000 | 41501.980  | 36284.178  | 1.14

**Parallel sort**

Benchmark | Data Type | Array Size | Current source (a15) | New version (r20p) | Speedup
-- | -- | -- | -- | -- | --
ArraysSort.Int.testParallelSort | RANDOM   |     600 | 7.102     | 4.315    | 1.65
ArraysSort.Int.testParallelSort | RANDOM   |    2000 | 43.708    | 20.322   | 2.15
ArraysSort.Int.testParallelSort | RANDOM   |   90000 | 743.725   | 409.892  | 1.81
ArraysSort.Int.testParallelSort | RANDOM   |  400000 | 3099.628  | 1322.129 | 2.34
ArraysSort.Int.testParallelSort | RANDOM   | 3000000 | 25830.847 | 9790.138 | 2.64
ArraysSort.Int.testParallelSort | REPEATED |     600 | 1.042     | 1.033    | 1.01
ArraysSort.Int.testParallelSort | REPEATED |    2000 | 2.273     | 3.247    | 0.70
ArraysSort.Int.testParallelSort | REPEATED |   90000 | 204.887   | 184.565  | 1.11
ArraysSort.Int.testParallelSort | REPEATED |  400000 | 758.837   | 745.817  | 1.02
ArraysSort.Int.testParallelSort | REPEATED | 3000000 | 7635.330  | 5881.615 | 1.30
ArraysSort.Int.testParallelSort | STAGGER  |     600 | 1.045     | 0.807    | 1.29
ArraysSort.Int.testParallelSort | STAGGER  |    2000 | 3.410     | 2.733    | 1.25
ArraysSort.Int.testParallelSort | STAGGER  |   90000 | 88.093    | 73.869   | 1.19
ArraysSort.Int.testParallelSort | STAGGER  |  400000 | 218.848   | 208.361  | 1.05
ArraysSort.Int.testParallelSort | STAGGER  | 3000000 | 2245.299  | 2122.314 | 1.06
ArraysSort.Int.testParallelSort | SHUFFLE  |     600 | 4.594     | 3.386    | 1.36
ArraysSort.Int.testParallelSort | SHUFFLE  |    2000 | 17.759    | 11.570   | 1.53
ArraysSort.Int.testParallelSort | SHUFFLE  |   90000 | 293.962   | 146.273  | 2.01
ArraysSort.Int.testParallelSort | SHUFFLE  |  400000 | 1017.350  | 766.287  | 1.33
ArraysSort.Int.testParallelSort | SHUFFLE  | 3000000 | 7419.434  | 5917.997 | 1.25

Vamsi (@vamsi-parasa),
Could you please check your pom.xml and script to be sure that we use the same environment?
Are intrinsics not applied in package org.openjdk.bench.java.util?
Can you rerun benchmarking when classes a15, r20s and r20p are under package java.util?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1913741195


More information about the core-libs-dev mailing list