RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]
Quan Anh Mai
qamai at openjdk.org
Fri Oct 13 08:21:59 UTC 2023
On Fri, 13 Oct 2023 07:02:18 GMT, himichael <duke at openjdk.org> wrote:
>> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:
>>
>> - fix code style and formatting
>> - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort
>> - Update CompileThresholdScaling only for the sort and partition intrinsics; update build script to remove nested if
>> - change variable names of indexPivot* to pivotIndex*
>> - Update DualPivotQuicksort.java
>> - Rename arraySort and arrayPartition Java methods to sort and partition. Cleanup some comments
>> - Remove the unnecessary exception in single pivot partitioning fallback method
>> - Move functional interfaces close to the associated methods
>> - Refactor the sort and partition intrinsics to accept method references for fallback functions
>> - Refactor stub handling to use a generic function for all types
>> - ... and 35 more: https://git.openjdk.org/jdk/compare/a1c9587c...a5262d86
>
>> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>>
>> **Arrays.sort performance data using JMH benchmarks for arrays with random data**
>>
>> Arrays.sort benchmark Array Size Baseline (us/op) AVX512 Sort (us/op) Speedup
>> ArraysSort.doubleSort 10 0.034 0.035 1.0
>> ArraysSort.doubleSort 25 0.116 0.089 1.3
>> ArraysSort.doubleSort 50 0.282 0.291 1.0
>> ArraysSort.doubleSort 75 0.474 0.358 1.3
>> ArraysSort.doubleSort 100 0.654 0.623 1.0
>> ArraysSort.doubleSort 1000 9.274 6.331 1.5
>> ArraysSort.doubleSort 10000 323.339 71.228 **4.5**
>> ArraysSort.doubleSort 100000 4471.871 1002.748 **4.5**
>> ArraysSort.doubleSort 1000000 51660.742 12921.295 **4.0**
>> ArraysSort.floatSort 10 0.045 0.046 1.0
>> ArraysSort.floatSort 25 0.103 0.084 1.2
>> ArraysSort.floatSort 50 0.285 0.33 0.9
>> ArraysSort.floatSort 75 0.492 0.346 1.4
>> ArraysSort.floatSort 100 0.597 0.326 1.8
>> ArraysSort.floatSort 1000 9.811 5.294 1.9
>> ArraysSort.floatSort 10000 323.955 50.547 **6.4**
>> ArraysSort.floatSort 100000 4326.38 731.152 **5.9**
>> ArraysSort.floatSort 1000000 52413.88 8409.193 **6.2**
>> ArraysSort.intSort 10 0.033 0.033 1.0
>> ArraysSort.intSort 25 0.086 0.051 1.7
>> ArraysSort.intSort 50 0.236 0.151 1.6
>> ArraysSort.intSort 75 0.416 0.332 1.3
>> ArraysSort.intSort 100 0.63 0.521 1.2
>> ArraysSort.intSort 1000 10.518 4.698 2.2
>> ArraysSort.intSort 10000 309.659 42.518 **7.3**
>> ArraysSort.intSort 100000 4130.917 573.956 **7.2**
>> ArraysSort.intSort 1000000 49876.307 6712.812 **7.4**
>> ArraysSort.longSort 10 0.036 0.037 1.0
>> ArraysSort.longSort 25 0.094 0.08 1.2
>> ArraysSort.longSort 50 0.218 0.227 1.0
>> ArraysSort.longSort 75 0.466 0.402 1.2
>> ArraysSort.longSort 100 0.76 0.58 1.3
>> ArraysSort.longSort 1000 10.449 6.239 1.7
>> ArraysSort.longSort 10000 307.074 70.284 **4.4**
>> ArraysSort.longSort 100000 4135.651 1049.12 **3.9**
>> ArraysSort.longSort 1000000 49559.152 12537.53 **4.0**
>> This is the first in the series of PRs related to vectorized sorting. Future planned steps after the integration of this PR:
>>
>> 1. Enabling AVX2 sort and partitioning for Linux ( based on Intel's x86-simd-sort [PR](https://github.com/intel/x86-simd-sort/pull/60)).
>> 2...
@himichael Please refer to [this question](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) for how to correctly benchmark Java code.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1761104052
More information about the hotspot-compiler-dev
mailing list