RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]
himichael
duke at openjdk.org
Fri Oct 13 07:41:02 UTC 2023
On Thu, 5 Oct 2023 23:36:48 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>>
>>
>> **Arrays.sort performance data using JMH benchmarks for arrays with random data**
>>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 |
>> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 |
>> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 |
>> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 |
>> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 |
>> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 |
>> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** |
>> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** |
>> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** |
>> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 |
>> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 |
>> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 |
>> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 |
>> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 |
>> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 |
>> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** |
>> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** |
>> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** |
>> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 |
>> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 |
>> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 |
>> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 |
>> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 |
>> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 |
>> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** |
>> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** |
>> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** |
>> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 |
>> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 |
>> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 |
>> | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 |
>> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 |
>> | ArraysSort.longSort | 1000 | 10.449 | 6....
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:
>
> - fix code style and formatting
> - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort
> - Update CompileThresholdScaling only for the sort and partition intrinsics; update build script to remove nested if
> - change variable names of indexPivot* to pivotIndex*
> - Update DualPivotQuicksort.java
> - Rename arraySort and arrayPartition Java methods to sort and partition. Cleanup some comments
> - Remove the unnecessary exception in single pivot partitioning fallback method
> - Move functional interfaces close to the associated methods
> - Refactor the sort and partition intrinsics to accept method references for fallback functions
> - Refactor stub handling to use a generic function for all types
> - ... and 35 more: https://git.openjdk.org/jdk/compare/a1c9587c...a5262d86
> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>
> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>
> **Arrays.sort performance data using JMH benchmarks for arrays with random data**
>
> Arrays.sort benchmark Array Size Baseline (us/op) AVX512 Sort (us/op) Speedup
> ArraysSort.doubleSort 10 0.034 0.035 1.0
> ArraysSort.doubleSort 25 0.116 0.089 1.3
> ArraysSort.doubleSort 50 0.282 0.291 1.0
> ArraysSort.doubleSort 75 0.474 0.358 1.3
> ArraysSort.doubleSort 100 0.654 0.623 1.0
> ArraysSort.doubleSort 1000 9.274 6.331 1.5
> ArraysSort.doubleSort 10000 323.339 71.228 **4.5**
> ArraysSort.doubleSort 100000 4471.871 1002.748 **4.5**
> ArraysSort.doubleSort 1000000 51660.742 12921.295 **4.0**
> ArraysSort.floatSort 10 0.045 0.046 1.0
> ArraysSort.floatSort 25 0.103 0.084 1.2
> ArraysSort.floatSort 50 0.285 0.33 0.9
> ArraysSort.floatSort 75 0.492 0.346 1.4
> ArraysSort.floatSort 100 0.597 0.326 1.8
> ArraysSort.floatSort 1000 9.811 5.294 1.9
> ArraysSort.floatSort 10000 323.955 50.547 **6.4**
> ArraysSort.floatSort 100000 4326.38 731.152 **5.9**
> ArraysSort.floatSort 1000000 52413.88 8409.193 **6.2**
> ArraysSort.intSort 10 0.033 0.033 1.0
> ArraysSort.intSort 25 0.086 0.051 1.7
> ArraysSort.intSort 50 0.236 0.151 1.6
> ArraysSort.intSort 75 0.416 0.332 1.3
> ArraysSort.intSort 100 0.63 0.521 1.2
> ArraysSort.intSort 1000 10.518 4.698 2.2
> ArraysSort.intSort 10000 309.659 42.518 **7.3**
> ArraysSort.intSort 100000 4130.917 573.956 **7.2**
> ArraysSort.intSort 1000000 49876.307 6712.812 **7.4**
> ArraysSort.longSort 10 0.036 0.037 1.0
> ArraysSort.longSort 25 0.094 0.08 1.2
> ArraysSort.longSort 50 0.218 0.227 1.0
> ArraysSort.longSort 75 0.466 0.402 1.2
> ArraysSort.longSort 100 0.76 0.58 1.3
> ArraysSort.longSort 1000 10.449 6.239 1.7
> ArraysSort.longSort 10000 307.074 70.284 **4.4**
> ArraysSort.longSort 100000 4135.651 1049.12 **3.9**
> ArraysSort.longSort 1000000 49559.152 12537.53 **4.0**
> This is the first in the series of PRs related to vectorized sorting. Future planned steps after the integration of this PR:
>
> 1. Enabling AVX2 sort and partitioning for Linux ( based on Intel's x86-simd-sort [PR](https://github.com/intel/x86-simd-sort/pull/60)).
> 2. Enabling SIMD sort (both AVX512 and AVX2) for Short and Char arrays for length < 1750. (Arrays larger than 1750 already use countingSort which is faster than AVX512 sort.)
> 3. Adding Windows support for both AVX512 and AVX2 using assembly stubs.
> 4. Enabling SIMD sort (both AVX512 and AVX2) for MemorySegment.
>
> ### Progress
> * [x] Change must be properly reviewed (1 review required, with at least 1 [Reviewer](https://openjdk.org/bylaws#reviewer))
> * [x] Change must not contain extraneous whitespace
> * [x] Commit message must refer to an issue
>
> ### Issue
> * [JDK-8309130](https://bugs.openjdk.org/browse/JDK-8309130): x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) (**Enhancement** - P4)
>
> ### Reviewers
> * [Jatin Bhateja](https://openjdk.org/census#jbhateja) (@jatin-bhateja - **Reviewer**) ⚠️ Review applies to [e63a2aa0](https://git.openjdk.org/jdk/pull/14227/files/e63a2aa081275c3f1ed2ccc4315a60f304d18b34)
> * [Sandhya Viswanathan](https://openjdk.org/census#sviswanathan) (@sviswa7 - **Reviewer**) ⚠️ Review applies to [9642d852](https://git.openjdk.org/jdk/pull/14227/files/9642d852cce5a9cf8270b850c124ef38fc158c6d)
> * [Paul Sandoz](https://openjdk.org/census#psandoz) (@PaulSandoz - **Reviewer**) ⚠️ Review applies to [b04cb6c3](https://git.openjdk.org/jdk/pull/14227/files/b04cb6c3c41c7327f9dc67653e24b08693329e3e)
> * [Vladimir Kozlov](https://openjdk.org/census#kvn) (@vnkozlov - **Reviewer**)
>
> ### Reviewing
> Using `git`
> Using Skara CLI tools
> Using diff file
> ### Webrev
> [Link to Webrev Comment](https://git.openjdk.org/jdk/pull/14227#issuecomment-1568931001)
hello, I tested this feature in openjdk 22.19, but it didn't seem to work.
my test code is as follows:
public class JDKSort {
public static void main(String[] args) {
new JDKSort().sort();
}
public void sort() {
// 100 million pieces of data
int size = 100000000;
int[] arr = new int[size];
java.util.Random rand = new java.util.Random();
for (int i = 0; i < size; ++i) {
arr[i] = rand.nextInt();
}
long startTime = System.currentTimeMillis();
java.util.Arrays.sort(arr);
long elapseTime = System.currentTimeMillis() - startTime;
System.out.println("elapse time -> " + elapseTime + " ms");
}
}
test result:
- jdk8: 15079 ms
- jdk22: 11619 ms
this feature looks like it's merged into 22.19:
[x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays)](https://bugs.openjdk.org/browse/JDK-8309130)
but it doesn't seem to working, do I need to do anything?
-------------
PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1761008818
More information about the build-dev
mailing list