RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

himichael duke at openjdk.org
Fri Oct 13 07:41:02 UTC 2023


On Thu, 5 Oct 2023 23:36:48 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random data** 
>> 
>> |	Arrays.sort benchmark	|	Array Size	|	Baseline (us/op)	|	AVX512 Sort (us/op)	|	Speedup	|
>> |	---	|	---	|	---	|	---	|	---	|
>> |	ArraysSort.doubleSort	|	10	|	0.034	|	0.035	|	1.0	|
>> |	ArraysSort.doubleSort	|	25	|	0.116	|	0.089	|	1.3	|
>> |	ArraysSort.doubleSort	|	50	|	0.282	|	0.291	|	1.0	|
>> |	ArraysSort.doubleSort	|	75	|	0.474	|	0.358	|	1.3	|
>> |	ArraysSort.doubleSort	|	100	|	0.654	|	0.623	|	1.0	|
>> |	ArraysSort.doubleSort	|	1000	|	9.274	|	6.331	|	1.5	|
>> |	ArraysSort.doubleSort	|	10000	|	323.339	|	71.228	|	**4.5**	|
>> |	ArraysSort.doubleSort	|	100000	|	4471.871	|	1002.748	|	**4.5**	|
>> |	ArraysSort.doubleSort	|	1000000	|	51660.742	|	12921.295	|	**4.0**	|
>> |	ArraysSort.floatSort	|	10	|	0.045	|	0.046	|	1.0	|
>> |	ArraysSort.floatSort	|	25	|	0.103	|	0.084	|	1.2	|
>> |	ArraysSort.floatSort	|	50	|	0.285	|	0.33	|	0.9	|
>> |	ArraysSort.floatSort	|	75	|	0.492	|	0.346	|	1.4	|
>> |	ArraysSort.floatSort	|	100	|	0.597	|	0.326	|	1.8	|
>> |	ArraysSort.floatSort	|	1000	|	9.811	|	5.294	|	1.9	|
>> |	ArraysSort.floatSort	|	10000	|	323.955	|	50.547	|	**6.4**	|
>> |	ArraysSort.floatSort	|	100000	|	4326.38	|	731.152	|	**5.9**	|
>> |	ArraysSort.floatSort	|	1000000	|	52413.88	|	8409.193	|	**6.2**	|
>> |	ArraysSort.intSort	|	10	|	0.033	|	0.033	|	1.0	|
>> |	ArraysSort.intSort	|	25	|	0.086	|	0.051	|	1.7	|
>> |	ArraysSort.intSort	|	50	|	0.236	|	0.151	|	1.6	|
>> |	ArraysSort.intSort	|	75	|	0.416	|	0.332	|	1.3	|
>> |	ArraysSort.intSort	|	100	|	0.63	|	0.521	|	1.2	|
>> |	ArraysSort.intSort	|	1000	|	10.518	|	4.698	|	2.2	|
>> |	ArraysSort.intSort	|	10000	|	309.659	|	42.518	|	**7.3**	|
>> |	ArraysSort.intSort	|	100000	|	4130.917	|	573.956	|	**7.2**	|
>> |	ArraysSort.intSort	|	1000000	|	49876.307	|	6712.812	|	**7.4**	|
>> |	ArraysSort.longSort	|	10	|	0.036	|	0.037	|	1.0	|
>> |	ArraysSort.longSort	|	25	|	0.094	|	0.08	|	1.2	|
>> |	ArraysSort.longSort	|	50	|	0.218	|	0.227	|	1.0	|
>> |	ArraysSort.longSort	|	75	|	0.466	|	0.402	|	1.2	|
>> |	ArraysSort.longSort	|	100	|	0.76	|	0.58	|	1.3	|
>> |	ArraysSort.longSort	|	1000	|	10.449	|	6....
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits:
> 
>  - fix code style and formatting
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort
>  - Update CompileThresholdScaling only for the sort and partition intrinsics; update build script to remove nested if
>  - change variable names of indexPivot* to pivotIndex*
>  - Update DualPivotQuicksort.java
>  - Rename arraySort and arrayPartition Java methods to sort and partition. Cleanup some comments
>  - Remove the unnecessary exception in single pivot partitioning fallback method
>  - Move functional interfaces close to the associated methods
>  - Refactor the sort and partition intrinsics to accept method references for fallback functions
>  - Refactor stub handling to use a generic function for all types
>  - ... and 35 more: https://git.openjdk.org/jdk/compare/a1c9587c...a5262d86

> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
> 
> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
> 
> **Arrays.sort performance data using JMH benchmarks for arrays with random data**
> 
> Arrays.sort benchmark	Array Size	Baseline (us/op)	AVX512 Sort (us/op)	Speedup
> ArraysSort.doubleSort	10	0.034	0.035	1.0
> ArraysSort.doubleSort	25	0.116	0.089	1.3
> ArraysSort.doubleSort	50	0.282	0.291	1.0
> ArraysSort.doubleSort	75	0.474	0.358	1.3
> ArraysSort.doubleSort	100	0.654	0.623	1.0
> ArraysSort.doubleSort	1000	9.274	6.331	1.5
> ArraysSort.doubleSort	10000	323.339	71.228	**4.5**
> ArraysSort.doubleSort	100000	4471.871	1002.748	**4.5**
> ArraysSort.doubleSort	1000000	51660.742	12921.295	**4.0**
> ArraysSort.floatSort	10	0.045	0.046	1.0
> ArraysSort.floatSort	25	0.103	0.084	1.2
> ArraysSort.floatSort	50	0.285	0.33	0.9
> ArraysSort.floatSort	75	0.492	0.346	1.4
> ArraysSort.floatSort	100	0.597	0.326	1.8
> ArraysSort.floatSort	1000	9.811	5.294	1.9
> ArraysSort.floatSort	10000	323.955	50.547	**6.4**
> ArraysSort.floatSort	100000	4326.38	731.152	**5.9**
> ArraysSort.floatSort	1000000	52413.88	8409.193	**6.2**
> ArraysSort.intSort	10	0.033	0.033	1.0
> ArraysSort.intSort	25	0.086	0.051	1.7
> ArraysSort.intSort	50	0.236	0.151	1.6
> ArraysSort.intSort	75	0.416	0.332	1.3
> ArraysSort.intSort	100	0.63	0.521	1.2
> ArraysSort.intSort	1000	10.518	4.698	2.2
> ArraysSort.intSort	10000	309.659	42.518	**7.3**
> ArraysSort.intSort	100000	4130.917	573.956	**7.2**
> ArraysSort.intSort	1000000	49876.307	6712.812	**7.4**
> ArraysSort.longSort	10	0.036	0.037	1.0
> ArraysSort.longSort	25	0.094	0.08	1.2
> ArraysSort.longSort	50	0.218	0.227	1.0
> ArraysSort.longSort	75	0.466	0.402	1.2
> ArraysSort.longSort	100	0.76	0.58	1.3
> ArraysSort.longSort	1000	10.449	6.239	1.7
> ArraysSort.longSort	10000	307.074	70.284	**4.4**
> ArraysSort.longSort	100000	4135.651	1049.12	**3.9**
> ArraysSort.longSort	1000000	49559.152	12537.53	**4.0**
> This is the first in the series of PRs related to vectorized sorting. Future planned steps after the integration of this PR:
> 
> 1. Enabling AVX2 sort and partitioning for Linux ( based on Intel's x86-simd-sort [PR](https://github.com/intel/x86-simd-sort/pull/60)).
> 2. Enabling SIMD sort (both AVX512 and AVX2) for Short and Char arrays for length < 1750. (Arrays larger than 1750 already use countingSort which is faster than AVX512 sort.)
> 3. Adding Windows support for both AVX512 and AVX2 using assembly stubs.
> 4. Enabling SIMD sort (both AVX512 and AVX2) for MemorySegment.
> 
> ### Progress
> * [x]  Change must be properly reviewed (1 review required, with at least 1 [Reviewer](https://openjdk.org/bylaws#reviewer))
> * [x]  Change must not contain extraneous whitespace
> * [x]  Commit message must refer to an issue
> 
> ### Issue
> * [JDK-8309130](https://bugs.openjdk.org/browse/JDK-8309130): x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) (**Enhancement** - P4)
> 
> ### Reviewers
> * [Jatin Bhateja](https://openjdk.org/census#jbhateja) (@jatin-bhateja - **Reviewer**) ⚠️ Review applies to [e63a2aa0](https://git.openjdk.org/jdk/pull/14227/files/e63a2aa081275c3f1ed2ccc4315a60f304d18b34)
> * [Sandhya Viswanathan](https://openjdk.org/census#sviswanathan) (@sviswa7 - **Reviewer**) ⚠️ Review applies to [9642d852](https://git.openjdk.org/jdk/pull/14227/files/9642d852cce5a9cf8270b850c124ef38fc158c6d)
> * [Paul Sandoz](https://openjdk.org/census#psandoz) (@PaulSandoz - **Reviewer**) ⚠️ Review applies to [b04cb6c3](https://git.openjdk.org/jdk/pull/14227/files/b04cb6c3c41c7327f9dc67653e24b08693329e3e)
> * [Vladimir Kozlov](https://openjdk.org/census#kvn) (@vnkozlov - **Reviewer**)
> 
> ### Reviewing
> Using `git`
> Using Skara CLI tools
> Using diff file
> ### Webrev
> [Link to Webrev Comment](https://git.openjdk.org/jdk/pull/14227#issuecomment-1568931001)

hello, I tested this feature in openjdk 22.19, but it didn't seem to work.   
my test code is as follows:   

public class JDKSort {
    public static void main(String[] args) {
        new JDKSort().sort();
    }

    public void sort() {
        // 100 million pieces of data
        int size = 100000000;
        int[] arr = new int[size];
        java.util.Random rand = new java.util.Random();
        for (int i = 0; i < size; ++i) {
            arr[i] = rand.nextInt();
        }

        long startTime = System.currentTimeMillis();
        java.util.Arrays.sort(arr);
        long elapseTime = System.currentTimeMillis() - startTime;
        System.out.println("elapse time -> " + elapseTime + " ms");
    }
}

test result:
- jdk8:   15079 ms
- jdk22:  11619 ms

this feature looks like it's merged into 22.19:   
[x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays)](https://bugs.openjdk.org/browse/JDK-8309130)   
but it doesn't seem to working, do I need to do anything?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1761008818


More information about the build-dev mailing list