RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v41]

Vladimir Kozlov kvn at openjdk.org
Thu Oct 5 18:50:30 UTC 2023


On Fri, 22 Sep 2023 16:53:21 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random data** 
>> 
>> |	Arrays.sort benchmark	|	Array Size	|	Baseline (us/op)	|	AVX512 Sort (us/op)	|	Speedup	|
>> |	---	|	---	|	---	|	---	|	---	|
>> |	ArraysSort.doubleSort	|	10	|	0.034	|	0.035	|	1.0	|
>> |	ArraysSort.doubleSort	|	25	|	0.116	|	0.089	|	1.3	|
>> |	ArraysSort.doubleSort	|	50	|	0.282	|	0.291	|	1.0	|
>> |	ArraysSort.doubleSort	|	75	|	0.474	|	0.358	|	1.3	|
>> |	ArraysSort.doubleSort	|	100	|	0.654	|	0.623	|	1.0	|
>> |	ArraysSort.doubleSort	|	1000	|	9.274	|	6.331	|	1.5	|
>> |	ArraysSort.doubleSort	|	10000	|	323.339	|	71.228	|	**4.5**	|
>> |	ArraysSort.doubleSort	|	100000	|	4471.871	|	1002.748	|	**4.5**	|
>> |	ArraysSort.doubleSort	|	1000000	|	51660.742	|	12921.295	|	**4.0**	|
>> |	ArraysSort.floatSort	|	10	|	0.045	|	0.046	|	1.0	|
>> |	ArraysSort.floatSort	|	25	|	0.103	|	0.084	|	1.2	|
>> |	ArraysSort.floatSort	|	50	|	0.285	|	0.33	|	0.9	|
>> |	ArraysSort.floatSort	|	75	|	0.492	|	0.346	|	1.4	|
>> |	ArraysSort.floatSort	|	100	|	0.597	|	0.326	|	1.8	|
>> |	ArraysSort.floatSort	|	1000	|	9.811	|	5.294	|	1.9	|
>> |	ArraysSort.floatSort	|	10000	|	323.955	|	50.547	|	**6.4**	|
>> |	ArraysSort.floatSort	|	100000	|	4326.38	|	731.152	|	**5.9**	|
>> |	ArraysSort.floatSort	|	1000000	|	52413.88	|	8409.193	|	**6.2**	|
>> |	ArraysSort.intSort	|	10	|	0.033	|	0.033	|	1.0	|
>> |	ArraysSort.intSort	|	25	|	0.086	|	0.051	|	1.7	|
>> |	ArraysSort.intSort	|	50	|	0.236	|	0.151	|	1.6	|
>> |	ArraysSort.intSort	|	75	|	0.416	|	0.332	|	1.3	|
>> |	ArraysSort.intSort	|	100	|	0.63	|	0.521	|	1.2	|
>> |	ArraysSort.intSort	|	1000	|	10.518	|	4.698	|	2.2	|
>> |	ArraysSort.intSort	|	10000	|	309.659	|	42.518	|	**7.3**	|
>> |	ArraysSort.intSort	|	100000	|	4130.917	|	573.956	|	**7.2**	|
>> |	ArraysSort.intSort	|	1000000	|	49876.307	|	6712.812	|	**7.4**	|
>> |	ArraysSort.longSort	|	10	|	0.036	|	0.037	|	1.0	|
>> |	ArraysSort.longSort	|	25	|	0.094	|	0.08	|	1.2	|
>> |	ArraysSort.longSort	|	50	|	0.218	|	0.227	|	1.0	|
>> |	ArraysSort.longSort	|	75	|	0.466	|	0.402	|	1.2	|
>> |	ArraysSort.longSort	|	100	|	0.76	|	0.58	|	1.3	|
>> |	ArraysSort.longSort	|	1000	|	10.449	|	6....
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update CompileThresholdScaling only for the sort and partition intrinsics; update build script to remove nested if

In general it looks good. I have some code style comments and file name change request.
After you fix that I will need to rerun testing for it before approval.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4183:

> 4181: 
> 4182:   // Load x86_64_sort library on supported hardware to enable avx512 sort and partition intrinsics
> 4183:     if (UseAVX > 2 && VM_Version::supports_avx512dq()) {

Indention (spacing) is wrong for lines 4183-4190 after you moved check.

src/hotspot/share/opto/library_call.cpp line 5372:

> 5370: bool LibraryCallKit::inline_array_partition() {
> 5371: 
> 5372:   address stubAddr = nullptr;

Move this declaration to first assignment at line 5387.

src/hotspot/share/opto/library_call.cpp line 5374:

> 5372:   address stubAddr = nullptr;
> 5373:   const char *stubName;
> 5374:   stubName = "array_partition_stub";

It could one line.

src/hotspot/share/opto/library_call.cpp line 5400:

> 5398: 
> 5399:   // create the pivotIndices array of type int and size = 2
> 5400:   Node* pivotIndices = nullptr;

Move to assignment at line 5403.

src/hotspot/share/opto/library_call.cpp line 5414:

> 5412:   make_runtime_call(RC_LEAF|RC_NO_FP, OptoRuntime::array_partition_Type(),
> 5413:                     stubAddr, stubName, TypePtr::BOTTOM,
> 5414:                     obj_adr, elemType, fromIndex, toIndex, pivotIndices_adr, indexPivot1, indexPivot2);

May be put `indexPivot*` argument on new line for fit screen better.

src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp line 4:

> 2:  * Copyright (c) 2021, 2023, Intel Corporation. All rights reserved.
> 3:  * Copyright (c) 2021 Serge Sans Paille. All rights reserved.
> 4:  * Intel x86-simd-sort source code.

I don't think this line here and in other new files will pass our Copyright header checker. Do you need this line here? You do have comment below `This implementation is based on x86-simd-sort`.
`DO NOT ALTER ..` line should immediately follow `Copyright` lines.

src/java.base/linux/native/libsimdsort/avxsort_linux_x86.cpp line 1:

> 1: /*

I think this file name should follow pattern of other files: `avx512-linux-qsort.cpp

src/java.base/share/classes/java/util/DualPivotQuicksort.java line 418:

> 416: 
> 417:         /*
> 418:         * The first and the last elements to be sorted are moved

Misaligned `/*` here and several places later. `*` should be aligned.

src/java.base/share/classes/java/util/DualPivotQuicksort.java line 1353:

> 1351:         /*
> 1352:             * Swap the pivot into its final position.
> 1353:             */

Indent

src/java.base/share/classes/java/util/DualPivotQuicksort.java line 2941:

> 2939:         /*
> 2940:             * Swap the pivot into its final position.
> 2941:             */

indent

src/java.base/share/classes/java/util/DualPivotQuicksort.java line 3794:

> 3792:         /*
> 3793:             * Swap the pivot into its final position.
> 3794:             */

Indent.

-------------

PR Review: https://git.openjdk.org/jdk/pull/14227#pullrequestreview-1660450589
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1347808019
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1347810630
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1347809581
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1347814187
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1347816297
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1347830769
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1347834135
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1347828096
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1347824813
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1347823957
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1347823053


More information about the build-dev mailing list