RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v26]

Sandhya Viswanathan sviswanathan at openjdk.org
Thu Aug 24 22:04:19 UTC 2023


On Wed, 23 Aug 2023 23:29:47 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random data** 
>> 
>> |	Arrays.sort benchmark	|	Array Size	|	Baseline (us/op)	|	AVX512 Sort (us/op)	|	Speedup	|
>> |	---	|	---	|	---	|	---	|	---	|
>> |	ArraysSort.doubleSort	|	10	|	0.034	|	0.035	|	1.0	|
>> |	ArraysSort.doubleSort	|	25	|	0.116	|	0.089	|	1.3	|
>> |	ArraysSort.doubleSort	|	50	|	0.282	|	0.291	|	1.0	|
>> |	ArraysSort.doubleSort	|	75	|	0.474	|	0.358	|	1.3	|
>> |	ArraysSort.doubleSort	|	100	|	0.654	|	0.623	|	1.0	|
>> |	ArraysSort.doubleSort	|	1000	|	9.274	|	6.331	|	1.5	|
>> |	ArraysSort.doubleSort	|	10000	|	323.339	|	71.228	|	**4.5**	|
>> |	ArraysSort.doubleSort	|	100000	|	4471.871	|	1002.748	|	**4.5**	|
>> |	ArraysSort.doubleSort	|	1000000	|	51660.742	|	12921.295	|	**4.0**	|
>> |	ArraysSort.floatSort	|	10	|	0.045	|	0.046	|	1.0	|
>> |	ArraysSort.floatSort	|	25	|	0.103	|	0.084	|	1.2	|
>> |	ArraysSort.floatSort	|	50	|	0.285	|	0.33	|	0.9	|
>> |	ArraysSort.floatSort	|	75	|	0.492	|	0.346	|	1.4	|
>> |	ArraysSort.floatSort	|	100	|	0.597	|	0.326	|	1.8	|
>> |	ArraysSort.floatSort	|	1000	|	9.811	|	5.294	|	1.9	|
>> |	ArraysSort.floatSort	|	10000	|	323.955	|	50.547	|	**6.4**	|
>> |	ArraysSort.floatSort	|	100000	|	4326.38	|	731.152	|	**5.9**	|
>> |	ArraysSort.floatSort	|	1000000	|	52413.88	|	8409.193	|	**6.2**	|
>> |	ArraysSort.intSort	|	10	|	0.033	|	0.033	|	1.0	|
>> |	ArraysSort.intSort	|	25	|	0.086	|	0.051	|	1.7	|
>> |	ArraysSort.intSort	|	50	|	0.236	|	0.151	|	1.6	|
>> |	ArraysSort.intSort	|	75	|	0.416	|	0.332	|	1.3	|
>> |	ArraysSort.intSort	|	100	|	0.63	|	0.521	|	1.2	|
>> |	ArraysSort.intSort	|	1000	|	10.518	|	4.698	|	2.2	|
>> |	ArraysSort.intSort	|	10000	|	309.659	|	42.518	|	**7.3**	|
>> |	ArraysSort.intSort	|	100000	|	4130.917	|	573.956	|	**7.2**	|
>> |	ArraysSort.intSort	|	1000000	|	49876.307	|	6712.812	|	**7.4**	|
>> |	ArraysSort.longSort	|	10	|	0.036	|	0.037	|	1.0	|
>> |	ArraysSort.longSort	|	25	|	0.094	|	0.08	|	1.2	|
>> |	ArraysSort.longSort	|	50	|	0.218	|	0.227	|	1.0	|
>> |	ArraysSort.longSort	|	75	|	0.466	|	0.402	|	1.2	|
>> |	ArraysSort.longSort	|	100	|	0.76	|	0.58	|	1.3	|
>> |	ArraysSort.longSort	|	1000	|	10.449	|	6....
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   add parallelSort benchmarking

src/java.base/share/classes/java/util/Arrays.java line 102:

> 100:     static void arraySort(Class<?> elemType, Object array, long offset, int fromIndex, int toIndex, int end) {
> 101:         DualPivotQuicksort.smallArraySort(array, fromIndex, toIndex, end);
> 102:     }

The arraySort and arrayPartition methods could be part of DualPivotQuicksort.java now.
Then we can also remove the extra indirection from arraySort to smallArraySort and from arrayPartition to partitionSinglePivot and partitionDualPivot.

src/java.base/share/classes/java/util/Arrays.java line 398:

> 396:      */
> 397:     public static void sort(double[] a) {
> 398:          DualPivotQuicksort.sort(a, 0, 0, a.length);

Extra blank space before DualPivotQuicksort.

src/java.base/share/classes/java/util/DualPivotQuicksort.java line 2801:

> 2799:                 Arrays.arrayPartition(float.class, a, baseOffset, low, high, pivotIndices, Unsafe.ARRAY_INT_BASE_OFFSET, isDualPivot);
> 2800:                 lower = pivotIndices[0];
> 2801:                 upper = pivotIndices[1];

lower and upper are not used and overwritten on line 2822-2823.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1304694940
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1304672283
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1304838101


More information about the hotspot-compiler-dev mailing list