RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v34]

Srinivas Vamsi Parasa duke at openjdk.org
Fri Sep 8 18:10:33 UTC 2023


> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
> 
> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
> 
> 
> **Arrays.sort performance data using JMH benchmarks for arrays with random data** 
> 
> |	Arrays.sort benchmark	|	Array Size	|	Baseline (us/op)	|	AVX512 Sort (us/op)	|	Speedup	|
> |	---	|	---	|	---	|	---	|	---	|
> |	ArraysSort.doubleSort	|	10	|	0.034	|	0.035	|	1.0	|
> |	ArraysSort.doubleSort	|	25	|	0.116	|	0.089	|	1.3	|
> |	ArraysSort.doubleSort	|	50	|	0.282	|	0.291	|	1.0	|
> |	ArraysSort.doubleSort	|	75	|	0.474	|	0.358	|	1.3	|
> |	ArraysSort.doubleSort	|	100	|	0.654	|	0.623	|	1.0	|
> |	ArraysSort.doubleSort	|	1000	|	9.274	|	6.331	|	1.5	|
> |	ArraysSort.doubleSort	|	10000	|	323.339	|	71.228	|	**4.5**	|
> |	ArraysSort.doubleSort	|	100000	|	4471.871	|	1002.748	|	**4.5**	|
> |	ArraysSort.doubleSort	|	1000000	|	51660.742	|	12921.295	|	**4.0**	|
> |	ArraysSort.floatSort	|	10	|	0.045	|	0.046	|	1.0	|
> |	ArraysSort.floatSort	|	25	|	0.103	|	0.084	|	1.2	|
> |	ArraysSort.floatSort	|	50	|	0.285	|	0.33	|	0.9	|
> |	ArraysSort.floatSort	|	75	|	0.492	|	0.346	|	1.4	|
> |	ArraysSort.floatSort	|	100	|	0.597	|	0.326	|	1.8	|
> |	ArraysSort.floatSort	|	1000	|	9.811	|	5.294	|	1.9	|
> |	ArraysSort.floatSort	|	10000	|	323.955	|	50.547	|	**6.4**	|
> |	ArraysSort.floatSort	|	100000	|	4326.38	|	731.152	|	**5.9**	|
> |	ArraysSort.floatSort	|	1000000	|	52413.88	|	8409.193	|	**6.2**	|
> |	ArraysSort.intSort	|	10	|	0.033	|	0.033	|	1.0	|
> |	ArraysSort.intSort	|	25	|	0.086	|	0.051	|	1.7	|
> |	ArraysSort.intSort	|	50	|	0.236	|	0.151	|	1.6	|
> |	ArraysSort.intSort	|	75	|	0.416	|	0.332	|	1.3	|
> |	ArraysSort.intSort	|	100	|	0.63	|	0.521	|	1.2	|
> |	ArraysSort.intSort	|	1000	|	10.518	|	4.698	|	2.2	|
> |	ArraysSort.intSort	|	10000	|	309.659	|	42.518	|	**7.3**	|
> |	ArraysSort.intSort	|	100000	|	4130.917	|	573.956	|	**7.2**	|
> |	ArraysSort.intSort	|	1000000	|	49876.307	|	6712.812	|	**7.4**	|
> |	ArraysSort.longSort	|	10	|	0.036	|	0.037	|	1.0	|
> |	ArraysSort.longSort	|	25	|	0.094	|	0.08	|	1.2	|
> |	ArraysSort.longSort	|	50	|	0.218	|	0.227	|	1.0	|
> |	ArraysSort.longSort	|	75	|	0.466	|	0.402	|	1.2	|
> |	ArraysSort.longSort	|	100	|	0.76	|	0.58	|	1.3	|
> |	ArraysSort.longSort	|	1000	|	10.449	|	6.239	|	1.7	|
> |	ArraysSort.longSort	|	10000	|	307.074	|	70.284	|	**4.4**	|
> |	ArraysSor...

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  Fix regression when intrinsics are disabled; enable insertion sort in intrinsic, change library name to libsimdsort

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14227/files
  - new: https://git.openjdk.org/jdk/pull/14227/files/0ec5f52d..c096ff62

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=33
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=32-33

  Stats: 590 lines in 20 files changed: 210 ins; 172 del; 208 mod
  Patch: https://git.openjdk.org/jdk/pull/14227.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227

PR: https://git.openjdk.org/jdk/pull/14227


More information about the build-dev mailing list