RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v14]

Srinivas Vamsi Parasa duke at openjdk.org
Wed Aug 16 18:25:13 UTC 2023


On Tue, 15 Aug 2023 19:23:24 GMT, iaroslavski <duke at openjdk.org> wrote:

>>> @vamsi-parasa We need to preserve NaNs. The base (https://github.com/intel/x86-simd-sort) algorithm used doesn't preserve NaNs.
>> 
>> Thanks for catching this Sandhya! This is fixed now in the most recent commit. A preprocessing step is added to move the NaNs to the top of the array.
>
> Hello @vamsi-parasa !
> 
> Do you process negative zeros properly? From one hand -0.0f equals to 0.0f, but negative zeros must be placed before 0.0f.
> See javadoc for Arrays.sort(float[] a). The same situation with -0.0d (double type).

@iaroslavski 

Hello Vladimir,

Please see the `Arrays.sort()` performance comparison between the  **enhanced DPQS/Radix sort #13568** vs. **AVX512 sort intrinsic** (this PR)  using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) ( #13568)

- The following command line was used to run the benchmarks: ` java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort`
- Please see the performance numbers below. The scores shown are the average time (us/op),  thus lower is better. The last column towards the right shows the speedup. 
- Space complexity was not benchmarked as AVX512 sort is an in-place sorting algorithm while the Radix sort needs extra memory.

|	Benchmark	|	Mode	|	Size	|	Enhanced DPQS/Radix (us/op)	|	AVX512 Sort (us/op)	|	Speedup	|
|	 --- 	|	 --- 	|	 --- 	|	 --- 	|	 --- 	|	 --- 	|
|	ArraysSortTests.Double.testSort	|	RANDOM	|	800	|	8.4	|	2.6	|	3.3x	|
|	ArraysSortTests.Double.testSort	|	RANDOM	|	7000	|	94.4	|	28.0	|	3.4x	|
|	ArraysSortTests.Double.testSort	|	RANDOM	|	50000	|	634.9	|	238.4	|	2.7x	|
|	ArraysSortTests.Double.testSort	|	RANDOM	|	300000	|	3887.6	|	1721.6	|	2.3x	|
|	ArraysSortTests.Double.testSort	|	RANDOM	|	2000000	|	29348.9	|	13854.8	|	2.1x	|
|	ArraysSortTests.Double.testSort	|	REPEATED	|	800	|	2.0	|	1.6	|	1.2x	|
|	ArraysSortTests.Double.testSort	|	REPEATED	|	7000	|	39.6	|	37.2	|	1.1x	|
|	ArraysSortTests.Double.testSort	|	REPEATED	|	50000	|	470.0	|	386.3	|	1.2x	|
|	ArraysSortTests.Double.testSort	|	REPEATED	|	300000	|	2940.8	|	342.7	|	8.6x	|
|	ArraysSortTests.Double.testSort	|	REPEATED	|	2000000	|	19361.4	|	2838.9	|	6.8x	|
|	ArraysSortTests.Double.testSort	|	STAGGER	|	800	|	2.3	|	2.7	|	0.8x	|
|	ArraysSortTests.Double.testSort	|	STAGGER	|	7000	|	25.6	|	28.0	|	0.9x	|
|	ArraysSortTests.Double.testSort	|	STAGGER	|	50000	|	159.8	|	224.0	|	0.7x	|
|	ArraysSortTests.Double.testSort	|	STAGGER	|	300000	|	945.3	|	1566.8	|	0.6x	|
|	ArraysSortTests.Double.testSort	|	STAGGER	|	2000000	|	6398.7	|	12668.3	|	0.5x	|
|	ArraysSortTests.Double.testSort	|	SHUFFLE	|	800	|	4.9	|	2.7	|	1.8x	|
|	ArraysSortTests.Double.testSort	|	SHUFFLE	|	7000	|	55.9	|	27.5	|	2.0x	|
|	ArraysSortTests.Double.testSort	|	SHUFFLE	|	50000	|	419.8	|	233.5	|	1.8x	|
|	ArraysSortTests.Double.testSort	|	SHUFFLE	|	300000	|	2636.2	|	1569.6	|	1.7x	|
|	ArraysSortTests.Double.testSort	|	SHUFFLE	|	2000000	|	21131.7	|	12963.4	|	1.6x	|
|	ArraysSortTests.Float.testSort	|	RANDOM	|	800	|	7.4	|	1.8	|	4.1x	|
|	ArraysSortTests.Float.testSort	|	RANDOM	|	7000	|	46.1	|	18.8	|	2.4x	|
|	ArraysSortTests.Float.testSort	|	RANDOM	|	50000	|	328.5	|	177.2	|	1.9x	|
|	ArraysSortTests.Float.testSort	|	RANDOM	|	300000	|	1960.5	|	1342.3	|	1.5x	|
|	ArraysSortTests.Float.testSort	|	RANDOM	|	2000000	|	14502.4	|	10087.5	|	1.4x	|
|	ArraysSortTests.Float.testSort	|	REPEATED	|	800	|	2.0	|	0.8	|	2.4x	|
|	ArraysSortTests.Float.testSort	|	REPEATED	|	7000	|	30.6	|	6.6	|	4.6x	|
|	ArraysSortTests.Float.testSort	|	REPEATED	|	50000	|	369.0	|	34.9	|	10.6x |
|	ArraysSortTests.Float.testSort	|	REPEATED	|	300000	|	2937.7	|	207.1	|	14.2x	|
|	ArraysSortTests.Float.testSort	|	REPEATED	|	2000000	|	19008.3	|	1503.9	|	12.6x |
|	ArraysSortTests.Float.testSort	|	STAGGER	|	800	|	2.2	|	1.7	|	1.3x	|
|	ArraysSortTests.Float.testSort	|	STAGGER	|	7000	|	25.7	|	19.0	|	1.4x	|
|	ArraysSortTests.Float.testSort	|	STAGGER	|	50000	|	151.2	|	153.6	|	1.0x	|
|	ArraysSortTests.Float.testSort	|	STAGGER	|	300000	|	873.0	|	1165.8	|	0.7x	|
|	ArraysSortTests.Float.testSort	|	STAGGER	|	2000000	|	5720.0	|	8600.0	|	0.7x	|
|	ArraysSortTests.Float.testSort	|	SHUFFLE	|	800	|	4.9	|	1.7	|	2.9x	|
|	ArraysSortTests.Float.testSort	|	SHUFFLE	|	7000	|	41.6	|	18.5	|	2.3x	|
|	ArraysSortTests.Float.testSort	|	SHUFFLE	|	50000	|	362.9	|	156.4	|	2.3x	|
|	ArraysSortTests.Float.testSort	|	SHUFFLE	|	300000	|	2067.8	|	1204.3	|	1.7x	|
|	ArraysSortTests.Float.testSort	|	SHUFFLE	|	2000000	|	14591.3	|	9040.1	|	1.6x	|
|	ArraysSortTests.Int.testSort	|	RANDOM	|	800	|	7.0	|	1.3	|	5.4x	|
|	ArraysSortTests.Int.testSort	|	RANDOM	|	7000	|	30.4	|	14.2	|	2.1x	|
|	ArraysSortTests.Int.testSort	|	RANDOM	|	50000	|	255.2	|	153.4	|	1.7x	|
|	ArraysSortTests.Int.testSort	|	RANDOM	|	300000	|	1618.5	|	1139.2	|	1.4x	|
|	ArraysSortTests.Int.testSort	|	RANDOM	|	2000000	|	11557.7	|	8509.0	|	1.4x	|
|	ArraysSortTests.Int.testSort	|	REPEATED	|	800	|	1.2	|	0.5	|	2.5x	|
|	ArraysSortTests.Int.testSort	|	REPEATED	|	7000	|	10.6	|	3.9	|	2.7x	|
|	ArraysSortTests.Int.testSort	|	REPEATED	|	50000	|	192.7	|	20.8	|	9.3x	|
|	ArraysSortTests.Int.testSort	|	REPEATED	|	300000	|	1952.6	|	138.8	|	14.1x |
|	ArraysSortTests.Int.testSort	|	REPEATED	|	2000000	|	12969.4	|	732.2	|	17.7x	|
|	ArraysSortTests.Int.testSort	|	STAGGER	|	800	|	1.5	|	1.2	|	1.3x	|
|	ArraysSortTests.Int.testSort	|	STAGGER	|	7000	|	13.2	|	14.2	|	0.9x	|
|	ArraysSortTests.Int.testSort	|	STAGGER	|	50000	|	94.1	|	116.7	|	0.8x	|
|	ArraysSortTests.Int.testSort	|	STAGGER	|	300000	|	620.0	|	949.2	|	0.7x	|
|	ArraysSortTests.Int.testSort	|	STAGGER	|	2000000	|	4279.9	|	7195.2	|	0.6x	|
|	ArraysSortTests.Int.testSort	|	SHUFFLE	|	800	|	4.6	|	1.2	|	3.7x	|
|	ArraysSortTests.Int.testSort	|	SHUFFLE	|	7000	|	33.2	|	13.9	|	2.4x	|
|	ArraysSortTests.Int.testSort	|	SHUFFLE	|	50000	|	385.1	|	138.3	|	2.8x	|
|	ArraysSortTests.Int.testSort	|	SHUFFLE	|	300000	|	2031.9	|	1030.3	|	2.0x	|
|	ArraysSortTests.Int.testSort	|	SHUFFLE	|	2000000	|	20460.8	|	7631.4	|	2.7x	|
|	ArraysSortTests.Long.testSort	|	RANDOM	|	800	|	6.8	|	3.0	|	2.3x	|
|	ArraysSortTests.Long.testSort	|	RANDOM	|	7000	|	83.6	|	33.2	|	2.5x	|
|	ArraysSortTests.Long.testSort	|	RANDOM	|	50000	|	616.0	|	281.7	|	2.2x	|
|	ArraysSortTests.Long.testSort	|	RANDOM	|	300000	|	3752.6	|	1988.4	|	1.9x	|
|	ArraysSortTests.Long.testSort	|	RANDOM	|	2000000	|	28236.2	|	15483.1	|	1.8x	|
|	ArraysSortTests.Long.testSort	|	REPEATED	|	800	|	1.3	|	1.7	|	0.8x	|
|	ArraysSortTests.Long.testSort	|	REPEATED	|	7000	|	19.5	|	35.1	|	0.6x	|
|	ArraysSortTests.Long.testSort	|	REPEATED	|	50000	|	309.8	|	390.3	|	0.8x	|
|	ArraysSortTests.Long.testSort	|	REPEATED	|	300000	|	2046.2	|	253.0	|	8.1x	|
|	ArraysSortTests.Long.testSort	|	REPEATED	|	2000000	|	13105.2	|	1985.3	|	6.6x	|
|	ArraysSortTests.Long.testSort	|	STAGGER	|	800	|	1.6	|	3.2	|	0.5x	|
|	ArraysSortTests.Long.testSort	|	STAGGER	|	7000	|	13.3	|	32.8	|	0.4x	|
|	ArraysSortTests.Long.testSort	|	STAGGER	|	50000	|	116.3	|	265.0	|	0.4x	|
|	ArraysSortTests.Long.testSort	|	STAGGER	|	300000	|	684.3	|	1827.9	|	0.4x	|
|	ArraysSortTests.Long.testSort	|	STAGGER	|	2000000	|	5022.3	|	14350.6	|	0.3x	|
|	ArraysSortTests.Long.testSort	|	SHUFFLE	|	800	|	4.4	|	3.3	|	1.3x	|
|	ArraysSortTests.Long.testSort	|	SHUFFLE	|	7000	|	39.9	|	32.6	|	1.2x	|
|	ArraysSortTests.Long.testSort	|	SHUFFLE	|	50000	|	614.9	|	274.2	|	2.2x	|
|	ArraysSortTests.Long.testSort	|	SHUFFLE	|	300000	|	2546.9	|	1999.7	|	1.3x	|
|	ArraysSortTests.Long.testSort	|	SHUFFLE	|	2000000	|	31300.8	|	14655.9	|	2.1x	|

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1681080351


More information about the hotspot-compiler-dev mailing list