RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v22]

Jatin Bhateja jbhateja at openjdk.org
Mon Aug 28 18:59:20 UTC 2023


On Fri, 25 Aug 2023 18:59:47 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:

>>> Improvements are nice but it would not pay off if you have big regressions. I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and implementation adjusted to avoid them.
>> 
>> Hello Vladimir (@vnkozlov) ,
>> 
>> As per your suggestion, the implementation was adjusted to address the regressions caused for STAGGER and REPEATED type of input data patterns. 
>> Please see below the new JMH performance data using the adjusted implementation.
>> 
>> In the new implementation, we don't call the AVX512 sort intrinsic at the top level (`Arrays.sort()`) . Instead, we take a decomposed or a 'building blocks' approach where we only intrinsify  (using AVX512 instructions) the partitioning and small array sort functions used in the current baseline ( `DualPivotQuickSort.sort()` ). Since the current baseline has logic to detect and sort special input patterns like STAGGER, we fallback to the current baseline instead of using AVX512 partitioning and sorting (which works best for RANDOM, REPEATED and SHUFFLE patterns).
>> 
>> Data below shows `Arrays.sort()` performance comparison between the current **Java baseline (DPQS)** vs. **AVX512 sort** (this PR)  using the `ArraysSort.java` JMH [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) ( #13568)
>> 
>> - The following command line was used to run the benchmarks: ` java -jar $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort`
>> - The scores shown are the average time (us/op),  thus lower is better. The last column towards the right shows the speedup. 
>> 
>> 
>> |	Benchmark	|	Mode	|	Size	|	Baseline DPQS (us/op)	|	AVX512 partitioning & sort (us/op)	|	Speedup	|
>> |	---	|	---	|	---	|	---	|	---	|	---	|
>> |	ArraysSort.Double.testSort	|	RANDOM	|	800	|	6.7	|	4.8	|	1.39x	|
>> |	ArraysSort.Double.testSort	|	RANDOM	|	7000	|	234.1	|	51.5	|	**4.55x**	|
>> |	ArraysSort.Double.testSort	|	RANDOM	|	50000	|	2155.9	|	470.0	|	**4.59x**	|
>> |	ArraysSort.Double.testSort	|	RANDOM	|	300000	|	15076.3	|	3391.3	|	**4.45x**	|
>> |	ArraysSort.Double.testSort	|	RANDOM	|	2000000	|	116445.5	|	27491.7	|	**4.24x**	|
>> |	ArraysSort.Double.testSort	|	REPEATED	|	800	|	2.3	|	1.7	|	1.35x	|
>> |	ArraysSort.Double.testSort	|	REPEATED	|	7000	|	23.3	|	12...
>
>> @vamsi-parasa I submitted our testing of latest v28 version. It found issue in `ArraysSort.java` new benchmark file. You missed `,`after year in copyright line:
>> 
>> ```
>>  * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
>> ```
> 
> Thank you, Vladimir!

Hi @vamsi-parasa ,

I did some runs on linux by disabling intrinsification of  _arraySort and _arrayPartition since these are not handled for windows and non-x86 targets, just to measure the impact of java side code re-factoring. 

![image](https://github.com/openjdk/jdk/assets/59989778/2866e684-955d-4902-af61-e08ac3f548d9)

![image](https://github.com/openjdk/jdk/assets/59989778/0a5e3265-b821-4a43-874e-6f1501423a07)

![image](https://github.com/openjdk/jdk/assets/59989778/f192f259-36e6-4e3e-8082-5cb0c05c9f10)


Please find the results, may I request you to kindly verify performance number on windows to be sure that there are no performance degradation.

Best Regards,
Jatin

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1696209731


More information about the build-dev mailing list