RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v3]

Francesco Nigro duke at openjdk.org
Thu Jun 1 14:01:06 UTC 2023


On Thu, 1 Jun 2023 01:16:22 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:

>> What happens to really short arrays? Your patch should include macro benchmarks for e.g. 50 and 10.
> 
> Thanks for the suggestion. Please see the performance for small array sizes below:
> 
> | Arrays.sort benchmark | Array Size | Baseline | AVX512 Sort | Speedup |
> | --- | --- | --- | --- | --- |
> | ArraysSort.intSort | 10 | 0.029 | 0.018 | 1.6 |
> | ArraysSort.intSort | 25 | 0.086 | 0.032 | 2.7 |
> | ArraysSort.intSort | 50 | 0.236 | 0.056 | 4.2 |
> | ArraysSort.intSort | 75 | 0.409 | 0.111 | 3.7 |
> | ArraysSort.longSort | 10 | 0.031 | 0.033 | 0.9 |
> | ArraysSort.longSort | 25 | 0.09 | 0.061 | 1.5 |
> | ArraysSort.longSort | 50 | 0.228 | 0.127 | 1.8 |
> | ArraysSort.longSort | 75 | 0.382 | 0.28 | 1.4 |
> | ArraysSort.doubleSort | 10 | 0.037 | 0.043 | 0.9 |
> | ArraysSort.doubleSort | 25 | 0.129 | 0.066 | 2.0 |
> | ArraysSort.doubleSort | 50 | 0.267 | 0.115 | 2.3 |
> | ArraysSort.doubleSort | 75 | 0.549 | 0.219 | 2.5 |
> | ArraysSort.floatSort | 10 | 0.034 | 0.034 | 1.0 |
> | ArraysSort.floatSort | 25 | 0.088 | 0.053 | 1.7 |
> | ArraysSort.floatSort | 50 | 0.284 | 0.077 | 3.7 |
> | ArraysSort.floatSort | 75 | 0.484 | 0.126 | 3.8 |

Hi @vamsi-parasa!
Given https://bugs.openjdk.org/browse/JDK-8295496, I have noticed how important it is to add benchmark cases where the offset and length parameters vary and/or differ from the usual 0 and whole-array length.
It is equally important to warm up with different combinations of them in order to "pollute" the JIT's existing decisions, so that the compiled method (and stubs) look more like what users would observe in a real-world scenario. Playing with the benchmark parameters like this, together with @theRealAph's advice to try small inputs (which matter a lot), would reveal any performance difference from the current implementation.
In addition, I understand from https://github.com/openjdk/jdk/pull/14227/files#diff-1929ace9ae6df116e2fa2a718ed3924d9dae9a2daea454ca9a78177c21477aa3R5237 that this is not yet the case at this stage of the implementation, so this is a wish for the final round of this PR. 🙏
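
To make the suggestion concrete, here is a minimal plain-Java sketch of the kind of parameter mixing I mean. It is not taken from the PR or from the existing ArraysSort benchmark: the class and method names are hypothetical, and the sizes are simply the ones from the table above plus one larger value chosen arbitrarily. It only uses Arrays.sort(int[], fromIndex, toIndex) and cross-checks each sub-range sort against a whole-array sort of the same elements; a real benchmark would presumably drive the same combinations through the benchmark harness instead.

```java
import java.util.Arrays;
import java.util.Random;

public class SubRangeSortSketch {

    // Hypothetical helper: sorts only data[from, to) on a copy of the input.
    static int[] sortedCopy(int[] data, int from, int to) {
        int[] copy = data.clone();
        Arrays.sort(copy, from, to); // JDK API: Arrays.sort(int[], fromIndex, toIndex)
        return copy;
    }

    // True if result[from, to) equals the sorted elements of original[from, to)
    // and every element outside that range is unchanged.
    static boolean matchesExpected(int[] original, int[] result, int from, int to) {
        int[] expected = original.clone();
        int[] range = Arrays.copyOfRange(original, from, to);
        Arrays.sort(range); // whole-array path, used as the reference
        System.arraycopy(range, 0, expected, from, range.length);
        return Arrays.equals(expected, result);
    }

    // Mixes array sizes, offsets and lengths in one loop so that the sort is
    // not only ever called with offset 0 and the full length.
    // Returns the number of mismatches (0 is expected).
    static int exerciseMixedRanges(long seed, int rounds) {
        Random random = new Random(seed);
        int[] sizes = {10, 25, 50, 75, 1000}; // assumption: sizes picked for illustration
        int failures = 0;
        for (int r = 0; r < rounds; r++) {
            int size = sizes[random.nextInt(sizes.length)];
            int[] data = random.ints(size).toArray();
            int from = random.nextInt(size);                 // 0 .. size-1
            int to = from + random.nextInt(size - from + 1); // from .. size
            int[] result = sortedCopy(data, from, to);
            if (!matchesExpected(data, result, from, to)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + exerciseMixedRanges(42L, 10_000));
    }
}
```

Running the same mix during warmup and during measurement is what should keep the JIT from specializing on a single (offset, length) shape.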

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1572108986


More information about the hotspot-compiler-dev mailing list