RFR: 8258932: AArch64: Enhance floating-point Min/MaxReductionV with fminp/fmaxp [v3]
Andrew Haley
aph at openjdk.java.net
Tue Jan 12 10:28:57 UTC 2021
On Mon, 11 Jan 2021 12:28:57 GMT, Dong Bo <dongbo at openjdk.org> wrote:
>> The `Math.abs(doublesA[i] - doublesB[i])` case shows `~36%` improvement.
>> I updated the tests for doubles to use `Math.abs()`; the results look more consistent. Thanks.
>> The JMH results of doubles with `Math.abs()`:
>> Benchmark                             (COUNT_DOUBLE) (COUNT_FLOAT) (seed)  Mode  Cnt    Score   Error  Units
>> # Kunpeng 916, default
>> VectorReductionFloatingMinMax.maxRedD            512             3      0  avgt   10  681.319 ± 0.658  ns/op
>> VectorReductionFloatingMinMax.minRedD            512             3      0  avgt   10  682.596 ± 4.322  ns/op
>> # Kunpeng 916, fmaxp/fminp
>> VectorReductionFloatingMinMax.maxRedD            512             3      0  avgt   10  439.130 ± 0.450  ns/op  => 35.54%
>> VectorReductionFloatingMinMax.minRedD            512             3      0  avgt   10  439.105 ± 0.435  ns/op  => 35.67%
>
> For single-precision floating-point operands, as the experiments showed, `Max2F` matches only when `COUNT == 3`.
> With such a small loop under the superword framework, it is difficult to tell how much improvement `fmaxp/fminp` gives over `fmaxv+ins`.
>
> Although it sounds unreasonable for an application to use `Float64Vector` rather than `Float128Vector`,
> the optimization is indeed useful for the VectorAPI `Float64Vector.reduceLanes(VectorOperators.MAX)` case, as mentioned previously.
>
> Do you think we should remove single-precision floating-point parts in this patch?
OK, I guess we'll keep both. Even though the acceleration for single-precision float is disappointing on these cores, it might well be useful for some future processor, and I do care about the Vector API.
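
For context, here is a minimal sketch of the two shapes under discussion: the scalar reduction loop that C2's superword pass can auto-vectorize into a Min/MaxReductionV node, and an explicit Vector API reduction over a 64-bit float species (Float64Vector). The class and array names are illustrative, not taken from the actual JMH benchmark; this assumes the incubating jdk.incubator.vector module (compile and run with --add-modules jdk.incubator.vector) and array lengths that are a multiple of the species length.

    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public class MinMaxReductionSketch {
        // 64-bit species: a Float64Vector holds two float lanes.
        static final VectorSpecies<Float> F64 = FloatVector.SPECIES_64;

        // Scalar reduction loop of the kind the superword framework
        // auto-vectorizes; this mirrors the quoted benchmark kernel.
        static double maxRedD(double[] a, double[] b) {
            double max = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < a.length; i++) {
                max = Math.max(max, Math.abs(a[i] - b[i]));
            }
            return max;
        }

        // Explicit Vector API reduction; reduceLanes(MAX) on a two-lane
        // float vector is the Float64Vector case mentioned above, where
        // the fmaxp form of the reduction applies.
        static float maxRedF(float[] a) {
            float max = Float.NEGATIVE_INFINITY;
            for (int i = 0; i < a.length; i += F64.length()) {
                FloatVector v = FloatVector.fromArray(F64, a, i);
                max = Math.max(max, v.reduceLanes(VectorOperators.MAX));
            }
            return max;
        }
    }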
-------------
PR: https://git.openjdk.java.net/jdk/pull/1925