RFR: 8258932: AArch64: Enhance floating-point Min/MaxReductionV with fminp/fmaxp [v3]
Dong Bo
dongbo at openjdk.java.net
Mon Jan 11 12:31:56 UTC 2021
On Mon, 11 Jan 2021 11:38:01 GMT, Dong Bo <dongbo at openjdk.org> wrote:
>> Did you try math.abs() for doubles?
>
> The `Math.abs(doublesA[i] - doublesB[i])` has `~36%` improvements.
> I updated the tests for doubles with `Math.abs()`, it looks more consistent. Thanks.
> The JMH results of doubles with `Math.abs()`:
> Benchmark                              (COUNT_DOUBLE) (COUNT_FLOAT) (seed)  Mode  Cnt    Score   Error  Units
> # Kunpeng 916, default
> VectorReductionFloatingMinMax.maxRedD             512             3      0  avgt   10  681.319 ± 0.658  ns/op
> VectorReductionFloatingMinMax.minRedD             512             3      0  avgt   10  682.596 ± 4.322  ns/op
> # Kunpeng 916, fmaxp/fminp
> VectorReductionFloatingMinMax.maxRedD             512             3      0  avgt   10  439.130 ± 0.450  ns/op  => 35.54%
> VectorReductionFloatingMinMax.minRedD             512             3      0  avgt   10  439.105 ± 0.435  ns/op  => 35.67%
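For reference, a minimal sketch of the kind of reduction kernel being benchmarked (the class and method names here are illustrative, not the actual benchmark source): a max-reduction over `Math.abs(a[i] - b[i])` that the superword framework can vectorize, with the final cross-lane reduction being where `fmaxp` applies on AArch64.

```java
public class MaxRedSketch {
    // Max-reduction loop in the style of the maxRedD benchmark:
    // superword vectorizes the loop body, and the trailing
    // MaxReductionV node is the candidate for fmaxp.
    static double maxRedD(double[] a, double[] b) {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] a = {1.0, -5.0, 3.0};
        double[] b = {0.5,  2.0, 3.0};
        System.out.println(maxRedD(a, b)); // prints 7.0
    }
}
```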
For single-precision floating-point operands, as the experiments showed, `Max2F` matches only with `COUNT == 3`.
With such a small loop under the superword framework, it is difficult to tell how much improvement `fmaxp/fminp` brings over `fmaxv+ins`.
Although it sounds unreasonable for an application to use `Float64Vector` rather than `Float128Vector`,
the optimization is indeed useful for the VectorAPI `Float64Vector.reduceLanes(VectorOperators.MAX)` case mentioned previously.
Do you think we should remove the single-precision floating-point parts of this patch?
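A sketch of the `Float64Vector` reduction case referred to above (assumes the incubating Vector API, so it needs `--add-modules jdk.incubator.vector` to compile and run; array contents are illustrative): `FloatVector.SPECIES_64` gives a two-lane float vector, so `reduceLanes(MAX)` is exactly the two-element pairwise max that `fmaxp` can implement.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class ReduceLanesSketch {
    // 64-bit float species: two float lanes (the Float64Vector shape).
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_64;

    public static void main(String[] args) {
        float[] a = {3.0f, 7.0f};
        FloatVector v = FloatVector.fromArray(SPECIES, a, 0);
        // Cross-lane max over two lanes: a single fmaxp on AArch64
        // once this optimization is in place.
        System.out.println(v.reduceLanes(VectorOperators.MAX)); // prints 7.0
    }
}
```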
-------------
PR: https://git.openjdk.java.net/jdk/pull/1925
More information about the hotspot-dev
mailing list