RFR: 8258932: AArch64: Enhance floating-point Min/MaxReductionV with fminp/fmaxp [v3]
Dong Bo
dongbo at openjdk.java.net
Mon Jan 11 12:31:56 UTC 2021
On Mon, 11 Jan 2021 11:38:01 GMT, Dong Bo <dongbo at openjdk.org> wrote:
>> Did you try math.abs() for doubles?
>
> The `Math.abs(doublesA[i] - doublesB[i])` has `~36%` improvements.
> I updated the tests for doubles with `Math.abs()`, it looks more consistent. Thanks.
> The JMH results of doubles with `Math.abs()`:
> Benchmark                              (COUNT_DOUBLE) (COUNT_FLOAT) (seed)  Mode  Cnt    Score   Error  Units
> # Kunpeng 916, default
> VectorReductionFloatingMinMax.maxRedD             512             3      0  avgt   10  681.319 ± 0.658  ns/op
> VectorReductionFloatingMinMax.minRedD             512             3      0  avgt   10  682.596 ± 4.322  ns/op
> # Kunpeng 916, fmaxp/fminp
> VectorReductionFloatingMinMax.maxRedD             512             3      0  avgt   10  439.130 ± 0.450  ns/op  => 35.54%
> VectorReductionFloatingMinMax.minRedD             512             3      0  avgt   10  439.105 ± 0.435  ns/op  => 35.67%
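For reference, a minimal sketch of the kind of reduction kernel being benchmarked (the class and method names here are illustrative, not the actual benchmark source): a max-reduction over `Math.abs(a[i] - b[i])` that the superword framework can vectorize, with the final cross-lane reduction being where `fmaxp` applies on AArch64.

```java
public class MaxRedSketch {
    // Max-reduction loop in the style of the maxRedD benchmark:
    // superword vectorizes the loop body, and the trailing
    // MaxReductionV node is the candidate for fmaxp.
    static double maxRedD(double[] a, double[] b) {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] a = {1.0, -5.0, 3.0};
        double[] b = {0.5,  2.0, 3.0};
        System.out.println(maxRedD(a, b)); // prints 7.0
    }
}
```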
For single-precision floating-point operands, as the experiments showed, `Max2F` matches only with `COUNT == 3`.
With such a small loop under the superword framework, it is difficult to tell how much improvement `fmaxp/fminp` brings over `fmaxv+ins`.
Although it sounds unreasonable for an application to use `Float64Vector` rather than `Float128Vector`,
the optimization is indeed useful for the VectorAPI `Float64Vector.reduceLanes(VectorOperators.MAX)` case mentioned previously.
Do you think we should remove the single-precision floating-point parts of this patch?
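A sketch of the `Float64Vector` reduction case referred to above (assumes the incubating Vector API, so it needs `--add-modules jdk.incubator.vector` to compile and run; array contents are illustrative): `FloatVector.SPECIES_64` gives a two-lane float vector, so `reduceLanes(MAX)` is exactly the two-element pairwise max that `fmaxp` can implement.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class ReduceLanesSketch {
    // 64-bit float species: two float lanes (the Float64Vector shape).
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_64;

    public static void main(String[] args) {
        float[] a = {3.0f, 7.0f};
        FloatVector v = FloatVector.fromArray(SPECIES, a, 0);
        // Cross-lane max over two lanes: a single fmaxp on AArch64
        // once this optimization is in place.
        System.out.println(v.reduceLanes(VectorOperators.MAX)); // prints 7.0
    }
}
```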
-------------
PR: https://git.openjdk.java.net/jdk/pull/1925
More information about the hotspot-dev
mailing list