RFR: 8258932: AArch64: Enhance floating-point Min/MaxReductionV with fminp/fmaxp [v3]
Andrew Haley
aph at openjdk.java.net
Tue Jan 12 10:28:57 UTC 2021
On Mon, 11 Jan 2021 12:28:57 GMT, Dong Bo <dongbo at openjdk.org> wrote:
>> The `Math.abs(doublesA[i] - doublesB[i])` case shows `~36%` improvement.
>> I updated the tests for doubles to use `Math.abs()`; the results look more consistent. Thanks.
>> The JMH results of doubles with `Math.abs()`:
>> Benchmark                             (COUNT_DOUBLE) (COUNT_FLOAT) (seed)  Mode  Cnt    Score   Error  Units
>> # Kunpeng 916, default
>> VectorReductionFloatingMinMax.maxRedD            512             3      0  avgt   10  681.319 ± 0.658  ns/op
>> VectorReductionFloatingMinMax.minRedD            512             3      0  avgt   10  682.596 ± 4.322  ns/op
>> # Kunpeng 916, fmaxp/fminp
>> VectorReductionFloatingMinMax.maxRedD            512             3      0  avgt   10  439.130 ± 0.450  ns/op  => 35.54%
>> VectorReductionFloatingMinMax.minRedD            512             3      0  avgt   10  439.105 ± 0.435  ns/op  => 35.67%
>
> For single-precision floating-point operands, as the experiments showed, `Max2F` matches only when `COUNT == 3`.
> With such a small loop under the superword framework, it is difficult to tell how much improvement `fmaxp/fminp` gives over `fmaxv+ins`.
>
> Although it sounds unreasonable for an application to use `Float64Vector` rather than `Float128Vector`,
> the optimization is indeed useful for the VectorAPI `Float64Vector.reduceLanes(VectorOperators.MAX)` case, as mentioned previously.
>
> Do you think we should remove single-precision floating-point parts in this patch?
OK, I guess we'll keep both. Even though the acceleration for single-precision float is disappointing on these cores, it might well be useful for some future processor, and I do care about the Vector API.
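
For context, here is a minimal sketch of the two shapes under discussion: the scalar reduction loop that C2's superword pass can auto-vectorize into a Min/MaxReductionV node, and an explicit Vector API reduction over a 64-bit float species (Float64Vector). The class and array names are illustrative, not taken from the actual JMH benchmark; this assumes the incubating jdk.incubator.vector module (compile and run with --add-modules jdk.incubator.vector) and array lengths that are a multiple of the species length.

    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    public class MinMaxReductionSketch {
        // 64-bit species: a Float64Vector holds two float lanes.
        static final VectorSpecies<Float> F64 = FloatVector.SPECIES_64;

        // Scalar reduction loop of the kind the superword framework
        // auto-vectorizes; this mirrors the quoted benchmark kernel.
        static double maxRedD(double[] a, double[] b) {
            double max = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < a.length; i++) {
                max = Math.max(max, Math.abs(a[i] - b[i]));
            }
            return max;
        }

        // Explicit Vector API reduction; reduceLanes(MAX) on a two-lane
        // float vector is the Float64Vector case mentioned above, where
        // the fmaxp form of the reduction applies.
        static float maxRedF(float[] a) {
            float max = Float.NEGATIVE_INFINITY;
            for (int i = 0; i < a.length; i += F64.length()) {
                FloatVector v = FloatVector.fromArray(F64, a, i);
                max = Math.max(max, v.reduceLanes(VectorOperators.MAX));
            }
            return max;
        }
    }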
-------------
PR: https://git.openjdk.java.net/jdk/pull/1925