RFR: 8258932: AArch64: Enhance floating-point Min/MaxReductionV with fminp/fmaxp [v3]

Fri Jan 8 02:53:58 UTC 2021

On Wed, 6 Jan 2021 10:04:45 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> Dong Bo has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   put faddp/fmaxp/fminp together in a group
>
> test/micro/org/openjdk/bench/vm/compiler/VectorReductionFloatingMinMax.java line 67:
> 
>> 65:         for (int i = 0; i < COUNT; i++) {
>> 66:             max = Math.max(max, floatsA[i] - floatsB[i]);
>> 67:         }
> 
> This test code looks a bit contrived. If you're looking for the smallest delta it'd be
> 
> Math.max(max, Math.abs(floatsA[i] - floatsB[i]));
> 
> and if you're looking for the smallest value it'd probably be
> 
> Math.max(max, floatsA[i]);
> 
> Do we gain any advantage with these?

Hi,

In my experiments, we gain no improvement with either of these.

For `Math.max(max, Math.abs(floatsA[i] - floatsB[i]))`, the loop is not vectorized when `COUNT < 12`.
When `COUNT == 12`, the node we get is `Max4F`, not `Max2F`.
`Math.max(max, floatsA[i] - floatsB[i])` has the same problem: it does not match `Max2F` for small `COUNT`.

`Math.max(max, floatsA[i])` is not auto-vectorized even with `COUNT = 512`.
I think auto-vectorization of this pattern was disabled by JDK-8078563 [1].
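
For reference, here are the three loop shapes discussed above, written side by side. This is only a sketch: the class name, the method names, and the `Float.NEGATIVE_INFINITY` starting value are made up for illustration and are not taken from the benchmark.

```java
public final class MaxLoopShapes {
    // Shape currently in the benchmark: max of the signed differences.
    static float maxDiff(float[] a, float[] b, int count) {
        float max = Float.NEGATIVE_INFINITY; // assumed initial value
        for (int i = 0; i < count; i++) {
            max = Math.max(max, a[i] - b[i]);
        }
        return max;
    }

    // First suggested shape: max of the absolute differences.
    static float maxAbsDiff(float[] a, float[] b, int count) {
        float max = Float.NEGATIVE_INFINITY; // assumed initial value
        for (int i = 0; i < count; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    // Second suggested shape: plain max over a single array.
    static float maxValue(float[] a, int count) {
        float max = Float.NEGATIVE_INFINITY; // assumed initial value
        for (int i = 0; i < count; i++) {
            max = Math.max(max, a[i]);
        }
        return max;
    }
}
```

Whether C2 turns any of these into a vector reduction, and with which node, depends on `COUNT` and on the JDK build, as described above.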

One place where `Max2F` with `fmaxp` does pay off is the Vector API; the test code is available in [2].
We saw about a 12% improvement for `reduceLanes(VectorOperators.MAX)` on `FloatVector.SPECIES_64`:
Benchmark                         (COUNT)  (seed)  Mode  Cnt    Score   Error  Units
# Kunpeng 916, default
VectorReduction2FMinMax.maxRed2F      512       0  avgt   10  667.173 ± 0.576  ns/op
VectorReduction2FMinMax.minRed2F      512       0  avgt   10  667.172 ± 0.649  ns/op
# Kunpeng 916, with fmaxp/fminp
VectorReduction2FMinMax.maxRed2F      512       0  avgt   10  592.404 ± 0.885  ns/op
VectorReduction2FMinMax.minRed2F      512       0  avgt   10  592.293 ± 0.607  ns/op
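
To make the two-lane case concrete, below is a scalar model of what a max reduction over 64-bit float vectors computes. It is not the benchmark from [2] and does not use the Vector API; the class and method names are hypothetical, and the comment about which step corresponds to `fmaxp` reflects my reading of the patch rather than generated code.

```java
public final class Max2FModel {
    // Models reducing one 2-lane float vector into a running maximum.
    static float reduceMax2F(float running, float lane0, float lane1) {
        // Pairwise max of the two lanes: the step assumed to map to fmaxp.
        float pair = Math.max(lane0, lane1);
        // Fold the pair result into the running value.
        return Math.max(running, pair);
    }

    // Walks the array two floats at a time, like a loop over 2-lane vectors.
    static float maxRed2F(float[] a) {
        float max = Float.NEGATIVE_INFINITY; // assumed initial value
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            max = reduceMax2F(max, a[i], a[i + 1]);
        }
        // Scalar tail for an odd-length array.
        if (i < a.length) {
            max = Math.max(max, a[i]);
        }
        return max;
    }
}
```

`Math.max` is used for both steps so that NaN and signed-zero handling stay the same as in the scalar loop.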

I agree that the test code for `floats` in `VectorReductionFloatingMinMax.java` is contrived.
Do you think we should replace the `MinMaxF` tests in `VectorReductionFloatingMinMax` with the tests in [2]?

[1] https://bugs.openjdk.java.net/browse/JDK-8078563
[2] [VectorReduction2FMinMax.java.txt](https://github.com/openjdk/jdk/files/5784948/VectorReduction2FMinMax.java.txt)

-------------

PR: https://git.openjdk.java.net/jdk/pull/1925