RFR: 8258932: AArch64: Enhance floating-point Min/MaxReductionV with fminp/fmaxp [v3]

Thu Jan 7 11:21:57 UTC 2021

On Tue, 5 Jan 2021 13:42:21 GMT, Dong Bo <dongbo at openjdk.org> wrote:

>> This patch optimizes the vector Min/Max reduction of two floating-point numbers on AArch64 using the NEON instructions `fmaxp` and `fminp`.
>> 
>> Passed jtreg tier1-3 tests with a `linux-aarch64-server-fastdebug` build.
>> The tests under `test/jdk/jdk/incubator/vector/` were also run specifically to check correctness, and they passed.
>> 
>> Introduced a new JMH microbenchmark, `test/micro/org/openjdk/bench/vm/compiler/VectorReductionFloatingMinMax.java`, for performance testing.
>> Observed about a `37%` performance improvement for the double reductions (`maxRedD`, `minRedD`) on Kunpeng 916; the float reductions are unchanged. The JMH results:
>> Benchmark                (COUNT)  (seed)  Mode  Cnt    Score   Error  Units
>> # Kunpeng 916, default
>> VectorReduction.maxRedD      512       0  avgt   10  678.126 ± 0.815  ns/op
>> VectorReduction.maxRedF      512       0  avgt   10  242.958 ± 0.212  ns/op
>> VectorReduction.minRedD      512       0  avgt   10  678.554 ± 0.824  ns/op
>> VectorReduction.minRedF      512       0  avgt   10  243.368 ± 0.205  ns/op
>> 
>> # Kunpeng 916, with fmaxp/fminp
>> VectorReduction.maxRedD      512       0  avgt   10  430.201 ± 0.353  ns/op => 36.56%
>> VectorReduction.maxRedF      512       0  avgt   10  243.404 ± 0.297  ns/op
>> VectorReduction.minRedD      512       0  avgt   10  427.805 ± 0.528  ns/op => 36.95%
>> VectorReduction.minRedF      512       0  avgt   10  242.963 ± 0.210  ns/op
>
> Dong Bo has updated the pull request incrementally with one additional commit since the last revision:
> 
>   put faddp/fmaxp/fminp together in a group

Changes requested by aph (Reviewer).

test/micro/org/openjdk/bench/vm/compiler/VectorReductionFloatingMinMax.java line 67:

> 65:         for (int i = 0; i < COUNT; i++) {
> 66:             max = Math.max(max, floatsA[i] - floatsB[i]);
> 67:         }

This test code looks a bit contrived. If you're looking for the largest delta, it'd be

    max = Math.max(max, Math.abs(floatsA[i] - floatsB[i]));

and if you're looking for the largest value, it'd probably be

    max = Math.max(max, floatsA[i]);

Do we gain any advantage with these?
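
For comparison, the three loop bodies discussed above could be written side by side as in the sketch below. The class name `ReductionShapes`, the method names, and the `Float.NEGATIVE_INFINITY` starting value are assumptions made for illustration; they are not taken from the patch or the benchmark, and the sketch says nothing about which shapes C2 actually vectorizes.

```java
// Sketch only: class and method names are hypothetical, not from the patch.
final class ReductionShapes {

    // Shape used in the benchmark under review: max of element-wise differences.
    static float maxOfDiff(float[] a, float[] b) {
        float max = Float.NEGATIVE_INFINITY; // assumed starting value
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, a[i] - b[i]);
        }
        return max;
    }

    // First suggested variant: max of absolute element-wise differences.
    static float maxOfAbsDiff(float[] a, float[] b) {
        float max = Float.NEGATIVE_INFINITY; // assumed starting value
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    // Second suggested variant: plain max over a single array.
    static float maxOfValues(float[] a) {
        float max = Float.NEGATIVE_INFINITY; // assumed starting value
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, a[i]);
        }
        return max;
    }

    public static void main(String[] args) {
        float[] a = {1.0f, 5.0f, -2.0f};
        float[] b = {4.0f, 3.0f, 2.0f};
        System.out.println(maxOfDiff(a, b));
        System.out.println(maxOfAbsDiff(a, b));
        System.out.println(maxOfValues(a));
    }
}
```

Each method could be dropped into a separate JMH `@Benchmark` to measure whether the `fmaxp`/`fminp` change helps all three shapes equally.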

-------------

PR: https://git.openjdk.java.net/jdk/pull/1925