RFR: 8292587: AArch64: Support SVE fabd instruction [v2]

Fri Sep 9 09:57:51 UTC 2022

On Thu, 8 Sep 2022 02:50:52 GMT, Hao Sun <haosun at openjdk.org> wrote:

>> Scalar and NEON fabd instructions were initially supported in
>> JDK-8256318. This patch adds support for the SVE fabd instruction [1]
>> and one Jtreg test case.
>> 
>> With this patch, the two-instruction sequence `fsub` + `fabs` is
>> combined into a single `fabd` instruction.
>> 
>> 
>>   fsub    z16.s, z16.s, z17.s
>>   fabs    z16.s, p7/m, z16.s
>> 
>>   -->
>> 
>>   fabd    z16.s, p7/m, z16.s, z17.s
>> 
>> 
>> In the initial evaluation of the JMH case, i.e.
>> FloatingScalarVectorAbsDiff.java, we found that the performance uplift
>> from this optimization was easily hidden by the heavy memory load/store
>> instructions. To avoid that, we updated the JMH case slightly, adding
>> one more group of subtraction and Math.abs operations to the loop body.
>> 
>> The data below were collected with the new JMH case on one 256-bit SVE
>> machine. Time per operation drops by about 38% and 35% for the two
>> functions respectively.
>> 
>> 
>> Benchmark                                             Before    After  Units
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble  260.468  160.965  ns/op
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat   133.963   87.292  ns/op
>> 
>> 
>> Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine.
>> 
>> [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated--
>
> Hao Sun has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update the loop limit in VectorAbsDiffTest.java
>   
>   As pointed out by Faye Gao, the test results are not fully verified due
>   to incorrect loop limits.
>   
>   Updated it.
>   
>   Reran the test; no regressions were found.

I tested this in our CI. All tests passed.
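
For readers following the thread, the Java source pattern behind the
quoted `fsub` + `fabs` sequence can be sketched roughly as below. This is
only an illustration, not the code from FloatingScalarVectorAbsDiff.java
or VectorAbsDiffTest.java: the class name, method names, and the exact
shape of the "one more group" of subtraction and Math.abs are assumptions.

```java
import java.util.Arrays;

// Hypothetical sketch; names and loop shapes are assumptions, not the
// actual benchmark or test code from the pull request.
final class AbsDiffSketch {

    // One subtraction followed by Math.abs per element. A loop of this
    // shape is the kind of candidate the auto-vectorizer may turn into
    // fsub + fabs, which the patch then matches as a single fabd.
    static void absDiff(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(a[i] - b[i]);
        }
    }

    // Assumed form of a loop body with one extra group of subtraction
    // and Math.abs, so arithmetic weighs more against loads and stores.
    static void absDiffTwoGroups(double[] a, double[] b, double[] c, double[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(Math.abs(a[i] - b[i]) - c[i]);
        }
    }

    public static void main(String[] args) {
        float[] out = new float[3];
        absDiff(new float[] {1f, 5f, -2f}, new float[] {4f, 2f, -2f}, out);
        System.out.println(Arrays.toString(out));
    }
}
```

Whether a given loop is actually vectorized and matched to `fabd` depends
on the JIT and the hardware, so the sketch only shows the source-level
shape of the computation.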

-------------

PR: https://git.openjdk.org/jdk/pull/10011