RFR: 8292587: AArch64: Support SVE fabd instruction [v2]
Tobias Hartmann
thartmann at openjdk.org
Fri Sep 9 09:57:51 UTC 2022
On Thu, 8 Sep 2022 02:50:52 GMT, Hao Sun <haosun at openjdk.org> wrote:
>> Support for the scalar and NEON fabd instructions was added in
>> JDK-8256318. This patch adds support for the SVE fabd instruction [1],
>> along with one Jtreg test case.
>>
>> With this patch, the two-instruction sequence `fsub` + `fabs` is combined
>> into a single `fabd` instruction.
>>
>>
>> fsub z16.s, z16.s, z17.s
>> fabs z16.s, p7/m, z16.s
>>
>> -->
>>
>> fabd z16.s, p7/m, z16.s, z17.s
>>
>>
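For readers less familiar with the pattern, the Java shape that typically leads to the `fsub` + `fabs` pair above is an element-wise subtraction wrapped in `Math.abs`. The following is a minimal sketch only: the class and method names are hypothetical and not taken from the patch, and whether the loop is actually vectorized and emitted as `fabd` depends on the JIT compiler and the hardware.

```java
import java.util.Arrays;

// Hypothetical illustration; names are not from the patch or its tests.
final class AbsDiffSketch {
    // Element-wise |a[i] - b[i]|. The subtract-then-abs shape in the loop
    // body is the pattern the patch description says is fused into fabd.
    static float[] absDiff(float[] a, float[] b) {
        float[] r = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = Math.abs(a[i] - b[i]);
        }
        return r;
    }

    public static void main(String[] args) {
        float[] r = absDiff(new float[] {1.5f, -2.0f, 4.0f},
                            new float[] {3.0f, 1.0f, 4.0f});
        System.out.println(Arrays.toString(r)); // prints [1.5, 3.0, 0.0]
    }
}
```

The result is the same with or without the optimization; only the generated machine code differs.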
>> In the initial evaluation with the JMH case
>> FloatingScalarVectorAbsDiff.java, we found that the performance uplift from
>> this optimization was easily hidden by the heavy memory load/store
>> instructions. To avoid that, we updated the JMH case slightly, adding one
>> more group of subtraction and Math.abs operations to the loop body.
>>
>> The data below were collected with the new JMH case on one 256-bit SVE machine. We
>> can observe about 38% and 35% reductions in ns/op for the two functions
>> respectively.
>>
>>
>> Benchmark                                            Before   After    Units
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble  260.468  160.965  ns/op
>> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat   133.963   87.292  ns/op
>>
>>
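As a quick cross-check, the relative reduction in time per operation can be recomputed from the Before and After columns. This is a small sketch with a hypothetical class name; it assumes "improvement" means the drop in ns/op relative to the Before value, which the original text does not state explicitly.

```java
import java.util.Locale;

// Hypothetical helper, not part of the patch or the JMH benchmark.
final class ReductionSketch {
    // Percentage by which the time per operation dropped, relative to "before".
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Values copied from the benchmark table (ns/op).
        System.out.println(String.format(Locale.ROOT, "double: %.1f%%",
                reductionPercent(260.468, 160.965)));
        System.out.println(String.format(Locale.ROOT, "float:  %.1f%%",
                reductionPercent(133.963, 87.292)));
    }
}
```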
>> Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine.
>>
>> [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated--
>
> Hao Sun has updated the pull request incrementally with one additional commit since the last revision:
>
> Update the loop limit in VectorAbsDiffTest.java
>
> As pointed out by Faye Gao, the test results are not fully verified due
> to incorrect loop limits.
>
> Updated it.
>
> Reran the test; no regressions.
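The actual change to VectorAbsDiffTest.java is not shown in this message. As a generic illustration of why the loop limit matters, here is a sketch of a verification loop whose bound covers every element of the result array, so a wrong value in any position is caught. All names and the array length are assumptions made for the example.

```java
// Hypothetical sketch; this is not the code from VectorAbsDiffTest.java.
final class VerifySketch {
    static final int LENGTH = 1024; // assumed array size for illustration

    // Recomputes each element with scalar code and compares it to the result.
    // The bound is result.length, so no trailing elements are left unchecked.
    static void verify(float[] a, float[] b, float[] result) {
        for (int i = 0; i < result.length; i++) {
            float expected = Math.abs(a[i] - b[i]);
            if (Float.compare(expected, result[i]) != 0) {
                throw new AssertionError("mismatch at index " + i
                        + ": expected " + expected + ", got " + result[i]);
            }
        }
    }

    public static void main(String[] args) {
        float[] a = new float[LENGTH];
        float[] b = new float[LENGTH];
        float[] r = new float[LENGTH];
        for (int i = 0; i < LENGTH; i++) {
            a[i] = i * 0.5f;
            b[i] = LENGTH - i;
            r[i] = Math.abs(a[i] - b[i]);
        }
        verify(a, b, r);
        System.out.println("verified " + LENGTH + " elements");
    }
}
```

A loop that stops short of the array length would pass even if the unchecked tail held wrong values, which is the kind of gap the commit message describes.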
I tested this in our CI. All tests passed.
-------------
PR: https://git.openjdk.org/jdk/pull/10011