RFR: 8292587: AArch64: Support SVE fabd instruction
Hao Sun
haosun at openjdk.org
Thu Aug 25 01:59:56 UTC 2022
Scalar and NEON fabd instructions were first supported in
JDK-8256318. This patch adds support for the SVE fabd instruction [1]
and adds one Jtreg test case as well.
With this patch, the two instructions `fsub` + `fabs` are combined into
a single `fabd` instruction:
    fsub z16.s, z16.s, z17.s
    fabs z16.s, p7/m, z16.s
    -->
    fabd z16.s, p7/m, z16.s, z17.s
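For context, here is a minimal sketch of the kind of Java loop that gives rise to this `fsub` + `fabs` pattern once it is auto-vectorized. The class and method names are hypothetical and are not taken from the patch; whether a given loop is actually vectorized depends on the JIT compiler and the hardware.

```java
import java.util.Arrays;

public class AbsDiffSketch {
    // Element-wise absolute difference: out[i] = |a[i] - b[i]|.
    // Hypothetical helper for illustration only; a subtraction feeding
    // Math.abs is the shape that the fabd matching targets.
    static void absDiff(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(a[i] - b[i]);
        }
    }

    public static void main(String[] args) {
        float[] a = {1.0f, -2.5f, 4.0f, 0.0f};
        float[] b = {3.0f, 1.5f, 4.0f, -7.0f};
        float[] out = new float[a.length];
        absDiff(a, b, out);
        System.out.println(Arrays.toString(out)); // [2.0, 4.0, 0.0, 7.0]
    }
}
```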
In the initial evaluation of the JMH case, i.e.
FloatingScalarVectorAbsDiff.java, we found that the performance uplift
from this optimization was easily hidden by the heavy memory load/store
instructions. To avoid that, we updated the JMH case slightly, adding
one more group of subtraction and Math.abs operations to the loop body.
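As an illustration only, one possible shape of a loop body with two subtraction/Math.abs groups per element is sketched below. This is an assumption about the general idea, not the code of the updated benchmark: the class name, method name, and the extra array `c` are hypothetical, and the actual JMH case in the patch may be structured differently.

```java
import java.util.Arrays;

public class TwoGroupAbsDiffSketch {
    // Two subtraction + Math.abs groups per element, so each iteration
    // performs more floating-point work for its loads and one store.
    // Hypothetical illustration; not the benchmark code from the patch.
    static void absDiffTwoGroups(double[] a, double[] b, double[] c, double[] out) {
        for (int i = 0; i < out.length; i++) {
            double first = Math.abs(a[i] - b[i]);   // first sub + abs group
            out[i] = Math.abs(first - c[i]);        // second sub + abs group
        }
    }

    public static void main(String[] args) {
        double[] a = {5.0, 1.0, -2.0};
        double[] b = {2.0, 4.0, -2.0};
        double[] c = {1.0, 10.0, 0.5};
        double[] out = new double[a.length];
        absDiffTwoGroups(a, b, c, out);
        System.out.println(Arrays.toString(out)); // [2.0, 7.0, 0.5]
    }
}
```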
The data below were collected with the new JMH case on one 256-bit SVE
machine. We observe about 38% and 35% improvements for the two
functions, respectively.
Benchmark                                            Before   After    Units
FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble  260.468  160.965  ns/op
FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat   133.963   87.292  ns/op
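The relative improvement can be recomputed from the table as (Before - After) / Before. The short sketch below does that arithmetic; the class and method names are hypothetical, and the input numbers are copied from the table above.

```java
public class ImprovementSketch {
    // Percentage reduction in ns/op relative to the "Before" value.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Values taken from the benchmark table (ns/op).
        System.out.printf("Double: %.1f%%%n", reductionPercent(260.468, 160.965)); // Double: 38.2%
        System.out.printf("Float:  %.1f%%%n", reductionPercent(133.963, 87.292));  // Float:  34.8%
    }
}
```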
Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine.
[1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated--
-------------
Commit messages:
- 8292587: AArch64: Support SVE fabd instruction
Changes: https://git.openjdk.org/jdk/pull/10011/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10011&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8292587
Stats: 277 lines in 7 files changed: 234 ins; 0 del; 43 mod
Patch: https://git.openjdk.org/jdk/pull/10011.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10011/head:pull/10011
PR: https://git.openjdk.org/jdk/pull/10011