RFR: 8282926: AArch64: Optimize out WHILELO with PTRUE
Eric Liu
eliu at openjdk.java.net
Fri Mar 11 10:47:00 UTC 2022
This patch uses the PTRUE instruction instead of WHILELO to create
vector masks for certain lengths. According to the software optimization
guides for Neoverse N2 [1], Neoverse V1 [2], and A64FX [3], PTRUE is more
efficient than WHILELO.
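As a rough illustration of which lengths qualify: PTRUE can only name a fixed element count through its pattern operand, and the fixed-count patterns are VL1 to VL8, VL16, VL32, VL64, VL128 and VL256. The sketch below maps a lane count to such a pattern and returns empty when no fixed pattern exists. The names `PtruePatternSketch` and `patternFor` are hypothetical and are not taken from the patch; this is a model of the idea, not the HotSpot code.

```java
import java.util.Optional;

public class PtruePatternSketch {
    // Hypothetical helper: returns the PTRUE pattern name for a fixed lane
    // count, or empty if SVE has no fixed-count pattern for that value.
    static Optional<String> patternFor(int laneCount) {
        if (laneCount >= 1 && laneCount <= 8) {
            return Optional.of("vl" + laneCount);
        }
        switch (laneCount) {
            case 16: case 32: case 64: case 128: case 256:
                return Optional.of("vl" + laneCount);
            default:
                // Assumption: the caller would keep using WHILELO here.
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 16, 24, 64}) {
            System.out.println(n + " -> "
                + patternFor(n).orElse("(no fixed pattern; keep whilelo)"));
        }
    }
}
```

A count such as 24 has no fixed pattern, so in this model it would still need the WHILELO sequence.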
The generated code changes as shown below:
Before:
0x0000ffff6d4747b4: orr x8, xzr, #0x10
0x0000ffff6d4747b8: whilelo p0.b, xzr, x8
After:
0x0000ffff89476aec: ptrue p0.b, vl16
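To see why the two sequences above produce the same mask, the following sketch models the predicate each one yields for byte elements. It is a simplified software model with hypothetical names (`PredicateModelSketch`, `whilelo`, `ptrueVl`), and the hardware vector size in lanes is an assumed parameter rather than anything stated in the patch.

```java
import java.util.Arrays;

public class PredicateModelSketch {
    // Model of WHILELO Pd.B, Xn, Xm: lane i is active while
    // (Xn + i) < Xm holds as an unsigned comparison; once the
    // condition fails, all later lanes stay inactive.
    static boolean[] whilelo(long start, long end, int lanes) {
        boolean[] p = new boolean[lanes];
        boolean active = true;
        for (int i = 0; i < lanes; i++) {
            active = active && Long.compareUnsigned(start + i, end) < 0;
            p[i] = active;
        }
        return p;
    }

    // Model of PTRUE Pd.B, VLn: the first n lanes are active. If the
    // vector holds fewer than n lanes, no lane is active.
    static boolean[] ptrueVl(int count, int lanes) {
        boolean[] p = new boolean[lanes];
        if (count <= lanes) {
            Arrays.fill(p, 0, count, true);
        }
        return p;
    }

    public static void main(String[] args) {
        int lanes = 32; // assumed: 256-bit SVE vector, byte elements
        boolean[] before = whilelo(0, 0x10, lanes); // orr x8, xzr, #0x10; whilelo p0.b, xzr, x8
        boolean[] after = ptrueVl(16, lanes);       // ptrue p0.b, vl16
        System.out.println("masks equal: " + Arrays.equals(before, after));
    }
}
```

In this model the two forms differ only when the requested count exceeds the number of lanes in the vector, so the substitution is shown here for the case where the vector holds at least that many elements.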
The microbenchmark improves by 15% ~ 20% on my SVE test system.
[TEST]
The jdk/incubator/vector and hotspot/compiler/vectorapi tests passed on
my SVE test machine.
[1] https://developer.arm.com/documentation/PJDOC-466751330-18256/0001
[2] https://developer.arm.com/documentation/pjdoc466751330-9685/latest/
[3] https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.6.pdf
-------------
Commit messages:
- 8282926: AArch64: Optimize out WHILELO with PTRUE
Changes: https://git.openjdk.java.net/jdk/pull/7786/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7786&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8282926
Stats: 174 lines in 3 files changed: 0 ins; 71 del; 103 mod
Patch: https://git.openjdk.java.net/jdk/pull/7786.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/7786/head:pull/7786
PR: https://git.openjdk.java.net/jdk/pull/7786