RFR: 8282926: AArch64: Optimize out WHILELO with PTRUE

Eric Liu eliu at openjdk.java.net
Fri Mar 11 10:47:00 UTC 2022


This patch uses the PTRUE instruction instead of WHILELO to create
vector masks for certain vector lengths. According to the software
optimization guides for Neoverse N2[1], Neoverse V1[2], and A64FX[3],
PTRUE is more efficient than WHILELO.

The generated code changes as follows:

Before:

    0x0000ffff6d4747b4:   orr     x8, xzr, #0x10
    0x0000ffff6d4747b8:   whilelo p0.b, xzr, x8

After:

    0x0000ffff89476aec:   ptrue   p0.b, vl16
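
To illustrate the selection logic, here is a minimal C++ sketch. It is
not the HotSpot code from this patch: `ptrue_pattern` and `emit_mask`
are hypothetical names, the output is plain assembly text, and the
register choice (p0, x8) simply mirrors the listing above. It assumes
the fixed-count predicate patterns defined by SVE (VL1 to VL8, VL16,
VL32, VL64, VL128, VL256) and falls back to WHILELO for any other lane
count.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: returns the SVE predicate pattern name for a lane
// count, or std::nullopt when SVE defines no fixed-count pattern for it.
std::optional<std::string> ptrue_pattern(int lane_count) {
  if (lane_count >= 1 && lane_count <= 8) {
    return "vl" + std::to_string(lane_count);
  }
  switch (lane_count) {
    case 16: case 32: case 64: case 128: case 256:
      return "vl" + std::to_string(lane_count);
    default:
      return std::nullopt;
  }
}

// Hypothetical emitter: produces assembly text that sets the first
// lane_count byte lanes of p0. Uses a single PTRUE when a pattern exists,
// otherwise materializes the count in x8 and uses WHILELO.
std::string emit_mask(int lane_count) {
  assert(lane_count > 0);
  if (auto pattern = ptrue_pattern(lane_count)) {
    return "ptrue p0.b, " + *pattern;
  }
  return "mov x8, #" + std::to_string(lane_count) + "\n" +
         "whilelo p0.b, xzr, x8";
}
```

With this sketch, `emit_mask(16)` yields the single PTRUE shown above,
while a count such as 24 still needs the two-instruction WHILELO
sequence. Note that a VLn pattern only activates n lanes when the
hardware vector holds at least n elements, so a real implementation
would also have to check the lane count against the vector length.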

The microbenchmark improves by 15% to 20% on my SVE test system.

[TEST]
jdk/incubator/vector and hotspot/compiler/vectorapi passed on my SVE
test machine.

[1] https://developer.arm.com/documentation/PJDOC-466751330-18256/0001
[2] https://developer.arm.com/documentation/pjdoc466751330-9685/latest/
[3] https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.6.pdf

-------------

Commit messages:
 - 8282926: AArch64: Optimize out WHILELO with PTRUE

Changes: https://git.openjdk.java.net/jdk/pull/7786/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7786&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8282926
  Stats: 174 lines in 3 files changed: 0 ins; 71 del; 103 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7786.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7786/head:pull/7786

PR: https://git.openjdk.java.net/jdk/pull/7786


More information about the hotspot-compiler-dev mailing list