RFR: 8282926: AArch64: Optimize out WHILELO with PTRUE
Nick Gasson
ngasson at openjdk.java.net
Wed Mar 30 14:19:46 UTC 2022
On Fri, 11 Mar 2022 10:40:00 GMT, Eric Liu <eliu at openjdk.org> wrote:
> This patch uses the PTRUE instruction instead of WHILELO to create
> vector masks for certain lengths. According to the software optimization
> guides for Neoverse N2[1], Neoverse V1[2], and A64FX[3], PTRUE is more
> efficient than WHILELO.
>
> The generated code changes as follows:
>
> Before:
>
> 0x0000ffff6d4747b4: orr x8, xzr, #0x10
> 0x0000ffff6d4747b8: whilelo p0.b, xzr, x8
>
> After:
>
> 0x0000ffff89476aec: ptrue p0.b, vl16
>
> The micro benchmark improves by 15% to 20% on my SVE test system.
>
> [TEST]
> jdk/incubator/vector and hotspot/compiler/vectorapi passed on my SVE
> test machine.
>
> [1] https://developer.arm.com/documentation/PJDOC-466751330-18256/0001
> [2] https://developer.arm.com/documentation/pjdoc466751330-9685/latest/
> [3] https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.6.pdf
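
For readers unfamiliar with SVE predicate patterns, the substitution described in the quoted message can be sketched as follows. This is only an illustration, not the HotSpot change itself: the class and method names are hypothetical, and the emitted strings are schematic assembly. It assumes the fixed-length PTRUE patterns defined by the SVE architecture (VL1 to VL8, VL16, VL32, VL64, VL128, VL256) and falls back to the WHILELO sequence for any other lane count.

```java
import java.util.Optional;

// Hypothetical illustration only; none of these names come from the patch.
class PtruePatternSketch {

    // Returns the fixed-length PTRUE pattern for laneCount, if SVE defines one.
    // Assumed pattern set: vl1..vl8, vl16, vl32, vl64, vl128, vl256.
    static Optional<String> fixedPattern(int laneCount) {
        if (laneCount >= 1 && laneCount <= 8) {
            return Optional.of("vl" + laneCount);
        }
        switch (laneCount) {
            case 16:
            case 32:
            case 64:
            case 128:
            case 256:
                return Optional.of("vl" + laneCount);
            default:
                return Optional.empty();
        }
    }

    // Schematic assembly for building a byte-lane mask with laneCount active lanes.
    static String maskSetup(int laneCount) {
        return fixedPattern(laneCount)
                .map(p -> "ptrue p0.b, " + p)
                // No matching pattern: materialize the count, then use WHILELO.
                .orElse("mov x8, #" + laneCount + "; whilelo p0.b, xzr, x8");
    }

    public static void main(String[] args) {
        System.out.println(maskSetup(16));
        System.out.println(maskSetup(24));
    }
}
```

With 16 lanes the sketch selects the single `ptrue p0.b, vl16` shown in the "After" listing; a count such as 24 has no fixed pattern and keeps the two-instruction form.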
Marked as reviewed by ngasson (Reviewer).
-------------
PR: https://git.openjdk.java.net/jdk/pull/7786