RFR: 8282926: AArch64: Optimize out WHILELO with PTRUE

Nick Gasson ngasson at openjdk.java.net
Wed Mar 30 14:19:46 UTC 2022


On Fri, 11 Mar 2022 10:40:00 GMT, Eric Liu <eliu at openjdk.org> wrote:

> This patch uses the PTRUE instruction instead of WHILELO to create
> vector masks of certain fixed lengths. According to the software
> optimization guides for Neoverse N2[1], Neoverse V1[2], and A64FX[3],
> PTRUE is more efficient than WHILELO.
> 
> The generated code changes as shown below:
> 
> Before:
> 
>     0x0000ffff6d4747b4:   orr     x8, xzr, #0x10
>     0x0000ffff6d4747b8:   whilelo p0.b, xzr, x8
> 
> After:
> 
>     0x0000ffff89476aec:   ptrue   p0.b, vl16
> 
> The microbenchmark improves by 15% to 20% on my SVE test system.
> 
> [TEST]
> The jdk/incubator/vector and hotspot/compiler/vectorapi tests passed on
> my SVE test machine.
> 
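> [1] Arm Neoverse N2 Software Optimization Guide: https://developer.arm.com/documentation/PJDOC-466751330-18256/0001
> [2] Arm Neoverse V1 Software Optimization Guide: https://developer.arm.com/documentation/pjdoc466751330-9685/latest/
> [3] Fujitsu A64FX Microarchitecture Manual: https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.6.pdf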
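
The following is a minimal Python sketch of why the rewrite is safe. It assumes byte (.b) lanes and models only WHILELO with a zero start register and the fixed PTRUE VL<n> patterns. The helpers `whilelo`, `ptrue_vl`, and `mask_instruction` are hypothetical illustrations, not HotSpot code. For a constant length n, `whilelo p0.b, xzr, x8` with x8 = n activates lanes 0 to n-1. `ptrue p0.b, vl<n>` produces the same predicate only when n is an encodable count and does not exceed the number of lanes, because PTRUE yields an all-false predicate when the pattern requests more lanes than the vector has.

```python
# Sketch: model SVE predicate generation for byte (.b) lanes.
# Assumption: only WHILELO with start = 0 and the fixed PTRUE VL<n> patterns are modeled.

# Element counts encodable as fixed PTRUE patterns (VL1..VL8, VL16..VL256).
FIXED_VL_COUNTS = {1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256}


def whilelo(start, limit, lanes):
    """WHILELO Pd.B, start, limit: lane i stays active while start + i < limit."""
    pred = []
    active = True
    for i in range(lanes):
        active = active and (start + i) < limit
        pred.append(active)
    return pred


def ptrue_vl(count, lanes):
    """PTRUE Pd.B, VL<count>: first `count` lanes active, or none if the vector is too short."""
    if count not in FIXED_VL_COUNTS:
        raise ValueError(f"no PTRUE VL pattern for {count} elements")
    n = count if count <= lanes else 0
    return [i < n for i in range(lanes)]


def mask_instruction(length, lanes):
    """Hypothetical selector: use PTRUE only when it gives the same predicate as WHILELO."""
    if length in FIXED_VL_COUNTS and length <= lanes:
        return "ptrue"
    return "whilelo"


if __name__ == "__main__":
    lanes = 16  # 128-bit SVE vector, byte elements
    print(whilelo(0, 16, lanes) == ptrue_vl(16, lanes))  # True
    print(mask_instruction(16, lanes))  # ptrue
    print(mask_instruction(12, lanes))  # whilelo
```

With the minimum 128-bit SVE vector there are 16 byte lanes, so a VL16 mask matches the WHILELO result. A length such as 12 has no fixed VL pattern, so in this sketch it falls back to WHILELO.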

Marked as reviewed by ngasson (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/7786


More information about the hotspot-compiler-dev mailing list