RFR: 8282926: AArch64: Optimize out WHILELO with PTRUE

Ningsheng Jian njian at openjdk.java.net
Mon Mar 14 01:33:40 UTC 2022


On Fri, 11 Mar 2022 10:40:00 GMT, Eric Liu <eliu at openjdk.org> wrote:

> This patch uses the PTRUE instruction instead of the WHILELO instruction
> to create vector masks of certain lengths. According to the software
> optimization guides for Neoverse N2[1], Neoverse V1[2], and A64FX[3],
> PTRUE is more efficient than WHILELO.
> 
> The generated code changes as shown below:
> 
> Before:
> 
>     0x0000ffff6d4747b4:   orr     x8, xzr, #0x10
>     0x0000ffff6d4747b8:   whilelo p0.b, xzr, x8
> 
> After:
> 
>     0x0000ffff89476aec:   ptrue   p0.b, vl16
> 
> The microbenchmark improves by 15% to 20% on my SVE test system.
> 
> [TEST]
> jdk/incubator/vector and hotspot/compiler/vectorapi passed on my SVE
> test machine.
> 
> [1] https://developer.arm.com/documentation/PJDOC-466751330-18256/0001
> [2] https://developer.arm.com/documentation/pjdoc466751330-9685/latest/
> [3] https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.6.pdf
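
For readers less familiar with SVE predicates, the equivalence behind the
Before/After listing can be modelled in a few lines of Java. This is only an
illustrative sketch, not code from the patch: the class and method names are
hypothetical, a predicate is modelled as a boolean[] with one entry per byte
lane, and the table of fixed counts assumes the VL1-VL8, VL16, VL32, VL64,
VL128 and VL256 patterns that PTRUE can encode in the SVE ISA.

```java
import java.util.Arrays;

// Hypothetical model of SVE byte-lane predicates; not HotSpot code.
class PredicateModel {
    // Fixed element counts that PTRUE can encode as a "VLn" pattern (sorted).
    static final int[] PTRUE_VL = {1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256};

    // Models "whilelo p.b, xzr, xN": lane i is active while (0 + i) < limit,
    // using an unsigned comparison.
    static boolean[] whilelo(long limit, int lanes) {
        boolean[] p = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            p[i] = Long.compareUnsigned(i, limit) < 0;
        }
        return p;
    }

    // Models "ptrue p.b, vlN": the first n lanes are active, or no lanes at
    // all when the vector has fewer than n lanes.
    static boolean[] ptrue(int n, int lanes) {
        boolean[] p = new boolean[lanes];
        if (n <= lanes) {
            Arrays.fill(p, 0, n, true);
        }
        return p;
    }

    // True when n has a fixed-count PTRUE pattern and fits in the vector, so
    // the single PTRUE yields the same mask as the ORR + WHILELO pair.
    static boolean canUsePtrue(int n, int lanes) {
        return n <= lanes && Arrays.binarySearch(PTRUE_VL, n) >= 0;
    }

    public static void main(String[] args) {
        int lanes = 32; // assumed 256-bit SVE vector, i.e. 32 byte lanes
        int n = 0x10;   // the constant loaded into x8 in the "Before" listing
        System.out.println(canUsePtrue(n, lanes));
        System.out.println(Arrays.equals(whilelo(n, lanes), ptrue(n, lanes)));
    }
}
```

In this model, whilelo(16, 32) and ptrue(16, 32) give the same mask, which
matches the Before/After pair above. The two differ once the requested count
exceeds the number of lanes (a VLn pattern then yields an empty predicate), so
the sketch only treats the replacement as valid for counts that fit in the
vector.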

Looks good to me.

-------------

Marked as reviewed by njian (Committer).

PR: https://git.openjdk.java.net/jdk/pull/7786

