RFR: 8282528: AArch64: Incorrect replicate2L_zero rule [v2]

Fri Apr 8 02:31:48 UTC 2022

On Wed, 6 Apr 2022 03:38:27 GMT, Eric Liu <eliu at openjdk.org> wrote:

>> This patch fixes the wrong matching rule of replicate2L_zero. It was
>> matched "ReplicateI" by mistake so that long immediates(not only zero)
>> had to be moved to register first and matched to replicate2L finally. To
>> fix this trivial bug, this patch fixes the typo and extends the rule of
>> replicate2L_zero to replicate2L_imm, which now supports all possible
>> long immediate values.
>> 
>> The final code changes are shown as below:
>> 
>> replicate2L_imm:
>> 
>>         mov   x13, #0xff
>>         movk  x13, #0xff, lsl #16
>>         movk  x13, #0xff, lsl #32
>>         dup   v16.2d, x13
>> 
>>         =>
>> 
>>         movi  v16.2d, #0xff00ff00ff
>> 
>> [Test]
>> test/jdk/jdk/incubator/vector, test/hotspot/jtreg/compiler/vectorapi
>> passed without failure.
>
> Eric Liu has updated the pull request incrementally with one additional commit since the last revision:
> 
>   fix comment
>   
>   Change-Id: Ic51820391d19b61e37847cc04375ecd79fc86779

@theRealAph Could you take a look at this?

The latest commit refines the code generator for the mov macro, which
now generates DUP for immediates that cannot be encoded in MOVI. For
example, for IntVector.broadcast(0x12345678), the generated code
changes as follows:

Before:
        movi    v16.4s, #0x78
        orr     v16.4s, #0x56, lsl #8
        orr     v16.4s, #0x34, lsl #16
        orr     v16.4s, #0x12, lsl #24

After:
        mov     w14, #0x5678
        movk    w14, #0x1234, lsl #16
        dup     v16.4s, w14
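
To make the instruction counts in the two listings above concrete, here
is a rough Java sketch. It is not code from the patch: the class and
method names are hypothetical, and the model is deliberately simplified.
It assumes the old sequence costs one MOVI/ORR per non-zero byte and the
new one costs one MOV/MOVK per non-zero 16-bit halfword plus one DUP. It
ignores other encodings that the real assembler may pick.

```java
public class BroadcastCostSketch {
    // Hypothetical model of the "Before" sequence: one MOVI for the first
    // non-zero byte, then one ORR for each additional non-zero byte.
    static int moviOrrCount(int imm) {
        int nonZeroBytes = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            if (((imm >>> shift) & 0xff) != 0) {
                nonZeroBytes++;
            }
        }
        // An all-zero immediate still needs one instruction in this model.
        return Math.max(1, nonZeroBytes);
    }

    // Hypothetical model of the "After" sequence: one MOV/MOVK for each
    // non-zero 16-bit halfword, followed by a single DUP.
    static int movDupCount(int imm) {
        int nonZeroHalves = 0;
        for (int shift = 0; shift < 32; shift += 16) {
            if (((imm >>> shift) & 0xffff) != 0) {
                nonZeroHalves++;
            }
        }
        return Math.max(1, nonZeroHalves) + 1;
    }

    public static void main(String[] args) {
        int imm = 0x12345678;
        System.out.println("MOVI+ORR:     " + moviOrrCount(imm));
        System.out.println("MOV/MOVK+DUP: " + movDupCount(imm));
    }
}
```

For 0x12345678 the model gives four instructions for the old sequence
and three for the new one, matching the listings.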

LLVM also uses DUP for such unencodable immediates, whereas GCC loads
them from the constant pool.
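
For the 2D case in the quoted description, the 64-bit form of MOVI
accepts an immediate only if every byte is 0x00 or 0xff. The Java
sketch below illustrates that check. It is an illustration only, not
the patch's implementation, and the class and method names are
hypothetical.

```java
public class Movi64Sketch {
    // Returns true if every byte of imm is 0x00 or 0xff, which is the
    // immediate form accepted by the 64-bit variant of MOVI.
    static boolean isMovi2DEncodable(long imm) {
        for (int shift = 0; shift < 64; shift += 8) {
            long b = (imm >>> shift) & 0xffL;
            if (b != 0x00L && b != 0xffL) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The immediate from the replicate2L_imm example.
        System.out.println(isMovi2DEncodable(0xff00ff00ffL));
        // An immediate whose bytes are not all 0x00 or 0xff.
        System.out.println(isMovi2DEncodable(0x12345678L));
    }
}
```

An immediate that fails this check would take the MOV/MOVK plus DUP
path shown in the "After" listing.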

-------------

PR: https://git.openjdk.java.net/jdk/pull/7939