RFR: 8282528: AArch64: Incorrect replicate2L_zero rule [v2]
Eric Liu
eliu at openjdk.java.net
Fri Apr 8 02:31:48 UTC 2022
On Wed, 6 Apr 2022 03:38:27 GMT, Eric Liu <eliu at openjdk.org> wrote:
>> This patch fixes the incorrect matching rule for replicate2L_zero. It
>> matched "ReplicateI" by mistake, so long immediates (not only zero)
>> had to be moved to a register first and were then matched by
>> replicate2L. To fix this trivial bug, this patch corrects the typo and
>> extends the rule replicate2L_zero to replicate2L_imm, which now
>> supports all possible long immediate values.
>>
>> The resulting code change is shown below:
>>
>> replicate2L_imm:
>>
>> mov x13, #0xff
>> movk x13, #0xff, lsl #16
>> movk x13, #0xff, lsl #32
>> dup v16.2d, x13
>>
>> =>
>>
>> movi v16.2d, #0xff00ff00ff
>>
>> [Test]
>> test/jdk/jdk/incubator/vector, test/hotspot/jtreg/compiler/vectorapi
>> passed without failure.
>
> Eric Liu has updated the pull request incrementally with one additional commit since the last revision:
>
> fix comment
>
> Change-Id: Ic51820391d19b61e37847cc04375ecd79fc86779
@theRealAph Could you help take a look at this?
The latest commit refines the code generator for the mov macro, which
now generates DUP for immediates that cannot be encoded in MOVI. For
example, for IntVector.broadcast(0x12345678), the generated code changes
as follows:
Before:
movi v16.4s, #0x78
orr v16.4s, #0x56, lsl #8
orr v16.4s, #0x34, lsl #16
orr v16.4s, #0x12, lsl #24
After:
mov w14, #0x5678
movk w14, #0x1234, lsl #16
dup v16.4s, w14
LLVM also uses DUP for such unencodable immediates, whereas GCC loads
them from the constant pool.
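
To make the two cases above concrete, here is a minimal Java sketch. It is
an illustration only, not the HotSpot macro-assembler code: the class and
method names (MoviSketch, isByteMask64, low16, high16) are hypothetical.
It assumes only the 64-bit MOVI form, whose immediate must have every byte
equal to 0x00 or 0xff, and shows how a 32-bit immediate splits into the
two 16-bit halves used by the mov/movk pair.

```java
public class MoviSketch {
    // Hypothetical helper: true if every byte of imm is 0x00 or 0xff,
    // which is the immediate form accepted by the 64-bit variant of MOVI.
    public static boolean isByteMask64(long imm) {
        for (int i = 0; i < 8; i++) {
            long b = (imm >>> (8 * i)) & 0xffL;
            if (b != 0x00L && b != 0xffL) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helpers: the 16-bit halves of a 32-bit immediate,
    // as they would appear in a mov/movk sequence.
    public static int low16(int imm) {
        return imm & 0xffff;
    }

    public static int high16(int imm) {
        return imm >>> 16;
    }

    public static void main(String[] args) {
        // 0xff00ff00ff has only 0x00/0xff bytes, so one MOVI suffices.
        System.out.println(isByteMask64(0xff00ff00ffL));
        // 0x12345678 does not have that form.
        System.out.println(isByteMask64(0x12345678L));
        // The halves match the mov/movk operands in the "After" listing.
        System.out.printf("mov w14, #0x%x%n", low16(0x12345678));
        System.out.printf("movk w14, #0x%x, lsl #16%n", high16(0x12345678));
    }
}
```

The check covers only the 64-bit byte-mask form; the other MOVI forms
(shifted 8-bit immediates for the narrower arrangements) are deliberately
left out of this sketch.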
-------------
PR: https://git.openjdk.java.net/jdk/pull/7939