RFR: 8282528: AArch64: Incorrect replicate2L_zero rule
Eric Liu
eliu at openjdk.java.net
Thu Mar 24 12:16:47 UTC 2022
On Thu, 24 Mar 2022 09:47:06 GMT, Andrew Haley <aph at openjdk.org> wrote:
>> I think we need a `can_encode(imm, arrangement)` function.
>
> And then another function that actually does the arranging, and the generation of instructions calls those functions.
Thanks for your review. I agree that a `can_encode(imm, arrangement)` function would be better. My concern is that this JBS issue is just a bug fix for the replicate2L_imm backend. For the other SIMD_Arrangement values, I found that there are alternative choices for code generation, but I didn't touch them in this patch, to keep it clear and small. I list two examples below.
Example 1:
    movi v16.4s, #0x34
    orr  v16.4s, #0x12, lsl #8
vs
    mov  w8, #0x1234
    dup  v16.4s, w8
Example 2:
    movi v16.4s, #0x78
    orr  v16.4s, #0x56, lsl #8
    orr  v16.4s, #0x34, lsl #16
    orr  v16.4s, #0x12, lsl #24
vs
    mov  w14, #0x5678
    movk w14, #0x1234, lsl #16
    dup  v16.4s, w14
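To make the equivalence of the two sequences in each example concrete, here is a minimal C++ sketch that models the value written to every 32-bit lane. The function names are hypothetical and exist only for illustration. They are not HotSpot code, and they model only the arithmetic of the instructions, not their encoding or timing.

```cpp
#include <cstdint>

// Hypothetical model of the MOVI + ORR chain for one 32-bit lane:
//   movi v.4s, #b0
//   orr  v.4s, #b1, lsl #8
//   orr  v.4s, #b2, lsl #16
//   orr  v.4s, #b3, lsl #24
// A zero byte corresponds to an ORR that could simply be omitted.
uint32_t lane_via_movi_orr(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3) {
  uint32_t lane = b0;                          // movi
  lane |= static_cast<uint32_t>(b1) << 8;      // orr ..., lsl #8
  lane |= static_cast<uint32_t>(b2) << 16;     // orr ..., lsl #16
  lane |= static_cast<uint32_t>(b3) << 24;     // orr ..., lsl #24
  return lane;
}

// Hypothetical model of the general-purpose register path:
//   mov  w, #lo
//   movk w, #hi, lsl #16   (keeps the low 16 bits, replaces the high 16)
//   dup  v.4s, w           (copies w into every lane)
uint32_t lane_via_mov_dup(uint16_t lo, uint16_t hi) {
  uint32_t w = lo;                                            // mov
  w = (w & 0x0000ffffu) | (static_cast<uint32_t>(hi) << 16);  // movk
  return w;                                                   // dup
}
```

With the immediates from the examples, `lane_via_movi_orr(0x34, 0x12, 0, 0)` and `lane_via_mov_dup(0x1234, 0)` both model Example 1, and `lane_via_movi_orr(0x78, 0x56, 0x34, 0x12)` and `lane_via_mov_dup(0x5678, 0x1234)` both model Example 2.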
I'm considering measuring the performance and refining the mov macro assembler if necessary. `can_encode` could also be added as part of that follow-up work. What do you think?
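As a rough illustration of what such a check and comparison might look like for the 4S arrangement, here is a minimal C++ sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, it considers only the MOVI/ORR forms with an 8-bit immediate shifted left by 0, 8, 16 or 24 bits, and it ignores other encodings (for example MVNI, the MSL forms, or MOVN and logical immediates on the general-purpose side). It counts instructions only and says nothing about actual performance, which would still need to be measured.

```cpp
#include <cstdint>

// Hypothetical helper: number of non-zero bytes in a 32-bit lane immediate.
int nonzero_bytes(uint32_t imm) {
  int n = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    if ((imm >> shift) & 0xffu) {
      n++;
    }
  }
  return n;
}

// Hypothetical can_encode-style check for the 4S arrangement: true when a
// single MOVI (imm8, LSL #0/8/16/24) is enough, i.e. at most one byte is set.
bool can_encode_single_movi_4s(uint32_t imm) {
  return nonzero_bytes(imm) <= 1;
}

// Naive instruction count for the MOVI + ORR chain: one instruction per
// non-zero byte, and one MOVI even when the immediate is zero.
int movi_orr_count(uint32_t imm) {
  int n = nonzero_bytes(imm);
  return n == 0 ? 1 : n;
}

// Naive instruction count for MOV (+ MOVK) + DUP: one instruction per
// non-zero 16-bit half (at least one), plus the DUP.
int mov_dup_count(uint32_t imm) {
  int halves = ((imm & 0xffffu) != 0 ? 1 : 0) + ((imm >> 16) != 0 ? 1 : 0);
  if (halves == 0) {
    halves = 1;
  }
  return halves + 1;
}
```

Under these assumptions, the immediate 0x1234 from the first example costs two instructions either way, while 0x12345678 from the second example costs four instructions with MOVI/ORR and three with MOV/MOVK/DUP.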
-------------
PR: https://git.openjdk.java.net/jdk/pull/7939