RFR: 8282528: AArch64: Incorrect replicate2L_zero rule

Thu Mar 24 15:11:47 UTC 2022

On Thu, 24 Mar 2022 12:13:51 GMT, Eric Liu <eliu at openjdk.org> wrote:

>> And then another function that actually does the arranging, and the generation of instructions calls those functions.
>
> Thanks for your review. I agree that a `can_encode(imm, arrangement)` function is better. My concern is that this JBS issue is just a bug fix for the replicate2L_imm backend. For the other SIMD_Arrangement values, I found that there are other choices for code generation, but I didn't touch them in this patch, to keep it clear and small. I show two examples below.
> 
> Example1:
> 
>         movi  v16.4s, #0x34
>         orr v16.4s, #0x12, lsl #8
> 
>         vs
> 
>         mov w8, #0x1234
>         dup v16.4s, w8
> 
> 
> Example2:
> 
>         movi    v16.4s, #0x78
>         orr     v16.4s, #0x56, lsl #8
>         orr     v16.4s, #0x34, lsl #16
>         orr     v16.4s, #0x12, lsl #24
> 
>         vs
> 
>         mov     w14, #0x5678
>         movk    w14, #0x1234, lsl #16
>         dup     v16.4s, w14
> 
> 
> I'm considering measuring the performance and refining the mov macro assembler if necessary. `can_encode` can also be done as part of that refinement. What do you think?

Sure. I'm looking at the Neoverse V1 Optimization Guide, which suggests a fairly high cost for core-to-SIMD moves, and also that only 2 of the 4 SIMD pipelines can communicate with the integer registers. So I've got an idea.
Please feel free to do any reorganization later, if you like. It's just that the current organization makes it hard to follow, and thus hard to review.
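
For illustration only, the instruction counts in the two quoted examples can be sketched as below. This is a minimal sketch, not HotSpot code: the helper names `simd_only_insns`, `gpr_then_dup_insns` and `can_encode_single_movi` are hypothetical, it covers only the 4S arrangement with a 32-bit immediate, and it ignores MVNI, the MSL shift forms, MOVN and logical immediates, any of which could shorten either sequence. It also counts instructions only and says nothing about latency or pipeline usage.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; none of these exist in HotSpot.

// SIMD-only sequence: MOVI for the first non-zero byte, then one ORR
// (vector, immediate) for each further non-zero byte.
static int simd_only_insns(uint32_t imm) {
  int n = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    if ((imm >> shift) & 0xff) n++;
  }
  return n == 0 ? 1 : n;  // zero still needs a single MOVI
}

// GPR sequence: MOV for the first non-zero halfword, MOVK for the
// second one, then one DUP to broadcast into the vector register.
static int gpr_then_dup_insns(uint32_t imm) {
  int n = 0;
  for (int shift = 0; shift < 32; shift += 16) {
    if ((imm >> shift) & 0xffff) n++;
  }
  return (n == 0 ? 1 : n) + 1;  // +1 for the DUP
}

// True when a single MOVI (8-bit immediate, LSL #0/8/16/24) is enough.
static bool can_encode_single_movi(uint32_t imm) {
  return simd_only_insns(imm) == 1;
}
```

With these assumptions, 0x1234 costs two instructions either way (Example1), while 0x12345678 costs four SIMD instructions versus three with the integer register (Example2).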

-------------

PR: https://git.openjdk.java.net/jdk/pull/7939