RFR: 8262916: Merge LShiftCntV and RShiftCntV into a single node

Thu Apr 8 11:56:18 UTC 2021

On Thu, 8 Apr 2021 11:19:27 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:

> Regarding the proposed change itself (`LShiftCntV/RShiftCntV => ShiftCntV`).
> 
> Not sure how important it is, but it has an unfortunate change in generated code for right vector shifts on AArch32: instead of sharing the result of index negation at all use sites, negation is performed at every use site now.
> 
> As a consequence, in an auto-vectorized loop it will lead to:
> 
> * 1 instruction per loop iteration (multiplied by unrolling factor);
> * no way to hoist the negation of loop invariant index.

Thanks for your feedback!

I have checked the generated code on AArch32 and it's a shame that 'vneg' is at every use point.

Before:
0xf46c8338:   add fp, r7, fp
0xf46c833c:   vshl.u16  d9, d9, d8
0xf46c8340:   vstr  d9, [fp, #12]
0xf46c8344:   vshl.u16  d9, d10, d8
0xf46c8348:   vstr  d9, [fp, #20]
0xf46c834c:   vshl.u16  d9, d11, d8
0xf46c8350:   vstr  d9, [fp, #28]

After:
0xf4aa1328:   add fp, r7, fp
0xf4aa132c:   vneg.s8 d13, d8
0xf4aa1330:   vshl.u16  d9, d9, d13
0xf4aa1334:   vstr  d9, [fp, #12]
0xf4aa1338:   vneg.s8 d9, d8
0xf4aa133c:   vshl.u16  d9, d10, d9
0xf4aa1340:   vstr  d9, [fp, #20]
0xf4aa1344:   vneg.s8 d9, d8
0xf4aa1348:   vshl.u16  d9, d11, d9
0xf4aa134c:   vstr  d9, [fp, #28]  

I suppose it's more like a trade off that either remaining those two R/LShiftCntV nodes and change AArch64 and X86  to what AArch32 dose, or merging them and accept this defect on AArch32.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3371