RFR: 8262916: Merge LShiftCntV and RShiftCntV into a single node
Eric Liu
eliu at openjdk.java.net
Thu Apr 8 11:56:18 UTC 2021
On Thu, 8 Apr 2021 11:19:27 GMT, Vladimir Ivanov <vlivanov at openjdk.org> wrote:
> Regarding the proposed change itself (`LShiftCntV/RShiftCntV => ShiftCntV`).
>
> Not sure how important it is, but it has an unfortunate change in generated code for right vector shifts on AArch32: instead of sharing the result of index negation at all use sites, negation is performed at every use site now.
>
> As a consequence, in an auto-vectorized loop it will lead to:
>
> * 1 instruction per loop iteration (multiplied by unrolling factor);
> * no way to hoist the negation of loop invariant index.
Thanks for your feedback!
I checked the generated code on AArch32, and unfortunately a 'vneg' is now emitted at every use site.
Before:
0xf46c8338: add fp, r7, fp
0xf46c833c: vshl.u16 d9, d9, d8
0xf46c8340: vstr d9, [fp, #12]
0xf46c8344: vshl.u16 d9, d10, d8
0xf46c8348: vstr d9, [fp, #20]
0xf46c834c: vshl.u16 d9, d11, d8
0xf46c8350: vstr d9, [fp, #28]
After:
0xf4aa1328: add fp, r7, fp
0xf4aa132c: vneg.s8 d13, d8
0xf4aa1330: vshl.u16 d9, d9, d13
0xf4aa1334: vstr d9, [fp, #12]
0xf4aa1338: vneg.s8 d9, d8
0xf4aa133c: vshl.u16 d9, d10, d9
0xf4aa1340: vstr d9, [fp, #20]
0xf4aa1344: vneg.s8 d9, d8
0xf4aa1348: vshl.u16 d9, d11, d9
0xf4aa134c: vstr d9, [fp, #28]
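For readers who want to relate the two listings to source code, here is a minimal Java sketch. It is not taken from the PR: the class and method names are hypothetical, the loop is only an assumed shape of the kind of code that produces these shifts, and `vshlU16` is a scalar model of a single 16-bit lane. The model relies on the fact that the register form of NEON `vshl` has no right-shift variant and instead treats a negative count as a right shift, which is why a `vneg` is needed at all.

```java
public class ShiftSketch {
    // Scalar model of one 16-bit lane of `vshl.u16` (register form).
    // The count is a signed byte: non-negative shifts left, negative shifts right.
    static char vshlU16(char value, byte count) {
        if (count >= 16 || count <= -16) {
            return 0; // every bit is shifted out of the lane
        }
        return (char) (count >= 0 ? (value << count) : (value >>> -count));
    }

    // Assumed shape of the Java loop (hypothetical): a right shift by a
    // loop-invariant count, expected here to be in the range 0..15.
    static void shiftRight(char[] dst, char[] src, int shift) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (char) (src[i] >>> shift);
        }
    }

    // Models "Before": the count is negated once and the result is shared.
    static void negateOnce(char[] dst, char[] src, int shift) {
        byte negated = (byte) -shift;
        for (int i = 0; i < src.length; i++) {
            dst[i] = vshlU16(src[i], negated);
        }
    }

    // Models "After": the same negation is repeated at every use site.
    static void negatePerUse(char[] dst, char[] src, int shift) {
        for (int i = 0; i < src.length; i++) {
            byte negated = (byte) -shift; // redundant work on each iteration
            dst[i] = vshlU16(src[i], negated);
        }
    }
}
```

Both models compute the same values as the plain loop, so the difference between the listings is purely the extra 'vneg' per shift, not a change in results.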
I suppose it's a trade-off: either keep the two nodes, LShiftCntV and RShiftCntV, and change AArch64 and x86 to do what AArch32 does, or merge them and accept this regression on AArch32.
-------------
PR: https://git.openjdk.java.net/jdk/pull/3371