RFR: 8262916: Merge LShiftCntV and RShiftCntV into a single node [v2]
Eric Liu
eliu at openjdk.java.net
Mon Apr 19 09:02:39 UTC 2021
On Mon, 19 Apr 2021 08:54:49 GMT, Eric Liu <eliu at openjdk.org> wrote:
>> The vector shift count was defined by two separate nodes (LShiftCntV and
>> RShiftCntV), which prevented the count from being shared when the left
>> and right shift counts are the same.
>>
>>
>> public static void test_shiftv(int sh) {
>>     for (int i = 0; i < N; i += 1) {
>>         a0[i] = a1[i] << sh;
>>         b0[i] = b1[i] >> sh;
>>     }
>> }
>>
>>
>> Given the example above, merging identical shift counts into one
>> node lets the shift nodes (RShiftV or LShiftV) share it, as shown
>> below:
>>
>>
>> Before:
>> 1184 LShiftCntV === _ 1189 [[ 1185 ... ]]
>> 1190 RShiftCntV === _ 1189 [[ 1191 ... ]]
>> 1185 LShiftVI === _ 1181 1184 [[ 1186 ]]
>> 1191 RShiftVI === _ 1187 1190 [[ 1192 ]]
>>
>> After:
>> 1190 ShiftCntV === _ 1189 [[ 1191 1204 ... ]]
>> 1204 LShiftVI === _ 1211 1190 [[ 1203 ]]
>> 1191 RShiftVI === _ 1187 1190 [[ 1192 ]]
>>
>>
>> The final code removes one redundant “dup” (scalar->vector),
>> saving one register.
>>
>>
>> Before:
>> dup v16.16b, w12
>> dup v17.16b, w12
>> ...
>> ldr q18, [x13, #16]
>> sshl v18.4s, v18.4s, v16.4s
>> add x18, x16, x12 ; iaload
>>
>> add x4, x15, x12
>> str q18, [x4, #16] ; iastore
>>
>> ldr q18, [x18, #16]
>> add x12, x14, x12
>> neg v19.16b, v17.16b
>> sshl v18.4s, v18.4s, v19.4s
>> str q18, [x12, #16] ; iastore
>>
>> After:
>> dup v16.16b, w11
>> ...
>> ldr q17, [x13, #16]
>> sshl v17.4s, v17.4s, v16.4s
>> add x2, x22, x11 ; iaload
>>
>> add x4, x16, x11
>> str q17, [x4, #16] ; iastore
>>
>> ldr q17, [x2, #16]
>> add x11, x21, x11
>> neg v18.16b, v16.16b
>> sshl v17.4s, v17.4s, v18.4s
>> str q17, [x11, #16] ; iastore
>
> Eric Liu has updated the pull request incrementally with one additional commit since the last revision:
>
> code backup
>
> Change-Id: Ie9046b1d7e8f5e2669767756b6b074b564523039
The Vector API would *not* benefit from keeping these two nodes separate, because the input of an RShiftVNode may *not* be an RShiftCntVNode [1]. Inserting a special 'VNEG' in the mid-end only for AArch64 might work, but it seems too ugly, and merging the two nodes would harm AArch32. It is quite hard to balance the benefits between AArch64 and other architectures.
[1] https://github.com/openjdk/jdk/blob/jdk-17%2B18/src/hotspot/cpu/aarch64/aarch64_neon.ad#L5179
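
To make the role of the negation concrete, here is a rough scalar sketch of one 32-bit lane of the "After" listing. It is only an illustration under stated assumptions: the class and method names (SshlLaneModel, sshlLane, shiftBoth) are hypothetical and are not HotSpot code, and the model is meant to match Java's `<<` and `>>` only for shift counts in the range 0..31. It relies on the AArch64 SSHL behaviour that the signed low byte of each count lane selects a left shift when positive and a right shift when negative.

```java
public class SshlLaneModel {
    // Hypothetical scalar model of one 32-bit lane of AArch64 SSHL.
    // The shift amount is the signed low byte of the count lane:
    // a positive amount shifts left, a negative amount shifts right (arithmetic).
    static int sshlLane(int value, int countLane) {
        int shift = (byte) countLane;
        if (shift >= 0) {
            return shift >= 32 ? 0 : value << shift;
        }
        int right = -shift;
        return right >= 32 ? value >> 31 : value >> right;
    }

    // One shared count serves both shifts: the left shift uses it directly,
    // the right shift uses its negation (the `neg` in the assembly listing).
    static int[] shiftBoth(int a, int b, int sh) {
        int cnt = sh;        // models: dup v16.16b, w11
        int negCnt = -cnt;   // models: neg v18.16b, v16.16b
        return new int[] { sshlLane(a, cnt), sshlLane(b, negCnt) };
    }

    public static void main(String[] args) {
        int[] r = shiftBoth(3, -64, 4);
        System.out.println(r[0] + " " + r[1]); // prints "48 -4"
    }
}
```

In this model the left and right shifts differ only in whether the count is negated, which is why a single broadcast count can feed both on AArch64.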
-------------
PR: https://git.openjdk.java.net/jdk/pull/3371