RFR: 8262916: Merge LShiftCntV and RShiftCntV into a single node

Eric Liu eliu at openjdk.java.net
Wed Apr 14 10:05:56 UTC 2021


On Fri, 9 Apr 2021 09:26:19 GMT, Andrew Haley <aph at openjdk.org> wrote:

>> The vector shift count is currently defined by two separate nodes (LShiftCntV
>> and RShiftCntV), which prevents the count from being shared when the left
>> and right shifts use the same count.
>> 
>> 
>> public static void test_shiftv(int sh) {
>>     for (int i = 0; i < N; i+=1) {
>>         a0[i] = a1[i] << sh;
>>         b0[i] = b1[i] >> sh;
>>     }
>> }
>> 
>> 
>> Given the example above, merging identical shift counts into one node
>> lets both shift nodes (RShiftV and LShiftV) share it, as shown
>> below:
>> 
>> 
>> Before:
>> 1184  LShiftCntV  === _  1189  [[ 1185  ... ]]
>> 1190  RShiftCntV  === _  1189  [[ 1191  ... ]]
>> 1185  LShiftVI  === _  1181  1184  [[ 1186 ]]
>> 1191  RShiftVI  === _  1187  1190  [[ 1192 ]]
>> 
>> After:
>> 1190  ShiftCntV  === _  1189  [[ 1191 1204  ... ]]
>> 1204  LShiftVI  === _  1211  1190  [[ 1203 ]]
>> 1191  RShiftVI  === _  1187  1190  [[ 1192 ]]
>> 
>> 
>> The generated code then drops one redundant "dup" (scalar->vector),
>> which saves one register.
>> 
>> 
>> Before:
>>         dup     v16.16b, w12
>>         dup     v17.16b, w12
>>         ...
>>         ldr     q18, [x13, #16]
>>         sshl    v18.4s, v18.4s, v16.4s
>>         add     x18, x16, x12           ; iaload
>> 
>>         add     x4, x15, x12
>>         str     q18, [x4, #16]          ; iastore
>> 
>>         ldr     q18, [x18, #16]
>>         add     x12, x14, x12
>>         neg     v19.16b, v17.16b
>>         sshl    v18.4s, v18.4s, v19.4s
>>         str     q18, [x12, #16]         ; iastore
>> 
>> After:
>>         dup     v16.16b, w11
>>         ...
>>         ldr     q17, [x13, #16]
>>         sshl    v17.4s, v17.4s, v16.4s
>>         add     x2, x22, x11            ; iaload
>> 
>>         add     x4, x16, x11
>>         str     q17, [x4, #16]          ; iastore
>> 
>>         ldr     q17, [x2, #16]
>>         add     x11, x21, x11
>>         neg     v18.16b, v16.16b
>>         sshl    v17.4s, v17.4s, v18.4s
>>         str     q17, [x11, #16]         ; iastore
>
>> 
>> It seems that keeping the two separate nodes, RShiftCntV and LShiftCntV, is friendly to AArch32/64 in this case, but AArch64 should be changed to do what AArch32 does. @theRealAph
> 
> Thanks, but it's been a while since I looked at the vector code. Can you point me to the AArch32 patterns in question, to show me the AArch64 changes needed? Thanks.

@theRealAph Could you please take a look at this?
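
For anyone who wants to reproduce the loop quoted above, here is a minimal self-contained sketch. The class name `ShiftCountDemo`, the value of `N`, the array contents, and the warm-up count are all assumptions made for illustration; only the body of `test_shiftv` comes from the PR description.

```java
// Hypothetical harness around the loop from the PR description.
// Class name, N, array contents, and warm-up count are illustrative assumptions.
class ShiftCountDemo {
    static final int N = 1024;
    static final int[] a0 = new int[N], a1 = new int[N];
    static final int[] b0 = new int[N], b1 = new int[N];

    static void init() {
        for (int i = 0; i < N; i++) {
            a1[i] = i;
            b1[i] = -i; // negative inputs exercise the arithmetic right shift
        }
    }

    // Same loop as in the PR description: a left shift and a right shift
    // that use the same (loop-invariant) shift count.
    public static void test_shiftv(int sh) {
        for (int i = 0; i < N; i += 1) {
            a0[i] = a1[i] << sh;
            b0[i] = b1[i] >> sh;
        }
    }

    public static void main(String[] args) {
        init();
        // Call the method often enough that the JIT is likely to compile it.
        for (int iter = 0; iter < 20_000; iter++) {
            test_shiftv(iter & 31);
        }
        // Check the vectorizable loop against the scalar definition.
        test_shiftv(3);
        for (int i = 0; i < N; i++) {
            if (a0[i] != (a1[i] << 3) || b0[i] != (b1[i] >> 3)) {
                throw new AssertionError("mismatch at index " + i);
            }
        }
        System.out.println("ok");
    }
}
```

Running this with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly` (which needs the hsdis disassembler library) should make it possible to inspect the code generated for `test_shiftv`. Whether the loop is vectorized, and what the output looks like, depends on the platform and the JDK build.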

-------------

PR: https://git.openjdk.java.net/jdk/pull/3371


More information about the hotspot-compiler-dev mailing list