RFR: 8262916: Merge LShiftCntV and RShiftCntV into a single node
Dean Long
dlong at openjdk.java.net
Thu Apr 8 05:21:37 UTC 2021
On Wed, 7 Apr 2021 07:28:08 GMT, Eric Liu <eliu at openjdk.org> wrote:
> The vector shift count is defined by two separate nodes (LShiftCntV and
> RShiftCntV), which prevents them from being shared when the shift
> counts are the same.
>
> public static void test_shiftv(int sh) {
>     for (int i = 0; i < N; i+=1) {
>         a0[i] = a1[i] << sh;
>         b0[i] = b1[i] >> sh;
>     }
> }
>
> Given the example above, by merging the same shift counts into one
> node, they could be shared by shift nodes (RShiftV or LShiftV) like
> below:
>
> Before:
> 1184 LShiftCntV === _ 1189 [[ 1185 ... ]]
> 1190 RShiftCntV === _ 1189 [[ 1191 ... ]]
> 1185 LShiftVI === _ 1181 1184 [[ 1186 ]]
> 1191 RShiftVI === _ 1187 1190 [[ 1192 ]]
>
> After:
> 1190 ShiftCntV === _ 1189 [[ 1191 1204 ... ]]
> 1204 LShiftVI === _ 1211 1190 [[ 1203 ]]
> 1191 RShiftVI === _ 1187 1190 [[ 1192 ]]
>
> The generated code then drops one redundant “dup” (scalar->vector),
> saving one register.
>
> Before:
> dup v16.16b, w12
> dup v17.16b, w12
> ...
> ldr q18, [x13, #16]
> sshl v18.4s, v18.4s, v16.4s
> add x18, x16, x12 ; iaload
>
> add x4, x15, x12
> str q18, [x4, #16] ; iastore
>
> ldr q18, [x18, #16]
> add x12, x14, x12
> neg v19.16b, v17.16b
> sshl v18.4s, v18.4s, v19.4s
> str q18, [x12, #16] ; iastore
>
> After:
> dup v16.16b, w11
> ...
> ldr q17, [x13, #16]
> sshl v17.4s, v17.4s, v16.4s
> add x2, x22, x11 ; iaload
>
> add x4, x16, x11
> str q17, [x4, #16] ; iastore
>
> ldr q17, [x2, #16]
> add x11, x21, x11
> neg v18.16b, v16.16b
> sshl v17.4s, v17.4s, v18.4s
> str q17, [x11, #16] ; iastore
You should be able to do this without introducing a new node type. You could change the shift rules to match a vector register, as x86.ad and aarch64_sve.ad already do.
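
For anyone wanting to look at the generated code for either approach, the quoted test_shiftv loop can be wrapped in a small standalone harness. The following is only a sketch: the class name ShiftVDemo, the array length N = 1024, the input values and the iteration count are assumptions made for illustration and are not taken from the patch or its tests.

```java
public class ShiftVDemo {
    // Array length; 1024 is an arbitrary choice for this sketch.
    static final int N = 1024;
    static final int[] a0 = new int[N], a1 = new int[N];
    static final int[] b0 = new int[N], b1 = new int[N];

    // Same loop as in the quoted example: a left shift and a right shift
    // that both use the same scalar shift count sh.
    public static void test_shiftv(int sh) {
        for (int i = 0; i < N; i += 1) {
            a0[i] = a1[i] << sh;
            b0[i] = b1[i] >> sh;
        }
    }

    public static void main(String[] args) {
        // Fill the inputs with simple, easily checked values.
        for (int i = 0; i < N; i++) {
            a1[i] = i;
            b1[i] = -i;
        }
        // Call the method repeatedly so the JIT has a chance to compile
        // the loop; 20_000 is an assumed, not a tuned, iteration count.
        for (int iter = 0; iter < 20_000; iter++) {
            test_shiftv(3);
        }
        System.out.println(a0[5] + " " + b0[5]);
    }
}
```

Whether the loop is actually vectorized, and whether one or two dup instructions appear, depends on the platform and build. The disassembly can be inspected with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, provided the hsdis disassembler library is available.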
-------------
PR: https://git.openjdk.java.net/jdk/pull/3371