RFR: 8181633: Vectorization fails for some multiplication with constant cases

Yang Zhang yang.zhang at linaro.org
Wed Jun 21 06:34:47 UTC 2017


>
> Do I understand correctly that the problem is that we pack dissimilar nodes
> into the same set, which later leads to a non-profitable result for such sets?
> I am trying to understand why the additional restriction helps.

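Yes. See the packs below. In Packs 24 and 25, the node pairs (434, 117)
and (440, 157) are packed incorrectly: 434 and 157 each add two LShiftI
nodes, while 117 and 440 each add an LShiftI and a LoadI, so each of
these packs combines two differently shaped AddI nodes. The problem is
easier to see in the ideal graph. I have also attached the generated x86
assembly files (TestMulC.java.x86.asm is from the original code and
TestMulC.java.x86.opt.asm is the result with the patch). Please take a look.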

Pack: 18
 align: 8 432 StoreI ===  525  477  439  433  [[ 418  192  151  112 ]]
 align: 12 192 StoreI ===  525  432  190  158  [[ 416  533  406 ]]
Pack: 19
 align: 8 442 LoadI ===  228  477  443  [[ 440  441 ]]
 align: 12 112 LoadI ===  228  432  110  [[ 117  116 ]]
Pack: 20
 align: 8 445 LoadI ===  244  477  446  [[ 435  444 ]]
 align: 12 151 LoadI ===  244  432  149  [[ 156  154 ]]
Pack: 21
 align: 8 433 AddI === _  434  440  [[ 432 ]]
 align: 12 158 AddI === _  117  157  [[ 192 ]]
Pack: 22
 align: 8 441 LShiftI === _  442  108  [[ 440 ]]
 align: 12 116 LShiftI === _  112  108  [[ 117 ]]
Pack: 23
 align: 8 435 LShiftI === _  445  40  [[ 434 ]]
 align: 12 154 LShiftI === _  151  40  [[ 157 ]]
Pack: 24
 align: 8 434 AddI === _  435  444  [[ 433 ]]
 align: 12 117 AddI === _  116  112  [[ 158 ]]
Pack: 25
 align: 8 440 AddI === _  441  442  [[ 433 ]]
 align: 12 157 AddI === _  154  156  [[ 158 ]]
Pack: 26
 align: 8 444 LShiftI === _  445  155  [[ 434 ]]
 align: 12 156 LShiftI === _  151  155  [[ 157 ]]
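To make the mismatch concrete, here is a small Java toy model of the
nodes in Packs 21 to 26. It is hypothetical: it is not HotSpot code and
not the rule the patch actually adds. Memory and address inputs are
omitted; only opcodes and the data inputs listed in the dump are kept.
A rule that compares only the opcodes of the two candidate nodes accepts
(434, 117) and (440, 157), because all four nodes are AddI. A stricter
rule that also compares the opcodes of corresponding inputs rejects them
and accepts (434, 157) and (440, 117) instead. The names `PackCheck`,
`looseMatch` and `strictMatch` are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the pack dump above (not HotSpot code).
final class PackCheck {
    // A node is an opcode name plus the ids of its data inputs.
    static final class Node {
        final String op;
        final int[] in;
        Node(String op, int... in) { this.op = op; this.in = in; }
    }

    static final Map<Integer, Node> G = new HashMap<>();

    static void def(int id, String op, int... in) { G.put(id, new Node(op, in)); }

    static {
        // Shift-count constants and loads (memory/address inputs omitted).
        def(108, "ConI"); def(40, "ConI"); def(155, "ConI");
        def(442, "LoadI"); def(445, "LoadI"); def(112, "LoadI"); def(151, "LoadI");
        // Shifts, as in Packs 22, 23 and 26.
        def(441, "LShiftI", 442, 108); def(116, "LShiftI", 112, 108);
        def(435, "LShiftI", 445, 40);  def(154, "LShiftI", 151, 40);
        def(444, "LShiftI", 445, 155); def(156, "LShiftI", 151, 155);
        // Adds, as in Packs 21, 24 and 25.
        def(434, "AddI", 435, 444); def(117, "AddI", 116, 112);
        def(440, "AddI", 441, 442); def(157, "AddI", 154, 156);
        def(433, "AddI", 434, 440); def(158, "AddI", 117, 157);
    }

    // Loose rule: two nodes may be paired if their opcodes match.
    static boolean looseMatch(int a, int b) {
        return G.get(a).op.equals(G.get(b).op);
    }

    // Stricter, illustrative rule: opcodes match and so do the opcodes
    // of the inputs at the same positions.
    static boolean strictMatch(int a, int b) {
        Node x = G.get(a), y = G.get(b);
        if (!x.op.equals(y.op) || x.in.length != y.in.length) return false;
        for (int i = 0; i < x.in.length; i++) {
            if (!G.get(x.in[i]).op.equals(G.get(y.in[i]).op)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] pairs = {{434, 117}, {440, 157}, {434, 157}, {440, 117}};
        for (int[] p : pairs) {
            System.out.printf("(%d,%d) loose=%b strict=%b%n",
                    p[0], p[1], looseMatch(p[0], p[1]), strictMatch(p[0], p[1]));
        }
    }
}
```

Running `main` prints loose=true, strict=false for the first two pairs
(the ones in Packs 24 and 25) and loose=true, strict=true for the last
two, which pair shift+shift with shift+shift and shift+load with
shift+load.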


>
> Did you try constants with 1 bit set (which are converted to a simple shift)
> or 3 bits set (which keep the multiplication)?
>

In my test, the problem appears when both constants are split into a
shift and an add, e.g. (5, 10) or (9, 17). With other pairs, such as
(5, 8) or (7, 11), it does not occur.
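The constant categories can be made concrete with a small model of how
C2 rewrites x * con for an int constant. This is a sketch based on my
reading of MulINode::Ideal in HotSpot of this period and should be
treated as an assumption (other JDK versions may differ): a power of two
becomes a single shift, a constant with exactly two bits set becomes
(x << a) + (x << b) (a shift by 0 folds to x itself), 2^n - 1 becomes
(x << n) - x, and anything else stays a multiply. Negative constants and
0/1 are left out. The class `MulConst` and enum `Shape` are hypothetical.

```java
// Hypothetical model of C2's int multiply-by-constant rewriting (assumption).
final class MulConst {
    enum Shape { SHIFT, SHIFT_ADD, SHIFT_SUB, MUL }

    // Classifies a positive constant con > 1 by the shape x * con takes.
    static Shape classify(int con) {
        if (con <= 1) throw new IllegalArgumentException("model covers con > 1 only");
        int bit1 = con & -con;                            // lowest set bit
        if (bit1 == con) return Shape.SHIFT;              // 2^n: x << n
        int rest = con - bit1;
        int bit2 = rest & -rest;                          // second-lowest set bit
        if (bit1 + bit2 == con) return Shape.SHIFT_ADD;   // 2^a + 2^b: (x << a) + (x << b)
        int up = con + 1;
        if ((up & (up - 1)) == 0) return Shape.SHIFT_SUB; // 2^n - 1: (x << n) - x
        return Shape.MUL;                                 // otherwise: stays a multiply
    }

    public static void main(String[] args) {
        int[][] pairs = {{5, 10}, {9, 17}, {5, 8}, {7, 11}};
        for (int[] p : pairs) {
            System.out.println("(" + p[0] + ", " + p[1] + "): "
                    + classify(p[0]) + ", " + classify(p[1]));
        }
    }
}
```

Under this model, both constants in (5, 10) and (9, 17) take the
shift-and-add form, while (5, 8) gives SHIFT_ADD and SHIFT, and (7, 11)
gives SHIFT_SUB and MUL, which lines up with the split described above.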


>>
>> This bug occurs because the rules for matching two similar independent
>> nodes are not strict enough, so I added more matching rules. With this
>> patch, SIMD instructions are generated for the above test case on both
>> x86 and aarch64, and there is an obvious performance improvement (~30%
>> in JMH).
>
>
> What other performance tests did you run?

None.

Regards,
Yang
-------------- attachments --------------
TestMulC.java.x86.asm (52980 bytes, original code):
<http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20170621/37880c32/TestMulC.java.x86-0001.asm>
TestMulC.java.x86.opt.asm (64634 bytes, with the patch):
<http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20170621/37880c32/TestMulC.java.x86.opt-0001.asm>


More information about the hotspot-compiler-dev mailing list