RFR: 8181633: Vectorization fails for some multiplication with constant cases
Yang Zhang
yang.zhang at linaro.org
Wed Jun 21 06:34:47 UTC 2017
>
> Do I understand correctly that the problem is that we pack dissimilar nodes
> into the same set, which later causes a non-profitable result for such sets?
> I am trying to understand why the additional restriction helps.
Yes, as in the following packs. In Pack 24 and Pack 25, the node pairs
(434,117) and (440,157) are packed incorrectly. The problem is easier
to see in the IdealGraph. I have also attached the generated assembly
files (the test case is the previous code; "opt" is the result with the
patch). Please take a look.
Pack: 18
align: 8 432 StoreI === 525 477 439 433 [[ 418 192 151 112 ]]
align: 12 192 StoreI === 525 432 190 158 [[ 416 533 406 ]]
Pack: 19
align: 8 442 LoadI === 228 477 443 [[ 440 441 ]]
align: 12 112 LoadI === 228 432 110 [[ 117 116 ]]
Pack: 20
align: 8 445 LoadI === 244 477 446 [[ 435 444 ]]
align: 12 151 LoadI === 244 432 149 [[ 156 154 ]]
Pack: 21
align: 8 433 AddI === _ 434 440 [[ 432 ]]
align: 12 158 AddI === _ 117 157 [[ 192 ]]
Pack: 22
align: 8 441 LShiftI === _ 442 108 [[ 440 ]]
align: 12 116 LShiftI === _ 112 108 [[ 117 ]]
Pack: 23
align: 8 435 LShiftI === _ 445 40 [[ 434 ]]
align: 12 154 LShiftI === _ 151 40 [[ 157 ]]
Pack: 24
align: 8 434 AddI === _ 435 444 [[ 433 ]]
align: 12 117 AddI === _ 116 112 [[ 158 ]]
Pack: 25
align: 8 440 AddI === _ 441 442 [[ 433 ]]
align: 12 157 AddI === _ 154 156 [[ 158 ]]
Pack: 26
align: 8 444 LShiftI === _ 445 155 [[ 434 ]]
align: 12 156 LShiftI === _ 151 155 [[ 157 ]]
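To make the mismatch concrete: in the dump above, nodes 434 and 157 each add
two LShiftI nodes, while nodes 117 and 440 each add an LShiftI to the LoadI it
shifts. Packs 24 and 25 therefore each pair one shape with the other. The Java
sketch below shows a loop body with this kind of shape. It is only an
illustration: the class name, the method names, and the constants 5 and 10 are
assumptions and not the attached test case, and the second method merely
mirrors a shift-and-add split by hand.

```java
import java.util.Arrays;

class MulConstShape {
    // Source-level form: two multiplications by constants that each have two bits set.
    static void mulForm(int[] a, int[] b, int[] c) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] * 5 + c[i] * 10;
        }
    }

    // Hand-expanded form (assumed shape of the split, written out manually).
    static void shiftAddForm(int[] a, int[] b, int[] c) {
        for (int i = 0; i < a.length; i++) {
            int mul5 = (b[i] << 2) + b[i];          // shift + loaded value
            int mul10 = (c[i] << 3) + (c[i] << 1);  // shift + shift
            a[i] = mul5 + mul10;
        }
    }

    public static void main(String[] args) {
        int[] b = {1, 2, 3, 4};
        int[] c = {5, 6, 7, 8};
        int[] a1 = new int[4];
        int[] a2 = new int[4];
        mulForm(a1, b, c);
        shiftAddForm(a2, b, c);
        // Both forms compute the same values; only the node shapes differ.
        System.out.println(Arrays.equals(a1, a2));
    }
}
```

The two sub-expressions are both additions, but one has (shift, load) inputs
and the other has (shift, shift) inputs, which is the distinction the packs
above fail to respect.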
>
> Did you try constants with 1 bit set (which are converted to a simple shift)
> or 3 bits set (which keep the multiplication)?
>
In my test, both constants have to be ones that are split into shift and
add, such as (5, 10) or (9, 17). For other cases, such as (5, 8) or
(7, 11), there is no such problem.
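The constants 5, 10, 9 and 17 each have exactly two bits set, whereas 8 has
one bit set and 7 and 11 have three. The sketch below checks that property and
expands a two-bit constant into two shifts and an add. It is a minimal
illustration under that assumption: the class and both helper names are
hypothetical and are not HotSpot code.

```java
class MulConstSplit {
    // Hypothetical helper: true when c is positive and has exactly two bits set.
    static boolean hasTwoBitsSet(int c) {
        return c > 0 && Integer.bitCount(c) == 2;
    }

    // Hypothetical helper: computes x * c as (x << hi) + (x << lo)
    // for a constant c with exactly two bits set.
    static int splitMul(int x, int c) {
        if (!hasTwoBitsSet(c)) {
            throw new IllegalArgumentException("constant must have exactly two bits set");
        }
        int lo = Integer.numberOfTrailingZeros(c);      // position of the low set bit
        int hi = 31 - Integer.numberOfLeadingZeros(c);  // position of the high set bit
        return (x << hi) + (x << lo);
    }

    public static void main(String[] args) {
        int[] constants = {5, 10, 9, 17, 8, 7, 11};
        for (int c : constants) {
            System.out.println(c + " has two bits set: " + hasTwoBitsSet(c));
        }
    }
}
```

Because int arithmetic wraps, the shift-and-add result agrees with x * c for
every int x, including values that overflow.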
>>
>> This bug arises because the rules for matching two similar
>> independent nodes are not strict enough, so I added more matching
>> rules. With this patch, SIMD instructions are generated for the above
>> test case on both x86 and aarch64, and there is an obvious performance
>> improvement (~30% in jmh).
>
>
> What other performance tests did you run?
None.
Regards,
Yang
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TestMulC.java.x86.asm
Type: application/octet-stream
Size: 52980 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20170621/37880c32/TestMulC.java.x86-0001.asm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TestMulC.java.x86.opt.asm
Type: application/octet-stream
Size: 64634 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20170621/37880c32/TestMulC.java.x86.opt-0001.asm>