RFR: 8181633: Vectorization fails for some multiplication with constant cases

Tue Nov 28 00:59:09 UTC 2017

This one fell through the cracks; I forgot about it.
I reran these changes against the latest jdk10/hs sources in pre-integration testing:

http://cr.openjdk.java.net/~njian/8181633/webrev.00/

Testing passed and I pushed it.

Regards,
Vladimir

On 6/21/17 12:43 AM, Vladimir Kozlov wrote:
> Very nice results!
> 
> From a correctness point of view the changes seem fine, but I may be missing something.
> 
> It would be nice if our friends at Red Hat and Intel tested these changes on regular Java benchmarks.
> 
> Thanks,
> Vladimir
> 
> On 6/20/17 11:34 PM, Yang Zhang wrote:
>>>
>>> Do I understand correctly that the problem is that we pack dissimilar nodes into
>>> the same set, which later leads to a non-profitable result for such sets?
>>> I am trying to understand why the additional restriction helps.
>>
>> Yes, as in the following packs. In Packs 24 and 25, the node pairs
>> (434,117) and (440,157) are packed incorrectly. The problem is
>> clearer in the IdealGraph. I have also attached the generated assembly
>> files (test case is the previous code; opt is the result with the patch).
>> Please check them.
>>
>> Pack: 18
>>   align: 8 432 StoreI ===  525  477  439  433  [[ 418  192  151  112 ]]
>>   align: 12 192 StoreI ===  525  432  190  158  [[ 416  533  406 ]]
>> Pack: 19
>>   align: 8 442 LoadI ===  228  477  443  [[ 440  441 ]]
>>   align: 12 112 LoadI ===  228  432  110  [[ 117  116 ]]
>> Pack: 20
>>   align: 8 445 LoadI ===  244  477  446  [[ 435  444 ]]
>>   align: 12 151 LoadI ===  244  432  149  [[ 156  154 ]]
>> Pack: 21
>>   align: 8 433 AddI === _  434  440  [[ 432 ]]
>>   align: 12 158 AddI === _  117  157  [[ 192 ]]
>> Pack: 22
>>   align: 8 441 LShiftI === _  442  108  [[ 440 ]]
>>   align: 12 116 LShiftI === _  112  108  [[ 117 ]]
>> Pack: 23
>>   align: 8 435 LShiftI === _  445  40  [[ 434 ]]
>>   align: 12 154 LShiftI === _  151  40  [[ 157 ]]
>> Pack: 24
>>   align: 8 434 AddI === _  435  444  [[ 433 ]]
>>   align: 12 117 AddI === _  116  112  [[ 158 ]]
>> Pack: 25
>>   align: 8 440 AddI === _  441  442  [[ 433 ]]
>>   align: 12 157 AddI === _  154  156  [[ 158 ]]
>> Pack: 26
>>   align: 8 444 LShiftI === _  445  155  [[ 434 ]]
>>   align: 12 156 LShiftI === _  151  155  [[ 157 ]]
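The mismatch in Packs 24 and 25 can be read straight off the dump: 434 and 157 are each an AddI of two LShiftI nodes, while 117 and 440 are each an AddI of an LShiftI and a LoadI. The sketch below is a toy model only, not the SuperWord code and not the patch. The class PackShapeCheck, its Node type, and both match methods are hypothetical names introduced for illustration. It assumes that "stricter matching" means also comparing the opcodes of the inputs in order, which is a simplification (a real check would also have to deal with commutative inputs).

```java
import java.util.Arrays;
import java.util.List;

class PackShapeCheck {
    // Toy stand-in for an ideal-graph node: id, opcode name, opcode names of its inputs.
    static final class Node {
        final int id;
        final String op;
        final List<String> inputOps;

        Node(int id, String op, String... inputOps) {
            this.id = id;
            this.op = op;
            this.inputOps = Arrays.asList(inputOps);
        }
    }

    // The four AddI nodes from Packs 24 and 25, with input opcodes taken from the dump.
    static final Node N434 = new Node(434, "AddI", "LShiftI", "LShiftI");
    static final Node N117 = new Node(117, "AddI", "LShiftI", "LoadI");
    static final Node N440 = new Node(440, "AddI", "LShiftI", "LoadI");
    static final Node N157 = new Node(157, "AddI", "LShiftI", "LShiftI");

    // Loose rule (assumed): two nodes are "similar" if their own opcodes agree.
    static boolean looseMatch(Node a, Node b) {
        return a.op.equals(b.op);
    }

    // Stricter rule (assumed): opcodes agree and the input opcodes agree position by position.
    static boolean strictMatch(Node a, Node b) {
        return a.op.equals(b.op) && a.inputOps.equals(b.inputOps);
    }

    public static void main(String[] args) {
        Node[][] pairs = { {N434, N117}, {N440, N157}, {N434, N157}, {N440, N117} };
        for (Node[] p : pairs) {
            System.out.println("(" + p[0].id + "," + p[1].id + ") loose=" + looseMatch(p[0], p[1])
                    + " strict=" + strictMatch(p[0], p[1]));
        }
    }
}
```

Under the loose rule all four pairings are accepted, including the two from the dump. Under the stricter rule only (434,157) and (440,117) survive, which are the pairings whose operands line up.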
>>
>>
>>>
>>> Did you try constants with 1 bit set (which are converted to a simple shift) or 3
>>> bits set (which keep the multiplication)?
>>>
>>
>> In my test, both constants have to be ones that are split into shift and add, such
>> as (5, 10) or (9, 17). For other cases, such as (5, 8) or (7, 11), there
>> is no such problem.
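For reference, 5, 10, 9 and 17 each have exactly two bits set, so each multiplication can be written as the sum of two shifts (or a shift plus the value itself). That is the AddI/LShiftI shape visible in the pack dump. The sketch below is only an illustration: the class MulByConstant and the kernel a[i] = b[i] * 5 + c[i] * 10 are assumptions, since the actual test case is not reproduced in this thread. It shows that the multiply form and the shift-and-add form compute the same int result, including on overflow.

```java
import java.util.Arrays;

class MulByConstant {
    // Hypothetical kernel using two constants that each have two bits set (5 = 0b101, 10 = 0b1010).
    static void mulKernel(int[] a, int[] b, int[] c) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] * 5 + c[i] * 10;
        }
    }

    // The same computation with each multiplication spelled out as shift and add:
    // x * 5 == (x << 2) + x and y * 10 == (y << 3) + (y << 1), both modulo 2^32.
    static void shiftAddKernel(int[] a, int[] b, int[] c) {
        for (int i = 0; i < a.length; i++) {
            int x = b[i];
            int y = c[i];
            a[i] = ((x << 2) + x) + ((y << 3) + (y << 1));
        }
    }

    public static void main(String[] args) {
        int[] b = {0, 1, -7, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        int[] c = {3, -1, 42, -987654321, Integer.MIN_VALUE, Integer.MAX_VALUE};
        int[] viaMul = new int[b.length];
        int[] viaShift = new int[b.length];
        mulKernel(viaMul, b, c);
        shiftAddKernel(viaShift, b, c);
        System.out.println("kernels agree: " + Arrays.equals(viaMul, viaShift));
        // Bit counts of the constants mentioned in the thread.
        for (int k : new int[] {5, 10, 9, 17, 8, 7, 11}) {
            System.out.println(k + " has " + Integer.bitCount(k) + " bit(s) set");
        }
    }
}
```

In the pairs reported as unaffected, 8 has one bit set and 7 and 11 have three, so at least one constant in each pair is outside the two-bit case.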
>>
>>
>>>>
>>>> This bug arises because the rules for matching two similar
>>>> independent nodes are not strict enough, so I added more matching
>>>> rules. With this patch, SIMD instructions are generated for the
>>>> above test case on both x86 and aarch64, and there is a clear performance
>>>> improvement (~30% in jmh).
>>>
>>>
>>> What other performance tests did you run?
>>
>> None.
>>
>> Regards,
>> Yang
>>