RFR: 8181633: Vectorization fails for some multiplication with constant cases
Ningsheng Jian
ningsheng.jian at linaro.org
Tue Nov 28 01:55:40 UTC 2017
Thank you, Vladimir!
Regards,
Ningsheng
On 28 November 2017 at 08:59, Vladimir Kozlov
<vladimir.kozlov at oracle.com> wrote:
> It fell through the cracks; I forgot about it.
> I reran these changes against the latest jdk10/hs sources in pre-integration
> testing:
>
> http://cr.openjdk.java.net/~njian/8181633/webrev.00/
>
> Testing passed and I pushed it.
>
> Regards,
> Vladimir
>
>
> On 6/21/17 12:43 AM, Vladimir Kozlov wrote:
>>
>> Very nice results!
>>
>> From a correctness point of view the changes seem fine, but I may have
>> missed something.
>>
>> It would be nice if our friends at Red Hat and Intel tested these changes
>> on regular Java benchmarks.
>>
>> Thanks,
>> Vladimir
>>
>> On 6/20/17 11:34 PM, Yang Zhang wrote:
>>>>
>>>>
>>>> Do I understand correctly that the problem is that we pack dissimilar
>>>> nodes into the same set, which later makes such sets unprofitable?
>>>> I am trying to understand why the additional restriction helps.
>>>
>>>
>>> Yes, as in the following packs. In Packs 24 and 25, the node pairs
>>> (434,117) and (440,157) are packed incorrectly. The problem is clearer
>>> in IdealGraph. I have also attached the generated assembly files (test
>>> case is the previous code; opt is the result with the patch). Please
>>> check them.
>>>
>>> Pack: 18
>>> align: 8 432 StoreI === 525 477 439 433 [[ 418 192 151 112 ]]
>>> align: 12 192 StoreI === 525 432 190 158 [[ 416 533 406 ]]
>>> Pack: 19
>>> align: 8 442 LoadI === 228 477 443 [[ 440 441 ]]
>>> align: 12 112 LoadI === 228 432 110 [[ 117 116 ]]
>>> Pack: 20
>>> align: 8 445 LoadI === 244 477 446 [[ 435 444 ]]
>>> align: 12 151 LoadI === 244 432 149 [[ 156 154 ]]
>>> Pack: 21
>>> align: 8 433 AddI === _ 434 440 [[ 432 ]]
>>> align: 12 158 AddI === _ 117 157 [[ 192 ]]
>>> Pack: 22
>>> align: 8 441 LShiftI === _ 442 108 [[ 440 ]]
>>> align: 12 116 LShiftI === _ 112 108 [[ 117 ]]
>>> Pack: 23
>>> align: 8 435 LShiftI === _ 445 40 [[ 434 ]]
>>> align: 12 154 LShiftI === _ 151 40 [[ 157 ]]
>>> Pack: 24
>>> align: 8 434 AddI === _ 435 444 [[ 433 ]]
>>> align: 12 117 AddI === _ 116 112 [[ 158 ]]
>>> Pack: 25
>>> align: 8 440 AddI === _ 441 442 [[ 433 ]]
>>> align: 12 157 AddI === _ 154 156 [[ 158 ]]
>>> Pack: 26
>>> align: 8 444 LShiftI === _ 445 155 [[ 434 ]]
>>> align: 12 156 LShiftI === _ 151 155 [[ 157 ]]
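A minimal sketch of one reading of the dump above, summarizing each node by its opcode plus the opcodes of its direct inputs. Node ids, opcodes and input ids are copied from Packs 19-26; the shift-amount inputs 40, 108 and 155 do not appear in the dump and are assumed here to be constants, and the loads' own inputs are omitted. The class and method names are hypothetical. The roots 433 and 158 (Pack 21) list their operands in opposite orders (two-shift sum first in one iteration, shift-plus-load sum first in the other), so pairing their inputs by position yields exactly the mismatched Packs 24 and 25.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the nodes in Packs 19-26. Not C2 code: the "shape"
// string is only an illustration for comparing two nodes.
public final class PackDumpShapes {

    private static final Map<Integer, String> OP = new HashMap<>();
    private static final Map<Integer, int[]> IN = new HashMap<>();

    private static void node(int id, String op, int... inputs) {
        OP.put(id, op);
        IN.put(id, inputs);
    }

    static {
        // Unrolled iteration at align 8
        node(442, "LoadI");
        node(445, "LoadI");
        node(441, "LShiftI", 442, 108);
        node(435, "LShiftI", 445, 40);
        node(444, "LShiftI", 445, 155);
        node(434, "AddI", 435, 444);   // (445 << s1) + (445 << s2)
        node(440, "AddI", 441, 442);   // (442 << k) + 442
        node(433, "AddI", 434, 440);
        // Unrolled iteration at align 12
        node(112, "LoadI");
        node(151, "LoadI");
        node(116, "LShiftI", 112, 108);
        node(154, "LShiftI", 151, 40);
        node(156, "LShiftI", 151, 155);
        node(117, "AddI", 116, 112);   // (112 << k) + 112
        node(157, "AddI", 154, 156);   // (151 << s1) + (151 << s2)
        node(158, "AddI", 117, 157);
    }

    // Opcode plus the opcodes of its direct inputs, e.g. "AddI(LShiftI,LoadI)".
    // Ids missing from the map (the shift amounts) are reported as "Con".
    static String shape(int id) {
        StringBuilder sb = new StringBuilder(OP.getOrDefault(id, "Con")).append('(');
        int[] ins = IN.getOrDefault(id, new int[0]);
        for (int i = 0; i < ins.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(OP.getOrDefault(ins[i], "Con"));
        }
        return sb.append(')').toString();
    }

    static boolean sameShape(int a, int b) {
        return shape(a).equals(shape(b));
    }

    public static void main(String[] args) {
        // Pair the inputs of the two AddI roots position by position.
        int[] left = IN.get(433);
        int[] right = IN.get(158);
        for (int i = 0; i < left.length; i++) {
            System.out.printf("(%d,%d): %s vs %s%n",
                    left[i], right[i], shape(left[i]), shape(right[i]));
        }
    }
}
```

Running `main` prints `(434,117): AddI(LShiftI,LShiftI) vs AddI(LShiftI,LoadI)` and `(440,157): AddI(LShiftI,LoadI) vs AddI(LShiftI,LShiftI)`, while the swapped pairs (434,157) and (440,117) have matching shapes.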
>>>
>>>
>>>>
>>>> Did you try constants with 1 bit set (which are converted to a simple
>>>> shift) or 3 bits set (which keep the multiplication)?
>>>>
>>>
>>> In my test, both constants have to be split into shift and add, such
>>> as (5, 10) or (9, 17). For other cases, such as (5, 8) or (7, 11), the
>>> problem does not occur.
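To make the "both constants are split into shift and add" condition concrete, here is a simplified model of multiply-by-constant strength reduction as described in this thread: 1 bit set becomes one shift, 2 bits set become two shifts plus an add, and anything else keeps the multiply. This is an assumption-laden sketch for positive constants only; the class and method names are hypothetical, and any other special cases C2 may apply are not modeled.

```java
// Simplified, hypothetical model of strength reduction for x * c (c > 0):
//   1 bit set  -> x << n
//   2 bits set -> (x << h) + (x << l), where a shift by 0 is just x
//   otherwise  -> keep the multiply
public final class MulByConstant {

    enum Shape { SHIFT, SHIFT_ADD, MUL }

    static Shape shapeOf(int c) {
        if (c <= 0) throw new IllegalArgumentException("model handles c > 0 only");
        switch (Integer.bitCount(c)) {
            case 1:  return Shape.SHIFT;
            case 2:  return Shape.SHIFT_ADD;
            default: return Shape.MUL;
        }
    }

    // Evaluates x * c using the shape chosen above (int arithmetic wraps,
    // so the shift/add form equals x * c even on overflow).
    static int reduce(int x, int c) {
        int low = Integer.lowestOneBit(c);               // 5 -> 1, 10 -> 2
        int lowShift = Integer.numberOfTrailingZeros(low);
        switch (shapeOf(c)) {
            case SHIFT:
                return x << lowShift;
            case SHIFT_ADD: {
                int high = c - low;                      // the other set bit
                int highShift = Integer.numberOfTrailingZeros(high);
                // 5 -> (x << 2) + x, 10 -> (x << 3) + (x << 1)
                return (x << highShift) + (x << lowShift);
            }
            default:
                return x * c;
        }
    }

    public static void main(String[] args) {
        int[] constants = {5, 10, 9, 17, 8, 7, 11};
        for (int c : constants) {
            System.out.println(c + " -> " + shapeOf(c));
        }
    }
}
```

Under this model, 5, 10, 9 and 17 all map to SHIFT_ADD, 8 maps to SHIFT, and 7 and 11 map to MUL, so only the pairs (5, 10) and (9, 17) give both iterations the two-level shift/add trees seen in the pack dump.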
>>>
>>>
>>>>>
>>>>> This bug results from the rules for matching two similar independent
>>>>> nodes not being strict enough, so I added more matching rules. With
>>>>> this patch, SIMD instructions are generated for the above test case on
>>>>> both x86 and aarch64, and there is an obvious performance improvement
>>>>> (~30% in JMH).
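A hypothetical sketch of what a stricter matching rule can look like, assuming two nodes may share a pack only if they agree on opcode, input count, and the opcode at each input position. `Node`, `isomorphic`, `inputsMatch` and `canPack` are illustrative names; this is not C2 code and not the code in the webrev.

```java
// Hypothetical model of a "can these two nodes share a pack?" check.
public final class PackRule {

    static final class Node {
        final String op;
        final Node[] inputs;

        Node(String op, Node... inputs) {
            this.op = op;
            this.inputs = inputs;
        }
    }

    // Looser rule: same opcode and same number of inputs.
    static boolean isomorphic(Node a, Node b) {
        return a.op.equals(b.op) && a.inputs.length == b.inputs.length;
    }

    // Extra rule: the inputs at each position must have the same opcode.
    static boolean inputsMatch(Node a, Node b) {
        for (int i = 0; i < a.inputs.length; i++) {
            if (!a.inputs[i].op.equals(b.inputs[i].op)) {
                return false;
            }
        }
        return true;
    }

    static boolean canPack(Node a, Node b) {
        return isomorphic(a, b) && inputsMatch(a, b);
    }

    public static void main(String[] args) {
        Node load1 = new Node("LoadI");
        Node load2 = new Node("LoadI");
        Node con = new Node("ConI");
        // x * 10 as (x << 3) + (x << 1); y * 5 as (y << 2) + y
        Node mul10 = new Node("AddI", new Node("LShiftI", load1, con),
                                      new Node("LShiftI", load1, con));
        Node mul5 = new Node("AddI", new Node("LShiftI", load2, con), load2);
        System.out.println(isomorphic(mul10, mul5)); // true
        System.out.println(canPack(mul10, mul5));    // false
    }
}
```

Opcode equality alone accepts the two AddI nodes; checking the input opcodes position by position rejects them, which is the kind of mismatch behind Packs 24 and 25.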
>>>>
>>>>
>>>>
>>>> What other performance tests did you run?
>>>
>>>
>>> No.
>>>
>>> Regards,
>>> Yang
>>>
>
More information about the hotspot-compiler-dev
mailing list