RFR(M):8214751: X86: Support for VNNI instruction

Vladimir Kozlov vladimir.kozlov at oracle.com
Wed Dec 12 22:48:15 UTC 2018


Testing passed and I pushed the changes.

Vladimir

On 12/12/18 9:30 AM, Deshpande, Vivek R wrote:
> Thanks Vladimir.
> 
> Regards,
> Vivek
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Wednesday, December 12, 2018 8:40 AM
> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; hotspot-compiler-dev at openjdk.java.net compiler <hotspot-compiler-dev at openjdk.java.net>
> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Raj, Guru <guru.raj at intel.com>
> Subject: Re: RFR(M):8214751: X86: Support for VNNI instruction
> 
> This looks good to me. Let me test it.
> 
> Thanks,
> Vladimir
> 
> On 12/10/18 3:46 PM, Deshpande, Vivek R wrote:
>> Hi Vladimir
>>
>> I have updated the patch to add the check that all the nodes belong to the same loop.
>> I have also added a jtreg test.
>> Could you please take a look at the patch?
>> http://cr.openjdk.java.net/~vdeshpande/8214751/VNNI/webrev.02/
>>
>> Regards,
>> Vivek
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Friday, December 7, 2018 11:22 AM
>> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>;
>> hotspot-compiler-dev at openjdk.java.net compiler
>> <hotspot-compiler-dev at openjdk.java.net>
>> Cc: Viswanathan, Sandhya <sandhya.viswanathan at intel.com>; Raj, Guru
>> <guru.raj at intel.com>
>> Subject: Re: RFR(M):8214751: X86: Support for VNNI instruction
>>
>> On 12/7/18 10:33 AM, Deshpande, Vivek R wrote:
>>> Hi Vladimir
>>>
>>> This patch is useful for AI ML/DL applications such as convolution-based neural nets.
>>
>> Add this comment to the RFE's description. Maybe also include some JMH benchmark results to show the improvement.
>>
>>> I have updated the patch with your suggestion.
>>> I am now creating the MulAddS2I node late, before vectorization.
>>
>> Add a check that all combined nodes belong to the same loop. You have this information since you are inside loopopts.
>>
>> Thanks,
>> Vladimir
>>
>>> The updated webrev is here:
>>> http://cr.openjdk.java.net/~vdeshpande/8214751/VNNI/webrev.01/
>>>
>>> I am also working on the test.
>>>
>>> Regards,
>>> Vivek
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Thursday, December 6, 2018 11:59 AM
>>> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>;
>>> hotspot-compiler-dev at openjdk.java.net compiler
>>> <hotspot-compiler-dev at openjdk.java.net>
>>> Subject: Re: RFR(M):8214751: X86: Support for VNNI instruction
>>>
>>> Hi Vivek,
>>>
>>> Which applications benefit from this optimization?
>>>
>>> This optimization may prevent some constant folding, other IGVN optimizations, and RA, since I think MulAddS2INode is generated too early. We benefit only if vectors are generated. Can you generate vectors without MulAddS2INode? Or create MulAddS2INode just before vectorization and expand it if vectorization fails? I would prefer the first solution: have a struct in the SuperWord code which finds such a pattern and tries to vectorize it.
>>>
>>> You need to add a test to verify the correctness of the results.
>>> Add a UseAVX == 0 check to the predicates which use SSE2 code. Otherwise they may be selected even if UseAVX > 0.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 12/3/18 8:58 PM, Deshpande, Vivek R wrote:
>>>> Hi All
>>>>
>>>> Could you please review the VNNI VPDPWSSD instruction support with autovectorization.
>>>> It can vectorize this operation in the loop:
>>>> out[i] += ((in1[2*i] * in2[2*i]) + (in1[2*i+1] * in2[2*i+1]));
>>>> More information on VNNI can be found here:
>>>> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
>>>>
>>>>
>>>> The initial performance gain with a microbenchmark on Skylake with AVX3 is 10.8x,
>>>> and it generates:
>>>> vmovdqu xmm3, xmmword ptr [rbp+r8*2+0x10]
>>>> vmovdqu xmm6, xmmword ptr [rdx+r8*2+0x10]
>>>> vpmaddwd xmm3, xmm6, xmm3
>>>> vpaddd xmm3, xmm3, xmmword ptr [r9+rdi*4+0x10]
>>>> vmovdqu xmmword ptr [r9+rdi*4+0x10], xmm3
>>>>
>>>> It can generate the vpdpwssd instruction on Cascade Lake.
>>>>
>>>> The webrev is here:
>>>> http://cr.openjdk.java.net/~vdeshpande/8214751/VNNI/webrev.00/
>>>>
>>>> The JBS entry is here:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8214751
>>>>
>>>> Regards,
>>>>
>>>> Vivek
>>>>
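
For reference, the loop shape from the original review request can be written as a small scalar Java method. This is only a sketch: the class and method names are hypothetical, and the element types (short inputs, int accumulator) are an assumption inferred from the MulAddS2I node name and the vpmaddwd instruction, not something stated in the thread. Whether a given JIT build vectorizes it depends on the CPU and flags.

```java
import java.util.Arrays;

public class MulAddS2IExample {
    // Scalar form of the loop from the review request: each int output
    // accumulates the sum of two adjacent 16-bit products.
    // (Assumed types: short inputs, int output.)
    public static void mulAdd(short[] in1, short[] in2, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] += (in1[2 * i] * in2[2 * i]) + (in1[2 * i + 1] * in2[2 * i + 1]);
        }
    }

    public static void main(String[] args) {
        short[] in1 = {1, 2, 3, 4, -5, 6};
        short[] in2 = {7, 8, 9, 10, 11, -12};
        int[] out = {100, 0, 0};
        mulAdd(in1, in2, out);
        // 100 + 1*7 + 2*8 = 123; 3*9 + 4*10 = 67; -5*11 + 6*(-12) = -127
        System.out.println(Arrays.toString(out)); // prints [123, 67, -127]
    }
}
```

A correctness test of the kind requested in the review could compare the output of such a method after JIT compilation against values computed by hand or by an interpreter-only run.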

