RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling

Fri Sep 6 14:55:27 UTC 2019

Hi Vivek,

Could you please also share with us the compilation log with 
-XX:+PrintOptoAssembly for NUM=2048 & NUM=1024 on your AVX-512 machine 
when you next take a look at this issue?
I think it will be very helpful for analyzing this issue.

Thanks a lot.
Best regards,
Jie

On 2019/9/1 9:46 AM, Deshpande, Vivek R wrote:
> Hi Jie
>
> I will try with NUM = 2048 and let you know.
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: Jie Fu [mailto:fujie at loongson.cn]
> Sent: Saturday, August 31, 2019 8:04 AM
> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Subject: Re: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling
>
> Hi Vivek,
>
> Would you mind if I assign this issue[1] to you?
>
> I can't find an AVX-512 machine in our company to do more investigation.
> I'm sorry for that.
>
> Thanks a lot.
> Best regards,
> Jie
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8227505
>
> On 2019/8/23 8:53 AM, Jie Fu wrote:
>> Hi Vivek,
>>
>> Thanks for your clarification.
>> Please see comments inline.
>>
>> On 2019/8/23 3:26 AM, Deshpande, Vivek R wrote:
>>> Hi Jie
>>>
>>> On an AVX2 (256-bit vector) machine, I did not observe any difference in
>>> the generated code, which matches your observation.
>>>
>>> But on an AVX3 (512-bit / 64-byte vector) machine, the code generated
>>> with the patch used AVX2 (256-bit) instructions instead of
>>> AVX3 (512-bit) instructions.
>>> So with the patch, the compiler does not use the full vector width.
>>> As far as performance is concerned, with the particular benchmark
>>> that I shared and the given number of iterations in that
>>> benchmark, I did not observe any difference between the patched and
>>> the original code.
>> As for your particular case, I don't think it's a problem to compile
>> with vector-256 since there is no performance drop compared with
>> vector-512.
>> Instead, I'd prefer using vector-256 to lower the risk of excessive
>> loop unrolling.
>>
>> Also I'm not sure whether the power consumption will increase if
>> vector-512 is used on your machine.
>>
>>
>>> So it's the difference in the generated code which is not using full
>>> vector width.
>> According to your performance analysis, vector-256 is good enough for
>> your test case.
>> What's the benefit of generating vector-512 for your case?
>>
>> Well, the patch doesn't disable the generation of vector-512 at all.
>> You can increase the NUM in your program from 1024 to 2048 or more and
>> try again.
>> Thanks.
>>
>> What do you think?
>> Any comments?
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>>> Regards,
>>> Vivek