RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling
Jie Fu
fujie at loongson.cn
Sat Sep 7 01:19:20 UTC 2019
Hi Vivek,
Thank you very much.
I will analyze your logs next week.
Best regards,
Jie
On 2019/9/7 7:35 AM, Deshpande, Vivek R wrote:
> Hi Jie
>
> I experimented with both sizes, 1024 and 2048 bytes, and it looks like the 2nd compilation generates suboptimal code with a shorter vector width.
> Please find it attached.
> IMO, the fix you have should be able to unroll enough to use the full available vector width.
>
> Regards,
> Vivek
>
>
> -----Original Message-----
> From: Jie Fu [mailto:fujie at loongson.cn]
> Sent: Friday, September 6, 2019 7:55 AM
> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Subject: Re: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling
>
> Hi Vivek,
>
> Could you please also share with us the compilation logs with -XX:+PrintOptoAssembly for NUM=2048 and NUM=1024 on your AVX-512 machine the next time you look at this issue?
> I think it will be very helpful to analyze this issue.
>
> Thanks a lot.
> Best regards,
> Jie
>
> On 2019/9/1 9:46 AM, Deshpande, Vivek R wrote:
>> Hi Jie
>>
>> I will try with NUM = 2048 and let you know.
>>
>> Regards,
>> Vivek
>>
>> -----Original Message-----
>> From: Jie Fu [mailto:fujie at loongson.cn]
>> Sent: Saturday, August 31, 2019 8:04 AM
>> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; Vladimir Kozlov
>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net;
>> Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
>> Subject: Re: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to
>> over loop unrolling
>>
>> Hi Vivek,
>>
>> Would you mind if I assign this issue[1] to you?
>>
>> I can't find an AVX-512 machine in our company to do more investigation.
>> I'm sorry for that.
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8227505
>>
>> On 2019/8/23 8:53 AM, Jie Fu wrote:
>>> Hi Vivek,
>>>
>>> Thanks for your clarification.
>>> Please see comments inline.
>>>
>>> On 2019/8/23 3:26 AM, Deshpande, Vivek R wrote:
>>>> Hi Jie
>>>>
>>>> On AVX2 (256 bit vector) machine I did not observe the difference in
>>>> the generated code, same as your observation.
>>>>
>>>> But on an AVX3 (512 bit / 64 byte vector) machine, the code generated
>>>> with the patch used AVX2 (256 bit) instructions instead of
>>>> AVX3 (512 bit) instructions.
>>>> So it is not able to use the complete vector width with the patch.
>>>> As far as performance is concerned, with the particular benchmark
>>>> that I have shared and the given number of iterations in the
>>>> benchmark, I did not observe any difference between the patched and
>>>> original builds.
>>> As for your particular case, I don't think it's a problem to compile
>>> with vector-256 since there is no performance drop compared with
>>> vector-512.
>>> Instead, I'd prefer using vector-256 to lower the risk of over loop
>>> unrolling.
>>>
>>> Also I'm not sure whether the power consumption will increase if
>>> vector-512 is used on your machine.
>>>
>>>
>>>> So it's the difference in the generated code which is not using full
>>>> vector width.
>>> According to your performance analysis, vector-256 is good enough for
>>> your test case.
>>> What's the benefit of generating vector-512 for your case?
>>>
>>> Well, the patch doesn't disable the generation of vector-512 at all.
>>> You can increase the NUM in your program from 1024 to 2048 or more
>>> and try again.
>>> Thanks.
>>>
>>> What do you think?
>>> Any comments?
>>>
>>> Thanks a lot.
>>> Best regards,
>>> Jie
>>>
>>>> Regards,
>>>> Vivek