RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling

Jie Fu fujie at loongson.cn
Tue Sep 24 14:58:48 UTC 2019


Hi Vivek,

Was the not-unroll-after-vectorization problem [1] fixed by webrev.04 on 
your avx-512 machine?
If not, could you please share the compile log with me?

Thanks a lot.
Best regards,
Jie

[1] 
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034817.html

On 2019/9/18 9:46 AM, Jie Fu wrote:
> Hi Vivek,
>
> Thank you for your help.
>
> Does webrev.04 fix the not-unroll-after-vectorization problem you 
> mentioned in [1] on your avx-512 machine?
>
> The patch just adds a heuristic [2] to protect against over-unrolling 
> with SuperWordLoopUnrollAnalysis.
> In order to use the full available vector width, 
> SuperWordLoopUnrollAnalysis unrolls loops much more aggressively, 
> which may hurt performance in some cases.
> One important reason for this degradation is that the analysis does 
> not consider the cost of the pre/post-loops at all.
> The current SuperWordLoopUnrollAnalysis focuses on reducing the 
> iterations of the main-loop, but ignores the resulting increase in 
> iterations of the pre/post-loops.
> For a more detailed quantitative analysis of that case, please refer 
> to [2].
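
To make the pre/post-loop point above concrete, here is a minimal sketch of
the iteration accounting. It is a simplified model, not HotSpot code: the
class and method names, the fixed pre-loop length and the sample trip count
are all assumptions chosen for illustration, and it treats the post-loop as
purely scalar.

```java
// Simplified iteration accounting for a loop split into pre-, main- and
// post-loops. Everything here is an illustrative assumption, not HotSpot code.
final class UnrollCostSketch {

    // Original iterations left over once the (assumed) pre-loop has run.
    static int remaining(int n, int pre) {
        return Math.max(0, n - pre);
    }

    // Trips of the main loop when each trip covers `unroll` original iterations.
    static int mainTrips(int n, int pre, int unroll) {
        return remaining(n, pre) / unroll;
    }

    // Leftover original iterations that the scalar post-loop has to execute.
    static int postTrips(int n, int pre, int unroll) {
        return remaining(n, pre) % unroll;
    }

    // Total loop trips across all three loops under this model.
    static int totalTrips(int n, int pre, int unroll) {
        return Math.min(pre, n) + mainTrips(n, pre, unroll) + postTrips(n, pre, unroll);
    }

    public static void main(String[] args) {
        int n = 100;  // hypothetical trip count
        int pre = 3;  // hypothetical pre-loop iterations (e.g. for alignment)
        for (int unroll : new int[] {8, 16, 32, 64}) {
            System.out.printf("unroll=%d main=%d post=%d total=%d%n",
                    unroll,
                    mainTrips(n, pre, unroll),
                    postTrips(n, pre, unroll),
                    totalTrips(n, pre, unroll));
        }
    }
}
```

With these assumed inputs, an unroll factor of 64 leaves 33 iterations for
the post-loop, versus 1 for a factor of 8, so the main-loop shrinks while
the total work under this model grows.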
>
> Thanks a lot.
> Best regards,
> Jie
>
> [1] 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034817.html
> [2] 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034783.html
>
> On 2019/9/17 10:55 PM, Deshpande, Vivek R wrote:
>> Hi Jie
>>
>> I tried your patch from webrev.04. I still see behavior similar to 
>> the earlier patch, so I am trying to understand what your new patch 
>> is doing and how we can fix it.
>>
>> Regards,
>> Vivek
>>
>> -----Original Message-----
>> From: Jie Fu [mailto:fujie at loongson.cn]
>> Sent: Tuesday, September 10, 2019 8:42 PM
>> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; Vladimir Kozlov 
>> <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net; 
>> Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
>> Subject: Re: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to 
>> over loop unrolling
>>
>> Hi Vivek,
>>
>> Updated: http://cr.openjdk.java.net/~jiefu/8227505/webrev.04/
>>
>> With the help of your compile logs, I successfully reproduced the 
>> not-unroll-after-vectorization problem you mentioned in [1].
>> It is fixed on my avx-256 machine with this version.
>> The patch just adds a heuristic [2] to protect against over-unrolling 
>> with SuperWordLoopUnrollAnalysis.
>> Please review it and give me some advice.
>>
>> Again, if you still see problems on your avx-512 machine, could you 
>> please share the compile logs with me, especially for NUM = 256, 2048 
>> and 4096?
>> Please see comments inline.
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>>
>> [1]
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034817.html 
>>
>> [2]
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034783.html 
>>
>>
>> On 2019/9/7 7:35 AM, Deshpande, Vivek R wrote:
>>> Hi Jie
>>>
>>> I experimented with both sizes, 1024 and 2048 bytes, and it looks 
>>> like the 2nd compilation generates suboptimal code with a shorter 
>>> vector width.
>> I still don't think it's a problem, since according to your performance 
>> analysis there is no gain from using the full available vector width.
>>
>>
>>> Please find it attached.
>>> IMO, the fix you have should be able to unroll enough to use the 
>>> full available vector width.
>> Why?
>> Unfortunately, compiling with the full available vector width can be
>> harmful to performance.
>> I ran your test case with NUM = 256 and 128 on my avx-256 machine and
>> found that performance suffered with the full available vector width
>> (32-byte vectors).
>> After the patch, the performance (16-byte vectors) for NUM = 256 and 128
>> improved by 28% and 36% respectively.
>>
>> So I would like to know the performance before and after the patch for
>> NUM = 256 and 128 on your avx-512 machine.
>> Could you please share those numbers with us as well?
>>
>> Thanks.
>>
>>
>>> Regards,
>>> Vivek


