RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling

Wed Sep 18 01:46:19 UTC 2019

Hi Vivek,

Thank you for your help.

Does webrev.04 fix the not-unroll-after-vectorization problem you 
mentioned in [1] on your avx-512 machine?

The patch just adds a heuristic [2] to protect against over-unrolling 
with SuperWordLoopUnrollAnalysis.
In order to use the full available vector width, 
SuperWordLoopUnrollAnalysis unrolls loops much more aggressively, 
which may hurt performance in some cases.
One important reason for the performance degradation caused by 
SuperWordLoopUnrollAnalysis is that it doesn't consider the negative 
impact of the pre/post-loop at all.
The current SuperWordLoopUnrollAnalysis focuses on reducing the number 
of main-loop iterations, but ignores the resulting increase in 
pre/post-loop iterations.
For a more detailed quantitative analysis of that case, please refer to [2].
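
To make the pre/post-loop effect concrete, here is a rough sketch. It is a 
hypothetical iteration-count model for illustration only, not the logic C2's 
SuperWordLoopUnrollAnalysis actually uses: given an assumed trip count, the 
number of elements consumed per main-loop iteration (vector lanes times 
unroll factor), and a fixed number of pre-loop iterations, it splits the 
work into pre/main/post iteration counts. The function name 
split_iterations and the trip count of 100 are made up for this example.

```python
# Hypothetical iteration-count model for illustration only; this is
# NOT the logic used by C2's SuperWordLoopUnrollAnalysis.

def split_iterations(n, unroll, pre=0):
    """Split a loop of n iterations into (pre, main, post) counts.

    n      -- original trip count (scalar iterations)
    unroll -- elements consumed per main-loop iteration
              (vector lanes * unroll factor)
    pre    -- iterations peeled into the pre-loop, e.g. for alignment
    """
    if unroll <= 0:
        raise ValueError("unroll must be positive")
    if n < 0 or pre < 0:
        raise ValueError("n and pre must be non-negative")
    pre = min(pre, n)            # the pre-loop cannot run past the trip count
    remaining = n - pre
    main = remaining // unroll   # full main-loop iterations
    post = remaining % unroll    # leftover scalar iterations in the post-loop
    return pre, main, post


if __name__ == "__main__":
    # Assumed trip count of 100 with 3 pre-loop iterations.
    for unroll in (8, 16, 32, 64):
        print(unroll, split_iterations(100, unroll, pre=3))
    # 8 (3, 12, 1)
    # 16 (3, 6, 1)
    # 32 (3, 3, 1)
    # 64 (3, 1, 33)
```

In this toy model, going from 32 to 64 elements per main-loop iteration 
saves only two main-loop iterations but pushes 32 more iterations into the 
scalar post-loop, which is the kind of trade-off that is not weighed when 
only the main-loop iteration count is considered.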

Thanks a lot.
Best regards,
Jie

[1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034817.html
[2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034783.html

On 2019/9/17 10:55 PM, Deshpande, Vivek R wrote:
> Hi Jie
>
> I tried your patch from webrev.04. I still see similar behavior to the earlier patch. So I am trying to understand what your new patch is doing and how we can fix it.
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: Jie Fu [mailto:fujie at loongson.cn]
> Sent: Tuesday, September 10, 2019 8:42 PM
> To: Deshpande, Vivek R <vivek.r.deshpande at intel.com>; Vladimir Kozlov <vladimir.kozlov at oracle.com>; hotspot-compiler-dev at openjdk.java.net; Viswanathan, Sandhya <sandhya.viswanathan at intel.com>
> Subject: Re: RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling
>
> Hi Vivek,
>
> Updated: http://cr.openjdk.java.net/~jiefu/8227505/webrev.04/
>
> With the help of your compile logs, I successfully reproduced the not-unroll-after-vectorization problem you mentioned in [1].
> It has been fixed on my avx-256 machine with this version.
> The patch just adds a heuristic [2] to protect against over-unrolling with SuperWordLoopUnrollAnalysis.
> Please review it and give me some advice.
>
> Again, if you have any questions on your avx-512 machine, could you please share the compile logs with me, especially for NUM = 256, 2048 and 4096?
> Please see comments inline.
>
> Thanks a lot.
> Best regards,
> Jie
>
> [1] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034817.html
> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034783.html
>
> On 2019/9/7 7:35 AM, Deshpande, Vivek R wrote:
>> Hi Jie
>>
>> I experimented with both sizes, 1024 and 2048 bytes, and it looks like the 2nd compilation generates suboptimal code with a shorter vector width.
> I still don't think it's a problem, since according to your performance analysis there is no performance gain with the full available vector width.
>
>
>> Please find it attached.
>> IMO, the fix you have should be able to unroll enough to use the full available vector width.
> Why?
> Unfortunately, compiling with the full available vector width can be harmful
> to performance.
> I ran your test case with NUM = 256 and 128 on my avx-256
> machine and found that performance degraded with the full available
> vector width (32-byte vectors).
> After the patch, performance (with 16-byte vectors) for NUM = 256 and 128
> improved by 28% and 36% respectively.
>
> So I wonder what the performance is before and after the patch for NUM =
> 256 and 128 on your avx-512 machine.
> Could you please share those results with us as well?
>
> Thanks.
>
>
>> Regards,
>> Vivek