RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling

Wed Sep 11 03:41:33 UTC 2019

Hi Vivek,

Updated: http://cr.openjdk.java.net/~jiefu/8227505/webrev.04/

With the help of your compile logs, I successfully reproduced the 
not-unroll-after-vectorization problem you mentioned in [1].
It is now fixed on my avx-256 machine with this version.
The patch just adds a heuristic [2] to protect against over-unrolling 
with SuperWordLoopUnrollAnalysis.
Please review it and give me some advice.

Again, if you run into any problems on your avx-512 machine, could you 
please share the compile logs with me, especially for NUM = 256, 2048 
and 4096?
Please see comments inline.

Thanks a lot.
Best regards,
Jie

[1] 
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034817.html
[2] 
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034783.html

On 2019/9/7 7:35 AM, Deshpande, Vivek R wrote:
> Hi Jie
>
> I experimented with both sizes, 1024 and 2048 bytes, and it looks like the 2nd compilation generates suboptimal code with a shorter vector width.

I still don't think it's a problem, since your performance analysis 
shows no gain from using the full available vector width.

> Please find it attached.
> IMO, the fix you have should be able to unroll enough to use the full available vector width.
Why?
Unfortunately, compiling with the full available vector width can be 
harmful to performance.
I ran your test case with NUM = 256 and 128 on my avx-256 machine and 
found that performance degraded with the full available vector width 
(32-byte vectors).
After the patch, the performance (16-byte vectors) for NUM = 256 and 128 
improved by 28% and 36% respectively.
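
As a rough illustration of why a wider vector plus more unrolling can 
hurt short loops, here is a small sketch. It is not taken from the patch 
or from your test case: the class and method names, the unroll factor of 
8 and the byte element type are all assumptions made for the example. It 
only computes how many iterations an unrolled, vectorized main loop 
could cover for a given trip count.

```java
public class UnrollSketch {
    // Iterations covered by an unrolled, vectorized main loop.
    // Whatever is not covered has to run outside the main loop.
    public static int coveredByMainLoop(int tripCount, int lanesPerVector, int unrollFactor) {
        int stride = lanesPerVector * unrollFactor; // elements consumed per main-loop pass
        return (tripCount / stride) * stride;
    }

    public static void main(String[] args) {
        int unrollFactor = 8;             // hypothetical value, for illustration only
        int[] tripCounts = {128, 256, 2048};
        int[] lanes = {16, 32};           // 16-byte vs 32-byte vectors, assuming byte elements
        for (int n : tripCounts) {
            for (int l : lanes) {
                int covered = coveredByMainLoop(n, l, unrollFactor);
                System.out.println("NUM=" + n + " lanes=" + l
                        + " main=" + covered + " leftover=" + (n - covered));
            }
        }
    }
}
```

Under these assumptions, NUM = 128 with 32 lanes leaves the main loop 
with nothing to do, while 16 lanes covers the whole loop. The real 
behavior depends on the actual unroll factor and element type.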

So I am curious about the performance before and after the patch for 
NUM = 256 and 128 on your avx-512 machine.
Could you please share those numbers as well?

Thanks.

>
> Regards,
> Vivek