RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling
Jie Fu
fujie at loongson.cn
Wed Sep 11 03:41:33 UTC 2019
Hi Vivek,
Updated: http://cr.openjdk.java.net/~jiefu/8227505/webrev.04/
With the help of your compile logs, I successfully reproduced the
not-unroll-after-vectorization problem you mentioned in [1].
It has been fixed on my avx-256 machine with this version.
The patch just adds a heuristic [2] to protect against over-unrolling
with SuperWordLoopUnrollAnalysis.
Please review it and give me some advice.
Again, if you run into any problems on your avx-512 machine, could you
please share the compile logs with me, especially for NUM = 256, 2048
and 4096?
Please see comments inline.
Thanks a lot.
Best regards,
Jie
[1]
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034817.html
[2]
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-August/034783.html
On 2019/9/7 7:35 AM, Deshpande, Vivek R wrote:
> Hi Jie
>
> I experimented with both the sizes 1024 and 2048 bytes and looks like the 2nd compilation generates the suboptimal code with shorter vector width.
I still don't think it's a problem, since according to your performance
analysis there is no performance gain from the full available vector width.
> Please find it attached.
> IMO, the fix you have should be able to unroll enough to use the full available vector width.
Why?
Unfortunately, compiling with full available vector width can be harmful
to performance.
I ran your test case with NUM = 256 and 128 on my avx-256 machine and
found that performance suffered with the full available vector width
(32-byte vectors).
After the patch, the performance (16-byte vectors) for NUM = 256 and 128
improved by 28% and 36% respectively.
So I wonder what the performance is before and after the patch for
NUM = 256 and 128 on your avx-512 machine.
Could you please share those numbers with us as well?
Thanks.
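For readers without the attachment, here is a minimal sketch of the kind
of loop such a comparison exercises. It is not the test case discussed
in this thread: the class name, the int element type and the iteration
counts are assumptions made only for illustration, and NUM is simply
reused as the trip count.

```java
public class VectorAddSketch {
    // Hypothetical stand-in for the trip count called NUM in this thread.
    static final int NUM = 256;

    // Element-wise add over int arrays. A counted loop of this shape is
    // the kind C2 may unroll and then vectorize with SuperWord.
    static void add(int[] a, int[] b, int[] c, int n) {
        for (int i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        int[] a = new int[NUM];
        int[] b = new int[NUM];
        int[] c = new int[NUM];
        for (int i = 0; i < NUM; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        // Warm up so that add() gets compiled by C2 before timing.
        for (int iter = 0; iter < 20_000; iter++) {
            add(a, b, c, NUM);
        }
        // Crude timing only; a real comparison should use a harness such as JMH.
        long start = System.nanoTime();
        for (int iter = 0; iter < 1_000_000; iter++) {
            add(a, b, c, NUM);
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("c[NUM-1] = " + c[NUM - 1] + ", elapsed ns = " + elapsed);
    }
}
```

Running it with -XX:MaxVectorSize=16 and then -XX:MaxVectorSize=32 gives
a rough way to compare 16-byte and 32-byte vectors on an avx-256 machine.
Capping the vector size this way is not the same as the heuristic in the
patch, so the numbers are only indicative.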
>
> Regards,
> Vivek