RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling

Fri Aug 23 00:53:56 UTC 2019

Hi Vivek,

Thanks for your clarification.
Please see comments inline.

On 2019/8/23 3:26 AM, Deshpande, Vivek R wrote:
> Hi Jie
>
> On an AVX2 (256-bit vector) machine I did not observe any difference in the generated code, which matches your observation.
>
> But on an AVX3 (512-bit / 64-byte vector) machine, the code generated with the patch used AVX2 (256-bit) instructions instead of AVX3 (512-bit) instructions.
> So it is not able to use the complete vector width with the patch.
> As far as performance is concerned, with the particular benchmark that I shared and the given number of iterations in it, I did not observe any difference between the patched and the original builds.
As for your particular case, I don't think it's a problem to compile
with vector-256, since there is no performance drop compared with
vector-512. If anything, I'd prefer vector-256 because it lowers the
risk of over-unrolling the loop.

Also I'm not sure whether the power consumption will increase if 
vector-512 is used on your machine.

> So the difference is in the generated code, which does not use the full vector width.
According to your performance analysis, vector-256 is good enough for
your test case.
What is the benefit of generating vector-512 code for your case?

Note that the patch doesn't disable the generation of vector-512 at all.
You can increase NUM in your program from 1024 to 2048 or more and
try again.
Thanks.
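
For illustration, a minimal sketch of the kind of experiment suggested
above. This is not the benchmark shared in this thread: the class name,
the int arrays, the loop body and the iteration count are all
assumptions, and NUM here is only a stand-in for the constant in that
program. The idea is to run it once with NUM = 1024 and once with
NUM = 2048 or more, then compare the vector width of the code C2
generates for add().

```java
public class VectorAddSketch {
    // Hypothetical stand-in for NUM in the benchmark; try 1024, then 2048 or more.
    static final int NUM = 1024;

    // Simple element-wise loop of the kind the SuperWord pass can vectorize.
    static void add(int[] a, int[] b, int[] c) {
        for (int i = 0; i < NUM; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Calls add() repeatedly so that it becomes hot enough for C2 to compile,
    // and accumulates one element per call so the work is not dead code.
    static long run(int iterations) {
        int[] a = new int[NUM];
        int[] b = new int[NUM];
        int[] c = new int[NUM];
        for (int i = 0; i < NUM; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        long sum = 0;
        for (int n = 0; n < iterations; n++) {
            add(a, b, c);
            sum += c[n % NUM];
        }
        return sum;
    }

    public static void main(String[] args) {
        // The iteration count is an arbitrary assumption, chosen only to warm up the JIT.
        System.out.println(run(20_000));
    }
}
```

The generated code can be inspected with -XX:+UnlockDiagnosticVMOptions
-XX:+PrintAssembly, provided the hsdis disassembler is installed.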

What do you think?
Any comments?

Thanks a lot.
Best regards,
Jie

>
> Regards,
> Vivek