RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling

Tue Aug 20 08:44:02 UTC 2019

Hi Vivek,

Thanks for your review and valuable comments.

On 2019/8/20 11:44 AM, Deshpande, Vivek R wrote:
> Hi All
>
> I tested this patch with a small test that adds byte arrays.
> for (int i = 0; i < NUM; i++) {
>                  data[i] = (byte)(data2[i] + data3[i]);
> }
>
> Since the loop was unrolled half as much as before, the maximum vector length could not be used: 256-bit vector instructions were generated instead of the maximum available 512 bits.
As for your particular case, here are the performance results on my
i7-8700 machine, which supports 256-bit vectors (AVX2) at most.
(Running with: time java SmallByteAdd)
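The SmallByteAdd source is not included in this message. For anyone wanting to reproduce this kind of measurement, a minimal sketch of a SmallByteAdd-style microbenchmark might look like the following. Only the loop body in `add()` comes from the quoted test case; the array length `NUM`, the `init()` helper, the warm-up and iteration counts, and the timing method are all assumptions, not the actual test.

```java
// Hypothetical sketch of a SmallByteAdd-style microbenchmark. Only the loop
// body in add() comes from the quoted test case; NUM, init(), the warm-up and
// iteration counts, and the timing method are assumptions.
public class SmallByteAdd {
    static final int NUM = 1024; // assumed array length

    static final byte[] data  = new byte[NUM];
    static final byte[] data2 = new byte[NUM];
    static final byte[] data3 = new byte[NUM];

    // Fill the source arrays with arbitrary, deterministic values.
    static void init() {
        for (int i = 0; i < NUM; i++) {
            data2[i] = (byte) i;
            data3[i] = (byte) (2 * i);
        }
    }

    // The kernel under test: a byte-wise add that C2's SuperWord pass can vectorize.
    static void add() {
        for (int i = 0; i < NUM; i++) {
            data[i] = (byte) (data2[i] + data3[i]);
        }
    }

    public static void main(String[] args) {
        init();
        // Warm up so that add() is JIT-compiled before timing starts.
        for (int i = 0; i < 100_000; i++) {
            add();
        }
        final int iters = 1_000_000; // assumed iteration count
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            add();
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("%.3f iter/sec%n", iters / (elapsed / 1e9));
    }
}
```

On an AVX-512 machine, the 256-bit and 512-bit cases could be compared by capping C2's vector size, e.g. `java -XX:MaxVectorSize=32 SmallByteAdd` versus `java -XX:MaxVectorSize=64 SmallByteAdd` (`MaxVectorSize` is given in bytes).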

Original Code:
   ---------------------
   26641963.050 iter/sec

   real  0m0.553s
   user  0m0.595s
   sys   0m0.008s
   ---------------------

After the patch:
   ---------------------
   29152473.435 iter/sec

   real  0m0.521s
   user  0m0.562s
   sys   0m0.012s
   ---------------------

It seems that the patched version is a little better (roughly 9% higher throughput).
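To make the size of the difference explicit, the relative change between the two runs can be computed directly from the figures reported above. The class and method names below are hypothetical and only illustrate the arithmetic.

```java
// Relative change between the two runs reported above. The inputs are the
// figures from the "Original Code" and "After the patch" outputs.
public class SpeedupCheck {
    // Percentage change from 'before' to 'after' (positive means larger).
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        double itersBefore = 26641963.050; // iter/sec, original code
        double itersAfter  = 29152473.435; // iter/sec, after the patch
        double realBefore  = 0.553;        // seconds, "real" from time
        double realAfter   = 0.521;        // seconds, "real" from time

        System.out.printf("throughput change: %+.1f%%%n", percentChange(itersBefore, itersAfter));
        System.out.printf("wall-clock change: %+.1f%%%n", percentChange(realBefore, realAfter));
    }
}
```

This gives about +9.4% in throughput and about -5.8% in wall-clock time. The `real` time from `time` also includes JVM startup and JIT warm-up, so the iter/sec figure is the more direct comparison of the loop itself.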

I don't have a machine that supports AVX-512 at hand,
and I'm trying to find one to analyze your case.
Could you please share the performance of your case with vector-512 and
vector-256 on your machine?

> Also the loop did not get unrolled after vectorization.
>
Yes, I can reproduce it on my computer.
I'm surprised that, once again, the unrolled version seems a bit slower
on my computer.
I'll investigate it soon.

Thanks a lot.
Best regards,
Jie