RFR: 8227505: SuperWordLoopUnrollAnalysis may lead to over loop unrolling
Jie Fu
fujie at loongson.cn
Tue Aug 20 08:44:02 UTC 2019
Hi Vivek,
Thanks for your review and valuable comments.
On 2019/8/20 11:44 AM, Deshpande, Vivek R wrote:
> Hi All
>
> I tested this patch with a small test which adds byte arrays.
> for (int i = 0; i < NUM; i++) {
>     data[i] = (byte)(data2[i] + data3[i]);
> }
>
> Since the loop was unrolled half as much as before, the maximum vector length could not be used, and 256-bit vector instructions were generated instead of the maximum available 512-bit ones.
As for your particular case, the following are the performance results on
my i7-8700 machine, which supports 256-bit vectors at most.
(Running with: time java SmallByteAdd)
Original Code:
---------------------
26641963.050 iter/sec
real 0m0.553s
user 0m0.595s
sys 0m0.008s
---------------------
After the patch:
---------------------
29152473.435 iter/sec
real 0m0.521s
user 0m0.562s
sys 0m0.012s
---------------------
It seems that the patched version is a little better (roughly 9% more
iter/sec in this run).
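For reference, a test of this shape could be sketched roughly as below. This is only an illustration and not the actual SmallByteAdd source, which is not included in this mail: the values of NUM and ITERS, the warm-up count, the input data and the add() helper are all assumptions.

```java
public class SmallByteAdd {
    // Assumed sizes; the thread does not state NUM or the iteration count.
    static final int NUM = 1024;
    static final int ITERS = 1_000_000;

    // The kernel quoted above: element-wise byte addition,
    // a candidate for SuperWord vectorization in C2.
    static void add(byte[] data, byte[] data2, byte[] data3) {
        for (int i = 0; i < NUM; i++) {
            data[i] = (byte) (data2[i] + data3[i]);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[NUM];
        byte[] data2 = new byte[NUM];
        byte[] data3 = new byte[NUM];
        for (int i = 0; i < NUM; i++) {
            data2[i] = (byte) i;        // assumed input pattern
            data3[i] = (byte) (2 * i);  // assumed input pattern
        }

        // Warm up so that add() is likely JIT-compiled before timing starts.
        for (int i = 0; i < 20_000; i++) {
            add(data, data2, data3);
        }

        long start = System.nanoTime();
        for (int i = 0; i < ITERS; i++) {
            add(data, data2, data3);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%.3f iter/sec%n", ITERS / seconds);
    }
}
```

On a machine with AVX-512, running such a test with -XX:MaxVectorSize=32 and then with -XX:MaxVectorSize=64 should allow comparing the 256-bit and 512-bit code for the same loop.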
I don't have a machine that supports AVX-512 at hand, but I'm trying to
find one to analyze your case.
Could you please show me the performance of your case with vector-512 and
vector-256 on your machine?
> Also the loop did not get unrolled after vectorization.
>
Yes. I can reproduce it on my computer.
I'm surprised that the unrolled version again seems a bit slower (on my
computer).
I'll investigate it soon.
Thanks a lot.
Best regards,
Jie