SuperWordLoopUnrollAnalysis and loop unrolling

Yang Zhang yang.zhang at linaro.org
Tue Apr 18 06:42:13 UTC 2017


Hi Andrew

I have run this test case.

For aarch64 C2 with SuperWordLoopUnrollAnalysis=false:
Loop unrolling is still controlled by the body_size comparison. Once
the loop has been unrolled 16 times, body_size is large enough and
unrolling stops.

For aarch64 C2 with SuperWordLoopUnrollAnalysis=true:
The first round of loop unrolling is controlled by
policy_unroll_slp_analysis. Once the loop has been unrolled 4 times,
vectorization is attempted.
If vectorization succeeds, set_major_progress triggers further loop
unrolling until body_size is large enough.
If vectorization fails, loop unrolling stops.
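
The two cases above can be summarized in a small toy model. This is
only a sketch of the control flow as described in this message, not
HotSpot code: the class and method names, the starting body size of 10
and the size limit of 160 are made up for illustration. The model
assumes that each unroll step doubles the loop body, and it ignores the
effect vectorization has on the body size.

```java
public class UnrollModel {
    /**
     * Toy model: returns the final unroll factor.
     *
     * slpAnalysis     models SuperWordLoopUnrollAnalysis (true/false)
     * vectorizes      whether the vectorization attempt succeeds
     * slpUnrollFactor unroll factor requested by the SLP analysis (hypothetical input)
     * bodySize        initial loop body size (hypothetical units)
     * sizeLimit       body size limit at which unrolling stops (hypothetical)
     */
    public static int finalUnrollFactor(boolean slpAnalysis, boolean vectorizes,
                                        int slpUnrollFactor, int bodySize, int sizeLimit) {
        int factor = 1;
        if (slpAnalysis) {
            // First round: unroll only as far as the analysis asked for.
            while (factor < slpUnrollFactor && bodySize * 2 <= sizeLimit) {
                factor *= 2;
                bodySize *= 2;
            }
            // Vectorization is attempted here; on failure unrolling stops.
            if (!vectorizes) {
                return factor;
            }
            // On success, unrolling continues below (simplification:
            // the body size is left unchanged by vectorization).
        }
        // Size-limited unrolling: keep doubling while the body still fits.
        while (bodySize * 2 <= sizeLimit) {
            factor *= 2;
            bodySize *= 2;
        }
        return factor;
    }

    public static void main(String[] args) {
        // Made-up inputs: body size 10, limit 160, SLP analysis asks for 4x.
        System.out.println(finalUnrollFactor(false, false, 4, 10, 160)); // 16
        System.out.println(finalUnrollFactor(true, false, 4, 10, 160));  // 4
        System.out.println(finalUnrollFactor(true, true, 4, 10, 160));   // 16
    }
}
```

With these made-up inputs the model reproduces the pattern reported
above: 16x with the flag off, and 4x with the flag on when
vectorization fails.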

I haven't run a performance test. But I think that, because modern CPUs
execute instructions out of order, loop unrolling doesn't always bring
a performance improvement.
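
To compare the two settings with rough timings, the quoted test case
can be wrapped in a standalone harness such as the sketch below. It is
not the original benchmark: the definitions of BenchmarkState and
INITSIZE are not shown in this thread, so the ones here (a four-element
int array and one million iterations) are assumptions, as are the class
name, the extra iterations parameter and the warm-up and run counts. A
plain System.nanoTime loop is also much less reliable than JMH.

```java
import java.util.Arrays;

public class SameArrayHarness {
    // Assumption: the real INITSIZE is not shown in the thread.
    static final int INITSIZE = 1_000_000;

    // Hypothetical stand-in for the benchmark state class.
    public static class BenchmarkState {
        public int[] b = {0, 1, 2, 3};
    }

    // Same loop body as the quoted test case; the iteration count is
    // passed in (an addition) so that small inputs can be checked.
    public static int[] sameArrayClass(BenchmarkState state, int iterations) {
        for (int i = 0; i < iterations; i++) {
            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];

            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];

            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];

            state.b[0] = state.b[1];
            state.b[1] = state.b[2];
            state.b[2] = state.b[3];
            state.b[3] = state.b[0];
        }
        return state.b;
    }

    public static void main(String[] args) {
        BenchmarkState state = new BenchmarkState();
        // Warm up so that the method is compiled by C2 before timing.
        for (int i = 0; i < 50; i++) {
            sameArrayClass(state, INITSIZE);
        }
        int runs = 20;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sameArrayClass(state, INITSIZE);
        }
        long avgNanos = (System.nanoTime() - start) / runs;
        // Print the array too, so the work cannot be optimized away.
        System.out.println("avg ns per call: " + avgNanos
                + ", b = " + Arrays.toString(state.b));
    }
}
```

Running it once with -XX:+SuperWordLoopUnrollAnalysis and once with
-XX:-SuperWordLoopUnrollAnalysis on the java command line gives the two
numbers to compare.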

Regards
Yang

On 15 April 2017 at 00:41, Andrew Haley <aph at redhat.com> wrote:
> On 14/04/17 10:21, Yang Zhang wrote:
>> My test result is just the opposite of your description. Could you
>> provide your test case?
>
>     // @Benchmark
>     public int[] sameArrayClass(BenchmarkState state) {
>         for (int i = 0; i < INITSIZE; i++) {
>             state.b[0] = state.b[1];
>             state.b[1] = state.b[2];
>             state.b[2] = state.b[3];
>             state.b[3] = state.b[0];
>
>             state.b[0] = state.b[1];
>             state.b[1] = state.b[2];
>             state.b[2] = state.b[3];
>             state.b[3] = state.b[0];
>
>             state.b[0] = state.b[1];
>             state.b[1] = state.b[2];
>             state.b[2] = state.b[3];
>             state.b[3] = state.b[0];
>
>             state.b[0] = state.b[1];
>             state.b[1] = state.b[2];
>             state.b[2] = state.b[3];
>             state.b[3] = state.b[0];
>         }
>         return state.b;
>     }
>
> This is not vectorizable, but SuperWordLoopUnrollAnalysis=true disables
> the unrolling which would make it faster.
>
> Andrew.
>


More information about the hotspot-dev mailing list