RFR: JDK-8187601: Unrolling more when SLP auto-vectorization failed
Zhongwei Yao
zhongwei.yao at linaro.org
Mon Sep 18 09:58:11 UTC 2017
[Forward from aarch64-port-dev to hotspot-compiler-dev]
Hi, all,
Bug:
https://bugs.openjdk.java.net/browse/JDK-8187601
Webrev:
http://cr.openjdk.java.net/~zyao/8187601/webrev.00
In the current implementation, when SuperWordLoopUnrollAnalysis is true
(as it currently is on both X86 and AArch64), the loop unroll factor is
determined by the vector size and the element size.
This unrolling policy generates less optimized code when SLP
auto-vectorization fails, as the following example shows.
In this patch, I modify the current unrolling policy to unroll more
when SLP auto-vectorization fails, so that the loop is unrolled until
it reaches the unroll limit.
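A simplified model of the policy change, for illustration only; it is not C2's actual code. It assumes that the unroll factor grows by doubling, and it uses a made-up constant MAX_UNROLL = 16 to stand in for the general unroll limit, chosen to match the 16x unrolling shown later. The names `UnrollPolicySketch`, `unrollFactor` and `MAX_UNROLL` are hypothetical.

```java
// Hypothetical model of the unroll policy described in this post.
// Not C2 code: MAX_UNROLL is an assumed stand-in for the general unroll limit,
// and the factor is assumed to grow by doubling.
public class UnrollPolicySketch {
    static final int MAX_UNROLL = 16; // assumed general unroll cap

    // slpFactor:  factor derived from vector size / element size
    // vectorized: whether SLP auto-vectorization succeeded
    // patched:    whether the proposed policy change is applied
    static int unrollFactor(int slpFactor, boolean vectorized, boolean patched) {
        // Old policy: always capped by the SLP-derived factor.
        // New policy: that cap applies only when vectorization succeeds.
        int cap = (vectorized || !patched) ? Math.min(slpFactor, MAX_UNROLL) : MAX_UNROLL;
        int factor = 1;
        while (factor * 2 <= cap) {
            factor *= 2; // double the loop body each unrolling step
        }
        return factor;
    }

    public static void main(String[] args) {
        System.out.println(unrollFactor(4, false, false)); // 4: old policy on AArch64
        System.out.println(unrollFactor(4, false, true));  // 16: patched, SLP failed
        System.out.println(unrollFactor(4, true, true));   // 4: patched, SLP succeeded
    }
}
```

The sketch only captures what the post states: the SLP-derived factor stops being the cap once SLP has failed.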
Here is one example:
public static void accessArrayConstants(int[] array) {
    for (int j = 0; j < 1024; j++) {
        array[0]++;
        array[1]++;
    }
}
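To make the term concrete, here is what unrolling the example by 4 means at the source level. C2 applies the transformation to its own IR rather than to Java source, so the hand-unrolled method below (`accessArrayConstantsUnrolled4`, a hypothetical name) only illustrates the idea. Because 1024 is divisible by 4, no remainder loop is needed.

```java
import java.util.Arrays;

// Source-level illustration of 4x unrolling; C2 does this on its IR, not on Java source.
public class UnrollIllustration {
    // Original loop from the example.
    public static void accessArrayConstants(int[] array) {
        for (int j = 0; j < 1024; j++) {
            array[0]++;
            array[1]++;
        }
    }

    // Same work with the body replicated 4 times per iteration (hypothetical name).
    public static void accessArrayConstantsUnrolled4(int[] array) {
        for (int j = 0; j < 1024; j += 4) {
            array[0]++; array[1]++;
            array[0]++; array[1]++;
            array[0]++; array[1]++;
            array[0]++; array[1]++;
        }
    }

    public static void main(String[] args) {
        int[] a = new int[2];
        int[] b = new int[2];
        accessArrayConstants(a);
        accessArrayConstantsUnrolled4(b);
        System.out.println(Arrays.toString(a) + " " + Arrays.toString(b)); // [1024, 1024] [1024, 1024]
    }
}
```

The unrolled version executes a quarter as many loop-counter updates and compare/branch pairs for the same result.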
Before this patch, the loop is unrolled 4 times. The factor 4 is
AArch64's vector size (128 bits) divided by the array element size
(32 bits). On X86, the vector size is 256 bits, so the unroll factor is 8.
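The pre-patch factor computation is a single division. A trivial sketch, with `SlpUnrollFactor` and `slpUnrollFactor` as hypothetical names:

```java
// Pre-patch unroll factor as described in the post: vector bits / element bits.
public class SlpUnrollFactor {
    static int slpUnrollFactor(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        int intBits = Integer.SIZE; // 32 bits per int element
        System.out.println("AArch64: " + slpUnrollFactor(128, intBits)); // AArch64: 4
        System.out.println("X86:     " + slpUnrollFactor(256, intBits)); // X86:     8
    }
}
```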
Below is the code generated by C2 on AArch64:
==== generated code start ====
0x0000ffff6caf3180: ldr w10, [x1,#16] ;
0x0000ffff6caf3184: add w13, w10, #0x1
0x0000ffff6caf3188: str w13, [x1,#16] ;
0x0000ffff6caf318c: ldr w12, [x1,#20] ;
0x0000ffff6caf3190: add w13, w10, #0x4
0x0000ffff6caf3194: add w10, w12, #0x4
0x0000ffff6caf3198: str w13, [x1,#16] ;
0x0000ffff6caf319c: add w11, w11, #0x4 ;
0x0000ffff6caf31a0: str w10, [x1,#20] ;
0x0000ffff6caf31a4: cmp w11, #0x3fd
0x0000ffff6caf31a8: b.lt 0x0000ffff6caf3180 ;
==== generated code end ====
After applying this patch, the loop is unrolled 16 times:
==== generated code start ====
0x0000ffffb0aa6100: ldr w10, [x1,#16] ;
0x0000ffffb0aa6104: add w13, w10, #0x1
0x0000ffffb0aa6108: str w13, [x1,#16] ;
0x0000ffffb0aa610c: ldr w12, [x1,#20] ;
0x0000ffffb0aa6110: add w13, w10, #0x10
0x0000ffffb0aa6114: add w10, w12, #0x10
0x0000ffffb0aa6118: str w13, [x1,#16] ;
0x0000ffffb0aa611c: add w11, w11, #0x10 ;
0x0000ffffb0aa6120: str w10, [x1,#20] ;
0x0000ffffb0aa6124: cmp w11, #0x3f1
0x0000ffffb0aa6128: b.lt 0x0000ffffb0aa6100 ;
==== generated code end ====
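The two loop bodies differ only in their immediates. The counter and the array values step by #0x4 versus #0x10, and the exit test compares against #0x3fd versus #0x3f1. Both compare constants are consistent with a main-loop limit of n - U + 1 for n = 1024 (1024 - 4 + 1 = 1021 = 0x3fd; 1024 - 16 + 1 = 1009 = 0x3f1). The Java model below is a sketch under that reading. It ignores C2's pre-loop and assumes the main loop starts at j = 0. `MainLoopBounds`, `mainLoopLimit` and `runUnrolled` are hypothetical names.

```java
// Hypothetical model of the main loop in the listings (C2's pre-loop ignored).
// With unroll factor U and trip count n, the unrolled body may run only while
// j + U - 1 < n, i.e. while j < n - U + 1; leftovers go to a post-loop.
public class MainLoopBounds {
    static int mainLoopLimit(int n, int unroll) {
        return n - unroll + 1;
    }

    // Returns {array[0], array[1], number of main-loop iterations}.
    static int[] runUnrolled(int n, int unroll) {
        int[] array = new int[2];
        int limit = mainLoopLimit(n, unroll);
        int j = 0;
        int mainIters = 0;
        // Main loop: one iteration does the work of `unroll` original iterations;
        // the repeated increments are folded into one add, as the #0x4 / #0x10
        // immediates in the listings suggest.
        for (; j < limit; j += unroll) {
            array[0] += unroll;
            array[1] += unroll;
            mainIters++;
        }
        // Post-loop: remaining iterations, one at a time.
        for (; j < n; j++) {
            array[0]++;
            array[1]++;
        }
        return new int[] {array[0], array[1], mainIters};
    }

    public static void main(String[] args) {
        for (int u : new int[] {4, 16}) {
            int[] r = runUnrolled(1024, u);
            System.out.printf("U=%d: limit=0x%x, result=[%d, %d], main-loop iterations=%d%n",
                    u, mainLoopLimit(1024, u), r[0], r[1], r[2]);
        }
    }
}
```

Running it prints the limits 0x3fd and 0x3f1, matching the `cmp` immediates, and 256 versus 64 main-loop iterations for the same result of 1024 in each element.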
This patch passes the jtreg tests on both AArch64 and X86.
--
Best regards,
Zhongwei