[aarch64-port-dev ] RFR: JDK-8187601: Unrolling more when SLP auto-vectorization failed
Zhongwei Yao
zhongwei.yao at linaro.org
Mon Sep 18 09:04:58 UTC 2017
Hi, all,
Bug:
https://bugs.openjdk.java.net/browse/JDK-8187601
Webrev:
http://cr.openjdk.java.net/~zyao/8187601/webrev.00
In the current implementation, the loop unroll factor is determined by
the vector size and the element size when SuperWordLoopUnrollAnalysis
is true (it is currently true on both x86 and AArch64).
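The vector-based rule can be pictured with the small Java model below. It is a hypothetical sketch, not HotSpot code (C2 itself is written in C++). The class and method names are invented for illustration, and the model assumes that the factor is exactly the number of array elements that fit in one vector register.

```java
// Hypothetical model (not HotSpot code) of the pre-patch rule used when
// SuperWordLoopUnrollAnalysis is true: unroll once per element in a vector.
class SlpUnrollFactor {
    // vectorBits: width of the target's vector registers in bits
    // elementBits: width of one array element in bits (32 for int)
    static int slpUnrollFactor(int vectorBits, int elementBits) {
        if (vectorBits <= 0 || elementBits <= 0 || vectorBits % elementBits != 0) {
            throw new IllegalArgumentException("unsupported vector/element sizes");
        }
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        // Integer.SIZE is 32, the width of an int[] element
        System.out.println("AArch64 (128-bit vectors), int[]: " + slpUnrollFactor(128, Integer.SIZE));
        System.out.println("x86 (256-bit vectors), int[]: " + slpUnrollFactor(256, Integer.SIZE));
    }
}
```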
This unrolling policy generates suboptimal code when SLP
auto-vectorization fails, as the following example shows.
This patch changes the unrolling policy to unroll more when SLP
auto-vectorization fails: in that case the loop is unrolled until the
unroll limit is reached.
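A minimal model of the old and new policies follows. It is a sketch under stated assumptions, not the code in the webrev. It assumes that each unrolling pass doubles the loop body, and it uses a hypothetical MAX_UNROLL of 16 for the overall unroll limit. The value 16 is the factor reported for AArch64 in this mail; in HotSpot the limit is presumably controlled by the LoopMaxUnroll option.

```java
// Hypothetical model of the unrolling policy before and after the patch.
// Assumptions (not HotSpot code): the unroll factor grows by doubling,
// and MAX_UNROLL stands in for C2's overall unroll limit.
class UnrollPolicySketch {
    static final int MAX_UNROLL = 16; // assumed limit, matching the 16x result in this mail

    static int finalUnrollFactor(int vectorBits, int elementBits,
                                 boolean slpSucceeded, boolean patched) {
        int slpLimit = vectorBits / elementBits; // e.g. 128 / 32 = 4
        // Before the patch the vector-based limit always applies; after it,
        // that limit is kept only when SLP auto-vectorization succeeds.
        int limit = (slpSucceeded || !patched) ? slpLimit : MAX_UNROLL;
        limit = Math.min(limit, MAX_UNROLL);
        int factor = 1;
        while (factor * 2 <= limit) {
            factor *= 2; // each unrolling pass doubles the loop body
        }
        return factor;
    }

    public static void main(String[] args) {
        // accessArrayConstants is not vectorized, i.e. SLP fails
        System.out.println("before: " + finalUnrollFactor(128, 32, false, false)); // 4
        System.out.println("after:  " + finalUnrollFactor(128, 32, false, true));  // 16
        System.out.println("vectorized loop, after: " + finalUnrollFactor(128, 32, true, true)); // 4
    }
}
```

Loops that SLP does vectorize keep the vector-based factor in this model, so only the failure case changes.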
Here is one example:
public static void accessArrayConstants(int[] array) {
    for (int j = 0; j < 1024; j++) {
        array[0]++;
        array[1]++;
    }
}
Before this patch, the loop is unrolled 4 times. The factor 4 is
AArch64's vector size (128 bits) divided by the array element size
(32 bits): 128 / 32 = 4. On x86 the vector size is 256 bits, so the
loop is unrolled 256 / 32 = 8 times.
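At the Java source level, unrolling this loop U times can be pictured roughly as in the following illustrative model (class and method names are hypothetical). The model ignores the pre-loop and the checks that C2 also emits. It folds the U copies of each array[k]++ into a single array[k] += U, which is what the #0x4 immediates in the AArch64 listing amount to. The main loop runs only while a full group of U iterations still fits below 1024, and a post-loop handles any remainder.

```java
import java.util.Arrays;

// Illustrative source-level model of unrolling (not actual C2 output).
class UnrolledModel {
    // The original loop from the example.
    static void accessArrayConstants(int[] array) {
        for (int j = 0; j < 1024; j++) {
            array[0]++;
            array[1]++;
        }
    }

    // Main loop: 'unroll' copies of the body, with the repeated increments
    // folded into one addition. It runs while j + (unroll - 1) < 1024.
    // Post loop: the leftover iterations, one at a time.
    static void unrolled(int[] array, int unroll) {
        int j = 0;
        for (; j < 1024 - (unroll - 1); j += unroll) {
            array[0] += unroll;
            array[1] += unroll;
        }
        for (; j < 1024; j++) {
            array[0]++;
            array[1]++;
        }
    }

    public static void main(String[] args) {
        int[] expected = new int[2];
        accessArrayConstants(expected);
        for (int unroll : new int[] {4, 16}) {
            int[] actual = new int[2];
            unrolled(actual, unroll);
            System.out.println("unroll " + unroll + ": same result as original loop: "
                    + Arrays.equals(expected, actual));
        }
    }
}
```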
Below is the code generated by C2 on AArch64:
...
... # omit unrelated code.
...
0x0000ffff6caf3180: ldr w10, [x1,#16]         ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@12 (line 6)
0x0000ffff6caf3184: add w13, w10, #0x1
0x0000ffff6caf3188: str w13, [x1,#16]         ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@15 (line 6)
0x0000ffff6caf318c: ldr w12, [x1,#20]         ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@19 (line 7)
0x0000ffff6caf3190: add w13, w10, #0x4
0x0000ffff6caf3194: add w10, w12, #0x4
0x0000ffff6caf3198: str w13, [x1,#16]         ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@15 (line 6)
0x0000ffff6caf319c: add w11, w11, #0x4        ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@23 (line 5)
0x0000ffff6caf31a0: str w10, [x1,#20]         ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@22 (line 7)
0x0000ffff6caf31a4: cmp w11, #0x3fd
0x0000ffff6caf31a8: b.lt 0x0000ffff6caf3180   ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@6 (line 5)
...
... # omit unrelated code.
...
After applying this patch, the loop is unrolled 16 times:
...
... # omit unrelated code.
...
0x0000ffffb0aa6100: ldr w10, [x1,#16]         ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@12 (line 6)
0x0000ffffb0aa6104: add w13, w10, #0x1
0x0000ffffb0aa6108: str w13, [x1,#16]         ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@15 (line 6)
0x0000ffffb0aa610c: ldr w12, [x1,#20]         ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@19 (line 7)
0x0000ffffb0aa6110: add w13, w10, #0x10
0x0000ffffb0aa6114: add w10, w12, #0x10
0x0000ffffb0aa6118: str w13, [x1,#16]         ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@15 (line 6)
0x0000ffffb0aa611c: add w11, w11, #0x10       ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@23 (line 5)
0x0000ffffb0aa6120: str w10, [x1,#20]         ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@22 (line 7)
0x0000ffffb0aa6124: cmp w11, #0x3f1
0x0000ffffb0aa6128: b.lt 0x0000ffffb0aa6100   ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                              ; - ArrayAccess::accessArrayConstants@6 (line 5)
...
... # omit unrelated code.
...
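A back-of-the-envelope comparison of the two AArch64 main loops above is given below as a small Java program (names hypothetical). It assumes that each main-loop iteration executes the 11 listed instructions (ldr, add, str, ldr, add, add, str, add, str, cmp, b.lt), and it ignores the pre- and post-loops. It counts instructions and is not a performance measurement. The compare immediates follow from requiring j + (U - 1) < 1024 in a loop unrolled U times.

```java
// Rough instruction-count comparison of the two main loops (not a benchmark).
// Assumptions: 1024 source iterations, 11 instructions per main-loop
// iteration as in both listings, pre- and post-loops ignored.
class MainLoopCost {
    static final int TRIP_COUNT = 1024;
    static final int INSNS_PER_ITERATION = 11;

    // Number of main-loop iterations when the body is unrolled 'unroll' times.
    static int mainLoopIterations(int unroll) {
        return TRIP_COUNT / unroll;
    }

    // Loop continues while j < TRIP_COUNT - (unroll - 1).
    static int compareImmediate(int unroll) {
        return TRIP_COUNT - (unroll - 1);
    }

    public static void main(String[] args) {
        for (int unroll : new int[] {4, 16}) {
            System.out.printf("unroll %2d: cmp #0x%x, %d iterations, ~%d instructions%n",
                    unroll, compareImmediate(unroll), mainLoopIterations(unroll),
                    mainLoopIterations(unroll) * INSNS_PER_ITERATION);
        }
    }
}
```

Under these assumptions the 16x loop executes a quarter as many loop-control instructions (add, cmp, b.lt) as the 4x loop for the same amount of work.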
This patch passes the jtreg tests on both AArch64 and x86.
--
Best regards,
Zhongwei