RFR: JDK-8187601: Unrolling more when SLP auto-vectorization failed

Vladimir Kozlov vladimir.kozlov at oracle.com
Mon Sep 18 16:17:26 UTC 2017


Why not use the existing set_notpassed_slp() instead of mark_slp_vec_failed()?

Why do you need the following additional check?

-        } else if (cl->is_main_loop()) {
+        } else if (cl->is_main_loop() && !cl->is_slp_vec_failed()) {
            sw.transform_loop(lpt, true);


Thanks,
Vladimir

On 9/18/17 2:58 AM, Zhongwei Yao wrote:
> [Forward from aarch64-port-dev to hotspot-compiler-dev]
> 
> Hi, all,
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8187601
> 
> Webrev:
> http://cr.openjdk.java.net/~zyao/8187601/webrev.00
> 
> In the current implementation, the unroll count is determined by the
> vector size and the element size when SuperWordLoopUnrollAnalysis is
> true (it is currently true on both X86 and AArch64).
> 
> This unrolling policy generates less optimized code when SLP
> auto-vectorization fails (as the following example shows).
> 
> In this patch, I modify the unrolling policy to unroll further when
> SLP auto-vectorization fails, so the loop is unrolled until it
> reaches the unroll limit.
> 
> Here is one example:
>    public static void accessArrayConstants(int[] array) {
>        for (int j = 0; j < 1024; j++) {
>            array[0]++;
>            array[1]++;
>        }
>    }
> 
> Before this patch, the loop is unrolled 4 times. The factor 4 comes
> from AArch64's 128-bit vector size / 32-bit array element size = 4.
> On X86 the vector size is 256 bits, so the loop is unrolled 8 times.
> 
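
The unroll-count arithmetic quoted above can be written out as a small
Java sketch. The class and method names are hypothetical, chosen only for
illustration; this is not HotSpot code, just the division described in
the message.

```java
public class UnrollFactor {
    // Hypothetical helper (not a HotSpot API): the unroll count described
    // in the quoted message, i.e. vector width divided by element width.
    static int slpUnrollFactor(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        int intBits = Integer.SIZE; // a Java int array element is 32 bits
        System.out.println("AArch64, 128-bit vectors: " + slpUnrollFactor(128, intBits));
        System.out.println("X86, 256-bit vectors: " + slpUnrollFactor(256, intBits));
    }
}
```
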
> Below is the code generated by C2 on AArch64:
> 
> ==== generated code start ====
>    0x0000ffff6caf3180: ldr w10, [x1,#16]   ;
>    0x0000ffff6caf3184: add w13, w10, #0x1
>    0x0000ffff6caf3188: str w13, [x1,#16]   ;
>    0x0000ffff6caf318c: ldr w12, [x1,#20]   ;
>    0x0000ffff6caf3190: add w13, w10, #0x4
>    0x0000ffff6caf3194: add w10, w12, #0x4
>    0x0000ffff6caf3198: str w13, [x1,#16]   ;
>    0x0000ffff6caf319c: add w11, w11, #0x4  ;
>    0x0000ffff6caf31a0: str w10, [x1,#20]   ;
>    0x0000ffff6caf31a4: cmp w11, #0x3fd
>    0x0000ffff6caf31a8: b.lt 0x0000ffff6caf3180  ;
> ==== generated code end ====
> 
> After applying this patch, the loop is unrolled 16 times:
> 
> ==== generated code start ====
>    0x0000ffffb0aa6100: ldr w10, [x1,#16]   ;
>    0x0000ffffb0aa6104: add w13, w10, #0x1
>    0x0000ffffb0aa6108: str w13, [x1,#16]   ;
>    0x0000ffffb0aa610c: ldr w12, [x1,#20]   ;
>    0x0000ffffb0aa6110: add w13, w10, #0x10
>    0x0000ffffb0aa6114: add w10, w12, #0x10
>    0x0000ffffb0aa6118: str w13, [x1,#16]   ;
>    0x0000ffffb0aa611c: add w11, w11, #0x10  ;
>    0x0000ffffb0aa6120: str w10, [x1,#20]   ;
>    0x0000ffffb0aa6124: cmp w11, #0x3f1
>    0x0000ffffb0aa6128: b.lt 0x0000ffffb0aa6100  ;
> ==== generated code end ====
> 
> This patch passes the jtreg tests on both AArch64 and X86.
> 
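
For readers comparing the two listings in the quoted message, here is a
hand-written Java sketch of what the unrolled main loop computes. It
assumes the per-iteration increments are folded into a single add, as the
"add ..., #0x4" and "add ..., #0x10" instructions suggest. The names
UnrollSketch, rolled and unrolled are hypothetical. The real C2 output
also has loops around the main loop, which this sketch models only as a
simple remainder loop.

```java
import java.util.Arrays;

public class UnrollSketch {
    static final int LIMIT = 1024;

    // The original loop from the quoted example.
    static void rolled(int[] array) {
        for (int j = 0; j < LIMIT; j++) {
            array[0]++;
            array[1]++;
        }
    }

    // Hand-unrolled equivalent: each main-loop trip stands for `unroll`
    // original iterations, with the increments folded into one add.
    static void unrolled(int[] array, int unroll) {
        int j = 0;
        // Run only while a full group of `unroll` iterations remains.
        for (; j < LIMIT - unroll + 1; j += unroll) {
            array[0] += unroll;
            array[1] += unroll;
        }
        // Remainder iterations (none when unroll divides LIMIT).
        for (; j < LIMIT; j++) {
            array[0]++;
            array[1]++;
        }
    }

    public static void main(String[] args) {
        for (int unroll : new int[] {4, 16}) {
            int[] a = new int[2];
            int[] b = new int[2];
            rolled(a);
            unrolled(b, unroll);
            System.out.println("unroll=" + unroll
                    + " bound=0x" + Integer.toHexString(LIMIT - unroll + 1)
                    + " same=" + Arrays.equals(a, b));
        }
    }
}
```

The bounds printed here, 0x3fd and 0x3f1, equal the constants in the
"cmp w11" instructions of the two listings, which is consistent with
unroll counts of 4 and 16.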

