loop invariant code motion and the jit

Roland Westrelin rwestrel at redhat.com
Wed Jan 18 09:38:10 UTC 2017


Hi Peter,

> I'm studying some JIT compiler optimizations to get a better understanding
> of what happens on the lowest level.
>
> I'm running into a problem with loop invariant code motion (or loop
> hoisting) and my impression is that the JIT is not always doing hoisting.
> For example:
>
> E.g.
>
> public class LoopInvariantCodeMotion {
>
>     public static void main(String[] args) {
>         long l =0;
>         for (int k = 0; k < 100000; k++) {
>             l+=doMath(k);
>         }
>         System.out.println(l);
>     }
>
>     public static int doMath(int a) {
>         int result = 0;
>         for (int k = 0; k < a; k++) {
>             int x = a +2;
>             result += x + k;
>         }
>         return result;
>     }
> }
>
> The 'int x = a+2' can  be pulled out of the loop because it is an invariant
> in the loop.  However in the generated Assembly it is still in the loop.
> However if I would change the code to 'int x = a * 2'' it is pulled out of
> the loop (even though the *2 is implemented as a simple bitshift).
>
> So can someone explain why the loop hoisting is so unpredictable?

C2 transforms (a + 2) + k to (a + k) + 2. It needs the result of (a+k)
which is loop variant to then compute (a+k)+2.

If you look at the end of AddNode::Ideal():

  // Convert "x+(y+1)" into "(x+y)+1".  Push constants down the expression tree.

If you comment the block below out, you'll see the add of 2 out of the
loop. I suppose C2 pushes constants down the tree to find opportunities
for constant folding but in that case it hurts code quality. This said,
it looks like a minor issue and AFAICT it's not straightforward to fix
because in AddNode::Ideal() C2 has no knowledge of loops. C2 is designed
so it performs simple local transformation such as this one very often
and has a few passes of global loop optimizations. In the next loop
optimization pass, we could transform that expression tree again to take
loop invariants into account (in IdealLoopTree::reassociate_invariants()
I suppose) but the next round of local transformations would apply
AddNode::Ideal() again and undo it.

Roland.

>
> These are my settings:
>
> -XX:+UnlockDiagnosticVMOptions
> -XX:PrintAssemblyOptions=intel
> -XX:-BackgroundCompilation
> -XX:-TieredCompilation
> -XX:-Inline
> -XX:CompileCommand=print,*LoopInvariantCodeMotion.doMath
> -XX:LoopUnrollLimit=0
>
>
> java version "1.8.0_101"
> Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
>
>
> Linux silversurver 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
>
>
> -------------------------------------------------------------
> Assembly with the +2
> -------------------------------------------------------------
>
> CompilerOracle: print *LoopInvariantCodeMotion.doMath
> Compiled method (c2)     107    4
> com.loop.LoopInvariantCodeMotion::doMath (27 bytes)
>  total in heap  [0x00007f202906d550,0x00007f202906d7b0] = 608
>  relocation     [0x00007f202906d678,0x00007f202906d680] = 8
>  main code      [0x00007f202906d680,0x00007f202906d6e0] = 96
>  stub code      [0x00007f202906d6e0,0x00007f202906d6f8] = 24
>  oops           [0x00007f202906d6f8,0x00007f202906d700] = 8
>  metadata       [0x00007f202906d700,0x00007f202906d708] = 8
>  scopes data    [0x00007f202906d708,0x00007f202906d728] = 32
>  scopes pcs     [0x00007f202906d728,0x00007f202906d7a8] = 128
>  dependencies   [0x00007f202906d7a8,0x00007f202906d7b0] = 8
> Loaded disassembler from /java/jdk/jdk1.8.0_101/jre/lib/amd64/hsdis-amd64.so
> Decoding compiled method 0x00007f202906d550:
> Code:
> [Disassembling for mach='i386:x86-64']
> [Entry Point]
> [Verified Entry Point]
> [Constants]
>   # {method} {0x00007f2031188488} 'doMath' '(I)I' in
> 'com/loop/LoopInvariantCodeMotion'
>   # parm0:    rsi       = int
>   #           [sp+0x20]  (sp of caller)
>   0x00007f202906d680: sub    rsp,0x18
>   0x00007f202906d687: mov    QWORD PTR [rsp+0x10],rbp  ;*synchronization
> entry
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at -1 (line 14)
>
>   0x00007f202906d68c: xor    r11d,r11d
>   0x00007f202906d68f: test   esi,esi
>   0x00007f202906d691: jle    0x00007f202906d6c0  ;*if_icmpge
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
>   0x00007f202906d693: xor    eax,eax
>   0x00007f202906d695: nop    WORD PTR [rax+rax*1+0x0]  ;*iload_0
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at 9 (line 16)
>
>   0x00007f202906d6a0: mov    r10d,r11d
>   0x00007f202906d6a3: add    r10d,esi
>   0x00007f202906d6a6: add    eax,r10d
>   0x00007f202906d6a9: add    eax,0x2            ;*iadd
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at 17 (line 17)
>
>   0x00007f202906d6ac: inc    r11d               ;*iinc
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at 19 (line 15)
>
>   0x00007f202906d6af: cmp    r11d,esi
>   0x00007f202906d6b2: jl     0x00007f202906d6a0  ;*if_icmpge
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
>   0x00007f202906d6b4: add    rsp,0x10
>   0x00007f202906d6b8: pop    rbp
>   0x00007f202906d6b9: test   DWORD PTR [rip+0xca0c941],eax        #
> 0x00007f2035a7a000
>                                                 ;   {poll_return}
>   0x00007f202906d6bf: ret
>   0x00007f202906d6c0: xor    eax,eax
>   0x00007f202906d6c2: jmp    0x00007f202906d6b4
>   0x00007f202906d6c4: hlt
>   0x00007f202906d6c5: hlt
>   0x00007f202906d6c6: hlt
>   0x00007f202906d6c7: hlt
>   0x00007f202906d6c8: hlt
>   0x00007f202906d6c9: hlt
>   0x00007f202906d6ca: hlt
>   0x00007f202906d6cb: hlt
>   0x00007f202906d6cc: hlt
>   0x00007f202906d6cd: hlt
>   0x00007f202906d6ce: hlt
>   0x00007f202906d6cf: hlt
>   0x00007f202906d6d0: hlt
>   0x00007f202906d6d1: hlt
>   0x00007f202906d6d2: hlt
>   0x00007f202906d6d3: hlt
>   0x00007f202906d6d4: hlt
>   0x00007f202906d6d5: hlt
>   0x00007f202906d6d6: hlt
>   0x00007f202906d6d7: hlt
>   0x00007f202906d6d8: hlt
>   0x00007f202906d6d9: hlt
>   0x00007f202906d6da: hlt
>   0x00007f202906d6db: hlt
>   0x00007f202906d6dc: hlt
>   0x00007f202906d6dd: hlt
>   0x00007f202906d6de: hlt
>   0x00007f202906d6df: hlt
> [Exception Handler]
> [Stub Code]
>   0x00007f202906d6e0: jmp    0x00007f202906c420  ;   {no_reloc}
> [Deopt Handler Code]
>   0x00007f202906d6e5: call   0x00007f202906d6ea
>   0x00007f202906d6ea: sub    QWORD PTR [rsp],0x5
>   0x00007f202906d6ef: jmp    0x00007f2029046fc0  ;   {runtime_call}
>   0x00007f202906d6f4: hlt
>   0x00007f202906d6f5: hlt
>   0x00007f202906d6f6: hlt
>   0x00007f202906d6f7: hlt
> OopMapSet contains 0 OopMaps
>
> -------------------------------------------------------------
> Assembly with the * 2
> -------------------------------------------------------------
>
> CompilerOracle: print *LoopInvariantCodeMotion.doMath
> Compiled method (c2)     108    4
> com.loop.LoopInvariantCodeMotion::doMath (27 bytes)
>  total in heap  [0x00007f6379078250,0x00007f63790784b0] = 608
>  relocation     [0x00007f6379078378,0x00007f6379078380] = 8
>  main code      [0x00007f6379078380,0x00007f63790783e0] = 96
>  stub code      [0x00007f63790783e0,0x00007f63790783f8] = 24
>  oops           [0x00007f63790783f8,0x00007f6379078400] = 8
>  metadata       [0x00007f6379078400,0x00007f6379078408] = 8
>  scopes data    [0x00007f6379078408,0x00007f6379078428] = 32
>  scopes pcs     [0x00007f6379078428,0x00007f63790784a8] = 128
>  dependencies   [0x00007f63790784a8,0x00007f63790784b0] = 8
> Loaded disassembler from /java/jdk/jdk1.8.0_101/jre/lib/amd64/hsdis-amd64.so
> Decoding compiled method 0x00007f6379078250:
> Code:
> [Disassembling for mach='i386:x86-64']
> [Entry Point]
> [Verified Entry Point]
> [Constants]
>   # {method} {0x00007f6380ae0488} 'doMath' '(I)I' in
> 'com/loop/LoopInvariantCodeMotion'
>   # parm0:    rsi       = int
>   #           [sp+0x20]  (sp of caller)
>   0x00007f6379078380: sub    rsp,0x18
>   0x00007f6379078387: mov    QWORD PTR [rsp+0x10],rbp  ;*synchronization
> entry
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at -1 (line 14)
>
>   0x00007f637907838c: xor    r11d,r11d
>   0x00007f637907838f: test   esi,esi
>   0x00007f6379078391: jle    0x00007f63790783bd  ;*if_icmpge
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
>   0x00007f6379078393: mov    r10d,esi
>   0x00007f6379078396: shl    r10d,1
>   0x00007f6379078399: xor    eax,eax
>   0x00007f637907839b: nop    DWORD PTR [rax+rax*1+0x0]
>                                                 ;*iload_0
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at 9 (line 16)
>
>   0x00007f63790783a0: mov    r8d,r10d
>   0x00007f63790783a3: add    r8d,r11d
>   0x00007f63790783a6: add    eax,r8d            ;*iadd
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at 17 (line 17)
>
>   0x00007f63790783a9: inc    r11d               ;*iinc
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at 19 (line 15)
>
>   0x00007f63790783ac: cmp    r11d,esi
>   0x00007f63790783af: jl     0x00007f63790783a0  ;*if_icmpge
>                                                 ; -
> com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
>   0x00007f63790783b1: add    rsp,0x10
>   0x00007f63790783b5: pop    rbp
>   0x00007f63790783b6: test   DWORD PTR [rip+0xc359c44],eax        #
> 0x00007f63853d2000
>                                                 ;   {poll_return}
>   0x00007f63790783bc: ret
>   0x00007f63790783bd: xor    eax,eax
>   0x00007f63790783bf: jmp    0x00007f63790783b1
>   0x00007f63790783c1: hlt
>   0x00007f63790783c2: hlt
>   0x00007f63790783c3: hlt
>   0x00007f63790783c4: hlt
>   0x00007f63790783c5: hlt
>   0x00007f63790783c6: hlt
>   0x00007f63790783c7: hlt
>   0x00007f63790783c8: hlt
>   0x00007f63790783c9: hlt
>   0x00007f63790783ca: hlt
>   0x00007f63790783cb: hlt
>   0x00007f63790783cc: hlt
>   0x00007f63790783cd: hlt
>   0x00007f63790783ce: hlt
>   0x00007f63790783cf: hlt
>   0x00007f63790783d0: hlt
>   0x00007f63790783d1: hlt
>   0x00007f63790783d2: hlt
>   0x00007f63790783d3: hlt
>   0x00007f63790783d4: hlt
>   0x00007f63790783d5: hlt
>   0x00007f63790783d6: hlt
>   0x00007f63790783d7: hlt
>   0x00007f63790783d8: hlt
>   0x00007f63790783d9: hlt
>   0x00007f63790783da: hlt
>   0x00007f63790783db: hlt
>   0x00007f63790783dc: hlt
>   0x00007f63790783dd: hlt
>   0x00007f63790783de: hlt
>   0x00007f63790783df: hlt
> [Exception Handler]
> [Stub Code]
>   0x00007f63790783e0: jmp    0x00007f6379075520  ;   {no_reloc}
> [Deopt Handler Code]
>   0x00007f63790783e5: call   0x00007f63790783ea
>   0x00007f63790783ea: sub    QWORD PTR [rsp],0x5
>   0x00007f63790783ef: jmp    0x00007f6379046fc0  ;   {runtime_call}
>   0x00007f63790783f4: hlt
>   0x00007f63790783f5: hlt
>   0x00007f63790783f6: hlt
>   0x00007f63790783f7: hlt
> OopMapSet contains 0 OopMaps


More information about the hotspot-compiler-dev mailing list