loop invariant code motion and the jit
Roland Westrelin
rwestrel at redhat.com
Wed Jan 18 09:38:10 UTC 2017
Hi Peter,
> I'm studying some JIT compiler optimizations to get a better understanding
> of what happens on the lowest level.
>
> I'm running into a problem with loop invariant code motion (or loop
> hoisting) and my impression is that the JIT is not always doing hoisting.
> For example:
>
> E.g.
>
> public class LoopInvariantCodeMotion {
>
> public static void main(String[] args) {
> long l =0;
> for (int k = 0; k < 100000; k++) {
> l+=doMath(k);
> }
> System.out.println(l);
> }
>
> public static int doMath(int a) {
> int result = 0;
> for (int k = 0; k < a; k++) {
> int x = a +2;
> result += x + k;
> }
> return result;
> }
> }
>
> The 'int x = a+2' can be pulled out of the loop because it is an invariant
> in the loop. However in the generated Assembly it is still in the loop.
> However if I would change the code to 'int x = a * 2'' it is pulled out of
> the loop (even though the *2 is implemented as a simple bitshift).
>
> So can someone explain why the loop hoisting is so unpredictable?
C2 transforms (a + 2) + k to (a + k) + 2. It needs the result of (a+k)
which is loop variant to then compute (a+k)+2.
If you look at the end of AddNode::Ideal():
// Convert "x+(y+1)" into "(x+y)+1". Push constants down the expression tree.
If you comment the block below out, you'll see the add of 2 out of the
loop. I suppose C2 pushes constants down the tree to find opportunities
for constant folding but in that case it hurts code quality. This said,
it looks like a minor issue and AFAICT it's not straightforward to fix
because in AddNode::Ideal() C2 has no knowledge of loops. C2 is designed
so it performs simple local transformation such as this one very often
and has a few passes of global loop optimizations. In the next loop
optimization pass, we could transform that expression tree again to take
loop invariants into account (in IdealLoopTree::reassociate_invariants()
I suppose) but the next round of local transformations would apply
AddNode::Ideal() again and undo it.
Roland.
>
> These are my settings:
>
> -XX:+UnlockDiagnosticVMOptions
> -XX:PrintAssemblyOptions=intel
> -XX:-BackgroundCompilation
> -XX:-TieredCompilation
> -XX:-Inline
> -XX:CompileCommand=print,*LoopInvariantCodeMotion.doMath
> -XX:LoopUnrollLimit=0
>
>
> java version "1.8.0_101"
> Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
>
>
> Linux silversurver 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
>
>
> -------------------------------------------------------------
> Assembly with the +2
> -------------------------------------------------------------
>
> CompilerOracle: print *LoopInvariantCodeMotion.doMath
> Compiled method (c2) 107 4
> com.loop.LoopInvariantCodeMotion::doMath (27 bytes)
> total in heap [0x00007f202906d550,0x00007f202906d7b0] = 608
> relocation [0x00007f202906d678,0x00007f202906d680] = 8
> main code [0x00007f202906d680,0x00007f202906d6e0] = 96
> stub code [0x00007f202906d6e0,0x00007f202906d6f8] = 24
> oops [0x00007f202906d6f8,0x00007f202906d700] = 8
> metadata [0x00007f202906d700,0x00007f202906d708] = 8
> scopes data [0x00007f202906d708,0x00007f202906d728] = 32
> scopes pcs [0x00007f202906d728,0x00007f202906d7a8] = 128
> dependencies [0x00007f202906d7a8,0x00007f202906d7b0] = 8
> Loaded disassembler from /java/jdk/jdk1.8.0_101/jre/lib/amd64/hsdis-amd64.so
> Decoding compiled method 0x00007f202906d550:
> Code:
> [Disassembling for mach='i386:x86-64']
> [Entry Point]
> [Verified Entry Point]
> [Constants]
> # {method} {0x00007f2031188488} 'doMath' '(I)I' in
> 'com/loop/LoopInvariantCodeMotion'
> # parm0: rsi = int
> # [sp+0x20] (sp of caller)
> 0x00007f202906d680: sub rsp,0x18
> 0x00007f202906d687: mov QWORD PTR [rsp+0x10],rbp ;*synchronization
> entry
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at -1 (line 14)
>
> 0x00007f202906d68c: xor r11d,r11d
> 0x00007f202906d68f: test esi,esi
> 0x00007f202906d691: jle 0x00007f202906d6c0 ;*if_icmpge
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> 0x00007f202906d693: xor eax,eax
> 0x00007f202906d695: nop WORD PTR [rax+rax*1+0x0] ;*iload_0
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at 9 (line 16)
>
> 0x00007f202906d6a0: mov r10d,r11d
> 0x00007f202906d6a3: add r10d,esi
> 0x00007f202906d6a6: add eax,r10d
> 0x00007f202906d6a9: add eax,0x2 ;*iadd
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at 17 (line 17)
>
> 0x00007f202906d6ac: inc r11d ;*iinc
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at 19 (line 15)
>
> 0x00007f202906d6af: cmp r11d,esi
> 0x00007f202906d6b2: jl 0x00007f202906d6a0 ;*if_icmpge
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> 0x00007f202906d6b4: add rsp,0x10
> 0x00007f202906d6b8: pop rbp
> 0x00007f202906d6b9: test DWORD PTR [rip+0xca0c941],eax #
> 0x00007f2035a7a000
> ; {poll_return}
> 0x00007f202906d6bf: ret
> 0x00007f202906d6c0: xor eax,eax
> 0x00007f202906d6c2: jmp 0x00007f202906d6b4
> 0x00007f202906d6c4: hlt
> 0x00007f202906d6c5: hlt
> 0x00007f202906d6c6: hlt
> 0x00007f202906d6c7: hlt
> 0x00007f202906d6c8: hlt
> 0x00007f202906d6c9: hlt
> 0x00007f202906d6ca: hlt
> 0x00007f202906d6cb: hlt
> 0x00007f202906d6cc: hlt
> 0x00007f202906d6cd: hlt
> 0x00007f202906d6ce: hlt
> 0x00007f202906d6cf: hlt
> 0x00007f202906d6d0: hlt
> 0x00007f202906d6d1: hlt
> 0x00007f202906d6d2: hlt
> 0x00007f202906d6d3: hlt
> 0x00007f202906d6d4: hlt
> 0x00007f202906d6d5: hlt
> 0x00007f202906d6d6: hlt
> 0x00007f202906d6d7: hlt
> 0x00007f202906d6d8: hlt
> 0x00007f202906d6d9: hlt
> 0x00007f202906d6da: hlt
> 0x00007f202906d6db: hlt
> 0x00007f202906d6dc: hlt
> 0x00007f202906d6dd: hlt
> 0x00007f202906d6de: hlt
> 0x00007f202906d6df: hlt
> [Exception Handler]
> [Stub Code]
> 0x00007f202906d6e0: jmp 0x00007f202906c420 ; {no_reloc}
> [Deopt Handler Code]
> 0x00007f202906d6e5: call 0x00007f202906d6ea
> 0x00007f202906d6ea: sub QWORD PTR [rsp],0x5
> 0x00007f202906d6ef: jmp 0x00007f2029046fc0 ; {runtime_call}
> 0x00007f202906d6f4: hlt
> 0x00007f202906d6f5: hlt
> 0x00007f202906d6f6: hlt
> 0x00007f202906d6f7: hlt
> OopMapSet contains 0 OopMaps
>
> -------------------------------------------------------------
> Assembly with the * 2
> -------------------------------------------------------------
>
> CompilerOracle: print *LoopInvariantCodeMotion.doMath
> Compiled method (c2) 108 4
> com.loop.LoopInvariantCodeMotion::doMath (27 bytes)
> total in heap [0x00007f6379078250,0x00007f63790784b0] = 608
> relocation [0x00007f6379078378,0x00007f6379078380] = 8
> main code [0x00007f6379078380,0x00007f63790783e0] = 96
> stub code [0x00007f63790783e0,0x00007f63790783f8] = 24
> oops [0x00007f63790783f8,0x00007f6379078400] = 8
> metadata [0x00007f6379078400,0x00007f6379078408] = 8
> scopes data [0x00007f6379078408,0x00007f6379078428] = 32
> scopes pcs [0x00007f6379078428,0x00007f63790784a8] = 128
> dependencies [0x00007f63790784a8,0x00007f63790784b0] = 8
> Loaded disassembler from /java/jdk/jdk1.8.0_101/jre/lib/amd64/hsdis-amd64.so
> Decoding compiled method 0x00007f6379078250:
> Code:
> [Disassembling for mach='i386:x86-64']
> [Entry Point]
> [Verified Entry Point]
> [Constants]
> # {method} {0x00007f6380ae0488} 'doMath' '(I)I' in
> 'com/loop/LoopInvariantCodeMotion'
> # parm0: rsi = int
> # [sp+0x20] (sp of caller)
> 0x00007f6379078380: sub rsp,0x18
> 0x00007f6379078387: mov QWORD PTR [rsp+0x10],rbp ;*synchronization
> entry
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at -1 (line 14)
>
> 0x00007f637907838c: xor r11d,r11d
> 0x00007f637907838f: test esi,esi
> 0x00007f6379078391: jle 0x00007f63790783bd ;*if_icmpge
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> 0x00007f6379078393: mov r10d,esi
> 0x00007f6379078396: shl r10d,1
> 0x00007f6379078399: xor eax,eax
> 0x00007f637907839b: nop DWORD PTR [rax+rax*1+0x0]
> ;*iload_0
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at 9 (line 16)
>
> 0x00007f63790783a0: mov r8d,r10d
> 0x00007f63790783a3: add r8d,r11d
> 0x00007f63790783a6: add eax,r8d ;*iadd
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at 17 (line 17)
>
> 0x00007f63790783a9: inc r11d ;*iinc
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at 19 (line 15)
>
> 0x00007f63790783ac: cmp r11d,esi
> 0x00007f63790783af: jl 0x00007f63790783a0 ;*if_icmpge
> ; -
> com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> 0x00007f63790783b1: add rsp,0x10
> 0x00007f63790783b5: pop rbp
> 0x00007f63790783b6: test DWORD PTR [rip+0xc359c44],eax #
> 0x00007f63853d2000
> ; {poll_return}
> 0x00007f63790783bc: ret
> 0x00007f63790783bd: xor eax,eax
> 0x00007f63790783bf: jmp 0x00007f63790783b1
> 0x00007f63790783c1: hlt
> 0x00007f63790783c2: hlt
> 0x00007f63790783c3: hlt
> 0x00007f63790783c4: hlt
> 0x00007f63790783c5: hlt
> 0x00007f63790783c6: hlt
> 0x00007f63790783c7: hlt
> 0x00007f63790783c8: hlt
> 0x00007f63790783c9: hlt
> 0x00007f63790783ca: hlt
> 0x00007f63790783cb: hlt
> 0x00007f63790783cc: hlt
> 0x00007f63790783cd: hlt
> 0x00007f63790783ce: hlt
> 0x00007f63790783cf: hlt
> 0x00007f63790783d0: hlt
> 0x00007f63790783d1: hlt
> 0x00007f63790783d2: hlt
> 0x00007f63790783d3: hlt
> 0x00007f63790783d4: hlt
> 0x00007f63790783d5: hlt
> 0x00007f63790783d6: hlt
> 0x00007f63790783d7: hlt
> 0x00007f63790783d8: hlt
> 0x00007f63790783d9: hlt
> 0x00007f63790783da: hlt
> 0x00007f63790783db: hlt
> 0x00007f63790783dc: hlt
> 0x00007f63790783dd: hlt
> 0x00007f63790783de: hlt
> 0x00007f63790783df: hlt
> [Exception Handler]
> [Stub Code]
> 0x00007f63790783e0: jmp 0x00007f6379075520 ; {no_reloc}
> [Deopt Handler Code]
> 0x00007f63790783e5: call 0x00007f63790783ea
> 0x00007f63790783ea: sub QWORD PTR [rsp],0x5
> 0x00007f63790783ef: jmp 0x00007f6379046fc0 ; {runtime_call}
> 0x00007f63790783f4: hlt
> 0x00007f63790783f5: hlt
> 0x00007f63790783f6: hlt
> 0x00007f63790783f7: hlt
> OopMapSet contains 0 OopMaps
More information about the hotspot-compiler-dev
mailing list