loop invariant code motion and the jit
Vitaly Davidovich
vitalyd at gmail.com
Wed Jan 18 11:21:34 UTC 2017
I wonder if Graal handles this properly (I'd check myself but don't have
access to it at the moment).
Roland, would it be possible to have IdealLoopTree reassociate the
operations and then mark the AddNode to tell it to not perform
canonicalization again?
It's not a huge deal in isolation, but isn't missing loop invariant
hoisting a cardinal sin of sorts? :)
On Wed, Jan 18, 2017 at 4:38 AM Roland Westrelin <rwestrel at redhat.com>
wrote:
>
>
> Hi Peter,
>
>
>
> > I'm studying some JIT compiler optimizations to get a better
> understanding
>
> > of what happens on the lowest level.
>
> >
>
> > I'm running into a problem with loop invariant code motion (or loop
>
> > hoisting) and my impression is that the JIT is not always doing hoisting.
>
> > For example:
>
> >
>
> > E.g.
>
> >
>
> > public class LoopInvariantCodeMotion {
>
> >
>
> > public static void main(String[] args) {
>
> > long l =0;
>
> > for (int k = 0; k < 100000; k++) {
>
> > l+=doMath(k);
>
> > }
>
> > System.out.println(l);
>
> > }
>
> >
>
> > public static int doMath(int a) {
>
> > int result = 0;
>
> > for (int k = 0; k < a; k++) {
>
> > int x = a +2;
>
> > result += x + k;
>
> > }
>
> > return result;
>
> > }
>
> > }
>
> >
>
> > The 'int x = a+2' can be pulled out of the loop because it is an
> invariant
>
> > in the loop. However in the generated Assembly it is still in the loop.
>
> > However if I would change the code to 'int x = a * 2'' it is pulled out
> of
>
> > the loop (even though the *2 is implemented as a simple bitshift).
>
> >
>
> > So can someone explain why the loop hoisting is so unpredictable?
>
>
>
> C2 transforms (a + 2) + k to (a + k) + 2. It needs the result of (a+k)
>
> which is loop variant to then compute (a+k)+2.
>
>
>
> If you look at the end of AddNode::Ideal():
>
>
>
> // Convert "x+(y+1)" into "(x+y)+1". Push constants down the expression
> tree.
>
>
>
> If you comment the block below out, you'll see the add of 2 out of the
>
> loop. I suppose C2 pushes constants down the tree to find opportunities
>
> for constant folding but in that case it hurts code quality. This said,
>
> it looks like a minor issue and AFAICT it's not straightforward to fix
>
> because in AddNode::Ideal() C2 has no knowledge of loops. C2 is designed
>
> so it performs simple local transformation such as this one very often
>
> and has a few passes of global loop optimizations. In the next loop
>
> optimization pass, we could transform that expression tree again to take
>
> loop invariants into account (in IdealLoopTree::reassociate_invariants()
>
> I suppose) but the next round of local transformations would apply
>
> AddNode::Ideal() again and undo it.
>
>
>
> Roland.
>
>
>
> >
>
> > These are my settings:
>
> >
>
> > -XX:+UnlockDiagnosticVMOptions
>
> > -XX:PrintAssemblyOptions=intel
>
> > -XX:-BackgroundCompilation
>
> > -XX:-TieredCompilation
>
> > -XX:-Inline
>
> > -XX:CompileCommand=print,*LoopInvariantCodeMotion.doMath
>
> > -XX:LoopUnrollLimit=0
>
> >
>
> >
>
> > java version "1.8.0_101"
>
> > Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
>
> > Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
>
> >
>
> >
>
> > Linux silversurver 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC
>
> > 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> >
>
> >
>
> > -------------------------------------------------------------
>
> > Assembly with the +2
>
> > -------------------------------------------------------------
>
> >
>
> > CompilerOracle: print *LoopInvariantCodeMotion.doMath
>
> > Compiled method (c2) 107 4
>
> > com.loop.LoopInvariantCodeMotion::doMath (27 bytes)
>
> > total in heap [0x00007f202906d550,0x00007f202906d7b0] = 608
>
> > relocation [0x00007f202906d678,0x00007f202906d680] = 8
>
> > main code [0x00007f202906d680,0x00007f202906d6e0] = 96
>
> > stub code [0x00007f202906d6e0,0x00007f202906d6f8] = 24
>
> > oops [0x00007f202906d6f8,0x00007f202906d700] = 8
>
> > metadata [0x00007f202906d700,0x00007f202906d708] = 8
>
> > scopes data [0x00007f202906d708,0x00007f202906d728] = 32
>
> > scopes pcs [0x00007f202906d728,0x00007f202906d7a8] = 128
>
> > dependencies [0x00007f202906d7a8,0x00007f202906d7b0] = 8
>
> > Loaded disassembler from
> /java/jdk/jdk1.8.0_101/jre/lib/amd64/hsdis-amd64.so
>
> > Decoding compiled method 0x00007f202906d550:
>
> > Code:
>
> > [Disassembling for mach='i386:x86-64']
>
> > [Entry Point]
>
> > [Verified Entry Point]
>
> > [Constants]
>
> > # {method} {0x00007f2031188488} 'doMath' '(I)I' in
>
> > 'com/loop/LoopInvariantCodeMotion'
>
> > # parm0: rsi = int
>
> > # [sp+0x20] (sp of caller)
>
> > 0x00007f202906d680: sub rsp,0x18
>
> > 0x00007f202906d687: mov QWORD PTR [rsp+0x10],rbp ;*synchronization
>
> > entry
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at -1 (line 14)
>
> >
>
> > 0x00007f202906d68c: xor r11d,r11d
>
> > 0x00007f202906d68f: test esi,esi
>
> > 0x00007f202906d691: jle 0x00007f202906d6c0 ;*if_icmpge
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> >
>
> > 0x00007f202906d693: xor eax,eax
>
> > 0x00007f202906d695: nop WORD PTR [rax+rax*1+0x0] ;*iload_0
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 9 (line 16)
>
> >
>
> > 0x00007f202906d6a0: mov r10d,r11d
>
> > 0x00007f202906d6a3: add r10d,esi
>
> > 0x00007f202906d6a6: add eax,r10d
>
> > 0x00007f202906d6a9: add eax,0x2 ;*iadd
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 17 (line 17)
>
> >
>
> > 0x00007f202906d6ac: inc r11d ;*iinc
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 19 (line 15)
>
> >
>
> > 0x00007f202906d6af: cmp r11d,esi
>
> > 0x00007f202906d6b2: jl 0x00007f202906d6a0 ;*if_icmpge
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> >
>
> > 0x00007f202906d6b4: add rsp,0x10
>
> > 0x00007f202906d6b8: pop rbp
>
> > 0x00007f202906d6b9: test DWORD PTR [rip+0xca0c941],eax #
>
> > 0x00007f2035a7a000
>
> > ; {poll_return}
>
> > 0x00007f202906d6bf: ret
>
> > 0x00007f202906d6c0: xor eax,eax
>
> > 0x00007f202906d6c2: jmp 0x00007f202906d6b4
>
> > 0x00007f202906d6c4: hlt
>
> > 0x00007f202906d6c5: hlt
>
> > 0x00007f202906d6c6: hlt
>
> > 0x00007f202906d6c7: hlt
>
> > 0x00007f202906d6c8: hlt
>
> > 0x00007f202906d6c9: hlt
>
> > 0x00007f202906d6ca: hlt
>
> > 0x00007f202906d6cb: hlt
>
> > 0x00007f202906d6cc: hlt
>
> > 0x00007f202906d6cd: hlt
>
> > 0x00007f202906d6ce: hlt
>
> > 0x00007f202906d6cf: hlt
>
> > 0x00007f202906d6d0: hlt
>
> > 0x00007f202906d6d1: hlt
>
> > 0x00007f202906d6d2: hlt
>
> > 0x00007f202906d6d3: hlt
>
> > 0x00007f202906d6d4: hlt
>
> > 0x00007f202906d6d5: hlt
>
> > 0x00007f202906d6d6: hlt
>
> > 0x00007f202906d6d7: hlt
>
> > 0x00007f202906d6d8: hlt
>
> > 0x00007f202906d6d9: hlt
>
> > 0x00007f202906d6da: hlt
>
> > 0x00007f202906d6db: hlt
>
> > 0x00007f202906d6dc: hlt
>
> > 0x00007f202906d6dd: hlt
>
> > 0x00007f202906d6de: hlt
>
> > 0x00007f202906d6df: hlt
>
> > [Exception Handler]
>
> > [Stub Code]
>
> > 0x00007f202906d6e0: jmp 0x00007f202906c420 ; {no_reloc}
>
> > [Deopt Handler Code]
>
> > 0x00007f202906d6e5: call 0x00007f202906d6ea
>
> > 0x00007f202906d6ea: sub QWORD PTR [rsp],0x5
>
> > 0x00007f202906d6ef: jmp 0x00007f2029046fc0 ; {runtime_call}
>
> > 0x00007f202906d6f4: hlt
>
> > 0x00007f202906d6f5: hlt
>
> > 0x00007f202906d6f6: hlt
>
> > 0x00007f202906d6f7: hlt
>
> > OopMapSet contains 0 OopMaps
>
> >
>
> > -------------------------------------------------------------
>
> > Assembly with the * 2
>
> > -------------------------------------------------------------
>
> >
>
> > CompilerOracle: print *LoopInvariantCodeMotion.doMath
>
> > Compiled method (c2) 108 4
>
> > com.loop.LoopInvariantCodeMotion::doMath (27 bytes)
>
> > total in heap [0x00007f6379078250,0x00007f63790784b0] = 608
>
> > relocation [0x00007f6379078378,0x00007f6379078380] = 8
>
> > main code [0x00007f6379078380,0x00007f63790783e0] = 96
>
> > stub code [0x00007f63790783e0,0x00007f63790783f8] = 24
>
> > oops [0x00007f63790783f8,0x00007f6379078400] = 8
>
> > metadata [0x00007f6379078400,0x00007f6379078408] = 8
>
> > scopes data [0x00007f6379078408,0x00007f6379078428] = 32
>
> > scopes pcs [0x00007f6379078428,0x00007f63790784a8] = 128
>
> > dependencies [0x00007f63790784a8,0x00007f63790784b0] = 8
>
> > Loaded disassembler from
> /java/jdk/jdk1.8.0_101/jre/lib/amd64/hsdis-amd64.so
>
> > Decoding compiled method 0x00007f6379078250:
>
> > Code:
>
> > [Disassembling for mach='i386:x86-64']
>
> > [Entry Point]
>
> > [Verified Entry Point]
>
> > [Constants]
>
> > # {method} {0x00007f6380ae0488} 'doMath' '(I)I' in
>
> > 'com/loop/LoopInvariantCodeMotion'
>
> > # parm0: rsi = int
>
> > # [sp+0x20] (sp of caller)
>
> > 0x00007f6379078380: sub rsp,0x18
>
> > 0x00007f6379078387: mov QWORD PTR [rsp+0x10],rbp ;*synchronization
>
> > entry
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at -1 (line 14)
>
> >
>
> > 0x00007f637907838c: xor r11d,r11d
>
> > 0x00007f637907838f: test esi,esi
>
> > 0x00007f6379078391: jle 0x00007f63790783bd ;*if_icmpge
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> >
>
> > 0x00007f6379078393: mov r10d,esi
>
> > 0x00007f6379078396: shl r10d,1
>
> > 0x00007f6379078399: xor eax,eax
>
> > 0x00007f637907839b: nop DWORD PTR [rax+rax*1+0x0]
>
> > ;*iload_0
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 9 (line 16)
>
> >
>
> > 0x00007f63790783a0: mov r8d,r10d
>
> > 0x00007f63790783a3: add r8d,r11d
>
> > 0x00007f63790783a6: add eax,r8d ;*iadd
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 17 (line 17)
>
> >
>
> > 0x00007f63790783a9: inc r11d ;*iinc
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 19 (line 15)
>
> >
>
> > 0x00007f63790783ac: cmp r11d,esi
>
> > 0x00007f63790783af: jl 0x00007f63790783a0 ;*if_icmpge
>
> > ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> >
>
> > 0x00007f63790783b1: add rsp,0x10
>
> > 0x00007f63790783b5: pop rbp
>
> > 0x00007f63790783b6: test DWORD PTR [rip+0xc359c44],eax #
>
> > 0x00007f63853d2000
>
> > ; {poll_return}
>
> > 0x00007f63790783bc: ret
>
> > 0x00007f63790783bd: xor eax,eax
>
> > 0x00007f63790783bf: jmp 0x00007f63790783b1
>
> > 0x00007f63790783c1: hlt
>
> > 0x00007f63790783c2: hlt
>
> > 0x00007f63790783c3: hlt
>
> > 0x00007f63790783c4: hlt
>
> > 0x00007f63790783c5: hlt
>
> > 0x00007f63790783c6: hlt
>
> > 0x00007f63790783c7: hlt
>
> > 0x00007f63790783c8: hlt
>
> > 0x00007f63790783c9: hlt
>
> > 0x00007f63790783ca: hlt
>
> > 0x00007f63790783cb: hlt
>
> > 0x00007f63790783cc: hlt
>
> > 0x00007f63790783cd: hlt
>
> > 0x00007f63790783ce: hlt
>
> > 0x00007f63790783cf: hlt
>
> > 0x00007f63790783d0: hlt
>
> > 0x00007f63790783d1: hlt
>
> > 0x00007f63790783d2: hlt
>
> > 0x00007f63790783d3: hlt
>
> > 0x00007f63790783d4: hlt
>
> > 0x00007f63790783d5: hlt
>
> > 0x00007f63790783d6: hlt
>
> > 0x00007f63790783d7: hlt
>
> > 0x00007f63790783d8: hlt
>
> > 0x00007f63790783d9: hlt
>
> > 0x00007f63790783da: hlt
>
> > 0x00007f63790783db: hlt
>
> > 0x00007f63790783dc: hlt
>
> > 0x00007f63790783dd: hlt
>
> > 0x00007f63790783de: hlt
>
> > 0x00007f63790783df: hlt
>
> > [Exception Handler]
>
> > [Stub Code]
>
> > 0x00007f63790783e0: jmp 0x00007f6379075520 ; {no_reloc}
>
> > [Deopt Handler Code]
>
> > 0x00007f63790783e5: call 0x00007f63790783ea
>
> > 0x00007f63790783ea: sub QWORD PTR [rsp],0x5
>
> > 0x00007f63790783ef: jmp 0x00007f6379046fc0 ; {runtime_call}
>
> > 0x00007f63790783f4: hlt
>
> > 0x00007f63790783f5: hlt
>
> > 0x00007f63790783f6: hlt
>
> > 0x00007f63790783f7: hlt
>
> > OopMapSet contains 0 OopMaps
>
> --
Sent from my phone
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20170118/59906d6a/attachment-0001.html>
More information about the hotspot-compiler-dev
mailing list