loop invariant code motion and the jit

Vitaly Davidovich vitalyd at gmail.com
Wed Jan 18 11:21:34 UTC 2017


I wonder if Graal handles this properly (I'd check myself but don't have
access to it at the moment).

Roland, would it be possible to have IdealLoopTree reassociate the
operations and then mark the AddNode to tell it to not perform
canonicalization again?

It's not a huge deal in isolation, but isn't missing loop invariant
hoisting a cardinal sin of sorts? :)
On Wed, Jan 18, 2017 at 4:38 AM Roland Westrelin <rwestrel at redhat.com>
wrote:

>
>
> Hi Peter,
>
>
>
> > I'm studying some JIT compiler optimizations to get a better
> understanding
>
> > of what happens on the lowest level.
>
> >
>
> > I'm running into a problem with loop invariant code motion (or loop
>
> > hoisting) and my impression is that the JIT is not always doing hoisting.
>
> > For example:
>
> >
>
> > E.g.
>
> >
>
> > public class LoopInvariantCodeMotion {
>
> >
>
> >     public static void main(String[] args) {
>
> >         long l =0;
>
> >         for (int k = 0; k < 100000; k++) {
>
> >             l+=doMath(k);
>
> >         }
>
> >         System.out.println(l);
>
> >     }
>
> >
>
> >     public static int doMath(int a) {
>
> >         int result = 0;
>
> >         for (int k = 0; k < a; k++) {
>
> >             int x = a +2;
>
> >             result += x + k;
>
> >         }
>
> >         return result;
>
> >     }
>
> > }
>
> >
>
> > The 'int x = a+2' can  be pulled out of the loop because it is an
> invariant
>
> > in the loop.  However in the generated Assembly it is still in the loop.
>
> > However if I would change the code to 'int x = a * 2'' it is pulled out
> of
>
> > the loop (even though the *2 is implemented as a simple bitshift).
>
> >
>
> > So can someone explain why the loop hoisting is so unpredictable?
>
>
>
> C2 transforms (a + 2) + k to (a + k) + 2. It needs the result of (a+k)
>
> which is loop variant to then compute (a+k)+2.
>
>
>
> If you look at the end of AddNode::Ideal():
>
>
>
>   // Convert "x+(y+1)" into "(x+y)+1".  Push constants down the expression
> tree.
>
>
>
> If you comment the block below out, you'll see the add of 2 out of the
>
> loop. I suppose C2 pushes constants down the tree to find opportunities
>
> for constant folding but in that case it hurts code quality. This said,
>
> it looks like a minor issue and AFAICT it's not straightforward to fix
>
> because in AddNode::Ideal() C2 has no knowledge of loops. C2 is designed
>
> so it performs simple local transformation such as this one very often
>
> and has a few passes of global loop optimizations. In the next loop
>
> optimization pass, we could transform that expression tree again to take
>
> loop invariants into account (in IdealLoopTree::reassociate_invariants()
>
> I suppose) but the next round of local transformations would apply
>
> AddNode::Ideal() again and undo it.
>
>
>
> Roland.
>
>
>
> >
>
> > These are my settings:
>
> >
>
> > -XX:+UnlockDiagnosticVMOptions
>
> > -XX:PrintAssemblyOptions=intel
>
> > -XX:-BackgroundCompilation
>
> > -XX:-TieredCompilation
>
> > -XX:-Inline
>
> > -XX:CompileCommand=print,*LoopInvariantCodeMotion.doMath
>
> > -XX:LoopUnrollLimit=0
>
> >
>
> >
>
> > java version "1.8.0_101"
>
> > Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
>
> > Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
>
> >
>
> >
>
> > Linux silversurver 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC
>
> > 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> >
>
> >
>
> > -------------------------------------------------------------
>
> > Assembly with the +2
>
> > -------------------------------------------------------------
>
> >
>
> > CompilerOracle: print *LoopInvariantCodeMotion.doMath
>
> > Compiled method (c2)     107    4
>
> > com.loop.LoopInvariantCodeMotion::doMath (27 bytes)
>
> >  total in heap  [0x00007f202906d550,0x00007f202906d7b0] = 608
>
> >  relocation     [0x00007f202906d678,0x00007f202906d680] = 8
>
> >  main code      [0x00007f202906d680,0x00007f202906d6e0] = 96
>
> >  stub code      [0x00007f202906d6e0,0x00007f202906d6f8] = 24
>
> >  oops           [0x00007f202906d6f8,0x00007f202906d700] = 8
>
> >  metadata       [0x00007f202906d700,0x00007f202906d708] = 8
>
> >  scopes data    [0x00007f202906d708,0x00007f202906d728] = 32
>
> >  scopes pcs     [0x00007f202906d728,0x00007f202906d7a8] = 128
>
> >  dependencies   [0x00007f202906d7a8,0x00007f202906d7b0] = 8
>
> > Loaded disassembler from
> /java/jdk/jdk1.8.0_101/jre/lib/amd64/hsdis-amd64.so
>
> > Decoding compiled method 0x00007f202906d550:
>
> > Code:
>
> > [Disassembling for mach='i386:x86-64']
>
> > [Entry Point]
>
> > [Verified Entry Point]
>
> > [Constants]
>
> >   # {method} {0x00007f2031188488} 'doMath' '(I)I' in
>
> > 'com/loop/LoopInvariantCodeMotion'
>
> >   # parm0:    rsi       = int
>
> >   #           [sp+0x20]  (sp of caller)
>
> >   0x00007f202906d680: sub    rsp,0x18
>
> >   0x00007f202906d687: mov    QWORD PTR [rsp+0x10],rbp  ;*synchronization
>
> > entry
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at -1 (line 14)
>
> >
>
> >   0x00007f202906d68c: xor    r11d,r11d
>
> >   0x00007f202906d68f: test   esi,esi
>
> >   0x00007f202906d691: jle    0x00007f202906d6c0  ;*if_icmpge
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> >
>
> >   0x00007f202906d693: xor    eax,eax
>
> >   0x00007f202906d695: nop    WORD PTR [rax+rax*1+0x0]  ;*iload_0
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 9 (line 16)
>
> >
>
> >   0x00007f202906d6a0: mov    r10d,r11d
>
> >   0x00007f202906d6a3: add    r10d,esi
>
> >   0x00007f202906d6a6: add    eax,r10d
>
> >   0x00007f202906d6a9: add    eax,0x2            ;*iadd
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 17 (line 17)
>
> >
>
> >   0x00007f202906d6ac: inc    r11d               ;*iinc
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 19 (line 15)
>
> >
>
> >   0x00007f202906d6af: cmp    r11d,esi
>
> >   0x00007f202906d6b2: jl     0x00007f202906d6a0  ;*if_icmpge
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> >
>
> >   0x00007f202906d6b4: add    rsp,0x10
>
> >   0x00007f202906d6b8: pop    rbp
>
> >   0x00007f202906d6b9: test   DWORD PTR [rip+0xca0c941],eax        #
>
> > 0x00007f2035a7a000
>
> >                                                 ;   {poll_return}
>
> >   0x00007f202906d6bf: ret
>
> >   0x00007f202906d6c0: xor    eax,eax
>
> >   0x00007f202906d6c2: jmp    0x00007f202906d6b4
>
> >   0x00007f202906d6c4: hlt
>
> >   0x00007f202906d6c5: hlt
>
> >   0x00007f202906d6c6: hlt
>
> >   0x00007f202906d6c7: hlt
>
> >   0x00007f202906d6c8: hlt
>
> >   0x00007f202906d6c9: hlt
>
> >   0x00007f202906d6ca: hlt
>
> >   0x00007f202906d6cb: hlt
>
> >   0x00007f202906d6cc: hlt
>
> >   0x00007f202906d6cd: hlt
>
> >   0x00007f202906d6ce: hlt
>
> >   0x00007f202906d6cf: hlt
>
> >   0x00007f202906d6d0: hlt
>
> >   0x00007f202906d6d1: hlt
>
> >   0x00007f202906d6d2: hlt
>
> >   0x00007f202906d6d3: hlt
>
> >   0x00007f202906d6d4: hlt
>
> >   0x00007f202906d6d5: hlt
>
> >   0x00007f202906d6d6: hlt
>
> >   0x00007f202906d6d7: hlt
>
> >   0x00007f202906d6d8: hlt
>
> >   0x00007f202906d6d9: hlt
>
> >   0x00007f202906d6da: hlt
>
> >   0x00007f202906d6db: hlt
>
> >   0x00007f202906d6dc: hlt
>
> >   0x00007f202906d6dd: hlt
>
> >   0x00007f202906d6de: hlt
>
> >   0x00007f202906d6df: hlt
>
> > [Exception Handler]
>
> > [Stub Code]
>
> >   0x00007f202906d6e0: jmp    0x00007f202906c420  ;   {no_reloc}
>
> > [Deopt Handler Code]
>
> >   0x00007f202906d6e5: call   0x00007f202906d6ea
>
> >   0x00007f202906d6ea: sub    QWORD PTR [rsp],0x5
>
> >   0x00007f202906d6ef: jmp    0x00007f2029046fc0  ;   {runtime_call}
>
> >   0x00007f202906d6f4: hlt
>
> >   0x00007f202906d6f5: hlt
>
> >   0x00007f202906d6f6: hlt
>
> >   0x00007f202906d6f7: hlt
>
> > OopMapSet contains 0 OopMaps
>
> >
>
> > -------------------------------------------------------------
>
> > Assembly with the * 2
>
> > -------------------------------------------------------------
>
> >
>
> > CompilerOracle: print *LoopInvariantCodeMotion.doMath
>
> > Compiled method (c2)     108    4
>
> > com.loop.LoopInvariantCodeMotion::doMath (27 bytes)
>
> >  total in heap  [0x00007f6379078250,0x00007f63790784b0] = 608
>
> >  relocation     [0x00007f6379078378,0x00007f6379078380] = 8
>
> >  main code      [0x00007f6379078380,0x00007f63790783e0] = 96
>
> >  stub code      [0x00007f63790783e0,0x00007f63790783f8] = 24
>
> >  oops           [0x00007f63790783f8,0x00007f6379078400] = 8
>
> >  metadata       [0x00007f6379078400,0x00007f6379078408] = 8
>
> >  scopes data    [0x00007f6379078408,0x00007f6379078428] = 32
>
> >  scopes pcs     [0x00007f6379078428,0x00007f63790784a8] = 128
>
> >  dependencies   [0x00007f63790784a8,0x00007f63790784b0] = 8
>
> > Loaded disassembler from
> /java/jdk/jdk1.8.0_101/jre/lib/amd64/hsdis-amd64.so
>
> > Decoding compiled method 0x00007f6379078250:
>
> > Code:
>
> > [Disassembling for mach='i386:x86-64']
>
> > [Entry Point]
>
> > [Verified Entry Point]
>
> > [Constants]
>
> >   # {method} {0x00007f6380ae0488} 'doMath' '(I)I' in
>
> > 'com/loop/LoopInvariantCodeMotion'
>
> >   # parm0:    rsi       = int
>
> >   #           [sp+0x20]  (sp of caller)
>
> >   0x00007f6379078380: sub    rsp,0x18
>
> >   0x00007f6379078387: mov    QWORD PTR [rsp+0x10],rbp  ;*synchronization
>
> > entry
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at -1 (line 14)
>
> >
>
> >   0x00007f637907838c: xor    r11d,r11d
>
> >   0x00007f637907838f: test   esi,esi
>
> >   0x00007f6379078391: jle    0x00007f63790783bd  ;*if_icmpge
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> >
>
> >   0x00007f6379078393: mov    r10d,esi
>
> >   0x00007f6379078396: shl    r10d,1
>
> >   0x00007f6379078399: xor    eax,eax
>
> >   0x00007f637907839b: nop    DWORD PTR [rax+rax*1+0x0]
>
> >                                                 ;*iload_0
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 9 (line 16)
>
> >
>
> >   0x00007f63790783a0: mov    r8d,r10d
>
> >   0x00007f63790783a3: add    r8d,r11d
>
> >   0x00007f63790783a6: add    eax,r8d            ;*iadd
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 17 (line 17)
>
> >
>
> >   0x00007f63790783a9: inc    r11d               ;*iinc
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 19 (line 15)
>
> >
>
> >   0x00007f63790783ac: cmp    r11d,esi
>
> >   0x00007f63790783af: jl     0x00007f63790783a0  ;*if_icmpge
>
> >                                                 ; -
>
> > com.loop.LoopInvariantCodeMotion::doMath at 6 (line 15)
>
> >
>
> >   0x00007f63790783b1: add    rsp,0x10
>
> >   0x00007f63790783b5: pop    rbp
>
> >   0x00007f63790783b6: test   DWORD PTR [rip+0xc359c44],eax        #
>
> > 0x00007f63853d2000
>
> >                                                 ;   {poll_return}
>
> >   0x00007f63790783bc: ret
>
> >   0x00007f63790783bd: xor    eax,eax
>
> >   0x00007f63790783bf: jmp    0x00007f63790783b1
>
> >   0x00007f63790783c1: hlt
>
> >   0x00007f63790783c2: hlt
>
> >   0x00007f63790783c3: hlt
>
> >   0x00007f63790783c4: hlt
>
> >   0x00007f63790783c5: hlt
>
> >   0x00007f63790783c6: hlt
>
> >   0x00007f63790783c7: hlt
>
> >   0x00007f63790783c8: hlt
>
> >   0x00007f63790783c9: hlt
>
> >   0x00007f63790783ca: hlt
>
> >   0x00007f63790783cb: hlt
>
> >   0x00007f63790783cc: hlt
>
> >   0x00007f63790783cd: hlt
>
> >   0x00007f63790783ce: hlt
>
> >   0x00007f63790783cf: hlt
>
> >   0x00007f63790783d0: hlt
>
> >   0x00007f63790783d1: hlt
>
> >   0x00007f63790783d2: hlt
>
> >   0x00007f63790783d3: hlt
>
> >   0x00007f63790783d4: hlt
>
> >   0x00007f63790783d5: hlt
>
> >   0x00007f63790783d6: hlt
>
> >   0x00007f63790783d7: hlt
>
> >   0x00007f63790783d8: hlt
>
> >   0x00007f63790783d9: hlt
>
> >   0x00007f63790783da: hlt
>
> >   0x00007f63790783db: hlt
>
> >   0x00007f63790783dc: hlt
>
> >   0x00007f63790783dd: hlt
>
> >   0x00007f63790783de: hlt
>
> >   0x00007f63790783df: hlt
>
> > [Exception Handler]
>
> > [Stub Code]
>
> >   0x00007f63790783e0: jmp    0x00007f6379075520  ;   {no_reloc}
>
> > [Deopt Handler Code]
>
> >   0x00007f63790783e5: call   0x00007f63790783ea
>
> >   0x00007f63790783ea: sub    QWORD PTR [rsp],0x5
>
> >   0x00007f63790783ef: jmp    0x00007f6379046fc0  ;   {runtime_call}
>
> >   0x00007f63790783f4: hlt
>
> >   0x00007f63790783f5: hlt
>
> >   0x00007f63790783f6: hlt
>
> >   0x00007f63790783f7: hlt
>
> > OopMapSet contains 0 OopMaps
>
> --
Sent from my phone
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20170118/59906d6a/attachment-0001.html>


More information about the hotspot-compiler-dev mailing list