Assembly output from JRuby 'fib'

Christian Thalinger christian.thalinger at oracle.com
Thu Apr 28 03:16:19 PDT 2011


On Apr 27, 2011, at 5:54 AM, Charles Oliver Nutter wrote:
> I prepared this for someone else, but I thought folks here might be
> interested in it too.
> 
> This gist contains hotspot x86 (32-bit) assembly output for JRuby's
> dynopt mode and invokedynamic (on a couple-week-old OS X OpenJDK
> build). I haven't spent a lot of time investigating.

I took a look at it.  I used 64-bit x86 since the code is a bit smaller than with 32-bit.

The code is almost identical, but three things caught my eye (the output is from PrintOptoAssembly):

1. The obvious one:  the method handle call site guard:

1a4   B32: #    B160 B33 <- B31 B149 B123  Freq: 0.499969
1a4     movq    R10, byte[int:>=0]<ciObject ident=770 PERM address=0xe99088> *  # ptr
1ae     movq    R10, [R10 + #1576 (32-bit)]     # ptr
1b5     movq    R11, [R10 + #32 (8-bit)]        # ptr
1b9     movq    R8, java/lang/invoke/AdapterMethodHandle:exact *        # ptr
1c3     cmpq    R11, R8 # ptr
1c6     jne,u  B160  P=0.000000 C=-1.000000
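
For readers less familiar with this output, the guard above can be read, roughly, as "is the call site still bound to the method handle that was inlined?". The following is a minimal Java sketch of that idea, not what HotSpot actually emits. The class and method names (CallSiteGuardSketch, addOne, addTwo, callThroughGuard, demo) are hypothetical; only the java.lang.invoke calls are real API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class CallSiteGuardSketch {
    static int addOne(int n) { return n + 1; }
    static int addTwo(int n) { return n + 2; }

    // Look up one of the static methods above as an (int)int method handle.
    static MethodHandle handle(String name) throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(
                CallSiteGuardSketch.class, name,
                MethodType.methodType(int.class, int.class));
    }

    // Assumption: stands in for compiled code that inlined addOne.
    // The identity comparison plays the role of the cmpq/jne pair.
    static int callThroughGuard(MutableCallSite site, MethodHandle inlinedTarget, int arg)
            throws Throwable {
        if (site.getTarget() == inlinedTarget) {
            return addOne(arg);                          // fast path: inlined body
        }
        return (int) site.getTarget().invokeExact(arg);  // uncommon path: real call
    }

    // Runs the guard once, optionally after rebinding the call site.
    static int demo(boolean retarget) {
        try {
            MutableCallSite site = new MutableCallSite(handle("addOne"));
            MethodHandle inlined = site.getTarget();
            if (retarget) {
                site.setTarget(handle("addTwo"));
            }
            return callThroughGuard(site, inlined, 1);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(false));
        System.out.println(demo(true));
    }
}
```

In the sketch the guard fails only after the site is rebound, which matches the P=0.000000 branch probability on the jne above.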


2. The dynopt version only has one class check while the indy version has two (before and after the recursive call site).  This could be because of basic block layout but I'm curious why it's laid out differently:

dynopt:
-------

<recursive call site>

209   B37: #    B142 B38 <- B102 B36  Freq: 0.499974
209     movq    R10, [rsp + #16]        # spill
20e     movq    R11, precise klass org/jruby/RubyObject: 0x0000000000f4cf88:Constant:exact *    # ptr
218     cmpq    R10, R11        # ptr
21b     jne,u  B142  P=0.000000 C=-1.000000
21b
221   B38: #    B161 B39 <- B37  Freq: 0.499974
221     movq    R10, [rsp + #64]        # spill
226     # checkcastPP of R10
226     movq    R10, [R10 + #24 (8-bit)]        # ptr ! Field org/jruby/RubyBasicObject.metaClass
22a     movl    R10, [R10 + #44 (8-bit)]        # int ! Field org/jruby/RubyModule.generation
22e     NullCheck R10
22e
22e   B39: #    B107 B40 <- B38  Freq: 0.499974
22e     cmpl    R10, #632
235     jne     B107  P=0.000000 C=563147.000000
235
23b   B40: #    B162 B41 <- B39  Freq: 0.499973
23b     movq    R9, [rsp + #0]  # spill
23f     movq    R10, [R9 + #16 (8-bit)] # ptr ! Field org/jruby/ast/executable/AbstractScript.runtimeCache
243     movq    RBP, [R10 + #24 (8-bit)]        # ptr ! Field org/jruby/ast/executable/RuntimeCache.callSites
247     NullCheck R10


indy:
-----

1cc   B33: #    B174 B34 <- B32  Freq: 0.499969
1cc     movq    R10, [rsp + #80]        # spill
1d1     movq    R10, [R10 + #8 (8-bit)] # class
1d5     NullCheck R10
1d5
1d5   B34: #    B114 B35 <- B33  Freq: 0.499969
1d5     movq    R10, [R10 + #64 (8-bit)]        # class
1d9     movq    R11, precise klass org/jruby/RubyBasicObject: 0x00000000011f5478:Constant:exact *       # ptr
1e3     cmpq    R10, R11        # ptr
1e6     jne,u  B114  P=0.000001 C=-1.000000
1e6
1ec   B35: #    B175 B36 <- B34  Freq: 0.499968
1ec     movq    R10, [rsp + #80]        # spill
1f1     # checkcastPP of R10
1f1     movq    R10, [R10 + #24 (8-bit)]        # ptr ! Field org/jruby/RubyBasicObject.metaClass
1f5     movl    R11, [R10 + #44 (8-bit)]        # int ! Field org/jruby/RubyModule.generation
1f9     NullCheck R10
1f9
1f9   B36: #    B124 B37 <- B35  Freq: 0.499968
1f9     cmpl    R11, #632
200     jne     B124  P=0.000000 C=209925.000000
200

<recursive call site>

237   B40: #    B86 B41 <- B39  Freq: 0.499957
237     movq    R10, [RSI + #40 (8-bit)]        # class
23b     movq    R11, precise klass org/jruby/runtime/builtin/IRubyObject: 0x00000000011ce468:Constant:exact *   # ptr
245     cmpq    R10, R11        # ptr
248     jne,u  B86  P=0.170000 C=-1.000000
248
24e   B41: #    B42 <- B40 B86  Freq: 0.499957
24e     # checkcastPP of RBP
24e     movq    [rsp + #96], RBP        # spill
24e
253   B42: #    B177 B43 <- B41 B161  Freq: 0.499957
253     movq    R10, [rsp + #24]        # spill
258     movq    R10, [R10 + #16 (8-bit)]        # ptr ! Field org/jruby/ast/executable/AbstractScript.runtimeCache
25c     movq    RBP, [R10 + #24 (8-bit)]        # ptr ! Field org/jruby/ast/executable/RuntimeCache.callSites
260     NullCheck R10


3. The dynopt version has two occurrences of the following block in the hot code path, while the indy version has three of them (this could also be because of code layout, as in 2.):

1ec   B35: #    B175 B36 <- B34  Freq: 0.499968
1ec     movq    R10, [rsp + #80]        # spill
1f1     # checkcastPP of R10
1f1     movq    R10, [R10 + #24 (8-bit)]        # ptr ! Field org/jruby/RubyBasicObject.metaClass
1f5     movl    R11, [R10 + #44 (8-bit)]        # int ! Field org/jruby/RubyModule.generation
1f9     NullCheck R10
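
Read together with the cmpq and cmpl instructions shown in 2., this block looks like a call-site cache check: an exact class test, then a load of metaClass.generation compared against a constant (632 in these dumps). Below is a rough Java sketch of that reading. The types Module, BasicObject and PlainObject are hypothetical stand-ins, not the real org.jruby classes; only the field names and the constant come from the output above.

```java
class GenerationGuardSketch {
    // Hypothetical stand-in for org.jruby.RubyModule.
    static class Module {
        int generation;
        Module(int generation) { this.generation = generation; }
    }

    // Hypothetical stand-in for org.jruby.RubyBasicObject.
    static class BasicObject {
        final Module metaClass;
        BasicObject(Module metaClass) { this.metaClass = metaClass; }
    }

    // Hypothetical stand-in for the exact class the compiled code expects.
    static final class PlainObject extends BasicObject {
        PlainObject(Module metaClass) { super(metaClass); }
    }

    // Mirrors the two checks: exact class (cmpq/jne,u), then generation (cmpl/jne).
    static String dispatch(BasicObject self, int cachedGeneration) {
        if (self.getClass() != PlainObject.class) {
            return "slow: unexpected class";
        }
        if (self.metaClass.generation != cachedGeneration) {
            return "slow: stale cache";
        }
        return "fast";
    }

    public static void main(String[] args) {
        Module meta = new Module(632);
        System.out.println(dispatch(new PlainObject(meta), 632));
        meta.generation++;  // assumption: modifying the class bumps its generation
        System.out.println(dispatch(new PlainObject(meta), 632));
    }
}
```

Each extra copy of the block in the hot path repeats the metaClass and generation loads, which is why the count (two for dynopt, three for indy) is worth noting.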

-- Christian

> 
> https://gist.github.com/943357
> 
> One thing I did notice is that MaxRecursiveInlineLevel appears to be 1
> by default normally. I played with bumping it up but performance
> degraded no matter what combination of flags I used.
> 
> A related question: what would it take to get the hsdis plugin
> included with openjdk proper all the time? It would be nice if
> PrintAssembly worked out of the box on all Java 7 builds.
> 
> - Charlie
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev



