Assembly output from JRuby 'fib'
Christian Thalinger
christian.thalinger at oracle.com
Thu Apr 28 03:16:19 PDT 2011
On Apr 27, 2011, at 5:54 AM, Charles Oliver Nutter wrote:
> I prepared this for someone else, but I thought folks here might be
> interested in it too.
>
> This gist contains hotspot x86 (32-bit) assembly output for JRuby's
> dynopt mode and invokedynamic (on a couple-week-old OS X OpenJDK
> build). I haven't spent a lot of time investigating.
I took a look at it. I used 64-bit x86 since the code is a bit smaller than with 32-bit.
The code is almost identical, but three things caught my eye (the output is from PrintOptoAssembly):
1. The obvious one: the method handle call site guard:
1a4 B32: # B160 B33 <- B31 B149 B123 Freq: 0.499969
1a4 movq R10, byte[int:>=0]<ciObject ident=770 PERM address=0xe99088> * # ptr
1ae movq R10, [R10 + #1576 (32-bit)] # ptr
1b5 movq R11, [R10 + #32 (8-bit)] # ptr
1b9 movq R8, java/lang/invoke/AdapterMethodHandle:exact * # ptr
1c3 cmpq R11, R8 # ptr
1c6 jne,u B160 P=0.000000 C=-1.000000
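For readers less used to reading this output: the guard above loads a method handle from memory, compares it against the constant the code was compiled against, and branches to the uncommon block B160 on a mismatch. A rough Java-level sketch of that shape follows. The class name, the `run` helper and the use of a `MutableCallSite` are assumptions made for illustration only; this is not what JRuby or HotSpot actually emit.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical illustration of a "target is still the expected handle" guard.
public class CallSiteGuardSketch {
    static int fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    static int run(int n) {
        try {
            MethodHandle fibHandle = MethodHandles.lookup().findStatic(
                    CallSiteGuardSketch.class, "fib",
                    MethodType.methodType(int.class, int.class));
            MutableCallSite site = new MutableCallSite(fibHandle);

            // The handle the (imaginary) compiled code was specialized for.
            MethodHandle expected = site.getTarget();

            // Guard: one pointer comparison, like the cmpq/jne pair above.
            if (site.getTarget() == expected) {
                return fib(n);                         // fast path: inlined target
            }
            // Slow path: the target changed, call through the handle.
            return (int) site.getTarget().invokeExact(n);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10));
    }
}
```

The point of the sketch is only that the fast path costs a load and a compare before the inlined body runs.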
2. The dynopt version only has one class check while the indy version has two (before and after the recursive call site). This could be because of basic block layout but I'm curious why it's laid out differently:
dynopt:
-------
<recursive call site>
209 B37: # B142 B38 <- B102 B36 Freq: 0.499974
209 movq R10, [rsp + #16] # spill
20e movq R11, precise klass org/jruby/RubyObject: 0x0000000000f4cf88:Constant:exact * # ptr
218 cmpq R10, R11 # ptr
21b jne,u B142 P=0.000000 C=-1.000000
21b
221 B38: # B161 B39 <- B37 Freq: 0.499974
221 movq R10, [rsp + #64] # spill
226 # checkcastPP of R10
226 movq R10, [R10 + #24 (8-bit)] # ptr ! Field org/jruby/RubyBasicObject.metaClass
22a movl R10, [R10 + #44 (8-bit)] # int ! Field org/jruby/RubyModule.generation
22e NullCheck R10
22e
22e B39: # B107 B40 <- B38 Freq: 0.499974
22e cmpl R10, #632
235 jne B107 P=0.000000 C=563147.000000
235
23b B40: # B162 B41 <- B39 Freq: 0.499973
23b movq R9, [rsp + #0] # spill
23f movq R10, [R9 + #16 (8-bit)] # ptr ! Field org/jruby/ast/executable/AbstractScript.runtimeCache
243 movq RBP, [R10 + #24 (8-bit)] # ptr ! Field org/jruby/ast/executable/RuntimeCache.callSites
247 NullCheck R10
indy:
-----
1cc B33: # B174 B34 <- B32 Freq: 0.499969
1cc movq R10, [rsp + #80] # spill
1d1 movq R10, [R10 + #8 (8-bit)] # class
1d5 NullCheck R10
1d5
1d5 B34: # B114 B35 <- B33 Freq: 0.499969
1d5 movq R10, [R10 + #64 (8-bit)] # class
1d9 movq R11, precise klass org/jruby/RubyBasicObject: 0x00000000011f5478:Constant:exact * # ptr
1e3 cmpq R10, R11 # ptr
1e6 jne,u B114 P=0.000001 C=-1.000000
1e6
1ec B35: # B175 B36 <- B34 Freq: 0.499968
1ec movq R10, [rsp + #80] # spill
1f1 # checkcastPP of R10
1f1 movq R10, [R10 + #24 (8-bit)] # ptr ! Field org/jruby/RubyBasicObject.metaClass
1f5 movl R11, [R10 + #44 (8-bit)] # int ! Field org/jruby/RubyModule.generation
1f9 NullCheck R10
1f9
1f9 B36: # B124 B37 <- B35 Freq: 0.499968
1f9 cmpl R11, #632
200 jne B124 P=0.000000 C=209925.000000
200
<recursive call site>
237 B40: # B86 B41 <- B39 Freq: 0.499957
237 movq R10, [RSI + #40 (8-bit)] # class
23b movq R11, precise klass org/jruby/runtime/builtin/IRubyObject: 0x00000000011ce468:Constant:exact * # ptr
245 cmpq R10, R11 # ptr
248 jne,u B86 P=0.170000 C=-1.000000
248
24e B41: # B42 <- B40 B86 Freq: 0.499957
24e # checkcastPP of RBP
24e movq [rsp + #96], RBP # spill
24e
253 B42: # B177 B43 <- B41 B161 Freq: 0.499957
253 movq R10, [rsp + #24] # spill
258 movq R10, [R10 + #16 (8-bit)] # ptr ! Field org/jruby/ast/executable/AbstractScript.runtimeCache
25c movq RBP, [R10 + #24 (8-bit)] # ptr ! Field org/jruby/ast/executable/RuntimeCache.callSites
260 NullCheck R10
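Both listings follow the same pattern around the call: an exact class check on the receiver, then a load of metaClass and a comparison of its generation field against a constant (#632 here). A minimal Java sketch of that two-step check is below. The field names metaClass and generation come from the listing; the classes `Module`, `BasicObject`, `PlainObject` and the method `cacheHit` are hypothetical stand-ins, not JRuby's real types.

```java
// Hypothetical stand-ins for the class check plus generation check seen above.
public class GenerationGuardSketch {
    public static final class Module {
        public int generation;   // bumped whenever the module's methods change
        public Module(int generation) { this.generation = generation; }
    }

    public static class BasicObject {
        public final Module metaClass;
        public BasicObject(Module metaClass) { this.metaClass = metaClass; }
    }

    public static final class PlainObject extends BasicObject {
        public PlainObject(Module metaClass) { super(metaClass); }
    }

    // True when the cached call target may be reused for this receiver.
    public static boolean cacheHit(Object self, int cachedGeneration) {
        // Step 1: exact class check (the cmpq against a "precise klass").
        if (self == null || self.getClass() != PlainObject.class) {
            return false;
        }
        // Step 2: generation check (the cmpl against the cached constant).
        return ((PlainObject) self).metaClass.generation == cachedGeneration;
    }

    public static void main(String[] args) {
        Module m = new Module(632);
        PlainObject o = new PlainObject(m);
        System.out.println(cacheHit(o, 632));
        m.generation++;          // simulate a method being redefined
        System.out.println(cacheHit(o, 632));
    }
}
```

In these terms, the question in 2. is why the indy version ends up doing the first step twice around the recursive call while the dynopt version does it once.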
3. The dynopt version has two occurrences of the following block in the hot code path, while the indy version has three (this could also be because of code layout, as in 2.):
1ec B35: # B175 B36 <- B34 Freq: 0.499968
1ec movq R10, [rsp + #80] # spill
1f1 # checkcastPP of R10
1f1 movq R10, [R10 + #24 (8-bit)] # ptr ! Field org/jruby/RubyBasicObject.metaClass
1f5 movl R11, [R10 + #44 (8-bit)] # int ! Field org/jruby/RubyModule.generation
1f9 NullCheck R10
-- Christian
>
> https://gist.github.com/943357
>
> One thing I did notice is that MaxRecursiveInlineLevel appears to be 1
> by default normally. I played with bumping it up but performance
> degraded no matter what combination of flags I used.
>
> A related question: what would it take to get the hsdis plugin
> included with openjdk proper all the time? It would be nice if
> PrintAssembly worked out of the box on all Java 7 builds.
>
> - Charlie
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev