Assembly output from JRuby 'fib'
Charles Oliver Nutter
headius at headius.com
Thu Apr 28 06:19:33 PDT 2011
On Thu, Apr 28, 2011 at 5:16 AM, Christian Thalinger
<christian.thalinger at oracle.com> wrote:
> I took a look at it. I used 64-bit x86 since the code is a bit smaller than with 32-bit.
>
> The code is almost identical but three things popped into my eye (the output is from PrintOptoAssembly):
>
> 1. The obvious one: the method handle call site guard:
>
> 1a4 B32: # B160 B33 <- B31 B149 B123 Freq: 0.499969
> 1a4 movq R10, byte[int:>=0]<ciObject ident=770 PERM address=0xe99088> * # ptr
> 1ae movq R10, [R10 + #1576 (32-bit)] # ptr
> 1b5 movq R11, [R10 + #32 (8-bit)] # ptr
> 1b9 movq R8, java/lang/invoke/AdapterMethodHandle:exact * # ptr
> 1c3 cmpq R11, R8 # ptr
> 1c6 jne,u B160 P=0.000000 C=-1.000000
I saw in your other email that eliminating this puts indy on par with
dynopt, which is spectacular news. Can you elaborate on how that would
be possible to do "correctly" (as in not via a hack)? Would it be a
lighter-weight check and deopt of some kind (in Hotspot), or is it
something I'd need to rig up in my code?
> 2. The dynopt version only has one class check while the indy version has two (before and after the recursive call site). This could be because of basic block layout but I'm curious why it's laid out differently:
...
> indy:
> -----
>
> 1cc B33: # B174 B34 <- B32 Freq: 0.499969
> 1cc movq R10, [rsp + #80] # spill
> 1d1 movq R10, [R10 + #8 (8-bit)] # class
> 1d5 NullCheck R10
> 1d5
> 1d5 B34: # B114 B35 <- B33 Freq: 0.499969
> 1d5 movq R10, [R10 + #64 (8-bit)] # class
> 1d9 movq R11, precise klass org/jruby/RubyBasicObject: 0x00000000011f5478:Constant:exact * # ptr
> 1e3 cmpq R10, R11 # ptr
> 1e6 jne,u B114 P=0.000001 C=-1.000000
> 1e6
> 1ec B35: # B175 B36 <- B34 Freq: 0.499968
> 1ec movq R10, [rsp + #80] # spill
> 1f1 # checkcastPP of R10
> 1f1 movq R10, [R10 + #24 (8-bit)] # ptr ! Field org/jruby/RubyBasicObject.metaClass
> 1f5 movl R11, [R10 + #44 (8-bit)] # int ! Field org/jruby/RubyModule.generation
> 1f9 NullCheck R10
> 1f9
> 1f9 B36: # B124 B37 <- B35 Freq: 0.499968
> 1f9 cmpl R11, #632
> 200 jne B124 P=0.000000 C=209925.000000
> 200
I'll have to read through the PrintAssembly output to see if both
guards are being traversed on the fast path. Hopefully they're not...I
assume we'd see more degradation in the indy case if that were
happening, though.
I've been trying to think of ways to reduce the guard cost, since the
perf without the JRuby guard is a fair bit better (0.63s without it
versus 0.79s with it, for fib(35)). The performance without guards is
actually faster than any other Ruby implementation I've yet run. One idea:
call site => SwitchPoint invalidated if Fixnum is reopened (rare) =>
GWT guarded on exact object type RubyFixnum => RubyFixnum method
This would avoid traversing the metaclass and generation fields and
doing the generation compare. This approach could also work for all
core JRuby classes. Basically, where subclasses of Array are currently
backed by the same RubyArray class, I would introduce a
RubyArraySubclass class for that purpose. That would guarantee that
only regular Array objects are RubyArray, allowing me to reduce any
invocations against Array to a switchpoint + type check.
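To make the chain above concrete, here is a minimal sketch of
"SwitchPoint => GWT on exact type => method" using java.lang.invoke.
FakeFixnum, FakeFixnumSubclass, fastPath and slowPath are hypothetical
stand-ins, not JRuby classes or methods, and the real binding logic
would look different. It only shows how the handles nest.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

public class GuardChainSketch {
    // Hypothetical stand-ins for RubyFixnum and a separate subclass-backing class.
    static class FakeFixnum {
        final long value;
        FakeFixnum(long value) { this.value = value; }
    }
    static final class FakeFixnumSubclass extends FakeFixnum {
        FakeFixnumSubclass(long value) { super(value); }
    }

    // Exact type test: true only for FakeFixnum itself, never for subclasses.
    static boolean isExactFixnum(Object self) {
        return self != null && self.getClass() == FakeFixnum.class;
    }

    // Hypothetical fast path: the directly bound "RubyFixnum method".
    static String fastPath(Object self) {
        return "fast:" + ((FakeFixnum) self).value;
    }

    // Hypothetical slow path: full dynamic lookup and rebinding would go here.
    static String slowPath(Object self) {
        return "slow";
    }

    // Builds: SwitchPoint guard -> guardWithTest(exact type) -> fast path.
    static MethodHandle build(SwitchPoint fixnumUnmodified) throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType objToString = MethodType.methodType(String.class, Object.class);
        MethodHandle test = lookup.findStatic(GuardChainSketch.class, "isExactFixnum",
                MethodType.methodType(boolean.class, Object.class));
        MethodHandle fast = lookup.findStatic(GuardChainSketch.class, "fastPath", objToString);
        MethodHandle slow = lookup.findStatic(GuardChainSketch.class, "slowPath", objToString);
        MethodHandle typeGuarded = MethodHandles.guardWithTest(test, fast, slow);
        // Once the SwitchPoint is invalidated, every call goes to the slow path.
        return fixnumUnmodified.guardWithTest(typeGuarded, slow);
    }

    public static String demo() {
        try {
            SwitchPoint sp = new SwitchPoint();
            MethodHandle site = build(sp);
            String exact = (String) site.invoke((Object) new FakeFixnum(7));
            String subclass = (String) site.invoke((Object) new FakeFixnumSubclass(7));
            // Simulates "Fixnum is reopened": invalidate the switch point.
            SwitchPoint.invalidateAll(new SwitchPoint[] { sp });
            String afterReopen = (String) site.invoke((Object) new FakeFixnum(7));
            return exact + " " + subclass + " " + afterReopen;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "fast:7 slow slow"
    }
}
```

The fast path here never touches a metaclass or generation field. The
only per-call work in the handle chain is the class comparison, with
the SwitchPoint covering the rare global invalidation. Whether Hotspot
compiles this down to a single klass compare is not something the
sketch demonstrates.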
A question: what would be the best way currently to emit the cheapest
possible type guard? There's currently no "instanceof" adapter that
can do that type check for me, so I'd be reduced to something like a
Class equality check. Basically I'm looking for the right way to emit
an exact type check that will optimize to the equivalent check Hotspot
does for virtual method invocations. Help?
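For reference, the Class equality fallback I have in mind looks
roughly like the sketch below: a static helper taking the expected
class as its leading argument, bound per call site with
MethodHandles.insertArguments. The names ExactTypeTestSketch,
isExactly and exactTypeTest are made up for illustration. I don't know
that this folds to the same check Hotspot emits for virtual calls,
which is exactly the open question.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ExactTypeTestSketch {
    // Class-equality test. The expected class comes first so it can be
    // bound as a constant when the guard is created.
    static boolean isExactly(Class<?> expected, Object obj) {
        return obj != null && obj.getClass() == expected;
    }

    // Returns an (Object)boolean handle usable as the test of guardWithTest.
    public static MethodHandle exactTypeTest(Class<?> expected) {
        try {
            MethodHandle generic = MethodHandles.lookup().findStatic(
                    ExactTypeTestSketch.class, "isExactly",
                    MethodType.methodType(boolean.class, Class.class, Object.class));
            return MethodHandles.insertArguments(generic, 0, expected);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience wrapper so the test can be exercised without handle plumbing.
    public static boolean check(Class<?> expected, Object obj) {
        try {
            return (boolean) exactTypeTest(expected).invoke(obj);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Unlike an instanceof-style check, this rejects subclasses, which is
the property the RubyArray versus RubyArraySubclass split relies on.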
- Charlie