Significant indy slowdown on JRuby + redblack bench

Charles Oliver Nutter headius at headius.com
Thu Feb 24 09:49:47 PST 2011


On Thu, Feb 24, 2011 at 7:23 AM, Stephen Bannasch
<stephen.bannasch at deanbrook.org> wrote:
> Charlie, can you give me a bit of context.
>
>   " I'd like to have indy be competitive with my dynopt logic,
>     since they are structurally identical"
>
> Remind me what's the difference between indy and dynopt?
>
> Which one relates to what will be in java 1.7 when it is first released?
>
> I remember JRuby previously running faster using mlvm.

Perhaps this is useful for the whole list.

Over the years, we've gone through many dispatch mechanisms in JRuby.
I'll describe the ones currently available first.

1. Normally JRuby still dispatches all calls through
org.jruby.runtime.CallSite subclasses, usually some subclass of
CachingCallSite
(https://github.com/jruby/jruby/blob/master/src/org/jruby/runtime/callsite/CachingCallSite.java).
CachingCallSite is a monomorphic cache, holding a tuple of an integer
and a JRuby "DynamicMethod" reference. DynamicMethod is basically our
method handle. The integer is derived from a "serial number" of the
class we cached from at the time of caching. Guarding the site then is
a matter of comparing the incoming receiver's class's serial number
with the cached serial number. Invocations from Ruby code pass
*through* the call site via one of the CallSite.call methods, which
defeats inlining on current JVMs, so we're not achieving the best perf
possible.

2. A second mode uses largely the same mechanism, but instead of
calling *through* CallSite it pulls the DynamicMethod all the way back
to the Ruby call site in JVM bytecode and invokes it there. This
allows the target call to inline in many cases (since we generate a
unique DynamicMethod handle class per Ruby method), but because
DynamicMethod itself introduces a lot of extra logic and extra stack
frames, it doesn't inline as well as a direct call could.

3. The newer "dynopt" mechanism uses the same guard mechanism again,
but instead of dispatching through CallSite or DynamicMethod it often
dispatches *directly* to the target Java method. Essentially, if the
method cached by the interpreter has a JVM method somewhere, dynopt
emits the guard with the success path making a direct invokevirtual or
invokestatic to the target JVM method, and the fail path using
CallSite logic. This allows code to inline extremely well; "fib" and
"tak" performance nearly match Java performance (for a Java version
that also uses RubyFixnum objects) and recursive calls inline straight
through. But inlining other Ruby methods is trickier, since we usually
load them into their own classloaders (and it's not possible to emit
invokevirtual or invokestatic calls to methods loaded in sibling
classloaders).

4. The recent work on invokedynamic leverages the dynopt work. The
guard is again the same as in the above three cases; I have not yet
explored MutableCallSite.sync for actively invalidating (thereby
eliminating the guard). The success path from the guardWithTest (GWT)
handle is one of two pieces of logic: either a direct handle to the
target method, just like dynopt; or a handle to a DynamicMethod
object. The eventual goal is for dispatch to be all direct handles,
with any logic currently in DynamicMethod moved into additional
handles decorating the target.
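
The serial-number guard that all four mechanisms share can be
illustrated with a minimal Java sketch of a monomorphic caching call
site. This is not JRuby's actual code: Klass, Obj, Method, and
CachingSite are hypothetical stand-ins for RubyClass, IRubyObject,
DynamicMethod, and CachingCallSite, and the global serial counter is
an assumption made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class MonoCacheSketch {
    /** Hypothetical stand-in for JRuby's DynamicMethod. */
    interface Method { Object call(Obj self, Object arg); }

    /** Hypothetical stand-in for a Ruby class: method table plus serial number. */
    static final class Klass {
        static final AtomicInteger NEXT_SERIAL = new AtomicInteger();
        final Map<String, Method> methods = new HashMap<>();
        volatile int serial = NEXT_SERIAL.incrementAndGet();

        void define(String name, Method m) {
            methods.put(name, m);
            // A new serial makes every site cached against the old one miss.
            serial = NEXT_SERIAL.incrementAndGet();
        }
    }

    static final class Obj {
        final Klass klass;
        Obj(Klass klass) { this.klass = klass; }
    }

    /** Monomorphic cache: one (serial, method) tuple per call site. */
    static final class CachingSite {
        private static final class Entry {
            final int serial;
            final Method method;
            Entry(int serial, Method method) { this.serial = serial; this.method = method; }
        }

        final String name;
        private Entry cache;
        int misses; // counts slow-path lookups, for illustration only

        CachingSite(String name) { this.name = name; }

        Object call(Obj self, Object arg) {
            Entry e = cache;
            int serial = self.klass.serial;
            // Guard: compare the receiver class's serial with the cached one.
            if (e == null || e.serial != serial) {
                misses++;
                Method m = self.klass.methods.get(name);
                if (m == null) throw new IllegalStateException("undefined method: " + name);
                e = new Entry(serial, m);
                cache = e;
            }
            // The call passes *through* the site, as in mechanism 1.
            return e.method.call(self, arg);
        }
    }

    public static void main(String[] args) {
        Klass k = new Klass();
        k.define("succ", (self, arg) -> (Integer) arg + 1);
        Obj o = new Obj(k);
        CachingSite site = new CachingSite("succ");
        System.out.println(site.call(o, 1) + " misses=" + site.misses);
        System.out.println(site.call(o, 2) + " misses=" + site.misses);
    }
}
```

In this sketch the target is always invoked from inside
CachingSite.call, which is the extra hop that mechanisms 2 through 4
try to remove.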

So back to your original question...

I expect (hope?) that invokedynamic will eventually perform as well as
dynopt because:

* Both use the same guard logic
* Both have direct paths to target methods
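
As a rough illustration of those two points, here is a minimal sketch
of an indy-style site built with java.lang.invoke: a serial-number
test, a direct handle to the target as the success path, and a
fallback. The Klass and Obj types and the greet, slowPath, and
serialMatches methods are hypothetical; only the java.lang.invoke
calls are real API. A real fallback would look the method up again
and relink the site with MutableCallSite.setTarget; this one just
returns a marker string.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class IndyGuardSketch {
    /** Hypothetical stand-in for a Ruby class with a serial number. */
    static final class Klass {
        final int serial;
        Klass(int serial) { this.serial = serial; }
    }

    static final class Obj {
        final Klass klass;
        final String label;
        Obj(Klass klass, String label) { this.klass = klass; this.label = label; }
    }

    // Success path: the "direct" target method.
    static String greet(Obj self) { return "hello from " + self.label; }

    // Fail path: stands in for the CallSite-based slow path.
    static String slowPath(Obj self) { return "slow path for " + self.label; }

    // Guard: same serial-number comparison as the other mechanisms.
    static boolean serialMatches(int cachedSerial, Obj self) {
        return self.klass.serial == cachedSerial;
    }

    static MutableCallSite buildSite(int cachedSerial) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType siteType = MethodType.methodType(String.class, Obj.class);
            MethodHandle target = lookup.findStatic(IndyGuardSketch.class, "greet", siteType);
            MethodHandle fallback = lookup.findStatic(IndyGuardSketch.class, "slowPath", siteType);
            MethodHandle test = lookup.findStatic(IndyGuardSketch.class, "serialMatches",
                    MethodType.methodType(boolean.class, int.class, Obj.class));
            // Bind the cached serial so the test has type (Obj)boolean.
            test = MethodHandles.insertArguments(test, 0, cachedSerial);
            return new MutableCallSite(MethodHandles.guardWithTest(test, target, fallback));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Invokes the site the way an invokedynamic instruction would. */
    static String callSite(MutableCallSite site, Obj self) {
        try {
            return (String) site.dynamicInvoker().invoke(self);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        Klass a = new Klass(1);
        Klass b = new Klass(2);
        MutableCallSite site = buildSite(a.serial);
        System.out.println(callSite(site, new Obj(a, "x"))); // guard passes
        System.out.println(callSite(site, new Obj(b, "y"))); // guard fails
    }
}
```

The dynopt equivalent would emit the same comparison and the direct
call as plain bytecode at the call site, which is why the two are
described as structurally identical.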

Logically, both mechanisms are identical, with the only real
difference being that dynopt is essentially "pre-inlined"
invokedynamic logic. Currently, mechanism 4 (invokedynamic as it is
used in JRuby today) is sometimes faster than mechanism 1 (CallSite),
but comes nowhere near mechanism 3 (dynopt). I would provide numbers,
but I don't have Christian's fixes for recursive indy calls. And of
course, benchmarks like redblack still have perf issues with indy,
running about 2x slower than standard CallSite-based dispatch.

Excluding indy for now, here are numbers for mechanisms 1, 2, and 3
above, plus a version running with dynopt and no guards (in theory,
what I would expect from indy once MutableCallSite.sync is used to
eliminate the guards there). I use fib(35) here because it shows
clearly how much inlining helps performance.

https://gist.github.com/842528
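
For readers who want a baseline to compare against, the "Java version
that also uses RubyFixnum objects" mentioned under mechanism 3 might
look roughly like the sketch below. The Fixnum class here is a
hypothetical stand-in for RubyFixnum, not JRuby's real class, and the
timing loop is deliberately naive (no warmup control), so treat any
numbers it prints as indicative only.

```java
final class BoxedFibSketch {
    /** Hypothetical immutable boxed integer, standing in for RubyFixnum. */
    static final class Fixnum {
        final long value;
        Fixnum(long value) { this.value = value; }
        Fixnum plus(Fixnum other) { return new Fixnum(value + other.value); }
        Fixnum minus(long n) { return new Fixnum(value - n); }
        boolean lessThan(long n) { return value < n; }
    }

    // Recursive fib where every arithmetic step allocates a new box.
    static Fixnum fib(Fixnum n) {
        if (n.lessThan(2)) return n;
        return fib(n.minus(1)).plus(fib(n.minus(2)));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            Fixnum result = fib(new Fixnum(35));
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("fib(35) = " + result.value + " in " + millis + " ms");
        }
    }
}
```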

- Charlie


More information about the mlvm-dev mailing list