Invokedynamic updates for JRuby

Charles Oliver Nutter headius at headius.com
Tue Jun 14 08:02:50 PDT 2011


I need to start doing a TOC for my emails.

** 1. JRuby invokedynamic updates
** 2. Latest performance observations
** 3. Performance ideas going forward

Ok. Here we go.

** 1. JRuby invokedynamic updates

I've been landing a ton of additional call paths and logic surrounding
JRuby's various dispatch paths. The following is a list of all paths
that will bind directly through with MHs right now:

Enabled:
* matching arity <= 3 calls from Ruby to core class (Java) methods
that don't require framing
* matching arity <= 3 calls from Ruby to precompiled (!!!) Ruby
methods that don't require framing
* zero-arity calls from Ruby to Java methods (via Java integration layer)
* Attribute reader/writer methods (property accessors, basically)
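
To make "bind directly through with MHs" concrete, here is a minimal sketch of a guarded, directly bound call site using only the standard java.lang.invoke API. It is not JRuby's actual binding code: the GuardedSite class, the describeString/describeOther targets and the class-identity guard are all made up for illustration, and it assumes non-null receivers.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical monomorphic call site: caches one target behind a class guard.
final class GuardedSite {
    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    private static final MethodType SITE_TYPE =
            MethodType.methodType(Object.class, Object.class);

    private final MutableCallSite site = new MutableCallSite(SITE_TYPE);
    private final MethodHandle invoker = site.dynamicInvoker();
    private final MethodHandle fallback;
    int slowPathHits = 0; // counts how often the site had to (re)bind

    GuardedSite() {
        try {
            fallback = LOOKUP.findVirtual(GuardedSite.class, "fallback", SITE_TYPE)
                    .bindTo(this);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        site.setTarget(fallback); // unlinked: first call goes to the slow path
    }

    // Illustrative stand-ins for "core class (Java) methods".
    static Object describeString(Object self) { return "String:" + self; }
    static Object describeOther(Object self) { return "Object:" + self; }

    // Guard: is the receiver still of the class we bound for?
    static boolean sameClass(Class<?> expected, Object self) {
        return self != null && self.getClass() == expected;
    }

    // Slow path: pick a target, install guard + target + fallback, then call.
    Object fallback(Object self) throws Throwable {
        slowPathHits++;
        String name = (self instanceof String) ? "describeString" : "describeOther";
        MethodHandle target = LOOKUP.findStatic(GuardedSite.class, name, SITE_TYPE);
        MethodHandle test = LOOKUP.findStatic(GuardedSite.class, "sameClass",
                MethodType.methodType(boolean.class, Class.class, Object.class))
                .bindTo(self.getClass());
        site.setTarget(MethodHandles.guardWithTest(test, target, fallback));
        return target.invoke(self);
    }

    Object call(Object self) {
        try {
            return invoker.invoke(self);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        GuardedSite s = new GuardedSite();
        System.out.println(s.call("a")); // slow path, binds for String
        System.out.println(s.call("b")); // guard passes, direct call
        System.out.println(s.call(42));  // guard fails, rebinds for Integer
        System.out.println("slow path hits: " + s.slowPathHits);
    }
}
```

Once the guard passes, the call goes straight from the site to the target handle with nothing in between, which is what gives the JIT a chance to inline through it.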

Disabled (for perf or incompleteness):
* Math operator invocations with literal fixnum RHS (incomplete: no guards)

Most other methods can only bind indirectly, through a DynamicMethod
shim. That indirect binding is still slower than inline caching, so it
is disabled for now. A few paths of note do not bind via indy at all:

* super invocations (Ruby super is by-name and sometimes carries frame
state with it)
* More complex operator assignments like foo[a] += b

Overall, things are shaping up very nicely. The enabled paths are all
faster than dispatching without indy. Where appropriate, they are near
"dynopt" speeds.

** 2. Latest performance observations

Last night I made the discovery that a volatile field on every JRuby
object was being initialized on object construction to an empty array,
to ease downstream logic. Removing that early initialization and
replacing it with null checks later seems to have drastically improved
the performance of several benchmarks, including small ones like fib.
Because of this, performance numbers I report in the future will be
skewed terribly from previous numbers. But that's a good thing :)
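
For anyone wondering what that change looks like, here is a rough sketch of the pattern, not the actual JRuby code. The class name, the "vars" field and the copy-on-write setter are assumptions for illustration. The point is only that the volatile field starts out null and readers pay a null check, instead of every constructor paying an allocation plus a volatile store.

```java
import java.util.Arrays;

// Hypothetical object with a lazily created, volatile variable table.
final class LazyVarsObject {
    // Left null at construction: no allocation and no volatile write up front.
    private volatile Object[] vars;

    Object getVar(int index) {
        Object[] current = vars; // single volatile read
        if (current == null || index >= current.length) {
            return null;         // null check replaces the old empty array
        }
        return current[index];
    }

    // Copy-on-write so readers never see a partially updated array.
    synchronized void setVar(int index, Object value) {
        Object[] current = vars;
        int needed = index + 1;
        Object[] next = (current == null)
                ? new Object[needed]
                : Arrays.copyOf(current, Math.max(current.length, needed));
        next[index] = value;
        vars = next;             // first write is what creates the storage
    }

    boolean hasStorage() {
        return vars != null;
    }

    public static void main(String[] args) {
        LazyVarsObject o = new LazyVarsObject();
        System.out.println(o.hasStorage()); // nothing allocated yet
        o.setVar(2, "x");
        System.out.println(o.getVar(2));
    }
}
```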

Overall, the relative performance of indy versus dynopt has not changed
a great deal. indy is still slightly behind dynopt but usually well
ahead of non-indy inline caching.

I have started to notice on some methods that I'm hitting the
NodeCountInliningCutoff more often/more quickly, and it seems to be
due to the extra inlining that MHs enable. The sooner I hit that
cutoff, the more performance tanks. I'm a little worried about finding
a balance between indy + inlining versus other techniques to reduce
bytecode size. Some of these Ruby methods are rather large, and now
they're essentially *all* invokedynamic from top to bottom. The
related logic isn't necessarily complex, but there's a lot of it.

Larger and larger benchmarks are starting to show gains from
invokedynamic. But once a method crosses the inlining cutoff,
performance goes into the toilet. Which brings me to the last section.

** 3. Performance ideas going forward

For my part, I'm probably going to be making some trade-offs in the future:

* Estimating code size and opting not to use invokedynamic if it seems
like it's going to push node counts too high for the method itself to
perform well (i.e. if it would cause too much to inline too early)
* Disabling the use of invokedynamic altogether on OpenJDK's "client"
compiler, since I'm hearing reports that it performs *terribly* there
(I don't have a non-64-bit build to try at the moment). Tiered may be
a band-aid?
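
Roughly, the two trade-offs above could combine into a compile-time policy like the sketch below. Everything here is hypothetical: the IndyPolicy class, the call-site count as a stand-in for "estimated code size", and the budget of 200 are invented for illustration and are not JRuby or HotSpot values. The only real-world assumption is that HotSpot's client VM reports a "java.vm.name" system property containing "Client VM".

```java
// Hypothetical policy: decide per method whether to emit invokedynamic.
final class IndyPolicy {
    // Made-up budget; a real value would have to come from measurement.
    static final int DEFAULT_CALL_SITE_BUDGET = 200;

    // Assumes the HotSpot naming convention, e.g. "OpenJDK Client VM".
    static boolean isClientCompiler(String vmName) {
        return vmName != null && vmName.contains("Client VM");
    }

    // estimatedCallSites is a crude proxy for how much inlining the
    // method would trigger if every call in it were an indy call site.
    static boolean shouldUseIndy(String vmName, int estimatedCallSites, int budget) {
        if (isClientCompiler(vmName)) {
            return false; // fall back to inline caching on the client compiler
        }
        return estimatedCallSites <= budget;
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        System.out.println(vmName + " small method: "
                + shouldUseIndy(vmName, 20, DEFAULT_CALL_SITE_BUDGET));
        System.out.println(vmName + " huge method: "
                + shouldUseIndy(vmName, 5000, DEFAULT_CALL_SITE_BUDGET));
    }
}
```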

I've been thinking about both of these. Ideally we could continue to
use invokedynamic everywhere, even when it pushes beyond inlining
limits, but it needs to degrade more gracefully. Today, if you hit one
of those limits, it seems like inlining basically just stops and
you've got chains of really, really slow MH logic that make reflection
look fast. This also applies to the client compiler: if it's not
going to inline, then an MH chain should at the very least generate a
unique per-site stub class so we're not juggling args and boxing and
casts heavily.

I know much of this work will happen after 1.7.0, and that's fine.
We've already made the decision to hold off on any JRuby 1.7 release
(the "invokedynamic" version) until one or two OpenJDK updates have
been released, so that we're better characterizing indy's true
potential. Perhaps by JavaOne we'll have more impressive results to
show?

- Charlie

