JRuby invokedynamic updates

Wed Aug 10 21:02:28 PDT 2011

Hello everyone!

I have added a few new items to JRuby relating to invokedynamic. Let's
dive in, shall we?

1. invokedynamic-based dispatch for literal binary operators with RHS
a literal fixnum or float

This is actually on by default because it didn't seem to hurt perf
(much?) on JDK7, and it should be faster with recent patches to
Hotspot. Before this change, JRuby was still using custom call sites
for math, boolean, and bitwise operators that had a fixnum or float as
the RHS argument. The new logic propagates the literal value and
operator name to the bootstrap (reducing bytecode size). It has
appropriate guards for when the LHS is not a fixnum or float or when
fixnum or float classes have been modified (this should be SwitchPoint
in the future), with fallback using a cached fixnum or float object to
do an inline-cached call.
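
To make the shape of that guard concrete, here is a minimal sketch in plain Java. It is not JRuby's actual bootstrap code: the class FastOpsSketch and its methods are hypothetical, java.lang.Long stands in for a Ruby fixnum, and the "has the fixnum class been modified" check and overflow handling are left out. It only shows the literal RHS being bound into the handles at link time, with a type guard on the LHS choosing between a fast path and a fallback.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch; none of these names come from JRuby.
final class FastOpsSketch {
    // Fast path: the guard has already proven the LHS is a Long.
    // Overflow is ignored here to keep the sketch short.
    static Object fixnumPlus(long literalRhs, Object lhs) {
        return (Long) lhs + literalRhs;
    }

    // Guard: is the LHS our stand-in for a fixnum?
    static boolean isFixnum(Object lhs) {
        return lhs instanceof Long;
    }

    // Fallback: stands in for a normal inline-cached dynamic call.
    static Object slowPlus(long literalRhs, Object lhs) {
        return "slow path: " + lhs + " + " + literalRhs;
    }

    // Roughly what a bootstrap would do once it knows the literal RHS.
    static MethodHandle link(long literalRhs) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType opType =
                MethodType.methodType(Object.class, long.class, Object.class);
            MethodHandle fast =
                lookup.findStatic(FastOpsSketch.class, "fixnumPlus", opType);
            MethodHandle slow =
                lookup.findStatic(FastOpsSketch.class, "slowPlus", opType);
            MethodHandle test = lookup.findStatic(FastOpsSketch.class, "isFixnum",
                MethodType.methodType(boolean.class, Object.class));
            // Bind the literal so the call site only has to pass the LHS.
            fast = MethodHandles.insertArguments(fast, 0, literalRhs);
            slow = MethodHandles.insertArguments(slow, 0, literalRhs);
            return MethodHandles.guardWithTest(test, fast, slow);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper that hides the checked Throwable from MethodHandle.invoke.
    static Object call(MethodHandle site, Object lhs) {
        try {
            return site.invoke(lhs);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle plusOne = link(1L);            // models "x + 1"
        System.out.println(call(plusOne, 41L));     // prints 42
        System.out.println(call(plusOne, "41"));    // prints slow path: 41 + 1
    }
}
```

Because the literal is bound at link time, the call site itself passes only the LHS, which is where the bytecode size reduction comes from.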

Initial numbers showed it to improve fib(35) from 1.25s to 1.15s on my
old linux machine (using hsx/hotspot-comp with recent patches
applied).

Invokedynamic-based "fast ops" are enabled by default. The property to
enable or disable them is jruby.invokedynamic.fastops.

2. SwitchPoint-based invalidation for class modification

After seeing how fast constants are with SwitchPoint fixed, I've gone
ahead and made a first pass at using SwitchPoint for invalidation due
to class modification.

Each class now holds an Invalidator instance. Without the new logic
enabled, the Invalidator just flips the generation int used for guards
in inline caches and invokedynamic calls, as before. With the logic
enabled, a SwitchPoint-based Invalidator is used instead. The guard in
a dispatch's GWT is then reduced to a type check (currently
cached_metaclass == object.getMetaclass), with the switch point
wrapping the GWT.
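
For anyone who has not played with SwitchPoint yet, the structure looks roughly like the sketch below. This is an illustration under assumptions, not the JRuby implementation: SwitchPointDispatchSketch and its methods are hypothetical, and java.lang.Class stands in for the Ruby metaclass. The inner guardWithTest does only the type check, and SwitchPoint.guardWithTest wraps it so that invalidating the switch point sends every call to the fallback.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Hypothetical sketch; none of these names come from JRuby.
final class SwitchPointDispatchSketch {
    // Stand-in for the cached method body.
    static String fastPath(Object self) { return "fast"; }

    // Stand-in for the relinking/slow dispatch path.
    static String slowPath(Object self) { return "slow"; }

    // Type check only; no generation counter is compared.
    static boolean sameClass(Class<?> cached, Object self) {
        return self != null && self.getClass() == cached;
    }

    static MethodHandle build(Class<?> cachedClass, SwitchPoint sp) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType callType =
                MethodType.methodType(String.class, Object.class);
            MethodHandle fast = lookup.findStatic(
                SwitchPointDispatchSketch.class, "fastPath", callType);
            MethodHandle slow = lookup.findStatic(
                SwitchPointDispatchSketch.class, "slowPath", callType);
            MethodHandle test = lookup.findStatic(
                SwitchPointDispatchSketch.class, "sameClass",
                MethodType.methodType(boolean.class, Class.class, Object.class));
            test = MethodHandles.insertArguments(test, 0, cachedClass);
            // Inner GWT: type check picks the cached target or the slow path.
            MethodHandle gwt = MethodHandles.guardWithTest(test, fast, slow);
            // Outer wrapper: once sp is invalidated, always use the slow path.
            return sp.guardWithTest(gwt, slow);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper that hides the checked Throwable from MethodHandle.invoke.
    static Object call(MethodHandle site, Object self) {
        try {
            return site.invoke(self);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        SwitchPoint sp = new SwitchPoint();
        MethodHandle site = build(String.class, sp);
        System.out.println(call(site, "x"));   // prints fast
        System.out.println(call(site, 42));    // prints slow (type check fails)
        // Models a class modification invalidating the call site.
        SwitchPoint.invalidateAll(new SwitchPoint[] { sp });
        System.out.println(call(site, "x"));   // prints slow
    }
}
```

The appeal is that the still-valid state needs no explicit check in the guard itself; only the type comparison remains on the fast path.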

It appears to work well...I'm surprised how quickly I got it wired up.

This is not enabled by default. It is definitely faster for small
benchmarks like fib, reducing the time to the 1.05s range on that same
machine. However, if the benchmark is just slightly larger, performance
*tanks*.

The property to enable SwitchPoint-based invalidation is
jruby.invokedynamic.invocation.switchpoint=true.

3. A more complex "fib" benchmark that stresses invokedynamic more

I've added bench/bench_fib_complex.rb. This runs the original fib
along with three variations:

* One that uses constants for the literals 1 and 2 in the code
* One that dispatches to other Ruby methods for the <, -, and + calls
* One that does both

Performance is perhaps most easily explained by showing the numbers:

*** no switchpoint use

normal fib
9227465
  1.177000   0.000000   1.177000 (  1.177000)
fib with constants
9227465
  3.750000   0.000000   3.750000 (  3.750000)
fib with additional calls
9227465
  1.664000   0.000000   1.664000 (  1.664000)
fib with constants and additional calls
9227465
  3.739000   0.000000   3.739000 (  3.740000)

Ok, so we have a baseline. A few notes:

* For whatever reason, constants have quite an impact on performance
here. That could be because it uses the boxed logic for all binops, or
it could simply be the overhead of the old constant cache logic.
* The variation that routes the four calls to <, -, and + through
additional Ruby methods is about 40% slower (1.18s vs 1.66s). That's
not great, but it's not especially bad either.

*** switchpoints for constant cache

normal fib
9227465
  1.170000   0.000000   1.170000 (  1.170000)
fib with constants
9227465
  2.790000   0.000000   2.790000 (  2.790000)
fib with additional calls
9227465
  1.658000   0.000000   1.658000 (  1.658000)
fib with constants and additional calls
9227465
  3.181000   0.000000   3.181000 (  3.180000)

Obvious improvement here from the SwitchPoint-based constants, but not
as much as I'd like to see. Christian: This would be a good benchmark
for you to use to test non-elidable constant access...it obviously
still degrades a lot, and it would be good to know if that's
invokedynamic stuff or just JRuby. I will say that the boxed math
operators do more logic to determine what type the argument is, and
because we're using constants instead of literals that's the logic we
use.

*** switchpoints for class-modification call site invalidation

normal fib
9227465
  1.128000   0.000000   1.128000 (  1.128000)
fib with constants
9227465
  3.436000   0.000000   3.436000 (  3.436000)
fib with additional calls
9227465
  5.724000   0.000000   5.724000 (  5.725000)
fib with constants and additional calls
9227465
 12.419000   0.000000  12.419000 ( 12.419000)

Woah nelly!

You can see there's a small improvement to the normal case, showing
that the switchpoint-based invalidation is at least working and not
hurting things there. Also the version that just adds constants for
literals is about the same as it was in the first run. However the
others are *terrible*. There's gotta be some failure to inline causing
these terrible numbers.

*** switchpoints for both constants and method invalidation

normal fib
9227465
  1.070000   0.000000   1.070000 (  1.071000)
fib with constants
9227465
  2.590000   0.000000   2.590000 (  2.590000)
fib with additional calls
9227465
  5.667000   0.000000   5.667000 (  5.667000)
fib with constants and additional calls
9227465
 11.863000   0.000000  11.863000 ( 11.863000)

Final numbers for the bad cases are about the same, and we have the
improvement for the first two cases.

So for the larger cases, it seems like things fall off very quickly. I
hope this is simply because the MH chains' bytecode is still counting
against jitting thresholds, because fib is *not* a very big method. I
also hope that we can do better for degraded cases.

- Charlie