More performance explorations

Sun Jun 5 10:21:41 PDT 2011

Here's a little ray of sunshine to temper all my grousing about
performance: a benchmark where invokedynamic is comfortably faster than
non-indy, and which also shows the downside of dynopt (specifically,
that it slows down nontrivial benchmarks, presumably due to excessive
bytecode size).

bench_evanphx_goruco.rb is a benchmark created by Evan Phoenix of the
Rubinius project, another Ruby implementation, for which Evan has built
an optimizing mixed-mode JIT and a generational GC. Rubinius currently
does a better job of optimizing Ruby code than JRuby does, due largely
to its ability to inline Ruby code (which stock JRuby never does).
invokedynamic has helped, and will probably continue to help, JRuby
match or exceed Rubinius's raw Ruby execution performance.
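When nothing is inlined, every call goes through the call site's
dispatch logic; in non-indy mode JRuby uses CachingCallSite for this, a
per-call-site cache of the looked-up method. The Ruby sketch below
shows the general idea of such a monomorphic inline cache. The class
CachingCallSiteSketch and the Point workload are hypothetical, and
JRuby's real CachingCallSite is Java code that validates its cache with
its own mechanism rather than the plain class comparison used here.
Even on a cache hit, each call pays for a guard plus an indirect call
through a cached method object; invokedynamic instead expresses the
guard and target as method handles that HotSpot can, in principle,
inline through.

```ruby
# Hypothetical sketch of a monomorphic inline cache, in the spirit of a
# caching call site. Not JRuby's CachingCallSite (that is Java, and it
# checks cache validity differently); this version simply keys on the
# receiver's class.
class CachingCallSiteSketch
  attr_reader :misses

  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil   # receiver class the cache was filled for
    @cached_method = nil  # UnboundMethod looked up for that class
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)
      # Miss: do the full method lookup and remember it for next time.
      @misses += 1
      @cached_method = klass.instance_method(@method_name)
      @cached_class = klass
    end
    # Every call, hit or miss, pays for the class check above plus an
    # indirect call through the cached method object.
    @cached_method.bind(receiver).call(*args)
  end
end

# Hypothetical allocation-heavy workload: a new Point per call.
Point = Struct.new(:x, :y) do
  def dist2
    x * x + y * y
  end
end

site = CachingCallSiteSketch.new(:dist2)
total = 0
1_000.times { |i| total += site.call(Point.new(i, i)) }
puts total        # 665667000
puts site.misses  # 1 (only the first call does a lookup)
```

Because this sketch keys only on the receiver's class, redefining
Point#dist2 after the first call would leave it calling the stale
method; a real call-site cache needs some invalidation scheme for that
case.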

This benchmark is mostly an object-creation bench, meant to stress
allocation and GC. In JRuby, however, dispatch overhead shows up in
many places, and invokedynamic appears to help a good deal over stock
CachingCallSite dispatch (roughly 10% faster in the later iterations
below), and it's considerably better than dynopt (roughly 20% faster):

INDY:

~/projects/jruby ➔ jruby bench/bench_evanphx_goruco.rb
 11.300000   0.000000  11.300000 ( 11.249000)
 10.026000   0.000000  10.026000 ( 10.026000)
 10.184000   0.000000  10.184000 ( 10.184000)
 10.907000   0.000000  10.907000 ( 10.906000)
 10.379000   0.000000  10.379000 ( 10.378000)

NON-INDY:

~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false bench/bench_evanphx_goruco.rb
 12.500000   0.000000  12.500000 ( 12.448000)
 11.454000   0.000000  11.454000 ( 11.454000)
 11.910000   0.000000  11.910000 ( 11.909000)
 11.305000   0.000000  11.305000 ( 11.305000)
 11.331000   0.000000  11.331000 ( 11.331000)

DYNOPT:

~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false -Xcompile.dynopt=true bench/bench_evanphx_goruco.rb
 12.982000   0.000000  12.982000 ( 12.887000)
 12.363000   0.000000  12.363000 ( 12.363000)
 12.431000   0.000000  12.431000 ( 12.431000)
 12.490000   0.000000  12.490000 ( 12.490000)
 12.344000   0.000000  12.344000 ( 12.344000)
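Each row above is one timed pass in the layout of Ruby's standard
Benchmark output: user, system, and total CPU seconds, then wall-clock
seconds in parentheses. A small Ruby script can turn the wall-clock
column into a steady-state comparison. Treating the first pass of every
run as warmup and dropping it is a choice made for this sketch, not
something taken from the benchmark itself.

```ruby
# Wall-clock ("real") seconds per timed pass, copied from the runs above.
runs = {
  "indy"     => [11.249, 10.026, 10.184, 10.906, 10.378],
  "non-indy" => [12.448, 11.454, 11.909, 11.305, 11.331],
  "dynopt"   => [12.887, 12.363, 12.431, 12.490, 12.344],
}

# Drop the first pass of each run as warmup (a choice made for this
# sketch) and average the remaining four.
means = runs.transform_values do |times|
  steady = times.drop(1)
  steady.sum / steady.size
end

# Report each mean and how it compares to the indy run.
means.each do |name, mean|
  printf("%-8s %6.2f s  %.2fx indy time\n", name, mean, mean / means["indy"])
end
```

On these numbers it reports means of about 10.37 s (indy), 11.50 s
(non-indy), and 12.41 s (dynopt), so non-indy takes about 1.11x and
dynopt about 1.20x as long as indy.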

Nice results, and I know there are tons of possible improvements for
the hot paths in this benchmark (both in JRuby and in HotSpot).

Incidentally, my MLVM build is a good 25-30% faster than Java 6, even
without using invokedynamic. Kudos to the entire HotSpot team for
making every release almost inexplicably "just faster" than previous
versions.

- Charlie