Early JRuby indy results on recursive "fib"

Sun Aug 23 12:11:15 PDT 2009

Ok, so I've got Christian's inlining stuff enabled and working, and I
can report that things are definitely looking up.

I made one modification to JRuby: removing the logic to do an unboxed
'long' call, since it goes around normal dynamic dispatch and always
uses an inline caching mechanism. The appropriate change in the future
would be to have unboxed paths through indy as well. For now, I just
commented that logic out.

The bench/bench_fib_recursive.rb script in JRuby's repo is the slow
dual-recursing fib impl, running to fib(30).
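
For anyone without the repo handy, here is a minimal sketch of what a
benchmark of this shape looks like. It is not the script from the repo:
the names fib_ruby and run_fib_bench and the default of 5 iterations are
my own assumptions. It only uses Ruby's standard Benchmark module, whose
output has the same four-column layout as the timings below.

```ruby
require 'benchmark'

# Naive doubly-recursive Fibonacci: each call above the base case makes
# two further dynamic calls, which is what stresses method dispatch.
# (fib_ruby is a hypothetical name used for this sketch.)
def fib_ruby(n)
  if n < 2
    n
  else
    fib_ruby(n - 2) + fib_ruby(n - 1)
  end
end

# Time `iterations` runs of fib_ruby(n) and print one Benchmark line per run.
# (run_fib_bench is a hypothetical helper, not part of JRuby.)
def run_fib_bench(iterations, n = 30)
  iterations.times do
    puts Benchmark.measure { fib_ruby(n) }
  end
end

if __FILE__ == $PROGRAM_NAME
  # Iteration count comes from the first command-line argument,
  # defaulting to 5 (an assumed default).
  run_fib_bench((ARGV[0] || 5).to_i)
end
```

The iteration count matters more than fib(30) itself here: the first
line or two are warmup, and the later lines show where the JIT settles.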

Here's running bench_fib_recursive without indy:

~/projects/jruby ➔ jruby --server bench/bench_fib_recursive.rb 100
  0.432000   0.000000   0.432000 (  0.385000)
  0.234000   0.000000   0.234000 (  0.234000)
  0.228000   0.000000   0.228000 (  0.228000)
  0.227000   0.000000   0.227000 (  0.227000)
  0.229000   0.000000   0.229000 (  0.229000)
  0.230000   0.000000   0.230000 (  0.230000)
  0.231000   0.000000   0.231000 (  0.231000)
  0.231000   0.000000   0.231000 (  0.231000)
  0.226000   0.000000   0.226000 (  0.226000)
  0.226000   0.000000   0.226000 (  0.226000)
  0.226000   0.000000   0.226000 (  0.225000)
  0.226000   0.000000   0.226000 (  0.226000)

It also deoptimizes down to around 0.40 later when another piece of
code compiles and causes a number of call paths to go polymorphic.
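
To make "inline caching" and "going polymorphic" concrete, here is a
toy one-entry (monomorphic) inline cache in plain Ruby. This is only a
conceptual sketch under my own assumptions: CachingCallSite and its
hits/misses counters are hypothetical, it is not how JRuby's call sites
or HotSpot's profiling are actually implemented, and it ignores
singleton methods and cache invalidation.

```ruby
# Hypothetical illustration of a one-entry inline cache at a call site.
class CachingCallSite
  attr_reader :hits, :misses

  def initialize(method_name)
    @method_name = method_name
    @cached_class = nil   # receiver class seen on the last miss
    @cached_method = nil  # UnboundMethod looked up for that class
    @hits = 0
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    if klass.equal?(@cached_class)
      # Fast path: same receiver class as last time, reuse the lookup.
      @hits += 1
    else
      # Slow path: full method lookup, then overwrite the single entry.
      @misses += 1
      @cached_class = klass
      @cached_method = klass.instance_method(@method_name)
    end
    @cached_method.bind(receiver).call(*args)
  end
end

if __FILE__ == $PROGRAM_NAME
  site = CachingCallSite.new(:+)
  site.call(1, 2)      # first call: miss, caches the integer class
  site.call(3, 4)      # same receiver class: hit
  site.call(1.5, 2.0)  # Float receiver: miss, evicts the integer entry
  puts "hits=#{site.hits} misses=#{site.misses}"
end
```

As long as a site only ever sees one receiver class, everything after
the first call takes the fast path. Once a second class shows up at the
same site, the single entry gets overwritten and the site keeps falling
back to the slow path, which is roughly the kind of effect I mean when
call paths go polymorphic.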

Now here's the same thing with indy:

~/projects/jruby ➔ jruby --server -J-Djruby.compile.invokedynamic=true \
    -J-XX:InlineSmallCode=1500 -J-XX:MaxInlineSize=50 \
    -J-XX:+EnableInvokeDynamic bench/bench_fib_recursive.rb 100
  0.437000   0.000000   0.437000 (  0.402000)
  0.206000   0.000000   0.206000 (  0.206000)
  0.208000   0.000000   0.208000 (  0.208000)
  0.211000   0.000000   0.211000 (  0.211000)
  0.205000   0.000000   0.205000 (  0.205000)
  0.206000   0.000000   0.206000 (  0.206000)
  0.207000   0.000000   0.207000 (  0.207000)
  0.207000   0.000000   0.207000 (  0.207000)
  0.207000   0.000000   0.207000 (  0.207000)

The best time I saw was 0.202, and more importantly it never
deoptimized as additional Ruby code jitted!

I tested a few other benchmarks and did not see performance gains; in
most cases there was only moderate degradation. I'll need to look at
each in turn and see what might be causing the slowdown. The truth is
that fib is a much simpler microbench than most of the others, so it's
easier to ensure it's optimizing correctly.

But make no mistake...this is the first time we've had JRuby
performing better with indy than with our built-in logic. And even
more exciting: I don't think this is actually inlining the dynamic
calls yet; the chain eventually still does a slow virtual call to the
target body of code. My next phase of indy work will bind the last
handle in the chain to a DMH (direct method handle) that goes to the
actual code body, which should allow it to inline all the way through.

Excellent work, Christian! And thank you for the flag
suggestions...I'm very excited to be able to reproduce your JRuby fib
results locally!

FYI, here's the indy and non-indy bytecode I'm outputting for fib:
http://gist.github.com/173403

- Charlie