Assembly output from JRuby 'fib'

Thu Apr 28 07:50:22 PDT 2011

On Thu, Apr 28, 2011 at 9:17 AM, Christian Thalinger
<christian.thalinger at oracle.com> wrote:
> I have now a patch that makes the command line switch tweaking superfluous and the default performance looks pretty good (see below, 32-bit x86).

That's excellent! I can't wait to see that land.

A couple notes I haven't included in other emails:

* +PrintInlining doesn't seem to be working properly on any OpenJDK7
builds I run. I have to use LogCompilation.
* With +PrintInlining on, I see many lines like "discounting inlining
depth from  to". I'm guessing that's some preliminary
depth-discounting logic in invokedynamic?

I've been running with the macosx port builds lately, and they seem to
be fairly up-to-date and working very well for me. I'm not sure how
far behind mainstream or bsd-port OpenJDK they are, but hopefully
development on indy is starting to coalesce back toward a single
codebase. Perhaps we won't need Stephen's builds soon :)

> Charlie, what benchmark could I use for more real world application numbers?  bench_string_ops.rb sounds and looks promising.  With bench_string_ops.rb I see that dynopt performance isn't always better than "normal" (whatever normal is).  I guess that is expected?

My theory about why dynopt degrades perf sometimes:

The biggest problem with dynopt (I think) is that it easily doubles
the in-body bytecode count for dynamic invocations. Compare:

https://gist.github.com/946485

Ruby is such a terse language that we frequently see very large code
bodies, which compounds the bloating effect of dynopt. Invokedynamic
(or simple JRuby CallSite) invocation will often perform better than
dynopt simply because dynopt generates so much more bytecode.
Logical?

bench_string_ops is good. bench_richards would test polymorphic
dispatch, which is currently *very* slow in the JRuby indy logic
(recreating the full MH chain *every time*). I may talk to Tom Enebo
about this today. Ideally we'd see no cases where indy comes in
slower than CallSite dispatching, but it would be good to start using
real-world benchmarks rather than fib to measure this (especially now
that we've achieved close to a theoretical maximum in dispatch
performance).
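
To make the polymorphic dispatch cost concrete, here is a minimal
sketch of the alternative to recreating the chain on every call: each
new receiver class adds one guardWithTest link in front of the
existing target, so classes already seen stay on the fast path. This
is not JRuby's code. The class name, the describe* targets, and the
relinks counter are all hypothetical, and only the java.lang.invoke
calls are real API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

class GuardChainSketch {
    // Hypothetical stand-ins for per-class method bodies.
    static String describeString(Object o) { return "String:" + o; }
    static String describeInteger(Object o) { return "Integer:" + o; }

    // Guard: was this cache entry built for the receiver's exact class?
    static boolean isExactly(Class<?> expected, Object receiver) {
        return receiver != null && receiver.getClass() == expected;
    }

    static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    static final MethodType CALL_TYPE =
            MethodType.methodType(String.class, Object.class);
    static final MutableCallSite SITE = new MutableCallSite(CALL_TYPE);
    static int relinks = 0; // counts trips through the slow path

    // Slow path: resolve the target, prepend ONE guard to the current
    // chain (instead of rebuilding the whole chain), then make the call.
    static String fallback(Object receiver) throws Throwable {
        relinks++;
        String name = receiver instanceof String
                ? "describeString" : "describeInteger";
        MethodHandle target =
                LOOKUP.findStatic(GuardChainSketch.class, name, CALL_TYPE);
        MethodHandle test = LOOKUP.findStatic(GuardChainSketch.class,
                "isExactly",
                MethodType.methodType(boolean.class, Class.class, Object.class))
                .bindTo(receiver.getClass());
        SITE.setTarget(
                MethodHandles.guardWithTest(test, target, SITE.getTarget()));
        return (String) target.invoke(receiver);
    }

    static {
        try {
            // The site starts out pointing straight at the slow path.
            SITE.setTarget(LOOKUP.findStatic(
                    GuardChainSketch.class, "fallback", CALL_TYPE));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String call(Object receiver) {
        try {
            return (String) SITE.dynamicInvoker().invoke(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call("a")); // first String: slow path, adds a guard
        System.out.println(call(1));   // first Integer: slow path, adds a guard
        System.out.println(call("b")); // served by the cached String guard
        System.out.println("relinks = " + relinks);
    }
}
```

The trade-off is chain length: every cached class adds a guard that
later calls may have to fall through, so a real implementation would
presumably cap the depth and fall back to a generic lookup for
megamorphic sites.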

Will get back to you about that.

- Charlie