More performance explorations

Thu May 26 18:40:13 PDT 2011

Another data point: bench_method_dispatch_only. To review, this
benchmark measures the overhead of a while loop and dynamic
invocations of a trivial Ruby method. It is intended to measure
(mostly) the overhead of making a dyncall, and the ability of the VM
to optimize away mostly-dead logic.

The benchmark looks like this:

def foo
  self
end

def invoking
  i = 0
  while i < 1000000
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo
    i += 1
  end
end

Here are the results on the 5/13 macosx build and on current mlvm:

~/projects/jruby ➔ jruby --server -J-d32 -Xinvokedynamic.constants=false bench/language/bench_method_dispatch_only.rb 10
Test ruby method: 1000k loops calling self's foo 10 times
  0.337000   0.000000   0.337000 (  0.279000)
  0.102000   0.000000   0.102000 (  0.102000)
  0.084000   0.000000   0.084000 (  0.084000)
  0.087000   0.000000   0.087000 (  0.087000)
  0.083000   0.000000   0.083000 (  0.083000)
  0.084000   0.000000   0.084000 (  0.084000)
  0.085000   0.000000   0.085000 (  0.085000)
  0.082000   0.000000   0.082000 (  0.082000)
  0.083000   0.000000   0.083000 (  0.083000)
  0.083000   0.000000   0.083000 (  0.083000)

~/projects/jruby ➔ pickjdk 3
New JDK: 1.7.0.mlvm

~/projects/jruby ➔ jruby --server -J-d32 -Xinvokedynamic.constants=false bench/language/bench_method_dispatch_only.rb 10
Test ruby method: 1000k loops calling self's foo 10 times
  0.452000   0.000000   0.452000 (  0.377000)
  0.115000   0.000000   0.115000 (  0.115000)
  0.099000   0.000000   0.099000 (  0.099000)
  0.094000   0.000000   0.094000 (  0.095000)
  0.095000   0.000000   0.095000 (  0.095000)
  0.094000   0.000000   0.094000 (  0.094000)
  0.097000   0.000000   0.097000 (  0.097000)
  0.095000   0.000000   0.095000 (  0.095000)
  0.095000   0.000000   0.095000 (  0.095000)
  0.097000   0.000000   0.097000 (  0.097000)
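
For anyone reading the numbers cold: each timing line has the shape of
Ruby's standard Benchmark output, i.e. user CPU, system CPU, total CPU,
and (in parentheses) wall-clock seconds. The following is only a sketch
of what a harness producing such lines might look like. It is not the
actual contents of bench_method_dispatch_only.rb; the run_bench helper
and the loops parameter are hypothetical names added for illustration.

```ruby
require 'benchmark'

def foo
  self
end

# Same shape as the benchmark body quoted earlier, with the loop count
# made a parameter (an assumption, so the sketch can run quickly).
def invoking(loops = 1_000_000)
  i = 0
  while i < loops
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo
    i += 1
  end
  i
end

# Hypothetical harness: time `iterations` runs and print one
# Benchmark::Tms line per run (user, system, total, real).
def run_bench(iterations, loops = 1_000_000)
  puts "Test ruby method: 1000k loops calling self's foo 10 times"
  Array.new(iterations) do
    tms = Benchmark.measure { invoking(loops) }
    puts tms
    tms
  end
end
```

Under that assumption, run_bench(10) would correspond to the trailing
"10" on the command lines above, and the early, slower lines are the
warmup runs before the JIT has settled.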

In an ideal world, the calls to foo would mostly disappear, since they
have no net effect. In practice, HotSpot emits all the logic for the
guards, and that is largely what makes up the cost of each call. The
loop itself also accounts for some of the cost (about 40%), since it is
a Fixnum loop and constructs about 1M RubyFixnum objects.
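
One rough way to estimate the loop's share of the cost is to time the
same loop with and without the calls. This is a sketch, not how the 40%
figure above was obtained; loop_only, loop_with_calls, best_time and
loop_share are hypothetical names. Note that an optimizing VM may
reduce the empty loop far more aggressively than the full one, so the
ratio is only indicative.

```ruby
require 'benchmark'

def foo
  self
end

# The bare Fixnum loop, no calls.
def loop_only(n)
  i = 0
  while i < n
    i += 1
  end
  i
end

# The same loop with ten dynamic calls per iteration.
def loop_with_calls(n)
  i = 0
  while i < n
    foo; foo; foo; foo; foo; foo; foo; foo; foo; foo
    i += 1
  end
  i
end

# Best wall-clock time over several runs, to discount warmup.
def best_time(runs)
  Array.new(runs) { Benchmark.realtime { yield } }.min
end

# Fraction of the full benchmark's time spent in the loop alone.
def loop_share(n = 1_000_000, runs = 5)
  empty = best_time(runs) { loop_only(n) }
  full  = best_time(runs) { loop_with_calls(n) }
  empty / full
end
```

Calling loop_share with the defaults returns a fraction; a value near
0.4 would be consistent with the estimate above.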

dynopt is interesting for comparison:

~/projects/jruby ➔ jruby --server -J-d32 -Xjit.threshold=2 -Xcompile.invokedynamic=false -Xcompile.dynopt=true -Xinvokedynamic.constants=false bench/language/bench_method_dispatch_only.rb 10
Test ruby method: 1000k loops calling self's foo 10 times
  0.617000   0.000000   0.617000 (  0.541000)
  0.096000   0.000000   0.096000 (  0.095000)
  0.066000   0.000000   0.066000 (  0.067000)
  0.053000   0.000000   0.053000 (  0.054000)
  0.053000   0.000000   0.053000 (  0.053000)
  0.055000   0.000000   0.055000 (  0.055000)
  0.047000   0.000000   0.047000 (  0.047000)
  0.054000   0.000000   0.054000 (  0.054000)
  0.048000   0.000000   0.048000 (  0.048000)
  0.047000   0.000000   0.047000 (  0.047000)

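In this case, dynopt ends up being about 2x faster than invokedynamic
on current mlvm (roughly 0.048s versus 0.095s per run once warmed up),
and about 1.7x faster than the 5/13 build (roughly 0.083s).
Why? Beats me.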
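
The "about 2x" figure can be checked against the timings quoted in this
message. The sketch below takes the wall-clock column of each run and
compares the median of the last five iterations. Using the last five
runs as "steady state" is an assumption, and the hash keys and helper
names are made up for illustration.

```ruby
# Wall-clock ("real") column of each run, copied from the output above.
REAL_TIMES = {
  indy_5_13:    [0.279, 0.102, 0.084, 0.087, 0.083, 0.084, 0.085, 0.082, 0.083, 0.083],
  indy_current: [0.377, 0.115, 0.099, 0.095, 0.095, 0.094, 0.097, 0.095, 0.095, 0.097],
  dynopt:       [0.541, 0.095, 0.067, 0.054, 0.053, 0.055, 0.047, 0.054, 0.048, 0.047]
}.freeze

def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Median of the last `tail` runs, ignoring the warmup iterations.
def steady_state(times, tail = 5)
  median(times.last(tail))
end

# How many times faster `fast` is than `slow` at steady state.
def speedup(slow, fast)
  steady_state(REAL_TIMES.fetch(slow)) / steady_state(REAL_TIMES.fetch(fast))
end

[:indy_5_13, :indy_current].each do |run|
  puts format('dynopt vs %-12s %.2fx', run, speedup(run, :dynopt))
end
```

With these inputs the ratios come out at roughly 1.73x against the 5/13
build and 1.98x against current mlvm.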

The -Xjit.threshold=2 setting is there to force the "invoking" method
to JIT sooner, since dynopt does not compile the target script to
bytecode immediately.

- Charlie

On Thu, May 26, 2011 at 1:43 AM, Charles Oliver Nutter
<headius at headius.com> wrote:
> On Thu, May 26, 2011 at 1:34 AM, Charles Oliver Nutter
> <headius at headius.com> wrote:
>> Ok, here we go with the macosx build from 5/13. Performance is
>> *substantially* better.
>
> It's worth mentioning that the 5/13 build results are only slightly
> slower than our "ideal", JRuby's dynopt mode. That made me water at
> the mouth a bit, which is why I'm so eager to help get back to that
> performance with current logic.
>
> - Charlie
>