Assembly output from JRuby 'fib'

Christian Thalinger christian.thalinger at oracle.com
Fri Apr 29 01:11:55 PDT 2011


On Apr 28, 2011, at 4:50 PM, Charles Oliver Nutter wrote:
> On Thu, Apr 28, 2011 at 9:17 AM, Christian Thalinger
> <christian.thalinger at oracle.com> wrote:
>> I have now a patch that makes the command line switch tweaking superfluous and the default performance looks pretty good (see below, 32-bit x86).
> 
> That's excellent! I can't wait to see that land.
> 
> A couple notes I haven't included in other emails:
> 
> * +PrintInlining doesn't seem to be working properly on any OpenJDK7
> builds I run. I have to use LogCompilation.

Today I noticed that PrintInlining is broken in product builds.  I have a fix for that too.

[Update:  I just saw that Vladimir already has a fix for that in his recent webrev.]

> * With +PrintInlining on, I see many lines like "discounting inlining
> depth from  to". I'm guessing that's some preliminary
> depth-discounting logic in invokedynamic?

Right.  John added that code and the output but I'd like to remove the output again since it's very confusing.

> 
> I've been running with the macosx port builds lately, and they seem to
> be fairly up-to-date and working very well for me. I'm not sure how
> far behind mainstream or bsd-port openjdk, but hopefully development
> on indy is starting to coalesce back toward a single codebase. Perhaps
> we won't need Stephen's builds soon :)

Lately bsd-port is much more frequently updated.  I'm always working on mainstream.

> 
>> Charlie, what benchmark could I use for more real world application numbers?  bench_string_ops.rb sounds and looks promising.  With bench_string_ops.rb I see that dynopt performance isn't always better than "normal" (whatever normal is).  I guess that is expected?
> 
> My theory about why dynopt degrades perf sometimes:
> 
> The biggest problem with dynopt (I think) is that it easily doubles
> the in-body bytecode count for dynamic invocations. Compare:
> 
> https://gist.github.com/946485
> 
> Ruby is such a terse language, we frequently see very large code
> bodies...which compounds the bloating effect of dynopt. Invokedynamic
> (or simple JRuby CallSite) invocation will often perform better than
> dynopt simply because there's so much more bytecode being generated.
> Logical?

Sounds logical.

> 
> bench_string_ops is good. bench_richards would test polymorphic
> dispatch, which is currently *very* slow in the JRuby indy logic
> (recreating the full MH chain *every time*. I may talk to Tom Enebo
> about this today...ideally we'd see no cases where indy comes in
> slower than CallSite dispatching, but it would be good to start using
> real-world benchmarks rather than fib to measure this (especially now
> that we've achieved close to a theoretical maximum in dispatch
> performance).

I agree.  And the numbers for bench_string_ops look really good (see below).

(Sorry, it's a bit difficult to read when the numbers are pasted that way but I'm too lazy to calculate performance improvement in percent :-)

But the gist is that invokedynamic is faster on all benchmarks than dynopt (except the first one which is equally fast on all three configurations) and a little slower on the last three than normal CallSite dispatching.  And maybe that small slowdown goes away with the inlining guard.

But I'm very excited about that result!

-- Christian

$ bin/jruby.sh --server -Xcompile.dynopt=false bench/bench_string_ops.rb 
Measure string array sort time
  2.793000   0.000000   2.793000 (  2.761000)
  2.645000   0.000000   2.645000 (  2.644000)
  2.650000   0.000000   2.650000 (  2.649000)
  2.632000   0.000000   2.632000 (  2.632000)
  2.622000   0.000000   2.622000 (  2.622000)
Measure hash put time
  0.227000   0.000000   0.227000 (  0.227000)
  0.102000   0.000000   0.102000 (  0.102000)
  0.103000   0.000000   0.103000 (  0.103000)
  0.101000   0.000000   0.101000 (  0.101000)
  0.101000   0.000000   0.101000 (  0.101000)
Measure hash get time (note: not same scale as put test)
  0.366000   0.000000   0.366000 (  0.366000)
  0.301000   0.000000   0.301000 (  0.301000)
  0.302000   0.000000   0.302000 (  0.302000)
  0.304000   0.000000   0.304000 (  0.304000)
  0.302000   0.000000   0.302000 (  0.302000)
Measure string == comparison time
  0.299000   0.000000   0.299000 (  0.299000)
  0.199000   0.000000   0.199000 (  0.199000)
  0.196000   0.000000   0.196000 (  0.196000)
  0.196000   0.000000   0.196000 (  0.197000)
  0.197000   0.000000   0.197000 (  0.197000)
Measure string == comparison time, different last pos
  0.329000   0.000000   0.329000 (  0.329000)
  0.240000   0.000000   0.240000 (  0.240000)
  0.237000   0.000000   0.237000 (  0.237000)
  0.239000   0.000000   0.239000 (  0.239000)
  0.407000   0.000000   0.407000 (  0.407000)
Measure string <=> comparison time
  0.408000   0.000000   0.408000 (  0.408000)
  0.351000   0.000000   0.351000 (  0.351000)
  0.351000   0.000000   0.351000 (  0.351000)
  0.349000   0.000000   0.349000 (  0.349000)
  0.349000   0.000000   0.349000 (  0.349000)
Measure 'string'.index(fixnum) time
  0.359000   0.000000   0.359000 (  0.358000)
  0.299000   0.000000   0.299000 (  0.299000)
  0.299000   0.000000   0.299000 (  0.299000)
  0.299000   0.000000   0.299000 (  0.299000)
  0.299000   0.000000   0.299000 (  0.299000)
Measure 'string'.index(str) time
  0.472000   0.000000   0.472000 (  0.473000)
  0.406000   0.000000   0.406000 (  0.406000)
  0.405000   0.000000   0.405000 (  0.405000)
  0.406000   0.000000   0.406000 (  0.406000)
  0.405000   0.000000   0.405000 (  0.405000)
Measure 'string'.rindex(fixnum) time
  0.304000   0.000000   0.304000 (  0.304000)
  0.254000   0.000000   0.254000 (  0.254000)
  0.254000   0.000000   0.254000 (  0.254000)
  0.254000   0.000000   0.254000 (  0.254000)
  0.255000   0.000000   0.255000 (  0.255000)
Measure 'string'.rindex(str) time
  0.511000   0.000000   0.511000 (  0.511000)
  0.447000   0.000000   0.447000 (  0.447000)
  0.447000   0.000000   0.447000 (  0.447000)
  0.447000   0.000000   0.447000 (  0.447000)
  0.447000   0.000000   0.447000 (  0.447000)

$ bin/jruby.sh --server -Xcompile.dynopt=true bench/bench_string_ops.rb 
Measure string array sort time
  2.745000   0.000000   2.745000 (  2.710000)
  2.626000   0.000000   2.626000 (  2.626000)
  2.631000   0.000000   2.631000 (  2.632000)
  2.637000   0.000000   2.637000 (  2.637000)
  2.630000   0.000000   2.630000 (  2.630000)
Measure hash put time
  0.212000   0.000000   0.212000 (  0.212000)
  0.129000   0.000000   0.129000 (  0.129000)
  0.128000   0.000000   0.128000 (  0.128000)
  0.127000   0.000000   0.127000 (  0.127000)
  0.126000   0.000000   0.126000 (  0.126000)
Measure hash get time (note: not same scale as put test)
  0.551000   0.000000   0.551000 (  0.551000)
  0.538000   0.000000   0.538000 (  0.538000)
  0.540000   0.000000   0.540000 (  0.540000)
  0.538000   0.000000   0.538000 (  0.538000)
  0.535000   0.000000   0.535000 (  0.536000)
Measure string == comparison time
  0.530000   0.000000   0.530000 (  0.530000)
  0.526000   0.000000   0.526000 (  0.526000)
  0.526000   0.000000   0.526000 (  0.526000)
  0.527000   0.000000   0.527000 (  0.527000)
  0.528000   0.000000   0.528000 (  0.528000)
Measure string == comparison time, different last pos
  0.561000   0.000000   0.561000 (  0.560000)
  0.568000   0.000000   0.568000 (  0.569000)
  0.569000   0.000000   0.569000 (  0.569000)
  0.570000   0.000000   0.570000 (  0.570000)
  0.732000   0.000000   0.732000 (  0.732000)
Measure string <=> comparison time
  0.525000   0.000000   0.525000 (  0.525000)
  0.511000   0.000000   0.511000 (  0.511000)
  0.508000   0.000000   0.508000 (  0.508000)
  0.508000   0.000000   0.508000 (  0.508000)
  0.508000   0.000000   0.508000 (  0.508000)
Measure 'string'.index(fixnum) time
  0.418000   0.000000   0.418000 (  0.418000)
  0.390000   0.000000   0.390000 (  0.390000)
  0.392000   0.000000   0.392000 (  0.392000)
  0.391000   0.000000   0.391000 (  0.391000)
  0.390000   0.000000   0.390000 (  0.390000)
Measure 'string'.index(str) time
  0.582000   0.000000   0.582000 (  0.583000)
  0.559000   0.000000   0.559000 (  0.560000)
  0.561000   0.000000   0.561000 (  0.561000)
  0.558000   0.000000   0.558000 (  0.558000)
  0.559000   0.000000   0.559000 (  0.559000)
Measure 'string'.rindex(fixnum) time
  0.375000   0.000000   0.375000 (  0.376000)
  0.361000   0.000000   0.361000 (  0.361000)
  0.360000   0.000000   0.360000 (  0.360000)
  0.361000   0.000000   0.361000 (  0.361000)
  0.361000   0.000000   0.361000 (  0.361000)
Measure 'string'.rindex(str) time
  0.622000   0.000000   0.622000 (  0.622000)
  0.604000   0.000000   0.604000 (  0.604000)
  0.605000   0.000000   0.605000 (  0.605000)
  0.605000   0.000000   0.605000 (  0.605000)
  0.603000   0.000000   0.603000 (  0.603000)

$ bin/jruby.sh --server -Xcompile.invokedynamic=true bench/bench_string_ops.rb 
Measure string array sort time
  2.743000   0.000000   2.743000 (  2.712000)
  2.616000   0.000000   2.616000 (  2.616000)
  2.636000   0.000000   2.636000 (  2.636000)
  2.586000   0.000000   2.586000 (  2.587000)
  2.571000   0.000000   2.571000 (  2.571000)
Measure hash put time
  0.191000   0.000000   0.191000 (  0.191000)
  0.097000   0.000000   0.097000 (  0.097000)
  0.098000   0.000000   0.098000 (  0.098000)
  0.097000   0.000000   0.097000 (  0.097000)
  0.096000   0.000000   0.096000 (  0.096000)
Measure hash get time (note: not same scale as put test)
  0.296000   0.000000   0.296000 (  0.296000)
  0.229000   0.000000   0.229000 (  0.229000)
  0.229000   0.000000   0.229000 (  0.229000)
  0.230000   0.000000   0.230000 (  0.230000)
  0.230000   0.000000   0.230000 (  0.230000)
Measure string == comparison time
  0.209000   0.000000   0.209000 (  0.209000)
  0.115000   0.000000   0.115000 (  0.115000)
  0.114000   0.000000   0.114000 (  0.114000)
  0.115000   0.000000   0.115000 (  0.115000)
  0.115000   0.000000   0.115000 (  0.115000)
Measure string == comparison time, different last pos
  0.243000   0.000000   0.243000 (  0.243000)
  0.155000   0.000000   0.155000 (  0.155000)
  0.155000   0.000000   0.155000 (  0.155000)
  0.155000   0.000000   0.155000 (  0.155000)
  0.304000   0.000000   0.304000 (  0.304000)
Measure string <=> comparison time
  0.287000   0.000000   0.287000 (  0.287000)
  0.192000   0.000000   0.192000 (  0.192000)
  0.195000   0.000000   0.195000 (  0.195000)
  0.194000   0.000000   0.194000 (  0.194000)
  0.196000   0.000000   0.196000 (  0.196000)
Measure 'string'.index(fixnum) time
  0.413000   0.000000   0.413000 (  0.413000)
  0.300000   0.000000   0.300000 (  0.300000)
  0.299000   0.000000   0.299000 (  0.299000)
  0.299000   0.000000   0.299000 (  0.299000)
  0.298000   0.000000   0.298000 (  0.298000)
Measure 'string'.index(str) time
  0.531000   0.000000   0.531000 (  0.531000)
  0.411000   0.000000   0.411000 (  0.411000)
  0.411000   0.000000   0.411000 (  0.411000)
  0.410000   0.000000   0.410000 (  0.410000)
  0.411000   0.000000   0.411000 (  0.411000)
Measure 'string'.rindex(fixnum) time
  0.360000   0.000000   0.360000 (  0.360000)
  0.261000   0.000000   0.261000 (  0.262000)
  0.262000   0.000000   0.262000 (  0.262000)
  0.262000   0.000000   0.262000 (  0.262000)
  0.262000   0.000000   0.262000 (  0.262000)
Measure 'string'.rindex(str) time
  0.568000   0.000000   0.568000 (  0.568000)
  0.456000   0.000000   0.456000 (  0.457000)
  0.456000   0.000000   0.456000 (  0.456000)
  0.455000   0.000000   0.455000 (  0.455000)
  0.457000   0.000000   0.457000 (  0.458000)



More information about the mlvm-dev mailing list