Assembly output from JRuby 'fib'
Christian Thalinger
christian.thalinger at oracle.com
Fri Apr 29 01:11:55 PDT 2011
On Apr 28, 2011, at 4:50 PM, Charles Oliver Nutter wrote:
> On Thu, Apr 28, 2011 at 9:17 AM, Christian Thalinger
> <christian.thalinger at oracle.com> wrote:
>> I have now a patch that makes the command line switch tweaking superfluous and the default performance looks pretty good (see below, 32-bit x86).
>
> That's excellent! I can't wait to see that land.
>
> A couple notes I haven't included in other emails:
>
> * +PrintInlining doesn't seem to be working properly on any OpenJDK7
> builds I run. I have to use LogCompilation.
Today I noticed that PrintInlining is broken in product builds. I have a fix for that too.
[Update: I just saw that Vladimir already has a fix for that in his recent webrev.]
> * With +PrintInlining on, I see many lines like "discounting inlining
> depth from to". I'm guessing that's some preliminary
> depth-discounting logic in invokedynamic?
Right. John added that code and the output but I'd like to remove the output again since it's very confusing.
>
> I've been running with the macosx port builds lately, and they seem to
> be fairly up-to-date and working very well for me. I'm not sure how
> far behind mainstream or bsd-port openjdk, but hopefully development
> on indy is starting to coalesce back toward a single codebase. Perhaps
> we won't need Stephen's builds soon :)
Lately bsd-port is much more frequently updated. I'm always working on mainstream.
>
>> Charlie, what benchmark could I use for more real world application numbers? bench_string_ops.rb sounds and looks promising. With bench_string_ops.rb I see that dynopt performance isn't always better than "normal" (whatever normal is). I guess that is expected?
>
> My theory about why dynopt degrades perf sometimes:
>
> The biggest problem with dynopt (I think) is that it easily doubles
> the in-body bytecode count for dynamic invocations. Compare:
>
> https://gist.github.com/946485
>
> Ruby is such a terse language, we frequently see very large code
> bodies...which compounds the bloating effect of dynopt. Invokedynamic
> (or simple JRuby CallSite) invocation will often perform better than
> dynopt simply because there's so much more bytecode being generated.
> Logical?
Sounds logical.
>
> bench_string_ops is good. bench_richards would test polymorphic
> dispatch, which is currently *very* slow in the JRuby indy logic
> (recreating the full MH chain *every time*. I may talk to Tom Enebo
> about this today...ideally we'd see no cases where indy comes in
> slower than CallSite dispatching, but it would be good to start using
> real-world benchmarks rather than fib to measure this (especially now
> that we've achieved close to a theoretical maximum in dispatch
> performance).
I agree. And the numbers for bench_string_ops look really good (see below).
(Sorry, it's a bit difficult to read when the numbers are pasted that way but I'm too lazy to calculate performance improvement in percent :-)
But the gist is that invokedynamic is faster on all benchmarks than dynopt (except the first one which is equally fast on all three configurations) and a little slower on the last three than normal CallSite dispatching. And maybe that small slowdown goes away with the inlining guard.
But I'm very excited about that result!
-- Christian
$ bin/jruby.sh --server -Xcompile.dynopt=false bench/bench_string_ops.rb
Measure string array sort time
2.793000 0.000000 2.793000 ( 2.761000)
2.645000 0.000000 2.645000 ( 2.644000)
2.650000 0.000000 2.650000 ( 2.649000)
2.632000 0.000000 2.632000 ( 2.632000)
2.622000 0.000000 2.622000 ( 2.622000)
Measure hash put time
0.227000 0.000000 0.227000 ( 0.227000)
0.102000 0.000000 0.102000 ( 0.102000)
0.103000 0.000000 0.103000 ( 0.103000)
0.101000 0.000000 0.101000 ( 0.101000)
0.101000 0.000000 0.101000 ( 0.101000)
Measure hash get time (note: not same scale as put test)
0.366000 0.000000 0.366000 ( 0.366000)
0.301000 0.000000 0.301000 ( 0.301000)
0.302000 0.000000 0.302000 ( 0.302000)
0.304000 0.000000 0.304000 ( 0.304000)
0.302000 0.000000 0.302000 ( 0.302000)
Measure string == comparison time
0.299000 0.000000 0.299000 ( 0.299000)
0.199000 0.000000 0.199000 ( 0.199000)
0.196000 0.000000 0.196000 ( 0.196000)
0.196000 0.000000 0.196000 ( 0.197000)
0.197000 0.000000 0.197000 ( 0.197000)
Measure string == comparison time, different last pos
0.329000 0.000000 0.329000 ( 0.329000)
0.240000 0.000000 0.240000 ( 0.240000)
0.237000 0.000000 0.237000 ( 0.237000)
0.239000 0.000000 0.239000 ( 0.239000)
0.407000 0.000000 0.407000 ( 0.407000)
Measure string <=> comparison time
0.408000 0.000000 0.408000 ( 0.408000)
0.351000 0.000000 0.351000 ( 0.351000)
0.351000 0.000000 0.351000 ( 0.351000)
0.349000 0.000000 0.349000 ( 0.349000)
0.349000 0.000000 0.349000 ( 0.349000)
Measure 'string'.index(fixnum) time
0.359000 0.000000 0.359000 ( 0.358000)
0.299000 0.000000 0.299000 ( 0.299000)
0.299000 0.000000 0.299000 ( 0.299000)
0.299000 0.000000 0.299000 ( 0.299000)
0.299000 0.000000 0.299000 ( 0.299000)
Measure 'string'.index(str) time
0.472000 0.000000 0.472000 ( 0.473000)
0.406000 0.000000 0.406000 ( 0.406000)
0.405000 0.000000 0.405000 ( 0.405000)
0.406000 0.000000 0.406000 ( 0.406000)
0.405000 0.000000 0.405000 ( 0.405000)
Measure 'string'.rindex(fixnum) time
0.304000 0.000000 0.304000 ( 0.304000)
0.254000 0.000000 0.254000 ( 0.254000)
0.254000 0.000000 0.254000 ( 0.254000)
0.254000 0.000000 0.254000 ( 0.254000)
0.255000 0.000000 0.255000 ( 0.255000)
Measure 'string'.rindex(str) time
0.511000 0.000000 0.511000 ( 0.511000)
0.447000 0.000000 0.447000 ( 0.447000)
0.447000 0.000000 0.447000 ( 0.447000)
0.447000 0.000000 0.447000 ( 0.447000)
0.447000 0.000000 0.447000 ( 0.447000)
$ bin/jruby.sh --server -Xcompile.dynopt=true bench/bench_string_ops.rb
Measure string array sort time
2.745000 0.000000 2.745000 ( 2.710000)
2.626000 0.000000 2.626000 ( 2.626000)
2.631000 0.000000 2.631000 ( 2.632000)
2.637000 0.000000 2.637000 ( 2.637000)
2.630000 0.000000 2.630000 ( 2.630000)
Measure hash put time
0.212000 0.000000 0.212000 ( 0.212000)
0.129000 0.000000 0.129000 ( 0.129000)
0.128000 0.000000 0.128000 ( 0.128000)
0.127000 0.000000 0.127000 ( 0.127000)
0.126000 0.000000 0.126000 ( 0.126000)
Measure hash get time (note: not same scale as put test)
0.551000 0.000000 0.551000 ( 0.551000)
0.538000 0.000000 0.538000 ( 0.538000)
0.540000 0.000000 0.540000 ( 0.540000)
0.538000 0.000000 0.538000 ( 0.538000)
0.535000 0.000000 0.535000 ( 0.536000)
Measure string == comparison time
0.530000 0.000000 0.530000 ( 0.530000)
0.526000 0.000000 0.526000 ( 0.526000)
0.526000 0.000000 0.526000 ( 0.526000)
0.527000 0.000000 0.527000 ( 0.527000)
0.528000 0.000000 0.528000 ( 0.528000)
Measure string == comparison time, different last pos
0.561000 0.000000 0.561000 ( 0.560000)
0.568000 0.000000 0.568000 ( 0.569000)
0.569000 0.000000 0.569000 ( 0.569000)
0.570000 0.000000 0.570000 ( 0.570000)
0.732000 0.000000 0.732000 ( 0.732000)
Measure string <=> comparison time
0.525000 0.000000 0.525000 ( 0.525000)
0.511000 0.000000 0.511000 ( 0.511000)
0.508000 0.000000 0.508000 ( 0.508000)
0.508000 0.000000 0.508000 ( 0.508000)
0.508000 0.000000 0.508000 ( 0.508000)
Measure 'string'.index(fixnum) time
0.418000 0.000000 0.418000 ( 0.418000)
0.390000 0.000000 0.390000 ( 0.390000)
0.392000 0.000000 0.392000 ( 0.392000)
0.391000 0.000000 0.391000 ( 0.391000)
0.390000 0.000000 0.390000 ( 0.390000)
Measure 'string'.index(str) time
0.582000 0.000000 0.582000 ( 0.583000)
0.559000 0.000000 0.559000 ( 0.560000)
0.561000 0.000000 0.561000 ( 0.561000)
0.558000 0.000000 0.558000 ( 0.558000)
0.559000 0.000000 0.559000 ( 0.559000)
Measure 'string'.rindex(fixnum) time
0.375000 0.000000 0.375000 ( 0.376000)
0.361000 0.000000 0.361000 ( 0.361000)
0.360000 0.000000 0.360000 ( 0.360000)
0.361000 0.000000 0.361000 ( 0.361000)
0.361000 0.000000 0.361000 ( 0.361000)
Measure 'string'.rindex(str) time
0.622000 0.000000 0.622000 ( 0.622000)
0.604000 0.000000 0.604000 ( 0.604000)
0.605000 0.000000 0.605000 ( 0.605000)
0.605000 0.000000 0.605000 ( 0.605000)
0.603000 0.000000 0.603000 ( 0.603000)
$ bin/jruby.sh --server -Xcompile.invokedynamic=true bench/bench_string_ops.rb
Measure string array sort time
2.743000 0.000000 2.743000 ( 2.712000)
2.616000 0.000000 2.616000 ( 2.616000)
2.636000 0.000000 2.636000 ( 2.636000)
2.586000 0.000000 2.586000 ( 2.587000)
2.571000 0.000000 2.571000 ( 2.571000)
Measure hash put time
0.191000 0.000000 0.191000 ( 0.191000)
0.097000 0.000000 0.097000 ( 0.097000)
0.098000 0.000000 0.098000 ( 0.098000)
0.097000 0.000000 0.097000 ( 0.097000)
0.096000 0.000000 0.096000 ( 0.096000)
Measure hash get time (note: not same scale as put test)
0.296000 0.000000 0.296000 ( 0.296000)
0.229000 0.000000 0.229000 ( 0.229000)
0.229000 0.000000 0.229000 ( 0.229000)
0.230000 0.000000 0.230000 ( 0.230000)
0.230000 0.000000 0.230000 ( 0.230000)
Measure string == comparison time
0.209000 0.000000 0.209000 ( 0.209000)
0.115000 0.000000 0.115000 ( 0.115000)
0.114000 0.000000 0.114000 ( 0.114000)
0.115000 0.000000 0.115000 ( 0.115000)
0.115000 0.000000 0.115000 ( 0.115000)
Measure string == comparison time, different last pos
0.243000 0.000000 0.243000 ( 0.243000)
0.155000 0.000000 0.155000 ( 0.155000)
0.155000 0.000000 0.155000 ( 0.155000)
0.155000 0.000000 0.155000 ( 0.155000)
0.304000 0.000000 0.304000 ( 0.304000)
Measure string <=> comparison time
0.287000 0.000000 0.287000 ( 0.287000)
0.192000 0.000000 0.192000 ( 0.192000)
0.195000 0.000000 0.195000 ( 0.195000)
0.194000 0.000000 0.194000 ( 0.194000)
0.196000 0.000000 0.196000 ( 0.196000)
Measure 'string'.index(fixnum) time
0.413000 0.000000 0.413000 ( 0.413000)
0.300000 0.000000 0.300000 ( 0.300000)
0.299000 0.000000 0.299000 ( 0.299000)
0.299000 0.000000 0.299000 ( 0.299000)
0.298000 0.000000 0.298000 ( 0.298000)
Measure 'string'.index(str) time
0.531000 0.000000 0.531000 ( 0.531000)
0.411000 0.000000 0.411000 ( 0.411000)
0.411000 0.000000 0.411000 ( 0.411000)
0.410000 0.000000 0.410000 ( 0.410000)
0.411000 0.000000 0.411000 ( 0.411000)
Measure 'string'.rindex(fixnum) time
0.360000 0.000000 0.360000 ( 0.360000)
0.261000 0.000000 0.261000 ( 0.262000)
0.262000 0.000000 0.262000 ( 0.262000)
0.262000 0.000000 0.262000 ( 0.262000)
0.262000 0.000000 0.262000 ( 0.262000)
Measure 'string'.rindex(str) time
0.568000 0.000000 0.568000 ( 0.568000)
0.456000 0.000000 0.456000 ( 0.457000)
0.456000 0.000000 0.456000 ( 0.456000)
0.455000 0.000000 0.455000 ( 0.455000)
0.457000 0.000000 0.457000 ( 0.458000)
More information about the mlvm-dev
mailing list