More performance explorations
Charles Oliver Nutter
headius at headius.com
Wed May 25 23:34:54 PDT 2011
Ok, here we go with the macosx build from 5/13. Performance is
*substantially* better.
First tak:
user system total real
1.401000 0.000000 1.401000 ( 0.821000)
0.552000 0.000000 0.552000 ( 0.552000)
0.561000 0.000000 0.561000 ( 0.561000)
0.552000 0.000000 0.552000 ( 0.552000)
0.553000 0.000000 0.553000 ( 0.553000)
Same JRuby logic, earlier build, 2-4x faster than current MLVM invokedynamic.
Now fib:
9227465
0.979000 0.000000 0.979000 ( 0.922000)
9227465
0.848000 0.000000 0.848000 ( 0.848000)
9227465
0.796000 0.000000 0.796000 ( 0.796000)
9227465
0.792000 0.000000 0.792000 ( 0.792000)
9227465
0.786000 0.000000 0.786000 ( 0.787000)
The margin is not as great here, but it's easily 20% faster than even
the reverted GWT (no idea bout the new GWT logic yet).
I can provide assembly dumps and other logs from both builds on
request. Where shall we start?
Disclaimer: I know optimizing for simple cases like fib and tak is not
a great idea, but it seems like if we can't make them fast we're going
to have trouble with a lot of other stuff. I will endeavor to get
numbers for less synthetic benchmarks too.
- Charlie
On Thu, May 26, 2011 at 12:33 AM, Charles Oliver Nutter
<headius at headius.com> wrote:
> Ok, onward with perf exploration, folks!
>
> I'm running with mostly-current MLVM, with John's temporary reversion
> of GWT to the older non-ricochet logic.
>
> As reported before, "fib" has improved with the reversion, but it's
> only marginally faster than JRuby's inline caching logic and easily
> 30-40% slower than it was in builds from earlier this month.
>
> I also decided to run "tak", which is another dispatch and
> recursion-heavy benchmark. This still seems to have a perf
> degradation.
>
> Here's with standard settings, current MLVM, amd64:
>
>
> ~/projects/jruby ➔ jruby --server bench/bench_tak.rb 5
> user system total real
> 2.443000 0.000000 2.443000 ( 2.383000)
> 1.985000 0.000000 1.985000 ( 1.985000)
> 2.007000 0.000000 2.007000 ( 2.007000)
> 1.987000 0.000000 1.987000 ( 1.987000)
> 1.991000 0.000000 1.991000 ( 1.991000)
>
> Here is with JRuby's inline caching. Given that tak is an arity three
> method, it's likely that the usually megamorphic inline cache is still
> monomorphic, so things are inlining through it when they wouldn't
> normally:
>
> ~/projects/jruby ➔ jruby --server -Xcompile.invokedynamic=false
> bench/bench_tak.rb 5
> user system total real
> 1.565000 0.000000 1.565000 ( 1.510000)
> 0.624000 0.000000 0.624000 ( 0.624000)
> 0.624000 0.000000 0.624000 ( 0.624000)
> 0.624000 0.000000 0.624000 ( 0.624000)
> 0.632000 0.000000 0.632000 ( 0.632000)
>
> Oddly enough, modifying the benchmark to guarantee there's at least
> three different method calls of arity 3 does not appear to degrade
> this benchmark...
>
> Moving on to dynopt (reminder: this emits two invocations at compile
> time, one a guarded invokevirtual or invokestatic and the other a
> normal CachingCallSite.call):
>
> ~/projects/jruby ➔ jruby --server -Xcompile.invokedynamic=false
> -Xcompile.dynopt=true bench/bench_tak.rb 5
> user system total real
> 0.703000 0.000000 0.703000 ( 0.630000)
> 0.514000 0.000000 0.514000 ( 0.514000)
> 0.511000 0.000000 0.511000 ( 0.511000)
> 0.512000 0.000000 0.512000 ( 0.512000)
> 0.510000 0.000000 0.510000 ( 0.510000)
>
> This is the "ideal" for invokedynamic, which hopefully should inline
> as well as this guarded direct invocation (right?).
>
> Now, it gets a bit more interesting. If I turn recursive inlining down
> to zero and use invokedynamic:
>
> ~/projects/jruby ➔ jruby --server -J-XX:MaxRecursiveInlineLevel=0
> bench/bench_tak.rb 5
> user system total real
> 1.010000 0.000000 1.010000 ( 0.954000)
> 0.869000 0.000000 0.869000 ( 0.869000)
> 0.870000 0.000000 0.870000 ( 0.870000)
> 0.869000 0.000000 0.869000 ( 0.869000)
> 0.870000 0.000000 0.870000 ( 0.870000)
>
> Performance is easily 2x what it is with stock inlining settings.
> Something about invokedynamic or the MH chain is changing the
> characteristics of inlining in a way different from dynopt.
>
> So what looks interesting here? For which combination would you be
> interested in seeing logs?
>
> FWIW, I am pulling earlier builds now to try out fib and tak and get
> assembly output from them.
>
> - Charlie
>
More information about the mlvm-dev
mailing list