More performance explorations
Charles Oliver Nutter
headius at headius.com
Sun Jun 5 03:16:02 PDT 2011
On Sat, Jun 4, 2011 at 2:05 AM, John Rose <john.r.rose at oracle.com> wrote:
> One big answer is that pre-RF code was building such things routinely, in order to normalize signatures down to a few equivalence classes (arity only). But post-RF code doesn't need to do that. I found a few places in MethodHandleImpl.java where needless asType calls were issued in order to normalize signatures. I changed those to an internal equivalent to explicitCastArgs, and pushed it.
With an updated build, method_dispatch benchmark does show a reduction
in checkcasts! Compare before and after...3 checkcasts versus 7:
https://gist.github.com/d32bbbb22af14ce56520
That's good progress! I'd be curious what else in that asm dump you
folks think could be eliminated. Given the updated build, we're now
faster than a (temporarily monomorphic) JRuby CachingCallSite IC
setup, but a fair margin amount slower than dynopt. My next task is
going to make sure dynopt isn't taking any shortcuts that the
invokedynamic logic doesn't take (it may be skipping some guards since
it is an experimental flag). Here's fib and method_dispatch numbers,
plus a bonus segfault when running tak!!!
FIB
~/projects/jruby ➔ jruby bench/bench_fib_recursive.rb 5 35
9227465
1.181000 0.000000 1.181000 ( 1.113000)
9227465
0.822000 0.000000 0.822000 ( 0.821000)
9227465
0.805000 0.000000 0.805000 ( 0.805000)
9227465
0.812000 0.000000 0.812000 ( 0.812000)
9227465
0.823000 0.000000 0.823000 ( 0.823000)
~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false
bench/bench_fib_recursive.rb 5 35
9227465
1.232000 0.000000 1.232000 ( 1.174000)
9227465
0.958000 0.000000 0.958000 ( 0.958000)
9227465
0.853000 0.000000 0.853000 ( 0.853000)
9227465
0.869000 0.000000 0.869000 ( 0.869000)
9227465
0.854000 0.000000 0.854000 ( 0.854000)
~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false
-Xcompile.dynopt=true bench/bench_fib_recursive.rb 5 35
9227465
1.156000 0.000000 1.156000 ( 1.076000)
9227465
0.923000 0.000000 0.923000 ( 0.923000)
9227465
0.714000 0.000000 0.714000 ( 0.715000)
9227465
0.884000 0.000000 0.884000 ( 0.884000)
9227465
0.722000 0.000000 0.722000 ( 0.722000)
METHOD DISPATCH
Note 1: dynopt fails badly in this case, probably because it generates
substantially more bytecode than JRuby CallSite-based logic (and *way*
more than invokedynamic logic. This is where dynopt fails, and it's
why I have not enabled it yet...the near-doubling of bytecode emitted
for every call site damages most Ruby methods terribly.
Note 2: Why does invokedynamic take so much longer to optimize on this
benchmark? It's not until the third iter that perf settles If this is
interesting I can flip on compilation logging to see when and why it's
finally able to optimize.
~/projects/jruby ➔ jruby bench/language/bench_method_dispatch_only.rb 5
user system
total real
10M loop 0.560000 0.000000
0.560000 ( 0.480000)
10M loop calling self's foo 10 times 5.678000 0.000000
5.678000 ( 5.678000)
user system
total real
10M loop 0.314000 0.000000
0.314000 ( 0.314000)
10M loop calling self's foo 10 times 5.190000 0.000000
5.190000 ( 5.190000)
user system
total real
10M loop 0.341000 0.000000
0.341000 ( 0.341000)
10M loop calling self's foo 10 times 0.760000 0.000000
0.760000 ( 0.760000)
user system
total real
10M loop 0.358000 0.000000
0.358000 ( 0.358000)
10M loop calling self's foo 10 times 0.746000 0.000000
0.746000 ( 0.746000)
user system
total real
10M loop 0.331000 0.000000
0.331000 ( 0.331000)
10M loop calling self's foo 10 times 0.874000 0.000000
0.874000 ( 0.874000)
~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false
bench/language/bench_method_dispatch_only.rb 5
user system
total real
10M loop 0.597000 0.000000
0.597000 ( 0.503000)
10M loop calling self's foo 10 times 1.292000 0.000000
1.292000 ( 1.291000)
user system
total real
10M loop 0.328000 0.000000
0.328000 ( 0.328000)
10M loop calling self's foo 10 times 1.129000 0.000000
1.129000 ( 1.128000)
user system
total real
10M loop 0.336000 0.000000
0.336000 ( 0.336000)
10M loop calling self's foo 10 times 1.163000 0.000000
1.163000 ( 1.163000)
user system
total real
10M loop 0.339000 0.000000
0.339000 ( 0.338000)
10M loop calling self's foo 10 times 1.185000 0.000000
1.185000 ( 1.185000)
user system
total real
10M loop 0.366000 0.000000
0.366000 ( 0.365000)
10M loop calling self's foo 10 times 1.098000 0.000000
1.098000 ( 1.098000)
~/projects/jruby ➔ jruby -Xcompile.invokedynamic=false
-Xcompile.dynopt=true bench/language/bench_method_dispatch_only.rb 5
user system
total real
10M loop 1.017000 0.000000
1.017000 ( 0.893000)
10M loop calling self's foo 10 times 3.701000 0.000000
3.701000 ( 3.701000)
user system
total real
10M loop 0.655000 0.000000
0.655000 ( 0.656000)
10M loop calling self's foo 10 times 3.715000 0.000000
3.715000 ( 3.715000)
user system
total real
10M loop 0.540000 0.000000
0.540000 ( 0.540000)
10M loop calling self's foo 10 times 4.561000 0.000000
4.561000 ( 4.561000)
user system
total real
10M loop 0.542000 0.000000
0.542000 ( 0.542000)
10M loop calling self's foo 10 times 3.839000 0.000000
3.839000 ( 3.839000)
user system
total real
10M loop 0.676000 0.000000
0.676000 ( 0.677000)
10M loop calling self's foo 10 times 4.343000 0.000000
4.343000 ( 4.344000)
And finally, the aforementioned TAK SEGFAULT
~/projects/jruby ➔ jruby bench/bench_tak.rb 5 user system
total real
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00000001010574f4, pid=64715, tid=4299165696
#
# JRE version: 7.0
# Java VM: OpenJDK 64-Bit Server VM (21.0-b14 mixed mode bsd-amd64
compressed oops)
# Problematic frame:
# V [libjvm.dylib+0x574f4]
methodOopDesc::validate_bci_from_bcx(long) const+0x24
#
# Failed to write core dump. Core dumps have been disabled. To enable
core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/headius/projects/jruby/hs_err_pid64715.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
Abort trap
Log is attached.
- Charlie
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hs_err_pid64715.log
Type: application/octet-stream
Size: 10679 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/mlvm-dev/attachments/20110605/4de2a094/attachment-0001.obj
More information about the mlvm-dev
mailing list