GWT implementation
Charles Oliver Nutter
headius at headius.com
Tue Jun 14 20:26:22 PDT 2011
On Tue, Jun 14, 2011 at 5:21 PM, John Rose <john.r.rose at oracle.com> wrote:
> Charlie, thanks for being our Chief Test Victim Subject on performance issues. I have a question for you (and for the list in general).
>
> Since 6/03 the following patch in the mlvm queue has set GWT to use ricochet frames by default:
> http://hg.openjdk.java.net/mlvm/mlvm/jdk/file/tip/meth-experiment.patch
>
> Have you noticed a performance or functionality difference between the two experimental settings?
Hard to say...as I mentioned in my other update email, I found an
unrelated bottleneck in JRuby's object construction that has sped up
most stuff quite a bit.
So...let's have a look!
> The settings are this:
> -Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=false
> versus this:
> -Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=true
>
> (The default is true, not false, since 6/03.)
>
> In order to determine if this experimental patch is in your build, just execute this command to test for the symbol:
> $ fgrep -l GWT_FORCE_RICOCHET_FRAMES $JAVA_HOME/jre/lib/rt.jar
~/projects/jruby ➔ fgrep -l GWT_FORCE_RICOCHET_FRAMES $JAVA_HOME/jre/lib/rt.jar
/Library/Java/JavaVirtualMachines/1.7.0.mlvm.x86_64/Contents/Home/jre/lib/rt.jar
Simplest benchmark I have, GWT guarding a bunch of dispatches that
just return an incoming argument. Return value is unused, so this
generally melts away to just GWT + test overhead.
~/projects/jruby ➔ jruby
-J-Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=true
bench/language/bench_method_dispatch_only.rb 5
user system
total real
10M loop 0.408000 0.000000
0.408000 ( 0.343000)
10M loop calling self's foo 10 times 3.764000 0.000000
3.764000 ( 3.764000)
user system
total real
10M loop 0.179000 0.000000
0.179000 ( 0.179000)
10M loop calling self's foo 10 times 3.535000 0.000000
3.535000 ( 3.535000)
user system
total real
10M loop 0.179000 0.000000
0.179000 ( 0.179000)
10M loop calling self's foo 10 times 0.336000 0.000000
0.336000 ( 0.336000)
user system
total real
10M loop 0.184000 0.000000
0.184000 ( 0.184000)
10M loop calling self's foo 10 times 0.339000 0.000000
0.339000 ( 0.339000)
user system
total real
10M loop 0.176000 0.000000
0.176000 ( 0.176000)
10M loop calling self's foo 10 times 0.357000 0.000000
0.357000 ( 0.357000)
~/projects/jruby ➔ jruby
-J-Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=false
bench/language/bench_method_dispatch_only.rb 5
user system
total real
10M loop 0.405000 0.000000
0.405000 ( 0.343000)
10M loop calling self's foo 10 times 0.616000 0.000000
0.616000 ( 0.616000)
user system
total real
10M loop 0.178000 0.000000
0.178000 ( 0.178000)
10M loop calling self's foo 10 times 0.449000 0.000000
0.449000 ( 0.449000)
user system
total real
10M loop 0.187000 0.000000
0.187000 ( 0.187000)
10M loop calling self's foo 10 times 0.340000 0.000000
0.340000 ( 0.340000)
user system
total real
10M loop 0.190000 0.000000
0.190000 ( 0.190000)
10M loop calling self's foo 10 times 0.341000 0.000000
0.341000 ( 0.341000)
user system
total real
10M loop 0.179000 0.000000
0.179000 ( 0.179000)
10M loop calling self's foo 10 times 0.338000 0.000000
0.338000 ( 0.338000)
Interesting factoid here is that ricochet seems to be responsible for
a *drastic* delay in optimization, and then is a bit faster on average
after that.
Here's fib:
~/projects/jruby ➔ jruby
-J-Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=true
bench/bench_fib_recursive.rb 5 35
9227465
0.982000 0.000000 0.982000 ( 0.870000)
9227465
0.739000 0.000000 0.739000 ( 0.739000)
9227465
0.748000 0.000000 0.748000 ( 0.748000)
9227465
0.740000 0.000000 0.740000 ( 0.740000)
9227465
0.733000 0.000000 0.733000 ( 0.733000)
~/projects/jruby ➔ jruby
-J-Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=false
bench/bench_fib_recursive.rb 5 35
9227465
0.905000 0.000000 0.905000 ( 0.856000)
9227465
0.746000 0.000000 0.746000 ( 0.746000)
9227465
0.747000 0.000000 0.747000 ( 0.747000)
9227465
0.769000 0.000000 0.769000 ( 0.769000)
9227465
0.763000 0.000000 0.763000 ( 0.763000)
Faster again, and no apparent delay in optimization.
Java invocation (only indy-relevant bits pulled out:
WITH RICOCHET:
Measure System.currentTimeMillis, int becoming Fixnum
0.147000 0.000000 0.147000 ( 0.147000)
0.103000 0.000000 0.103000 ( 0.103000)
0.098000 0.000000 0.098000 ( 0.098000)
0.119000 0.000000 0.119000 ( 0.119000)
0.127000 0.000000 0.127000 ( 0.128000)
Measure java.lang.Thread#name, String entering Ruby
0.512000 0.000000 0.512000 ( 0.512000)
0.265000 0.000000 0.265000 ( 0.265000)
0.272000 0.000000 0.272000 ( 0.272000)
0.274000 0.000000 0.274000 ( 0.274000)
0.286000 0.000000 0.286000 ( 0.287000)
WITHOUT:
Measure System.currentTimeMillis, int becoming Fixnum
0.130000 0.000000 0.130000 ( 0.130000)
0.108000 0.000000 0.108000 ( 0.108000)
0.103000 0.000000 0.103000 ( 0.103000)
0.128000 0.000000 0.128000 ( 0.128000)
0.107000 0.000000 0.107000 ( 0.107000)
Measure java.lang.Thread#name, String entering Ruby
0.444000 0.000000 0.444000 ( 0.444000)
0.240000 0.000000 0.240000 ( 0.240000)
0.251000 0.000000 0.251000 ( 0.251000)
0.239000 0.000000 0.239000 ( 0.239000)
0.246000 0.000000 0.246000 ( 0.246000)
Probably a toss-up, but WITHOUT seems like it might be slightly faster here.
Most other benchmarks I try are probably too close to call. Is there
another way to know if the ricochet stuff is getting turned on, so I
can be sure I'm actually measuring it?
> Here is a question for this list: Who has used the mlvm builds since 6/04 to do testing (performance or functionality) that involves guardWithTest?
I have definitely been testing with builds later than 6/04. I think my
current build is from Friday night when I saw you land some fixes.
Christian also had me run a few things through a 32-bit Linux OpenJDK
trunk build he built on Friday and things seemed ok.
- Charlie
More information about the mlvm-dev
mailing list