GWT implementation

Tue Jun 14 20:26:22 PDT 2011

On Tue, Jun 14, 2011 at 5:21 PM, John Rose <john.r.rose at oracle.com> wrote:
> Charlie, thanks for being our Chief Test Victim Subject on performance issues.  I have a question for you (and for the list in general).
>
> Since 6/03 the following patch in the mlvm queue has set GWT to use ricochet frames by default:
>  http://hg.openjdk.java.net/mlvm/mlvm/jdk/file/tip/meth-experiment.patch
>
> Have you noticed a performance or functionality difference between the two experimental settings?

Hard to say...as I mentioned in my other update email, I found an
unrelated bottleneck in JRuby's object construction that has sped up
most stuff quite a bit.

So...let's have a look!

> The settings are this:
>  -Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=false
> versus this:
>  -Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=true
>
> (The default is true, not false, since 6/03.)
>
> In order to determine if this experimental patch is in your build, just execute this command to test for the symbol:
>  $ fgrep -l GWT_FORCE_RICOCHET_FRAMES $JAVA_HOME/jre/lib/rt.jar

~/projects/jruby ➔ fgrep -l GWT_FORCE_RICOCHET_FRAMES $JAVA_HOME/jre/lib/rt.jar
/Library/Java/JavaVirtualMachines/1.7.0.mlvm.x86_64/Contents/Home/jre/lib/rt.jar

Simplest benchmark I have, GWT guarding a bunch of dispatches that
just return an incoming argument. Return value is unused, so this
generally melts away to just GWT + test overhead.

~/projects/jruby ➔ jruby
-J-Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=true
bench/language/bench_method_dispatch_only.rb 5
                                              user     system
total        real
10M loop                                  0.408000   0.000000
0.408000 (  0.343000)
10M loop calling self's foo 10 times      3.764000   0.000000
3.764000 (  3.764000)
                                              user     system
total        real
10M loop                                  0.179000   0.000000
0.179000 (  0.179000)
10M loop calling self's foo 10 times      3.535000   0.000000
3.535000 (  3.535000)
                                              user     system
total        real
10M loop                                  0.179000   0.000000
0.179000 (  0.179000)
10M loop calling self's foo 10 times      0.336000   0.000000
0.336000 (  0.336000)
                                              user     system
total        real
10M loop                                  0.184000   0.000000
0.184000 (  0.184000)
10M loop calling self's foo 10 times      0.339000   0.000000
0.339000 (  0.339000)
                                              user     system
total        real
10M loop                                  0.176000   0.000000
0.176000 (  0.176000)
10M loop calling self's foo 10 times      0.357000   0.000000
0.357000 (  0.357000)

~/projects/jruby ➔ jruby
-J-Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=false
bench/language/bench_method_dispatch_only.rb 5
                                              user     system
total        real
10M loop                                  0.405000   0.000000
0.405000 (  0.343000)
10M loop calling self's foo 10 times      0.616000   0.000000
0.616000 (  0.616000)
                                              user     system
total        real
10M loop                                  0.178000   0.000000
0.178000 (  0.178000)
10M loop calling self's foo 10 times      0.449000   0.000000
0.449000 (  0.449000)
                                              user     system
total        real
10M loop                                  0.187000   0.000000
0.187000 (  0.187000)
10M loop calling self's foo 10 times      0.340000   0.000000
0.340000 (  0.340000)
                                              user     system
total        real
10M loop                                  0.190000   0.000000
0.190000 (  0.190000)
10M loop calling self's foo 10 times      0.341000   0.000000
0.341000 (  0.341000)
                                              user     system
total        real
10M loop                                  0.179000   0.000000
0.179000 (  0.179000)
10M loop calling self's foo 10 times      0.338000   0.000000
0.338000 (  0.338000)

Interesting factoid here is that ricochet seems to be responsible for
a *drastic* delay in optimization, and then is a bit faster on average
after that.

Here's fib:

~/projects/jruby ➔ jruby
-J-Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=true
bench/bench_fib_recursive.rb 5 35
9227465
  0.982000   0.000000   0.982000 (  0.870000)
9227465
  0.739000   0.000000   0.739000 (  0.739000)
9227465
  0.748000   0.000000   0.748000 (  0.748000)
9227465
  0.740000   0.000000   0.740000 (  0.740000)
9227465
  0.733000   0.000000   0.733000 (  0.733000)

~/projects/jruby ➔ jruby
-J-Djava.lang.invoke.GWT_FORCE_RICOCHET_FRAMES=false
bench/bench_fib_recursive.rb 5 35
9227465
  0.905000   0.000000   0.905000 (  0.856000)
9227465
  0.746000   0.000000   0.746000 (  0.746000)
9227465
  0.747000   0.000000   0.747000 (  0.747000)
9227465
  0.769000   0.000000   0.769000 (  0.769000)
9227465
  0.763000   0.000000   0.763000 (  0.763000)

Faster again, and no apparent delay in optimization.

Java invocation (only indy-relevant bits pulled out:

WITH RICOCHET:
Measure System.currentTimeMillis, int becoming Fixnum
  0.147000   0.000000   0.147000 (  0.147000)
  0.103000   0.000000   0.103000 (  0.103000)
  0.098000   0.000000   0.098000 (  0.098000)
  0.119000   0.000000   0.119000 (  0.119000)
  0.127000   0.000000   0.127000 (  0.128000)
Measure java.lang.Thread#name, String entering Ruby
  0.512000   0.000000   0.512000 (  0.512000)
  0.265000   0.000000   0.265000 (  0.265000)
  0.272000   0.000000   0.272000 (  0.272000)
  0.274000   0.000000   0.274000 (  0.274000)
  0.286000   0.000000   0.286000 (  0.287000)

WITHOUT:
Measure System.currentTimeMillis, int becoming Fixnum
  0.130000   0.000000   0.130000 (  0.130000)
  0.108000   0.000000   0.108000 (  0.108000)
  0.103000   0.000000   0.103000 (  0.103000)
  0.128000   0.000000   0.128000 (  0.128000)
  0.107000   0.000000   0.107000 (  0.107000)
Measure java.lang.Thread#name, String entering Ruby
  0.444000   0.000000   0.444000 (  0.444000)
  0.240000   0.000000   0.240000 (  0.240000)
  0.251000   0.000000   0.251000 (  0.251000)
  0.239000   0.000000   0.239000 (  0.239000)
  0.246000   0.000000   0.246000 (  0.246000)

Probably a toss-up, but WITHOUT seems like it might be slightly faster here.

Most other benchmarks I try are probably too close to call. Is there
another way to know if the ricochet stuff is getting turned on, so I
can be sure I'm actually measuring it?

> Here is a question for this list:  Who has used the mlvm builds since 6/04 to do testing (performance or functionality) that involves guardWithTest?

I have definitely been testing with builds later than 6/04. I think my
current build is from Friday night when I saw you land some fixes.

Christian also had me run a few things through a 32-bit Linux OpenJDK
trunk build he built on Friday and things seemed ok.

- Charlie