Latest experiments...happiness and sadness

Charles Oliver Nutter headius at headius.com
Tue Oct 16 17:53:40 PDT 2012


Hello all!

I've recently been informed that a new Ruby implementation is about to
be announced that puts JRuby's numeric perf to shame. Boo hoo.

It's not like I expected us to retain the numeric crown since we're
still allocating objects for every number in the system, but hopefully
we can get that crown back at some point.

In an effort to start getting back to indy + perf work (with JRuby 1.7
almost released, finally), I bring you today's benchmark:

require 'benchmark'

50.times { puts Benchmark.measure {
  f = 20.5; i = 0
  while i < 2000000
    f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1
    f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1
    i += 1
  end
} }

So we have a 2M fixnum loop with ten float adds and ten float
subtracts. Other variations of this have more iterations and fewer
float operations or put the whole loop inside a times{} block. This
version runs in about 0.34s on hotspot-comp + Christian's patches,
which beats Java 7 at 0.39s. If I remove some rarely-followed boolean
logic in the creation of all Ruby objects (including floats) I can get
this down to 0.29s. This is many times faster than almost all the
current Ruby implementations.

However, this new Ruby impl runs the same code in around 0.1s, so even
with everything inlining, JRuby + indy + hotspot-comp + patches is
still about 3x slower. I suspect Float allocation is the main
bottleneck here.
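
For scale, here is a rough back-of-the-envelope conversion of the
timings above into cost per float operation. The constant and method
names are made up for illustration, and the figure folds in the loop
overhead (the fixnum compare and increment), so treat it as an upper
bound on per-op cost rather than a measurement.

```ruby
# Numbers taken from the benchmark described above.
ITERATIONS = 2_000_000          # loop trips per measured run
FLOAT_OPS_PER_ITERATION = 20    # ten adds plus ten subtracts

# Hypothetical helper: average nanoseconds per float operation for a run
# that took `seconds`. Loop overhead is folded into the result.
def ns_per_float_op(seconds)
  total_ops = ITERATIONS * FLOAT_OPS_PER_ITERATION   # 40 million
  (seconds / total_ops * 1e9).round(2)
end

# Timings quoted in this message, in seconds per run.
TIMINGS = {
  "JRuby on hotspot-comp + patches" => 0.34,
  "Java 7"                          => 0.39,
  "minus the boolean logic"         => 0.29,
  "new Ruby impl"                   => 0.1
}

TIMINGS.each do |label, seconds|
  puts format("%-32s %5.2fs  %5.2f ns/float op",
              label, seconds, ns_per_float_op(seconds))
end
```

That works out to roughly 8.5 ns per float op for JRuby against 2.5 ns
for the new implementation. Since JRuby allocates an object for every
number, each run also creates on the order of 40 million Float objects.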

Here's logc output for one of the adds:

    @ 251 java.lang.invoke.LambdaForm$MH::linkToCallSite (18 bytes)
      @ 1 java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)
        @ 4 java.lang.invoke.MutableCallSite::getTarget (5 bytes)
      @ 14 java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
      @ 14 java.lang.invoke.LambdaForm$BMH::reinvoke (32 bytes)
        @ 13 java.lang.invoke.BoundMethodHandle$Species_LD::reinvokerTarget (8 bytes)
        @ 28 java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
        @ 28 java.lang.invoke.LambdaForm$DMH::invokeStatic_LLLD_L (20 bytes)
          @ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)
          @ 16 java.lang.invoke.MethodHandle::linkToStatic (0 bytes)
          @ 16 org.jruby.runtime.invokedynamic.MathLinker::float_op_plus (10 bytes)
            @ 6 org.jruby.RubyFloat::op_plus (14 bytes)
              @ 1 org.jruby.RubyBasicObject::getRuntime (8 bytes)
                @ 1 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
                @ 4 org.jruby.RubyClass::getClassRuntime (5 bytes)
              @ 10 org.jruby.RubyFloat::newFloat (10 bytes)
                @ 6 org.jruby.RubyFloat::<init> (15 bytes)
                  @ 3 org.jruby.Ruby::getFloat (5 bytes)
                  @ 6 org.jruby.RubyNumeric::<init> (7 bytes)
                    @ 3 org.jruby.RubyObject::<init> (7 bytes)
                      @ 3 org.jruby.RubyBasicObject::<init> (30 bytes)
                        @ 1 java.lang.Object::<init> (1 bytes)

This is *great*. We're getting all paths inlined, and allocation
inlines all the way up to Object::<init>, so in theory escape analysis
could get rid of this...RIGHT? WRONG!!!

logc appears to be missing some output (either the tool or the
LogCompilation flag is dropping information). Here is the same block
of code from PrintInlining:

    @ 207   java.lang.invoke.LambdaForm$MH/1942422426::linkToCallSite (18 bytes)   inline (hot)
      @ 1   java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)   inline (hot)
        @ 4   java.lang.invoke.MutableCallSite::getTarget (5 bytes)   inline (hot)
      @ 14   java.lang.invoke.LambdaForm$MH/1896635336::guard (80 bytes)   inline (hot)
        @ 12   java.lang.Class::cast (27 bytes)   inline (hot)
          @ 6   java.lang.Class::isInstance (0 bytes)   (intrinsic)
        @ 17   java.lang.invoke.LambdaForm$BMH/1650319731::reinvoke (30 bytes)   inline (hot)
          @ 13   java.lang.invoke.BoundMethodHandle$Species_LL::reinvokerTarget (8 bytes)   inline (hot)
          @ 26   java.lang.invoke.LambdaForm$DMH/842171382::invokeStatic_LL_I (15 bytes)   inline (hot)
            @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   inline (hot)
            @ 11   org.jruby.runtime.invokedynamic.MathLinker::floatTest (20 bytes)   inline (hot)
              @ 8   org.jruby.Ruby::isFloatReopened (5 bytes)   inline (hot)
        @ 50   java.lang.invoke.LambdaForm$DMH/952682386::invokeSpecial_LLLL_L (20 bytes)   inline (hot)
          @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   inline (hot)
          @ 16   java.lang.invoke.LambdaForm$BMH/1698703785::reinvoke (32 bytes)   inline (hot)
            @ 13   java.lang.invoke.BoundMethodHandle$Species_LD::reinvokerTarget (8 bytes)   inline (hot)
            @ 28   java.lang.invoke.LambdaForm$DMH/590335041::invokeStatic_LLLD_L (20 bytes)   inline (hot)
              @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   inline (hot)
              @ 16   org.jruby.runtime.invokedynamic.MathLinker::float_op_plus (10 bytes)   inline (hot)
                @ 6   org.jruby.RubyFloat::op_plus (14 bytes)   inline (hot)
                  @ 1   org.jruby.RubyBasicObject::getRuntime (8 bytes)   inline (hot)
                    @ 1   org.jruby.RubyBasicObject::getMetaClass (5 bytes)   inline (hot)
                    @ 4   org.jruby.RubyClass::getClassRuntime (5 bytes)   inline (hot)
                  @ 10   org.jruby.RubyFloat::newFloat (10 bytes)   inline (hot)
                    @ 6   org.jruby.RubyFloat::<init> (15 bytes)   inline (hot)
                      @ 3   org.jruby.Ruby::getFloat (5 bytes)   inline (hot)
                      @ 6   org.jruby.RubyNumeric::<init> (7 bytes)   inline (hot)
                        @ 3   org.jruby.RubyObject::<init> (7 bytes)   inline (hot)
                          @ 3   org.jruby.RubyBasicObject::<init> (30 bytes)   inline (hot)
                            @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
        @ 76   java.lang.invoke.LambdaForm$DMH/952682386::invokeSpecial_LLLL_L (20 bytes)   call site not reached

So *almost* everything is inlining, but one path is not reached (after
talking with Christian, I believe it's the failure path from the GWT,
i.e. guardWithTest). Because Hotspot's EA can't do partial EA, any
unfollowed paths that would receive the allocated object have to be
considered escapes. So anywhere we're doing guarded logic (either in
indy or in Java code, like Fixnum overflow checks), the unfollowed
paths prevent EA from happening. Boo-hoo.
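
To make the shape of the problem concrete, here is a small Ruby model
of one guarded "+" call site. It is only a sketch of the control flow,
not JRuby's implementation (which is Java plus method handles);
BoxedFloat, GuardedPlusSite and slow_path are hypothetical names
invented for this example.

```ruby
# Hypothetical stand-in for a boxed float (JRuby's real class is
# org.jruby.RubyFloat); every arithmetic result is a fresh object.
BoxedFloat = Struct.new(:value)

# Hypothetical model of one guarded "+" call site: test, fast path, fallback.
class GuardedPlusSite
  attr_reader :fallback_hits

  def initialize
    @fallback_hits = 0
  end

  def call(receiver, operand)
    if receiver.is_a?(BoxedFloat)                # the guard (cf. floatTest)
      BoxedFloat.new(receiver.value + operand)   # fast path: allocate the result
    else
      slow_path(receiver, operand)               # failure path gets the receiver
    end
  end

  private

  # Generic dispatch. The benchmark never reaches this, but because the
  # receiver is handed to it, a compiler without partial EA has to assume
  # the receiver (allocated by the previous site) escapes.
  def slow_path(receiver, operand)
    @fallback_hits += 1
    receiver + operand
  end
end

# Same shape as the benchmark body: each op consumes the previous result.
plus = GuardedPlusSite.new
f = BoxedFloat.new(20.5)
4.times { f = plus.call(f, 0.1) }
puts "value=#{f.value} fallback_hits=#{plus.fallback_hits}"
```

The fallback counter stays at zero for float-only input, which mirrors
the "call site not reached" entry above: the path exists and takes the
object, but nothing ever follows it.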

At this point there's nothing I can really do. I have to guard the
call sites in case we don't see a Float at some point, and for Fixnum
overflow I have to do that boolean check in most cases. There are always
going to be unfollowed paths dangling off the edges of even our
simplest logic.
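
The Fixnum overflow case has the same shape. Below is a minimal sketch
in Ruby, assuming a signed 64-bit fast-path integer; the constants and
checked_fixnum_add are hypothetical names for illustration, and the
symbols only label which branch was taken.

```ruby
# Assumed bounds: a signed 64-bit range for the fast-path integer type.
FIXNUM_MAX = 2**63 - 1
FIXNUM_MIN = -(2**63)

# Hypothetical helper: returns the path taken along with the result.
def checked_fixnum_add(a, b)
  sum = a + b                    # Ruby integers never wrap, so compare after adding
  if sum > FIXNUM_MAX || sum < FIXNUM_MIN
    [:bignum_path, sum]          # rarely followed, but it still has to exist
  else
    [:fixnum_path, sum]          # the path a loop counter always takes
  end
end

p checked_fixnum_add(1, 2)
p checked_fixnum_add(FIXNUM_MAX, 1)
```

A loop counter like the one in the benchmark only ever takes the first
branch, yet the overflow branch is still part of the compiled logic.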

Bottom line is that the new indy stuff is starting to really look good
wrt inlining, but EA is still not up to the task of eliding
allocations in the places we need it to.

Thoughts?

- Charlie


More information about the mlvm-dev mailing list