Latest experiments...happiness and sadness

Ben Evans benjamin.john.evans at gmail.com
Wed Oct 17 01:03:07 PDT 2012


Hi Charlie,

Could you send us a decent link or two once it actually drops? I'm
not much of a Ruby head generally, but I would like to see the numbers
(and, of course, take a quick look at their testing / benchmarking
methodology).

Thanks,

Ben

On Wed, Oct 17, 2012 at 1:53 AM, Charles Oliver Nutter
<headius at headius.com> wrote:
> Hello all!
>
> I've recently been informed that a new Ruby implementation is about to
> be announced that puts JRuby's numeric perf to shame. Boo hoo.
>
> It's not like I expected us to retain the numeric crown since we're
> still allocating objects for every number in the system, but hopefully
> we can get that crown back at some point.
>
> In an effort to start getting back to indy + perf work (with JRuby 1.7
> almost released, finally), I bring you today's benchmark:
>
>     require 'benchmark'
>     50.times { puts Benchmark.measure {
>       f = 20.5; i = 0
>       while i < 2000000
>         f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1
>         f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1
>         f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1
>         f += 0.1; f -= 0.1; i += 1
>       end
>     } }
>
> So we have a 2M fixnum loop with ten float adds and ten float
> subtracts. Other variations of this have more iterations and fewer
> float operations or put the whole loop inside a times{} block. This
> version runs in about 0.34s on hotspot-comp + Christian's patches,
> which beats Java 7 at 0.39s. If I remove some rarely-followed boolean
> logic in the creation of all Ruby objects (including floats) I can get
> this down to 0.29s. This is many times faster than almost all the
> current Ruby implementations.
>
> However, this new Ruby impl runs the same code in around 0.1s, so even
> with everything inlining, JRuby + indy + hotspot-comp + patches is
> still about 3x slower. I suspect Float allocation is the main
> bottleneck here.
>
> Here's logc output for one of the adds:
>
>     @ 251 java.lang.invoke.LambdaForm$MH::linkToCallSite (18 bytes)
>       @ 1 java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)
>         @ 4 java.lang.invoke.MutableCallSite::getTarget (5 bytes)
>       @ 14 java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
>       @ 14 java.lang.invoke.LambdaForm$BMH::reinvoke (32 bytes)
>         @ 13 java.lang.invoke.BoundMethodHandle$Species_LD::reinvokerTarget (8 bytes)
>         @ 28 java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
>         @ 28 java.lang.invoke.LambdaForm$DMH::invokeStatic_LLLD_L (20 bytes)
>           @ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)
>           @ 16 java.lang.invoke.MethodHandle::linkToStatic (0 bytes)
>           @ 16 org.jruby.runtime.invokedynamic.MathLinker::float_op_plus (10 bytes)
>             @ 6 org.jruby.RubyFloat::op_plus (14 bytes)
>               @ 1 org.jruby.RubyBasicObject::getRuntime (8 bytes)
>                 @ 1 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
>                 @ 4 org.jruby.RubyClass::getClassRuntime (5 bytes)
>               @ 10 org.jruby.RubyFloat::newFloat (10 bytes)
>                 @ 6 org.jruby.RubyFloat::<init> (15 bytes)
>                   @ 3 org.jruby.Ruby::getFloat (5 bytes)
>                   @ 6 org.jruby.RubyNumeric::<init> (7 bytes)
>                     @ 3 org.jruby.RubyObject::<init> (7 bytes)
>                       @ 3 org.jruby.RubyBasicObject::<init> (30 bytes)
>                         @ 1 java.lang.Object::<init> (1 bytes)
>
> This is *great*. We're getting all paths inlined, and allocation
> inlines all the way up to Object::<init>, so in theory escape analysis
> could get rid of this...RIGHT? WRONG!!!
>
> logc appears to be missing some output (either the tool or the
> LogCompilation flag is dropping information). The same block of code
> from PrintInlining:
>
>     @ 207   java.lang.invoke.LambdaForm$MH/1942422426::linkToCallSite (18 bytes)   inline (hot)
>       @ 1   java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)   inline (hot)
>         @ 4   java.lang.invoke.MutableCallSite::getTarget (5 bytes)   inline (hot)
>       @ 14   java.lang.invoke.LambdaForm$MH/1896635336::guard (80 bytes)   inline (hot)
>         @ 12   java.lang.Class::cast (27 bytes)   inline (hot)
>           @ 6   java.lang.Class::isInstance (0 bytes)   (intrinsic)
>         @ 17   java.lang.invoke.LambdaForm$BMH/1650319731::reinvoke (30 bytes)   inline (hot)
>           @ 13   java.lang.invoke.BoundMethodHandle$Species_LL::reinvokerTarget (8 bytes)   inline (hot)
>           @ 26   java.lang.invoke.LambdaForm$DMH/842171382::invokeStatic_LL_I (15 bytes)   inline (hot)
>             @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   inline (hot)
>             @ 11   org.jruby.runtime.invokedynamic.MathLinker::floatTest (20 bytes)   inline (hot)
>               @ 8   org.jruby.Ruby::isFloatReopened (5 bytes)   inline (hot)
>         @ 50   java.lang.invoke.LambdaForm$DMH/952682386::invokeSpecial_LLLL_L (20 bytes)   inline (hot)
>           @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   inline (hot)
>           @ 16   java.lang.invoke.LambdaForm$BMH/1698703785::reinvoke (32 bytes)   inline (hot)
>             @ 13   java.lang.invoke.BoundMethodHandle$Species_LD::reinvokerTarget (8 bytes)   inline (hot)
>             @ 28   java.lang.invoke.LambdaForm$DMH/590335041::invokeStatic_LLLD_L (20 bytes)   inline (hot)
>               @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   inline (hot)
>               @ 16   org.jruby.runtime.invokedynamic.MathLinker::float_op_plus (10 bytes)   inline (hot)
>                 @ 6   org.jruby.RubyFloat::op_plus (14 bytes)   inline (hot)
>                   @ 1   org.jruby.RubyBasicObject::getRuntime (8 bytes)   inline (hot)
>                     @ 1   org.jruby.RubyBasicObject::getMetaClass (5 bytes)   inline (hot)
>                     @ 4   org.jruby.RubyClass::getClassRuntime (5 bytes)   inline (hot)
>                   @ 10   org.jruby.RubyFloat::newFloat (10 bytes)   inline (hot)
>                     @ 6   org.jruby.RubyFloat::<init> (15 bytes)   inline (hot)
>                       @ 3   org.jruby.Ruby::getFloat (5 bytes)   inline (hot)
>                       @ 6   org.jruby.RubyNumeric::<init> (7 bytes)   inline (hot)
>                         @ 3   org.jruby.RubyObject::<init> (7 bytes)   inline (hot)
>                           @ 3   org.jruby.RubyBasicObject::<init> (30 bytes)   inline (hot)
>                             @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
>         @ 76   java.lang.invoke.LambdaForm$DMH/952682386::invokeSpecial_LLLL_L (20 bytes)   call site not reached
>
> So *almost* everything is inlining, but one path is not reached
> (after talking with Christian, I believe it's the failure path from
> GWT, i.e. guardWithTest). Because HotSpot's escape analysis (EA) can't
> do partial EA, any unfollowed paths that would receive the allocated
> object have to be considered escapes, and so anywhere we're doing
> guarded logic (either in indy or in Java code, like Fixnum overflow
> checks) the unfollowed paths prevent EA from happening. Boo-hoo.
>
> At this point there's nothing I can really do. I have to guard the
> call sites in case we don't see a Float at some point, and for Fixnum
> overflow I have to do that boolean check in most cases. There are
> always going to be unfollowed paths dangling off the edges of even our
> simplest logic.
>
> Bottom line is that the new indy stuff is starting to really look good
> wrt inlining, but EA is still not up to the task of eliding
> allocations in the places we need it to.
>
> Thoughts?
>
> - Charlie
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
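
For anyone who wants to poke at the shape of the problem described in
the quoted message without a JRuby build, here is a minimal plain-Java
sketch. It is not JRuby code: BoxedFloat, add, slowAdd, toDouble and run
are hypothetical names invented for illustration, and the step is 0.5
rather than 0.1 so the arithmetic stays exact. Each add allocates a
boxed result behind a type guard, and that result is handed to the next
add, whose rarely taken fallback also accepts it. Whether a given JVM
actually eliminates these allocations depends on its version and flags;
the sketch only reproduces the structure, not the measurements.

```java
public class GuardedFloatAdd {
    // Hypothetical stand-in for a boxed Ruby float (not org.jruby.RubyFloat).
    static final class BoxedFloat {
        final double value;
        BoxedFloat(double value) { this.value = value; }
    }

    // Coerce the operands the slow path is willing to accept.
    static double toDouble(Object o) {
        if (o instanceof BoxedFloat) return ((BoxedFloat) o).value;
        if (o instanceof Number) return ((Number) o).doubleValue();
        throw new IllegalArgumentException("not numeric: " + o);
    }

    // Fallback path: in the benchmark loop below this is never reached,
    // but it still takes the freshly allocated receiver as an argument.
    static Object slowAdd(Object self, Object other) {
        return new BoxedFloat(toDouble(self) + toDouble(other));
    }

    // Guarded add: a type test, a fast path, and a fallback, roughly the
    // test / target / fallback shape of a guarded call site.
    static Object add(Object self, Object other) {
        if (self instanceof BoxedFloat && other instanceof BoxedFloat) {
            return new BoxedFloat(((BoxedFloat) self).value
                                  + ((BoxedFloat) other).value);
        }
        return slowAdd(self, other);
    }

    // Loop in the spirit of the benchmark: every intermediate result is a
    // new BoxedFloat that flows straight into the next guarded add.
    static double run(int iterations) {
        Object plus = new BoxedFloat(0.5);
        Object minus = new BoxedFloat(-0.5);
        Object f = new BoxedFloat(20.5);
        for (int i = 0; i < iterations; i++) {
            f = add(f, plus);
            f = add(f, minus);
        }
        return toDouble(f);
    }

    public static void main(String[] args) {
        System.out.println(run(2000000)); // prints 20.5
    }
}
```

Running it with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining should
show whether add and slowAdd get inlined into run on your JVM, which
gives a starting point for comparing against the traces above.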

