Getting back into indy, binding straight through
John Rose
john.r.rose at oracle.com
Wed Jul 28 16:07:51 PDT 2010
Nice results. Thanks for pushing it through.
Can we figure out what FilterGeneric$F3.invoke_V0 is doing there?
It is the combinator (f,g)=>(x,y,z)=>g(f(x),y,z).
(See near line 564 of http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/dyn/FilterGeneric.java )
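
For readers who want to see that combinator concretely, here is a minimal sketch using the later java.lang.invoke API (Java 7 and up) rather than the sun.dyn classes discussed in this thread. The class name FilterSketch and the toy f and g are hypothetical; only MethodHandles.filterArguments is the real library call, and treating it as the counterpart of FilterGeneric is an assumption.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class FilterSketch {
    // g(x, y, z): hypothetical three-argument target; it just joins its arguments.
    static String g(String x, String y, String z) {
        return x + "," + y + "," + z;
    }

    // f(x): hypothetical filter applied to the first argument only.
    static String f(String x) {
        return x.toUpperCase();
    }

    // Builds the handle (x, y, z) -> g(f(x), y, z).
    static MethodHandle filtered() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle gh = lookup.findStatic(FilterSketch.class, "g",
                MethodType.methodType(String.class, String.class, String.class, String.class));
        MethodHandle fh = lookup.findStatic(FilterSketch.class, "f",
                MethodType.methodType(String.class, String.class));
        // Position 0 selects the first argument of g as the one to filter through f.
        return MethodHandles.filterArguments(gh, 0, fh);
    }

    // Invokes the combined handle; wraps checked Throwable for convenience.
    static String demo() {
        try {
            return (String) filtered().invokeExact("a", "b", "c");
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Only the first argument passes through f; y and z reach g untouched, which is why the combinator shows up as one extra frame between the guard and the target.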
The g is probably yours. The f may be (x:Object)->((boolean)(Boolean)x), as in GuardWithTest.make which uses convertArguments to make sure the predicate produces a boolean.
(See near line 937 of http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/dyn/MethodHandleImpl.java )
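
A rough sketch of that predicate conversion, again using the later java.lang.invoke API: MethodHandle.asType stands in for the convertArguments call mentioned above, which is an assumption about equivalence and not a reading of the sun.dyn source. PredicateSketch and isString are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class PredicateSketch {
    // Hypothetical test whose declared return type is Object (it returns a Boolean).
    static Object isString(Object x) {
        return Boolean.valueOf(x instanceof String);
    }

    // Adapts (Object)Object to (Object)boolean.
    static MethodHandle booleanTest() throws ReflectiveOperationException {
        MethodHandle test = MethodHandles.lookup().findStatic(PredicateSketch.class, "isString",
                MethodType.methodType(Object.class, Object.class));
        // asType adds the cast-and-unbox step on the return value,
        // i.e. the (boolean)(Boolean)x conversion described above.
        return test.asType(MethodType.methodType(boolean.class, Object.class));
    }

    // Invokes the adapted predicate; wraps checked Throwable for convenience.
    static boolean check(Object x) {
        try {
            return (boolean) booleanTest().invokeExact(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(check("hi"));
        System.out.println(check(42));
    }
}
```

If the predicate already returned a primitive boolean, no such adapter would be needed, which is one plausible way to make the extra frame go away.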
I'd like to get rid of the F3.invoke_V0 frame...
-- John
On Jul 27, 2010, at 3:53 PM, Charles Oliver Nutter wrote:
> Here's the real trace...
>
> at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
> at sun.dyn.FilterGeneric$F3.invoke_V0(FilterGeneric.java:565)
> at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830)
> at bench.bench_fib_recursive.method__0$RUBY$fib_ruby(bench_fib_recursive.rb:7)
>
> The method handle graph here works out like this:
>
> * guard on the type serial number
> * fast path is the direct handle to the target method, seen above
> * slow path is the old inline-caching logic that invokes against our
> pseudo-handles
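
The three-part graph in the list above can be sketched with MethodHandles.guardWithTest from the later java.lang.invoke API. Everything named here (GuardSketch, Obj, FIXNUM_SERIAL, the string results) is a hypothetical stand-in, not JRuby code, and the serial is a plain final field here, not the volatile read mentioned later in this message.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class GuardSketch {
    // Hypothetical stand-in for a Ruby object; serial identifies its class.
    static final class Obj {
        final int serial;
        final long value;
        Obj(int serial, long value) { this.serial = serial; this.value = value; }
    }

    static final int FIXNUM_SERIAL = 1; // hypothetical serial number

    // Guard: does the receiver's serial match the one seen at link time?
    static boolean serialMatches(int expected, Obj self, Obj arg) {
        return self.serial == expected;
    }

    // Fast path: stands in for the direct handle to the target method.
    static String fastPlus(Obj self, Obj arg) {
        return "fast:" + (self.value + arg.value);
    }

    // Slow path: stands in for the old inline-caching logic.
    static String slowCall(Obj self, Obj arg) {
        return "slow";
    }

    static MethodHandle build() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType binary = MethodType.methodType(String.class, Obj.class, Obj.class);
        MethodHandle test = lookup.findStatic(GuardSketch.class, "serialMatches",
                MethodType.methodType(boolean.class, int.class, Obj.class, Obj.class));
        // Bind the expected serial so the test takes the same arguments as the paths.
        test = MethodHandles.insertArguments(test, 0, FIXNUM_SERIAL);
        MethodHandle fast = lookup.findStatic(GuardSketch.class, "fastPlus", binary);
        MethodHandle slow = lookup.findStatic(GuardSketch.class, "slowCall", binary);
        return MethodHandles.guardWithTest(test, fast, slow);
    }

    // Calls through the guarded handle with a receiver of the given serial.
    static String dispatch(int serial, long a, long b) {
        try {
            return (String) build().invokeExact(new Obj(serial, a), new Obj(FIXNUM_SERIAL, b));
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(1, 2, 3));
        System.out.println(dispatch(2, 2, 3));
    }
}
```

A matching serial takes the fast path; any other receiver falls through to the slow path.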
>
> Some numbers... In this comparison the indy code is only optimizing
> the <, +, and - methods to direct paths.
>
> In the first case, there's no invokedynamic and we dispatch through a
> separate piece of code that's specific to the math operator and
> Fixnum, that looks like this:
>
> public IRubyObject call(ThreadContext context, IRubyObject caller,
>         IRubyObject self, long fixnum) {
>     if (self instanceof RubyFixnum) {
>         return ((RubyFixnum) self).op_plus(context, fixnum);
>     }
>     return super.call(context, caller, self, fixnum);
> }
>
> And cases that return an IRubyObject (like the call to fib itself)
> dispatch through an object version that just does a normal monomorphic
> cache.
>
> In the second case, we're using an object Fixnum in every case
> (instead of a long for literal cases like above), and dispatching all
> three math operators through indy. In this case, there are no
> functional differences between the two call paths...for example, the
> actual pseudo-handle for + looks like this:
>
> public org.jruby.runtime.builtin.IRubyObject
>     call(org.jruby.runtime.ThreadContext,
>     org.jruby.runtime.builtin.IRubyObject, org.jruby.RubyModule,
>     java.lang.String, org.jruby.runtime.builtin.IRubyObject);
>   Code:
>      0: aload_2
>      1: checkcast #13 // class org/jruby/RubyFixnum
>      4: aload_1
>      5: aload 5
>      7: invokevirtual #17 // Method org/jruby/RubyFixnum.op_plus:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
>     10: areturn
> }
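
The listing above loads self, casts it to RubyFixnum, then passes the context and the argument to op_plus, ignoring the module and name parameters. As a rough illustration, the same shuffle can be expressed as a method handle chain in the later java.lang.invoke API. The types Context, Value and Fixnum below are hypothetical stand-ins for ThreadContext, IRubyObject and RubyFixnum, and the module parameter is typed as Object for brevity; none of this is JRuby's actual binding code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class BindSketch {
    static final class Context {}          // stand-in for ThreadContext
    interface Value {}                     // stand-in for IRubyObject
    static final class Fixnum implements Value {   // stand-in for RubyFixnum
        final long value;
        Fixnum(long value) { this.value = value; }
        Value opPlus(Context context, Value other) {
            return new Fixnum(value + ((Fixnum) other).value);
        }
    }

    // Adapts Fixnum.opPlus to the call-site shape (context, self, module, name, arg).
    static MethodHandle bind() throws ReflectiveOperationException {
        // Direct handle, type (Fixnum, Context, Value)Value.
        MethodHandle target = MethodHandles.lookup().findVirtual(Fixnum.class, "opPlus",
                MethodType.methodType(Value.class, Context.class, Value.class));
        // Accept the receiver as Value; asType adds the cast (the checkcast above).
        target = target.asType(
                MethodType.methodType(Value.class, Value.class, Context.class, Value.class));
        MethodType site = MethodType.methodType(Value.class,
                Context.class, Value.class, Object.class, String.class, Value.class);
        // Target args come from site positions 1 (self), 0 (context), 4 (arg);
        // positions 2 (module) and 3 (name) are dropped.
        return MethodHandles.permuteArguments(target, site, 1, 0, 4);
    }

    // Calls 1 + 2 through the adapted handle and returns the raw long result.
    static long demo() {
        try {
            Value result = (Value) bind().invokeExact(new Context(), (Value) new Fixnum(1),
                    (Object) null, "+", (Value) new Fixnum(2));
            return ((Fixnum) result).value;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

No generated wrapper class is involved in this sketch; the cast and the argument reordering live entirely in the handle chain.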
>
> Now, the numbers:
>
> Stock JRuby with long call paths and manually-specialized
> Fixnum#<math> call sites:
>
> ~/projects/jruby ➔ jruby --server -J-XX:MaxInlineSize=150
> -J-XX:InlineSmallCode=1500 bench/bench_fib_recursive.rb 10
> 832040
> 0.409000 0.000000 0.409000 ( 0.353000)
> 832040
> 0.217000 0.000000 0.217000 ( 0.216000)
> 832040
> 0.217000 0.000000 0.217000 ( 0.217000)
> 832040
> 0.217000 0.000000 0.217000 ( 0.217000)
> 832040
> 0.217000 0.000000 0.217000 ( 0.217000)
> 832040
> 0.217000 0.000000 0.217000 ( 0.217000)
> 832040
> 0.217000 0.000000 0.217000 ( 0.217000)
> 832040
> 0.217000 0.000000 0.217000 ( 0.217000)
> 832040
> 0.217000 0.000000 0.217000 ( 0.217000)
> 832040
> 0.217000 0.000000 0.217000 ( 0.217000)
>
> Invokedynamic with fast path as a volatile int read + compare and direct call:
> ~/projects/jruby ➔ jruby --server -J-XX:+UnlockExperimentalVMOptions
> -J-XX:+EnableInvokeDynamic -J-Djruby.compile.invokedynamic=true
> -J-XX:MaxInlineSize=150 -J-XX:InlineSmallCode=1500
> bench/bench_fib_recursive.rb 100
> 832040
> 0.417000 0.000000 0.417000 ( 0.361000)
> 832040
> 0.166000 0.000000 0.166000 ( 0.166000)
> 832040
> 0.164000 0.000000 0.164000 ( 0.164000)
> 832040
> 0.164000 0.000000 0.164000 ( 0.164000)
> 832040
> 0.164000 0.000000 0.164000 ( 0.164000)
> 832040
> 0.164000 0.000000 0.164000 ( 0.164000)
> 832040
> 0.164000 0.000000 0.164000 ( 0.164000)
> 832040
> 0.164000 0.000000 0.164000 ( 0.164000)
> 832040
> 0.164000 0.000000 0.164000 ( 0.163000)
> 832040
> 0.180000 0.000000 0.180000 ( 0.180000)
>
> This is a much more impressive boost over the non-indy logic than
> before (roughly 0.164s versus 0.217s per iteration at steady state),
> when the fast path was still dispatched through our pseudo-handles.
> I guess the gain is due to getting those extra frames out of the call
> path:
>
> (old non-direct, via-pseudo-handle indy logic)
> ~/projects/jruby ➔ jruby --server -J-XX:+UnlockExperimentalVMOptions
> -J-XX:+EnableInvokeDynamic -J-Djruby.compile.invokedynamic=true
> -J-XX:MaxInlineSize=150 -J-XX:InlineSmallCode=1500
> bench/bench_fib_recursive.rb 10
> 832040
> 0.438000 0.000000 0.438000 ( 0.382000)
> 832040
> 0.199000 0.000000 0.199000 ( 0.200000)
> 832040
> 0.206000 0.000000 0.206000 ( 0.205000)
> 832040
> 0.196000 0.000000 0.196000 ( 0.196000)
> 832040
> 0.198000 0.000000 0.198000 ( 0.198000)
> 832040
> 0.196000 0.000000 0.196000 ( 0.196000)
> 832040
> 0.195000 0.000000 0.195000 ( 0.195000)
> 832040
> 0.196000 0.000000 0.196000 ( 0.196000)
> 832040
> 0.196000 0.000000 0.196000 ( 0.196000)
> 832040
> 0.214000 0.000000 0.214000 ( 0.214000)
>
> Note that this is still using the old mechanism for the calls to fib
> itself, and this is not encoding primitive indy calls where literals
> are being passed, both of which will improve performance further.
>
> Note also this is still a March build of MLVM...so I'm guessing other
> things have happened at the VM level that will improve it even more.
>
> I'm pleased with this new result!
>
> - Charlie
>
> On Tue, Jul 27, 2010 at 1:50 PM, Charles Oliver Nutter
> <headius at headius.com> wrote:
>> I'm slowly getting back into indy stuff :) I'm still running off a
>> build from March, though, since ASM doesn't support the latest
>> changes.
>>
>> Anyway, I mentioned at JVMLS that I thought I could get indy to patch
>> through to the actual target method in my existing indy stuff. I said
>> I could do it by today, but I was delayed...I have done it now :)
>>
>> I've only got it wired up for one arity case, but here's what it looks
>> like (with some of the handles still in there...these should disappear
>> as they're supported by the inlining, I presume):
>>
>> Old backtrace for def foo; 1 + 1; end
>>
>> at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
>> at org.jruby.RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus.call(org/jruby/RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus.gen:65535)
>> at sun.dyn.FilterGeneric$F7.invoke_F7(FilterGeneric.java:844)
>> at sun.dyn.FilterGeneric$F6.invoke_F6(FilterGeneric.java:758)
>> at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830)
>> at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
>>
>> Because the current indy stuff binds to our DynamicMethod subclass
>> (RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus), we have at least one
>> extra bounce and a lot more argument juggling because the
>> DynamicMethod.call paths are complicated.
>>
>> With the modified version, the fast path binds straight through to the
>> actual target method with no intermediate wrapper:
>>
>> at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
>> at sun.dyn.FilterGeneric$F3.invoke_V0(FilterGeneric.java:565)
>> (at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830))
>> at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
>>
>> The GuardWithTest is not yet in my toy code, but I inserted it where
>> it would be. You can see that once the handles fold away, there's no
>> intermediate code between the caller and the callee.
>>
>> The interesting thing to me here is that since I know the actual
>> target method in these cases, I can decorate the handle chain with the
>> wrapper logic normally contained in the DynamicMethod subclass, which
>> means with indy we *don't have to generate our intermediate
>> pseudo-handles at all*. That's a tremendous win, for a few reasons: 1.
>> that logic will no longer count against our inlining budgets (at least
>> one stack frame and probably a good dozen+ bytecodes); and 2. I've
>> wrangled raw ASM in the pseudo-handle generation logic way too many
>> times to want to continue doing it :)
>>
>> Of course it also means we don't have the memory/size costs of
>> generating those classes ourselves.
>>
>> I'm sure I can do this same thing for field/instance variable
>> accesses, Ruby-to-Java calls, and more, and actually do iterative
>> optimizations without an interpreter or tiered compilation. That's
>> pretty cool.
>>
>> - Charlie
>>