Getting back into indy, binding straight through

Wed Jul 28 16:07:51 PDT 2010

Nice results.  Thanks for pushing it through.

Can we figure out what FilterGeneric$F3.invoke_V0 is doing there?

It is the combinator (f,g)=>(x,y,z)=>g(f(x),y,z).

(See near line 564 of http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/dyn/FilterGeneric.java )
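
For anyone who wants to see that shape outside the sun.dyn sources, here is a minimal sketch using the later public java.lang.invoke API (an assumption on my part; the March build in question still uses the java.dyn/sun.dyn names). The class FilterSketch and the methods f and g are hypothetical and exist only for illustration. MethodHandles.filterArguments applies f to the first argument and passes the other two through untouched.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class FilterSketch {
    // g: hypothetical three-argument target.
    static String g(String x, String y, String z) { return x + y + z; }

    // f: hypothetical filter applied to the first argument only.
    static String f(String x) { return x.toUpperCase(); }

    // Builds (x,y,z) => g(f(x),y,z).
    static MethodHandle build() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle g = lookup.findStatic(FilterSketch.class, "g",
                MethodType.methodType(String.class, String.class, String.class, String.class));
        MethodHandle f = lookup.findStatic(FilterSketch.class, "f",
                MethodType.methodType(String.class, String.class));
        // Position 0 means f filters the first argument of g.
        return MethodHandles.filterArguments(g, 0, f);
    }

    // Convenience wrapper so callers do not have to deal with Throwable.
    static String demo() {
        try {
            return (String) build().invokeExact("a", "b", "c");
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Abc"
    }
}
```

Until the JIT folds the adapter away, each such filter can show up as its own frame in a backtrace, which is what the F3.invoke_V0 line appears to be.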

The g is probably yours.  The f may be (x:Object)->((boolean)(Boolean)x), as in GuardWithTest.make which uses convertArguments to make sure the predicate produces a boolean.

(See near line 937 of http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/dyn/MethodHandleImpl.java )
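
To make the (boolean)(Boolean)x conversion concrete, here is a small sketch, again against the later java.lang.invoke API, where MethodHandle.asType plays the role that convertArguments plays in the sun.dyn code (that substitution is my assumption, not what the linked source does). The class BooleanPredicateSketch and the predicate isPositive are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class BooleanPredicateSketch {
    // Hypothetical predicate typed against Object, the way a dynamic
    // language runtime might hand it over.
    static Object isPositive(Object x) {
        return Boolean.valueOf(((Integer) x) > 0);
    }

    // Adapts (Object)Object to (Object)boolean. asType inserts a cast to
    // Boolean followed by unboxing on the return value.
    static MethodHandle asBooleanTest() throws ReflectiveOperationException {
        MethodHandle raw = MethodHandles.lookup().findStatic(
                BooleanPredicateSketch.class, "isPositive",
                MethodType.methodType(Object.class, Object.class));
        return raw.asType(MethodType.methodType(boolean.class, Object.class));
    }

    static boolean check(Object x) {
        try {
            return (boolean) asBooleanTest().invokeExact(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    static String demo() {
        return check(5) + "," + check(-5);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true,false"
    }
}
```

If the test handle already returns boolean, no such adapter should be needed.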

I'd like to get rid of the F3.invoke_V0 frame...

-- John

On Jul 27, 2010, at 3:53 PM, Charles Oliver Nutter wrote:

> Here's the real trace...
> 
> 	at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
> 	at sun.dyn.FilterGeneric$F3.invoke_V0(FilterGeneric.java:565)
> 	at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830)
> 	at bench.bench_fib_recursive.method__0$RUBY$fib_ruby(bench_fib_recursive.rb:7)
> 
> The method handle graph here works out like this:
> 
> * guard on the type serial number
> * fast path is the direct handle to the target method, seen above
> * slow path is the old inline-caching logic that invokes against our
> pseudo-handles
> 
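
For reference, the guard/fast/slow shape listed above can be sketched roughly as follows. This uses the later java.lang.invoke API rather than the March MLVM build, and every name in it (GuardSketch, Meta, Obj, serialMatches, fastPlus, slowPlus) is a hypothetical stand-in, not JRuby code. The slow path here only returns a labelled result; the real one would go through the inline-caching logic.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class GuardSketch {
    // Stand-in for a class whose serial number changes when it is modified.
    static final class Meta { volatile int serial = 1; }

    static final class Obj {
        final Meta meta;
        final long value;
        Obj(Meta meta, long value) { this.meta = meta; this.value = value; }
    }

    // Guard: volatile int read plus compare against the cached serial.
    static boolean serialMatches(int expected, Obj self) {
        return self.meta.serial == expected;
    }

    // Fast path: stands in for the direct handle to the target method.
    static String fastPlus(Obj self, long other) { return "fast:" + (self.value + other); }

    // Slow path: stands in for the old inline-caching logic.
    static String slowPlus(Obj self, long other) { return "slow:" + (self.value + other); }

    static MethodHandle link(int cachedSerial) throws ReflectiveOperationException {
        MethodHandles.Lookup l = MethodHandles.lookup();
        MethodType op = MethodType.methodType(String.class, Obj.class, long.class);
        MethodHandle rawTest = l.findStatic(GuardSketch.class, "serialMatches",
                MethodType.methodType(boolean.class, int.class, Obj.class));
        // Bind the cached serial; the test then only looks at the receiver.
        MethodHandle test = MethodHandles.insertArguments(rawTest, 0, cachedSerial);
        return MethodHandles.guardWithTest(test,
                l.findStatic(GuardSketch.class, "fastPlus", op),
                l.findStatic(GuardSketch.class, "slowPlus", op));
    }

    static String demo() {
        try {
            Meta meta = new Meta();
            Obj one = new Obj(meta, 1);
            MethodHandle site = link(meta.serial);
            String before = (String) site.invokeExact(one, 1L);
            meta.serial++; // simulate the class being modified
            String after = (String) site.invokeExact(one, 1L);
            return before + "," + after;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "fast:2,slow:2"
    }
}
```
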
> Some numbers... In this comparison the indy stuff is only optimizing
> the < + - methods to direct paths.
> 
> In the first case, there's no invokedynamic and we dispatch through a
> separate piece of code that's specific to the math operator and
> Fixnum, that looks like this:
> 
>    public IRubyObject call(ThreadContext context, IRubyObject caller,
> IRubyObject self, long fixnum) {
>        if (self instanceof RubyFixnum) {
>            return ((RubyFixnum) self).op_plus(context, fixnum);
>        }
>        return super.call(context, caller, self, fixnum);
>    }
> 
> And cases that return an IRubyObject (like the call to fib itself)
> dispatch through an object version that just does a normal monomorphic
> cache.
> 
> In the second case, we're using an object Fixnum in every case
> (instead of a long for literal cases like above), and dispatching all
> three math operators through indy. In this case, there are no
> functional differences between the two call paths...for example, the
> actual pseudo-handle for + looks like this:
> 
>  public org.jruby.runtime.builtin.IRubyObject
> call(org.jruby.runtime.ThreadContext,
> org.jruby.runtime.builtin.IRubyObject, org.jruby.RubyModule,
> java.lang.String, org.jruby.runtime.builtin.IRubyObject);
>    Code:
>       0: aload_2
>       1: checkcast     #13                 // class org/jruby/RubyFixnum
>       4: aload_1
>       5: aload         5
>       7: invokevirtual #17                 // Method
> org/jruby/RubyFixnum.op_plus:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
>      10: areturn
> }
> 
> Now, the numbers:
> 
> Stock JRuby with long call paths and manually-specialized
> Fixnum#<math> call sites:
> 
> ~/projects/jruby ➔ jruby --server -J-XX:MaxInlineSize=150
> -J-XX:InlineSmallCode=1500 bench/bench_fib_recursive.rb 10
> 832040
>  0.409000   0.000000   0.409000 (  0.353000)
> 832040
>  0.217000   0.000000   0.217000 (  0.216000)
> 832040
>  0.217000   0.000000   0.217000 (  0.217000)
> 832040
>  0.217000   0.000000   0.217000 (  0.217000)
> 832040
>  0.217000   0.000000   0.217000 (  0.217000)
> 832040
>  0.217000   0.000000   0.217000 (  0.217000)
> 832040
>  0.217000   0.000000   0.217000 (  0.217000)
> 832040
>  0.217000   0.000000   0.217000 (  0.217000)
> 832040
>  0.217000   0.000000   0.217000 (  0.217000)
> 832040
>  0.217000   0.000000   0.217000 (  0.217000)
> 
> Invokedynamic with fast path as a volatile int read + compare and direct call:
> ~/projects/jruby ➔ jruby --server -J-XX:+UnlockExperimentalVMOptions
> -J-XX:+EnableInvokeDynamic -J-Djruby.compile.invokedynamic=true
> -J-XX:MaxInlineSize=150 -J-XX:InlineSmallCode=1500
> bench/bench_fib_recursive.rb 100
> 832040
>  0.417000   0.000000   0.417000 (  0.361000)
> 832040
>  0.166000   0.000000   0.166000 (  0.166000)
> 832040
>  0.164000   0.000000   0.164000 (  0.164000)
> 832040
>  0.164000   0.000000   0.164000 (  0.164000)
> 832040
>  0.164000   0.000000   0.164000 (  0.164000)
> 832040
>  0.164000   0.000000   0.164000 (  0.164000)
> 832040
>  0.164000   0.000000   0.164000 (  0.164000)
> 832040
>  0.164000   0.000000   0.164000 (  0.164000)
> 832040
>  0.164000   0.000000   0.164000 (  0.163000)
> 832040
>  0.180000   0.000000   0.180000 (  0.180000)
> 
> This is a much more impressive boost over the non-indy logic than
> previously (fast path still dispatched through our pseudo-handles),
> which I guess is due to getting those extra frames out of the call
> path:
> 
> (old non-direct, via-pseudo-handle indy logic)
> ~/projects/jruby ➔ jruby --server -J-XX:+UnlockExperimentalVMOptions
> -J-XX:+EnableInvokeDynamic -J-Djruby.compile.invokedynamic=true
> -J-XX:MaxInlineSize=150 -J-XX:InlineSmallCode=1500
> bench/bench_fib_recursive.rb 10
> 832040
>  0.438000   0.000000   0.438000 (  0.382000)
> 832040
>  0.199000   0.000000   0.199000 (  0.200000)
> 832040
>  0.206000   0.000000   0.206000 (  0.205000)
> 832040
>  0.196000   0.000000   0.196000 (  0.196000)
> 832040
>  0.198000   0.000000   0.198000 (  0.198000)
> 832040
>  0.196000   0.000000   0.196000 (  0.196000)
> 832040
>  0.195000   0.000000   0.195000 (  0.195000)
> 832040
>  0.196000   0.000000   0.196000 (  0.196000)
> 832040
>  0.196000   0.000000   0.196000 (  0.196000)
> 832040
>  0.214000   0.000000   0.214000 (  0.214000)
> 
> Note that this is still using the old mechanism for the calls to fib
> itself, and this is not encoding primitive indy calls where literals
> are being passed, both of which will improve performance further.
> 
> Note also this is still a March build of MLVM...so I'm guessing other
> things have happened at the VM level that will improve it even more.
> 
> I'm pleased with this new result!
> 
> - Charlie
> 
> On Tue, Jul 27, 2010 at 1:50 PM, Charles Oliver Nutter
> <headius at headius.com> wrote:
>> I'm slowly getting back into indy stuff :) I'm still running off a
>> build from March, though, since ASM doesn't support the latest
>> changes.
>> 
>> Anyway, I mentioned at JVMLS that I thought I could get indy to patch
>> through to the actual target method in my existing indy stuff. I said
>> I could do it by today, but I was delayed...I have done it now :)
>> 
>> I've only got it wired up for one arity case, but here's what it looks
>> like (with some of the handles still in there...these should disappear
>> as they're supported by the inlining, I presume):
>> 
>> Old backtrace for def foo; 1 + 1; end
>> 
>>        at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
>>        at org.jruby.RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus.call(org/jruby/RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus.gen:65535)
>>        at sun.dyn.FilterGeneric$F7.invoke_F7(FilterGeneric.java:844)
>>        at sun.dyn.FilterGeneric$F6.invoke_F6(FilterGeneric.java:758)
>>        at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830)
>>        at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
>> 
>> Because the current indy stuff binds to our DynamicMethod subclass
>> (RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus), we have at least one
>> extra bounce and a lot more argument juggling because the
>> DynamicMethod.call paths are complicated.
>> 
>> With the modified version, the fast path binds straight through to the
>> actual target method with no intermediate wrapper:
>> 
>>        at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
>>        at sun.dyn.FilterGeneric$F3.invoke_V0(FilterGeneric.java:565)
>>        (at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830))
>>        at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
>> 
>> The GuardWithTest is not yet in my toy code, but I inserted it where
>> it would be. You can see that once the handles fold away, there's no
>> intermediate code between the caller and the callee.
>> 
>> The interesting thing to me here is that since I know the actual
>> target method in these cases, I can decorate the handle chain with the
>> wrapper logic normally contained in the DynamicMethod subclass, which
>> means with indy we *don't have to generate our intermediate
>> pseudo-handles at all*. That's a tremendous win, for a few reasons: 1.
>> that logic will no longer count against our inlining budgets (at least
>> one stack frame and probably a good dozen+ bytecodes); and 2. I've
>> wrangled raw ASM in the pseudo-handle generation logic way too many
>> times to want to continue doing it :)
>> 
>> Of course it also means we don't have the memory/size costs of
>> generating those classes ourselves.
>> 
>> I'm sure I can do this same thing for field/instance variable
>> accesses, Ruby-to-Java calls, and more, and actually do iterative
>> optimizations without an interpreter or tiered compilation. That's
>> pretty cool.
>> 
>> - Charlie
>> 