Getting back into indy, binding straight through

Wed Jul 28 17:13:47 PDT 2010

Could it be my casting operation?

            // prepare a handle to do the cast
            MethodHandle cast = MethodHandles.lookup().findVirtual(
                    Class.class, "cast",
                    MethodType.make(Object.class, Object.class));
            cast = MethodHandles.convertArguments(cast,
                    MethodType.make(nativeCall.getNativeTarget(),
                            Class.class, IRubyObject.class));
            cast = MethodHandles.insertArguments(cast, 0,
                    nativeCall.getNativeTarget());

            // get the handle to the actual method
            MethodHandle directTarget = MethodHandles.lookup().findVirtual(
                    nativeCall.getNativeTarget(),
                    nativeCall.getNativeName(), nativeMethodType);

            // filter the receiver to cast to exact
            MethodHandle[] filters =
                    new MethodHandle[nativeCall.getNativeSignature().length + 1];
            filters[0] = cast;
            directTarget = MethodHandles.filterArguments(directTarget, filters);
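
For comparison, here is a minimal, self-contained sketch of the same receiver-cast filter written against the final java.lang.invoke API rather than the March MLVM build used above. Value and Fixnum are hypothetical stand-ins for IRubyObject and RubyFixnum, and the sketch binds the class before adapting the type (the snippet above does it the other way around); it only illustrates the shape of the handle chain, not JRuby's actual binding code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class CastFilterSketch {
    // Hypothetical stand-in for IRubyObject.
    public interface Value {}

    // Hypothetical stand-in for RubyFixnum.
    public static final class Fixnum implements Value {
        private final long value;
        public Fixnum(long value) { this.value = value; }
        public Value plus(Value other) {
            return new Fixnum(value + ((Fixnum) other).value);
        }
        public long longValue() { return value; }
    }

    // Builds a (Value, Value)Value handle around Fixnum.plus whose
    // first argument is filtered through a cast to the receiver type.
    public static MethodHandle directPlus() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();

            // Class.cast has type (Class, Object)Object.
            MethodHandle cast = lookup.findVirtual(Class.class, "cast",
                    MethodType.methodType(Object.class, Object.class));
            // Bind the target class, leaving (Object)Object ...
            cast = cast.bindTo(Fixnum.class);
            // ... then adapt the type to (Value)Fixnum.
            cast = cast.asType(
                    MethodType.methodType(Fixnum.class, Value.class));

            // Handle to the actual method: (Fixnum, Value)Value.
            MethodHandle target = lookup.findVirtual(Fixnum.class, "plus",
                    MethodType.methodType(Value.class, Value.class));

            // Filter only argument 0 (the receiver) through the cast.
            return MethodHandles.filterArguments(target, 0, cast);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience wrapper so callers need not handle Throwable.
    public static long add(long a, long b) {
        try {
            Value result = (Value) directPlus().invokeExact(
                    (Value) new Fixnum(a), (Value) new Fixnum(b));
            return ((Fixnum) result).longValue();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(add(1, 1));
    }
}
```

In the final API, target.asType(...) with a (Value, Value)Value type would apply the same reference cast to the receiver without an explicit filter handle; whether that also avoids the extra adapter frame on a given VM build is something to measure rather than assume.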

- Charlie

On Wed, Jul 28, 2010 at 4:07 PM, John Rose <john.r.rose at oracle.com> wrote:
> Nice results.  Thanks for pushing it through.
>
> Can we figure out what FilterGeneric$F3.invoke_V0 is doing there?
>
> It is the combinator (f,g)=>(x,y,z)=>g(f(x),y,z).
>
> (See near line 564 of http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/dyn/FilterGeneric.java )
>
> The g is probably yours.  The f may be (x:Object)->((boolean)(Boolean)x), as in GuardWithTest.make which uses convertArguments to make sure the predicate produces a boolean.
>
> (See near line 937 of http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/tip/src/share/classes/sun/dyn/MethodHandleImpl.java )
>
> I'd like to get rid of the F3.invoke_V0 frame...
>
> -- John
>
> On Jul 27, 2010, at 3:53 PM, Charles Oliver Nutter wrote:
>
>> Here's the real trace...
>>
>>       at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
>>       at sun.dyn.FilterGeneric$F3.invoke_V0(FilterGeneric.java:565)
>>       at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830)
>>       at bench.bench_fib_recursive.method__0$RUBY$fib_ruby(bench_fib_recursive.rb:7)
>>
>> The method handle graph here works out like this:
>>
>> * guard on the type serial number
>> * fast path is the direct handle to the target method, seen above
>> * slow path is the old inline-caching logic that invokes against our
>> pseudo-handles
>>
>> Some numbers... In this comparison the indy stuff is only optimizing
>> the < + - methods to direct paths.
>>
>> In the first case, there's no invokedynamic and we dispatch through a
>> separate piece of code that's specific to the math operator and
>> Fixnum, that looks like this:
>>
>>    public IRubyObject call(ThreadContext context, IRubyObject caller,
>> IRubyObject self, long fixnum) {
>>        if (self instanceof RubyFixnum) {
>>            return ((RubyFixnum) self).op_plus(context, fixnum);
>>        }
>>        return super.call(context, caller, self, fixnum);
>>    }
>>
>> And cases that return an IRubyObject (like the call to fib itself)
>> dispatch through an object version that just does a normal monomorphic
>> cache.
>>
>> In the second case, we're using an object Fixnum in every case
>> (instead of a long for literal cases like above), and dispatching all
>> three math operators through indy. In this case, there are no
>> functional differences between the two call paths...for example, the
>> actual pseudo-handle for + looks like this:
>>
>>  public org.jruby.runtime.builtin.IRubyObject
>> call(org.jruby.runtime.ThreadContext,
>> org.jruby.runtime.builtin.IRubyObject, org.jruby.RubyModule,
>> java.lang.String, org.jruby.runtime.builtin.IRubyObject);
>>    Code:
>>       0: aload_2
>>       1: checkcast     #13                 // class org/jruby/RubyFixnum
>>       4: aload_1
>>       5: aload         5
>>       7: invokevirtual #17                 // Method
>> org/jruby/RubyFixnum.op_plus:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
>>      10: areturn
>> }
>>
>> Now, the numbers:
>>
>> Stock JRuby with long call paths and manually-specialized
>> Fixnum#<math> call sites:
>>
>> ~/projects/jruby ➔ jruby --server -J-XX:MaxInlineSize=150
>> -J-XX:InlineSmallCode=1500 bench/bench_fib_recursive.rb 10
>> 832040
>>  0.409000   0.000000   0.409000 (  0.353000)
>> 832040
>>  0.217000   0.000000   0.217000 (  0.216000)
>> 832040
>>  0.217000   0.000000   0.217000 (  0.217000)
>> 832040
>>  0.217000   0.000000   0.217000 (  0.217000)
>> 832040
>>  0.217000   0.000000   0.217000 (  0.217000)
>> 832040
>>  0.217000   0.000000   0.217000 (  0.217000)
>> 832040
>>  0.217000   0.000000   0.217000 (  0.217000)
>> 832040
>>  0.217000   0.000000   0.217000 (  0.217000)
>> 832040
>>  0.217000   0.000000   0.217000 (  0.217000)
>> 832040
>>  0.217000   0.000000   0.217000 (  0.217000)
>>
>> Invokedynamic with fast path as a volatile int read + compare and direct call:
>> ~/projects/jruby ➔ jruby --server -J-XX:+UnlockExperimentalVMOptions
>> -J-XX:+EnableInvokeDynamic -J-Djruby.compile.invokedynamic=true
>> -J-XX:MaxInlineSize=150 -J-XX:InlineSmallCode=1500
>> bench/bench_fib_recursive.rb 100
>> 832040
>>  0.417000   0.000000   0.417000 (  0.361000)
>> 832040
>>  0.166000   0.000000   0.166000 (  0.166000)
>> 832040
>>  0.164000   0.000000   0.164000 (  0.164000)
>> 832040
>>  0.164000   0.000000   0.164000 (  0.164000)
>> 832040
>>  0.164000   0.000000   0.164000 (  0.164000)
>> 832040
>>  0.164000   0.000000   0.164000 (  0.164000)
>> 832040
>>  0.164000   0.000000   0.164000 (  0.164000)
>> 832040
>>  0.164000   0.000000   0.164000 (  0.164000)
>> 832040
>>  0.164000   0.000000   0.164000 (  0.163000)
>> 832040
>>  0.180000   0.000000   0.180000 (  0.180000)
>>
>> This is a much more impressive boost over the non-indy logic than
>> before (when the fast path was still dispatched through our
>> pseudo-handles), which I guess is due to getting those extra frames
>> out of the call path:
>>
>> (old non-direct, via-pseudo-handle indy logic)
>> ~/projects/jruby ➔ jruby --server -J-XX:+UnlockExperimentalVMOptions
>> -J-XX:+EnableInvokeDynamic -J-Djruby.compile.invokedynamic=true
>> -J-XX:MaxInlineSize=150 -J-XX:InlineSmallCode=1500
>> bench/bench_fib_recursive.rb 10
>> 832040
>>  0.438000   0.000000   0.438000 (  0.382000)
>> 832040
>>  0.199000   0.000000   0.199000 (  0.200000)
>> 832040
>>  0.206000   0.000000   0.206000 (  0.205000)
>> 832040
>>  0.196000   0.000000   0.196000 (  0.196000)
>> 832040
>>  0.198000   0.000000   0.198000 (  0.198000)
>> 832040
>>  0.196000   0.000000   0.196000 (  0.196000)
>> 832040
>>  0.195000   0.000000   0.195000 (  0.195000)
>> 832040
>>  0.196000   0.000000   0.196000 (  0.196000)
>> 832040
>>  0.196000   0.000000   0.196000 (  0.196000)
>> 832040
>>  0.214000   0.000000   0.214000 (  0.214000)
>>
>> Note that this is still using the old mechanism for the calls to fib
>> itself, and this is not encoding primitive indy calls where literals
>> are being passed, both of which will improve performance further.
>>
>> Note also this is still a March build of MLVM...so I'm guessing other
>> things have happened at the VM level that will improve it even more.
>>
>> I'm pleased with this new result!
>>
>> - Charlie
>>
>> On Tue, Jul 27, 2010 at 1:50 PM, Charles Oliver Nutter
>> <headius at headius.com> wrote:
>>> I'm slowly getting back into indy stuff :) I'm still running off a
>>> build from March, though, since ASM doesn't support the latest
>>> changes.
>>>
>>> Anyway, I mentioned at JVMLS that I thought I could get indy to patch
>>> through to the actual target method in my existing indy stuff. I said
>>> I could do it by today, but I was delayed...I have done it now :)
>>>
>>> I've only got it wired up for one arity case, but here's what it looks
>>> like (with some of the handles still in there...these should disappear
>>> as they're supported by the inlining, I presume):
>>>
>>> Old backtrace for def foo; 1 + 1; end
>>>
>>>        at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
>>>        at org.jruby.RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus.call(org/jruby/RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus.gen:65535)
>>>        at sun.dyn.FilterGeneric$F7.invoke_F7(FilterGeneric.java:844)
>>>        at sun.dyn.FilterGeneric$F6.invoke_F6(FilterGeneric.java:758)
>>>        at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830)
>>>        at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
>>>
>>> Because the current indy stuff binds to our DynamicMethod subclass
>>> (RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus), we have at least one
>>> extra bounce and a lot more argument juggling because the
>>> DynamicMethod.call paths are complicated.
>>>
>>> With the modified version, the fast path binds straight through to the
>>> actual target method with no intermediate wrapper:
>>>
>>>        at org.jruby.RubyFixnum.op_plus(RubyFixnum.java:328)
>>>        at sun.dyn.FilterGeneric$F3.invoke_V0(FilterGeneric.java:565)
>>>        (at sun.dyn.MethodHandleImpl$GuardWithTest.invoke_L5(MethodHandleImpl.java:830))
>>>        at ruby.__dash_e__.method__0$RUBY$foo(-e:1)
>>>
>>> The GuardWithTest is not yet in my toy code, but I inserted it where
>>> it would be. You can see that once the handles fold away, there's no
>>> intermediate code between the caller and the callee.
>>>
>>> The interesting thing to me here is that since I know the actual
>>> target method in these cases, I can decorate the handle chain with the
>>> wrapper logic normally contained in the DynamicMethod subclass, which
>>> means with indy we *don't have to generate our intermediate
>>> pseudo-handles at all*. That's a tremendous win, for a few reasons: 1.
>>> that logic will no longer count against our inlining budgets (at least
>>> one stack frame and probably a good dozen+ bytecodes); and 2. I've
>>> wrangled raw ASM in the pseudo-handle generation logic way too many
>>> times to want to continue doing it :)
>>>
>>> Of course it also means we don't have the memory/size costs of
>>> generating those classes ourselves.
>>>
>>> I'm sure I can do this same thing for field/instance variable
>>> accesses, Ruby-to-Java calls, and more, and actually do iterative
>>> optimizations without an interpreter or tiered compilation. That's
>>> pretty cool.
>>>
>>> - Charlie
>>>
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev