More performance explorations
John Rose
john.r.rose at oracle.com
Sat Jun 4 00:05:05 PDT 2011
On Jun 3, 2011, at 4:15 PM, Tom Rodriguez wrote:
> On Jun 2, 2011, at 7:37 PM, John Rose wrote:
>
>> Thanks; I'll look at your dump later tonight.
>>
>> If the problem is friction from interface casts, we can probably remove them. It's hard to figure out how they are getting in, though. It happens when IRubyObject interconverts with Object.
>
> So I put in a little hack to fold repeated interface checkcasts, and that gets back a lot of the performance. With fib on my machine, dynopt=true reports 1.005000, invokedynamic=true reports 1.293000, and turning on my checkcast hack gets it down to 1.112000. Unfortunately, what I've got right now isn't really suitable for inclusion in JDK7.
>
> John, I noticed that MethodHandleWalk seems to be injecting them for return values, though it's somewhat inconsistent. For instance, I see this:
>
>   // FIXME: consider inlining the invokee at the bytecode level
>   ArgToken ret = make_invoke(methodOop(invoker), vmIntrinsics::_none,
>                              Bytecodes::_invokevirtual, false, 1+argc, &arglist[0], CHECK_(empty));
>   DEBUG_ONLY(invoker = NULL);
>   if (rtype == T_OBJECT) {
>     klassOop rklass = java_lang_Class::as_klassOop( java_lang_invoke_MethodType::rtype(recursive_mtype()) );
>     if (rklass != SystemDictionary::Object_klass() &&
>         !Klass::cast(rklass)->is_interface()) {
>       // preserve type safety
>       ret = make_conversion(T_OBJECT, rklass, Bytecodes::_checkcast, ret, CHECK_(empty));
>     }
>   }
>
> but down in make_invoke itself we do this:
>
>   switch (_rtype) {
>   case T_BOOLEAN: case T_BYTE: case T_CHAR: case T_SHORT:
>   case T_INT:    emit_bc(Bytecodes::_ireturn); break;
>   case T_LONG:   emit_bc(Bytecodes::_lreturn); break;
>   case T_FLOAT:  emit_bc(Bytecodes::_freturn); break;
>   case T_DOUBLE: emit_bc(Bytecodes::_dreturn); break;
>   case T_VOID:   emit_bc(Bytecodes::_return);  break;
>   case T_OBJECT:
>     if (_rklass.not_null() && _rklass() != SystemDictionary::Object_klass())
>       emit_bc(Bytecodes::_checkcast, cpool_klass_put(_rklass()));
>     emit_bc(Bytecodes::_areturn);
>
> This results in adapter bytecodes that look like this:
>
> 0 aload_1
> 1 aload #4
> 3 aload #5
> 5 aload_2
> 6 aload #6
> 8 invokevirtual 7 <org/jruby/internal/runtime/methods/DynamicMethod.call(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/RubyModule;Ljava/lang/String;)Lorg/jruby/runtime/builtin/IRubyObject;>
> 0 bci: 8 VirtualCallData count(10000) entries(0)
> 11 checkcast 8 <org/jruby/runtime/builtin/IRubyObject>
> 24 bci: 11 ReceiverTypeData count(10000) entries(0)
> 14 areturn
>
> which seems fairly pointless.
Yes, those are pointless and should be removed (with an extra !is_interface guard).
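To spell out the guard I mean, here is a rough Java model of the decision. It is only a sketch: the class and method names are made up for illustration, and the real change would be in the C++ of make_invoke, not in Java. The point is that the return-side checkcast is only worth emitting when the declared return type is a proper class, neither Object nor an interface.

```java
public class ReturnCastGuardSketch {
    /**
     * Hypothetical predicate modeling the guard: emit a return-side
     * checkcast only for class types other than Object. Interface types
     * are skipped, matching the !is_interface test in the first excerpt.
     */
    public static boolean needsReturnCheckcast(Class<?> rklass) {
        return rklass != null
            && rklass != Object.class
            && !rklass.isInterface();
    }

    public static void main(String[] args) {
        // Object and interface return types need no cast; a class type does.
        System.out.println(needsReturnCheckcast(Object.class));
        System.out.println(needsReturnCheckcast(CharSequence.class));
        System.out.println(needsReturnCheckcast(String.class));
    }
}
```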
> These don't seem to be the source of the checkcasts in jruby though. They seem to be explicitly part of the method handle chain. For this chain:
>
> 0xeff0d808: adapter: arg_slot 0 conversion op check_cast (LLLLL)L
> 0xeff0d7a8: adapter: arg_slot 1 conversion op check_cast (LLLLL)L
> 0xeff0d748: adapter: arg_slot 2 conversion op check_cast (LLLLL)L
> 0xeff0d6e8: adapter: arg_slot 3 conversion op check_cast (LLLLL)L
> 0xeff0d688: adapter: arg_slot 4 conversion op check_cast (LLLLL)L
> 0xeff0d2b8: adapter: arg_slot 1 conversion op drop_args pushes -1 (LLLLL)L
> 0xeff0d1a8: adapter: arg_slot 2 conversion op drop_args pushes -1 (LLLL)L
> 0xeff0acd8: bound: arg_type object arg_slot 0 instance org.jruby.runtime.Block (LLL)L
> 0xeff0ac68: bound: arg_type object arg_slot 4 instance bench.bench_fib_recursive (LLLL)L
>
> we produce these bytecodes:
>
> 0 aload #5
> 2 checkcast 3 <org/jruby/runtime/builtin/IRubyObject>
> 0 bci: 2 ReceiverTypeData count(31244) entries(0)
> 5 astore #5
> 7 aload #4
> 9 checkcast 4 <java/lang/String>
> 24 bci: 9 ReceiverTypeData count(31244) entries(0)
> 12 astore #4
> 14 aload_3
> 15 checkcast 5 <org/jruby/runtime/builtin/IRubyObject>
> 48 bci: 15 ReceiverTypeData count(31244) entries(0)
> 18 astore_3
> 19 aload_2
> 20 checkcast 6 <org/jruby/runtime/builtin/IRubyObject>
> 72 bci: 20 ReceiverTypeData count(31244) entries(0)
> 23 astore_2
> 24 aload_1
> 25 checkcast 7 <org/jruby/runtime/ThreadContext>
> 96 bci: 25 ReceiverTypeData count(31244) entries(0)
> 28 astore_1
> 29 ldc <Object> 0xefe59f88
> 31 aload_1
> 32 aload_3
> 33 aload #5
> 35 ldc <Object> 0xefabd418
> 37 invokestatic 14 <bench/bench_fib_recursive.method__0$RUBY$fib_ruby(Lbench/bench_fib_recursive;Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;)Lorg/jruby/runtime/builtin/IRubyObject;>
> 120 bci: 37 CounterData count(31244)
> 40 areturn
That looks like somebody did this:
MethodHandle inner = #method__0$RUBY$fib_ruby;
MethodHandle outer = inner.asType(inner.type().generic());
In other words, wrapped a moderately typeful method in an erased method type of all-Object.
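For anyone who wants to reproduce the pattern outside JRuby, a minimal sketch with the public API might look like the following. The target method typedTarget and the class name are hypothetical stand-ins (not JRuby code); only findStatic, asType, and MethodType.generic are real java.lang.invoke calls. Each non-Object parameter of the erased wrapper gets a cast on the way in, which is the kind of per-argument check_cast adapter visible in the chain above.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ErasedWrapperSketch {
    // Hypothetical stand-in for a typed target such as fib_ruby:
    // one class-typed parameter, one interface-typed parameter.
    public static CharSequence typedTarget(String name, CharSequence suffix) {
        return name + suffix;
    }

    // The "moderately typeful" inner handle.
    public static MethodHandle inner() {
        try {
            return MethodHandles.lookup().findStatic(
                ErasedWrapperSketch.class, "typedTarget",
                MethodType.methodType(CharSequence.class, String.class, CharSequence.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // The erased wrapper: every parameter and the return become Object.
    public static MethodHandle outer() {
        MethodHandle inner = inner();
        return inner.asType(inner.type().generic());
    }

    // Invoke through the erased wrapper, wrapping checked throwables.
    public static Object callErased(Object a, Object b) {
        try {
            return outer().invoke(a, b);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(inner().type());
        System.out.println(outer().type());
        System.out.println(callErased("fib", "(30)"));
        try {
            // An Integer is not a String, so the cast in the wrapper fails.
            callErased(Integer.valueOf(30), "(30)");
        } catch (ClassCastException e) {
            System.out.println("rejected by the cast in the wrapper");
        }
    }
}
```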
The big question is who built that chain.
One big answer is that pre-RF code was building such things routinely, in order to normalize signatures down to a few equivalence classes (arity only). But post-RF code doesn't need to do that. I found a few places in MethodHandleImpl.java where needless asType calls were issued in order to normalize signatures. I changed those to an internal equivalent of explicitCastArguments, and pushed it.
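As a rough illustration of the difference, here is a sketch using the public MethodHandles.explicitCastArguments as a stand-in for the internal equivalent (the internal code is not shown here, and the echo method and class name are hypothetical). Both calls produce the same erased type; the documented difference is that explicitCastArguments passes a reference to an interface-typed parameter without a cast, following the bytecode verifier's treatment of interfaces, whereas asType inserts one.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class CastNormalizationSketch {
    // Hypothetical target with an interface-typed parameter and return.
    public static CharSequence echo(CharSequence text) {
        return text;
    }

    public static MethodHandle target() {
        try {
            return MethodHandles.lookup().findStatic(
                CastNormalizationSketch.class, "echo",
                MethodType.methodType(CharSequence.class, CharSequence.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Normalization via asType: the interface-typed argument is cast.
    public static MethodHandle viaAsType() {
        MethodHandle t = target();
        return t.asType(t.type().generic());
    }

    // Normalization via explicitCastArguments: per its Javadoc, a reference
    // converted to an interface type is passed through without a cast.
    public static MethodHandle viaExplicitCast() {
        MethodHandle t = target();
        return MethodHandles.explicitCastArguments(t, t.type().generic());
    }

    // Invoke a one-argument erased handle, wrapping checked throwables.
    public static Object call(MethodHandle mh, Object arg) {
        try {
            return mh.invoke(arg);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaAsType().type());
        System.out.println(viaExplicitCast().type());
        System.out.println(call(viaAsType(), "fib"));
        System.out.println(call(viaExplicitCast(), "fib"));
    }
}
```

Callers see the same signature either way, so the swap is invisible to them as long as they pass correctly typed arguments.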
> Just blindly skipping checkcast method handles for interface types brings the time on fib down to 1.071000.
That's promising.
-- John