Unusually high polymorphic dispatch costs?

Fri Apr 29 02:23:13 PDT 2011

On Apr 28, 2011, at 9:58 PM, Charles Oliver Nutter wrote:
> I'm trying to figure out why polymorphic dispatch is incredibly slow
> in JRuby + indy. Take this benchmark, for example:
> 
> require 'benchmark'
>
> class A; def foo; end; end
> class B; def foo; end; end
>
> a = A.new
> b = B.new
>
> 5.times do
>   puts Benchmark.measure {
>     1_000_000.times { a, b = b, a; a.foo; b.foo }
>   }
> end
> 
> a.foo and b.foo are bimorphic here. Under stock JRuby, using
> CachingCallSite, this benchmark runs in about 0.13s per iteration.
> Using invokedynamic, it takes 9s!!!
> 
> This is after a patch I just committed that caches the target method
> handle for direct paths. I believe the only thing created when GWT
> fails now is a new GWT.
> 
> Is it expected that rebinding a call site or constructing a GWT would
> be very expensive?

Looking at the compiled methods, it seems so.  There is a lot going on when creating a new GWT.

> If yes...I will have to look into having a hard
> failover to inline caching or a PIC-like handle chain for polymorphic
> cases. That's not necessarily difficult. If no...I'm happy to update
> my build and play with patches to see what's happening here.
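
To make the "PIC-like handle chain" idea concrete, here is a minimal Java sketch, not JRuby's actual implementation. It assumes a call site that builds one GWT per receiver class the first time that class is seen and then leaves the chain alone, so the steady state creates no new handles. The names PicSketch, PicCallSite, MAX_DEPTH, isExact and demo are hypothetical, and A and B are plain Java stand-ins for the Ruby classes in the benchmark.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class PicSketch {
    // Java stand-ins for the Ruby classes A and B (assumption for illustration).
    static class A { String foo() { return "A#foo"; } }
    static class B { String foo() { return "B#foo"; } }

    static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    static final int MAX_DEPTH = 4; // hypothetical limit on chain length

    // Guard used by each GWT in the chain: exact receiver class check.
    static boolean isExact(Class<?> klass, Object receiver) {
        return receiver.getClass() == klass;
    }

    static class PicCallSite extends MutableCallSite {
        int depth = 0;   // number of GWTs currently in the chain
        int misses = 0;  // how often the slow path ran

        PicCallSite() throws ReflectiveOperationException {
            super(MethodType.methodType(String.class, Object.class));
            // The chain initially consists of the slow path only.
            setTarget(LOOKUP.findVirtual(PicCallSite.class, "miss", type()).bindTo(this));
        }

        // Slow path: look up the method, prepend one GWT, then dispatch.
        String miss(Object receiver) throws Throwable {
            misses++;
            Class<?> klass = receiver.getClass();
            MethodHandle target = LOOKUP
                .findVirtual(klass, "foo", MethodType.methodType(String.class))
                .asType(type());
            if (depth < MAX_DEPTH) {
                MethodHandle test = LOOKUP.findStatic(PicSketch.class, "isExact",
                        MethodType.methodType(boolean.class, Class.class, Object.class))
                    .bindTo(klass);
                // The old chain becomes the fallback, so earlier entries survive.
                setTarget(MethodHandles.guardWithTest(test, target, getTarget()));
                depth++;
            }
            return (String) target.invoke(receiver);
        }
    }

    // Runs the bimorphic loop from the benchmark and reports slow-path counts.
    static String demo(int iterations) throws Throwable {
        PicCallSite site = new PicCallSite();
        MethodHandle invoker = site.dynamicInvoker();
        Object a = new A(), b = new B();
        for (int i = 0; i < iterations; i++) {
            Object t = a; a = b; b = t;
            String r1 = (String) invoker.invoke(a);
            String r2 = (String) invoker.invoke(b);
        }
        return "misses=" + site.misses + " depth=" + site.depth;
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(demo(1_000_000)); // prints "misses=2 depth=2"
    }
}
```

With this shape the slow path runs once per receiver class rather than once per failed guard, which is the property the bimorphic benchmark needs. Past MAX_DEPTH the sketch simply dispatches through the slow path without growing the chain; a real implementation would more likely fail over to an inline cache at that point.
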
> 
> A sampled profile produced the following output:
> 
>         Stub + native   Method
> 57.6%     0  +  5214    java.lang.invoke.MethodHandleNatives.init
> 30.9%     0  +  2798    java.lang.invoke.MethodHandleNatives.init
>  2.1%     0  +   189    java.lang.invoke.MethodHandleNatives.getTarget
>  0.1%     0  +     7    java.lang.Object.getClass
>  0.0%     0  +     3    java.lang.Class.isPrimitive
>  0.0%     0  +     3    java.lang.System.arraycopy
> 90.7%     0  +  8214    Total stub
> 
> Of course we all know how accurate sampled profiles are, but this is
> a pretty dismal result.

But that seems to be correct.  java.lang.invoke.MethodHandleImpl$GuardWithTest::<init> gets compiled and the inline tree is:

   8892  135             java.lang.invoke.MethodHandleImpl$GuardWithTest::<init> (22 bytes)
                            @ 2   java.lang.invoke.BoundMethodHandle::<init> (37 bytes)   inline (hot)
                              @ 2   java.lang.invoke.MethodHandle::type (5 bytes)   inline (hot)
                              @ 7   java.lang.invoke.MethodType::dropParameterTypes (162 bytes)   already compiled into a big method
                              @ 10   java.lang.invoke.MethodHandle::<init> (15 bytes)   inline (hot)
                                @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                                @ 5   java.lang.Object::getClass (0 bytes)   (intrinsic)
                              @ 20   java.lang.invoke.MethodHandle::type (5 bytes)   inline (hot)
                              @ 24   java.lang.invoke.MethodType::parameterSlotDepth (30 bytes)   inline (hot)
                                @ 26   java.lang.invoke.MethodTypeForm::parameterToArgSlot (9 bytes)   inline (hot)
                              @ 33   java.lang.invoke.BoundMethodHandle::initTarget (7 bytes)   inline (hot)
                                @ 3   java.lang.invoke.MethodHandleNatives::init (0 bytes)   native method

Every new GWT ends in a call to the native MethodHandleNatives::init, and obviously that is VERY expensive.

-- Christian

> 
> I suspect that this polymorphic cost is a *major* factor in slowing
> down some benchmarks under invokedynamic. FWIW, the above benchmark
> without the a,b swap runs in 0.06s, better than 2x faster than stock
> JRuby (yay!).
> 
> - Charlie