Unusually high polymorphic dispatch costs?

Sun May 1 08:06:06 PDT 2011

On 04/29/2011 09:59 PM, Ola Bini wrote:
> Hi,
>
> Given that creating GWTs is expensive, is it a really bad idea to
> create them and bind them on a cache miss then? My current logic for
> call sites looks something like this:
>
>     invoke call site
>          if fallback, check if current morphism is < 10.
>              If so, create a new GWT with the currently found method and
>              appropriate test.
>
> How would you recommend doing this without creating GWTs at runtime?
> By having ten slots in the call site and precreating the GWTs that use them?
>
> Cheers

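Creating a GWT at runtime is fine as long as you don't create one on
every call. A GWT created on a cache miss, with a bounded chain depth,
stops being created once all receiver types have been seen, so your
logic is fine.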
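
To make that concrete, here is a minimal sketch of such a call site in
plain java.lang.invoke. It is not JRuby's implementation: the names
PicSketch, PicCallSite, onMiss, checkClass and MAX_DEPTH are made up
for illustration, the receiver test is a simple exact-class check, and
thread safety is ignored. A new GWT is built only on a miss, and only
while the chain is shorter than the limit.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class PicSketch {
    static final int MAX_DEPTH = 10;  // assumed limit, mirrors "morphism < 10"
    static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    static final MethodHandle CHECK_CLASS;
    static {
        try {
            CHECK_CLASS = LOOKUP.findStatic(PicSketch.class, "checkClass",
                MethodType.methodType(boolean.class, Class.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Hypothetical receivers standing in for the Ruby classes A and B.
    public static class A { public String foo() { return "A.foo"; } }
    public static class B { public String foo() { return "B.foo"; } }

    // Guard used by every GWT: exact receiver class check.
    static boolean checkClass(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    static final class PicCallSite extends MutableCallSite {
        final String name;
        final MethodHandle slowPath;
        int depth;   // guards installed so far
        int misses;  // how often the slow path ran

        PicCallSite(String name) throws ReflectiveOperationException {
            super(MethodType.methodType(Object.class, Object.class));
            this.name = name;
            this.slowPath = LOOKUP
                .findVirtual(PicCallSite.class, "onMiss", type())
                .bindTo(this);
            setTarget(slowPath);
        }

        // Runs only when no installed guard matched the receiver.
        Object onMiss(Object receiver) throws Throwable {
            misses++;
            MethodHandle target = LOOKUP
                .findVirtual(receiver.getClass(), name,
                             MethodType.methodType(String.class))
                .asType(type());
            if (depth < MAX_DEPTH) {
                // Prepend one guard; the previous chain becomes the fallback.
                MethodHandle test = CHECK_CLASS.bindTo(receiver.getClass());
                setTarget(MethodHandles.guardWithTest(test, target, getTarget()));
                depth++;
            }
            return target.invoke(receiver);
        }
    }

    // Swaps two receivers on every iteration, like the Ruby benchmark,
    // and reports how many times the slow path was taken.
    static int missesAfter(int iterations) {
        try {
            PicCallSite site = new PicCallSite("foo");
            MethodHandle invoker = site.dynamicInvoker();
            Object a = new A(), b = new B();
            for (int i = 0; i < iterations; i++) {
                Object tmp = a; a = b; b = tmp;
                Object r1 = invoker.invoke(a);
                Object r2 = invoker.invoke(b);
            }
            return site.misses;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println("slow-path calls: " + missesAfter(1_000_000));
    }
}
```

With two receiver classes the slow path runs once per class, after
which the call site target no longer changes. Past MAX_DEPTH the
sketch simply dispatches through the slow path without rebinding; a
real runtime would probably switch to a different strategy there.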

Rémi

> On 2011-04-29 09.59, Rémi Forax wrote:
>> On 04/28/2011 09:58 PM, Charles Oliver Nutter wrote:
>>> I'm trying to figure out why polymorphic dispatch is incredibly slow
>>> in JRuby + indy. Take this benchmark, for example:
>>>
>>> class A; def foo; end; end
>>> class B; def foo; end; end
>>>
>>> a = A.new
>>> b = B.new
>>>
>>> 5.times { puts Benchmark.measure { 1000000.times { a, b = b, a; a.foo;
>>> b.foo } } }
>>>
>>> a.foo and b.foo are bimorphic here. Under stock JRuby, using
>>> CachingCallSite, this benchmark runs in about 0.13s per iteration.
>>> Using invokedynamic, it takes 9s!!!
>>>
>>> This is after a patch I just committed that caches the target method
>>> handle for direct paths. I believe the only thing created when GWT
>>> fails now is a new GWT.
>> If you want to emulate a bimorphic cache, you should have two GWTs.
>> That way no new GWT is constructed once all possible targets of the
>> two call sites have been discovered.
>>
>> Relying on a mutable MethodHandle, i.e. a method handle that changes
>> on every call, will not work well because the JIT will not be able to
>> inline through it.
>>
>>> Is it expected that rebinding a call site or constructing a GWT would
>>> be very expensive? If yes...I will have to look into having a hard
>>> failover to inline caching or a PIC-like handle chain for polymorphic
>>> cases. That's not necessarily difficult. If no...I'm happy to update
>>> my build and play with patches to see what's happening here.
>> Yes, it's expensive.
>> The target of a CallSite should be stable.
>> So yes, it's expensive, and yes, it's intended.
>>
>>> A sampled profile produced the following output:
>>>
>>>            Stub + native   Method
>>>    57.6%     0  +  5214    java.lang.invoke.MethodHandleNatives.init
>>>    30.9%     0  +  2798    java.lang.invoke.MethodHandleNatives.init
>>>     2.1%     0  +   189    java.lang.invoke.MethodHandleNatives.getTarget
>>>     0.1%     0  +     7    java.lang.Object.getClass
>>>     0.0%     0  +     3    java.lang.Class.isPrimitive
>>>     0.0%     0  +     3    java.lang.System.arraycopy
>>>    90.7%     0  +  8214    Total stub
>>>
>>> Of course we all know how accurate sampled profiles are, but this is
>>> a pretty dismal result.
>>>
>>> I suspect that this polymorphic cost is a *major* factor in slowing
>>> down some benchmarks under invokedynamic. FWIW, the above benchmark
>>> without the a,b swap runs in 0.06s, better than 2x faster than stock
>>> JRuby (yay!).
>>>
>>> - Charlie
>> Rémi