Unusually high polymorphic dispatch costs?

Fri Apr 29 12:59:28 PDT 2011

Hi,

Given that creating GWTs is expensive, is it a really bad idea to
create and bind them on a cache miss? My current logic for call
sites looks something like this:

   invoke call site
        if fallback, check if current morphism is < 10.
            If so, create a new GWT with the currently found method
            and the appropriate test.

How would you recommend doing this without creating GWTs at runtime?
By having ten slots in the call site and precreating the GWTs that
use them?
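
To make the logic above concrete, here is a minimal Java sketch of a
call site that prepends one GWT per newly seen receiver class and stops
at ten. It is only an illustration under assumptions, not JRuby's actual
code: PicSketch, PicCallSite, isClass, fallback, call and guardsBuilt are
hypothetical names, plain Java classes stand in for the Ruby classes A
and B, the test is a simple class-identity check, and thread safety is
ignored. Only MutableCallSite, MethodHandles.guardWithTest, bindTo,
asType, findStatic and findVirtual are real java.lang.invoke API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class PicSketch {
    static final int MAX_DEPTH = 10; // same cap as "morphism < 10" above

    // Hypothetical stand-ins for the Ruby classes A and B in the benchmark.
    static class A { String foo() { return "A.foo"; } }
    static class B { String foo() { return "B.foo"; } }

    static final class PicCallSite extends MutableCallSite {
        private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
        private static final MethodHandle IS_CLASS;
        private static final MethodHandle FALLBACK;
        static {
            try {
                IS_CLASS = LOOKUP.findStatic(PicCallSite.class, "isClass",
                        MethodType.methodType(boolean.class, Class.class, Object.class));
                FALLBACK = LOOKUP.findVirtual(PicCallSite.class, "fallback",
                        MethodType.methodType(Object.class, Object.class));
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        int guardsBuilt; // number of GWTs this site has created so far

        PicCallSite() {
            super(MethodType.methodType(Object.class, Object.class));
            setTarget(FALLBACK.bindTo(this)); // start with the slow path only
        }

        // Guard test: does the receiver have exactly the cached class?
        static boolean isClass(Class<?> expected, Object receiver) {
            return receiver.getClass() == expected;
        }

        // Slow path: reached only when no installed guard matches.
        Object fallback(Object receiver) throws Throwable {
            Class<?> cls = receiver.getClass();
            MethodHandle target = LOOKUP
                    .findVirtual(cls, "foo", MethodType.methodType(String.class))
                    .asType(type());
            if (guardsBuilt < MAX_DEPTH) {
                // Prepend one guard; the previous target (older guards,
                // ending in this fallback) becomes the else-branch.
                setTarget(MethodHandles.guardWithTest(
                        IS_CLASS.bindTo(cls), target, getTarget()));
                guardsBuilt++;
            }
            return target.invoke(receiver);
        }
    }

    // Convenience wrapper so callers need not handle Throwable.
    static String call(PicCallSite site, Object receiver) {
        try {
            return (String) site.dynamicInvoker().invoke(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        PicCallSite site = new PicCallSite();
        Object a = new A(), b = new B();
        for (int i = 0; i < 1000; i++) {
            Object tmp = a; a = b; b = tmp; // the a, b = b, a swap
            call(site, a);
            call(site, b);
        }
        System.out.println(site.guardsBuilt); // prints 2
    }
}
```

In this sketch a GWT is still created at runtime, but only once per
receiver class: after both classes have been seen, the target of the
call site no longer changes, however often a and b are swapped.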

Cheers

On 2011-04-29 09.59, Rémi Forax wrote:
> On 04/28/2011 09:58 PM, Charles Oliver Nutter wrote:
>> I'm trying to figure out why polymorphic dispatch is incredibly slow
>> in JRuby + indy. Take this benchmark, for example:
>>
>> class A; def foo; end; end
>> class B; def foo; end; end
>>
>> a = A.new
>> b = B.new
>>
>> 5.times { puts Benchmark.measure { 1000000.times { a, b = b, a; a.foo;
>> b.foo } } }
>>
>> a.foo and b.foo are bimorphic here. Under stock JRuby, using
>> CachingCallSite, this benchmark runs in about 0.13s per iteration.
>> Using invokedynamic, it takes 9s!!!
>>
>> This is after a patch I just committed that caches the target method
>> handle for direct paths. I believe the only thing created when GWT
>> fails now is a new GWT.
> 
> If you want to emulate a bimorphic cache, you should have two GWTs.
> Then no new GWT is constructed once all possible targets for the
> two call sites have been discovered.
> 
> Relying on a mutable MethodHandle, a method handle that changes
> on every call, will not work well because the JIT will not be able to
> inline through this mutable method handle.
> 
>> Is it expected that rebinding a call site or constructing a GWT would
>> be very expensive? If yes...I will have to look into having a hard
>> failover to inline caching or a PIC-like handle chain for polymorphic
>> cases. That's not necessarily difficult. If no...I'm happy to update
>> my build and play with patches to see what's happening here.
> 
> Yes, it's expensive.
> The target of a CallSite should be stable.
> So yes, it's expensive, and yes, it's intended.
> 
>> A sampled profile produced the following output:
>>
>>           Stub + native   Method
>>   57.6%     0  +  5214    java.lang.invoke.MethodHandleNatives.init
>>   30.9%     0  +  2798    java.lang.invoke.MethodHandleNatives.init
>>    2.1%     0  +   189    java.lang.invoke.MethodHandleNatives.getTarget
>>    0.1%     0  +     7    java.lang.Object.getClass
>>    0.0%     0  +     3    java.lang.Class.isPrimitive
>>    0.0%     0  +     3    java.lang.System.arraycopy
>>   90.7%     0  +  8214    Total stub
>>
>> Of course we all know how accurate sampled profiles are, but this is
>> a pretty dismal result.
>>
>> I suspect that this polymorphic cost is a *major* factor in slowing
>> down some benchmarks under invokedynamic. FWIW, the above benchmark
>> without the a,b swap runs in 0.06s, better than 2x faster than stock
>> JRuby (yay!).
>>
>> - Charlie
> 
> Rémi
> 
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
> 

-- 
 Ola Bini (http://olabini.com)
  Ioke - JRuby - ThoughtWorks

 "Yields falsehood when quined" yields falsehood when quined.