Inlining

Mon Aug 24 10:14:30 PDT 2009

Charles Oliver Nutter wrote:
> On Mon, Aug 24, 2009 at 5:08 AM, Raffaello
> Giulietti<raffaello.giulietti at gmail.com> wrote:
>> The target of the call site is a method handle to a method similar to
>>
>> static Obj invoke_0(Stub stub, Obj self) {
>>    if (self.klass == stub.klass_0)
>>        return stub.mh_0.<Obj>invoke(self);
>>    // some other similar lines, depending on how polymorphic the inline cache shall be
>>
>>    // otherwise do a slow lookup, cache the results in the stub
>>    // according to some strategy and invoke the final method
>>    return lookupCacheInvoke(stub, self);
>> }
>>
>> It is true that each call site's target refers to the same method
>> handle, hence to the same dispatching method, i.e., invoke_0 above.
>> However, every call site has its own Stub instance, so every call site
>> caches its own information.
>>
>> But you state
>>
>>> It sounds like you're still dispatching through a generic piece of
>>> code, yes? If you have a piece of code in the call path that all calls
>>> pass through, you essentially defeat inlining entirely.
>> Why is this so? I cannot see why invoke_0 couldn't be inlined at the
>> invokedynamic call site. Is there a fundamental reason?
> 
> invoke_0 will be inlined, but the further call to the method handles
> may not be. If many different call sites all dispatch through this
> same method, then you actually have a polymorphic (even megamorphic)
> call site at the <Obj>invoke(self) calls. It will be, to
> the JVM, a single call site with a large number of possible targets,
> and at least Hotspot can't inline across such a boundary.
> 
> What you actually want to do is install a MH into the indy call site
> that points either directly at the eventual method to be called or at
> a guardWithTest handle that performs your polymorphic check and then
> decides on a slow or fast path call.
> 
> Here's what that looks like in JRuby:
> 
>     private static MethodHandle createGWT(MethodHandle test, MethodHandle target,
>             MethodHandle fallback, CacheEntry entry, CallSite site) {
>         MethodHandle myTest = MethodHandles.insertArguments(test, 0, entry);
>         MethodHandle myTarget = MethodHandles.insertArguments(target, 0, entry);
>         MethodHandle myFallback = MethodHandles.insertArguments(fallback, 0, site);
>         MethodHandle guardWithTest = MethodHandles.guardWithTest(myTest, myTarget, myFallback);
> 
>         return MethodHandles.convertArguments(guardWithTest, site.type());
>     }
> 
> The result of this call is installed directly into the indy call site.
> The "test" boils down to this:
> 
>     public static boolean test(CacheEntry entry, IRubyObject self) {
>         return entry.typeOk(self.getMetaClass());
>     }
> 
> This just confirms that the CacheEntry (a tuple of class token and
> method) is valid for the incoming self.
> 
> The fallback path ends up like this, a megamorphic slow-path bit of logic:
> 
>     public static IRubyObject fallback(JRubyCallSite site, ThreadContext context,
>             IRubyObject caller, IRubyObject self, String name) {
>         RubyClass selfClass = pollAndGetClass(context, self);
>         CacheEntry entry = selfClass.searchWithCache(name);
>         if (methodMissing(entry, site.callType(), name, caller)) {
>             return callMethodMissing(entry, site.callType(), context, self, name);
>         }
>         site.setTarget(createGWT(TEST_0, TARGET_0, FALLBACK_0, entry, site));
> 
>         return entry.method.call(context, self, selfClass, name);
>     }
> 
> Notice that it reinstalls a *new* GWT with the new fast-path target,
> and then calls the method directly.
> 
> The targets don't have any generic piece of code; they all bind
> eventually to a virtual call to DynamicMethod.call, where
> DynamicMethod is JRuby's method object abstraction. In each case, the
> eventual implementation of "call" invoked should be monomorphic for a
> given GWT fast path, but I will also be removing that last phase in
> favor of a DirectMethodHandle that goes straight to the actual target
> code (DynamicMethod impls are frequently generated code, since we
> don't have method handles pre-JDK7). But the basic idea is that your
> fast path should not call through any generalized Java code; it needs
> to be handles all the way to a unique target, or inlining is defeated.
> 
> With indy now showing some perf improvement over my basic code, I'm
> going to fix that last phase and prepare a blog post on all this.
> 
> - Charlie

Charlie, thanks for the clear explanation.
Tomorrow I'll try to refactor my code according to your suggestions.
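
For reference, the guardWithTest approach described in the quoted message can be sketched against the java.lang.invoke API that later shipped in JDK 7. That API differs from the 2009 preview used in the JRuby snippets, so convertArguments does not appear here. Everything in the sketch is hypothetical and only for illustration: the InlineCacheSketch class, the onString/onInteger targets, and the slowPaths counter are assumptions, not code from JRuby or from this thread. It is a minimal monomorphic cache: the slow path looks up a target, installs a fresh guarded handle into the call site, and then calls the target directly.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class InlineCacheSketch {
    private static final MethodHandle TEST, FALLBACK, ON_STRING, ON_INTEGER;

    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodType targetType = MethodType.methodType(String.class, Object.class);
            TEST = l.findStatic(InlineCacheSketch.class, "test",
                    MethodType.methodType(boolean.class, Class.class, Object.class));
            FALLBACK = l.findVirtual(InlineCacheSketch.class, "fallback", targetType);
            ON_STRING = l.findStatic(InlineCacheSketch.class, "onString", targetType);
            ON_INTEGER = l.findStatic(InlineCacheSketch.class, "onInteger", targetType);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // One call site per instance, so each site caches its own information.
    private final MutableCallSite site =
            new MutableCallSite(MethodType.methodType(String.class, Object.class));
    private final MethodHandle invoker = site.dynamicInvoker();
    private int slowPaths = 0;

    public InlineCacheSketch() {
        // Start unlinked: the first call always takes the slow path.
        site.setTarget(FALLBACK.bindTo(this));
    }

    // Hypothetical fast-path targets, standing in for the language's methods.
    private static String onString(Object self) { return "string:" + self; }
    private static String onInteger(Object self) { return "integer:" + self; }

    // Guard: is the cached class still valid for the incoming receiver?
    private static boolean test(Class<?> cached, Object self) {
        return self.getClass() == cached;
    }

    // Slow path: look up the target, install a new guarded handle, call directly.
    private String fallback(Object self) throws Throwable {
        slowPaths++;
        MethodHandle target = (self instanceof String) ? ON_STRING : ON_INTEGER;
        MethodHandle guard = MethodHandles.insertArguments(TEST, 0, self.getClass());
        site.setTarget(MethodHandles.guardWithTest(guard, target, FALLBACK.bindTo(this)));
        return (String) target.invoke(self);
    }

    // Stand-in for the invokedynamic instruction: calls through the site.
    public String call(Object self) {
        try {
            return (String) invoker.invoke(self);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public int slowPaths() { return slowPaths; }
}
```

Calling call("a") and then call("b") on one instance should reach the slow path only once, since the second call passes the guard and goes straight to the bound target. A later call(42) fails the guard and relinks the site. Note that the fast path contains only method handles between the call site and the target, with no shared Java dispatch method in between.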