Inlining

Tue Aug 25 06:31:41 PDT 2009

Raffaello Giulietti wrote:
> Charles Oliver Nutter wrote:
>> On Mon, Aug 24, 2009 at 5:08 AM, Raffaello Giulietti
>> <raffaello.giulietti at gmail.com> wrote:
>>> The target of the call site is a method handle to a method similar to this:
>>>
>>> static Obj invoke_0(Stub stub, Obj self) {
>>>    if (self.klass == stub.klass_0)
>>>        return stub.mh_0.<Obj>invoke(self);
>>>    // some other similar lines, depending on how polymorphic
>>>    // the inline cache shall be
>>>
>>>    // otherwise do a slow lookup, cache the results in the stub
>>>    // according to some strategy, and invoke the final method
>>>    return lookupCacheInvoke(stub, self);
>>> }
>>>
>>> It is true that each call site's target refers to the same method
>>> handle, hence to the same dispatching method, i.e., invoke_0 above.
>>> However, every call site has its own Stub instance, so every call site
>>> caches its own information.
>>>
>>> But you state
>>>
>>>> It sounds like you're still dispatching through a generic piece of
>>>> code, yes? If you have a piece of code in the call path that all calls
>>>> pass through, you essentially defeat inlining entirely.
>>> Why is this so? I cannot see why invoke_0 couldn't be inlined at the
>>> invokedynamic call site. Is there a fundamental reason?
>> invoke_0 will be inlined, but the further calls through the method
>> handles may not be. If this same method is being called through from
>> many different paths, then you actually have a polymorphic (even
>> megamorphic) call site at the <Obj>invoke(self) calls. To the JVM it
>> will be a single call site with a large number of possible targets,
>> and at least HotSpot can't inline across such a boundary.
>>
>> What you actually want to do is install a MH into the indy call site
>> that points either directly at the eventual method to be called or at
>> a guardWithTest handle that performs your polymorphic check and then
>> decides on a slow or fast path call.
>>
>> Here's what that looks like in JRuby:
>>
>>     private static MethodHandle createGWT(MethodHandle test, MethodHandle target,
>>             MethodHandle fallback, CacheEntry entry, CallSite site) {
>>         MethodHandle myTest = MethodHandles.insertArguments(test, 0, entry);
>>         MethodHandle myTarget = MethodHandles.insertArguments(target, 0, entry);
>>         MethodHandle myFallback = MethodHandles.insertArguments(fallback, 0, site);
>>         MethodHandle guardWithTest =
>>                 MethodHandles.guardWithTest(myTest, myTarget, myFallback);
>>
>>         return MethodHandles.convertArguments(guardWithTest, site.type());
>>     }
>>
>> The result of this call is installed directly into the indy call site.
>> The "test" boils down to this:
>>
>>     public static boolean test(CacheEntry entry, IRubyObject self) {
>>         return entry.typeOk(self.getMetaClass());
>>     }
>>
>> This just confirms that the CacheEntry (a tuple of class token and
>> method) is valid for the incoming self.
>>
>> The fallback path ends up like this, a megamorphic slow-path bit of logic:
>>
>>     public static IRubyObject fallback(JRubyCallSite site, ThreadContext context,
>>             IRubyObject caller, IRubyObject self, String name) {
>>         RubyClass selfClass = pollAndGetClass(context, self);
>>         CacheEntry entry = selfClass.searchWithCache(name);
>>         if (methodMissing(entry, site.callType(), name, caller)) {
>>             return callMethodMissing(entry, site.callType(), context, self, name);
>>         }
>>         site.setTarget(createGWT(TEST_0, TARGET_0, FALLBACK_0, entry, site));
>>
>>         return entry.method.call(context, self, selfClass, name);
>>     }
>>
>> Notice that it reinstalls a *new* GWT with the new fast-path target,
>> and then calls the method directly.
>>
>> The targets don't have any generic piece of code; they all bind
>> eventually to a virtual call to DynamicMethod.call, where
>> DynamicMethod is JRuby's method object abstraction. In each case, the
>> eventual implementation of "call" invoked should be monomorphic for a
>> given GWT fast path, but I will also be removing that last phase in
>> favor of a DirectMethodHandle that goes straight to the actual target
>> code (DynamicMethod impls are frequently generated code, since we
>> don't have method handles pre-JDK7). But the basic idea is that your
>> fast path should not call through any generalized Java code; it needs
>> to be handles all the way to a unique target, or inlining is defeated.
>>
>> With indy now showing some perf improvement over my basic code, I'm
>> going to fix that last phase and prepare a blog post on all this.
>>
>> - Charlie
> 
> 
> Charlie, thanks for the clear explanation.
> Tomorrow I'll try to refactor my code according to your suggestions.
> 
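
The guard-with-test inline cache Charlie describes can also be expressed with the final `java.lang.invoke` API that shipped in Java 7. What follows is a minimal sketch under stated assumptions, not JRuby's implementation: `RClass`, `RObj`, `fooA`/`fooB` and the `(RObj)String` call-site type are hypothetical, a `MutableCallSite` with `dynamicInvoker()` stands in for a real `invokedynamic` instruction, and `MethodHandle.asType` appears where the 2009 prototype code used `convertArguments`. The fast path is handles all the way (guard, then the cached target handle); only a failed guard reaches the shared `fallback`, which relinks the site with a new GWT.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GwtInlineCache {
    // Hypothetical minimal object model: a class token that owns a method table.
    static final class RClass {
        final Map<String, MethodHandle> methods = new HashMap<>();
    }

    static final class RObj {
        final RClass klass;
        RObj(RClass klass) { this.klass = klass; }
    }

    static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    // Assumed call-site type for this sketch: (RObj)String.
    static final MethodType SITE_TYPE = MethodType.methodType(String.class, RObj.class);
    static final MethodHandle TEST;
    static final MethodHandle FALLBACK;
    static int fallbackCalls = 0; // counts slow-path lookups (demo only)

    static {
        try {
            TEST = LOOKUP.findStatic(GwtInlineCache.class, "test",
                    MethodType.methodType(boolean.class, RClass.class, RObj.class));
            FALLBACK = LOOKUP.findStatic(GwtInlineCache.class, "fallback",
                    MethodType.methodType(String.class, MutableCallSite.class, String.class, RObj.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Guard: does the receiver still have the class the fast path was linked for?
    static boolean test(RClass cached, RObj self) {
        return self.klass == cached;
    }

    // Build a guardWithTest whose fast path goes straight to the cached target handle.
    static MethodHandle createGWT(RClass klass, MethodHandle target, MutableCallSite site, String name) {
        MethodHandle myTest = MethodHandles.insertArguments(TEST, 0, klass);              // (RObj)boolean
        MethodHandle myFallback = MethodHandles.insertArguments(FALLBACK, 0, site, name); // (RObj)String
        return MethodHandles.guardWithTest(myTest, target.asType(SITE_TYPE), myFallback);
    }

    // Slow path: look up the method, relink the site with a new GWT, then call the target.
    static String fallback(MutableCallSite site, String name, RObj self) throws Throwable {
        fallbackCalls++;
        MethodHandle target = self.klass.methods.get(name);
        if (target == null) {
            throw new IllegalStateException("method missing: " + name);
        }
        site.setTarget(createGWT(self.klass, target, site, name));
        return (String) target.invoke(self);
    }

    // Hypothetical method bodies for two unrelated classes.
    static String fooA(RObj self) { return "A#foo"; }
    static String fooB(RObj self) { return "B#foo"; }

    // A MutableCallSite plus dynamicInvoker() stands in for a real invokedynamic instruction.
    static List<String> runDemo() {
        fallbackCalls = 0;
        try {
            RClass a = new RClass(), b = new RClass();
            a.methods.put("foo", LOOKUP.findStatic(GwtInlineCache.class, "fooA", SITE_TYPE));
            b.methods.put("foo", LOOKUP.findStatic(GwtInlineCache.class, "fooB", SITE_TYPE));

            MutableCallSite site = new MutableCallSite(SITE_TYPE);
            site.setTarget(MethodHandles.insertArguments(FALLBACK, 0, site, "foo")); // unlinked state
            MethodHandle invoker = site.dynamicInvoker();

            RObj objA = new RObj(a), objB = new RObj(b);
            List<String> out = new ArrayList<>();
            out.add((String) invoker.invokeExact(objA)); // slow path, links the site for A
            out.add((String) invoker.invokeExact(objA)); // guard passes, fast path
            out.add((String) invoker.invokeExact(objB)); // guard fails, relinks for B
            return out;
        } catch (Throwable t) {
            // invokeExact is declared to throw Throwable; wrap it for this demo
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo() + ", fallback calls: " + fallbackCalls);
    }
}
```

Calling `runDemo()` takes the slow path once for `A`, the fast path on the repeated call, and relinks when an `RObj` of class `B` arrives, so `fallbackCalls` ends at 2. This sketch guards on class identity only and does not handle invalidation when a class's method table changes.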

OK, I tried the GWT solution. It is 4 times slower than my own scheme.
But I'm using an mlvm build from a couple of weeks ago. According to
Christian
(http://mail.openjdk.java.net/pipermail/mlvm-dev/2009-August/001049.html),
older builds inline only direct method handles (DMHs), which could
explain why GWTs are slower than my DMH-only solution.
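
For contrast, the per-call-site stub scheme from the start of the thread (`invoke_0` with one `Stub` per call site) can be sketched in current Java as well. Again this is a hypothetical sketch, not the original code: the object model, a one-entry cache instead of a polymorphic one, and a `ConstantCallSite` in place of a real `invokedynamic` site are all assumptions. It shows Charlie's point structurally: each site caches its own class and handle, but the fast-path call `stub.mh0.invoke(self)` is a single call site in the bytecode of `invoke0`, shared by every dynamic call site, so once different sites cache different targets the JIT sees it as polymorphic and may not inline through it.

```java
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StubInlineCache {
    // Hypothetical minimal object model: a class token that owns a method table.
    static final class RClass {
        final Map<String, MethodHandle> methods = new HashMap<>();
    }

    static final class RObj {
        final RClass klass;
        RObj(RClass klass) { this.klass = klass; }
    }

    // Per-call-site cache state: a one-entry version of the Stub used by invoke_0.
    static final class Stub {
        final String name;
        RClass klass0;
        MethodHandle mh0;
        Stub(String name) { this.name = name; }
    }

    static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
    // Assumed call-site type for this sketch: (RObj)String.
    static final MethodType SITE_TYPE = MethodType.methodType(String.class, RObj.class);
    static int slowLookups = 0; // counts cache misses (demo only)

    // Shared dispatcher: every call site runs this same method body with its own Stub.
    static String invoke0(Stub stub, RObj self) throws Throwable {
        if (self.klass == stub.klass0) {
            // A single bytecode call site, shared by every dynamic call site.
            return (String) stub.mh0.invoke(self);
        }
        return lookupCacheInvoke(stub, self);
    }

    // Slow path: look up the method, overwrite the stub's cache entry, and call it.
    static String lookupCacheInvoke(Stub stub, RObj self) throws Throwable {
        slowLookups++;
        MethodHandle mh = self.klass.methods.get(stub.name);
        if (mh == null) {
            throw new IllegalStateException("method missing: " + stub.name);
        }
        stub.klass0 = self.klass;
        stub.mh0 = mh;
        return (String) mh.invoke(self);
    }

    // Each site gets its own Stub; the target never changes, so a ConstantCallSite suffices.
    static ConstantCallSite newSite(String name) throws ReflectiveOperationException {
        MethodHandle invoke0 = LOOKUP.findStatic(StubInlineCache.class, "invoke0",
                MethodType.methodType(String.class, Stub.class, RObj.class));
        return new ConstantCallSite(MethodHandles.insertArguments(invoke0, 0, new Stub(name)));
    }

    // Hypothetical method bodies for two unrelated classes.
    static String fooA(RObj self) { return "A#foo"; }
    static String fooB(RObj self) { return "B#foo"; }

    static List<String> runDemo() {
        slowLookups = 0;
        try {
            RClass a = new RClass(), b = new RClass();
            a.methods.put("foo", LOOKUP.findStatic(StubInlineCache.class, "fooA", SITE_TYPE));
            b.methods.put("foo", LOOKUP.findStatic(StubInlineCache.class, "fooB", SITE_TYPE));
            MethodHandle site1 = newSite("foo").dynamicInvoker();
            MethodHandle site2 = newSite("foo").dynamicInvoker();

            RObj objA = new RObj(a), objB = new RObj(b);
            List<String> out = new ArrayList<>();
            out.add((String) site1.invokeExact(objA)); // miss: site1 caches A
            out.add((String) site1.invokeExact(objA)); // hit
            out.add((String) site2.invokeExact(objB)); // miss: site2 caches B in its own Stub
            out.add((String) site1.invokeExact(objA)); // site1 is unaffected: hit
            return out;
        } catch (Throwable t) {
            // invokeExact is declared to throw Throwable; wrap it for this demo
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo() + ", slow lookups: " + slowLookups);
    }
}
```

In `runDemo()` the two sites cache different classes without interfering, so `slowLookups` ends at 2 and the results are `[A#foo, A#foo, B#foo, A#foo]`; whether HotSpot inlines the shared call in `invoke0` depends on the JVM build and its profile, which this functional sketch does not measure.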

As soon as I hear that the issues mentioned in
http://mail.openjdk.java.net/pipermail/mlvm-dev/2009-August/001082.html
are fixed, I'll try a new mlvm build with the latest patches (partly
announced in
http://mail.openjdk.java.net/pipermail/mlvm-dev/2009-August/001067.html).

Raffaello