performance issue: 7023639: JSR 292 method handle invocation needs a fast path for compiled code

Tue Mar 1 23:12:36 PST 2011

On Mar 1, 2011, at 5:42 PM, Rémi Forax wrote:

> There is also another optimization that should be done.
> Once all the optimizations that you have listed are done,
> the code will be as fast (or as slow but I'm optimistic by nature) as 
> using inner classes.
> 
> Let's say we have the following code:
> 
> ...
>   private static <E> void bar(ArrayList<E> list, Mapper<E, E> mapper) {
>     int size = list.size();
>     for(int i=0; i<size; i++) {
>       list.set(i, mapper.apply(list.get(i)));
>     }
>   }
> 
> If lambdas are implemented using inner-classes, mapper.apply is megamorphic
> and a vtable dispatch is done.

This is an important problem for classic method inlining, for MHs, and (eventually) for bulk data processing APIs.

One module contributes a loop template (bar) while another contributes the loop kernel (mapper1 = #{...}).

But in order to get full performance, the system has to combine both together in a customized form.

In this case, HotSpot has an old optimization that can almost do this:  If (as you say further down) the loop template (bar) is inlined (as bar') at a place where only one loop kernel appears, and the loop kernel is invoked via a stable monomorphic inline cache (unique to bar'), then the JVM has enough information to recompile the loop kernel call, if the compiler is run again.  There are two missing bits to make this happen:  First, our inlining heuristics do not detect loop templates for aggressive inlining.  Second, although we are just rolling out tiered compilation (yay!) we are only beginning to leverage the advantages of tiered optimization.  (The inlining of a settled monomorphic call in bar' is an example of tiered optimization.)
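
To make the template/kernel split concrete, here is a minimal sketch, not the code from the original message (which is elided above).  The Mapper interface and the names LoopTemplateSketch, incrementAll and doubleAll are assumptions for illustration.  Inside the shared bar, the call to mapper.apply sees two receiver classes; in a copy of bar inlined into incrementAll or doubleAll it would see only one.

```java
import java.util.ArrayList;

public class LoopTemplateSketch {
    // Hypothetical stand-in for the Mapper interface used in the quoted code.
    interface Mapper<A, B> {
        B apply(A a);
    }

    // Loop template: same shape as bar in the quoted code.
    static <E> void bar(ArrayList<E> list, Mapper<E, E> mapper) {
        int size = list.size();
        for (int i = 0; i < size; i++) {
            list.set(i, mapper.apply(list.get(i)));
        }
    }

    // First caller: contributes one loop kernel (an inner class).
    public static ArrayList<Integer> incrementAll(ArrayList<Integer> list) {
        bar(list, new Mapper<Integer, Integer>() {
            public Integer apply(Integer x) {
                return x + 1;
            }
        });
        return list;
    }

    // Second caller: contributes a different kernel, so the single
    // mapper.apply call site inside the shared bar sees two classes.
    public static ArrayList<Integer> doubleAll(ArrayList<Integer> list) {
        bar(list, new Mapper<Integer, Integer>() {
            public Integer apply(Integer x) {
                return x * 2;
            }
        });
        return list;
    }
}
```

Whether a given JVM actually inlines bar into each caller depends on its heuristics and on how hot the callers are, which is the point of the two missing bits above.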

> If lambdas are implemented using method-handles, mapper.apply will 
> directly call
> the underlying method handle (because there is only one implementation 
> of Mapper).

That's true, but it will still be an out-of-line call.  It will be a race between classic interface dispatch and whatever indirection trickery is used inside method handles.  The real way to win the race is to speed both up by increasing opportunities for inlining.  (The invokedynamic instruction may be viewed as a hook for forcing method handles to inline!)
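
For readers who have not used the API, here is a small sketch of a loop kernel invoked through a method handle.  It uses only the public java.lang.invoke API; the class name MethodHandleKernelSketch and the methods increment, kernel and mapAll are assumptions for illustration, and this says nothing about how lambdas will actually be translated.  Because the handle arrives as an ordinary argument, the compiler cannot assume which target it points to, so mh.invokeExact here is the kind of call that may stay out of line unless it gets inlined.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class MethodHandleKernelSketch {
    // The loop kernel, as a plain static method.
    static int increment(int x) {
        return x + 1;
    }

    // Look up a direct method handle of type (int)int for the kernel.
    public static MethodHandle kernel() {
        try {
            return MethodHandles.lookup().findStatic(
                    MethodHandleKernelSketch.class, "increment",
                    MethodType.methodType(int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Loop template: applies whatever handle it is given to each element.
    public static int[] mapAll(int[] values, MethodHandle mh) {
        int[] out = new int[values.length];
        try {
            for (int i = 0; i < values.length; i++) {
                // invokeExact requires the call shape to match (int)int exactly.
                out[i] = (int) mh.invokeExact(values[i]);
            }
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
        return out;
    }
}
```

A call such as mapAll(new int[] {1, 2, 3}, kernel()) applies increment to each element.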

> Here test is not a hot method, so bar will not be inlined in test
> and specialized for each lambda.
> The problem is how to tell the JIT that test should be inlined.
> 
> One solution is to go backward, i.e., detect that mapper.apply is a method 
> handle call
> so consider that all callers of bar should be compiled even if they are 
> not hot but only warm.

Yes.  I think we can get there, now that we can use tiered compilation to collect profile data from warm programs, and then re-optimize them.

So far I haven't distinguished method handles from classic interface instances.  The techniques which optimize classic interfaces will be applied to method handles, and (IMO) both will perform well.  It may be that method handles will be slightly easier to optimize, because they contain less noise data (a nominal interface implementation type).

A key missing bit in our initial implementation of method handles is an internal classification mechanism, so that method handles of similar "function shapes" will be grouped by the JVM into common "code shapes" when inlined or dispatched on.  (Method handles have classes at present, and they are profiled, but the system doesn't exploit this very well.  First we make it work, then we make it fast.)

The JVM leans heavily on concrete instance classes to determine the classic version of "instance shape".  It uses inline caches, profiling, and other techniques to do this.  If the optimizer can prove that there is a limited set of "shapes" at a given use point (ideally one "shape" but multiple are possible too) then it can output optimized code for that shape.  (N.B. I'm using the term "shape" in an informal metaphorical way here.)  I expect that the JVM's existing mechanisms and techniques for exploiting regularities in instance classes will cross-apply to method handles, in a well-tuned system.
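
As a rough hand-written analogue of "optimized code for that shape", the sketch below guards on one concrete class and falls back to ordinary dispatch otherwise.  This is only an illustration in source form of the general idea of a class-guarded fast path; it is not the code the JVM generates, and the names ShapeGuardSketch, Mapper, Increment, Negate and applyGuarded are assumptions.

```java
public class ShapeGuardSketch {
    // Hypothetical stand-in for the Mapper interface in the quoted code.
    public interface Mapper<A, B> {
        B apply(A a);
    }

    public static final class Increment implements Mapper<Integer, Integer> {
        public Integer apply(Integer x) {
            return x + 1;
        }
    }

    public static final class Negate implements Mapper<Integer, Integer> {
        public Integer apply(Integer x) {
            return -x;
        }
    }

    // Call site specialized by hand for one observed receiver class.
    public static Integer applyGuarded(Mapper<Integer, Integer> mapper, Integer x) {
        if (mapper.getClass() == Increment.class) {
            // Fast path: the body of Increment.apply, copied in place of the call.
            return x + 1;
        }
        // Slow path: any other class goes through normal interface dispatch.
        return mapper.apply(x);
    }
}
```

Both paths return the same result as calling mapper.apply directly; the guard only changes how the call is reached.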

Meanwhile, "invokedynamic" provides a unique user-visible hook to kick-start the inlining process at a given call site.

> If someone implements that before the release of JDK 8, I will praise him 
> every night.

It has always been the case that we have more ideas than we can implement.  It's all a matter of resource allocation, for my employer, and for everybody else that works on the OpenJDK code.

-- John