hg: lambda/lambda/jdk: Additional spec tweaks for lambda metafactory
Brian Goetz
brian.goetz at oracle.com
Sat May 25 08:32:33 PDT 2013
Clearly lambda capture cost is an area where there is room for
improvement. However, many of these potential improvements compete with
each other. The optimization story is not finished yet, but the next
several chapters are well outlined.
The first chapter, already written, is special handling of non-capturing
(stateless) lambdas. This is a very common case (lambdas like
Object::toString or e -> e*2) and these are translated down to constants
(the bootstrap returns a CallSite linked to a constant method handle.)
After the first capture, the VM boils this down to a constant load,
driving capture cost to zero and reducing footprint through sharing.
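This sharing is observable today. The sketch below (class and method names are mine, not from the post) evaluates a non-capturing lambda twice; under the current HotSpot strategy the bootstrap links a constant CallSite, so both evaluations yield the same instance. Note that instance identity for lambdas is an implementation detail, not something the language guarantees.

```java
import java.util.function.IntUnaryOperator;

public class StatelessSharing {
    static IntUnaryOperator make() {
        return e -> e * 2;   // captures nothing from the enclosing scope
    }

    public static void main(String[] args) {
        // On the current HotSpot/JDK strategy the capture site is linked
        // to a constant, so every evaluation returns the same object.
        IntUnaryOperator a = make();
        IntUnaryOperator b = make();
        System.out.println(a == b);           // true on current HotSpot (not guaranteed by the spec)
        System.out.println(a.applyAsInt(21)); // 42
    }
}
```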
The design of the metafactory enables the translation strategy to be a
run-time implementation detail. The V1.0 strategy, which generates a
class per lambda capture site, is designed to be no worse than inner
classes at anything, and better at some things (like capturing stateless
lambdas, or compile-time static footprint), but overall the performance
profile is comparable to inner classes.
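For concreteness, here is what the bootstrap protocol looks like when driven by hand, the way a javac-emitted invokedynamic instruction drives it at link time. The implementation method "twice" stands in for a compiler-generated lambda body; everything after the metafactory call is opaque to the compiler, which is what lets the translation strategy evolve at run time.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntUnaryOperator;

public class MetafactoryByHand {
    static int twice(int x) { return x * 2; }   // stand-in for a lambda body

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle impl = lookup.findStatic(
                MetafactoryByHand.class, "twice",
                MethodType.methodType(int.class, int.class));

        CallSite site = LambdaMetafactory.metafactory(
                lookup,
                "applyAsInt",                                   // SAM method name
                MethodType.methodType(IntUnaryOperator.class),  // invoked type: no captured args
                MethodType.methodType(int.class, int.class),    // erased SAM signature
                impl,                                           // lambda body
                MethodType.methodType(int.class, int.class));   // instantiated signature

        // With no captured arguments, the call site's target is a
        // nullary factory for the functional-interface instance.
        IntUnaryOperator op = (IntUnaryOperator) site.getTarget().invokeExact();
        System.out.println(op.applyAsInt(21)); // 42
    }
}
```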
An evolution of the translation strategy would be to generate a class
per SAM, whose constructor took a method handle and whose SAM method
invoked that method handle. This would reduce dynamic footprint and
probably some profile pollution, but until we can guarantee that the
invocation cost is no worse than the direct bytecode invocation we do
now, this will likely remain a prototype sitting on the shelf (or
perhaps a time-space tradeoff for embedded.)
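The shape of such a generated per-SAM class might look like the following sketch (hand-written here for illustration; the real strategy would emit this bytecode at run time). Its constructor takes a method handle and its SAM method invokes it, so one class serves every capture site targeting that interface. The open question the post raises is whether the invokeExact in the SAM method can be made as cheap as the direct bytecode invocation the class-per-site strategy emits.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntUnaryOperator;

public class SamWrapper {
    // One wrapper class per SAM type; captured state lives in the
    // (possibly bound) method handle rather than in generated fields.
    static final class IntUnaryOperatorImpl implements IntUnaryOperator {
        private final MethodHandle mh;   // type: (int)int
        IntUnaryOperatorImpl(MethodHandle mh) { this.mh = mh; }
        @Override public int applyAsInt(int x) {
            try {
                return (int) mh.invokeExact(x);
            } catch (Throwable t) {
                throw new RuntimeException(t);
            }
        }
    }

    static int twice(int x) { return x * 2; }   // stand-in for a lambda body

    public static void main(String[] args) throws Throwable {
        MethodHandle mh = MethodHandles.lookup().findStatic(
                SamWrapper.class, "twice",
                MethodType.methodType(int.class, int.class));
        IntUnaryOperator op = new IntUnaryOperatorImpl(mh);
        System.out.println(op.applyAsInt(21)); // 42
    }
}
```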
Looking a little farther ahead, we plan to teach the VM about the
semantics of lambda capture. The metafactory protocol was designed for
this; we break it into two entry points, a "fast path" and an alternate
path. The fast path handles all the well-behaved cases; the alternate
path handles all the strange cases (serializability, implementing
additional marker types, implementing bridges in the lambda object
because the SAM type is an old classfile and therefore lacks the needed
bridges.) Only the fast path will get optimized; its protocol was
designed such that the semantics could be modeled entirely in terms of
method handle combinators. This enables a path where the VM
intrinsifies the capture operation, and knows that it is a pure
operation, enabling code motion optimizations. Consider:
    log(DEBUG, () -> ...)

where

    void log(Level level, Supplier<String> s) {
        if (level >= curLevel)
            log(s.get());
    }
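A runnable version of this pattern (names and the int-based level encoding are mine) shows what the laziness buys: the message is only constructed when the level passes the filter. The counter makes the deferred evaluation observable.

```java
import java.util.function.Supplier;

public class LazyLog {
    static final int DEBUG = 0, INFO = 1;
    static int curLevel = INFO;      // DEBUG messages are filtered out
    static int buildCount = 0;       // counts how often the message is built

    static void log(int level, Supplier<String> s) {
        if (level >= curLevel)
            System.out.println(s.get());   // message built only when needed
    }

    static String expensive() { buildCount++; return "details"; }

    public static void main(String[] args) {
        log(DEBUG, () -> "debug: " + expensive()); // filtered: expensive() never runs
        log(INFO, () -> "info: " + expensive());   // printed: expensive() runs once
        System.out.println(buildCount);
    }
}
```

Note that while the string construction is deferred, the Supplier object itself is still allocated on every call, which is exactly the capture cost the rest of this discussion is about.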
While this is better than constructing the log string only to toss it
away, we're still paying the capture cost for the lambda (which will
almost certainly be a capturing lambda), and then we'll toss that away.
But if the compiler knows the capture operation is pure, it can
transform to this:
    if (level >= curLevel) {
        Supplier<String> s = () -> ...;
        log(s.get());
    }
and then we can observe that the capture is like a "box" operation and
the invocation is like an "unbox" operation, and transform to:
    if (level >= curLevel) {
        log(...);
    }
This is the real payoff: eliminating the "box-unbox" entirely. This
eliminates capture cost and reduces invocation cost. The same applies
to invocations like c.forEach(lambda) -- any call where the caller
provides a lambda that the callee may or may not invoke, but will not
store for later use.
So, here's the unfortunate fact about adding hint bits to the protocol:
adding them to the alternate metafactory entry point makes little
difference since it only applies to the hard-to-optimize cases anyway;
adding any complexity to the fast-path entry point makes the
intrinsification approach more difficult and therefore less likely.
Given that the payoff from the intrinsification approach is enormous,
doing anything to muck with that possibility seems unwise.
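For concreteness, the hint bits proposed in the quoted message below might be encoded as in this sketch. None of these constants exist in java.lang.invoke.LambdaMetafactory; the bit positions are purely hypothetical, and the point of the paragraph above is that such bits would only be acceptable on the alternate (slow-path) entry point, where they make little difference.

```java
public class CaptureSiteHint {
    // Hypothetical: two bits of a metafactory flags word carrying a
    // capture-site-count hint (0 = one site, 1 = two, 2 = three,
    // 3 = more than three). Bit positions are assumed to be free.
    static final int HINT_SHIFT = 4;
    static final int HINT_MASK  = 0b11 << HINT_SHIFT;

    static int encode(int flags, int siteCountHint) {
        return (flags & ~HINT_MASK) | ((siteCountHint & 0b11) << HINT_SHIFT);
    }

    static int decode(int flags) {
        return (flags & HINT_MASK) >> HINT_SHIFT;
    }

    public static void main(String[] args) {
        int flags = encode(0, 2);          // hint: three capture sites
        System.out.println(decode(flags)); // 2
    }
}
```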
On 5/25/2013 5:44 AM, Peter Levart wrote:
> Hi Brian,
>
> I just want to express an observation that I had when I was playing with
> LambdaMetafactory trying various ways to cache CallSite objects or proxy
> classes generated by it. I noticed that sometimes caching is not needed
> but sometimes it would be beneficial. When I tried to capture those
> situations I found that the caching logic had to deduce whether there is
> a single capture site generated by javac per target method or there
> could be two (serialize-able lambdas) or many of them (method
> references). The only way my logic could deduce this information was
> indirectly by interpreting the target method names and looking for know
> patterns in them and by interpreting the serializability flag in context
> of current javac compilation strategy. Now the point of
> LambdaMetafactory is to decouple the javac compilation strategy from the
> implementation of capturing and lambda creation logic to make both of
> them independent from each other and be able to evolve independently (in
> particular to enable old class files be compatible with newer runtimes).
>
> In light of that, what do you think of another boolean flag for
> LambdaMetafactory (or one or two bits in existing flags argument) that
> could be interpreted as a hint from javac telling the metafactory the
> number of capture sites per target method. One bit could distinguish
> one/more than one; two bits could distinguish one/two/three/more than
> three. With such a hint the metafactory could reliably decide whether
> caching should be attempted. Currently this additional info would be ignored,
> but may be needed in the future.
>
> Regards, Peter
>
> On 05/24/2013 08:17 PM, brian.goetz at oracle.com wrote:
>> Changeset: 0e779ee14d4d
>> Author: briangoetz
>> Date: 2013-05-24 14:16 -0400
>> URL: http://hg.openjdk.java.net/lambda/lambda/jdk/rev/0e779ee14d4d
>>
>> Additional spec tweaks for lambda metafactory
>>
>> ! src/share/classes/java/lang/invoke/InnerClassLambdaMetafactory.java
>>
>>
>