hg: lambda/lambda/jdk: Additional spec tweaks for lambda metafactory
Brian Goetz
brian.goetz at oracle.com
Sat May 25 08:32:33 PDT 2013
Clearly lambda capture cost is an area where there is room for
improvement. However, many of these potential improvements compete with
each other. The optimization story is not finished yet, but the next
several chapters are well outlined.
The first chapter, already written, is special handling of non-capturing
(stateless) lambdas. This is a very common case (lambdas like
Object::toString or e -> e*2) and these are translated down to constants
(the bootstrap returns a CallSite linked to a constant method handle.)
After the first capture, the VM boils this down to a constant load,
driving capture cost to zero and reducing footprint through sharing.
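This sharing is observable today. The sketch below (class and method names are mine, not from the post) evaluates a non-capturing lambda twice; under the current HotSpot strategy the bootstrap links a constant CallSite, so both evaluations yield the same instance. Note that instance identity for lambdas is an implementation detail, not something the language guarantees.

```java
import java.util.function.IntUnaryOperator;

public class StatelessSharing {
    static IntUnaryOperator make() {
        return e -> e * 2;   // captures nothing from the enclosing scope
    }

    public static void main(String[] args) {
        // On the current HotSpot/JDK strategy the capture site is linked
        // to a constant, so every evaluation returns the same object.
        IntUnaryOperator a = make();
        IntUnaryOperator b = make();
        System.out.println(a == b);           // true on current HotSpot (not guaranteed by the spec)
        System.out.println(a.applyAsInt(21)); // 42
    }
}
```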
The design of the metafactory enables the translation strategy to be a
run-time implementation detail. The V1.0 strategy, which generates a
class per lambda capture site, is designed to be no worse than inner
classes at anything, and better at some things (like capturing stateless
lambdas, or compile-time static footprint), but overall the performance
profile is comparable to inner classes.
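For concreteness, here is what the bootstrap protocol looks like when driven by hand, the way a javac-emitted invokedynamic instruction drives it at link time. The implementation method "twice" stands in for a compiler-generated lambda body; everything after the metafactory call is opaque to the compiler, which is what lets the translation strategy evolve at run time.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntUnaryOperator;

public class MetafactoryByHand {
    static int twice(int x) { return x * 2; }   // stand-in for a lambda body

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle impl = lookup.findStatic(
                MetafactoryByHand.class, "twice",
                MethodType.methodType(int.class, int.class));

        CallSite site = LambdaMetafactory.metafactory(
                lookup,
                "applyAsInt",                                   // SAM method name
                MethodType.methodType(IntUnaryOperator.class),  // invoked type: no captured args
                MethodType.methodType(int.class, int.class),    // erased SAM signature
                impl,                                           // lambda body
                MethodType.methodType(int.class, int.class));   // instantiated signature

        // With no captured arguments, the call site's target is a
        // nullary factory for the functional-interface instance.
        IntUnaryOperator op = (IntUnaryOperator) site.getTarget().invokeExact();
        System.out.println(op.applyAsInt(21)); // 42
    }
}
```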
An evolution of the translation strategy would be to generate a class
per SAM, whose constructor took a method handle and whose SAM method
invoked that method handle. This would reduce dynamic footprint and
probably some profile pollution, but until we can guarantee that the
invocation cost is no worse than the direct bytecode invocation we do
now, this will likely remain a prototype sitting on the shelf (or
perhaps a time-space tradeoff for embedded.)
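The shape of such a generated per-SAM class might look like the following sketch (hand-written here for illustration; the real strategy would emit this bytecode at run time). Its constructor takes a method handle and its SAM method invokes it, so one class serves every capture site targeting that interface. The open question the post raises is whether the invokeExact in the SAM method can be made as cheap as the direct bytecode invocation the class-per-site strategy emits.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntUnaryOperator;

public class SamWrapper {
    // One wrapper class per SAM type; captured state lives in the
    // (possibly bound) method handle rather than in generated fields.
    static final class IntUnaryOperatorImpl implements IntUnaryOperator {
        private final MethodHandle mh;   // type: (int)int
        IntUnaryOperatorImpl(MethodHandle mh) { this.mh = mh; }
        @Override public int applyAsInt(int x) {
            try {
                return (int) mh.invokeExact(x);
            } catch (Throwable t) {
                throw new RuntimeException(t);
            }
        }
    }

    static int twice(int x) { return x * 2; }   // stand-in for a lambda body

    public static void main(String[] args) throws Throwable {
        MethodHandle mh = MethodHandles.lookup().findStatic(
                SamWrapper.class, "twice",
                MethodType.methodType(int.class, int.class));
        IntUnaryOperator op = new IntUnaryOperatorImpl(mh);
        System.out.println(op.applyAsInt(21)); // 42
    }
}
```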
Looking a little farther ahead, we plan to teach the VM about the
semantics of lambda capture. The metafactory protocol was designed for
this; we break it into two entry points, a "fast path" and an alternate
path. The fast path handles all the well-behaved cases; the alternate
path handles all the strange cases (serializability, implementing
additional marker types, implementing bridges in the lambda object
because the SAM type is an old classfile and therefore lacks the needed
bridges.) Only the fast path will get optimized; its protocol was
designed such that the semantics could be modeled entirely in terms of
method handle combinators. This enables a path where the VM
intrinsifies the capture operation, and knows that it is a pure
operation, enabling code motion optimizations. Consider:
    log(DEBUG, () -> ...)

where

    void log(Level level, Supplier<String> s) {
        if (level >= curLevel)
            log(s.get());
    }
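A runnable version of this pattern (names and the int-based level encoding are mine) shows what the laziness buys: the message is only constructed when the level passes the filter. The counter makes the deferred evaluation observable.

```java
import java.util.function.Supplier;

public class LazyLog {
    static final int DEBUG = 0, INFO = 1;
    static int curLevel = INFO;      // DEBUG messages are filtered out
    static int buildCount = 0;       // counts how often the message is built

    static void log(int level, Supplier<String> s) {
        if (level >= curLevel)
            System.out.println(s.get());   // message built only when needed
    }

    static String expensive() { buildCount++; return "details"; }

    public static void main(String[] args) {
        log(DEBUG, () -> "debug: " + expensive()); // filtered: expensive() never runs
        log(INFO, () -> "info: " + expensive());   // printed: expensive() runs once
        System.out.println(buildCount);
    }
}
```

Note that while the string construction is deferred, the Supplier object itself is still allocated on every call, which is exactly the capture cost the rest of this discussion is about.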
While this is better than constructing the log string only to toss it
away, we're still paying the capture cost for the lambda (which will
almost certainly be a capturing lambda), and then we'll toss that away.
But if the compiler knows the capture operation is pure, it can
transform to this:
    if (level >= curLevel) {
        Supplier<String> s = () -> ...;
        log(s.get());
    }
and then we can observe that the capture is like a "box" operation and
the invocation is like an "unbox" operation, and transform to:
    if (level >= curLevel) {
        log(...);
    }
This is the real payoff: eliminating the "box-unbox" entirely. This
eliminates capture cost and reduces invocation cost. The same applies
to invocations like c.forEach(lambda) -- any call where the caller
provides a lambda that the callee may or may not invoke, but will not
store for later use.
So, here's the unfortunate fact about adding hint bits to the protocol:
adding them to the alternate metafactory entry point makes little
difference since it only applies to the hard-to-optimize cases anyway;
adding any complexity to the fast-path entry point makes the
intrinsification approach more difficult and therefore less likely.
Given that the payoff from the intrinsification approach is enormous,
doing anything to muck with that possibility seems unwise.
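For concreteness, the hint bits proposed in the quoted message below might be encoded as in this sketch. None of these constants exist in java.lang.invoke.LambdaMetafactory; the bit positions are purely hypothetical, and the point of the paragraph above is that such bits would only be acceptable on the alternate (slow-path) entry point, where they make little difference.

```java
public class CaptureSiteHint {
    // Hypothetical: two bits of a metafactory flags word carrying a
    // capture-site-count hint (0 = one site, 1 = two, 2 = three,
    // 3 = more than three). Bit positions are assumed to be free.
    static final int HINT_SHIFT = 4;
    static final int HINT_MASK  = 0b11 << HINT_SHIFT;

    static int encode(int flags, int siteCountHint) {
        return (flags & ~HINT_MASK) | ((siteCountHint & 0b11) << HINT_SHIFT);
    }

    static int decode(int flags) {
        return (flags & HINT_MASK) >> HINT_SHIFT;
    }

    public static void main(String[] args) {
        int flags = encode(0, 2);          // hint: three capture sites
        System.out.println(decode(flags)); // 2
    }
}
```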
On 5/25/2013 5:44 AM, Peter Levart wrote:
> Hi Brian,
>
> I just want to express an observation that I had when I was playing with
> LambdaMetafactory trying various ways to cache CallSite objects or proxy
> classes generated by it. I noticed that sometimes caching is not needed
> but sometimes it would be beneficial. When I tried to capture those
> situations I found that the caching logic had to deduce whether there is
> a single capture site generated by javac per target method or there
> could be two (serialize-able lambdas) or many of them (method
> references). The only way my logic could deduce this information was
> indirectly by interpreting the target method names and looking for know
> patterns in them and by interpreting the serializability flag in context
> of current javac compilation strategy. Now the point of
> LambdaMetafactory is to decouple the javac compilation strategy from the
> implementation of capturing and lambda creation logic to make both of
> them independent from each other and be able to evolve independently (in
> particular to enable old class files be compatible with newer runtimes).
>
> In light of that, what do you think of another boolean flag for
> LambdaMetafactory (or one or two bits in existing flags argument) that
> could be interpreted as a hint from javac telling the metafactory the
> number of capture sites per target method. One bit could distinguish
> one/more than one; two bits could distinguish one/two/three/more than
> three. With such a hint the metafactory could reliably decide whether
> caching should be attempted. Currently this additional info would be ignored,
> but may be needed in the future.
>
> Regards, Peter
>
> On 05/24/2013 08:17 PM, brian.goetz at oracle.com wrote:
>> Changeset: 0e779ee14d4d
>> Author: briangoetz
>> Date: 2013-05-24 14:16 -0400
>> URL: http://hg.openjdk.java.net/lambda/lambda/jdk/rev/0e779ee14d4d
>>
>> Additional spec tweaks for lambda metafactory
>>
>> ! src/share/classes/java/lang/invoke/InnerClassLambdaMetafactory.java
>>
>>
>