[8] Review request for 8008770: SerializedLambda incorrect class loader for lambda deserializing class

Peter Levart peter.levart at gmail.com
Mon Mar 11 11:07:33 PDT 2013


On 03/10/2013 11:35 PM, Brian Goetz wrote:
> There's definitely room for improving the metafactory.  For Java SE 8, 
> we chose to use a very simple implementation: spin a new class per 
> call site.  This minimizes the overhead and memory leaks and other 
> complexities that can come out of caching, and has one very good 
> property: after the initial linkage, capture of stateless lambdas turn 
> into constant loads.
>
> There are lots of ways it can be improved, and you have hit on two of 
> them:
> 1.  Duplicate class generation for serializable lambdas;
> 2.  Duplicate class generation for identical (method ref, SAM) pairs 
> across classes.
>
> For the former, I think there is a better strategy, that takes 
> advantage of the following observation: there are *exactly two* 
> capture sites for each serializable lambda -- the real capture site 
> and the one in the $deserializeLambda$ method.  By having the compiler 
> "outline" the capture site into a method, that is called by both the 
> original capture site and the deserialization code, then there will be 
> exactly one call site, obviating the need for a cache.

Ok, the code generated with "outlining" javac would not produce 
duplicate classes, while code compiled with "non-outlining" javac would. 
I mean, if something like that is introduced after the JDK8 release, to 
take advantage of it, classes would have to be recompiled with newer 
javac. That's not something that was meant to be "fixed" just in runtime 
library, right? Different javacs (or 3rd party Java compilers) could 
produce code with different performance characteristics...

>
> The question is whether the memory usage overhead of the cache and key 
> objects makes up for the decreased class generation.  Have you done 
> any measurements that would help determine what the breakeven cache 
> hit rate is?

Not at first, but now that I did, I saw that it was a huge overhead. I 
tried to minimize this overhead on 3 fronts:

- instead of keeping references to MethodhandleInfo and MethodType 
objects in the cache keys, which also retained unneeded deeply-nested 
private structures in those objects, I rather extracted symbolic names 
(Class and method names) from those objects and dumped them into a 
single Sintrng[] array (using nulls as delimiters between variable-sized 
sub-sequences). As it happens, all those Strings are already interned, 
so the overhead is just the length of the array that way.

- instead of referencing CallSite objects as the values in cache, which 
also retain unneeded deeply-nested private structures, I rather cached 
just j.l.Class objects representing proxy classes and re-compute 
CallSite each time called.

- instead of using ClassValue as a means to maintain per-class caches, I 
experimented with a plain ConcurrentHashMap referenced directly from the 
j.l.Class object (with new field added to j.l.Class). Such 
platform-internal "Class.privateMap" could be used also for other 
purposes inside JDK since it's about 4x more space-efficient than 
ClassValue approach (there are a lot of places in JDK that maintain 
per-class associations using WeakHashMap<Class, ...> wrapped into a 
synchronized Map, which is not very scalable - scalability issues for 
Proxy.isProxyClass and Proxy.getProxyClass and others have already been 
reported some time ago)...

With all 3 optimizations combined, I managed to so squeeze the overhead 
to 232 bytes for 1st cached class and additional 120 bytes for each 
subsequent cached class (per target capturing class) with occasional 
"jumps" as CHM is resized...

Comparing that with 504 bytes which is the size of j.l.Class object 
(admittedly this also measures non-java fields, but excludes deep 
non-java structures) of a freshly loaded class, we can get a feeling of 
the effectivity of caching. This is only Java-side of things. I don't 
know what's the additional overhead of each class in the JVM though. But 
I can imagine it's not zero.

Additional area I'm still experimenting with is the caching strategy:
- caching is not needed for non-serializable proxy classes that call 
synthetic methods generated by javac for lambda bodies. Those are 
already guaranteed to contain only one capture site per target method.
- caching of proxy classes for method references could be performed on 
the class that hosts the target method not on the capturing class. This 
could be much more efficient (think of frequently used shortcuts like 
Math::min, etc...). If all types appearing in LambdaMetaFactory 
arguments are also visible from the ClassLoader of the target method's 
declaring class, this could be done, otherwise caching can fall-back to 
the capturing class.

I will folow-up with my findings (and some code ;-).

> A more general way to reduce the number of generated classes was 
> outlined in the translation document: instead of generating one 
> wrapper class per lambda capture site, generate one wrapper class per 
> SAM type, whose ctor takes a method handle and whose SAM methods 
> invoke it.  We plan to do this in the future.

But currently, non-constant method handles don't yet perform as fast as 
direct calls, right?

Regards, Peter


>
>
>
> On 3/10/2013 6:20 PM, Peter Levart wrote:
>> Hi,
>>
>> I experimented with caching of CallSites by the LambdaMetafactory.
>> Here's what I came up with:
>>
>> http://dl.dropbox.com/u/101777488/jdk8-lambda/InnerClassLambdaMetafactory/webrev.01/index.html 
>>
>>
>> This implementation is very simple, but effective. Besides serializable
>> lambda being always deserialized into same class, the presented caching
>> also optimizes method references: each distinct (method reference, SAM
>> target type) pair uses a single proxy class per capturing class. Without
>> caching each method reference expression uses a distinct proxy class.
>>
>> What do you think? Am I correct in asserting that all types appearing in
>> LambdaMetafactory arguments are always visible from the capturing class
>> and therefore there is no problem referencing the j.l.Class objects of
>> those types from the capturing class' j.l.Class object as far as class
>> memory leaks are concerned? If I'm mistaken, the caching key could be
>> modified to retain only symbolic names of types instead of j.l.Class
>> objects and still be unique, since no two distinct types with same
>> symbolic name can be visible from the same class.
>>
>> Regards, Peter
>>
>> On 03/05/2013 12:03 AM, Peter Levart wrote:
>>> If there is a 1-to-1 mapping between a lambda expression and a
>>> synthetic implementation method generated by javac and further if
>>> there could be a 1-to-1 mapping between the synthetic implementation
>>> method and the SAM proxy class name (calculated in advance) then the
>>> class-loader of the capturing class itself could be used as a cache to
>>> look-up (findLoadedClass) if the proxy class has already been defined
>>> or if it has to be defined atomically. Once we have a single Class
>>> object per lambda expression/implementation method, ClassValue can be
>>> used to cache the CallSite object on it.
>>>
>>> I experimented with the following caching scheme which returns a singe
>>> CallSite instance for both capturing sites (lambda expression and
>>> $deserializeLambda$):
>>>
>>> public static CallSite altMetaFactory(MethodHandles.Lookup caller,
>>>                                           String invokedName,
>>>                                           MethodType invokedType,
>>>                                           Object... args)
>>>             throws ReflectiveOperationException,
>>> LambdaConversionException {
>>>         MethodHandle samMethod = (MethodHandle)args[0];
>>>         MethodHandle implMethod = (MethodHandle)args[1];
>>>         MethodType instantiatedMethodType = (MethodType)args[2];
>>>         int flags = (Integer) args[3];
>>>         Class[] markerInterfaces;
>>>         int argIndex = 4;
>>>         if ((flags & FLAG_MARKERS) != 0) {
>>>             int markerCount = (Integer) args[argIndex++];
>>>             markerInterfaces = new Class[markerCount];
>>>             System.arraycopy(args, argIndex, markerInterfaces, 0,
>>> markerCount);
>>>             argIndex += markerCount;
>>>         }
>>>         else
>>>             markerInterfaces = EMPTY_CLASS_ARRAY;
>>>         AbstractValidatingLambdaMetafactory mf;
>>>         mf = new InnerClassLambdaMetafactory(caller, invokedType,
>>> samMethod, implMethod, instantiatedMethodType,
>>>                                              flags, markerInterfaces);
>>>         mf.validateMetafactoryArgs();
>>>
>>>         MethodHandleInfo implInfo = new MethodHandleInfo(implMethod);
>>>         ConcurrentMap<String, CallSite> callSites =
>>> CALL_SITES_CV.get(implInfo.getDeclaringClass());
>>>         CallSite callSite = callSites.get(implInfo.getName());
>>>         if (callSite == null) {
>>>             callSite = mf.buildCallSite();
>>>             CallSite oldCallSite =
>>> callSites.putIfAbsent(implInfo.getName(), callSite);
>>>             if (oldCallSite != null)
>>>                 callSite = oldCallSite;
>>>         }
>>>
>>>         return callSite;
>>>     }
>>>
>>>     // each implementation method's declaring class has a Map of
>>> CallSites keyed by implementation method name
>>>
>>>     private static final ClassValue<ConcurrentMap<String, CallSite>>
>>> CALL_SITES_CV =
>>>         new ClassValue<ConcurrentMap<String, CallSite>>() {
>>>             @Override
>>>             protected ConcurrentMap<String, CallSite>
>>> computeValue(Class<?> type) {
>>>                 return new ConcurrentHashMap<>();
>>>             }
>>>         };
>>>
>>>
>>> ... and it does it's job for the price of ClassValue and
>>> ConcurrentHashMap overhead even if only one call to metafactory is
>>> ever performed for each lambda expression.
>>>
>>> Using ClassLoader as a cache might be an opportunity to do it without
>>> overhead.
>>>
>>>
>>> Regards, Peter
>>>
>>>
>>>
>>> On 03/04/2013 08:04 PM, Brian Goetz wrote:
>>>> Its a good problem to kick down the road.  When/if we switch to a
>>>> class-per-SAM instead of class-per-callsite implementation of lambda
>>>> conversion, the issue goes away.
>>>>
>>>> On 3/4/2013 1:38 PM, Remi Forax wrote:
>>>>> On 03/04/2013 03:26 PM, Brian Goetz wrote:
>>>>>> You are correct.  This is less than ideal, but allowable, and we're
>>>>>> treating this as a quality-of-implementation issue.  The solution 
>>>>>> would
>>>>>> be to "outline" both capture sites into a private method and replace
>>>>>> them both with a call to that method; then they would share the
>>>>>> invokedynamic call site.  It's on the list of "possible future
>>>>>> optimizations."
>>>>> One solution is, if the lambda is serializable, to emulate 
>>>>> invokedynamic
>>>>> using constant method handles stored in static final fields, thus
>>>>> shareable .
>>>>> But this make the translation scheme for javac ugly.
>>>>>
>>>>> Another solution is to have an API to query already existing CallSite
>>>>> objects,
>>>>> so the invokedynamic in $deserializeLambda$ will effectively share 
>>>>> the
>>>>> same lambda proxy.
>>>>> It's a good question for the JSR 292 EG :)
>>>>>
>>>>> Rémi
>>>>>
>>>>>> On 3/4/2013 3:18 AM, Peter Levart wrote:
>>>>>>> Hi Robert,
>>>>>>>
>>>>>>> I noticed that when the same VM is used both for evaluating a 
>>>>>>> lambda
>>>>>>> expression (producing a SAM instance) and for de-serializing a
>>>>>>> previously serialized SAM instance representing the same lambda
>>>>>>> expression, two distinct SAM proxy classes are generated (and
>>>>>>> consequently, even non-capturing lambdas come out as two distinct
>>>>>>> instances). I believe this is because there are two places where
>>>>>>> LambdaMetafactory is called - the "indy" call generated by javac 
>>>>>>> in the
>>>>>>> place of a lambda expression and one in the 
>>>>>>> "$deserializeLambda$" method
>>>>>>> of the capturing class - and each call produces a distinct lambda
>>>>>>> factory with a distinct generated proxy class.
>>>>>>>
>>>>>>> Do you feel that this situation is rare and/or that maintaining 
>>>>>>> a cache
>>>>>>> of "CallSite" objects per "altMetaFactory" request parameters would
>>>>>>> actually present a greater overhead than generating two classes 
>>>>>>> in such
>>>>>>> scenarios?
>>>>>>>
>>>>>>> Regards, Peter
>>>>>>>
>>>>>>> On 02/25/2013 06:24 PM, Robert Field wrote:
>>>>>>>> Please review the fixes for CRs:
>>>>>>>>
>>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8008770
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~rfield/8008770_2/
>>>>>>>>
>>>>>>>> Thank,
>>>>>>>> Robert
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>
>>



More information about the lambda-dev mailing list