Stability of lambda serialization

Brian Goetz brian.goetz at oracle.com
Tue Aug 6 13:23:37 PDT 2013


The hash can't just cover the representation of the lambda; it needs to fold in the enclosing context as well.  Otherwise, changing:

foo() { 
    String x = ...;
    bar(() -> x.length());
}

to

foo() { 
    File x = ...;
    bar(() -> x.length());
}

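will fool the hash: the lambda text is identical, but x.length() now resolves to File.length() (returning long) instead of String.length() (returning int).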
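
A minimal sketch of that failure mode. The `hash` helper and the idea of describing the context as a "type name" string are assumptions made for illustration; the thread does not settle on a digest or on what exactly the context would contain.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LambdaTextHash {
    // Hypothetical helper: the first 8 bytes of SHA-256 over the given parts, as a long.
    public static long hash(String... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String p : parts) {
                md.update(p.getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator, so ("ab", "c") differs from ("a", "bc")
            }
            byte[] d = md.digest();
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String lambdaText = "() -> x.length()";

        // Hashing only the lambda text: the edit from String to File goes unnoticed.
        long textBefore = hash(lambdaText);
        long textAfter = hash(lambdaText);
        System.out.println("text only, same hash: " + (textBefore == textAfter));

        // Folding in the captured variable's declaration (assumed form of "context").
        long ctxBefore = hash(lambdaText, "java.lang.String x");
        long ctxAfter = hash(lambdaText, "java.io.File x");
        System.out.println("with context, same hash: " + (ctxBefore == ctxAfter));
    }
}
```

The first comparison is trivially equal because the lambda text did not change; only the second one sees the edit to the declaration of x.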


On Aug 6, 2013, at 12:35 PM, Doug Lea wrote:

> On 08/06/13 13:04, David M. Lloyd wrote:
>> On 08/06/2013 11:43 AM, David M. Lloyd wrote:
>>> On 08/06/2013 11:14 AM, David M. Lloyd wrote:
>>>> On 08/06/2013 09:36 AM, Doug Lea wrote:
>>>>> On 08/06/13 09:17, David M. Lloyd wrote:
>>>>> 
>>>>>> Me: Reordering captured variables, reordering lambda incidence.  The EG's
>>>>>> stance is just a generalization.  It's not a stance in any case: things
>>>>>> which destabilize lambdas in terms of serializability are not a question
>>>>>> of opinion, and it's bizarre to frame it that way.
>>>>>> 
>>>>>>> What to do? EG: make a best effort, with documented caveats; you:
>>>>>>> conservatively prohibit serialization of capturing lambdas; third
>>>>>>> alternative: conservatively detect problems and break at
>>>>>>> deserialization
>>>>>> 
>>>>>> I'm OK with either "you" or "third alternative".
>>>>> 
>>>>> I'm OK with 3rd alternative if some reasonably efficient
>>>>> checksum/serial id ensuring breakage could be devised.
>>>>> David, any ideas?
>>>> 
>>>> It's a good idea.  I can think of a few requirements offhand:
>>>> 
>>>> * Generation of the hash would necessarily occur at compile time
>>>> * The hash would have to be unique for each lambda within a class and/or
>>>> compilation unit
>>>> * The hash would have to be sensitive to any changes which would cause
>>>> any indeterminism in how the lambda is resolved - this may extend to
>>>> hashing even the bytecode of methods which include the lambda.  This is
>>>> the key/most complex concept to tackle to make this solution work.
>>>> * The last step is to tag the UID value on to the serialized lambda
>>>> representation.
>>>> 
>>>> I don't think there is much more to it than this; the hardest part is
>>>> determining what/how to hash.  If it happens at compile time then
>>>> resolution at run time (i.e. the more performance-sensitive context)
>>>> should be the same kind of numerical comparison which is already done
>>>> for serialVersionUID.
>>> 
>>> Brian pointed out a couple things:
>>> 
>>> * Such a scheme would have to be very strongly and clearly specified
>>> * The scheme cannot depend on any particular non-spec compiler behavior
>>> (i.e. the same source file should create the same hashes regardless of
>>> compiler version or vendor)
>>> 
>>> I suggested as a possible starting point a scheme which could create a
>>> 64-bit hash based on a combination of:
> 
> How about just a hash of its actual string representation,
> plus its context (enclosing method etc). A little crazy but among the
> few simple and feasible ones I can think of. It means you blow up
> if you add a space. Fine: If you are going to draw the line somewhere,
> it might as well be here.
> 
> Although at this point you wonder, why bother serializing.
> Just pass the string and invoke a compiler to parse... Probably
> not a lot slower.
> 
> -Doug
> 
>>> 
>>> * Any captured variables' name and declaration order
>>> * The declaration order of the lambda
>>> * Information about the enclosing method: name and signature, maybe decl
>>> order? (though it should be redundant wrt. the lambda decl order)
>>> * The usual serialVersionUID calculation
>>> 
>>> I would really appreciate anyone's thoughts as to the efficacy of this
>>> approach and any potential weaknesses; in particular I'd like to hear if
>>> anyone thinks this is a non-trivial change in terms of compilation and
>>> runtime.
>>> 
>>> In particular, it is not 100% clear how the calculation would work with
>>> nested lambdas or lambdas nested in inner classes for example.
>> 
>> For runtime it seems to me that this would largely consist of bundling the hash
>> with the method handle information which can be passed to its serialized
>> representation.  The deserialization of the lambda could then hopefully just
>> verify the hash against the local method handle and throw an exception if it has
>> changed.
> 
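
The scheme quoted above (a 64-bit value computed at compile time from the listed inputs, then compared at deserialization) could be sketched roughly as follows. Everything here is an assumption for illustration: the class and method names are hypothetical, FNV-1a is just a convenient 64-bit hash and not something anyone in the thread proposed, and the inputs are passed as plain strings rather than taken from a real compiler or method handle.

```java
import java.io.InvalidObjectException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class LambdaStabilityId {
    // Standard 64-bit FNV-1a constants.
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // Mix one string into the running hash, followed by a 0xff separator
    // (a byte that never occurs in valid UTF-8) to keep field boundaries distinct.
    private static long mix(long h, String s) {
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h = (h ^ (b & 0xff)) * FNV_PRIME;
        }
        return (h ^ 0xff) * FNV_PRIME;
    }

    // Hypothetical compile-time step: combine the inputs listed in the thread.
    public static long compute(long classSerialVersionUid,
                               String enclosingMethodName,
                               String enclosingMethodDescriptor,
                               int lambdaIndex,              // declaration order of the lambda
                               List<String> capturedNames) { // in declaration order
        long h = mix(FNV_OFFSET, Long.toString(classSerialVersionUid));
        h = mix(h, enclosingMethodName);
        h = mix(h, enclosingMethodDescriptor);
        h = mix(h, Integer.toString(lambdaIndex));
        for (String name : capturedNames) {
            h = mix(h, name);
        }
        return h;
    }

    // Hypothetical run-time step: a plain numeric comparison, as with serialVersionUID.
    public static void verify(long serializedId, long localId) throws InvalidObjectException {
        if (serializedId != localId) {
            throw new InvalidObjectException("lambda stability id mismatch: stream="
                    + serializedId + ", local=" + localId);
        }
    }

    public static void main(String[] args) {
        // Writer captured (x, y); the reader's class was recompiled with (y, x).
        long written = compute(1L, "foo", "()V", 0, Arrays.asList("x", "y"));
        long local = compute(1L, "foo", "()V", 0, Arrays.asList("y", "x"));
        try {
            verify(written, local);
            System.out.println("accepted");
        } catch (InvalidObjectException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This sketch covers names and ordering only, so it would not catch the String-to-File case at the top of this message, and it does not address the open question of nested lambdas or lambdas in inner classes.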



More information about the lambda-spec-observers mailing list