Stability of lambda serialization

Tue Aug 6 10:04:50 PDT 2013

On 08/06/2013 11:43 AM, David M. Lloyd wrote:
> On 08/06/2013 11:14 AM, David M. Lloyd wrote:
>> On 08/06/2013 09:36 AM, Doug Lea wrote:
>>> On 08/06/13 09:17, David M. Lloyd wrote:
>>>
>>>> Me: Reordering captured variables, reordering lambda incidence.  The
>>>> EG's stance is just a generalization.  It's not a stance in any case:
>>>> things which destabilize lambdas in terms of serializability are not a
>>>> question of opinion, and it's bizarre to frame it that way.
>>>>
>>>>> What to do? EG: make a best effort, with documented caveats; you:
>>>>> conservatively prohibit serialization of capturing lambdas; third
>>>>> alternative: conservatively detect problems and break at
>>>>> deserialization
>>>>
>>>> I'm OK with either "you" or "third alternative".
>>>
>>> I'm OK with 3rd alternative if some reasonably efficient
>>> checksum/serial id ensuring breakage could be devised.
>>> David, any ideas?
>>
>> It's a good idea.  I can think of a few requirements offhand:
>>
>> * Generation of the hash would necessarily occur at compile time
>> * The hash would have to be unique for each lambda within a class and/or
>> compilation unit
>> * The hash would have to be sensitive to any changes which would cause
>> any indeterminism in how the lambda is resolved - this may extend to
>> hashing even the bytecode of methods which include the lambda.  This is
>> the key/most complex concept to tackle to make this solution work.
>> * The last step is to tag the UID value on to the serialized lambda
>> representation.
>>
>> I don't think there is much more to it than this; the hardest part is
>> determining what/how to hash.  If it happens at compile time then
>> resolution at run time (i.e. the more performance-sensitive context)
>> should be the same kind of numerical comparison which is already done
>> for serialVersionUID.
>
> Brian pointed out a couple things:
>
> * Such a scheme would have to be very strongly and clearly specified
> * The scheme cannot depend on any particular non-spec compiler behavior
> (i.e. the same source file should create the same hashes regardless of
> compiler version or vendor)
>
> I suggested as a possible starting point a scheme which could create a
> 64-bit hash based on a combination of:
>
> * Any captured variables' name and declaration order
> * The declaration order of the lambda
> * Information about the enclosing method: name and signature, maybe decl
> order? (though it should be redundant wrt. the lambda decl order)
> * The usual serialVersionUID calculation
>
> I would really appreciate anyone's thoughts as to the efficacy of this
> approach and any potential weaknesses; in particular I'd like to hear if
> anyone thinks this is a non-trivial change in terms of compilation and
> runtime.
>
> In particular, it is not 100% clear how the calculation would work with
> nested lambdas or lambdas nested in inner classes for example.
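
To make the quoted list of hash inputs concrete, here is a minimal sketch in Java. Everything in it is hypothetical: the class LambdaStabilityHash, the compute method and its parameters are invented for illustration, and SHA-256 truncated to 64 bits is an assumed choice that only loosely mirrors how the default serialVersionUID is derived from a SHA digest. A real scheme would run inside the compiler and would need a precise specification; nested lambdas and inner classes are not addressed here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical helper: not part of any JDK or compiler API.
final class LambdaStabilityHash {

    // Combines the inputs from the proposal into one 64-bit value.
    static long compute(long classSerialVersionUID,
                        String enclosingMethodName,
                        String enclosingMethodDescriptor,
                        int lambdaDeclarationIndex,
                        List<String> capturedVariableNames) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);

            // The usual serialVersionUID of the enclosing class.
            out.writeLong(classSerialVersionUID);
            // Enclosing method: name and signature.
            out.writeUTF(enclosingMethodName);
            out.writeUTF(enclosingMethodDescriptor);
            // Declaration order of the lambda.
            out.writeInt(lambdaDeclarationIndex);
            // Captured variables, by name, in declaration order.
            out.writeInt(capturedVariableNames.size());
            for (String name : capturedVariableNames) {
                out.writeUTF(name);
            }
            out.flush();

            // Assumed digest choice; keep only the first 8 bytes.
            byte[] digest = MessageDigest.getInstance("SHA-256")
                                         .digest(bytes.toByteArray());
            return ByteBuffer.wrap(digest).getLong();
        } catch (IOException | NoSuchAlgorithmException e) {
            // Neither is expected for an in-memory stream and a mandatory digest.
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, reordering the captured variables or moving the lambda to a different declaration position feeds different bytes into the digest, so the resulting value changes, while recompiling unchanged source yields the same value.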

For the runtime side, it seems to me that this would largely consist of
bundling the hash with the method handle information that is passed to
the lambda's serialized representation.  Deserialization of the lambda
could then hopefully just verify the hash against the local method handle
and throw an exception if it has changed.
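
A minimal sketch of that runtime check, in Java. All names here are hypothetical (HashedLambdaForm, LOCAL_HASHES, implMethodKey, survivesRoundTrip); this is not how java.lang.invoke.SerializedLambda is defined, just an illustration of carrying a hash in the serialized form and comparing it on the receiving side. The map stands in for whatever the local class would record for its lambdas.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical serialized form of a lambda that carries a stability hash.
final class HashedLambdaForm implements Serializable {
    private static final long serialVersionUID = 1L;

    // Stand-in for the hashes the locally loaded class knows about.
    static final Map<String, Long> LOCAL_HASHES = new ConcurrentHashMap<>();

    private final String implMethodKey; // identifies the lambda body
    private final long stabilityHash;   // hash recorded by the sender

    HashedLambdaForm(String implMethodKey, long stabilityHash) {
        this.implMethodKey = implMethodKey;
        this.stabilityHash = stabilityHash;
    }

    // Called by ObjectInputStream after the fields are read.
    private Object readResolve() throws ObjectStreamException {
        Long local = LOCAL_HASHES.get(implMethodKey);
        if (local == null || local.longValue() != stabilityHash) {
            throw new InvalidObjectException(
                "Lambda " + implMethodKey + " changed since it was serialized");
        }
        return this; // a real implementation would return the lambda object
    }

    // Serializes and deserializes in memory; false means the check rejected it.
    static boolean survivesRoundTrip(HashedLambdaForm form) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(form);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buf.toByteArray()))) {
                in.readObject();
                return true;
            }
        } catch (InvalidObjectException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The comparison is a single long equality test, which is the same kind of numerical check already done for serialVersionUID.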
-- 
- DML