Stability of lambda serialization

Tue Aug 6 12:35:21 PDT 2013

On 08/06/13 13:04, David M. Lloyd wrote:
> On 08/06/2013 11:43 AM, David M. Lloyd wrote:
>> On 08/06/2013 11:14 AM, David M. Lloyd wrote:
>>> On 08/06/2013 09:36 AM, Doug Lea wrote:
>>>> On 08/06/13 09:17, David M. Lloyd wrote:
>>>>
>>>>> Me: Reordering captured variables, reordering lambda incidence.  The
>>>>> EG's stance is just a generalization.  It's not a stance in any case:
>>>>> things which destabilize lambdas in terms of serializability are not
>>>>> a question of opinion, and it's bizarre to frame it that way.
>>>>>
>>>>>> What to do? EG: make a best effort, with documented caveats; you:
>>>>>> conservatively prohibit serialization of capturing lambdas; third
>>>>>> alternative: conservatively detect problems and break at deserialization.
>>>>>
>>>>> I'm OK with either "you" or "third alternative".
>>>>
>>>> I'm OK with 3rd alternative if some reasonably efficient
>>>> checksum/serial id ensuring breakage could be devised.
>>>> David, any ideas?
>>>
>>> It's a good idea.  I can think of a few requirements offhand:
>>>
>>> * Generation of the hash would necessarily occur at compile time
>>> * The hash would have to be unique for each lambda within a class and/or
>>> compilation unit
>>> * The hash would have to be sensitive to any changes which would cause
>>> any indeterminism in how the lambda is resolved - this may extend to
>>> hashing even the bytecode of methods which include the lambda.  This is
>>> the key/most complex concept to tackle to make this solution work.
>>> * The last step is to tag the UID value onto the serialized lambda
>>> representation.
>>>
>>> I don't think there is much more to it than this; the hardest part is
>>> determining what/how to hash.  If it happens at compile time then
>>> resolution at run time (i.e. the more performance-sensitive context)
>>> should be the same kind of numerical comparison which is already done
>>> for serialVersionUID.
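For reference, the existing serialVersionUID check that the runtime side would mirror can be reproduced with the public java.io.ObjectStreamClass API. This is a minimal sketch only: SuidCheck, Point and verify are hypothetical names, and in practice the comparison happens inside ObjectInputStream, not in user code.

```java
import java.io.InvalidClassException;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SuidCheck {
    // Hypothetical serializable class with an explicit serialVersionUID.
    static class Point implements Serializable {
        private static final long serialVersionUID = 42L;
        int x, y;
    }

    // Compare the UID recorded in a stream against the locally computed one,
    // mirroring the numerical comparison ObjectInputStream performs.
    static void verify(long streamUid, Class<?> cls) throws InvalidClassException {
        long localUid = ObjectStreamClass.lookup(cls).getSerialVersionUID();
        if (streamUid != localUid) {
            throw new InvalidClassException(cls.getName(),
                    "stream UID " + streamUid + " != local UID " + localUid);
        }
    }

    public static void main(String[] args) throws Exception {
        verify(42L, Point.class);     // matches: no exception
        try {
            verify(43L, Point.class); // mismatch: rejected
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The per-lambda hash would be checked the same way: one long comparison at deserialization time, with all the expensive work done by the compiler.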
>>
>> Brian pointed out a couple of things:
>>
>> * Such a scheme would have to be very strongly and clearly specified
>> * The scheme cannot depend on any particular non-spec compiler behavior
>> (i.e. the same source file should create the same hashes regardless of
>> compiler version or vendor)
>>
>> I suggested as a possible starting point a scheme which could create a
>> 64-bit hash based on a combination of:

How about just a hash of the lambda's actual string representation (its
source text), plus its context (enclosing method, etc.)? A little crazy,
but it's among the few simple and feasible options I can think of. It
means you blow up if you add a space. Fine: if you are going to draw the
line somewhere, it might as well be here.
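The source-text variant can be sketched in a few lines. This is an illustration only, assuming the compiler has the lambda's exact source text and an enclosing-context string available; SourceTextFingerprint and fingerprint are hypothetical names, and 64-bit FNV-1a is a stand-in hash chosen for brevity, not something proposed in the thread.

```java
import java.nio.charset.StandardCharsets;

public class SourceTextFingerprint {
    // FNV-1a 64-bit parameters (offset basis and prime).
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // Hash the lambda's exact source text together with its enclosing context,
    // e.g. "com.example.Foo#bar(Ljava/lang/String;)V". Any edit to either
    // string, including whitespace, changes the value (barring collisions).
    static long fingerprint(String enclosingContext, String lambdaSource) {
        // A separator marks the boundary between context and source text.
        byte[] bytes = (enclosingContext + '\0' + lambdaSource)
                .getBytes(StandardCharsets.UTF_8);
        long h = FNV_OFFSET;
        for (byte b : bytes) {
            h ^= (b & 0xff);
            h *= FNV_PRIME;
        }
        return h;
    }

    public static void main(String[] args) {
        String ctx = "com.example.Foo#bar(Ljava/lang/String;)V";
        long a = fingerprint(ctx, "x -> x + y");
        long b = fingerprint(ctx, "x -> x +  y"); // one extra space
        System.out.println(a == b ? "same" : "different");
    }
}
```

As noted, a single added space yields a different value, so any textual edit to the lambda or its context breaks deserialization.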

Although at this point you might wonder why bother serializing at all.
Just pass the string and invoke a compiler to parse it... probably
not a lot slower.

-Doug

>>
>> * Any captured variables' name and declaration order
>> * The declaration order of the lambda
>> * Information about the enclosing method: name and signature, maybe decl
>> order? (though it should be redundant wrt. the lambda decl order)
>> * The usual serialVersionUID calculation
>>
>> I would really appreciate anyone's thoughts as to the efficacy of this
>> approach and any potential weaknesses; in particular I'd like to hear if
>> anyone thinks this is a non-trivial change in terms of compilation and
>> runtime.
>>
>> In particular, it is not 100% clear how the calculation would work with
>> nested lambdas or lambdas nested in inner classes for example.
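A minimal sketch of the combined 64-bit hash listed above, assuming the compiler can supply each input. The class name, parameter names and input order are assumptions for illustration, as is the use of SHA-1 folded to 64 bits the way ObjectStreamClass folds the default serialVersionUID. It does not attempt to answer the open question about nested lambdas or lambdas inside inner classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class LambdaFingerprint {
    // Hypothetical inputs; names and ordering are assumptions, not a spec.
    static long compute(long classSerialVersionUID,
                        String enclosingMethodName,
                        String enclosingMethodDescriptor,
                        int lambdaDeclarationIndex,
                        List<String> capturedVariableNames) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(classSerialVersionUID);
            out.writeUTF(enclosingMethodName);
            out.writeUTF(enclosingMethodDescriptor);
            out.writeInt(lambdaDeclarationIndex);
            // Count first, then names in declaration order, so order matters.
            out.writeInt(capturedVariableNames.size());
            for (String name : capturedVariableNames) {
                out.writeUTF(name);
            }
            out.flush();
            byte[] sha = MessageDigest.getInstance("SHA-1").digest(bytes.toByteArray());
            // Fold the first 8 digest bytes into a long (first byte lowest),
            // mirroring the default serialVersionUID computation.
            long h = 0;
            for (int i = 7; i >= 0; i--) {
                h = (h << 8) | (sha[i] & 0xff);
            }
            return h;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }
}
```

Reordering the captured variables or changing the lambda's declaration index changes the result, which is the sensitivity the scheme is after; renaming a captured variable does too, which may or may not be desirable.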
>
> For runtime it seems to me that this would largely consist of bundling the hash
> with the method handle information which can be passed to its serialized
> representation.  The deserialization of the lambda could then hopefully just
> verify the hash against the local method handle and throw an exception if it has
> changed.
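On the runtime side, the hash could travel with the implementation-method information in the serialized form and be checked in readResolve. In this sketch, FingerprintedLambdaForm, its fields and the LOCAL_FINGERPRINTS map are hypothetical. The map stands in for whatever metadata the compiler would actually emit, and a real implementation would rebuild the functional-interface instance (as java.lang.invoke.SerializedLambda does) rather than return the form itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FingerprintedLambdaForm implements Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical stand-in for compiler-emitted metadata: maps a lambda's
    // implementation method to the fingerprint of the locally compiled code.
    static final Map<String, Long> LOCAL_FINGERPRINTS = new ConcurrentHashMap<>();

    private final String implClass;
    private final String implMethodName;
    private final String implMethodSignature;
    private final long fingerprint; // computed at compile time by the writer

    FingerprintedLambdaForm(String implClass, String implMethodName,
                            String implMethodSignature, long fingerprint) {
        this.implClass = implClass;
        this.implMethodName = implMethodName;
        this.implMethodSignature = implMethodSignature;
        this.fingerprint = fingerprint;
    }

    static String key(String implClass, String implMethodName, String implMethodSignature) {
        return implClass + "#" + implMethodName + implMethodSignature;
    }

    // Invoked by ObjectInputStream after the fields are read: reject the
    // stream if the local lambda is missing or its fingerprint differs.
    private Object readResolve() throws ObjectStreamException {
        Long local = LOCAL_FINGERPRINTS.get(key(implClass, implMethodName, implMethodSignature));
        if (local == null || local.longValue() != fingerprint) {
            throw new InvalidObjectException("lambda " + implClass + "." + implMethodName
                    + implMethodSignature + " is missing or has changed");
        }
        return this; // a real implementation would rebuild the lambda here
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

Because InvalidObjectException is an IOException, the failure surfaces directly from readObject, the same way a serialVersionUID mismatch does today.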