Stability of lambda serialization
Doug Lea
dl at cs.oswego.edu
Tue Aug 6 12:35:21 PDT 2013
On 08/06/13 13:04, David M. Lloyd wrote:
> On 08/06/2013 11:43 AM, David M. Lloyd wrote:
>> On 08/06/2013 11:14 AM, David M. Lloyd wrote:
>>> On 08/06/2013 09:36 AM, Doug Lea wrote:
>>>> On 08/06/13 09:17, David M. Lloyd wrote:
>>>>
>>>>> Me: Reordering captured variables, reordering lambda incidence. The EG's
>>>>> stance is just a generalization. It's not a stance in any case: things
>>>>> which destabilize lambdas in terms of serializability are not a question
>>>>> of opinion, and it's bizarre to frame it that way.
>>>>>
>>>>>> What to do? EG: make a best effort, with documented caveats; you:
>>>>>> conservatively prohibit serialization of capturing lambdas; third
>>>>>> alternative: conservatively detect problems and break at
>>>>>> deserialization
>>>>>
>>>>> I'm OK with either "you" or "third alternative".
>>>>
>>>> I'm OK with 3rd alternative if some reasonably efficient
>>>> checksum/serial id ensuring breakage could be devised.
>>>> David, any ideas?
>>>
>>> It's a good idea. I can think of a few requirements offhand:
>>>
>>> * Generation of the hash would necessarily occur at compile time
>>> * The hash would have to be unique for each lambda within a class and/or
>>> compilation unit
>>> * The hash would have to be sensitive to any changes which would cause
>>> any indeterminism in how the lambda is resolved - this may extend to
>>> hashing even the bytecode of methods which include the lambda. This is
>>> the key/most complex concept to tackle to make this solution work.
>>> * The last step is to tag the UID value on to the serialized lambda
>>> representation.
>>>
>>> I don't think there is much more to it than this; the hardest part is
>>> determining what/how to hash. If it happens at compile time then
>>> resolution at run time (i.e. the more performance-sensitive context)
>>> should be the same kind of numerical comparison which is already done
>>> for serialVersionUID.
>>
>> Brian pointed out a couple things:
>>
>> * Such a scheme would have to be very strongly and clearly specified
>> * The scheme cannot depend on any particular non-spec compiler behavior
>> (i.e. the same source file should create the same hashes regardless of
>> compiler version or vendor)
>>
>> I suggested as a possible starting point a scheme which could create a
>> 64-bit hash based on a combination of:
How about just a hash of its actual string representation,
plus its context (enclosing method etc.)? A little crazy, but among the
few simple and feasible ones I can think of. It means you blow up
if you add a space. Fine: if you are going to draw the line somewhere,
it might as well be here.

Although at this point you wonder, why bother serializing?
Just pass the string and invoke a compiler to parse... Probably
not a lot slower.
-Doug
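
A minimal sketch of the string-hash idea above, assuming the compiler can hand over the lambda's verbatim source text together with the names of its enclosing class and method. The class name LambdaTextHash, the parameter names, and the choice of a truncated SHA-256 are all illustrative assumptions; nothing in this thread specifies them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: not part of any JDK or compiler API.
final class LambdaTextHash {
    private LambdaTextHash() {}

    /** Hashes the enclosing class, enclosing method, and verbatim lambda source into 64 bits. */
    static long hash(String enclosingClass, String enclosingMethod, String lambdaSource) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String part : new String[] {enclosingClass, enclosingMethod, lambdaSource}) {
                byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
                // A length prefix keeps ("ab", "c") distinct from ("a", "bc").
                md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                md.update(bytes);
            }
            // Keep the first 8 bytes of the digest as the 64-bit id.
            return ByteBuffer.wrap(md.digest()).getLong();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the source text is hashed verbatim, "x -> x + 1" and "x ->  x + 1" get different ids, which is exactly the "blow up if you add a space" behavior described above.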
>>
>> * Any captured variables' name and declaration order
>> * The declaration order of the lambda
>> * Information about the enclosing method: name and signature, maybe decl
>> order? (though it should be redundant wrt. the lambda decl order)
>> * The usual serialVersionUID calculation
>>
>> I would really appreciate anyone's thoughts as to the efficacy of this
>> approach and any potential weaknesses; in particular I'd like to hear if
>> anyone thinks this is a non-trivial change in terms of compilation and
>> runtime.
>>
>> In particular, it is not 100% clear how the calculation would work with
>> nested lambdas or lambdas nested in inner classes for example.
>
> For runtime it seems to me that this would largely consist of bundling the hash
> with the method handle information which can be passed to its serialized
> representation. The deserialization of the lambda could then hopefully just
> verify the hash against the local method handle and throw an exception if it has
> changed.
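
The runtime check described above might look roughly like the following sketch. It assumes the serialized form carries a compile-time hash alongside the implementation method name, and that the deserializing side can obtain the hash recorded for its own local copy of the lambda. SerializedForm, LambdaHashCheck, and verify are hypothetical names, not the real java.lang.invoke.SerializedLambda API.

```java
import java.io.InvalidObjectException;

// Hypothetical sketch: none of these types exist in the JDK.
final class LambdaHashCheck {
    private LambdaHashCheck() {}

    /** Assumed serialized form: the lambda's identity plus its compile-time hash. */
    static final class SerializedForm {
        final String implMethodName;
        final long hash;

        SerializedForm(String implMethodName, long hash) {
            this.implMethodName = implMethodName;
            this.hash = hash;
        }
    }

    /** Fails deserialization when the stream's hash differs from the local one. */
    static void verify(SerializedForm form, long localHash) throws InvalidObjectException {
        // A plain numeric comparison, in the same spirit as the serialVersionUID check.
        if (form.hash != localHash) {
            throw new InvalidObjectException(
                "Lambda " + form.implMethodName + " has changed since it was serialized");
        }
    }
}
```

The comparison itself is cheap; the cost and the difficulty both sit at compile time, in deciding what goes into the hash.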