Stability of lambda serialization

Tue Aug 6 09:43:40 PDT 2013

On 08/06/2013 11:14 AM, David M. Lloyd wrote:
> On 08/06/2013 09:36 AM, Doug Lea wrote:
>> On 08/06/13 09:17, David M. Lloyd wrote:
>>
>>> Me: Reordering captured variables, reordering lambda incidence.  The EG's
>>> stance is just a generalization.  It's not a stance in any case: things
>>> which destabilize lambdas in terms of serializability are not a question
>>> of opinion, and it's bizarre to frame it that way.
>>>
>>>> What to do? EG: make a best effort, with documented caveats; you:
>>>> conservatively prohibit serialization of capturing lambdas; third
>>>> alternative: conservatively detect problems and break at deserialization
>>>
>>> I'm OK with either "you" or "third alternative".
>>
>> I'm OK with 3rd alternative if some reasonably efficient
>> checksum/serial id ensuring breakage could be devised.
>> David, any ideas?
>
> It's a good idea.  I can think of a few requirements offhand:
>
> * Generation of the hash would necessarily occur at compile time
> * The hash would have to be unique for each lambda within a class and/or
> compilation unit
> * The hash would have to be sensitive to any changes which would cause
> any indeterminism in how the lambda is resolved - this may extend to
> hashing even the bytecode of methods which include the lambda.  This is
> the key/most complex concept to tackle to make this solution work.
> * The last step is to tag the UID value onto the serialized lambda
> representation.
>
> I don't think there is much more to it than this; the hardest part is
> determining what/how to hash.  If it happens at compile time then
> resolution at run time (i.e. the more performance-sensitive context)
> should be the same kind of numerical comparison which is already done
> for serialVersionUID.

Brian pointed out a couple of things:

* Such a scheme would have to be very strongly and clearly specified
* The scheme cannot depend on any particular non-spec compiler behavior 
(i.e. the same source file should create the same hashes regardless of 
compiler version or vendor)

I suggested as a possible starting point a scheme which could create a 
64-bit hash based on a combination of:

* The names and declaration order of any captured variables
* The declaration order of the lambda
* Information about the enclosing method: name and signature, and maybe its
declaration order (though that should be redundant with respect to the
lambda declaration order)
* The usual serialVersionUID calculation
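
To make this a bit more concrete, here is a rough sketch of how such a
64-bit value might be computed. It is hypothetical: the class and method
names are mine and are not part of any JDK or javac API, and how the
compiler would number lambdas or encode the enclosing method is an
assumption (that is exactly the part that would need to be specified). The
inputs are canonicalized in a fixed order, run through SHA-1, and the first
eight bytes are folded into a long the same way the default
serialVersionUID is derived.

```java
import java.io.InvalidObjectException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical sketch of a per-lambda serial hash; not a real JDK/javac API. */
final class LambdaHashSketch {

    private LambdaHashSketch() {}

    /**
     * @param classSerialVersionUID usual serialVersionUID of the capturing class
     * @param enclosingMethodName   name of the method containing the lambda
     * @param enclosingMethodDesc   JVM descriptor of that method, e.g. "()V"
     * @param lambdaIndex           assumed 0-based declaration order of the lambda
     * @param capturedNames         captured variable names, in declaration order
     */
    public static long lambdaHash(long classSerialVersionUID,
                                  String enclosingMethodName,
                                  String enclosingMethodDesc,
                                  int lambdaIndex,
                                  List<String> capturedNames) {
        // Canonical form: each field terminated by '\0', which cannot occur
        // in Java identifiers or method descriptors.
        StringBuilder canon = new StringBuilder();
        canon.append(classSerialVersionUID).append('\0')
             .append(enclosingMethodName).append('\0')
             .append(enclosingMethodDesc).append('\0')
             .append(lambdaIndex).append('\0');
        // Order matters: reordering captured variables changes the hash.
        for (String name : capturedNames) {
            canon.append(name).append('\0');
        }
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha.digest(canon.toString().getBytes(StandardCharsets.UTF_8));
            // Fold the first 8 bytes into a long, least significant byte first,
            // as the default serialVersionUID computation does.
            long hash = 0;
            for (int i = Math.min(digest.length, 8) - 1; i >= 0; i--) {
                hash = (hash << 8) | (digest[i] & 0xFF);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }

    /** Deserialization-time check: a plain numeric comparison, as with serialVersionUID. */
    public static void checkOnDeserialize(long storedHash, long currentHash)
            throws InvalidObjectException {
        if (storedHash != currentHash) {
            throw new InvalidObjectException("lambda changed since serialization: stored="
                    + storedHash + ", current=" + currentHash);
        }
    }
}
```

At run time the only cost would be comparing the stored value against the
one the current class yields, the same cheap check serialVersionUID
already gets; all of the hashing work stays at compile time.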

I would really appreciate anyone's thoughts on the efficacy of this
approach and any potential weaknesses; in particular, I'd like to hear
whether anyone thinks this is a non-trivial change in terms of compilation
and runtime.

It is also not yet clear how the calculation would work with nested
lambdas, or with lambdas nested in inner classes, for example.

>
>>
>> -Doug
>>
>>
>>
>>>
>>>> If we have zero tolerance for destabilization of the serialized form,
>>>> then we should either prohibit serialization of _all_ lambdas, or
>>>> somehow encode the method contents (a hash?) and detect changes at
>>>> deserialization time.  Prohibiting just capturing lambdas is a half
>>>> measure.
>>>
>>> I always advocate security over tolerance - to do otherwise invites CVEs
>>> (a possibly familiar story).  Capturing lambdas has been the focus of a
>>> couple of recent emails of mine, but indeed I do not think we should have
>>> any tolerance for any destabilizing of serialized lambdas.
>>>
>>> I think that calculating a non-changeable serialVersionUID might be a good
>>> way forward, if we can work out what must enter into the calculation
>>> (perhaps it's simply the entire compilation unit which includes the lambda).
>>>
>>>> The EG agreed instead to be tolerant, acknowledging that in the
>>>> presence of a destabilizing change, sometimes everything works
>>>> perfectly well, and occasionally things will break.
>>>
>>> I'd say breakage is the least concern.  But in any case I do not agree,
>>> and unless we can come to some solution, we will codify that disagreement
>>> in a No vote, since that's what it's for, after all.
>>

-- 
- DML