Stability of lambda serialization

David M. Lloyd david.lloyd at redhat.com
Tue Aug 6 13:06:16 PDT 2013


On 08/06/2013 02:35 PM, Doug Lea wrote:
> On 08/06/13 13:04, David M. Lloyd wrote:
>> On 08/06/2013 11:43 AM, David M. Lloyd wrote:
>>> On 08/06/2013 11:14 AM, David M. Lloyd wrote:
>>>> On 08/06/2013 09:36 AM, Doug Lea wrote:
>>>>> On 08/06/13 09:17, David M. Lloyd wrote:
>>>>>
>>>>>> Me: Reordering captured variables, reordering lambda incidence.  The
>>>>>> EG's stance
>>>>>> is just a generalization.  It's not a stance in any case: things
>>>>>> which
>>>>>> destabilize lambdas in terms of serializability are not a question of
>>>>>> opinion,
>>>>>> and it's bizarre to frame it that way.
>>>>>>
>>>>>>> What to do? EG: make a best effort, with documented caveats; you:
>>>>>>> conservatively prohibit serialization of capturing lambdas; third
>>>>>>> alternative: conservatively detect problems and break at
>>>>>>> deserialization
>>>>>>
>>>>>> I'm OK with either "you" or "third alternative".
>>>>>
>>>>> I'm OK with 3rd alternative if some reasonably efficient
>>>>> checksum/serial id ensuring breakage could be devised.
>>>>> David, any ideas?
>>>>
>>>> It's a good idea.  I can think of a few requirements offhand:
>>>>
>>>> * Generation of the hash would necessarily occur at compile time
>>>> * The hash would have to be unique for each lambda within a class
>>>> and/or
>>>> compilation unit
>>>> * The hash would have to be sensitive to any changes which would cause
>>>> any indeterminism in how the lambda is resolved - this may extend to
>>>> hashing even the bytecode of methods which include the lambda.  This is
>>>> the key/most complex concept to tackle to make this solution work.
>>>> * The last step is to tag the UID value on to the serialized lambda
>>>> representation.
>>>>
>>>> I don't think there is much more to it than this; the hardest part is
>>>> determining what/how to hash.  If it happens at compile time then
>>>> resolution at run time (i.e. the more performance-sensitive context)
>>>> should be the same kind of numerical comparison which is already done
>>>> for serialVersionUID.
>>>
>>> Brian pointed out a couple things:
>>>
>>> * Such a scheme would have to be very strongly and clearly specified
>>> * The scheme cannot depend on any particular non-spec compiler behavior
>>> (i.e. the same source file should create the same hashes regardless of
>>> compiler version or vendor)
>>>
>>> I suggested as a possible starting point a scheme which could create a
>>> 64-bit hash based on a combination of:
>
> How about just a hash of its actual string representation,
> plus its context (enclosing method etc). A little crazy but among the
> few simple and feasible ones I can think of. It means you blow up
> if you add a space. Fine: If you are going to draw the line somewhere,
> it might as well be here.

That actually seems like a pretty reasonable and simple approach (though
I'd say "original byte representation", given that transcoding might be
lossy and things might get weird as a result).  To the context, add the
declaration-order sequence number and the captured variable names, in
order.
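
To make that concrete, here is a rough sketch of what such a hash might
look like.  Everything in it is an assumption for illustration only: the
class and method names, the use of SHA-256 truncated to 64 bits, and the
exact encoding of the context are all made up, not anything javac does
or anything that has been specified.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public final class LambdaUidSketch {
    /**
     * Hypothetical compile-time hash of one lambda.
     *
     * @param sourceBytes      original bytes of the lambda expression, untranscoded
     * @param enclosingMethod  name/descriptor of the enclosing method (assumed format)
     * @param declarationIndex position of the lambda in declaration order
     * @param capturedNames    names of captured variables, in capture order
     */
    public static long lambdaUid(byte[] sourceBytes, String enclosingMethod,
                                 int declarationIndex, List<String> capturedNames) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            // Length-prefix every field so adjacent fields cannot run together.
            out.writeInt(sourceBytes.length);
            out.write(sourceBytes);
            out.writeUTF(enclosingMethod);
            out.writeInt(declarationIndex);
            out.writeInt(capturedNames.size());
            for (String name : capturedNames) {
                out.writeUTF(name);
            }
            out.flush();
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(buf.toByteArray());
            // Keep the first 8 bytes of the digest as the 64-bit UID.
            return ByteBuffer.wrap(digest).getLong();
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this, adding a space to the lambda body, reordering the captured
variables, or moving the lambda ahead of another one in the same method
would each be expected to produce a different value.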

> Although at this point you wonder, why bother serializing.
> Just pass the string and invoke a compiler to parse... Probably
> not a lot slower.

Well, bear in mind that we're talking about a simple numerical
comparison: all the hashing would (should) be done at compile time, not
at run time.  Compiling on deserialize (a nifty/intriguing idea, all
practical concerns aside) would definitely be much slower.
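
For comparison, the run-time side could be about as small as the sketch
below.  The names are hypothetical, and throwing InvalidObjectException
on a mismatch is my assumption about how the failure would surface;
nothing here reflects an agreed design.

```java
import java.io.InvalidObjectException;

public final class LambdaUidCheck {
    /**
     * Hypothetical deserialization-time check: compare the UID carried in
     * the stream with the UID the compiler recorded in the local class.
     */
    public static void checkUid(long serializedUid, long localUid)
            throws InvalidObjectException {
        if (serializedUid != localUid) {
            throw new InvalidObjectException(
                "lambda UID mismatch: stream=" + Long.toHexString(serializedUid)
                    + ", local=" + Long.toHexString(localUid));
        }
    }
}
```

That is a single long comparison per deserialized lambda, in the same
spirit as the serialVersionUID check.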

-- 
- DML


More information about the lambda-spec-observers mailing list