Serializable lambdas -- where we are, how we got here

Fri Aug 16 13:56:16 PDT 2013

On 08/16/2013 07:47 PM, Brian Goetz wrote:
> Several concerns have recently been (re)raised about the 
> stability of serializable lambdas.  This attempts to provide an 
> inventory of where we are and how we got here.
>
> There were some who initially (wishfully) suggested that it would be 
> best to declare serialization a mistake and not make lambdas 
> serializable at all.  While this was a very tempting target, 
> ultimately this conflicted with another decision we made: that of 
> using nominal function types (functional interfaces) to type lambdas.
>
> For example, imagine:
>
>   interface SerializablePredicate<T>
>       extends Predicate<T>, Serializable { }
>
> If the user does:
>
>   SerializablePredicate<String> p = s -> false;
> or
>   SerializablePredicate<String> p = String::isEmpty;
>
> It would violate the principle of least surprise for the resulting 
> objects (whose lambda heritage should be invisible to anyone who later 
> touches them) not to be serializable.  Hence began our slide down the 
> slippery slope.
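
A minimal sketch of the expectation described above, assuming a Java 8+
compiler.  SerializablePredicate is the interface from the message; the
class SerializableLambdaDemo and its helpers roundTrip and emptyCheck are
hypothetical names used only for this illustration.  It round-trips a
method reference through ordinary Java serialization in the same JVM:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Predicate;

// Same shape as the interface in the message above.
interface SerializablePredicate<T> extends Predicate<T>, Serializable { }

class SerializableLambdaDemo {
    // Hypothetical helper: the target type makes the method ref serializable.
    static SerializablePredicate<String> emptyCheck() {
        return String::isEmpty;
    }

    // Hypothetical helper: write the object to a byte array and read it back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializablePredicate<String> copy = roundTrip(emptyCheck());
        System.out.println(copy.test(""));   // prints true
        System.out.println(copy.test("x"));  // prints false
    }
}
```

Nothing in the calling code reveals that the object came from a method
reference, which is why a failure to serialize here would be surprising.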
>
> An intrinsic challenge of serialization is that, when confronted with 
> different class files at deserialization time than were present at 
> serialization time, it must make a good-faith effort to figure out what to 
> do.  For classes, the default behavior (in the absence of an explicit 
> serial version UID) is to consider any change to the class signatures 
> to invalidate existing serialized forms, but in the presence of a 
> serial version UID, to attempt to deal gracefully with added or 
> removed fields.  Inherent in this is the assumption that if the *name* 
> and *signature* of something haven't changed, its semantics haven't, 
> either.  If you change the meaning of a field or a method, but not its 
> name, you're out of luck.
>
> Anonymous classes are less forgiving than nominal classes, because (a) 
> their names are generated at compile time and may change if the source 
> changes "too much", and (b) their field names / constructor signature 
> may change based on changes in method bodies even if the class and 
> method signatures don't change.  This problem has been with us since 
> 1997.  There are two possible failure modes that come out of this:
>  Type 1) An instance may fail to deserialize, due to changes that have 
> nothing to do with the object being serialized;
>  Type 2) An instance may deserialize successfully, but may be bound to 
> the *wrong* implementation due to bad luck.
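
A small sketch of points (a) and (b), assuming a Java 8+ compiler.  The
class AnonymousNamingDemo and the method greeting are hypothetical names
for illustration.  It prints the compiler-chosen binary name of an
anonymous class and the synthetic field added for a captured local; both
are artifacts of compilation rather than anything written in the source:

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.function.Supplier;

class AnonymousNamingDemo {
    interface SerializableSupplier<T> extends Supplier<T>, Serializable { }

    static SerializableSupplier<String> greeting(String name) {
        // Anonymous class: the compiler picks its binary name and adds a
        // synthetic field to hold the captured local 'name'.
        return new SerializableSupplier<String>() {
            @Override public String get() { return "hello " + name; }
        };
    }

    public static void main(String[] args) {
        Class<?> c = greeting("world").getClass();
        // Binary name is the enclosing class name, '$', then digits.
        System.out.println(c.getName());
        for (Field f : c.getDeclaredFields()) {
            System.out.println(f.getName() + " synthetic=" + f.isSynthetic());
        }
    }
}
```

Adding another anonymous class earlier in the same source file, or
capturing a different local in get(), can change what this prints even
though no declared signature changed.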
>
> Still, many users successfully deal with serialization and anonymous 
> classes by following a simple rule: have the same bits on both sides 
> of the wire.  In reality, the situation is more forgiving than that: 
> if you recompile the same source with the same compiler, things still 
> work -- and users fundamentally expect this to be the case.  And the 
> same is true for "lightly modified" versions of the same sources 
> (adding comments, adding debugging statements, etc.)
>
> Lambdas are similar to anonymous classes in some ways, and we were 
> aware of these failure modes at the time we first discussed 
> serialization of lambdas.  Obviously we would have preferred to 
> prevent these failures if possible, but all the approaches explored 
> were either too restrictive or incomplete.  Restrictions that were 
> explored and rejected include:
>  - No serializable lambdas at all
>  - Only serialize static or unbound method refs
>  - Only serialize named, non-capturing lambdas
>
> The various hash-the-world options that have been suggested (hash the 
> source or bytecode) are too weird, too brittle, too hard to specify, 
> and will result in users being confounded by, say, recompiling what 
> they perceive as identical sources with an identical compiler and 
> still getting runtime failures, violating (reasonable) user 
> expectations.  (It would be almost better to generate a *random* name 
> on every compilation, but we're not going to do that.)
>
> In the absence of being able to make it perfect, having exactly the 
> same drawbacks of an existing mechanism, which users are familiar with 
> and have learned to work around, was deemed better than making it 
> imperfect in yet a new way.
>
> That said, if there's a possibility to reduce type-2 failures without 
> undermining the usability of serialization or the simplicity of the 
> user model, we're willing to continue to explore these (despite the 
> extreme lateness of the hour).
>
> At the recent EG meeting, we specifically discussed whether it would 
> be worthwhile to try and address recovering from capture-order issues. 
> This *is* tractable (subject to the same caveats with nominal classes 
> -- that same-name means same-meaning).  But, the sense of the room 
> then was that this doesn't help enough, because there is still the 
> name-induced stability issue, and that fixing one without the other 
> just encourages users to think that they can make arbitrary code 
> changes and expect serialization stability, and makes it even more 
> surprising when we get a failure due to, say, adding a new lambda to a 
> method.  However, if we felt we were likely to do named lambdas later, 
> then this approach could close half the problem now and we could close 
> the other half of the problem later.
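
A rough sketch of what the two halves look like in the serialized form,
assuming an OpenJDK-style runtime in which a serializable lambda object
has a private writeReplace method returning a
java.lang.invoke.SerializedLambda (an implementation detail, not a
documented contract).  The names CaptureOrderDemo, join and
serializedForm are hypothetical.  The serialized form records the
implementation method by name and the captured values by position:

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Supplier;

class CaptureOrderDemo {
    interface SerializableSupplier<T> extends Supplier<T>, Serializable { }

    // A lambda that captures two locals.
    static SerializableSupplier<String> join(String first, int second) {
        return () -> first + second;
    }

    // Assumption: the runtime gives the lambda class a private writeReplace
    // method that yields its SerializedLambda.
    static SerializedLambda serializedForm(Serializable lambda) {
        try {
            Method m = lambda.getClass().getDeclaredMethod("writeReplace");
            m.setAccessible(true);
            return (SerializedLambda) m.invoke(lambda);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializedLambda sl = serializedForm(join("a", 1));
        // Compiler-generated name of the method holding the lambda body.
        System.out.println(sl.getImplMethodName());
        // Captured values are stored positionally, with no names attached.
        for (int i = 0; i < sl.getCapturedArgCount(); i++) {
            System.out.println(i + ": " + sl.getCapturedArg(i));
        }
    }
}
```

The generated method name corresponds to the name-induced half of the
problem, and the positional captured arguments to the capture-order half.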
>
> One possibility that has not yet been discussed is to issue a lint 
> warning for serializable lambdas/method refs that are subject to 
> stability issues.
>
> Here's where we are:
>  - We're not revisiting the decisions about what lambdas and method 
> references should be serializable.  This has been reopened several 
> times with no change in consensus, and no new information has come to 
> light that would change the decision.
>  - "Just like inner classes" is a local maximum.  Better to not ask the 
> user to create a new mental model than to require a new one that is 
> just as flawed but in different ways.  However, we already make some 
> departures from inner class treatment, so this is a more "spirit of 
> the rule" thing than a "letter of the rule."  If we can do *much* 
> better, great, but "slightly better but different" is worse.
>  - We might be able to revisit some translation decisions if they 
> result in significant improvements to stability without cost to 
> usability, but we are almost, if not completely, out of time.
>  - We're open to adding more lint warnings at compile time.
>
>
> Stay tuned for a specific proposal.

So you want a lint warning saying serialization sucks :)
If you want a warning when a lambda/method ref captures local variables, 
it's logical to have the same warning for inner classes, too.
But in that case you will raise warnings in already written and valid code.
Not a good idea, IMO.

Rémi