Serializable lambdas -- where we are, how we got here

Fri Aug 16 10:47:52 PDT 2013

Several concerns have recently been (re)raised about the stability of 
serializable lambdas.  This note attempts to provide an inventory of 
where we are and how we got here.

There were some who initially (wishfully) suggested that it would be 
best to declare serialization a mistake and not make lambdas 
serializable at all.  While this was a very tempting target, ultimately 
this conflicted with another decision we made: that of using nominal 
function types (functional interfaces) to type lambdas.

For example, imagine:

   interface SerializablePredicate<T>
       extends Predicate<T>, Serializable { }

If the user does:

   SerializablePredicate<String> p = s -> false;
or
   SerializablePredicate<String> p = String::isEmpty;

It would violate the principle of least surprise for the resulting 
objects (whose lambda heritage should be invisible to anyone who later 
touches them) not to be serializable.  Hence began our slide down the 
slippery slope.
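
As a minimal sketch of what that commitment means in practice, both 
forms above can be written out and read back.  The class and helper 
names are made up for illustration, and the round trip stays inside a 
single JVM, so it says nothing about stability across recompilation:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

class SerializableLambdaDemo {

    // The interface from the example above.
    interface SerializablePredicate<T>
        extends Predicate<T>, Serializable { }

    // Hypothetical helper: serialize to a byte array, then read it back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // A non-capturing lambda survives the round trip.
    static boolean lambdaSurvives() {
        SerializablePredicate<String> p = s -> false;
        SerializablePredicate<String> q = roundTrip(p);
        return !q.test("anything");
    }

    // So does an unbound method reference.
    static boolean methodRefSurvives() {
        SerializablePredicate<String> p = String::isEmpty;
        SerializablePredicate<String> q = roundTrip(p);
        return q.test("") && !q.test("x");
    }

    public static void main(String[] args) {
        System.out.println(lambdaSurvives());
        System.out.println(methodRefSurvives());
    }
}
```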

An intrinsic challenge of serialization is that, when confronted with 
different class files at deserialization time than were present at 
serialization time, it must make a good-faith effort to figure out what 
to do.  For classes, the default behavior (in the absence of an explicit 
serial version UID) is to consider any change to the class signatures to 
invalidate existing serialized forms, but in the presence of a serial 
version UID, to attempt to deal gracefully with added or removed fields. 
Inherent in this is the assumption that if the *name* and *signature* 
of something haven't changed, its semantics haven't, either.  If you 
change the meaning of a field or a method, but not its name, you're out 
of luck.

Anonymous classes are less forgiving than nominal classes, because (a) 
their names are generated at compile time and may change if the source 
changes "too much", and (b) their field names / constructor signature 
may change based on changes in method bodies even if the class and 
method signatures don't change.  This problem has been with us since 
1997.  There are two possible failure modes that come out of this:
  Type 1) An instance may fail to deserialize, due to changes that have 
nothing to do with the object being serialized;
  Type 2) An instance may deserialize successfully, but may be bound to 
the *wrong* implementation due to bad luck.
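
A small sketch of points (a) and (b), using hypothetical class and 
method names.  The exact numbering of anonymous classes and the names of 
the fields that hold captured values are compiler details; this assumes 
only that each anonymous class gets a name of the form Enclosing$<digits> 
and that a captured value has to be stored in some field:

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class AnonymousNamingDemo {

    interface Task extends Runnable, Serializable { }

    // Two anonymous classes in the same enclosing class.  Their binary
    // names are generated by the compiler, so adding another anonymous
    // class elsewhere in this file may shift which number each one gets.
    static final Task FIRST = new Task() {
        @Override public void run() { }
    };
    static final Task SECOND = new Task() {
        @Override public void run() { }
    };

    static String nameOf(Object o) {
        return o.getClass().getName();
    }

    // The body uses 'captured', so the compiler must add a field for it.
    // Changing what the body captures changes the serialized shape even
    // though no class or method signature in the source changed.
    static List<String> capturedFieldNames(String captured) {
        Task t = new Task() {
            @Override public void run() { System.out.println(captured); }
        };
        List<String> names = new ArrayList<>();
        for (Field f : t.getClass().getDeclaredFields()) {
            names.add(f.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(nameOf(FIRST));
        System.out.println(nameOf(SECOND));
        System.out.println(capturedFieldNames("hello"));
    }
}
```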

Still, many users successfully deal with serialization and anonymous 
classes by following a simple rule: have the same bits on both sides of 
the wire.  In reality, the situation is more forgiving than that: if you 
recompile the same source with the same compiler, things still work -- 
and users fundamentally expect this to be the case.  And the same is 
true for "lightly modified" versions of the same sources (adding 
comments, adding debugging statements, etc.)

Lambdas are similar to anonymous classes in some ways, and we were aware 
of these failure modes at the time we first discussed serialization of 
lambdas.  Obviously we would have preferred to prevent these failures if 
possible, but all the approaches explored were either too restrictive or 
incomplete.  Restrictions that were explored and rejected include:
  - No serializable lambdas at all
  - Only serialize static or unbound method refs
  - Only serialize named, non-capturing lambdas

The various hash-the-world options that have been suggested (hash the 
source or bytecode) are too weird, too brittle, too hard to specify, and 
will result in users being confounded by, say, recompiling what they 
perceive as identical sources with an identical compiler and still 
getting runtime failures, violating (reasonable) user expectations.  (It 
would almost be better to generate a *random* name on every compilation, 
but we're not going to do that.)

In the absence of being able to make it perfect, having exactly the same 
drawbacks as an existing mechanism, which users are familiar with and 
have learned to work around, was deemed better than making it imperfect 
in yet a new way.

That said, if there are ways to reduce type-2 failures without 
undermining the usability of serialization or the simplicity of the user 
model, we're willing to continue exploring them (despite the extreme 
lateness of the hour).

At the recent EG meeting, we specifically discussed whether it would be 
worthwhile to try and address recovering from capture-order issues. 
This *is* tractable (subject to the same caveat as with nominal classes 
-- that same-name means same-meaning).  But the sense of the room then was 
that this doesn't help enough, because there is still the name-induced 
stability issue, and that fixing one without the other just encourages 
users to think that they can make arbitrary code changes and expect 
serialization stability, and makes it even more surprising when we get a 
failure due to, say, adding a new lambda to a method.  However, if we 
felt we were likely to do named lambdas later, then this approach could 
close half the problem now and we could close the other half of the 
problem later.
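
To make the capture-order concern concrete, here is a sketch that peeks 
at the serialized form of a capturing lambda.  It assumes the 
java.lang.invoke.SerializedLambda representation used by Java 8; the 
class, interface, and method names are hypothetical.  The captured 
values appear as a positional list, with no names attached, and the 
order in which a given compiler emits them is not something this sketch 
relies on:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.lang.invoke.SerializedLambda;
import java.util.function.Supplier;

class CapturedArgsDemo {

    interface SerializableSupplier<T> extends Supplier<T>, Serializable { }

    // Hypothetical helper: serialize the lambda into a throwaway buffer
    // and remember the SerializedLambda that the stream is asked to write.
    static SerializedLambda serializedFormOf(Object lambda) {
        SerializedLambda[] seen = new SerializedLambda[1];
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream()) {
                     { enableReplaceObject(true); }
                     @Override protected Object replaceObject(Object obj) {
                         if (obj instanceof SerializedLambda) {
                             seen[0] = (SerializedLambda) obj;
                         }
                         return obj;   // never actually replace anything
                     }
                 }) {
            out.writeObject(lambda);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen[0];
    }

    // A lambda that captures two values of the same type.
    static SerializableSupplier<String> greeting(String first, String second) {
        return () -> first + " " + second;
    }

    public static void main(String[] args) {
        SerializedLambda form = serializedFormOf(greeting("hello", "world"));
        System.out.println(form.getImplMethodName());
        for (int i = 0; i < form.getCapturedArgCount(); i++) {
            System.out.println(i + ": " + form.getCapturedArg(i));
        }
    }
}
```

Because both captured values here are Strings, swapping their positions 
would not change any signature, which is the kind of silent type-2 
mismatch described earlier.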

One possibility that has not yet been discussed is to issue a lint 
warning for serializable lambdas/method refs that are subject to 
stability issues.

Here's where we are:
  - We're not revisiting the decisions about what lambdas and method 
references should be serializable.  This has been reopened several times 
with no change in consensus, and no new information has come to light 
that would change the decision.
  - "Just like inner classes" is a local maximum.  Better to not ask the 
user to create a new mental model than to require a new one that is just 
as flawed but in different ways.  However, we already make some 
departures from inner class treatment, so this is a more "spirit of the 
rule" thing than a "letter of the rule."  If we can do *much* better, 
great, but "slightly better but different" is worse.
  - We might be able to revisit some translation decisions if they 
result in significant improvements to stability without cost to 
usability, but we are almost, if not completely, out of time.
  - We're open to adding more lint warnings at compile time.

Stay tuned for a specific proposal.