Serializable lambdas -- where we are, how we got here
Brian Goetz
brian.goetz at oracle.com
Fri Aug 16 10:47:52 PDT 2013
Several concerns have recently been (re)raised about the stability
of serializable lambdas. This note attempts to provide an inventory of
where we are and how we got here.
There were some who initially (wishfully) suggested that it would be
best to declare serialization a mistake and not make lambdas
serializable at all. While this was a very tempting target, ultimately
this conflicted with another decision we made: that of using nominal
function types (functional interfaces) to type lambdas.
For example, imagine:
    interface SerializablePredicate<T>
        extends Predicate<T>, Serializable { }
If the user does:
    SerializablePredicate<String> p = s -> false;
or
    SerializablePredicate<String> p = String::isEmpty;
It would violate the principle of least surprise for the resulting
objects (whose lambda heritage should be invisible to anyone who later
touches them) not to be serializable. Hence began our slide down the
slippery slope.
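To make the expectation concrete, here is a minimal sketch of the round trip a user would assume just works. The class name SerializablePredicateDemo and the roundTrip helper are hypothetical, introduced only for illustration; the interface is the one from the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Predicate;

public class SerializablePredicateDemo {

    // The interface from the example above.
    public interface SerializablePredicate<T>
        extends Predicate<T>, Serializable { }

    // Hypothetical helper: write the object to a byte array and read it back.
    @SuppressWarnings("unchecked")
    public static <T> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // A lambda and a method reference, both typed by the serializable interface.
    public static SerializablePredicate<String> alwaysFalse() { return s -> false; }
    public static SerializablePredicate<String> isEmpty() { return String::isEmpty; }

    public static void main(String[] args) {
        SerializablePredicate<String> p = roundTrip(alwaysFalse());
        SerializablePredicate<String> q = roundTrip(isEmpty());
        System.out.println(p.test("x"));  // false
        System.out.println(q.test(""));   // true
    }
}
```

Note that this only exercises the easy case: the same class files are present on both sides of the round trip.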
An intrinsic challenge of serialization is that, when confronted with
different class files at deserialization time than were present at
serialization time, it must make a good-faith effort to figure out what
to do. For classes, the default behavior (in the absence of an explicit
serial version UID) is to treat any change to the class signatures as
invalidating existing serialized forms, but in the presence of a serial
version UID, to attempt to deal gracefully with added or removed fields.
Inherent in this is the assumption that if the *name* and *signature*
of something haven't changed, its semantics haven't, either. If you
change the meaning of a field or a method, but not its name, you're out
of luck.
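As a small illustration of the two regimes for nominal classes, the sketch below compares a class that pins its serial version UID with one that leaves it to be computed. The class names (SerialVersionDemo, Pinned, Derived) and the uidOf helper are hypothetical; ObjectStreamClass.lookup and getSerialVersionUID are the standard java.io calls.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionDemo {

    // Explicit UID: the author promises that later versions stay compatible,
    // so serialization tolerates added or removed fields.
    public static class Pinned implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
    }

    // No explicit UID: one is computed from the class's name, members and
    // signatures, so changing those yields a different value.
    public static class Derived implements Serializable {
        String name;
    }

    // Hypothetical helper: the UID the serialization runtime uses for a class.
    public static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Pinned.class));   // the declared value
        System.out.println(uidOf(Derived.class));  // a computed hash
    }
}
```

In neither case does the mechanism look at method bodies, which is why a change in meaning without a change in name or signature goes undetected.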
Anonymous classes are less forgiving than nominal classes, because (a)
their names are generated at compile time and may change if the source
changes "too much", and (b) their field names / constructor signature
may change based on changes in method bodies even if the class and
method signatures don't change. This problem has been with us since
1997. There are two possible failure modes that come out of this:
Type 1) An instance may fail to deserialize, due to changes that have
nothing to do with the object being serialized;
Type 2) An instance may deserialize successfully, but may be bound to
the *wrong* implementation due to bad luck.
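Both failure modes follow from how anonymous classes are compiled, which can be observed with a little reflection. The sketch below is illustrative only: the class and method names are hypothetical, and the exact synthetic names printed are a compiler implementation detail rather than anything to rely on.

```java
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.function.Predicate;

public class AnonymousNamingDemo {

    public interface SerializablePredicate<T>
        extends Predicate<T>, Serializable { }

    // First anonymous class in the source; its body captures nothing.
    public static SerializablePredicate<String> isEmpty() {
        return new SerializablePredicate<String>() {
            @Override public boolean test(String s) { return s.isEmpty(); }
        };
    }

    // Second anonymous class; its body uses 'prefix', so the compiler has to
    // store that value in a generated field of the class.
    public static SerializablePredicate<String> startsWith(String prefix) {
        return new SerializablePredicate<String>() {
            @Override public boolean test(String s) { return s.startsWith(prefix); }
        };
    }

    public static void main(String[] args) {
        for (Object o : new Object[] { isEmpty(), startsWith("a") }) {
            // The binary name is the enclosing class name plus a number.
            System.out.println(o.getClass().getName());
            for (Field f : o.getClass().getDeclaredFields()) {
                System.out.println("  field: " + f.getName());
            }
        }
    }
}
```

Inserting another anonymous class above isEmpty() would typically shift the numbering of the ones below it, and editing a method body to capture a different variable changes the generated fields, even though no declared signature changed.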
Still, many users successfully deal with serialization and anonymous
classes by following a simple rule: have the same bits on both sides of
the wire. In reality, the situation is more forgiving than that: if you
recompile the same source with the same compiler, things still work --
and users fundamentally expect this to be the case. And the same is
true for "lightly modified" versions of the same sources (adding
comments, adding debugging statements, etc.)
Lambdas are similar to anonymous classes in some ways, and we were aware
of these failure modes at the time we first discussed serialization of
lambdas. Obviously we would have preferred to prevent these failures if
possible, but all the approaches explored were either too restrictive or
incomplete. Restrictions that were explored and rejected include:
- No serializable lambdas at all
- Only serialize static or unbound method refs
- Only serialize named, non-capturing lambdas
The various hash-the-world options that have been suggested (hash the
source or bytecode) are too weird, too brittle, too hard to specify, and
will result in users being confounded by, say, recompiling what they
perceive as identical sources with an identical compiler and still
getting runtime failures, violating (reasonable) user expectations. (It
would be almost better to generate a *random* name on every compilation,
but we're not going to do that.)
In the absence of being able to make it perfect, having exactly the same
drawbacks of an existing mechanism, which users are familiar with and
have learned to work around, was deemed better than making it imperfect
in yet a new way.
That said, if there are ways to reduce type-2 failures without
undermining the usability of serialization or the simplicity of the user
model, we're willing to continue to explore them (despite the extreme
lateness of the hour).
At the recent EG meeting, we specifically discussed whether it would be
worthwhile to try to address recovering from capture-order issues.
This *is* tractable (subject to the same caveats as with nominal classes --
that same-name means same-meaning). But, the sense of the room then was
that this doesn't help enough, because there is still the name-induced
stability issue, and that fixing one without the other just encourages
users to think that they can make arbitrary code changes and expect
serialization stability, and makes it even more surprising when we get a
failure due to, say, adding a new lambda to a method. However, if we
felt we were likely to do named lambdas later, then this approach could
close half the problem now and we could close the other half of the
problem later.
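To see the two halves of the problem side by side, one can look at what a serializable lambda actually records about itself. The sketch below is an assumption-laden illustration, not part of any proposal: the class name, the between method and the serializedForm helper are hypothetical, and the helper reaches the serialized form by reflectively calling the synthetic writeReplace method that the current implementation generates on serializable lambda objects. java.lang.invoke.SerializedLambda and its accessors are the standard API.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.util.function.Predicate;

public class SerializedFormDemo {

    public interface SerializablePredicate<T>
        extends Predicate<T>, Serializable { }

    // A lambda that captures two values from its enclosing method.
    public static SerializablePredicate<String> between(int min, int max) {
        return s -> s.length() >= min && s.length() <= max;
    }

    // Hypothetical helper: relies on an implementation detail (a private
    // writeReplace method on the lambda object), so treat it as a probe only.
    public static SerializedLambda serializedForm(Serializable lambda) {
        try {
            Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            return (SerializedLambda) writeReplace.invoke(lambda);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializedLambda form = serializedForm(between(1, 3));
        // The name half: which class and which generated method to bind to.
        System.out.println(form.getCapturingClass());
        System.out.println(form.getImplMethodName());
        // The capture half: captured values, identified only by position.
        for (int i = 0; i < form.getCapturedArgCount(); i++) {
            System.out.println("  captured[" + i + "] = " + form.getCapturedArg(i));
        }
    }
}
```

The captured values carry no names, only positions, and the implementation method is identified by a compiler-generated name; those are the capture-order and name-induced stability issues, respectively.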
One possibility that has not yet been discussed is to issue a lint
warning for serializable lambdas/method refs that are subject to
stability issues.
Here's where we are:
- We're not revisiting the decisions about what lambdas and method
references should be serializable. This has been reopened several times
with no change in consensus, and no new information has come to light
that would change the decision.
- "Just like inner classes" is a local maximum. Better to not ask the
user to create a new mental model than to require a new one that is just
as flawed but in different ways. However, we already make some
departures from inner class treatment, so this is a more "spirit of the
rule" thing than a "letter of the rule." If we can do *much* better,
great, but "slightly better but different" is worse.
- We might be able to revisit some translation decisions if they
result in significant improvements to stability without cost to
usability, but we are almost, if not completely, out of time.
- We're open to adding more lint warnings at compile time.
Stay tuned for a specific proposal.