Serialization stability and naming (and syntax)

Sun Sep 30 07:43:04 PDT 2012

A confounding factor in picking a serialization syntax is "does the syntax provide a place to put a name, or do we need to do something else for that?"

The reality is that there is relatively little we can do to make serialized lambdas stable across recompilation.  The desugared method name is one aspect of instability, but it is only one.  This problem is worse than the "unstable class name" problem of inner classes, and I don't think we have the will to distort the design very much to make it all that much better.  My conclusion is that being able to specify a stable name is only slightly less brittle than not being able to, which leads me to believe that we should not value a "name" slot very highly in picking a syntax.

Example: order of capture.  Suppose we have a lambda that captures capA and capB:
  x -> (capA < capB) ? x : 0;

If this is refactored into

  x -> (capB <= capA) ? 0 : x;

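as your IDE will happily do for you, you've broken serialized instances.  If the compiler picks a stable algorithm at all for assigning captured values to indexed slots, it will probably be something like "the order in which the captured vars appear in the body."  Under that rule, the refactoring swaps the slots of capA and capB, so an instance serialized by the old class would hand capA's value to capB (and vice versa) in the new one.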
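To make the slot problem concrete, here is a minimal sketch that models desugaring by hand. It assumes, hypothetically (this is not javac's actual translation or serialized format), that a desugared body takes its captured values first, in order of appearance, and that the serialized form is just an array of captured values bound back by position. The names bodyV1, bodyV2, rebuildV1 and rebuildV2 are invented for illustration. Version 2 uses the equivalent negated test `capB <= capA`, so capB now appears first.

```java
import java.util.function.IntUnaryOperator;

class CaptureOrderDemo {
    // Hypothetical desugared body for  x -> (capA < capB) ? x : 0
    // Captured values come first, in order of appearance: capA, capB.
    static int bodyV1(int capA, int capB, int x) {
        return (capA < capB) ? x : 0;
    }

    // Hypothetical desugared body after refactoring to  x -> (capB <= capA) ? 0 : x
    // Order of appearance is now capB, capA, so the slots are swapped.
    static int bodyV2(int capB, int capA, int x) {
        return (capB <= capA) ? 0 : x;
    }

    // Rebuild a lambda from a "serialized" captured-value array,
    // binding slot 0 and slot 1 positionally, as a deserializer would.
    static IntUnaryOperator rebuildV1(int[] captured) {
        return x -> bodyV1(captured[0], captured[1], x);
    }

    static IntUnaryOperator rebuildV2(int[] captured) {
        return x -> bodyV2(captured[0], captured[1], x);
    }

    public static void main(String[] args) {
        int capA = 1, capB = 2;
        // Written by the old class: slot 0 = capA, slot 1 = capB.
        int[] serialized = { capA, capB };

        IntUnaryOperator before = rebuildV1(serialized); // old class, old bytes
        IntUnaryOperator after = rebuildV2(serialized);  // new class, old bytes

        System.out.println(before.applyAsInt(42)); // prints 42
        System.out.println(after.applyAsInt(42));  // prints 0
    }
}
```

The two bodies compute the same function when each receives the right values; only binding the old array positionally to the new body makes them disagree.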

Example: capture arity change.  This one is obvious; if you add or remove a captured variable, you change the arity of the desugared lambda body.
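A sketch of the arity problem, under the illustrative assumption that a deserializer locates the desugared body by its name plus an all-int parameter list sized to the number of recorded captured values. V1, V2, lambda$0 and findBody are hypothetical; Class.getDeclaredMethod is the standard reflection call, and it only matches exact parameter lists.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

class ArityDemo {
    // Hypothetical desugared body, version 1: captures one variable.
    //   x -> x + base
    static class V1 {
        static int lambda$0(int base, int x) { return x + base; }
    }

    // Hypothetical desugared body, version 2: a second captured variable was added.
    //   x -> x * scale + base
    static class V2 {
        static int lambda$0(int base, int scale, int x) { return x * scale + base; }
    }

    // Look up the body by name and arity: one int per captured value,
    // plus one int for the lambda's own parameter.
    static Method findBody(Class<?> owner, int capturedCount) throws NoSuchMethodException {
        Class<?>[] params = new Class<?>[capturedCount + 1];
        Arrays.fill(params, int.class);
        return owner.getDeclaredMethod("lambda$0", params);
    }

    public static void main(String[] args) throws Exception {
        // The stream written against V1 recorded one captured value.
        System.out.println(findBody(V1.class, 1)); // finds V1's two-argument body
        try {
            findBody(V2.class, 1); // same name, different arity
        } catch (NoSuchMethodException e) {
            System.out.println("V2: no matching body, deserialization fails");
        }
    }
}
```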

Example: captured type change.  If we have:

Collection c = ...
x -> c.contains(x);

and change the type of the captured variable outside the lambda:

List c = ...
x -> c.contains(x);

this will change the signature of the desugared lambda body.
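The type change shows up in the JVM method descriptor of the desugared body, which is what a lookup would have to match. This sketch uses the real java.lang.invoke.MethodType API to compute the two descriptors; the V1/V2 classes, their lambda$0 bodies and hasBody are hypothetical stand-ins for compiler-generated code.

```java
import java.lang.invoke.MethodType;
import java.util.Collection;
import java.util.List;

class CapturedTypeDemo {
    // Hypothetical desugared bodies for  x -> c.contains(x)
    // The captured variable c comes first, then the lambda parameter x.
    static class V1 {   // Collection c = ...
        static boolean lambda$0(Collection<?> c, Object x) { return c.contains(x); }
    }

    static class V2 {   // List c = ...
        static boolean lambda$0(List<?> c, Object x) { return c.contains(x); }
    }

    // JVM descriptor of a body taking (capturedType, Object) and returning boolean.
    static String descriptor(Class<?> capturedType) {
        return MethodType.methodType(boolean.class, capturedType, Object.class)
                         .toMethodDescriptorString();
    }

    // Does 'owner' declare a body whose first parameter is exactly capturedType?
    static boolean hasBody(Class<?> owner, Class<?> capturedType) {
        try {
            owner.getDeclaredMethod("lambda$0", capturedType, Object.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(descriptor(Collection.class)); // (Ljava/util/Collection;Ljava/lang/Object;)Z
        System.out.println(descriptor(List.class));       // (Ljava/util/List;Ljava/lang/Object;)Z
        // A body recorded as taking Collection is not found in the new class.
        System.out.println(hasBody(V2.class, Collection.class)); // false
    }
}
```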

Example: compiler optimization.

Given:

class X {
    public final int x = 3;

    ...
    y -> (y > x) ? y : 0;
}

Ordinarily we translate this by capturing 'this' so we can refer to this.x; a smarter compiler could realize that we can constant-fold x into the lambda and make it noncapturing.  Again, this breaks serialized instances.
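A sketch contrasting the two translations, assuming for illustration only that the capturing form becomes an instance method whose receiver counts as one captured value, while the folded form becomes a static method with no captures. Capturing, Folded, lambda$0 and capturedCount are hypothetical names, not what any particular compiler emits.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class FoldingDemo {
    // Hypothetical translation 1: the body reads this.x, so it is an
    // instance method and 'this' is captured as the receiver.
    static class Capturing {
        public final int x = 3;
        int lambda$0(int y) { return (y > x) ? y : 0; }
    }

    // Hypothetical translation 2: x is constant-folded to 3, so the body
    // is static and nothing is captured.
    static class Folded {
        static int lambda$0(int y) { return (y > 3) ? y : 0; }
    }

    // Captured values the runtime must supply in this model:
    // an instance body needs its receiver, a static body needs nothing.
    static int capturedCount(Class<?> owner) throws NoSuchMethodException {
        Method m = owner.getDeclaredMethod("lambda$0", int.class);
        return Modifier.isStatic(m.getModifiers()) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(capturedCount(Capturing.class)); // prints 1
        System.out.println(capturedCount(Folded.class));    // prints 0
    }
}
```

Both bodies compute the same results; only the number of captured values, and so the shape a serialized instance must match, differs.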

The bottom line is that there are so many ways that we can break existing serialized instances, even more than with inner classes.  The target profile we set at the outset (serialization works when you have bytecode-identical classes on both sides) is really the only thing that is guaranteed to work.

Eliminating one source of instability, the name, yields only a minor improvement.  If we care about name stability, perhaps we should spend the effort on a better naming scheme than lambda$nnn, one that is less sensitive to small changes in the source code.
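One possible direction, sketched as a hypothetical scheme rather than a settled proposal: qualify each name by its enclosing method and number lambdas within that method, so adding a lambda to one method no longer shifts the names in another. The input map (method name to lambda count, in source order) and both helper functions are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NamingDemo {
    // Scheme A, the lambda$nnn style: one class-wide counter.
    static List<String> classWideNames(Map<String, Integer> lambdasPerMethod) {
        List<String> names = new ArrayList<>();
        int n = 0;
        for (int count : lambdasPerMethod.values())
            for (int i = 0; i < count; i++)
                names.add("lambda$" + n++);
        return names;
    }

    // Scheme B (hypothetical): enclosing method name plus an index within
    // that method, so edits to one method cannot renumber another's lambdas.
    static List<String> perMethodNames(Map<String, Integer> lambdasPerMethod) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : lambdasPerMethod.entrySet())
            for (int i = 0; i < e.getValue(); i++)
                names.add("lambda$" + e.getKey() + "$" + i);
        return names;
    }

    public static void main(String[] args) {
        Map<String, Integer> before = new LinkedHashMap<>();
        before.put("init", 1);
        before.put("run", 1);
        Map<String, Integer> after = new LinkedHashMap<>(before);
        after.put("init", 2); // one lambda added to init(); key order unchanged

        System.out.println(classWideNames(before)); // [lambda$0, lambda$1]
        System.out.println(classWideNames(after));  // [lambda$0, lambda$1, lambda$2]
        System.out.println(perMethodNames(before)); // [lambda$init$0, lambda$run$0]
        System.out.println(perMethodNames(after));  // [lambda$init$0, lambda$init$1, lambda$run$0]
    }
}
```

Under scheme A the lambda in run() is renamed from lambda$1 to lambda$2; under scheme B it keeps lambda$run$0. This only narrows the blast radius: inserting a lambda ahead of an existing one in the same method still renumbers it, and the other instabilities above (capture order, arity, captured types, optimization) are untouched.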