Implementing Towards Better Serialization

Sun Aug 9 15:06:58 UTC 2020

> I should be careful about my "shoulds" and "coulds". :) To be clearer, given a class with private final fields, ctor(s) and accessors, I need a way to determine which elements of that class make up the serialised form. An annotated ctor and then matching that to accessors is one way that could work. I'm open to other suggestions. There's also (what I think is a fair) an underlying assumption that serialization here uses the same N elements for serialization/deserialization. 

If those N elements in the public API completely describe the representation of the object, then yes. This is not something we can infer from outside; this is something the programmer has to tell us. It will often be the case, and if so, great. In the cases where it is not, or where the logical at-rest representation is different from that of the logical public API, the programmer will have to tell us something else. You’re saying “I’m willing to let the something else be part of a next phase”, which is fine, as long as we’ve got our eye on that ball.

> Do you have a link to explain what you mean by an embedded-projection pair? 

Most definitions come attached to a ball of domain theory or category theory.  Let me take a whack at it without all that.

Imagine two domains, B and S,  for “big” and “small”.  The idea is that there is a subset of B that looks and behaves like S, so we are going to *embed* S in B.  We have two functions:

    e : S -> B
    p : B -> S

We’ll start by saying that p is partial on B; for some values, it throws (this can be refined). Let B’ be the subset of B where p does not throw, and let pg be the restriction of p to B’ (the “good” part of p). What we want is for the composition e-then-p to behave like an identity on S, and likewise for pg-then-e on B’. (Waving hands about whether I mean == or .equals or some sort of observational equivalence.)

An example may help. Let S=int and B=Integer. We can embed S in B by boxing; we can project B to S by unboxing, for all of B except null. (In domains like this where things like + are defined on both, e and p are homomorphic.) Similarly, there are e-p pairs from int to double, and between int and long via widening. The int-to-double case requires more mechanics, since casting 3.14 to int doesn’t throw, it truncates; but the notion can be extended by defining B’ to be the image e(S), and defining a partial ordering for approximation, where throwing (bottom) is the worst possible approximation.
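To make the int/Integer and int/double cases concrete, here is a minimal sketch. The class and method names (BoxingPair, WideningPair, embed, project) are mine, made up for illustration. The int/double version takes the simple route of treating B’ as the image of e and throwing everywhere else, rather than modelling the approximation ordering.

```java
import java.util.Objects;

// S = int, B = Integer.
final class BoxingPair {
    // e : S -> B, total: every int has a boxed form.
    static Integer embed(int s) {
        return Integer.valueOf(s);
    }

    // p : B -> S, partial: throws on the one value of B outside B', namely null.
    static int project(Integer b) {
        return Objects.requireNonNull(b, "null has no int projection").intValue();
    }
}

// S = int, B = double, with B' taken to be exactly the image of embed.
final class WideningPair {
    // e : S -> B, total and exact: every int is representable as a double.
    static double embed(int s) {
        return s;
    }

    // p : B -> S, partial: instead of silently truncating, throw for any
    // double that is not the image of some int (3.14, NaN, infinities, 1e300).
    static int project(double b) {
        int s = (int) b;
        if (s != b) {
            throw new ArithmeticException(b + " is not the image of any int");
        }
        return s;
    }
}
```

With these, project(embed(s)) == s for every int s, and embed(project(b)) gives back b (up to the == vs. equals hand-waving, e.g. for -0.0) for every b on which project does not throw.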

Here, the e-p pairs we want are between an object’s representation and a tuple that carries its external state; I can construct one between Point and tuples (x, y), or between List<T> and a tuple (T[]). If the constructor checks invariants (Rational will reject denom=0), that’s where the B’ subset comes in.
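A rough sketch of the Rational case, under some assumptions: the tuple is modelled as an int[] of length 2, toTuple() is a made-up stand-in for a deconstructor (which Java does not have today), and the class deliberately does not reduce fractions, so that (n, d) tuples with d != 0 map one-to-one onto instances.

```java
final class Rational {
    private final int num;
    private final int den;

    // p : tuple -> Rational. Partial: tuples with den == 0 lie outside B'.
    Rational(int num, int den) {
        if (den == 0) {
            throw new IllegalArgumentException("denominator must be non-zero");
        }
        this.num = num;
        this.den = den;
    }

    // e : Rational -> tuple. Total: every instance has an external form.
    // Hypothetical stand-in for a deconstructor.
    int[] toTuple() {
        return new int[] { num, den };
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Rational
                && ((Rational) o).num == num
                && ((Rational) o).den == den;
    }

    @Override
    public int hashCode() {
        return 31 * num + den;
    }
}
```

Going object-to-tuple-to-object always yields an equal object; going tuple-to-object-to-tuple yields the same tuple whenever the constructor accepts it.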

The point is, this relationship comes up all the time, when we want to describe one domain by a “richer” domain.  The set of (n,d) tuples is richer than the set of rationals, but if you restrict to the set of tuples where d != 0, the two are isomorphic.  The set of doubles is richer than the set of ints, but you only have to give a double one push, and then you’re in a domain that behaves isomorphically.  

> Yes, the following classes can have their serialized form derived very easily using the "front door" apis.

Where, when you say “can be derived”, you mean “with a little help from the author.” The author can do the deriving, and tell the compiler.

>  
> This was why I was leaning, in my draft, towards using a matched ctor/dtor pair (or factory/dtor pair).  From a practical perspective, they can be matched solely by the "shape" of their argument lists, and it wouldn't matter what the names are.  (It's also nice because they can be assumed to form an _embedding-projection pair_, which has nice algebraic properties.)  But, there's other ways to get there too -- I'm just not sure that, without help, we can do it with just ctor + accessors.
> 
> I'd be concerned that relying on the "shape" of the argument list is also quite brittle. The cut and paste typo where you've transposed similarly named parameters would be very difficult to find. Do you have a link to explain what you mean by an embedded-projection pair? 

There is some brittleness, but I think this is counterweighted by the fact that the design of the deconstruction feature is meant to mirror that of constructors, and I expect that “matching ctor/dtor pairs” will be a common programming pattern (because you can derive so many good properties, not just serialization, from them.)  So while “methods with names like getXxx” are a noisy space to draw from and try to match up with constructors, dtors are the natural dual of constructors and so I expect a lot less “drift”.  

> Would you consider an annotation like @NamedParameters that could be applied to specific ctors or methods to force the parameter names to be compiled into the class? This could be used for a range of purposes. Could it be possible that the @Serializer annotation could be in some way derived from the @NamedParameters annotation.

Never say never, but we rejected this approach when we added the -parameters support to javac, for the same reason I say “dangerously close”; annotations are not for language semantics. You could argue (and people will) that this is just a translation hint, but this argument will fall apart as soon as we want to start building features on “if the constructor has parameter names, then you can use the names in X way”, which we will surely want to do. So I think annotations are not the droids you are looking for; if we want this as a language feature, it needs to be a language feature. (That said, I brought it up because this has come up in the context of a handful of other language features, so it is not out of the question that it might eventually be so.)
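For reference, this is roughly what a library can see today through reflection. Parameter.isNamePresent() and Parameter.getName() are real java.lang.reflect APIs; ParamNames and NamedPoint are names I made up for the sketch. Whether the names are there depends entirely on whether the class was compiled with -parameters, which is exactly why it is shaky ground to build semantics on.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class NamedPoint {
    private final int x;
    private final int y;

    NamedPoint(int x, int y) { this.x = x; this.y = y; }

    int x() { return x; }
    int y() { return y; }
}

final class ParamNames {
    // Returns the declared parameter names, or empty if the class file does
    // not carry them (i.e. it was compiled without -parameters).
    static Optional<List<String>> namesOf(Constructor<?> ctor) {
        List<String> names = new ArrayList<>();
        for (Parameter p : ctor.getParameters()) {
            if (!p.isNamePresent()) {
                // getName() would return a synthesized name such as "arg0" here.
                return Optional.empty();
            }
            names.add(p.getName());
        }
        return Optional.of(names);
    }
}
```

Calling ParamNames.namesOf(NamedPoint.class.getDeclaredConstructors()[0]) yields the names x and y only under -parameters; otherwise it comes back empty and the library needs some other source of truth.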

> 
> 
> As I mentioned previously, I've hit a bit of a road-block with my design as I'll need a fallback solution for users on Java 11 (current target version). An annotation on the ctor or its parameters is the likely solution:

Yeah, that’s ugly but doable. Your users won’t like you.

>   public class Point {
>      private final int x;
>      private final int y;
> 
>      @Deserializer( fields = {"x", "y" } ) 
>      public Point(  int x,  int y ) { this.x = x; this.y = y; }
> 
>      public int x() { return x; }
>      public int y() { return y; }
>   }
> 
> The second option is winning for me at the moment as it looks a little cleaner, even though there's more chance for errors.

You gotta do what you gotta do :)  
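For what it’s worth, here is a rough sketch of how a library might consume the annotation from your Point example on Java 11. The @Deserializer declaration, the FrontDoor class and its methods are all hypothetical; I am assuming the annotation is retained at runtime and that each entry in fields names a public zero-argument accessor.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

// Hypothetical annotation, modelled on the one in the quoted example.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.CONSTRUCTOR)
@interface Deserializer {
    String[] fields();
}

final class Point {
    private final int x;
    private final int y;

    @Deserializer(fields = {"x", "y"})
    public Point(int x, int y) { this.x = x; this.y = y; }

    public int x() { return x; }
    public int y() { return y; }
}

final class FrontDoor {
    // Returns the first constructor carrying @Deserializer; fails if none does.
    static Constructor<?> annotatedCtor(Class<?> type) {
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (c.isAnnotationPresent(Deserializer.class)) return c;
        }
        throw new IllegalArgumentException("no @Deserializer constructor on " + type);
    }

    // Object -> tuple: call the accessor named by each entry of fields().
    static Object[] extract(Object obj) {
        Constructor<?> ctor = annotatedCtor(obj.getClass());
        String[] names = ctor.getAnnotation(Deserializer.class).fields();
        Class<?>[] types = ctor.getParameterTypes();
        if (names.length != types.length) {
            throw new IllegalStateException("fields() does not match constructor arity");
        }
        Object[] state = new Object[names.length];
        try {
            for (int i = 0; i < names.length; i++) {
                Method accessor = obj.getClass().getMethod(names[i]);
                // Types can be checked; a transposition of two same-typed
                // names cannot, which is the brittleness discussed above.
                if (accessor.getReturnType() != types[i]) {
                    throw new IllegalStateException("type mismatch for " + names[i]);
                }
                state[i] = accessor.invoke(obj);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return state;
    }

    // Tuple -> object: feed the state back through the annotated constructor.
    static <T> T rebuild(Class<T> type, Object[] state) {
        try {
            return type.cast(annotatedCtor(type).newInstance(state));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that the only cross-check available is arity and type; if someone writes fields = {"y", "x"} on Point, everything still links up and the round trip quietly swaps the coordinates.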

> 
> For instance:
> 
>  public class Location {
>     private final float latitude;
>     private final float longitude;
> 
>     @Serialize( version = 1 )
>     public record LocationVer1( float latitude, float longitude ) {}
> 
>     @Serialize( version = 2 )
>     public record LocationVer2( byte latDegrees, byte latMinutes, byte latSeconds, byte lonDegrees, byte lonMinutes, byte lonSeconds, byte direction ) {}
> 
>      ... ctors and accessors for both serializations ...
>  }

Right, this is one way to stack this.