Implementing Towards Better Serialization

Sun Aug 9 03:40:53 UTC 2020

Hi Brian,

Thanks for the quick and detailed response. Your response narrows in on the
first issue I raised, which is my main stumbling block at the moment. I'll
do the same with my responses.

On Sun, Aug 9, 2020 at 1:29 AM Brian Goetz <brian.goetz at oracle.com> wrote:

> Thanks for getting involved!
>
> How this is exposed in the programming model is indeed an open issue (in
> some sense, it's "syntax"), but of course not all schemes are created
> equally.  Let's put out a few requirements:
>
>  - If a class has an externally accessible
> construction/deconstruction/access protocol, it should be easy to just say
> "use that for serialization too" (records do this already.)
>  - A class should be able to expose a serialization protocol without
> exposing that as part of the public API.
>

Yes, this really follows what your proposal outlined. The starting
point is that using the internal fields is the wrong way to derive the
serialized form. Given that, if everything required for the serialized
form is available in the "front door" API, then we should use it. There is
no need for a special ctor/dtor for these types of classes. I'm not too
concerned with the second point right now, but yes, it should be a choice
whether the serialization is part of the exposed interface.

>
> You are saying, in effect, "many classes have a N-ary constructor and
> accessors for all N components, so by requirement #1 above, I want to just
> infer a serialization protocol from those."  This makes perfect sense,
> until you get prescriptive ("should be the place").  But let's restate that
> as "could be a place", and I'm with you.
>

I should be careful about my "shoulds" and "coulds". :) To be clearer:
given a class with private final fields, ctor(s) and accessors, I need a
way to determine which elements of that class make up the serialized form.
An annotated ctor whose parameters are then matched to accessors is one way
that could work. I'm open to other suggestions. There's also an underlying
(and, I think, fair) assumption that serialization and deserialization use
the same N elements.
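A minimal sketch of the "annotated ctor matched to accessors" idea, using only `java.lang.reflect`. It assumes each constructor parameter has a public no-arg accessor with the same name and type, and it can only find names when the class was compiled with `javac -parameters`; otherwise `Parameter.isNamePresent()` is false, `getName()` synthesizes `arg0`, `arg1`, and the sketch reports no match. `NameMatching` and `matchAccessors` are hypothetical names, not part of any existing library.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class NameMatching {

    // The common layout: private final fields, one constructor, record-style accessors.
    static final class Point {
        private final int x;
        private final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
        public int x() { return x; }
        public int y() { return y; }
    }

    // Pairs each constructor parameter, in declaration order, with a public no-arg
    // accessor of the same name and type. Returns empty if the class file carries no
    // parameter names (javac without -parameters) or if any parameter has no match.
    static Optional<List<Method>> matchAccessors(Constructor<?> ctor) {
        List<Method> accessors = new ArrayList<>();
        for (Parameter p : ctor.getParameters()) {
            if (!p.isNamePresent()) {
                return Optional.empty(); // getName() would only give arg0, arg1, ...
            }
            try {
                Method m = ctor.getDeclaringClass().getMethod(p.getName());
                if (!m.getReturnType().equals(p.getType())) {
                    return Optional.empty(); // same name, different type
                }
                accessors.add(m);
            } catch (NoSuchMethodException e) {
                return Optional.empty(); // no accessor with that name
            }
        }
        return Optional.of(accessors);
    }
}
```

Called as `NameMatching.matchAccessors(NameMatching.Point.class.getDeclaredConstructors()[0])`, this returns the `x()` and `y()` methods in constructor order when names are present, and an empty `Optional` otherwise.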

> Now, we've got an (accidental) problem; how to match the constructor with
> the accessors.  If you tag the ctor and the N accessors, the serialization
> library has to match them up, and your only move here is "name and type".
> With record-style accessor names, you get the name from the accessor (with
> JavaBean-style names, you have to mangle them, but you can get there), but
> ... names of constructor arguments are not, by default, reified in the
> classfile.  (sad trombone sound.)  And the types don't help much, since I
> could have seven `int` parameters (and often do.)  And the order of
> declaration doesn't help at all, because there's no guarantee reflection
> returns them in the order they are declared (and even if it did, that would
> be unacceptably brittle.)
>

Yes, the following classes can have their serialized form derived very
easily using the "front door" APIs.

  public class Point {
    public int x;
    public int y;
  }

  public class Point {
    private int x;
    private int y;
    public int getX() { return x; }
    public int getY() { return y; }
    public void setX(int x) { this.x = x; }
    public void setY(int y) { this.y = y; }
  }

  public record Point( int x, int y ) { }

  // terrible example, but just to show separation of serialized form.
  public class Point {
    private int x;
    private int y;
    public record PointData( int x, int y ) {}
    public Point( PointData data ) { this.x = data.x(); this.y = data.y(); }
    public PointData data() { return new PointData( x, y ); }
    // ...
  }
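For the last example, where the serialized form is a separate record, the record components already carry names and a guaranteed order (`Class.getRecordComponents()` returns them in record-header order), so a library could derive the form without extra annotations. A minimal sketch, assuming the class has exactly one no-arg instance method returning a record (the serializer) and a constructor taking that record (the deserializer); `ProxyRecordSerializer` and its methods are hypothetical, and the sketch needs Java 16 or later for records.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.lang.reflect.RecordComponent;
import java.util.LinkedHashMap;
import java.util.Map;

class ProxyRecordSerializer {

    // The "separation of serialized form" example: state leaves Point only via PointData.
    static final class Point {
        private final int x;
        private final int y;
        record PointData(int x, int y) {}
        Point(PointData data) { this.x = data.x(); this.y = data.y(); }
        PointData data() { return new PointData(x, y); }
    }

    // Assumption: exactly one no-arg instance method returns a record
    // (getDeclaredMethods() order is unspecified, so "first" would be meaningless).
    static Method proxyAccessor(Class<?> type) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getParameterCount() == 0 && m.getReturnType().isRecord()
                    && !Modifier.isStatic(m.getModifiers())) {
                return m;
            }
        }
        throw new IllegalArgumentException("no record-returning accessor on " + type.getName());
    }

    // Writes the proxy's components, in record-header order, into an ordered map.
    static Map<String, Object> serialize(Object value) {
        try {
            Object proxy = proxyAccessor(value.getClass()).invoke(value);
            Map<String, Object> out = new LinkedHashMap<>();
            for (RecordComponent rc : proxy.getClass().getRecordComponents()) {
                out.put(rc.getName(), rc.getAccessor().invoke(proxy));
            }
            return out;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Rebuilds the proxy via its canonical constructor, then calls the constructor taking it.
    static <T> T deserialize(Class<T> type, Map<String, Object> in) {
        Class<?> proxyType = proxyAccessor(type).getReturnType();
        RecordComponent[] components = proxyType.getRecordComponents();
        Class<?>[] paramTypes = new Class<?>[components.length];
        Object[] args = new Object[components.length];
        for (int i = 0; i < components.length; i++) {
            paramTypes[i] = components[i].getType();
            args[i] = in.get(components[i].getName());
        }
        try {
            Object proxy = proxyType.getDeclaredConstructor(paramTypes).newInstance(args);
            return type.getDeclaredConstructor(proxyType).newInstance(proxy);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The round trip goes Point to PointData to a name/value map and back, so the names come from the record header rather than from constructor parameter names.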

The problematic layout is also one of the most common (especially in code
written before Java 14). Even for this layout the serialization side is
fine; the problem is only the constructor.

  public class Point {
    private final int x;
    private final int y;
    public Point( int x, int y ) { this.x = x; this.y = y; }
    public int x() { return x; }
    public int y() { return y; }
  }

> This was why I was leaning, in my draft, towards using a matched ctor/dtor
> pair (or factory/dtor pair).  From a practical perspective, they can be
> matched solely by the "shape" of their argument lists, and it wouldn't
> matter what the names are.  (It's also nice because they can be assumed to
> form an _embedding-projection pair_, which has nice algebraic properties.)
> But, there's other ways to get there too -- I'm just not sure that, without
> help, we can do it with just ctor + accessors.
>

I'd be concerned that relying on the "shape" of the argument list is also
quite brittle. A cut-and-paste typo that transposes two similarly named
parameters of the same type would be very difficult to find. Do you have a
link that explains what you mean by an embedding-projection pair?

> You might say "if the ctor has the @Serializer anno, then reify the
> parameter names as if the class were compiled with javac -parameters."
> This comes dangerously close to trying to introduce language semantics via
> annotations.  And it means that alpha-renaming a parameter name would no
> longer be a serialization-compatible change.  So this seems dodgy; I think
> you need more help there.
>

I like this concept, or something like it. If a user alpha-renames a
parameter, they could use an additional annotation to give it a different
name for serialization; I already have this as an option, as described
previously. Using parameter names for serialization is a nice helper, but
it doesn't need to be relied upon. The aim is to find a way to link each
parameter in the ctor to an accessor; the name is one potential solution.
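One way to link each ctor parameter to an accessor along these lines, sketched under assumptions: an explicit `@Field` annotation on a parameter names it for serialization, and only when it is absent does the sketch fall back to the reflected parameter name (real only with `javac -parameters`). The `Field` annotation, the single-constructor rule and the `ParameterLinking` class are all hypothetical. Here the parameters have been alpha-renamed to `xCoord` and `yCoord` while the serialized names stay `x` and `y`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Parameter;
import java.util.LinkedHashMap;
import java.util.Map;

class ParameterLinking {

    // Hypothetical annotation: the serialized name (and accessor name) for one ctor parameter.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface Field { String value(); }

    // The parameters were alpha-renamed; @Field keeps the serialized names stable.
    static final class Point {
        private final int x;
        private final int y;
        Point(@Field("x") int xCoord, @Field("y") int yCoord) { this.x = xCoord; this.y = yCoord; }
        public int x() { return x; }
        public int y() { return y; }
    }

    // An explicit @Field wins; otherwise fall back to the reflected parameter name,
    // which is only real if the class was compiled with javac -parameters.
    static String serialName(Parameter p) {
        Field f = p.getAnnotation(Field.class);
        if (f != null) return f.value();
        if (p.isNamePresent()) return p.getName();
        throw new IllegalStateException("cannot name " + p + "; add @Field or compile with -parameters");
    }

    // Assumption: the class declares exactly one constructor, used for deserialization.
    static Constructor<?> onlyConstructor(Class<?> type) {
        Constructor<?>[] ctors = type.getDeclaredConstructors();
        if (ctors.length != 1) throw new IllegalArgumentException("expected one constructor in " + type.getName());
        return ctors[0];
    }

    // Reads each value through the accessor with the same name, in constructor-parameter order.
    static Map<String, Object> serialize(Object value) {
        Map<String, Object> out = new LinkedHashMap<>();
        try {
            for (Parameter p : onlyConstructor(value.getClass()).getParameters()) {
                String name = serialName(p);
                out.put(name, value.getClass().getMethod(name).invoke(value));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    // Looks up each parameter's value by its serialized name and calls the constructor.
    static <T> T deserialize(Class<T> type, Map<String, Object> in) {
        Constructor<?> ctor = onlyConstructor(type);
        Parameter[] params = ctor.getParameters();
        Object[] args = new Object[params.length];
        for (int i = 0; i < params.length; i++) {
            args[i] = in.get(serialName(params[i]));
        }
        try {
            return type.cast(ctor.newInstance(args));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Serializing `new ParameterLinking.Point(5, 6)` yields the ordered map `{x=5, y=6}`, and `deserialize` passes those values back to the constructor in parameter order.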

Would you consider an annotation like @NamedParameters that could be
applied to specific ctors or methods to force the parameter names to be
compiled into the class? This could be used for a range of purposes. Could
the @Serializer annotation then be derived in some way from the
@NamedParameters annotation?

> On the other hand, more help may be at hand.  The notion that "these names
> are part of the API, please treat them as such" is a concept that comes up
> in a number of credible feature requests (one example is keyword-based
> invocation: new Foo(x: 1, y: 2).).  If we had this concept, then your
> suggestion would be a possible way to specify the serialization API.
>
> I wouldn't want it to be the only one, though.  If the class has a
> suitable ctor/dtor pair, this is more direct and less brittle, and more
> easily type-checked by IDEs.  Since this is all in the serialization
> library, it is free to define the annotations and member search rules in
> any way that it can handle reflectively, so it's possible that there be more
> than one way to get there.
>

As I mentioned previously, I've hit a bit of a roadblock with my design, as
I'll need a fallback solution for users on Java 11 (my current target
version). An annotation on the ctor or on its parameters is the likely
solution:

  public class Point {
    private final int x;
    private final int y;
    public Point( @Field("x") int x, @Field("y") int y ) {
      this.x = x;
      this.y = y;
    }
    public int x() { return x; }
    public int y() { return y; }
  }

or:

  public class Point {
    private final int x;
    private final int y;

    @Deserializer( fields = { "x", "y" } )
    public Point( int x, int y ) { this.x = x; this.y = y; }

    public int x() { return x; }
    public int y() { return y; }
  }

The second option is winning for me at the moment, as it looks a little
cleaner, even though there's more chance for errors: the names in the
annotation have to be kept in the same order as the parameters by hand.
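One way to reduce that risk is to validate the annotation when the class is first seen. A minimal sketch, assuming a hypothetical runtime-retained `@Deserializer(fields = ...)` on the constructor and record-style accessors named after each field; `DeserializerCheck` and `validate` are illustrative names. It catches a wrong number of names, a missing accessor, or a type mismatch but, as the `Swapped` class shows, it cannot catch two transposed names of the same type. It avoids records, so it should compile on Java 11.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class DeserializerCheck {

    // Hypothetical annotation: accessor names, in constructor-parameter order.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.CONSTRUCTOR)
    @interface Deserializer { String[] fields(); }

    static final class Point {
        private final int x;
        private final int y;
        @Deserializer(fields = { "x", "y" })
        Point(int x, int y) { this.x = x; this.y = y; }
        public int x() { return x; }
        public int y() { return y; }
    }

    static final class Broken {
        private final int x;
        @Deserializer(fields = { "x", "y" }) // one name too many
        Broken(int x) { this.x = x; }
        public int x() { return x; }
    }

    static final class Swapped {
        private final int x;
        private final int y;
        @Deserializer(fields = { "y", "x" }) // transposed, but both are int
        Swapped(int x, int y) { this.x = x; this.y = y; }
        public int x() { return x; }
        public int y() { return y; }
    }

    // Returns human-readable problems; an empty list means every name resolves
    // to a public no-arg accessor whose return type matches its parameter.
    static List<String> validate(Class<?> type) {
        List<String> problems = new ArrayList<>();
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            Deserializer d = c.getAnnotation(Deserializer.class);
            if (d == null) continue;
            String[] names = d.fields();
            Class<?>[] params = c.getParameterTypes();
            if (names.length != params.length) {
                problems.add(names.length + " names for " + params.length + " parameters");
                continue;
            }
            for (int i = 0; i < names.length; i++) {
                try {
                    Method m = type.getMethod(names[i]);
                    if (!m.getReturnType().equals(params[i])) {
                        problems.add(names[i] + "() returns " + m.getReturnType().getName()
                                + " but parameter " + i + " is " + params[i].getName());
                    }
                } catch (NoSuchMethodException e) {
                    problems.add("no accessor " + names[i] + "()");
                }
            }
        }
        return problems;
    }
}
```

`validate(Point.class)` returns an empty list, `validate(Broken.class)` reports `2 names for 1 parameters`, and `validate(Swapped.class)` also returns an empty list, which is exactly the transposition case that stays hard to find.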

> Your suggestion of using an explicit record declaration as the proxy form
> has already come up in discussion, and in fact, in that case, maybe the
> record is the only thing that needs to be annotated -- and the
> serialization library can look for members in the record to map back and
> forth to the type being serialized.  We're investigating this as well.
>

That makes a lot of sense. The explicit record declaration concept was
taken from your proposal. For any complex object with numerous methods and
fields, this is what I would suggest to users of the library. It also
provides a natural way to reason about versions, by defining one record per
version. For instance:

 public class Location {
    private final float latitude;
    private final float longitude;

    @Serialize( version = 1 )
    public record LocationVer1( float latitude, float longitude ) {}

    @Serialize( version = 2 )
    public record LocationVer2( byte latDegrees, byte latMinutes, byte latSeconds,
                                byte lonDegrees, byte lonMinutes, byte lonSeconds,
                                byte direction ) {}

    // ... ctors and accessors for both serializations ...
 }
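A sketch of how a library might discover those per-version records reflectively, assuming a hypothetical runtime-retained `@Serialize(version = ...)` annotation placed on nested records; `VersionedForms`, `forms` and `describe` are illustrative names only. Records need Java 16 or later, so this would not cover the Java 11 fallback.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

class VersionedForms {

    // Hypothetical marker for a nested record that defines one serialized version.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Serialize { int version(); }

    static final class Location {
        private final float latitude;
        private final float longitude;

        @Serialize(version = 1)
        record LocationVer1(float latitude, float longitude) {}

        @Serialize(version = 2)
        record LocationVer2(byte latDegrees, byte latMinutes, byte latSeconds,
                            byte lonDegrees, byte lonMinutes, byte lonSeconds, byte direction) {}

        Location(LocationVer1 v1) { this.latitude = v1.latitude(); this.longitude = v1.longitude(); }
        LocationVer1 ver1() { return new LocationVer1(latitude, longitude); }
        // Conversions to and from LocationVer2 are omitted from this sketch.
    }

    // Collects nested records tagged @Serialize, keyed (and sorted) by version.
    static Map<Integer, Class<?>> forms(Class<?> type) {
        Map<Integer, Class<?>> byVersion = new TreeMap<>();
        for (Class<?> nested : type.getDeclaredClasses()) {
            Serialize s = nested.getAnnotation(Serialize.class);
            if (s == null || !nested.isRecord()) continue;
            if (byVersion.put(s.version(), nested) != null) {
                throw new IllegalStateException("duplicate version " + s.version() + " in " + type.getName());
            }
        }
        return byVersion;
    }

    // Lists one form's fields as name:type in record-header order, e.g. for a schema.
    static String describe(Class<?> form) {
        StringJoiner fields = new StringJoiner(" ", "(", ")");
        for (RecordComponent rc : form.getRecordComponents()) {
            fields.add(rc.getName() + ":" + rc.getType().getName());
        }
        return fields.toString();
    }
}
```

For `Location`, `forms` returns versions 1 and 2 in order, and `describe` on the version 1 record gives `(latitude:float longitude:float)`, which could feed the schema description for that version.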

Regards,
David.

>
> The fussy annotations in "Issue 2" are something I would hope are needed
> no more than .01% of the time, because people surely won't want to use
> them.  I can imagine using them to support migration compatibility when
> things are renamed, but I'd still probably rather use format versioning for
> that.
>
> Cheers,
> -Brian
>
>
> On 8/8/2020 1:51 AM, David Ryan wrote:
>
> I've just joined the mailing list to share some thoughts on using the
> concepts in "Towards Better Serialization" as the starting point for
> the Java implementation of a serialization library I'm working on (because
> we need more, right? [1]). Apologies if this isn't the right place to do it.
>
> The only details of the serialization library design relevant to this
> discussion are that it uses a binary schema embedded into the encoding,
> with strong typing. It is designed to be interoperable with other
> languages, but Java is the starting point. Reflection is used to describe
> the structure in the schema.
>
> I appreciate that the Towards Better Serialization proposal is only a draft
> and not to be taken as a final design, so in a similar way, the following
> is just a record of some experiences, not a criticism of the design.
> I'll take the basic Point class as a starting point.
>
>    package com.myproject.chart;
>    public class Point {
>       private final int x;
>       private final int y;
>
>       @Deserializer(version=1)
>       public Point( int x, int y ) {
>          this.x = x;
>          this.y = y;
>       }
>
>       @Serializer
>       public int x() { return x; }
>
>       @Serializer
>       public int y() { return y; }
>    }
>
> The structure in the serialization schema can be derived from the class.
> This could be XML, JSON Schema or something else. I'm using a sort of
> s-expression:
>
>    (structure namespace:"com.myproject.chart" name:"Point" version:1
>        definition:(record fields:[
>             (field name:"x" type:"int32" )
>             (field name:"y" type:"int32" ) ] ) );
>
> Issue 1: I think the constructor should be the place to derive the
> structure for the schema, and then map those names to the getters.
> Including constructor/method parameter names in the class file (and so in
> reflection) is currently optional. The fallback could be to use the method
> names x, y, but there's no way to work out the order of the constructor's
> parameters. The pattern matching alternative to the serializer doesn't help
> with this unless there's a change to require names to be available in the
> compiled classes.
>
> One of the other options discussed is to implement something like the
> following, which is a good way of separating the state from the object.
> Reflecting on the record for names and types works today, which
> solves Issue 1.
>
>    package com.myproject.chart;
>    public class Point {
>       private int x;
>       private int y;
>
>       public record PointData( int xIndex, int yIndex ) {}
>
>       @Deserializer
>       public Point( PointData data ) {
>          this.x = data.xIndex();
>          this.y = data.yIndex();
>       }
>
>       @Serializer
>       public PointData data() {
>          return new PointData( x, y );
>       }
>    }
>
>
> Issue 2: When performing reflection on the Point class and mapping it to the
> serialization schema, it isn't clear whether PointData is a member of Point,
> or whether PointData is the representation. I could say that, because
> PointData is a public record without any annotation, it is the Point
> representation, but that doesn't feel right. Perhaps another annotation on
> the PointData record, or an option in @Deserializer?
>
> When developing a schema based serialization there's always the question of
> which comes first, the schema or the class. Wanting to be developer
> friendly, I'd like to allow class based design. This is where the light
> touch becomes more difficult. There will most likely be a mismatch between
> the serialization type system and Java's type system. It is very easy to
> end up with a lot of embellishments. For example:
>
>    package com.myproject.chart;
>
>    @Schema( namespace="chart", name="point", structureType=Sequence.class )
>    public class Point {
>       private final int x;
>       private final int y;
>
>       @Deserializer(version=1)
>       public Point( @SchemaType( name="x_index", type="uint32", optional=false) int x,
>                     @SchemaType( name="y_index", type="uint32", optional=false) int y ) {
>          this.x = x;
>          this.y = y;
>       }
>
>       @Serializer
>       @SchemaField( name="x_index" )
>       public int x() { return x; }
>
>       @Serializer
>       @SchemaField( name="y_index" )
>       public int y() { return y; }
>     }
>
> Which upon reflection provides the serialization schema:
>
>    (structure namespace:"chart" name:"point" version:1
>        definition:(sequence fields:[
>              (field name:"x_index" type:"uint32" optional:false )
>              (field name:"y_index" type:"uint32" optional:false ) ] ) );
>
> Issue 3: The above is a very contrived example, but it is there to show
> that, whether it is JSON, XML, or the esoteric serialization library I'm
> designing, it is hard to predict what is required to map the Java state
> to the serialization state. Another example of this is the Jackson JSON
> annotation set [2]. If, as a library designer, I have to add annotations
> all over the class to get the mapping right, the value of having
> @Serializer/@Deserializer is diminished, and their purpose will end up
> replicated in different library annotations. I'm not sure where the line
> should be drawn.
>
> I'm still in the design/draft stage of the library, so there is still a lot
> of work to get it right. However, I thought it was an interesting exercise,
> and it has helped to influence the design in a positive way. Harder issues are
> interfaces, classes that extend other classes and collections. If there's
> interest I can expand on those at a later time.
>
> Regards,
> David.
>
> [1] https://xkcd.com/927/
> [2] https://github.com/FasterXML/jackson-annotations/wiki/Jackson-Annotations