Implementing Towards Better Serialization

Sat Aug 8 05:51:40 UTC 2020

I've just joined the mailing list to share some thoughts on using the
concepts in the "Towards Better Serialization" document as the starting
point for the Java implementation of a serialization library I'm working on
(because we need more, right? [1]). Apologies if this isn't the right place
to do it.

The only details of the library's design relevant to this discussion are
that it embeds a strongly typed binary schema in the encoding, and that
reflection is used to describe the structure in the schema. It is designed
to be interoperable with other languages, but Java is the starting point.

I appreciate that the Towards Better Serialization proposal is only a draft
and not to be taken as a final design. In the same spirit, the following
are just some experiences, not a criticism of the design. I'll take the
basic Point class as a starting point.

   package com.myproject.chart;
   public class Point {
      private final int x;
      private final int y;

      @Deserializer(version=1)
      public Point( int x, int y ) {
         this.x = x;
         this.y = y;
      }

      @Serializer
      public int x() { return x; }

      @Serializer
      public int y() { return y; }
   }

The structure in the serialization schema can be derived from the class.
This could be XML, JSON Schema or something else. I'm using a sort of
s-expression:

   (structure namespace:"com.myproject.chart" name:"Point" version=1
       definition:(record fields:[
            (field name:"x" type:"int32" )
            (field name:"y" type:"int32" ) ] ) );

Issue 1: I think the constructor should be the place to derive the
structure for the schema, with its parameter names then mapped to the
getters. However, constructor and method parameter names are currently
optional in reflection: they are only present if the class was compiled
with javac's -parameters option. The fallback is to use the method names
x and y, but then there's no way to work out the order of the constructor
parameters. The pattern matching alternative to the serializer doesn't help
here unless there's a change to require names to be available in the
compiled classes.
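
To make the problem concrete, here is a minimal sketch of checking whether
constructor parameter names survived compilation. The class name
ParameterNameCheck and the nested Point stand-in are only for illustration;
Parameter.isNamePresent() and Parameter.getName() are the standard
java.lang.reflect calls.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

public class ParameterNameCheck {

    // Stand-in for the Point class above, without the draft annotations.
    static class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int x() { return x; }
        int y() { return y; }
    }

    // Describes the first declared constructor as "name:type" entries.
    // A name is reported as "?" when the class file carries no parameter
    // names, i.e. when it was compiled without -parameters.
    static List<String> describe(Class<?> type) {
        Constructor<?> ctor = type.getDeclaredConstructors()[0];
        List<String> fields = new ArrayList<>();
        for (Parameter p : ctor.getParameters()) {
            String name = p.isNamePresent() ? p.getName() : "?";
            fields.add(name + ":" + p.getType().getSimpleName());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(describe(Point.class));
    }
}
```

Compiled with -parameters the entries carry the names x and y; without it
only the types and their order are recoverable, and the accessor names x()
and y() cannot be matched to positions.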

One of the other options discussed is to implement something like the
following, which is a good way of separating the state from the object.
Reflecting on the record for component names and types already works, which
solves issue 1.

   package com.myproject.chart;
   public class Point {
      private int x;
      private int y;

      public record PointData( int xIndex, int yIndex ) {}

      @Deserializer
      public Point( PointData data ) {
         this.x = data.xIndex();
         this.y = data.yIndex();
      }

      @Serializer
      public PointData data() {
         return new PointData( x, y );
      }
   }
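
As a rough sketch of why the record helps: record components are always
available through reflection, with their names and in declaration order, so
the field list can be derived without any compiler option. This assumes a
JDK where records are a final feature (16 or later). The class
RecordSchemaSketch and the schemaType mapping are hypothetical, not part of
the proposal or of my library.

```java
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.List;

public class RecordSchemaSketch {

    record PointData(int xIndex, int yIndex) {}

    // Hypothetical mapping from Java types to schema type names.
    static String schemaType(Class<?> type) {
        if (type == int.class) return "int32";
        if (type == long.class) return "int64";
        return type.getName();
    }

    // Builds one schema field entry per record component, in the order
    // the components are declared in the record header.
    static List<String> fields(Class<? extends Record> type) {
        List<String> out = new ArrayList<>();
        for (RecordComponent c : type.getRecordComponents()) {
            out.add("(field name:\"" + c.getName()
                    + "\" type:\"" + schemaType(c.getType()) + "\" )");
        }
        return out;
    }

    public static void main(String[] args) {
        fields(PointData.class).forEach(System.out::println);
    }
}
```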

Issue 2: When reflecting on the Point class and mapping it to the
serialization schema, it isn't clear whether PointData is merely a member of
Point or is Point's representation. I could infer that it is the
representation because PointData is a public record without any
annotations, but that doesn't feel right. Perhaps another annotation on the
PointData record, or an option in @Deserializer?
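
One possible shape for the second option, purely as a sketch: the
@Deserializer declared here is a local stand-in, and its "representation"
element is my assumption, not something in the draft. It only shows how a
library could read such an option instead of guessing from the record's
visibility.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;

public class RepresentationSketch {

    // Hypothetical stand-in for the draft's @Deserializer. The
    // "representation" element is an assumption made for this sketch.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.CONSTRUCTOR)
    @interface Deserializer {
        Class<?> representation() default void.class;
    }

    static class Point {
        private final int x;
        private final int y;

        record PointData(int xIndex, int yIndex) {}

        // Names PointData explicitly as the serialized form of Point.
        @Deserializer(representation = PointData.class)
        Point(PointData data) {
            this.x = data.xIndex();
            this.y = data.yIndex();
        }

        PointData data() {
            return new PointData(x, y);
        }
    }

    // Returns the declared representation type, or null when no
    // constructor names one.
    static Class<?> representationOf(Class<?> type) {
        for (Constructor<?> ctor : type.getDeclaredConstructors()) {
            Deserializer d = ctor.getAnnotation(Deserializer.class);
            if (d != null && d.representation() != void.class) {
                return d.representation();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(representationOf(Point.class));
    }
}
```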

When developing schema-based serialization there's always the question of
which comes first, the schema or the class. Wanting to be developer
friendly, I'd like to allow class-based design. This is where the light
touch becomes more difficult. There will most likely be a mismatch between
the serialization type system and Java's type system, and it is very easy to
end up with a lot of embellishments. For example:

   package com.myproject.chart;

   @Schema( namespace="chart", name="point", structureType=Sequence.class )
   public class Point {
      private final int x;
      private final int y;

      @Deserializer(version=1)
      public Point(
            @SchemaType( name="x_index", type="uint32", optional=false ) int x,
            @SchemaType( name="y_index", type="uint32", optional=false ) int y ) {
         this.x = x;
         this.y = y;
      }

      @Serializer
      @SchemaField( name="x_index" )
      public int x() { return x; }

      @Serializer
      @SchemaField( name="y_index" )
      public int y() { return y; }
   }

Which upon reflection provides the serialization schema:

   (structure namespace:"chart" name:"point" version=1
       definition:(sequence fields:[
             (field name:"x_index" type:"uint32" optional:false )
             (field name:"y_index" type:"uint32" optional:false ) ] ) );
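
For illustration, a minimal sketch of how the field entries could be read
from the parameter annotations. The @SchemaType declared here is a local
stand-in for the hypothetical annotation used above, and
AnnotatedSchemaSketch is just a name for the example. Because the schema
names come from the annotation, this works even when the class is compiled
without parameter names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

public class AnnotatedSchemaSketch {

    // Local stand-in for the hypothetical @SchemaType annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface SchemaType {
        String name();
        String type();
        boolean optional() default false;
    }

    static class Point {
        private final int x;
        private final int y;

        Point(@SchemaType(name = "x_index", type = "uint32") int x,
              @SchemaType(name = "y_index", type = "uint32") int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Emits one field entry per annotated parameter of the first declared
    // constructor, in parameter order. Unannotated parameters are skipped.
    static List<String> fields(Class<?> type) {
        Constructor<?> ctor = type.getDeclaredConstructors()[0];
        List<String> out = new ArrayList<>();
        for (Parameter p : ctor.getParameters()) {
            SchemaType s = p.getAnnotation(SchemaType.class);
            if (s != null) {
                out.add("(field name:\"" + s.name() + "\" type:\"" + s.type()
                        + "\" optional:" + s.optional() + " )");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        fields(Point.class).forEach(System.out::println);
    }
}
```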

Issue 3: The above is a very contrived example, but it shows that whether
the target is JSON, XML, or the esoteric serialization library I'm
designing, it is hard to predict what is required to map the Java state to
the serialization state. Another example of this is the Jackson JSON
annotation set [2]. If, as a library designer, I have to add annotations
all over the class to get the mapping right, the value of having
@Serializer/@Deserializer is diminished and their purpose will end up
replicated in each library's own annotations. I'm not sure where the line
should be drawn.

I'm still in the design/draft stage of the library, so there's still a lot
of work to do to get it right. However, I thought this was an interesting
exercise, and it has helped to influence the design in a positive way.
Harder issues are interfaces, collections, and classes that extend other
classes. If there's interest I can expand on those at a later time.

Regards,
David.

[1] https://xkcd.com/927/
[2] https://github.com/FasterXML/jackson-annotations/wiki/Jackson-Annotations