Implementing Towards Better Serialization

Sat Aug 8 15:29:19 UTC 2020

Thanks for getting involved!

How this is exposed in the programming model is indeed an open issue (in 
some sense, it's "syntax"), but of course not all schemes are created 
equal.  Let's put out a few requirements:

  - If a class has an externally accessible 
construction/deconstruction/access protocol, it should be easy to just 
say "use that for serialization too" (records do this already.)
  - A class should be able to expose a serialization protocol without 
exposing that as part of the public API.

You are saying, in effect, "many classes have an N-ary constructor and 
accessors for all N components, so by requirement #1 above, I want to 
just infer a serialization protocol from those."  This makes perfect 
sense, until you get prescriptive ("should be the place").  But let's 
restate that as "could be a place", and I'm with you.

Now, we've got an (accidental) problem: how to match the constructor 
with the accessors.  If you tag the ctor and the N accessors, the 
serialization library has to match them up, and your only move here is 
"name and type".  With record-style accessor names, you get the name 
from the accessor (with JavaBean-style names, you have to mangle them, 
but you can get there), but ... names of constructor arguments are not, 
by default, reified in the classfile.  (sad trombone sound.)  And the 
types don't help much, since I could have seven `int` parameters (and 
often do.)  And the order of declaration doesn't help at all, because 
there's no guarantee reflection returns them in the order they are 
declared (and even if it did, that would be unacceptably brittle.)
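
To make the matching problem concrete, here is a minimal sketch of what a 
serialization library sees through reflection.  The class and helper names 
(ParamNameDemo, ctorParamNames, accessorNames) are made up for 
illustration; only the java.lang.reflect calls are real.  Whether the 
constructor parameter names come back as x and y or as synthesized names 
depends on whether the class was compiled with javac -parameters.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ParamNameDemo {
    // Hypothetical example class with a 2-ary constructor and two accessors.
    static final class Point {
        private final int x;
        private final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
        public int x() { return x; }
        public int y() { return y; }
    }

    /** Constructor parameter names as the library would see them. */
    static List<String> ctorParamNames(Class<?> type) {
        Constructor<?> ctor = type.getDeclaredConstructors()[0];
        List<String> names = new ArrayList<>();
        for (Parameter p : ctor.getParameters()) {
            // isNamePresent() is false unless the classfile carries the
            // MethodParameters attribute (e.g. compiled with -parameters);
            // getName() then returns a synthesized name such as "arg0".
            names.add(p.isNamePresent() ? p.getName()
                                        : "<synthesized: " + p.getName() + ">");
        }
        return names;
    }

    /** Names of public, no-arg, non-void instance methods (record-style accessors). */
    static Set<String> accessorNames(Class<?> type) {
        // Sorted set: reflection does not promise declaration order.
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            int mods = m.getModifiers();
            if (Modifier.isPublic(mods) && !Modifier.isStatic(mods)
                    && m.getParameterCount() == 0
                    && m.getReturnType() != void.class) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("ctor params: " + ctorParamNames(Point.class));
        System.out.println("accessors:   " + accessorNames(Point.class));
    }
}
```

The accessor names are always available; the constructor side is the one 
that may come back empty-handed, and two `int` parameters give the types 
nothing to disambiguate.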

This was why I was leaning, in my draft, towards using a matched 
ctor/dtor pair (or factory/dtor pair).  From a practical perspective, 
they can be matched solely by the "shape" of their argument lists, and 
it wouldn't matter what the names are. (It's also nice because they can 
be assumed to form an _embedding-projection pair_, which has nice 
algebraic properties.)  But there are other ways to get there too -- I'm 
just not sure that, without help, we can do it with just ctor + accessors.
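
As a rough illustration of matching by shape: Java has no deconstructors 
today, so the sketch below uses a method returning Object[] as a stand-in 
for the dtor, and checks arity and component types against the annotated 
constructor.  The @Serializer/@Deserializer annotations are declared 
locally here and are hypothetical, as are all the class and method names; 
this is not the design from the draft, just one way a library could do the 
matching without looking at any names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.invoke.MethodType;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.util.Objects;

final class CtorDtorDemo {
    // Hypothetical annotations, defined only for this sketch.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.CONSTRUCTOR)
    @interface Deserializer {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface Serializer {}

    static final class Point {
        private final int x;
        private final int y;

        @Deserializer
        Point(int x, int y) { this.x = x; this.y = y; }

        // Stand-in for a deconstructor: components in constructor order.
        @Serializer
        Object[] deconstruct() { return new Object[] { x, y }; }

        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    /** Projection: object -> component array, via the @Serializer member. */
    static Object[] serialize(Object o) {
        try {
            for (Method m : o.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(Serializer.class)) {
                    m.setAccessible(true);
                    return (Object[]) m.invoke(o);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("no @Serializer member on " + o.getClass());
    }

    /** True if the state has the same arity and (boxed) types as the ctor. */
    static boolean shapeMatches(Constructor<?> c, Object[] state) {
        Class<?>[] params = c.getParameterTypes();
        if (params.length != state.length) return false;
        for (int i = 0; i < params.length; i++) {
            // wrap() maps primitives to wrappers, e.g. int -> Integer.
            Class<?> boxed = MethodType.methodType(params[i]).wrap().returnType();
            if (!boxed.isInstance(state[i])) return false;
        }
        return true;
    }

    /** Embedding: component array -> object, via the @Deserializer ctor. */
    static <T> T deserialize(Class<T> type, Object[] state) {
        try {
            for (Constructor<?> c : type.getDeclaredConstructors()) {
                if (c.isAnnotationPresent(Deserializer.class) && shapeMatches(c, state)) {
                    c.setAccessible(true);
                    return type.cast(c.newInstance(state));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("no matching @Deserializer ctor on " + type);
    }
}
```

The property one would want to hold is that deserialize(Point.class, 
serialize(p)) is equal to p for every p.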

You might say "if the ctor has the @Serializer anno, then reify the 
parameter names as if the class were compiled with javac -parameters."  
This comes dangerously close to trying to introduce language semantics 
via annotations.  And it means that alpha-renaming a parameter name 
would no longer be a serialization-compatible change.  So this seems 
dodgy; I think you need more help there.

On the other hand, more help may be at hand.  The notion that "these 
names are part of the API, please treat them as such" is a concept that 
comes up in a number of credible feature requests (one example is 
keyword-based invocation, as in `new Foo(x: 1, y: 2)`).  If we had this 
concept, then your suggestion would be a possible way to specify the 
serialization API.

I wouldn't want it to be the only one, though.  If the class has a 
suitable ctor/dtor pair, this is more direct and less brittle, and more 
easily type-checked by IDEs.  Since this is all in the serialization 
library, it is free to define the annotations and member search rules in 
any way that it can handle reflectively, so it's possible that there could 
be more than one way to get there.

Your suggestion of using an explicit record declaration as the proxy 
form has already come up in discussion, and in fact, in that case, maybe 
the record is the only thing that needs to be annotated -- and the 
serialization library can look for members in the record to map back and 
forth to the type being serialized.  We're investigating this as well.
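
A small sketch of why the record proxy sidesteps the naming problem, 
assuming a JDK where records are final (16+): Class.getRecordComponents() 
reports component names and types in declaration order, regardless of 
compiler flags.  RecordProxyDemo and describeFields are hypothetical names; 
Point and PointData follow your example below.

```java
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.List;

final class RecordProxyDemo {
    static final class Point {
        private final int x;
        private final int y;

        // The record is the serialized form of Point.
        record PointData(int xIndex, int yIndex) {}

        Point(PointData data) {
            this.x = data.xIndex();
            this.y = data.yIndex();
        }

        PointData data() { return new PointData(x, y); }
    }

    /** Field descriptions ("name:type") in record declaration order. */
    static List<String> describeFields(Class<? extends Record> proxy) {
        List<String> fields = new ArrayList<>();
        for (RecordComponent rc : proxy.getRecordComponents()) {
            fields.add(rc.getName() + ":" + rc.getType().getName());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(describeFields(Point.PointData.class));
    }
}
```

A library could feed this list straight into its schema, and then only 
needs to find the members of Point that accept and produce a PointData.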

The fussy annotations in "Issue 2" are something I would hope are needed 
no more than .01% of the time, because people surely won't want to use 
them.  I can imagine using them to support migration compatibility when 
things are renamed, but I'd still probably rather use format versioning 
for that.

Cheers,
-Brian

On 8/8/2020 1:51 AM, David Ryan wrote:
> I've just joined the mailing list to share some thoughts on using the
> concepts in the "Towards Better Serialization" as the starting point for
> the Java implementation for a serialization library I'm working on (because
> we need more right? [1]). Apologies if this isn't the right place to do it.
>
> The only relevant detail of the serialization library design for this
> discussion is that the concept is to use a binary schema embedded into the
> encoding, with strong typing. It is designed to be interoperable with other
> languages, but Java is the starting point. Reflection is used to describe
> the structure in the schema.
>
> I appreciate that the "Towards Better Serialization" proposal is only a draft
> and not to be taken as a final design, so in a similar way, the following
> are just some experiences, not to be taken as a criticism of the design.
> I'll take the basic Point class as a starting point.
>
>     package com.myproject.chart;
>     public class Point {
>        private final int x;
>        private final int y;
>
>        @Deserializer(version=1)
>        public Point( int x, int y ) {
>           this.x = x;
>           this.y = y;
>        }
>
>        @Serializer
>        public int x() { return x; }
>
>        @Serializer
>        public int y() { return y; }
>      }
>
> The structure in the serialization schema can be derived from the class.
> This could be XML, JSON Schema or something else. I'm using a sort of
> s-expression:
>
>     (structure namespace:"com.myproject.chart" name:"Point" version=1
>         definition:(record fields:[
>              (field name:"x" type:"int32" )
>              (field name:"y" type:"int32" ) ] );
>
> Issue 1: I think the constructor should be the place to derive the
> structure for the schema, and then map those names to the getters.
> Constructor/method argument name inclusion in reflection is currently
> optional. Fallback can be to use the method names x,y, but there's no way
> to work out the order of the constructor. The pattern matching alternative
> to the serializer doesn't help this unless there's a change to require
> names to be available in the compiled classes.
>
> One of the other options discussed is to implement something like the
> following, which is a good way of separating the state from the object.
> Reflecting on the record for names and types would currently work which
> solves issue 1.
>
>     package com.myproject.chart;
>     public class Point {
>        private int x;
>        private int y;
>
>        public record PointData( int xIndex, int yIndex ) {}
>
>        @Deserializer
>        public Point( PointData data ) {
>           this.x = data.xIndex();
>           this.y = data.yIndex();
>        }
>
>        @Serializer
>        public PointData data() {
>           return new PointData( x, y );
>        }
>      }
>
>
> Issue 2:  When performing reflection on Point class and mapping it to the
> serialization schema, it isn't clear if PointData is a member of Point, or
> if PointData is the representation. I could say that because PointData is
> a public record and doesn't have any annotation, it is the Point
> representation, but that doesn't feel right. Another annotation on the
> PointData record, or an option in @Deserializer?
>
> When developing a schema based serialization there's always the question of
> which comes first, the schema or the class. Wanting to be developer
> friendly, I'd like to allow class based design. This is where the light
> touch becomes more difficult. There will most likely be a mismatch between
> serialization type system and Java's type system. It is very easy to end up
> with a lot of embellishments. For example:
>
>     package com.myproject.chart;
>
>     @Schema( namespace="chart", name="point", structureType=Sequence.class )
>     public class Point {
>        private final int x;
>        private final int y;
>
>        @Deserializer(version=1)
>        public Point( @SchemaType( name="x_index", type="uint32",
>                                   optional=false ) int x,
>                      @SchemaType( name="y_index", type="uint32",
>                                   optional=false ) int y ) {
>           this.x = x;
>           this.y = y;
>        }
>
>        @Serializer
>        @SchemaField( name="x_index" )
>        public int x() { return x; }
>
>        @Serializer
>        @SchemaField( name="y_index" )
>        public int y() { return y; }
>      }
>
> Which upon reflection provides the serialization schema:
>
>     (structure namespace:"chart" name:"point" version=1
>         definition:(sequence fields:[
>               (field name:"x_index" type:"uint32" optional:false )
>               (field name:"y_index" type:"uint32" optional:false ) ] );
>
> Issue 3: The above is a very contrived example, but is there to show that
> it doesn't matter if it is JSON, XML, or the esoteric serialization library
> I'm designing, it is hard to predict what is required to map the Java state
> to the serialization state. Another example of this is the Jackson JSON
> annotation set [2]. If, as a library designer, I'm having to add annotations
> all over the class to get the mapping right, the value of having
> @Serializer/@Deserializer is diminished and their purpose will end up
> replicated in different library annotations. I'm not sure where the line
> should be drawn?
>
> I'm still in the design/draft stage of the library, so still a lot of work
> to get it right. However, I thought it was an interesting exercise and has
> helped to influence the design in a positive way. Harder issues are
> interfaces, classes that extend other classes and collections. If there's
> interest I can expand on those at a later time.
>
> Regards,
> David.
>
> [1] https://xkcd.com/927/
> [2]
> https://github.com/FasterXML/jackson-annotations/wiki/Jackson-Annotations