Towards Better Serialization: Who holds the metadata?

Fri Jul 26 13:17:30 UTC 2019

One detail I found missing from Brian's June 2019 edition of the proposal [1] is the
following:

Where does the metadata go? Who controls it?

As an example, I'll use:

package mypkg;

@Value // generate all args constructor and such.
public class Person {
    int unid;
    LocalDateTime birthDate;

    @Serializer(version = 1)
    public pattern Person(int unid, LocalDateTime birthDate) {
        unid = this.unid;
        birthDate = this.birthDate;
    }

    @Deserializer(version = 1)
    public static Person make(int unid, LocalDateTime birthDate) {
        return new Person(unid, birthDate);
    }
}

The metadata consists of 3 things.

[A] The class name (here: 'mypkg.Person')

[B] The version param of the @Serializer annotation (here: 1)

[C] The tupletype. Here: <int, java.time.LocalDateTime>.

Any deserializer needs this metadata for two things: to interpret the data
component of the raw streamformat (in this case, first an int value and then
a LocalDateTime value), and to find the static @Deserializer method that
must be invoked with that data in order to reconstruct the object.
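
To make the role of the three items concrete, here is a rough sketch of how a
framework might use them to locate and invoke the right method. The Metadata
class, the findDeserializer and deserialize helpers, and the runtime-retained
@Deserializer annotation are hypothetical stand-ins, not anything the proposal
defines, and Person is written without Lombok or the pattern syntax so that it
compiles today.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;

final class MetadataLookupSketch {
    /** Hypothetical stand-in for the proposal's @Deserializer annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Deserializer { int version(); }

    /** The three pieces of metadata: [A] class name, [B] version, [C] tuple type. */
    static final class Metadata {
        final String className;
        final int version;
        final List<Class<?>> tupleType;

        Metadata(String className, int version, List<Class<?>> tupleType) {
            this.className = className;
            this.version = version;
            this.tupleType = tupleType;
        }
    }

    /** Plain-Java version of the example class (no Lombok, no pattern). */
    static final class Person {
        final int unid;
        final LocalDateTime birthDate;

        Person(int unid, LocalDateTime birthDate) {
            this.unid = unid;
            this.birthDate = birthDate;
        }

        @Deserializer(version = 1)
        public static Person make(int unid, LocalDateTime birthDate) {
            return new Person(unid, birthDate);
        }
    }

    /** Finds the static @Deserializer method whose version and parameter types match. */
    static Method findDeserializer(Metadata md) {
        Class<?> type;
        try {
            type = Class.forName(md.className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class: " + md.className, e);
        }
        for (Method m : type.getDeclaredMethods()) {
            Deserializer d = m.getAnnotation(Deserializer.class);
            if (d != null
                    && Modifier.isStatic(m.getModifiers())
                    && d.version() == md.version
                    && Arrays.asList(m.getParameterTypes()).equals(md.tupleType)) {
                return m;
            }
        }
        throw new IllegalArgumentException("No matching @Deserializer in " + md.className);
    }

    /** Invokes the matching deserializer with the already-decoded component values. */
    static Object deserialize(Metadata md, Object... values) {
        try {
            return findDeserializer(md).invoke(null, values);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Deserializer failed", e);
        }
    }

    public static void main(String[] args) {
        Metadata md = new Metadata(Person.class.getName(), 1,
                Arrays.<Class<?>>asList(int.class, LocalDateTime.class));
        Person p = (Person) deserialize(md, 5, LocalDateTime.of(2019, 7, 22, 14, 0, 30));
        System.out.println(p.unid + " " + p.birthDate);
    }
}
```

Note that nothing in this sketch says where the Metadata instance comes from;
that is exactly the open question.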

There are 3 ways to go here:

Strategy 1: Store the metadata together with the data in whatever
streamformat we end up using for this.

Strategy 2: Do not store the metadata at all; the metadata must be supplied
out of band, alongside the bytestream, at deserialization time.

Strategy 3: Both; make it pluggable: The chosen serializer is handed the
streamed data + the metadata and can do whatever it wants with it, and a
chosen deserializer must provide the streamed data + the metadata, obtained
however it wants, to the serialization framework which will then find the
right deserializer and apply it.
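
A minimal sketch of what the pluggable variant could look like, assuming a toy
text encoding. The StreamFormat interface, the two implementations, and the
'|' and ',' separators are all invented for illustration; the tuple type is
left out of Metadata to keep it short.

```java
import java.util.Arrays;
import java.util.List;

final class PluggableFormatSketch {
    /** [A] class name and [B] version; [C] the tuple type is omitted for brevity. */
    static final class Metadata {
        final String className;
        final int version;

        Metadata(String className, int version) {
            this.className = className;
            this.version = version;
        }
    }

    /** What the reading side must hand to the framework: metadata plus data. */
    static final class Decoded {
        final Metadata metadata;
        final List<String> values;

        Decoded(Metadata metadata, List<String> values) {
            this.metadata = metadata;
            this.values = values;
        }
    }

    /** Hypothetical plug-in point: the format decides what happens to the metadata. */
    interface StreamFormat {
        String write(Metadata metadata, List<String> values);
        Decoded read(String stream);
    }

    /** Strategy 1: the metadata travels inside the stream. */
    static final class InlineFormat implements StreamFormat {
        @Override
        public String write(Metadata md, List<String> values) {
            return md.className + "|" + md.version + "|" + String.join(",", values);
        }

        @Override
        public Decoded read(String stream) {
            String[] parts = stream.split("\\|", 3);
            Metadata md = new Metadata(parts[0], Integer.parseInt(parts[1]));
            return new Decoded(md, Arrays.asList(parts[2].split(",")));
        }
    }

    /** Strategy 2: the stream holds data only; the metadata is known out of band. */
    static final class ExternalFormat implements StreamFormat {
        private final Metadata expected;

        ExternalFormat(Metadata expected) {
            this.expected = expected;
        }

        @Override
        public String write(Metadata md, List<String> values) {
            return String.join(",", values); // metadata is deliberately dropped
        }

        @Override
        public Decoded read(String stream) {
            return new Decoded(expected, Arrays.asList(stream.split(",")));
        }
    }

    public static void main(String[] args) {
        Metadata md = new Metadata("mypkg.Person", 1);
        List<String> values = Arrays.asList("5", "20190722T140030");
        System.out.println(new InlineFormat().write(md, values));
        System.out.println(new ExternalFormat(md).write(md, values));
    }
}
```

Either implementation hands the framework the same Decoded pair, so the step
that finds and applies the @Deserializer would not need to know which strategy
produced it.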

The java.io.Serializable serialization uses strategy 1
(metadata-with-data). However, a lot of commonly used alternative
libraries, such as GSON and Jackson, use strategy 2 (metadata-external).

Strategy 1 might look like this (I picked JSON solely to illustrate the
point; I presume debate about the precise stream format is premature at this
phase):

{
    "java-serialization-format-version": 1,
    "data": {
        "type": "mypkg.Person",
        "version": 1,
        "value": [
            { "type": "int", "name": "unid", "value": 5 },
            {
                "type": "java.time.LocalDateTime",
                "version": 0,
                "name": "birthDate",
                "value": { "type": "String", "name": "spec", "value": "20190722T140030" }
            }
        ]
    }
}

Here we assume LocalDateTime has a serializer that serializes itself into a
single string (20190722T140030 in this example).

Strategy 2 might look like:

{
    "unid": 5,
    "birthDate": "20190722T140030"
}

Oof, that's a lot smaller. It is in fact so small and simple that it doesn't
even look like anything Java-specific: this could well describe some random
REST API. If somehow I can tell the Java deserialization system to
deserialize this straight into my Person class, that saves me an entire
step.

Strategy 2, whilst very enticing in how compact the data becomes and in how
the serialized format looks like it can seamlessly interop with many other
languages, has all sorts of obvious issues. Some food for thought:

How would the deserializer do the job of turning the bytes in the stream
into an instance of java.lang.Object[] for the purposes of the proposed
LinkedList deserializer, _even if_ the deserializer is provided with the
information: "You must deserialize this into java.util.LinkedList"? The
deserializer must also be provided the information: "The component type is
LocalDateTime". What if I have a LinkedList that is a mix of integers,
other numbers, and LocalDateTimes?
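
To illustrate the mixed-list problem: a raw token such as "5" or
"20190722T140030" cannot be decoded without knowing its type, so a mixed list
seems to need a type tag per element, which is strategy 1 again in miniature.
A small sketch, where the tag strings, the decode helper, and the compact
timestamp pattern are assumptions made for this example:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

final class MixedListSketch {
    /** Assumed compact timestamp format, matching the 20190722T140030 sample. */
    static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("uuuuMMdd'T'HHmmss");

    /** Decodes one raw token; without the type tag the token is ambiguous. */
    static Object decode(String typeTag, String raw) {
        switch (typeTag) {
            case "int":
                return Integer.valueOf(raw);
            case "java.time.LocalDateTime":
                return LocalDateTime.parse(raw, COMPACT);
            default:
                throw new IllegalArgumentException("Unknown type tag: " + typeTag);
        }
    }

    /** Decodes a mixed list; each entry is a {typeTag, rawValue} pair. */
    static List<Object> decodeAll(String[][] tagged) {
        List<Object> out = new ArrayList<>();
        for (String[] entry : tagged) {
            out.add(decode(entry[0], entry[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] tagged = {
            {"int", "5"},
            {"java.time.LocalDateTime", "20190722T140030"},
        };
        System.out.println(decodeAll(tagged));
    }
}
```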

Given that the goal then becomes that you can deserialize data from
arbitrary REST APIs straight into Java objects, how do you specify a
fieldname that isn't a valid Java parameter name? '@Deserializer static
Person make(@PropertyName("birth-date") LocalDateTime birthDate)'?
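
For what it's worth, reading such an annotation back is straightforward with
today's reflection API. A sketch, assuming a runtime-retained @PropertyName
parameter annotation (hypothetical, as are the propertyNames and makeMethod
helpers):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

final class PropertyNameSketch {
    /** Hypothetical annotation mapping a parameter to an external field name. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface PropertyName { String value(); }

    static final class Person {
        final int unid;
        final LocalDateTime birthDate;

        Person(int unid, LocalDateTime birthDate) {
            this.unid = unid;
            this.birthDate = birthDate;
        }

        public static Person make(int unid,
                @PropertyName("birth-date") LocalDateTime birthDate) {
            return new Person(unid, birthDate);
        }
    }

    /**
     * Returns the external field name for each parameter. The fallback to the
     * Java parameter name is only meaningful when compiled with -parameters;
     * otherwise the JVM reports synthetic names such as arg0.
     */
    static List<String> propertyNames(Method m) {
        List<String> names = new ArrayList<>();
        for (Parameter p : m.getParameters()) {
            PropertyName pn = p.getAnnotation(PropertyName.class);
            names.add(pn != null ? pn.value() : p.getName());
        }
        return names;
    }

    /** Looks up Person.make without exposing the checked exception. */
    static Method makeMethod() {
        try {
            return Person.class.getDeclaredMethod("make", int.class, LocalDateTime.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(propertyNames(makeMethod()));
    }
}
```

The unannotated unid parameter shows the other half of the problem: without
-parameters its name is not available at runtime at all.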

Are we still exploring which of the 3 strategies to pick, or has a choice
already been made? If no choice has been made yet, how seriously should we
attempt to find a way to do strategy #2? Alternatively, is part of the goal
of this proposal to allow e.g. JSON mappers like GSON to use the new
language features introduced here (such as deconstructors and the
@Serializer annotation, along with the pseudo-opens concept to deal with the
reflective access issue)?

I assume the answers are: "The serialization framework that ships with Java
is going to be focused solely on data-intermixed-with-metadata, but the
language features would be usable by GSON and similar libraries as well",
but I thought I'd check.

[1] http://cr.openjdk.java.net/~briangoetz/amber/serialization.html

 --Reinier Zwitserloot