Towards Better Serialization: Who holds the metadata?
Brian Goetz
brian.goetz at oracle.com
Fri Jul 26 14:54:51 UTC 2019
I think you’ve got the right intuition about how this should work. The idea is that we let the programmer say “here’s how you extract my at-rest state”. This is a lower-level service than serialization; serialization is a consumer of this service, but may have to make up some additional information in order to live up to its commitments.
If the serialization framework wants to produce a self-contained stream, then yes, it would have to do something like (1). If all it cared about was lists and maps of tuples, it could strip away all the class information and just do something like (2). There is a range of reproduction fidelities that a serialization framework could aim for, and if it aims for lower fidelity, then it can put less information in the stream. (For example, suppose you have a FooList in your object graph; could a serialization framework substitute a generic List? java.io serialization might not, but yours might, in which case you need only record the “listiness” of the object.) Or, suppose you had a closed-world assumption, where the universe of classes was known. Then you could encode something more compact to identify which class an object came from. There are many degrees of freedom for serialization frameworks; we’re trying to separate the object freeze/thaw part of the story from the encoding / fidelity / evolution part of the story.
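The freeze/thaw half of that split can be sketched with today's Java records (Java 16+) standing in for the proposal's deconstructor/factory pair. This is a minimal illustration under that assumption, not the proposed API: the hypothetical `freeze` extracts the at-rest state as a plain `Object[]` tuple, and the hypothetical `thaw` rebuilds the object through the canonical constructor. Neither method decides anything about class names, versions, or encoding.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.time.LocalDateTime;
import java.util.Arrays;

// Sketch only: records stand in for deconstructor/factory pairs.
// freeze/thaw are hypothetical names; they handle at-rest state, not encoding.
class FreezeThaw {

    record Person(int unid, LocalDateTime birthDate) {}

    // Extract the component values, in declaration order, as an untyped tuple.
    static Object[] freeze(Record r) {
        RecordComponent[] comps = r.getClass().getRecordComponents();
        Object[] state = new Object[comps.length];
        try {
            for (int i = 0; i < comps.length; i++) {
                state[i] = comps[i].getAccessor().invoke(r);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return state;
    }

    // Rebuild an instance by calling the canonical constructor with the tuple.
    static <T extends Record> T thaw(Class<T> type, Object[] state) {
        Class<?>[] paramTypes = Arrays.stream(type.getRecordComponents())
                .map(RecordComponent::getType)
                .toArray(Class<?>[]::new);
        try {
            Constructor<T> ctor = type.getDeclaredConstructor(paramTypes);
            return ctor.newInstance(state);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person p = new Person(5, LocalDateTime.of(2019, 7, 22, 14, 0, 30));
        Object[] state = freeze(p);
        System.out.println(Arrays.toString(state));              // [5, 2019-07-22T14:00:30]
        System.out.println(thaw(Person.class, state).equals(p)); // true
    }
}
```

Because the tuple carries no class name or version, a framework aiming for a self-contained stream would have to add that information itself, while one aiming for lower fidelity could leave it out.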
> On Jul 26, 2019, at 9:17 AM, Reinier Zwitserloot <reinier at zwitserloot.com> wrote:
>
> One detail I missed in Brian's June 2019 edition of the proposal [1] is the
> following:
>
> Where does the metadata go? Who controls it?
>
> As an example, I'll use:
>
> package mypkg;
>
> @Value // generate all args constructor and such.
> public class Person {
>     int unid;
>     LocalDateTime birthDate;
>
>     @Serializer(version = 1)
>     public pattern Person(int unid, LocalDateTime birthDate) {
>         unid = this.unid;
>         birthDate = this.birthDate;
>     }
>
>     @Deserializer(version = 1)
>     public static Person make(int unid, LocalDateTime birthDate) {
>         return new Person(unid, birthDate);
>     }
> }
>
> The metadata consists of 3 things.
>
> [A] The class name (here: 'mypkg.Person')
>
> [B] The version param of the @Serializer annotation (here: 1)
>
> [C] The tuple type (here: <int, java.time.LocalDateTime>)
>
> This metadata is required for any deserializer to know how to interpret the
> data component of the raw stream format (in this case, treat it as first an
> int value and then a LocalDateTime value), as well as to find the proper
> static @Deserializer method that must be invoked with this data in order to
> deserialize it successfully.
>
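A minimal sketch of how the three pieces of metadata could drive that lookup, assuming a hypothetical in-memory registry (`MetadataLookup`, `register`, and `deserialize` are illustrative names, not part of the proposal). Here [A] and [B] select the factory, and [C] is checked against what the factory expects before the data is handed over; the registered lambda stands in for discovering a static `@Deserializer` method.

```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry keyed on metadata; not an API from the proposal.
class MetadataLookup {

    // [A] class name, [B] serializer version, [C] tuple type as class names in order.
    record Metadata(String className, int version, List<String> tupleType) {}

    record Person(int unid, LocalDateTime birthDate) {}

    private record Key(String className, int version) {}

    private record Entry(List<String> tupleType, Function<Object[], Object> factory) {}

    private final Map<Key, Entry> registry = new HashMap<>();

    // Stands in for discovering a static @Deserializer(version = N) method.
    void register(Metadata md, Function<Object[], Object> factory) {
        registry.put(new Key(md.className(), md.version()), new Entry(md.tupleType(), factory));
    }

    // [A] + [B] pick the factory; [C] must match before the tuple is passed on.
    Object deserialize(Metadata md, Object[] data) {
        Entry e = registry.get(new Key(md.className(), md.version()));
        if (e == null) {
            throw new IllegalArgumentException("no deserializer for " + md.className() + " v" + md.version());
        }
        if (!e.tupleType().equals(md.tupleType())) {
            throw new IllegalArgumentException("tuple type mismatch: " + md.tupleType());
        }
        return e.factory().apply(data);
    }

    public static void main(String[] args) {
        List<String> tuple = List.of("int", "java.time.LocalDateTime");
        MetadataLookup lookup = new MetadataLookup();
        lookup.register(new Metadata("mypkg.Person", 1, tuple),
                data -> new Person((Integer) data[0], (LocalDateTime) data[1]));

        Object p = lookup.deserialize(new Metadata("mypkg.Person", 1, tuple),
                new Object[] {5, LocalDateTime.of(2019, 7, 22, 14, 0, 30)});
        System.out.println(p); // Person[unid=5, birthDate=2019-07-22T14:00:30]
    }
}
```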
> There are 3 ways to go here:
>
> Strategy 1: Store the metadata together with the data in whatever stream
> format we end up using for this.
>
> Strategy 2: Do not store the metadata at all; the metadata must be provided
> together with the bytestream.
>
> Strategy 3: Both; make it pluggable. The chosen serializer is handed the
> streamed data plus the metadata and can do whatever it wants with it, and a
> chosen deserializer must provide the streamed data plus the metadata,
> obtained however it wants, to the serialization framework, which will then
> find the right deserializer and apply it.
>
>
>
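Strategy 3 amounts to an interface boundary between the framework and the stream format. A minimal sketch under that reading, with hypothetical names (`StreamFormat`, `Frame`, `Embedded`, `External`), metadata reduced to class name and version for brevity, and a deliberately naive text encoding that assumes values contain no `,` or `|`. The framework always deals in metadata plus values; each format decides whether the metadata travels inside the stream (strategy 1) or must be supplied from outside (strategy 2).

```java
import java.util.List;
import java.util.Optional;

// Hypothetical pluggable-format boundary; names and encoding are illustrative only.
class PluggableFormats {

    // Metadata reduced to [A] class name and [B] version for brevity.
    record Metadata(String className, int version) {}

    // What every format hands back to the framework: metadata plus raw values.
    record Frame(Metadata metadata, List<String> values) {}

    interface StreamFormat {
        String write(Metadata md, List<String> values);

        // 'external' is caller-supplied metadata; formats that embed it may ignore it.
        Frame read(String stream, Optional<Metadata> external);
    }

    // Strategy 1: metadata travels inside the stream. Assumes no '|' or ',' in values.
    static final class Embedded implements StreamFormat {
        public String write(Metadata md, List<String> values) {
            return md.className() + "|" + md.version() + "|" + String.join(",", values);
        }

        public Frame read(String stream, Optional<Metadata> external) {
            String[] parts = stream.split("\\|", 3);
            Metadata md = new Metadata(parts[0], Integer.parseInt(parts[1]));
            return new Frame(md, List.of(parts[2].split(",")));
        }
    }

    // Strategy 2: only values are written; metadata must come from elsewhere.
    static final class External implements StreamFormat {
        public String write(Metadata md, List<String> values) {
            return String.join(",", values);
        }

        public Frame read(String stream, Optional<Metadata> external) {
            Metadata md = external.orElseThrow(
                    () -> new IllegalStateException("metadata must be supplied externally"));
            return new Frame(md, List.of(stream.split(",")));
        }
    }

    public static void main(String[] args) {
        Metadata md = new Metadata("mypkg.Person", 1);
        List<String> values = List.of("5", "20190722T140030");
        for (StreamFormat f : List.<StreamFormat>of(new Embedded(), new External())) {
            String s = f.write(md, values);
            // The reader passes metadata along; Embedded ignores it, External needs it.
            System.out.println(s + " -> " + f.read(s, Optional.of(md)));
        }
    }
}
```

The trade-off shows up in the two `read` methods: the embedded format is self-contained but larger, while the external one is compact but fails unless the caller already knows what it is reading.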
> Serialization via java.io.Serializable uses strategy 1 (metadata-with-data).
> However, many commonly used alternative libraries, such as Gson and
> Jackson, use strategy 2 (metadata-external).
>
> Strategy 1 might look like this (I picked JSON solely to illustrate the
> point; I presume debate about the precise stream format is premature at
> this phase):
>
> {
>   "java-serialization-format-version": 1,
>   "data": {
>     "type": "mypkg.Person",
>     "version": 1,
>     "value": [
>       { "type": "int", "name": "unid", "value": 5 },
>       {
>         "type": "java.time.LocalDateTime",
>         "version": 0,
>         "name": "birthDate",
>         "value": { "type": "String", "name": "spec", "value": "20190722T140030" }
>       }
>     ]
>   }
> }
>
> Here we assume LocalDateTime has a serializer that serializes itself into a
> single string (20190722T140030 in this example).
>
> Strategy 2 might look like this:
>
> {
>   "unid": 5,
>   "birthDate": "20190722T140030"
> }
>
> Oof, that's a lot smaller. It is in fact so small and simple that it doesn't
> even look Java-specific: this could well describe some random REST API. If
> I could somehow tell the Java deserialization system to deserialize this
> straight into my Person class, that would save me an entire step.
>
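The "straight into my Person class" step is roughly what Gson- and Jackson-style binders do: the stream carries only names and values, and the type information comes from the target class the caller names. A minimal sketch, assuming the JSON has already been parsed into a `Map` (no JSON parser is shown), that a record stands in for the proposal's `@Deserializer` factory, and that only `int` and `LocalDateTime` components need converting. `NameBinding`, `bind`, and the `SPEC` date format are hypothetical.

```java
import java.lang.reflect.RecordComponent;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Hypothetical name-based binder; the caller names the target type, so no
// class name or version needs to be in the stream.
class NameBinding {

    record Person(int unid, LocalDateTime birthDate) {}

    // Assumed compact format matching the example value "20190722T140030".
    static final DateTimeFormatter SPEC = DateTimeFormatter.ofPattern("uuuuMMdd'T'HHmmss");

    // Convert a parsed JSON-ish value to the declared component type (two types only).
    static Object convert(Object raw, Class<?> target) {
        if (target == int.class) {
            return ((Number) raw).intValue();
        }
        if (target == LocalDateTime.class) {
            return LocalDateTime.parse((String) raw, SPEC);
        }
        throw new IllegalArgumentException("unsupported component type " + target);
    }

    // Match map keys to record component names, then call the canonical constructor.
    static <T extends Record> T bind(Class<T> type, Map<String, Object> fields) {
        RecordComponent[] comps = type.getRecordComponents();
        Class<?>[] types = new Class<?>[comps.length];
        Object[] values = new Object[comps.length];
        for (int i = 0; i < comps.length; i++) {
            types[i] = comps[i].getType();
            Object raw = fields.get(comps[i].getName());
            if (raw == null) {
                throw new IllegalArgumentException("missing field " + comps[i].getName());
            }
            values[i] = convert(raw, types[i]);
        }
        try {
            return type.getDeclaredConstructor(types).newInstance(values);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stands in for the already-parsed {"unid": 5, "birthDate": "20190722T140030"}.
        Map<String, Object> json = Map.of("unid", 5, "birthDate", "20190722T140030");
        System.out.println(bind(Person.class, json)); // Person[unid=5, birthDate=2019-07-22T14:00:30]
    }
}
```

Keys such as "birth-date", which are not legal Java identifiers, have no component to match in this scheme.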
> Strategy 2, while very enticing in how compact the data becomes and in how
> the serialized format could seamlessly interoperate with many other
> languages, has all sorts of obvious issues. Some food for thought:
>
> How would the deserializer go about turning the bytes in the stream into
> an instance of java.lang.Object[] for the purposes of the proposed
> LinkedList deserializer, _even if_ the deserializer is provided with the
> information "You must deserialize this into java.util.LinkedList"? The
> deserializer must also be provided with the information "The component
> type is LocalDateTime". And what if I have a LinkedList that is a mix of
> integers, other numbers, and LocalDateTimes?
>
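Once the class information is stripped, `5` and `20190722T140030` are just tokens, and a single "component type" cannot describe a list that mixes types. A minimal sketch of the usual workaround, assuming a hypothetical tag prefix on each element (`int:` and `ldt:` are made up here, and only `Integer` and `LocalDateTime` are handled). It works, but only by putting per-element type metadata back into the stream, which is a step back toward strategy 1.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical per-element type tags; only Integer and LocalDateTime are supported.
class TaggedList {

    // Assumed compact format matching the example value "20190722T140030".
    static final DateTimeFormatter SPEC = DateTimeFormatter.ofPattern("uuuuMMdd'T'HHmmss");

    // Prefix each element with a tag naming its type.
    static List<String> encode(List<?> items) {
        List<String> out = new ArrayList<>();
        for (Object o : items) {
            if (o instanceof Integer i) {
                out.add("int:" + i);
            } else if (o instanceof LocalDateTime t) {
                out.add("ldt:" + SPEC.format(t));
            } else {
                throw new IllegalArgumentException("no tag for " + o.getClass().getName());
            }
        }
        return out;
    }

    // Without the tag, the decoder would have to guess each element's type from its text.
    static LinkedList<Object> decode(List<String> encoded) {
        LinkedList<Object> out = new LinkedList<>();
        for (String s : encoded) {
            int colon = s.indexOf(':');
            String tag = s.substring(0, colon);
            String body = s.substring(colon + 1);
            switch (tag) {
                case "int" -> out.add(Integer.parseInt(body));
                case "ldt" -> out.add(LocalDateTime.parse(body, SPEC));
                default -> throw new IllegalArgumentException("unknown tag " + tag);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> mixed = new LinkedList<>(List.of(5, LocalDateTime.of(2019, 7, 22, 14, 0, 30), 7));
        List<String> wire = encode(mixed);
        System.out.println(wire);                       // [int:5, ldt:20190722T140030, int:7]
        System.out.println(decode(wire).equals(mixed)); // true
    }
}
```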
> Given that the goal then becomes deserializing REST API payloads straight
> into Java objects, how do you specify a field name that isn't a valid Java
> parameter name? Something like '@Deserializer static Person
> make(@PropertyName("birth-date") LocalDateTime birthDate)'?
>
>
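For the field-name question, a minimal sketch of how such an annotation could be read, assuming a hypothetical `@PropertyName` placed on record components (Java 16+ allows `ElementType.RECORD_COMPONENT` as an annotation target, and `RecordComponent.getAnnotation` reads it at runtime). The wire name falls back to the component name when no annotation is present; `PropertyNames` and `wireNames` are illustrative names only.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotation and helper; not part of the proposal or any library.
class PropertyNames {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface PropertyName {
        String value();
    }

    // "birth-date" is not a legal Java identifier, so it is supplied via the annotation.
    record Person(int unid, @PropertyName("birth-date") LocalDateTime birthDate) {}

    // Names as they would appear in the stream, in component order.
    static List<String> wireNames(Class<? extends Record> type) {
        List<String> names = new ArrayList<>();
        for (RecordComponent c : type.getRecordComponents()) {
            PropertyName pn = c.getAnnotation(PropertyName.class);
            names.add(pn != null ? pn.value() : c.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(wireNames(Person.class)); // [unid, birth-date]
    }
}
```

A binder would then match incoming keys against these wire names rather than against the raw component names.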
> Are we still exploring which of the 3 strategies to pick, or has a choice
> already been made? If it's still open, how seriously should we attempt to
> find a way to do strategy 2? Alternatively, is part of the goal of this
> proposal to allow, e.g., JSON mappers like Gson to use the new language
> features introduced here (such as deconstructors and the @Serializer
> annotation, along with the pseudo-opens concept to deal with the
> reflective access issue)?
>
> I assume the answers are: "The serialization framework that ships with Java
> is going to focus solely on data intermixed with metadata, but the
> language features would be usable by Gson and similar libraries as well",
> but I thought I'd check.
>
>
> [1] http://cr.openjdk.java.net/~briangoetz/amber/serialization.html
>
> --Reinier Zwitserloot