Implementing Towards Better PEP/Serialization
Remi Forax
forax at univ-mlv.fr
Sun Aug 16 12:58:52 UTC 2020
Hi David,
here is my take on this
https://gist.github.com/forax/85303febe5c6bebda4ec7dd2fb51a9e2
The TLDR is that if a class has a deconstructor (and a reconstructor), that's enough to automatically serialize/deserialize it without all the quirks of the Serialization API.
I prefer your original idea that a record class mirrors the schema so a record instance is conceptually equivalent to a list of pairs of key/values.
This is how it works: the idea is that converting a class to a record is enough to serialize/deserialize it.
To do so, a user should provide a deconstructor (instance -> record) and one or more reconstructors (record -> instance);
you can have more than one reconstructor if several versions (several records) were used previously.
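To make the "several versions" case concrete, here is a minimal sketch. The names LabeledPoint, PointV1, PointV2 and the reconstruct helper are made up for illustration and are not taken from the gist; the helper only stands in for whatever dispatch a marshaller would do. It assumes Java 16 or later (records and instanceof patterns without preview flags).

```java
final class VersionedReconstructors {
    // Hypothetical schema versions: PointV1 is an older form, PointV2 the current one.
    record PointV1(int x, int y) { }
    record PointV2(int x, int y, String label) { }

    static final class LabeledPoint {
        int x;
        int y;
        String label;

        LabeledPoint(int x, int y, String label) {
            this.x = x;
            this.y = y;
            this.label = label;
        }

        // Deconstructor: always produces the current version of the data.
        PointV2 deconstructor() {
            return new PointV2(x, y, label);
        }

        // One reconstructor per record version that may still be encountered.
        static LabeledPoint reconstructor(PointV2 data) {
            return new LabeledPoint(data.x(), data.y(), data.label());
        }

        static LabeledPoint reconstructor(PointV1 data) {
            return new LabeledPoint(data.x(), data.y(), ""); // V1 had no label
        }
    }

    // Stand-in for a marshaller: pick the reconstructor from the record type.
    static LabeledPoint reconstruct(Record data) {
        if (data instanceof PointV2 v2) {
            return LabeledPoint.reconstructor(v2);
        }
        if (data instanceof PointV1 v1) {
            return LabeledPoint.reconstructor(v1);
        }
        throw new IllegalArgumentException("unknown record type " + data.getClass());
    }

    public static void main(String[] args) {
        LabeledPoint fromOld = reconstruct(new PointV1(1, 2));
        LabeledPoint fromNew = reconstruct(new LabeledPoint(3, 4, "a").deconstructor());
        System.out.println(fromOld.x + "," + fromOld.y + "," + fromOld.label.isEmpty());
        System.out.println(fromNew.x + "," + fromNew.y + "," + fromNew.label);
    }
}
```

Old data stays readable because each record type keeps its own reconstructor, while the single deconstructor only ever writes the newest form.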
> - Data not objects: “applications use serialization to persist data, or to
> exchange data with other applications. Not objects; data.” And "We should
> seek to support serialization of data, not objects."
yep, any instance is converted to a record instance (data) before being serialized and vice versa.
>
> - Make serialisation explicit and bring serialization into the object
> model: "If a class has an externally accessible
> construction/deconstruction/access protocol, it should be easy to just say
> 'use that for serialization too' (records do this already.)"
I've used an annotation @Marshall to explicitly indicate that a class uses "data serialization" instead of the classical serialization.
The annotation also specifies the record class used to convert an instance to data and the record classes used to convert data to an instance of the class.
Specifying the record classes also allows the Marshaller to find the deconstructor/reconstructor directly (the record class gives the signature of the method)
without using reflection to list all possible methods. I've chosen that a reconstructor is a static method instead of being a constructor because
it's a little cleaner (for example you )
>
> - Use the "front door" api, or let the developer specify "back door" api
> and secure it: "A class should be able to expose a serialization protocol
> without exposing that as part of the public API."
The front door is the Marshaller, the backdoors are the deconstructor/reconstructor methods.
>
> - Move away from "readObject/writeObject" mechanisms. This encodes the
> form of the data in code instead of a form that can be used for automated
> serialization and schema definitions.
Here readObject/writeObject are implemented on top of the conversion from an instance to data and vice versa, so it can easily be extended to JSON, etc.
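As a rough illustration of why a record is a convenient pivot for other formats, the sketch below turns any record instance into an ordered list of key/value pairs using the standard record reflection API (Class.getRecordComponents). The class name RecordAsPairs and the method toPairs are hypothetical and not part of the gist; a JSON writer could walk the resulting map instead of printing it. It assumes Java 16 or later.

```java
import java.lang.reflect.Method;
import java.lang.reflect.RecordComponent;
import java.util.LinkedHashMap;
import java.util.Map;

final class RecordAsPairs {
    record Point(int x, int y) { }

    // Reads each record component through its accessor, in declaration order.
    static Map<String, Object> toPairs(Record data) {
        Map<String, Object> pairs = new LinkedHashMap<>();
        try {
            for (RecordComponent component : data.getClass().getRecordComponents()) {
                Method accessor = component.getAccessor();
                accessor.setAccessible(true); // the record class itself may not be public
                pairs.put(component.getName(), accessor.invoke(data));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read record " + data.getClass(), e);
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(toPairs(new Point(1, 2))); // prints {x=1, y=2}
    }
}
```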
>
> - Backward compatible. My requirement, not yours. It should be possible to
> put together an implementation in Java 11 at minimum.
Oops, my implementation uses records, so it's not Java 11 compatible. It's Java 14/Java 15.
Here is an example. Let's suppose I have a class MutablePoint
  class MutablePoint {
    int x;
    int y;

    public MutablePoint(int x, int y) {
      this.x = x;
      this.y = y;
    }
  }
If I want to serialize it, I will first add a record containing the data I want to serialize
record Point(int x, int y) implements Serializable { }
then I will add an annotation @Marshall and specify that the record class is used for the marshalling/unmarshalling
@Marshall(deconstruct = Point.class, reconstructs = Point.class)
class MutablePoint {
...
and I will add a deconstructor and a reconstructor to indicate how to convert from a MutablePoint to a Point (and vice versa)
  @Marshall(deconstruct = Point.class, reconstructs = Point.class)
  static class MutablePoint {
    int x;
    int y;

    public MutablePoint(int x, int y) {
      this.x = x;
      this.y = y;
    }

    private Point deconstructor() {
      return new Point(x, y);  // convert to data
    }

    private static MutablePoint reconstructor(Point point) {
      // extract from data, using the record accessors
      return new MutablePoint(point.x(), point.y());
    }
  }
That's all, I can now serialize/deserialize the class MutablePoint
by first creating a Marshaller (the Lookup object is used to find the deconstructor/reconstructor)
var marshaller = Marshaller.of(lookup());
to serialize
marshaller.writeObject(...ObjectOutputStream..., mutablePoint);
to deserialize
var mutablePoint2 = (MutablePoint) marshaller.readObject(... ObjectInputStream ...);
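The same round trip can be sketched with only the JDK, without the Marshaller from the gist. This is an assumption-laden illustration, not the gist's implementation: the class name RoundTripSketch and the write/read helpers are hypothetical, and the deconstructor/reconstructor are called by hand instead of being looked up through the annotation. The point it shows is that MutablePoint itself never implements Serializable; only the record travels through the stream. It assumes Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class RoundTripSketch {
    record Point(int x, int y) implements Serializable { }

    // Not Serializable: only its data (the record) is ever written.
    static final class MutablePoint {
        int x;
        int y;

        MutablePoint(int x, int y) {
            this.x = x;
            this.y = y;
        }

        Point deconstructor() {
            return new Point(x, y);
        }

        static MutablePoint reconstructor(Point point) {
            return new MutablePoint(point.x(), point.y());
        }
    }

    // instance -> record -> bytes
    static byte[] write(MutablePoint point) {
        var bytes = new ByteArrayOutputStream();
        try (var out = new ObjectOutputStream(bytes)) {
            out.writeObject(point.deconstructor());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // bytes -> record -> instance
    static MutablePoint read(byte[] data) {
        try (var in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return MutablePoint.reconstructor((Point) in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MutablePoint copy = read(write(new MutablePoint(1, 2)));
        System.out.println(copy.x + "," + copy.y); // prints 1,2
    }
}
```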
regards,
Rémi
----- Original Message -----
> From: "David Ryan" <david at livemedia.com.au>
> To: "Brian Goetz" <brian.goetz at oracle.com>
> Cc: "amber-dev" <amber-dev at openjdk.java.net>
> Sent: Sunday, 16 August 2020 12:23:42
> Subject: Re: Implementing Towards Better PEP/Serialization
> I'm back a bit sooner than I expected. I've had a busy week digging into
> the concepts and developing a proof of concept. It would be great to get
> some feedback on the direction I'm heading and how it aligns or doesn't
> align with yours. Before I discuss the POC, I wanted to go back over the
> requirements and how I arrived at the current proof of concept. It should
> provide the rationale and give you or anyone else a chance to poke holes in
> it.
>
> Re-stating the summarized requirements based on elements of our discussion
> and the original proposal.
>
> - Data not objects: “applications use serialization to persist data, or to
> exchange data with other applications. Not objects; data.” And "We should
> seek to support serialization of data, not objects."
>
> - Make serialisation explicit and bring serialization into the object
> model: "If a class has an externally accessible
> construction/deconstruction/access protocol, it should be easy to just say
> 'use that for serialization too' (records do this already.)"
>
> - Use the "front door" api, or let the developer specify "back door" api
> and secure it: "A class should be able to expose a serialization protocol
> without exposing that as part of the public API."
>
> - Move away from "readObject/writeObject" mechanisms. This encodes the
> form of the data in code instead of a form that can be used for automated
> serialization and schema definitions.
>
> - Backward compatible. My requirement, not yours. It should be possible to
> put together an implementation in Java 11 at minimum.
>
> Step 1: Define Data.
>
> Based on those requirements and much more detailed discussion in your
> proposal, the pattern matching constructor/destruction pattern was
> proposed. A reason you're drawn to the ctor/dtor mechanism is that it forms
> an embedding-projection pair. Would it be safe to say that based on this
> and reading between the lines, that what we're after is in fact an
> embedding-projection pair between an Object and Data?
>
> e: Data -> Object/Class
> p: Object/Class -> Data
>
> If you accept that as the core requirement, then before we go much further
> we better decide what "Data" means. For the purposes of Java language
> design, "Data" is not the byte stream encoding. As the proposal states, and
> I agree, "the stream format is probably the least interesting part of the
> serialization mechanism". Based on this, "Data" is defined as something
> between the encoding and the object. However, if we use your ctor/dtor
> mechanism as the example, then "Data" is the parameter list tuple. We can
> then make a small leap and say that "Data" in Java is an Object[] of values
> and the associated metadata (types, names and order). Once again, reading
> between the lines of what "better serialization" means, I think it is an
> embedded projection pair of:
>
> e: Object[] -> Object
> (Metadata) (Class)
>
> p: Object -> Object[]
> (Class) (metadata)
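
A minimal sketch of such a pair for a small immutable class, in plain Java. The EpPair interface and the POINT constant are hypothetical names chosen for this illustration; they are not the pep-java API described further down. The only property shown is that embedding the projected values gives back an equivalent object.

```java
import java.util.Arrays;

final class EpPairSketch {
    // Hypothetical interface for illustration only.
    interface EpPair<T> {
        Object[] project(T object);   // p: Object -> Object[]
        T embed(Object[] values);     // e: Object[] -> Object
    }

    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int x() { return x; }
        int y() { return y; }
    }

    // The "metadata" here is implicit: two int values, in the order x, y.
    static final EpPair<Point> POINT = new EpPair<>() {
        @Override
        public Object[] project(Point p) {
            return new Object[] { p.x(), p.y() };
        }

        @Override
        public Point embed(Object[] values) {
            return new Point((Integer) values[0], (Integer) values[1]);
        }
    };

    public static void main(String[] args) {
        Object[] values = POINT.project(new Point(1, 2));
        Point copy = POINT.embed(values);
        System.out.println(Arrays.toString(values)); // prints [1, 2]
        System.out.println(copy.x() + "," + copy.y()); // prints 1,2
    }
}
```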
>
> Note: It's the metadata associated with the Object[] that is really what
> we're after. A serialization protocol could bypass the Object[] in-memory
> representation altogether. The Object[] is just the simplest way to
> represent the tuple in java for the rest of the discussion. The Java
> serialization implementation has the metadata we're talking about
> implemented in the ObjectStreamClass and the ObjectStreamField. So,
> potentially, an aim is to create a better ObjectStreamClass that can be
> used by serialization libraries without that magic it currently contains?
>
> From this embedded projection pair between Object and Object[] a
> serialization library can come along and add the encoding:
>
> Serialization: Object -> Object[] -> encoding
> (class) (metadata) (schema)
>
> Deserialization: encoding -> Object[] -> Object.
> (schema) (metadata) (class)
>
> On this basis, I've changed the title to "Implementing Towards Better PEP
> (Projection-Embedded Pairs)", as that's the key concept that can help
> Serialization. There's another side discussion to be had regarding what if
> any restriction could be placed on the elements of the Object[]. I don't
> think it matters, but a class could project data elements in the array that
> can't be serialized.
>
> Step 2: Possible embedded projection pairs.
>
> Now that I've been shown the embedded projection pair hammer, everything
> looks like a nail. :) So, using the embedded projection pairs between
> Object[] and Object, what mechanisms can be found to implement it using
> front door APIs:
>
> Class specified:
> Constructor:Destructor - An n-arg constructor with n-arg destructor. The
> proposal suggests using this pattern, but it is not available yet.
> Constructor/Accessors - Available, but potentially difficult to match
> parameters from constructor with accessors.
> Setters/Getters - Simple, but requires no-args constructor and
> immutability of objects is where a lot of developers are moving.
> Encapsulated projection - The class has an alternative form and provides
> constructor and accessor for the alternate form. The alternate form
> recursively uses another mechanism listed here. Requires something to
> inform if data is encapsulated or the encapsulation is the data.
>
> Externally specified:
> Encapsulated embedding - An external class extracts and embeds a target
> class, with the target class not having defined a direct embedded
> projection pair.
> Intermediate ep-pair - A third class that provides both projection and
> embedding functions between two other classes.
>
> There are variations on the above with factory classes and facades etc, but
> they can generally be fit into those categories.
>
> Step 3: Solve the Constructor/Accessors parameter matching
>
>>> As I mentioned previously, I've hit a bit of a road-block with my design
>>> as I'll need a fallback solution for users on Java 11 (current target
>>> version). An annotation on the ctor or its parameters is the likely
>>> solution:
>
>> Yeah, that’s ugly but doable. Your users won’t like you.
>
> This comment rolled around in my head for a little while, so I looked
> closer at the problem. In many cases the classes we're talking about that
> have immutable fields have the following form:
>
> public class Point {
> private final int x;
> private final int y;
>
> public Point( int x, int y ) {
> this.x = x;
> this.y = y;
> }
>
> public int x() { return x; }
> public int y() { return y; }
> }
>
> It is pretty clear from our perspective that the constructor parameters
> match up with x,y fields and x,y accessors. However, without the names
> available in the class, reflection doesn't help. If we can prove that
> constructor parameters are invariant before being written to the field, we
> can safely match the constructor to the fields/accessors. So doing some
> deep reflection, we can implement a really simple check for invariance
> by finding the following patterns and extracting the parameter and field id.
>
> 4: aload_0
> 5: iload_1
> 6: putfield #14 // Field x:I
>
> This can then be matched up with the accessor:
>
> 0: aload_0
> 1: getfield #14 // Field x:I
> 4: ireturn
>
> Based on the above, the class can have its "Data" made available with
> no additional annotations. I'm sure there's plenty of research and
> implementations for testing for parameter invariance that could be applied
> here.
>
> With a single constructor we don't *need* anything else to say the data can
> be serialised. This bypasses requirement 2, "make serialization explicit",
> but an annotation could be added.
>
> Step 4. The PEP Proof of Concept (because PEP sounds better than EPP)
>
> https://github.com/litterat/pep-java
>
> The proof of concept is designed (still being implemented) to provide five
> of the six mechanisms (obviously dtor is missing) for Object to Object[]
> embedded projection pairs. It also implements the simple check for
> invariance based on above. The general usage being:
>
> // Create an instance object to be projected.
> Point p1 = new Point(1,2);
>
> // Create a context and a descriptor for the target class.
> PepContext context = new PepContext();
> PepClassDescriptor pointDescriptor = context.getDescriptor(Point.class);
>
> // Extract the values to an array
> Object[] values = pointDescriptor.project(p1);
>
> // Create the object from the values
> Point p2 = pointDescriptor.embed(values);
>
>
> The PepClassDescriptor is logically equivalent to the ObjectStreamClass of
> Java serialization. The PepContext is equivalent to the ObjectStreamClass
> static cache. I've kept it separated so there can be different data
> projections for a class based on the type of communications encoding being
> used.
>
> The PepClassDescriptor currently includes the project and embed functions,
> but they are logically different things and probably should be separate
> implementations. As mentioned previously, I'll likely just use the meta
> data as part of the serialization implementation and re-implement without
> the intermediary Object[] later.
>
> The documentation provided in the project README is currently the design
> document, so if anyone has time/interest to read that I'd be interested in
> feedback.
>
> Step 5. Implementation
>
> There's still plenty of work to be done on the proof of concept, but the
> general idea feels like it will work well being separate to the
> serialization library itself. I still think there will be a need for the
> serialization library to add additional annotations which are encoding
> specific, but, if the language can provide the "Data" metadata, half the
> job is done.
>
> The implementation syntax is likely to change a bit as I get to understand
> how the library interacts with the actual serialization library. I can also
> see some or all the concepts being part of the platform eventually. A week
> ago I was skeptical that a useful separation could be achieved. I can
> envisage a reflection api something like:
>
> DataDescriptor data = object.getClass().getDataDescriptor();
> DataFields[] fields = data.fields();
>
> Anyway, back to the implementation. Thanks for the discussion and direction
> so far, it has clearly helped.
>
> Regards,
> David.