Implementing Towards Better PEP/Serialization
David Ryan
david at livemedia.com.au
Sat Mar 20 12:45:49 UTC 2021
I've spent the last while refining the serialization model and looking at
how data maps to/from Java. The model consists of:
Record - Product type with fields (required or optional) of specific types.
Union - Sum type or tagged union.
Atom - Any atomic value that has one or more representations.
Array - Repeating element.
Annotations - Additional data.
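To make the model a little more concrete, here is a rough sketch of how it
could itself be written down in Java 17. Every name in it (DataType, Field,
RecordType, UnionType, AtomType, ArrayType, ModelSketch) is hypothetical and
chosen only for illustration; it is not the library's actual API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical meta-model: each kind of data type is one alternative of a sealed interface.
sealed interface DataType permits RecordType, UnionType, AtomType, ArrayType {
    // Annotations: additional data that can be attached to any type.
    Map<String, String> annotations();
}

// A record field has a name, a specific type, and is either required or optional.
record Field(String name, DataType type, boolean required) {}

// Record: product type made of fields.
record RecordType(String name, List<Field> fields,
                  Map<String, String> annotations) implements DataType {}

// Union: sum type / tagged union over a set of alternatives.
record UnionType(String name, List<DataType> alternatives,
                 Map<String, String> annotations) implements DataType {}

// Atom: atomic value with one or more representations.
record AtomType(String name, List<String> representations,
                Map<String, String> annotations) implements DataType {}

// Array: repeating element.
record ArrayType(DataType element,
                 Map<String, String> annotations) implements DataType {}

final class ModelSketch {
    // Names the kind of a type; the sealed interface guarantees these are the only four.
    static String kind(DataType t) {
        if (t instanceof RecordType) return "record";
        if (t instanceof UnionType) return "union";
        if (t instanceof AtomType) return "atom";
        return "array";
    }
}
```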
I've been reading the JEP tea-leaves and can see a rather interesting
pattern emerging:
JEP395 - Records (maps directly to data product types)
JEP397 - Sealed Classes (maps directly to tagged union/sum types)
JEP401/402 - Primitive objects (maps directly to atom types)
JEP8261099 - Frozen arrays (provides equivalent of final array values)
Combining these concepts provides a really strong data serialization basis.
It's like you've had a plan around this all along?
While investigating how tagged unions map between data and Java, I've come
up with two interesting problems. Imagine a data spec with a schema like
(please ignore syntax):
some_record: record( some_field:union( string | int ) )
The idea here is that some_field can be either a string or an int. There
are a couple of ways this could be mapped to Java. Option 1, as separate
fields:
public record SomeRecord(int someFieldInt, String someFieldString) {...}
Option 2, using a sealed type:
public sealed interface SomeFieldType permits SomeFieldInt, SomeFieldString {...}
public record SomeFieldInt( int someField ) implements SomeFieldType {...}
public record SomeFieldString( String someField ) implements SomeFieldType {...}
public record SomeRecord( SomeFieldType someField ) { ... }
*Question1*: Any other ways this could be mapped?
*Question2*: Are there any plans/thoughts/ideas to add a short-hand tagged
union type to Java? So I can do:
public record SomeRecord( int|String someField ) {...}
I can sort of see that pattern matching might provide a path to supporting
this through the language.
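As a minimal sketch of what that looks like today, the sealed mapping of
some_field can already be consumed with type patterns for instanceof
(Java 17 assumed). UnionSketch and describe are hypothetical names used only
for this illustration.

```java
// Sealed mapping of union( string | int ) for some_field.
sealed interface SomeFieldType permits SomeFieldInt, SomeFieldString {}
record SomeFieldInt(int someField) implements SomeFieldType {}
record SomeFieldString(String someField) implements SomeFieldType {}
record SomeRecord(SomeFieldType someField) {}

final class UnionSketch {
    // Hypothetical helper: reports which alternative of the union is present.
    static String describe(SomeRecord r) {
        SomeFieldType f = r.someField();
        if (f instanceof SomeFieldInt i) {
            return "int:" + i.someField();
        } else if (f instanceof SomeFieldString s) {
            return "string:" + s.someField();
        }
        // Only reachable when the field is null: the sealed interface
        // permits no other alternatives.
        throw new IllegalStateException("unexpected alternative: " + f);
    }
}
```

The tag of the union is carried by the record type itself, so no separate
discriminator field is needed on the Java side.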
The second problem is that Java base classes that are instantiable map to
both records and tagged unions on the data side.
*Question3*: I was wondering if that was one of the reasons why records are
final?
Take the following example:
class Vehicle {
    private final String make;
    private final String model;
    private final int year;
    ...
}
class Car extends Vehicle {
    private final int horsePower;
    ...
}
class Truck extends Vehicle {
    private final int numberOfAxles;
    ...
}
If I were to remodel this using records and a sealed interface, I'd need to
repeat the common fields in each record and separate the tagged union
Vehicle from the record type. Something like:
public sealed interface Vehicle permits GenericVehicle, Car, Truck {...}
public record GenericVehicle(String make, String model, int year) implements Vehicle {...}
public record Car(String make, String model, int year, int horsePower) implements Vehicle {...}
public record Truck(String make, String model, int year, int numberOfAxles) implements Vehicle {...}
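The repetition is confined to the record headers. A minimal sketch (Java 17
assumed) of one way to keep the common fields usable through the union type:
declare their accessors on the sealed interface, and each record's generated
accessors implement them. VehicleSketch, label and detail are hypothetical
names used only for this illustration.

```java
// The union type: declares the accessors shared by every alternative.
sealed interface Vehicle permits GenericVehicle, Car, Truck {
    String make();
    String model();
    int year();
}

// Each record repeats the common components; its generated accessors
// satisfy the interface methods above.
record GenericVehicle(String make, String model, int year) implements Vehicle {}
record Car(String make, String model, int year, int horsePower) implements Vehicle {}
record Truck(String make, String model, int year, int numberOfAxles) implements Vehicle {}

final class VehicleSketch {
    // Works for any alternative without knowing which one it is.
    static String label(Vehicle v) {
        return v.year() + " " + v.make() + " " + v.model();
    }

    // Variant-specific fields still need a test on the concrete record type.
    static String detail(Vehicle v) {
        if (v instanceof Car c) return c.horsePower() + " hp";
        if (v instanceof Truck t) return t.numberOfAxles() + " axles";
        return "no extra fields";
    }
}
```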
This now maps nicely into a data grammar/schema whereas base classes would
require an additional structural type on the data side:
vehicle: union( genericVehicle, car, truck );
genericVehicle: record( string make, string model, int year );
car: record( string make, string model, int year, int horsePower );
truck: record( string make, string model, int year, int numberOfAxles );
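The correspondence is regular enough that the schema lines can be derived
from the Java types by reflection. A rough sketch, assuming Java 17;
SchemaSketch and its methods are hypothetical names and the output simply
mimics the ad-hoc schema syntax above. Note that the order of the array
returned by getPermittedSubclasses() is not specified, so the order of the
union alternatives should not be relied upon.

```java
import java.lang.reflect.RecordComponent;
import java.util.StringJoiner;

sealed interface Vehicle permits GenericVehicle, Car, Truck {}
record GenericVehicle(String make, String model, int year) implements Vehicle {}
record Car(String make, String model, int year, int horsePower) implements Vehicle {}
record Truck(String make, String model, int year, int numberOfAxles) implements Vehicle {}

final class SchemaSketch {
    // "GenericVehicle" -> "genericVehicle", "String" -> "string".
    static String lowerFirst(String s) {
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    // Sealed interface -> union line. Assumes sealedType.isSealed();
    // getPermittedSubclasses() returns null for a non-sealed type.
    static String unionLine(Class<?> sealedType) {
        StringJoiner alts = new StringJoiner(", ",
                lowerFirst(sealedType.getSimpleName()) + ": union( ", " );");
        for (Class<?> alt : sealedType.getPermittedSubclasses()) {
            alts.add(lowerFirst(alt.getSimpleName()));
        }
        return alts.toString();
    }

    // Record -> record line. Assumes recordType.isRecord(); components are
    // returned in the order they are declared in the record header.
    static String recordLine(Class<?> recordType) {
        StringJoiner fields = new StringJoiner(", ",
                lowerFirst(recordType.getSimpleName()) + ": record( ", " );");
        for (RecordComponent c : recordType.getRecordComponents()) {
            fields.add(lowerFirst(c.getType().getSimpleName()) + " " + c.getName());
        }
        return fields.toString();
    }
}
```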
The upshot of this is that I'm currently contemplating not allowing
anything but abstract base classes to be serializable in the Java PEP/bind
library I've been working on. This is obviously out of scope for amber, but
it is an interesting finding when looking at how to map between Java/OO and
data schemas.
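One way such a restriction could be checked is sketched below. This is only
an illustration of the rule, not how the library does it: BindRules,
hasOnlyAbstractBases and the four example classes are all hypothetical names.

```java
import java.lang.reflect.Modifier;

final class BindRules {
    // Hypothetical rule: every superclass between the type and Object must be
    // abstract, so no instantiable base class is both a record and a union.
    static boolean hasOnlyAbstractBases(Class<?> type) {
        for (Class<?> base = type.getSuperclass();
             base != null && base != Object.class;
             base = base.getSuperclass()) {
            if (!Modifier.isAbstract(base.getModifiers())) {
                return false;
            }
        }
        return true;
    }
}

// Example shapes: an instantiable base class versus an abstract one.
class ConcreteBase {}
class ConcreteLeaf extends ConcreteBase {}
abstract class AbstractBase {}
final class AbstractLeaf extends AbstractBase {}
```

Records pass this check as well, since their only superclass,
java.lang.Record, is abstract.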
One other conceptual mismatch between Java and data that I've found is
"required vs optional" fields for records.
*Question4*: Given the move to primitive objects, I was also wondering if
Optional will also become primitive at some point?
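For context, a minimal sketch of how the required/optional distinction can
be expressed in a record today, using java.util.Optional for the optional
component and a compact constructor to enforce the required one. Person and
its components are hypothetical, chosen only for illustration.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical record: `name` is required, `nickname` is optional on the data side.
record Person(String name, Optional<String> nickname) {
    Person {
        // Required field: reject null outright.
        Objects.requireNonNull(name, "name is required");
        // Optional field: normalise a null component to "absent" so that
        // callers never see a null Optional.
        nickname = (nickname == null) ? Optional.empty() : nickname;
    }

    // Convenience constructor for when the optional field is omitted.
    Person(String name) {
        this(name, Optional.empty());
    }
}
```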
Regards,
David.
More information about the amber-dev mailing list