Implementing Towards Better PEP/Serialization

Thu Nov 19 13:34:44 UTC 2020

There are, of course, about a million considerations when it comes to
serialization in this manner, and I'm sure Brian can list dozens of
them off the top of his head, but taking this one thing in
isolation...  If I were looking for a means to deserialize a protocol
with requirements and constraints which are similar to Java
serialization in conjunction with an interface + constructor
methodology, I would use a modified OIS.GetField strategy, passing in
a GetField-like object to the constructor(s) rather than an *Input
object of some sort.  So, rather than e.g. `input.readFloat()` it
would be e.g. `gf.getFloatField("fieldName")`.  The GetField
implementation would look at the caller to see what constructor was
called (which deals cleanly with compatibility issues relating to a
mismatched class hierarchy on the reading side, missing/added classes,
and missing/added fields).
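
To make that concrete, here is a rough sketch of the constructor side
only. The names `FieldReader` and `MapFieldReader` are made up for
illustration, and the map-backed reader is just a stand-in for something
backed by the decoded stream. The caller inspection that works out which
constructor is running is not shown.

```java
import java.util.Map;

public class GetFieldSketch {
    /** Hypothetical GetField-like view of the fields found in the stream. */
    public interface FieldReader {
        float getFloatField(String name);
    }

    /** Map-backed stand-in; a real reader would sit on top of the stream. */
    public static final class MapFieldReader implements FieldReader {
        private final Map<String, Float> values;

        public MapFieldReader(Map<String, Float> values) {
            this.values = values;
        }

        @Override
        public float getFloatField(String name) {
            // A field absent from the stream falls back to the primitive default.
            Float v = values.get(name);
            return v == null ? 0.0f : v;
        }
    }

    public static final class Point {
        private final float x;
        private final float y;

        // Deserializing constructor: fields are looked up by name,
        // so the order in which they were written does not matter.
        public Point(FieldReader gf) {
            this.x = gf.getFloatField("x");
            this.y = gf.getFloatField("y");
        }

        public float x() { return x; }
        public float y() { return y; }
    }

    public static void main(String[] args) {
        Point p = new Point(new MapFieldReader(Map.of("y", 2.0f, "x", 1.0f)));
        System.out.println(p.x() + "," + p.y()); // prints 1.0,2.0
    }
}
```

Because the constructor asks for fields by name, a field that the writer
did not send simply takes a default instead of shifting every later read.
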

I considered adding such a mechanism to JBoss Marshalling (our
serialization implementation), but making this interoperate with
standard serialization is quite difficult (not impossible, but I
deemed it to not be worth the effort involved).

On Thu, Nov 19, 2020 at 6:30 AM David Ryan <david at livemedia.com.au> wrote:
>
> After a few months of distractions and finishing up my last job, I've given
> myself a few months to focus on serialization and a few ideas following on
> from the previous messages and a few of my own ideas. I'll keep my comments
> here to what I think is relevant implementation details only.
>
> Before I was distracted, I managed to implement a prototype library
> litterat-pep. I also took the GSON parser and tokenizer and implemented a
> JSON serializer front end. Interestingly, with the litterat-pep library
> abstracting away the object projection/embedding, the JSON implementation
> that ties PEP to the tokenizer became one file:
>
> https://github.com/litterat/litterat-json/blob/master/src/main/java/io/litterat/json/JsonMapper.java
>
> I've found this quite encouraging, and it is a good proof point that things
> are heading in the right direction. However, I've hit an implementation
> detail on which it would be interesting to get an opinion.
>
> Take a simple immutable point as a starting point:
>
> class Point {
>    private final float x;
>    private final float y;
>    public Point(float x, float y) { this.x=x; this.y=y; }
>    public float x() { return x; }
>    public float y() { return y; }
> }
>
> There are two ways to deserialize an object like this. Using the public
> interface and a constructor method handle requires allocating an array and
> autoboxing the float values. This is the way that JsonMapper is currently
> implemented (lines 194-211 in JsonMapper). Pseudo code looks something like:
>
>   Object[] values = new Object[2];
>   values[0] = input.readFloat();
>   values[1] = input.readFloat();
>   return constructor.invoke( values );
>
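
For reference, a runnable version of that pseudo code might look like
the following. This is only a sketch: it assumes a `DataInput` as the
input source, and the class name and the `encode`/`decode` helpers are
invented for the example (they are not litterat-pep API).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class BoxedConstructorSketch {
    public static final class Point {
        private final float x;
        private final float y;
        public Point(float x, float y) { this.x = x; this.y = y; }
        public float x() { return x; }
        public float y() { return y; }
    }

    // Constructor handle of type (float, float)Point, looked up once.
    private static final MethodHandle CTOR;
    static {
        try {
            CTOR = MethodHandles.lookup().findConstructor(Point.class,
                    MethodType.methodType(void.class, float.class, float.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static Point read(DataInput input) throws Throwable {
        Object[] values = new Object[2]; // one array allocation per object
        values[0] = input.readFloat();   // autoboxed to Float
        values[1] = input.readFloat();   // autoboxed to Float
        return (Point) CTOR.invokeWithArguments(values);
    }

    // Helper for the example: writes x then y as two big-endian floats.
    public static byte[] encode(float x, float y) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeFloat(x);
            out.writeFloat(y);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static Point decode(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Point p = decode(encode(1.5f, 2.5f));
        System.out.println(p.x() + "," + p.y()); // prints 1.5,2.5
    }
}
```
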
> However, the internal Java serialization can construct and set fields
> something like:
>
>   // Bypass the constructor and create an empty object.
>   Object result = Point.newInstance();
>   // Call a method handle that sets the field directly. Combining the
>   // readFloat and fieldSet method handles creates one handle that does both.
>   fieldXSet(result, input.readFloat());
>   fieldYSet(result, input.readFloat());
>   return result;
>
> Using the method handles and field setters, the internal library has
> bypassed two autoboxed Floats and the creation/destruction of the Object[].
> For a JSON parser, I wouldn't be too worried about this as
> text parsing is already expensive and the overhead wouldn't add much,
> however, in a binary serialization this overhead could add up. In older
> Java versions I've observed this type of autoboxing in serialization put
> huge pressure on the garbage collector.
>
> Before getting stuck on this issue: should I care? Will later versions of
> the Java compiler eventually see that the float values don't need to be
> autoboxed and that the Object[] could be put on the stack?
>
> If this PEP library is to be nice, that is, use the public constructor
> rather than set private final fields directly, I'm left wondering if
> there's any way to improve the performance of the first solution without
> waiting for the optimizer to kick in?
>
> The only potential solution I've thought of so far is to get the front end
> serializer to create a MethodHandle that looks like:
>
> constructor( input.readFloat(), input.readFloat() );
>
> The problem with this is that the values must be serialized in the correct
> order. This would potentially be ok for some binary formats.
>
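
One way such a handle could be assembled with plain java.lang.invoke is
sketched below. It assumes the stream is a `DataInput` carrying x then
y; the class name and the `encode`/`decode` helpers are invented for the
example. Each argument gets its own nested `filterArguments` call so
that the read order is fixed by the nesting rather than by how a single
multi-filter call happens to evaluate its filters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DirectConstructorSketch {
    public static final class Point {
        private final float x;
        private final float y;
        public Point(float x, float y) { this.x = x; this.y = y; }
        public float x() { return x; }
        public float y() { return y; }
    }

    // Handle of type (DataInput)Point: reads x, reads y, calls the constructor.
    private static final MethodHandle READ_POINT = buildReader();

    private static MethodHandle buildReader() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle ctor = lookup.findConstructor(Point.class,
                    MethodType.methodType(void.class, float.class, float.class));
            MethodHandle readFloat = lookup.findVirtual(DataInput.class,
                    "readFloat", MethodType.methodType(float.class));
            // (float, DataInput)Point: y now comes from a readFloat call.
            MethodHandle readsY = MethodHandles.filterArguments(ctor, 1, readFloat);
            // (DataInput, DataInput)Point: the outer adapter reads x first,
            // then calls readsY, which reads y.
            MethodHandle readsXY = MethodHandles.filterArguments(readsY, 0, readFloat);
            // Pass the same DataInput to both positions.
            return MethodHandles.permuteArguments(readsXY,
                    MethodType.methodType(Point.class, DataInput.class), 0, 0);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Helper for the example: writes x then y as two big-endian floats.
    public static byte[] encode(float x, float y) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeFloat(x);
            out.writeFloat(y);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static Point decode(byte[] bytes) {
        DataInput input = new DataInputStream(new ByteArrayInputStream(bytes));
        try {
            // Exact call-site type (DataInput)Point: no Object[] is built here.
            return (Point) READ_POINT.invokeExact(input);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Point p = decode(encode(1.5f, 2.5f));
        System.out.println(p.x() + "," + p.y()); // prints 1.5,2.5
    }
}
```

The floats stay primitive from `readFloat` through to the constructor,
so this should sidestep the boxing and the array, at the cost of tying
the wire order to the constructor's parameter order.
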
> Also, following up from the previous message, I've started putting together
> a catalog of data examples.
>
> https://github.com/litterat/litterat-pep/tree/master/src/test/java/io/litterat/pep/data
>
> I'm in the middle of a refactor at the moment and will republish in a few
> weeks with potentially a binary format front end as well as JSON.
>
> Regards,
> David.
>

-- 
- DML • he/him