Implementing Towards Better PEP/Serialization
David Ryan
david at livemedia.com.au
Thu Nov 19 12:29:11 UTC 2020
After some distractions and finishing up my last job, I've given myself a
few months to focus on serialization, following up on ideas from the
previous messages and a few of my own. I'll keep my comments here to what
I think are the relevant implementation details.
Before I was distracted, I managed to implement a prototype library,
litterat-pep. I also took the GSON parser and tokenizer and implemented a
JSON serializer front end. Interestingly, with the litterat-pep library
abstracting away the object projection/embedding, the JSON implementation
that ties PEP to the tokenizer became one file:
https://github.com/litterat/litterat-json/blob/master/src/main/java/io/litterat/json/JsonMapper.java
I've found this quite encouraging, and it is a good proof point that things
are heading in the right direction. However, I've hit an implementation
detail on which I'd be interested to get an opinion.
Take the simple immutable Point as a starting point:
class Point {
private final float x;
private final float y;
public Point(float x, float y) { this.x=x; this.y=y; }
public float x() { return x; }
public float y() { return y; }
}
There are two ways to deserialize an object like this. The first uses the
public interface: invoking a constructor method handle requires allocating
an array and autoboxing the float values. This is the way that JsonMapper is
currently implemented (lines 194-211 in JsonMapper). Pseudo code looks
something like:
Object[] values = new Object[2];
values[0] = input.readFloat();
values[1] = input.readFloat();
return constructor.invoke( values );
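To make the cost of this path concrete, here is a minimal runnable sketch of the boxed approach. It is not the JsonMapper code: the class name BoxedConstructorSketch and the helper fromValues are hypothetical, and using asSpreader to adapt the constructor handle is an assumption about one reasonable way to accept an Object[].

```java
import java.io.DataInput;
import java.io.IOException;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch; names are illustrative and not taken from litterat.
final class BoxedConstructorSketch {

    // The immutable Point from the message, nested to keep the sketch in one file.
    static final class Point {
        private final float x;
        private final float y;
        public Point(float x, float y) { this.x = x; this.y = y; }
        public float x() { return x; }
        public float y() { return y; }
    }

    // Constructor handle adapted from (float,float)Point to (Object[])Point.
    static final MethodHandle CONSTRUCTOR = spreadConstructor();

    private static MethodHandle spreadConstructor() {
        try {
            MethodHandle ctor = MethodHandles.lookup().findConstructor(
                    Point.class,
                    MethodType.methodType(void.class, float.class, float.class));
            // The spreader unpacks the array and unboxes each element.
            return ctor.asSpreader(Object[].class, 2);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Invokes the constructor with already-boxed values.
    static Point fromValues(Object[] values) {
        try {
            return (Point) CONSTRUCTOR.invoke(values);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Mirrors the pseudo code: one array and two boxed Floats per object.
    static Point read(DataInput input) throws IOException {
        Object[] values = new Object[2];
        values[0] = input.readFloat(); // autoboxed to Float
        values[1] = input.readFloat(); // autoboxed to Float
        return fromValues(values);
    }
}
```

Each call to read allocates the Object[] and two Float wrappers unless the JIT manages to eliminate them.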
However, the internal Java serialization can construct the object and set
its fields with something like:
// Bypass the constructor and create an empty object.
Object result = Point.newInstance();
// Call a method handle that sets the field directly. Combining the
// readFloat and fieldSet method handles creates one handle that does both.
fieldXSet(result, input.readFloat() );
fieldYSet(result, input.readFloat() );
return result;
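The handle-combining step can be sketched with the public java.lang.invoke API. This is only an illustration under stated assumptions: it uses a hypothetical stand-in class, MutablePoint, with non-final fields and an ordinary constructor, because bypassing the constructor and writing final fields relies on facilities outside the public API that are not shown here. The class name FieldSetterSketch and the helpers encode and decode are likewise hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch; names are illustrative and not taken from litterat.
final class FieldSetterSketch {

    // Stand-in for Point: non-final fields, ordinary no-arg constructor.
    static final class MutablePoint {
        float x;
        float y;
    }

    // Each handle has type (MutablePoint, DataInput)void: read a float, store it.
    static final MethodHandle READ_X = readInto("x");
    static final MethodHandle READ_Y = readInto("y");

    private static MethodHandle readInto(String field) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // (MutablePoint, float)void
            MethodHandle setter = lookup.findSetter(MutablePoint.class, field, float.class);
            // (DataInput)float
            MethodHandle readFloat = lookup.findVirtual(
                    DataInput.class, "readFloat", MethodType.methodType(float.class));
            // Replace the float argument with the result of readFloat.
            return MethodHandles.collectArguments(setter, 1, readFloat);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static MutablePoint read(DataInput input) throws IOException {
        MutablePoint result = new MutablePoint(); // stand-in for the empty-object step
        try {
            READ_X.invokeExact(result, input); // no boxing, no array
            READ_Y.invokeExact(result, input);
        } catch (IOException | RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return result;
    }

    // Test helpers: write two floats, then read them back through the handles.
    static byte[] encode(float x, float y) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeFloat(x);
            out.writeFloat(y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static MutablePoint decode(byte[] data) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the setter takes a primitive float and readFloat returns one, the combined handle moves the value from the stream to the field without a wrapper object.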
Using the method handles and field setters, the internal library avoids two
autoboxed Floats and the creation/destruction of the Object[]. For a JSON
parser I wouldn't be too worried about this, as text parsing is already
expensive and the overhead wouldn't add much. In a binary serialization
format, however, this overhead could add up. In older Java versions I've
observed this type of autoboxing in serialization put huge pressure on the
garbage collector.
Before getting stuck on this issue: should I care? Will later Java compiler
versions eventually see that the float values don't need to be autoboxed
and that the Object[] could be put on the stack?
If this PEP library is to play nicely, using the public constructor rather
than setting private final fields directly, I'm left wondering whether
there's any way to improve the performance of the first solution without
waiting for the optimizer to kick in.
The only potential solution I've thought of so far is to get the front end
serializer to create a MethodHandle that looks like:
constructor( input.readFloat(), input.readFloat() );
The problem with this is that the values must be serialized in the same
order as the constructor arguments. This would potentially be OK for some
binary formats.
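One way such a handle could be composed from the public java.lang.invoke API is sketched below. It is an assumption about how a front end might build it, not litterat code; the class name DirectConstructorSketch and the helpers encode and decode are hypothetical. The reader is folded into the second constructor argument first and the first argument second, so that at call time x is read before y.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch; names are illustrative and not taken from litterat.
final class DirectConstructorSketch {

    static final class Point {
        private final float x;
        private final float y;
        public Point(float x, float y) { this.x = x; this.y = y; }
        public float x() { return x; }
        public float y() { return y; }
    }

    // Single handle of type (DataInput)Point.
    static final MethodHandle READ_POINT = buildReader();

    private static MethodHandle buildReader() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle ctor = lookup.findConstructor(
                    Point.class,
                    MethodType.methodType(void.class, float.class, float.class));
            MethodHandle readFloat = lookup.findVirtual(
                    DataInput.class, "readFloat", MethodType.methodType(float.class));
            // (float, DataInput)Point: y comes from the stream.
            MethodHandle readY = MethodHandles.collectArguments(ctor, 1, readFloat);
            // (DataInput, DataInput)Point: x is read first, then y.
            MethodHandle readXY = MethodHandles.collectArguments(readY, 0, readFloat);
            // Feed the same DataInput to both positions.
            return MethodHandles.permuteArguments(
                    readXY, MethodType.methodType(Point.class, DataInput.class), 0, 0);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static Point read(DataInput input) throws IOException {
        try {
            return (Point) READ_POINT.invokeExact(input); // no Object[], no boxing
        } catch (IOException | RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Test helpers: write x then y, and read them back through the handle.
    static byte[] encode(float x, float y) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeFloat(x);
            out.writeFloat(y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Point decode(byte[] data) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The ordering constraint shows up directly in buildReader: the stream must contain x and then y, because the handle has no way to reorder values once it is built.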
Also, following up from the previous message, I've started putting together
a catalog of data examples:
https://github.com/litterat/litterat-pep/tree/master/src/test/java/io/litterat/pep/data
I'm in the middle of a refactor at the moment and will republish in a few
weeks with potentially a binary format front end as well as JSON.
Regards,
David.
More information about the amber-dev
mailing list