Implementing Towards Better PEP/Serialization

Sat Dec 12 10:07:46 UTC 2020

Hi Suminda, and thanks for the links. I've noticed that a few of these are
benchmarked on the Java serialization benchmarking site [1]. I'll
definitely have a look at implementing the benchmarking code; however, this
would be more out of curiosity than anything else. The point here is not to
implement the fastest serialization or even the most compact. I suppose
there's an underlying question to be answered, which in summary is: why
implement another serialization library, and what's the purpose of this
implementation when so many are already available?

I started looking at serialization again after reading Brian's "Towards
Better Serialization", which outlines some of the problems with Java's
inbuilt serialization. This tweet from Brian, which I really like, also got
me thinking about it again.

"It's easy to get caught up on surface syntax, but the problem runs much
deeper: every one of these tools thinks it can get away with 'just modeling
data', when in reality they are creating a terrible, ad-hoc language without
the benefit of language design. Syntax is deck chairs."

To partly answer my own question, the first part is to investigate the
Java language/data interface, which is the point of the PEP library [2]. The
PEP library jumps through hoops to avoid using Unsafe or writing directly to
private fields. These solutions might go some way towards informing future
libraries when Java tightens security further and stops allowing this
functionality. I suppose Brian and team need to decide whether they can
deprecate Java serialization or adapt it to work in a tightened security
model, so maybe PEP can help inform those decisions.
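To make "not using Unsafe or writing to private fields" concrete: if a data
class exposes its state through public accessors and a constructor, a
serializer can take an instance apart and rebuild it using only those
members. This is a hypothetical illustration, not the PEP library's API. It
assumes Java 16+ records, and the RecordProjection class and its method names
are invented for this example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.util.Arrays;

// Hypothetical sketch (not the PEP library's API): move a record to and from a
// flat array of component values using only its public accessors and its
// canonical constructor. No Unsafe and no writes to private fields.
final class RecordProjection {

    private RecordProjection() {}

    // Deconstruct: call each component's accessor method, in declaration order.
    static Object[] toValues(Record r) throws ReflectiveOperationException {
        RecordComponent[] comps = r.getClass().getRecordComponents();
        Object[] values = new Object[comps.length];
        for (int i = 0; i < comps.length; i++) {
            values[i] = comps[i].getAccessor().invoke(r);
        }
        return values;
    }

    // Reconstruct: the canonical constructor's parameter types are the
    // component types in declaration order, so look it up by those types.
    // Any validation in the constructor runs as normal.
    static <T extends Record> T fromValues(Class<T> type, Object[] values)
            throws ReflectiveOperationException {
        Class<?>[] paramTypes = Arrays.stream(type.getRecordComponents())
                .map(RecordComponent::getType)
                .toArray(Class<?>[]::new);
        Constructor<T> ctor = type.getDeclaredConstructor(paramTypes);
        return ctor.newInstance(values);
    }
}
```

A real library would also need to handle non-record classes, nested values
and versioning. The point is only that the round trip goes through the
class's own constructor, so the class keeps control over its invariants.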

The second part has been to implement some ideas that have been bouncing
around in my head for 20+ years (i.e., the data independent schema [3]),
which in theory might answer Brian's point about ad-hoc data languages. This
also gives me an excuse to implement serialization using the PEP library and
see what issues I find. The actual data format syntax (the deck chairs)
hasn't really been thought about deeply yet. I expect there are some good
design practices learned by other formats that I haven't even started to
explore.

That was a very long-winded way of saying I've got an implementation, but
without requirements or a project to work against, this is just another
serialization solution looking for a problem to solve. Over the next couple
of months I'll finish the implementation and look at benchmarking, but
without a user in mind it probably won't go much beyond that. While I can
claim some tenuous link to Amber and Java serialization, I'll continue to
post my findings to the list. :)

Regards,
David.

[1] https://github.com/eishay/jvm-serializers/wiki
[2] https://github.com/litterat/litterat/tree/main/litterat-pep
[3] https://github.com/litterat/litterat/tree/main/litterat-schema

On Sat, Dec 12, 2020 at 3:45 PM Suminda Sirinath Salpitikorala Dharmasena <
sirinath1978m at gmail.com> wrote:

> Add https://github.com/protostuff/protostuff for benchmarking also.
>
> Benchmark for:
> - serialization speed
> - deserialization speed
> - size
>