RFR(s): 8133977 add specification for serial form of immutable collections

Stephen Colebourne scolebourne at joda.org
Wed May 18 09:42:22 UTC 2016


My original blog on the topic was in 2010:
http://blog.joda.org/2010/02/serialization-shared-delegates_9131.html

Bear in mind that a key reason for sharing the serialization proxy is
to share the "serialized object header, serial version UID, class
descriptor" etc. It is that header overhead that is the main reason
for serialization being so space inefficient on the wire. It is thus a
positive thing that "unrelated classes" share the proxy.

JSR-310 goes to great lengths to save bytes in the stream - see
LocalTime for example. IMO, it would be really good to see
serialization move to a single package-level shared proxy in java.util
as well, as it would dramatically reduce many stream sizes (as per the
blog post).

So, the key aspects of the pattern that I see are:
- shared between multiple classes
- use a flag (byte) to distinguish classes
- top level class with a short name
- externalizable, not serializable
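
As a rough illustration of the four points above, a shared proxy could look
something like the sketch below. This is not the java.time.Ser source or the
code in the gist; DemoSer, DemoPoint, DemoSpan, DemoRoundTrip and the tag
values are hypothetical names chosen for the example. The two value classes
hand themselves to the proxy via writeReplace, the proxy writes a one-byte tag
followed by the instance data, and readResolve returns the rebuilt object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StreamCorruptedException;

// Hypothetical shared proxy: one short-named class serves several unrelated types.
final class DemoSer implements Externalizable {
    private static final long serialVersionUID = 1L;
    static final byte POINT_TYPE = 1; // assumed tag values, for illustration only
    static final byte SPAN_TYPE = 2;

    private byte type;
    private Object object;

    // Externalizable requires a public no-arg constructor.
    public DemoSer() { }

    DemoSer(byte type, Object object) {
        this.type = type;
        this.object = object;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(type); // one-byte flag distinguishes the classes
        switch (type) {
            case POINT_TYPE: ((DemoPoint) object).writeExternal(out); break;
            case SPAN_TYPE: ((DemoSpan) object).writeExternal(out); break;
            default: throw new StreamCorruptedException("Unknown type: " + type);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        type = in.readByte();
        switch (type) {
            case POINT_TYPE: object = DemoPoint.readExternal(in); break;
            case SPAN_TYPE: object = DemoSpan.readExternal(in); break;
            default: throw new StreamCorruptedException("Unknown type: " + type);
        }
    }

    // Swap the proxy for the real object once the stream has been read.
    private Object readResolve() { return object; }
}

final class DemoPoint implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;
    DemoPoint(int x, int y) { this.x = x; this.y = y; }
    private Object writeReplace() { return new DemoSer(DemoSer.POINT_TYPE, this); }
    void writeExternal(DataOutput out) throws IOException { out.writeInt(x); out.writeInt(y); }
    static DemoPoint readExternal(DataInput in) throws IOException {
        return new DemoPoint(in.readInt(), in.readInt());
    }
}

final class DemoSpan implements Serializable {
    private static final long serialVersionUID = 1L;
    final long start;
    final long end;
    DemoSpan(long start, long end) { this.start = start; this.end = end; }
    private Object writeReplace() { return new DemoSer(DemoSer.SPAN_TYPE, this); }
    void writeExternal(DataOutput out) throws IOException { out.writeLong(start); out.writeLong(end); }
    static DemoSpan readExternal(DataInput in) throws IOException {
        return new DemoSpan(in.readLong(), in.readLong());
    }
}

// Small round-trip helper so the sketch can be exercised.
final class DemoRoundTrip {
    static byte[] toBytes(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Only the DemoSer class descriptor is written to the stream, so every class in
the package that uses the proxy shares that one header.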

JSR-310 chooses to delegate the actual logic back to the class itself,
but this is not required by the pattern. What CollSer does not do is
implement Externalizable. And as I've argued, I believe it is a *good*
thing to share a Ser class across a package (to overcome the
limitations of the ancient Serialization spec).

Anyway, I've done half the work for you ;-)
https://gist.github.com/jodastephen/2bb70e1f1180b030d46b5a6366c0a0c4

With a collection of 1 string, CollSer uses 136 bytes while my
Externalizable Ser uses 58.
With a collection of 3 strings, CollSer uses 171 bytes while my
Externalizable Ser uses 87.

These are the contents of the stream. As can be seen, the
Externalizable form avoids two java.lang.Object references.
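
For anyone wanting to produce comparable numbers, a byte count and dump in this
style can be generated along the following lines. This is a sketch, not the
code used for the figures in this mail; the StreamDump class is hypothetical,
and it assumes the dumps print printable ASCII literally and every other byte
as unpadded hex in square brackets.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for inspecting a serialization stream.
final class StreamDump {
    // Serialize a single object into a fresh stream and return the raw bytes.
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Printable ASCII is emitted as-is; anything else as [hex] without padding.
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            int v = b & 0xff;
            if (v > 0x20 && v < 0x7f) {
                sb.append((char) v);
            } else {
                sb.append('[').append(Integer.toHexString(v)).append(']');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = serialize("a");
        System.out.println(data.length + " " + dump(data));
    }
}
```

Every stream starts with the magic number and version, which is the
[ac][ed][0][5] prefix visible at the start of each dump.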

CollSer:
136 [ac][ed][0][5]sr[0]!com.opengamma.strata.calc.CollSerW[8e][ab][b6]:[1b][a8][11][2][0][2]I[0][5]flags[[0][5]arrayt[0][13][Ljava/lang/Object;xp[0][0][0][1]ur[0][13][Ljava.lang.Object;[90][ce]X[9f][10]s)l[2][0][0]xp[0][0][0][0]
171 [ac][ed][0][5]sr[0]!com.opengamma.strata.calc.CollSerW[8e][ab][b6]:[1b][a8][11][2][0][2]I[0][5]flags[[0][5]arrayt[0][13][Ljava/lang/Object;xp[0][0][0][1]ur[0][13][Ljava.lang.Object;[90][ce]X[9f][10]s)l[2][0][0]xp[0][0][0][0]sq[0]~[0][0][0][0][0][1]uq[0]~[0][3][0][0][0][3]t[0][1]at[0][2]bbt[0][3]ccc

Ser:
58 [ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.SerW[8e][ab][b6]:[1b][a8][11][c][0][0]xpw[5][1][0][0][0][0]x
87 [ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.SerW[8e][ab][b6]:[1b][a8][11][c][0][0]xpw[5][1][0][0][0][0]xsq[0]~[0][0]w[5][1][0][0][0][3]t[0][1]at[0][2]bbt[0][3]cccx



While I understand the forward/backward compatibility argument for (int
& 0xff), I'm unconvinced of the need. If things change, I don't
expect the new format to be loadable in an earlier JDK version. What
is not in doubt is that what is proposed is incredibly wasteful. In
all current known cases, just 2 bits out of 32 are being used.
Altering the current logic to use (byte & 0xf) would have the same
effect. Unless you have some specific future changes in mind that need
24 bits of flag/data, the likelihood is that YAGNI applies.
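
To make the size argument concrete, here is a small sketch comparing the two
encodings. It does not reproduce the actual CollSer layout; FlagWidthDemo and
the sample tag value are hypothetical. It only shows that a small tag survives
either mask unchanged, while the int form costs four bytes on the wire and the
byte form costs one.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical comparison of an int-wide flag field and a byte-wide one.
final class FlagWidthDemo {
    // Tag kept in the low 8 bits of an int, leaving 24 bits for other data.
    static int tagFromInt(int flags) {
        return flags & 0xff;
    }

    // Tag kept in the low 4 bits of a byte, leaving 4 bits for other data.
    static int tagFromByte(byte flags) {
        return flags & 0xf;
    }

    // Number of bytes written when the tag is stored as an int or as a byte.
    static int encodedSize(boolean asInt, int tag) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (asInt) {
                out.writeInt(tag);
            } else {
                out.writeByte(tag);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```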

(The main question for the future is going to be value types, and how
the serialized form evolves to support reified collections. I suspect
that this will require writing out the class reference of the reified
collection content, something that will need more than 24 bits
anyway).

Stephen


On 18 May 2016 at 00:33, Stuart Marks <stuart.marks at oracle.com> wrote:
> On 5/17/16 1:09 PM, Peter Levart wrote:
>>>
>>> I don't think it's possible to have a single form for all new
>>> serializable objects in java.util. The java.util package isn't as cohesive
>>> as java.time. There's a bunch of random stuff in it. Consider the
>>> non-serializable things currently in java.util:
>>>
>>>     *SummaryStatistics, Optional, Formatter, ResourceBundle,
>>>     Scanner, ServiceLoader, StringJoiner, Timer
>>>
>>> If any of these were to be made serializable, I don't think it would make
>>> sense for it to share the serialized form with CollSer.
>>
>> They would not share the form. The java.time.Ser does not specify
>> serialized form by itself (short of a single byte stream prefix that selects
>> the sub-format/implementation typically hosted in the implementation classes
>> themselves). java.time.Ser is just an adapter that allows for
>> Externalizable-like functionality of implementations but not requiring them
>> to implement a public no-arg constructor that constructs uninitialized
>> instances. So all above mentioned classes could simply share a single
>> java.util.Ser serialization proxy however different they are.
>
> OK, I guess I overstated this by saying that I didn't think it was possible.
> Of course it's possible, given sufficient hacking.
>
> But a unified "java.util.Ser" proxy for everything in java.util would have
> the objects share the same serialized form in that they'd share the
> serialized object header, serial version UID, class descriptor, and fields
> (if any). Clearly the responsibility for handling actual instance data would
> be delegated back to the classes themselves.
>
> I just don't think there's any advantage to doing this for unrelated
> classes. Indeed, it's a disadvantage to couple together the serial forms of
> classes that are otherwise unrelated.
>
> s'marks
>



More information about the core-libs-dev mailing list