RFR(s): 8133977 add specification for serial form of immutable collections

Stuart Marks stuart.marks at oracle.com
Wed May 18 21:21:10 UTC 2016


On 5/18/16 2:42 AM, Stephen Colebourne wrote:
> My original blog on the topic was in 2010:
> http://blog.joda.org/2010/02/serialization-shared-delegates_9131.html
>
> Bear in mind that a key reason for sharing the serialization proxy is
> to share the "serialized object header, serial version UID, class
> descriptor" etc. It is that header overhead that is the main reason
> for serialization being so space inefficient on the wire. It is thus a
> positive thing that "unrelated classes" share the proxy.
>
> JSR-310 goes to great lengths to save bytes in the stream - see
> LocalTime for example. IMO, it would be really good to see
> serialization move to a single package-level shared proxy in java.util
> as well, as it would dramatically reduce many stream sizes (as per the
> blog post).
> So, the key aspects of the pattern that I see are:
> - shared between multiple classes
> - use a flag (byte) to distinguish classes
> - top level class with a short name
> - externalizable, not serializable
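
For readers unfamiliar with the pattern listed above, here is a minimal sketch of a shared Externalizable proxy that uses a flag byte to tell the classes apart. All names here (SharedProxyDemo, Ser, LIST, SET) are hypothetical and chosen for illustration; this is not the code from the blog post or from JSR-310. In a real class, writeReplace() would return the proxy. The sketch serializes the proxy directly to stay short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

class SharedProxyDemo {

    /** Hypothetical shared proxy: one short-named class for several collection kinds. */
    static final class Ser implements Externalizable {
        private static final long serialVersionUID = 1L;
        static final byte LIST = 1;   // flag values are assumptions for this sketch
        static final byte SET = 2;

        private byte type;
        private List<Object> elements;

        public Ser() { }              // Externalizable requires a public no-arg constructor

        Ser(byte type, Collection<?> elements) {
            this.type = type;
            this.elements = new ArrayList<Object>(elements);
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeByte(type);              // flag byte distinguishes the classes
            out.writeInt(elements.size());
            for (Object e : elements) {
                out.writeObject(e);
            }
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            type = in.readByte();
            int size = in.readInt();
            if (size < 0) {
                throw new InvalidObjectException("negative size: " + size);
            }
            List<Object> list = new ArrayList<Object>();
            for (int i = 0; i < size; i++) {
                list.add(in.readObject());
            }
            elements = list;
        }

        /** Replaces the proxy with the collection it stands for. */
        private Object readResolve() throws InvalidObjectException {
            switch (type) {
                case LIST: return Collections.unmodifiableList(new ArrayList<Object>(elements));
                case SET:  return Collections.unmodifiableSet(new LinkedHashSet<Object>(elements));
                default:   throw new InvalidObjectException("unknown type flag: " + type);
            }
        }
    }

    /** Serializes and deserializes an object in memory. */
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Ser(Ser.LIST, Arrays.asList("a", "bb", "ccc"))));
        System.out.println(roundTrip(new Ser(Ser.SET, Arrays.asList("a", "bb", "ccc"))));
    }
}
```

Because every class that uses the proxy writes the same class descriptor, a stream containing several such objects pays for that descriptor only once.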

The primary goal of the serialization proxy in this case is to prevent the 
concrete collections implementation classes from leaking into the serial format. 
Another major goal is to provide for backward *and* forward compatibility. 
Minimizing the serial stream size is nice, but is less important than compatibility.

I'd like to set aside this notion of a "single package-level shared proxy" for 
java.util. There are too many other unrelated things already in java.util that 
have their own serial formats that cannot be changed. It's too much of a blanket 
statement to say that "all new serializable things in java.util" should use a 
single proxy, since we have zero examples of this, and they potentially could 
have arbitrarily different requirements for their serial formats.

Future-proofing the serial proxy for future *collections* implementations in 
java.util is quite sensible, though.

> JSR-310 chooses to delegate the actual logic back to the class itself,
> but this is not required by the pattern. What CollSer does not do is
> implement Externalizable. And as I've argued, I believe it is a *good*
> thing to share a Ser class across a package (to overcome the
> limitations of the ancient Serialization spec).

OK, it's good to know you don't consider the back-delegation to be required. 
It's a fairly prominent feature of the java.time classes. I was concerned that 
was what was being proposed here, and it would be a fairly intrusive change.

> Anyway, I've done half the work for you ;-)
> https://gist.github.com/jodastephen/2bb70e1f1180b030d46b5a6366c0a0c4
>
> With a collection of 1 string, CollSer uses 136 bytes while my
> Externalizable Ser uses 58.
> With a collection of 3 strings, CollSer uses 171 bytes while my
> Externalizable Ser uses 87.
>
> These are the contents of the stream. As can be seen, the
> Externalizable form avoids two java.lang.Object references.
>
> CollSer:
> 136 [ac][ed][0][5]sr[0]!com.opengamma.strata.calc.CollSerW[8e][ab][b6]:[1b][a8][11][2][0][2]I[0][5]flags[[0][5]arrayt[0][13][Ljava/lang/Object;xp[0][0][0][1]ur[0][13][Ljava.lang.Object;[90][ce]X[9f][10]s)l[2][0][0]xp[0][0][0][0]
> 171 [ac][ed][0][5]sr[0]!com.opengamma.strata.calc.CollSerW[8e][ab][b6]:[1b][a8][11][2][0][2]I[0][5]flags[[0][5]arrayt[0][13][Ljava/lang/Object;xp[0][0][0][1]ur[0][13][Ljava.lang.Object;[90][ce]X[9f][10]s)l[2][0][0]xp[0][0][0][0]sq[0]~[0][0][0][0][0][1]uq[0]~[0][3][0][0][0][3]t[0][1]at[0][2]bbt[0][3]ccc
>
> Ser:
> 58 [ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.SerW[8e][ab][b6]:[1b][a8][11][c][0][0]xpw[5][1][0][0][0][0]x
> 87 [ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.SerW[8e][ab][b6]:[1b][a8][11][c][0][0]xpw[5][1][0][0][0][0]xsq[0]~[0][0]w[5][1][0][0][0][3]t[0][1]at[0][2]bbt[0][3]cccx

Interesting, nice testbed, thanks.

It turns out the main culprit here is the field "Object[] array" which is 
responsible for most of the bulk. (For others' edification, the serial stream 
contains descriptions of the fields, including their type names, which in this 
case is "[Ljava/lang/Object;". And when the array itself is serialized, its 
class descriptor, including its name -- again -- is included.)

This suggests an easy way to reduce the bulk of the serial data, which is to 
make the Object[] field transient and to use custom serial data to write the 
array's length followed by its contents. (This is similar to what the other 
collections' serial forms do.)
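
As an illustration of the approach just described, here is a minimal sketch. The class names (TransientArrayDemo, FieldProxy, CompactProxy) are hypothetical and are not the classes from the gist or the webrev. FieldProxy keeps the Object[] as an ordinary serializable field, while CompactProxy marks it transient and writes the length followed by the elements as custom serial data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

class TransientArrayDemo {

    /** Default serial form: the Object[] field is described and written automatically. */
    static final class FieldProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int tag;
        private final Object[] array;

        FieldProxy(int tag, Object... array) {
            this.tag = tag;
            this.array = array.clone();
        }
    }

    /** Custom serial form: the array is transient and written as length + elements. */
    static final class CompactProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int tag;
        private transient Object[] array;

        CompactProxy(int tag, Object... array) {
            this.tag = tag;
            this.array = array.clone();
        }

        Object[] elements() {
            return array.clone();
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();        // writes the only non-transient field, tag
            out.writeInt(array.length);      // custom data: length, then each element
            for (Object e : array) {
                out.writeObject(e);
            }
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            int len = in.readInt();
            if (len < 0) {
                throw new InvalidObjectException("negative length: " + len);
            }
            Object[] a = new Object[len];
            for (int i = 0; i < len; i++) {
                a[i] = in.readObject();
            }
            array = a;
        }
    }

    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = serialize(new FieldProxy(1, "a", "bb", "ccc"));
        byte[] compact = serialize(new CompactProxy(1, "a", "bb", "ccc"));
        System.out.println("default form: " + plain.length + " bytes");
        System.out.println("custom form:  " + compact.length + " bytes");
        CompactProxy back = (CompactProxy) deserialize(compact);
        System.out.println(Arrays.toString(back.elements()));
    }
}
```

The exact byte counts depend on the class and package names, so they will not match the figures in this thread, but the custom form should come out noticeably smaller because the stream no longer carries the field's type string or the array's class descriptor.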

After renaming the modified class to "Se3" (to match the length of the name 
"Ser") and renaming the "flags" field to "tag" to save a couple more bytes, 
running the serialization tester gives the following:

63 [ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.Se3W[8e][ab][b6]:[1b][a8][11][3][0][1]I[0][3]tagxpw[4][0][0][0][0]x
91 [ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.Se3W[8e][ab][b6]:[1b][a8][11][3][0][1]I[0][3]tagxpw[4][0][0][0][0]xsq[0]~[0][0]w[4][0][0][0][3]t[0][1]at[0][2]bbt[0][3]cccx

This is only a handful of bytes larger than the Externalizable alternative 
(63 vs. 58 bytes, and 91 vs. 87 bytes).

> While I understand the forward/backward compatibility argument of (int
> & 0xff) however I'm unconvinced of the need. If things change, I don't
> expect the new format to be loadable in an earlier JDK version.

Backward and forward serial compatibility has historically been an issue for 
serializable classes in the JDK. I'm not willing to compromise it in order to 
shave off a few more bytes.
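
To make the (int & 0xff) point concrete, here is a small sketch. It assumes that the low 8 bits of the tag identify the kind of collection and that the remaining bits are reserved and ignored on read. The class name, method name, and tag values are hypothetical, chosen only for illustration.

```java
class TagDemo {
    // Hypothetical tag values; only the low 8 bits are interpreted.
    static final int KIND_LIST = 1;
    static final int KIND_SET = 2;
    static final int KIND_MAP = 3;

    /** Decodes the kind from a tag, ignoring any bits above the low 8. */
    static String kindOf(int tag) {
        switch (tag & 0xff) {
            case KIND_LIST: return "list";
            case KIND_SET:  return "set";
            case KIND_MAP:  return "map";
            default:
                throw new IllegalArgumentException("unknown kind: " + (tag & 0xff));
        }
    }

    public static void main(String[] args) {
        // A tag written today: only the kind is set.
        System.out.println(kindOf(KIND_SET));            // prints "set"
        // A tag from a hypothetical future writer that also sets a reserved bit.
        System.out.println(kindOf(0x0100 | KIND_SET));   // prints "set"
    }
}
```

Under that assumption, a reader that masks the tag can still load a stream from a later writer that uses the reserved bits, which is the forward-compatibility property at stake here.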

s'marks


