Re: RFR(s): 8133977 add specification for serial form of immutable collections
CollSer should not be public, especially not just for serialization reasons.
Stuart, I disagree with using an int for the flags, it should be a byte. If you need future expansion you can use 0xff to indicate it with the parser reacting accordingly. That is the strategy for JSR 310
Stephen
On 17 May 2016 8:29 a.m., "Peter Levart" <peter.levart@gmail.com> wrote:
Hi Stuart,
On 05/17/2016 12:48 AM, Stuart Marks wrote:
Hi all,
Please review this changeset that adds specifications of the serialized forms (really, a single serialization proxy class) for the immutable collections implementation. There are no code changes in this changeset, just documentation.
It's somewhat odd, but the class doc for the serialization proxy isn't actually included in the serialized-form.html output. I had to jigger around the method doc for readResolve() to include some general information about the class.
I also added links manually from the List, Map, and Set interfaces to the serialized form and vice-versa. I'm not aware of another way for a private class to link to its (proxied) serialized form.
I was able to coerce specdiff into giving a diff of serialized-form.html. It's not very convenient, though; like serialized-form.html, the html diff is one big file. The only difference is the addition of java.util.CollSer.
Webrev:
http://cr.openjdk.java.net/~smarks/reviews/8133977/webrev.0/
API specdiff:
http://cr.openjdk.java.net/~smarks/reviews/8133977/specdiff.0/api.specdiff/o...
serialized-form.html diff:
http://cr.openjdk.java.net/~smarks/reviews/8133977/specdiff.0/serial.specdif...
Thanks,
s'marks
Perhaps java.util.CollSer could be promoted to be a public class. It does, after all, specify the API (serialization form == API). I don't see a point in hiding it and then jiggering around the method doc for readResolve()... You could just restrict its use by keeping the constructor(s) package-private...
What do you think?
Regards, Peter
On 05/17/2016 10:55 AM, Stephen Colebourne wrote:
CollSer should not be public, especially not just for serialization reasons.
I don't see a compelling reason why. Javadocs mention it by name. By making it Serializable, it is effectively public, so it can't go away or be renamed. What is gained by hiding it?
Regards, Peter
Stuart, I disagree with using an int for the flags, it should be a byte. If you need future expansion you can use 0xff to indicate it with the parser reacting accordingly. That is the strategy for JSR 310
On 5/17/16 3:33 AM, Peter Levart wrote:
On 05/17/2016 10:55 AM, Stephen Colebourne wrote:
CollSer should not be public, especially not just for serialization reasons.
I don't see a compelling reason why. Javadocs mention it by name. By making it Serializable, it is effectively public, so it can't go away or be renamed. What is gained by hiding it?
CollSer should be kept private in the API. I think we need to be careful about terminology here. The 'P' in API stands for *programming* and CollSer is not used by applications or by higher-level libraries.
The serialization format is "public" but in a wholly different sense from the API. The format is specified so that it's compatible across different JDK implementations and versions. By necessity, the string names of classes are visible in this format, but this isn't a programming interface.
I'm happy to remove mention of CollSer from the API specification.
s'marks
On 17 May 2016, at 09:55, Stephen Colebourne <scolebourne@joda.org> wrote:
CollSer should not be public, especially not just for serialization reasons.
Right, and I see no reason to refer to it by name in the javadoc, the link should be sufficient.
Stuart, I disagree with using an int for the flags, it should be a byte. If you need future expansion you can use 0xff to indicate it with the parser reacting accordingly. That is the strategy for JSR 310
The JSR 310 Serialization framework is a little more involved, as you know. I wonder if it is worth following that pattern more closely here? That way java.util.Ser could be a generic proxy for all new Serializable types in the java.util package, and not just the immutable collections. The 310 pattern also results in a reasonable place to document the serial form.
-Chris.
I've just checked, and yes the 310 Ser class is more involved (and more efficient) as it implements Externalizable. (Externalizable saves bytes over Serializable, by taking full control of the output IIRC).
While criticisms of the 310 design are welcome, it was carefully crafted to be a generally useful package-based pattern for serialization.
Following exactly that pattern here would seem desirable if possible. ie. a single Ser class for all new serialized forms in java.util, starting with immutable collections.
Stephen
On 17 May 2016, at 12:38, Stephen Colebourne <scolebourne@joda.org> wrote:
I've just checked, and yes the 310 Ser class is more involved (and more efficient) as it implements Externalizable. (Externalizable saves bytes over Serializable, by taking full control of the output IIRC).
Right. That is my understanding also.
While criticisms of the 310 design are welcome, it was carefully crafted to be a generally useful package-based pattern for serialization.
It is. No criticism here. In fact, the opposite. That is why I suggested following it.
Following exactly that pattern here would seem desirable if possible. ie. a single Ser class for all new serialized forms in java.util, starting with immutable collections.
It would be nice, and generally useful going forward.
-Chris.
On 5/17/16 3:43 AM, Chris Hegarty wrote:
The JSR 310 Serialization framework is a little more involved, as you know. I wonder if it is worth following that pattern more closely here? That way java.util.Ser could be a generic proxy for all new Serializable types in the java.util package, and not just the immutable collections. The 310 pattern also results in a reasonable place to document the serial form.
The proposed java.util.CollSer follows the java.time.Ser pattern in that they're both common serialization proxies for several different implementation classes. I'm not sure exactly what you have in mind by suggesting that the pattern be followed "more closely" though.
I don't think it's possible to have a single form for all new serializable objects in java.util. The java.util package isn't as cohesive as java.time. There's a bunch of random stuff in it. Consider the non-serializable things currently in java.util:
*SummaryStatistics, Optional, Formatter, ResourceBundle, Scanner, ServiceLoader, StringJoiner, Timer
If any of these were to be made serializable, I don't think it would make sense for it to share the serialized form with CollSer.
Now, for new *collection* classes that might be added to java.util, sure. I can imagine adding new collections, immutable or mutable, and they could use CollSer as their serialized form. Those could be accommodated easily by adding new values for the "kind" bits of the flags field. What other changes would need to be made to CollSer to prepare for that?
s'marks
Hi Stuart,
On 05/17/2016 09:03 PM, Stuart Marks wrote:
On 5/17/16 3:43 AM, Chris Hegarty wrote:
The JSR 310 Serialization framework is a little more involved, as you know. I wonder if it is worth following that pattern more closely here? That way java.util.Ser could be a generic proxy for all new Serializable types in the java.util package, and not just the immutable collections. The 310 pattern also results in a reasonable place to document the serial form.
The proposed java.util.CollSer follows the java.time.Ser pattern in that they're both common serialization proxies for several different implementation classes. I'm not sure exactly what you have in mind by suggesting that the pattern be followed "more closely" though.
I don't think it's possible to have a single form for all new serializable objects in java.util. The java.util package isn't as cohesive as java.time. There's a bunch of random stuff in it. Consider the non-serializable things currently in java.util:
*SummaryStatistics, Optional, Formatter, ResourceBundle, Scanner, ServiceLoader, StringJoiner, Timer
If any of these were to be made serializable, I don't think it would make sense for it to share the serialized form with CollSer.
They would not share the form. java.time.Ser does not specify a serialized form by itself (short of a single-byte stream prefix that selects the sub-format/implementation, typically hosted in the implementation classes themselves). java.time.Ser is just an adapter that allows for Externalizable-like functionality of implementations without requiring them to implement a public no-arg constructor that constructs uninitialized instances. So all the above-mentioned classes could simply share a single java.util.Ser serialization proxy, however different they are.
But there's a different problem I see with java.time.Ser as is: how does it provide for the serialization format to evolve over time? Suppose, hypothetically, that java.time.Duration would like to have picosecond precision in JDK 16. How would you change the serialization format/implementation so that it would be backwards and forwards compatible? Having different values in the serialization format "tagged" with field names provides for that functionality. So in that respect, java.util.CollSer is more evolvable than java.time.Ser.
Regards, Peter
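For readers who have not looked at the java.time sources, here is a minimal sketch of the adapter arrangement described above. It is not the JDK code: the class names Ser, DemoPoint and DemoLabel, the type-tag values, and the helper class SharedProxyDemo are all made up for illustration. It only assumes the standard java.io machinery (Externalizable, writeReplace, readResolve).

```java
import java.io.*;

/** Hypothetical shared proxy, loosely following the pattern discussed; not the JDK source. */
final class Ser implements Externalizable {
    private static final long serialVersionUID = 1L;
    static final byte POINT_TYPE = 1;   // illustrative type tags, chosen arbitrarily
    static final byte LABEL_TYPE = 2;

    private byte type;
    private Object object;

    /** Externalizable needs a public no-arg constructor, but only here on the proxy. */
    public Ser() {}

    Ser(byte type, Object object) { this.type = type; this.object = object; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(type);            // single-byte prefix selects the sub-format
        switch (type) {
            case POINT_TYPE: ((DemoPoint) object).writeExternal(out); break;
            case LABEL_TYPE: ((DemoLabel) object).writeExternal(out); break;
            default: throw new InvalidClassException("Unknown type: " + type);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        type = in.readByte();
        switch (type) {
            case POINT_TYPE: object = DemoPoint.readExternal(in); break;
            case LABEL_TYPE: object = DemoLabel.readExternal(in); break;
            default: throw new StreamCorruptedException("Unknown type: " + type);
        }
    }

    /** Hands back the real object, so the proxy never escapes deserialization. */
    private Object readResolve() { return object; }
}

/** Unrelated value classes; the sub-format logic is delegated back to each class. */
final class DemoPoint implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x, y;
    DemoPoint(int x, int y) { this.x = x; this.y = y; }
    private Object writeReplace() { return new Ser(Ser.POINT_TYPE, this); }
    void writeExternal(DataOutput out) throws IOException { out.writeInt(x); out.writeInt(y); }
    static DemoPoint readExternal(DataInput in) throws IOException {
        return new DemoPoint(in.readInt(), in.readInt());
    }
}

final class DemoLabel implements Serializable {
    private static final long serialVersionUID = 1L;
    final String text;
    DemoLabel(String text) { this.text = text; }
    private Object writeReplace() { return new Ser(Ser.LABEL_TYPE, this); }
    void writeExternal(DataOutput out) throws IOException { out.writeUTF(text); }
    static DemoLabel readExternal(DataInput in) throws IOException {
        return new DemoLabel(in.readUTF());
    }
}

final class SharedProxyDemo {
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) { out.writeObject(o); }
            return bytes.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    static Object roundTrip(Object o) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(serialize(o)))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) { throw new IllegalStateException(e); }
    }

    public static void main(String[] args) {
        DemoPoint p = (DemoPoint) roundTrip(new DemoPoint(3, 4));
        System.out.println(p.x + "," + p.y + " in " + serialize(p).length + " bytes");
    }
}
```

Note that the value classes never need a public no-arg constructor, and that the stream names only the proxy class. The sketch also shows the evolvability concern: the payload after the type byte is positional, so nothing in it is tagged with a field name.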
On 5/17/16 1:09 PM, Peter Levart wrote:
I don't think it's possible to have a single form for all new serializable objects in java.util. The java.util package isn't as cohesive as java.time. There's a bunch of random stuff in it. Consider the non-serializable things currently in java.util:
*SummaryStatistics, Optional, Formatter, ResourceBundle, Scanner, ServiceLoader, StringJoiner, Timer
If any of these were to be made serializable, I don't think it would make sense for it to share the serialized form with CollSer.
They would not share the form. The java.time.Ser does not specify serialized form by itself (short of a single byte stream prefix that selects the sub-format/implementation typically hosted in the implementation classes themselves). java.time.Ser is just an adapter that allows for Externalizable-like functionality of implementations but not requiring them to implement a public no-arg constructor that constructs uninitialized instances. So all above mentioned classes could simply share a single java.util.Ser serialization proxy however different they are.
OK, I guess I overstated this by saying that I didn't think it was possible. Of course it's possible, given sufficient hacking.
But a unified "java.util.Ser" proxy for everything in java.util would have the objects share the same serialized form in that they'd share the serialized object header, serial version UID, class descriptor, and fields (if any). Clearly the responsibility for handling actual instance data would be delegated back to the classes themselves.
I just don't think there's any advantage to doing this for unrelated classes. Indeed, it's a disadvantage to couple together the serial forms of classes that are otherwise unrelated.
s'marks
My original blog on the topic was in 2010: http://blog.joda.org/2010/02/serialization-shared-delegates_9131.html
Bear in mind that a key reason for sharing the serialization proxy is to share the "serialized object header, serial version UID, class descriptor" etc. It is that header overhead that is the main reason for serialization being so space inefficient on the wire. It is thus a positive thing that "unrelated classes" share the proxy.
JSR-310 goes to great lengths to save bytes in the stream - see LocalTime for example. IMO, it would be really good to see serialization move to a single package-level shared proxy in java.util as well, as it would dramatically reduce many stream sizes (as per the blog post). So, the key aspects of the pattern that I see are:
- shared between multiple classes
- use a flag (byte) to distinguish classes
- top level class with a short name
- externalizable, not serializable
JSR-310 chooses to delegate the actual logic back to the class itself, but this is not required by the pattern. What CollSer does not do is implement Externalizable. And as I've argued, I believe it is a *good* thing to share a Ser class across a package (to overcome the limitations of the ancient Serialization spec).
Anyway, I've done half the work for you ;-) https://gist.github.com/jodastephen/2bb70e1f1180b030d46b5a6366c0a0c4
With a collection of 1 string, CollSer uses 136 bytes while my Externalizable Ser uses 58. With a collection of 3 strings, CollSer uses 171 bytes while my Externalizable Ser uses 87.
These are the contents of the stream. As can be seen, the Externalizable form avoids two java.lang.Object references.
CollSer:
136
[ac][ed][0][5]sr[0]!com.opengamma.strata.calc.CollSerW[8e][ab][b6]:[1b][a8][11][2][0][2]I[0][5]flags[[0][5]arrayt[0][13][Ljava/lang/Object;xp[0][0][0][1]ur[0][13][Ljava.lang.Object;[90][ce]X[9f][10]s)l[2][0][0]xp[0][0][0][0]
171
[ac][ed][0][5]sr[0]!com.opengamma.strata.calc.CollSerW[8e][ab][b6]:[1b][a8][11][2][0][2]I[0][5]flags[[0][5]arrayt[0][13][Ljava/lang/Object;xp[0][0][0][1]ur[0][13][Ljava.lang.Object;[90][ce]X[9f][10]s)l[2][0][0]xp[0][0][0][0]sq[0]~[0][0][0][0][0][1]uq[0]~[0][3][0][0][0][3]t[0][1]at[0][2]bbt[0][3]ccc
Ser:
58
[ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.SerW[8e][ab][b6]:[1b][a8][11][c][0][0]xpw[5][1][0][0][0][0]x
87
[ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.SerW[8e][ab][b6]:[1b][a8][11][c][0][0]xpw[5][1][0][0][0][0]xsq[0]~[0][0]w[5][1][0][0][0][3]t[0][1]at[0][2]bbt[0][3]cccx
While I understand the forward/backward compatibility argument of (int & 0xff), I'm unconvinced of the need. If things change, I don't expect the new format to be loadable in an earlier JDK version.
What is not in doubt is that what is proposed is incredibly wasteful. In all current known cases, just 2 bits out of 32 are being used. Altering the current logic to use (byte & 0xf) would have the same effect. Unless you have some specific future changes in mind that need 24 bits of flag/data, the likelihood is that YAGNI applies.
(The main question for the future is going to be value types, and how the serialized form evolves to support reified collections. I suspect that this will require writing out the class reference of the reified collection content, something that will need more than 24 bits anyway).
Stephen
On 5/18/16 2:42 AM, Stephen Colebourne wrote:
My original blog on the topic was in 2010: http://blog.joda.org/2010/02/serialization-shared-delegates_9131.html
Bear in mind that a key reason for sharing the serialization proxy is to share the "serialized object header, serial version UID, class descriptor" etc. It is that header overhead that is the main reason for serialization being so space inefficient on the wire. It is thus a positive thing that "unrelated classes" share the proxy.
JSR-310 goes to great lengths to save bytes in the stream - see LocalTime for example. IMO, it would be really good to see serialization move to a single package-level shared proxy in java.util as well, as it would dramatically reduce many stream sizes (as per the blog post). So, the key aspects of the pattern that I see are:
- shared between multiple classes
- use a flag (byte) to distinguish classes
- top level class with a short name
- externalizable, not serializable
The primary goal of the serialization proxy in this case is to prevent the concrete collections implementation classes from leaking into the serial format. Another major goal is to provide for backward *and* forward compatibility. Minimizing the serial stream size is nice, but is less important than compatibility.
I'd like to set aside this notion of a "single package-level shared proxy" for java.util. There are too many other unrelated things already in java.util that have their own serial formats that cannot be changed. It's too much of a blanket statement to say that "all new serializable things in java.util" should use a single proxy, since we have zero examples of this, and they potentially could have arbitrarily different requirements for their serial formats.
Future-proofing the serial proxy for future *collections* implementations in java.util is quite sensible, though.
JSR-310 chooses to delegate the actual logic back to the class itself, but this is not required by the pattern. What CollSer does not do is implement Externalizable. And as I've argued, I believe it is a *good* thing to share a Ser class across a package (to overcome the limitations of the ancient Serialization spec).
OK, it's good to know you don't consider the back-delegation to be required. It's a fairly prominent feature of the java.time classes. I was concerned that was what was being proposed here, and it would be a fairly intrusive change.
Anyway, I've done half the work for you ;-) https://gist.github.com/jodastephen/2bb70e1f1180b030d46b5a6366c0a0c4
With a collection of 1 string, CollSer uses 136 bytes while my Externalizable Ser uses 58. With a collection of 3 strings, CollSer uses 171 bytes while my Externalizable Ser uses 87.
These are the contents of the stream. As can be seen, the Externalizable form avoids two java.lang.Object references.
CollSer:
136
[ac][ed][0][5]sr[0]!com.opengamma.strata.calc.CollSerW[8e][ab][b6]:[1b][a8][11][2][0][2]I[0][5]flags[[0][5]arrayt[0][13][Ljava/lang/Object;xp[0][0][0][1]ur[0][13][Ljava.lang.Object;[90][ce]X[9f][10]s)l[2][0][0]xp[0][0][0][0]
171
[ac][ed][0][5]sr[0]!com.opengamma.strata.calc.CollSerW[8e][ab][b6]:[1b][a8][11][2][0][2]I[0][5]flags[[0][5]arrayt[0][13][Ljava/lang/Object;xp[0][0][0][1]ur[0][13][Ljava.lang.Object;[90][ce]X[9f][10]s)l[2][0][0]xp[0][0][0][0]sq[0]~[0][0][0][0][0][1]uq[0]~[0][3][0][0][0][3]t[0][1]at[0][2]bbt[0][3]ccc
Ser:
58
[ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.SerW[8e][ab][b6]:[1b][a8][11][c][0][0]xpw[5][1][0][0][0][0]x
87
[ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.SerW[8e][ab][b6]:[1b][a8][11][c][0][0]xpw[5][1][0][0][0][0]xsq[0]~[0][0]w[5][1][0][0][0][3]t[0][1]at[0][2]bbt[0][3]cccx
Interesting, nice testbed, thanks.
It turns out the main culprit here is the field "Object[] array" which is responsible for most of the bulk. (For others' edification, the serial stream contains descriptions of fields including the type names, which is "[Ljava/lang/Object;". And when the array is serialized, its class descriptor, including its name -- again -- is included.)
This suggests an easy way to reduce the bulk of the serial data, which is to make the Object[] field transient and to use custom serial data to write the array's length followed by its contents. (This is similar to what the other collections' serial forms do.)
After renaming the modified class "Se3" to match the length of the name "Ser", and renaming the "flags" field to "tag" to save a couple more bytes, running the serialization tester gets the following:
63
[ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.Se3W[8e][ab][b6]:[1b][a8][11][3][0][1]I[0][3]tagxpw[4][0][0][0][0]x
91
[ac][ed][0][5]sr[0][1d]com.opengamma.strata.calc.Se3W[8e][ab][b6]:[1b][a8][11][3][0][1]I[0][3]tagxpw[4][0][0][0][0]xsq[0]~[0][0]w[4][0][0][0][3]t[0][1]at[0][2]bbt[0][3]cccx
This is only a handful of bytes larger than the Externalizable alternative.
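To make the transient-array idea concrete, here is a minimal sketch. It is not the actual CollSer or "Se3" code from the gist: the class names NaiveProxy, CompactProxy and ProxySizeDemo are hypothetical, tag value 1 standing for "list" is an assumption, and the byte counts it prints depend on the class names used, so they will not match the figures above.

```java
import java.io.*;
import java.util.*;

/** Baseline for comparison: the array is an ordinary serializable field. */
final class NaiveProxy implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int tag;
    private final Object[] array;
    NaiveProxy(int tag, Object... elements) { this.tag = tag; this.array = elements.clone(); }
}

/** Array is transient; its length and contents are written as custom serial data. */
final class CompactProxy implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int tag;              // assumed meaning: 1 = list (illustrative only)
    private transient Object[] array;

    CompactProxy(int tag, Object... elements) { this.tag = tag; this.array = elements.clone(); }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();       // writes the non-transient field "tag"
        out.writeInt(array.length);     // length, then each element in order
        for (Object e : array) {
            out.writeObject(e);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int len = in.readInt();
        if (len < 0) {
            throw new InvalidObjectException("negative length " + len);
        }
        Object[] a = new Object[len];
        for (int i = 0; i < len; i++) {
            a[i] = in.readObject();
        }
        array = a;
    }

    private Object readResolve() throws ObjectStreamException {
        if (tag != 1) {
            throw new InvalidObjectException("unknown tag " + tag);
        }
        return Collections.unmodifiableList(Arrays.asList(array));
    }
}

final class ProxySizeDemo {
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) { out.writeObject(o); }
            return bytes.toByteArray();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) { throw new IllegalStateException(e); }
    }

    public static void main(String[] args) {
        System.out.println("naive:   " + serialize(new NaiveProxy(1, "a", "bb", "ccc")).length);
        System.out.println("compact: " + serialize(new CompactProxy(1, "a", "bb", "ccc")).length);
        System.out.println(deserialize(serialize(new CompactProxy(1, "a", "bb", "ccc"))));
    }
}
```

The saving comes from no longer emitting the field's type string and the array's class descriptor; the int field stays an ordinary named field, which keeps the default-serialization evolution rules available.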
While I understand the forward/backward compatibility argument of (int & 0xff), I'm unconvinced of the need. If things change, I don't expect the new format to be loadable in an earlier JDK version.
Backward and forward serial compatibility has historically been an issue for serializable classes in the JDK. I'm not willing to shave off a few more bytes in order to compromise this.
s'marks
On 5/17/16 1:55 AM, Stephen Colebourne wrote:
Stuart, I disagree with using an int for the flags, it should be a byte. If you need future expansion you can use 0xff to indicate it with the parser reacting accordingly. That is the strategy for JSR 310
These techniques have different goals. I don't see 0xff used at all in the current java.time.Ser class. (Did I miss something?)
What I think you mean is that, in a future JDK, a sentinel value like 0xff could be used as the type value, indicating the presence of some additional or changed values following in the data stream. That's fine, but such usage is necessarily forward-incompatible. By this I mean that the JDK 8 implementation of java.time.Ser is incompatible with a future JDK's serialized form that uses this new 0xff type value (or in fact with any other new type value). Deserializing such a thing on JDK 8 would result in an exception. That might be exactly what you want, if some future java.time class cannot possibly be handled on JDK 8.
For the proposed java.util.CollSer, exactly the same technique would apply by using a different "kind" value in the low order 8 bits. A future JDK might have some new kind of collection that can't be represented in JDK 9. The future JDK would use a new "kind" value, and deserializing it on JDK 9 should throw an exception.
The other 24 bits are for *forward compatible* uses by future JDKs. Suppose a future JDK has a variation on List that uses a different internal storage format that works better in some cases. The future JDK might want to set a hint somewhere when serializing it, so that another instance of a future JDK could pick up this hint when deserializing. If a JDK 9 were to deserialize this, it would ignore the hint and deserialize an ordinary (immutable) List.
It's always possible for a serialized form to add fields, which are specified to be ignored by older versions. But sometimes you don't want to add an entire new field, and you just need an extra bit. That's what the extra bits in the flags field are for.
s'marks
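The split between the "kind" byte and the remaining 24 bits can be sketched as follows. This is an illustration of the decoding rule described above, not the CollSer implementation: the class FlagsDemo, the constants KIND_LIST and KIND_SET, their values, and the example hint bit are all assumptions made up for the sketch.

```java
import java.io.InvalidObjectException;
import java.util.*;

final class FlagsDemo {
    static final int KIND_MASK = 0xff;  // low-order 8 bits carry the kind
    static final int KIND_LIST = 1;     // hypothetical kind values
    static final int KIND_SET  = 2;

    /** Decodes the way an older reader would: unknown kinds fail, unknown hint bits are ignored. */
    static Object resolve(int flags, Object[] array) throws InvalidObjectException {
        int kind = flags & KIND_MASK;   // the upper 24 bits are deliberately not examined
        switch (kind) {
            case KIND_LIST:
                return Collections.unmodifiableList(new ArrayList<>(Arrays.asList(array)));
            case KIND_SET:
                return Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(array)));
            default:
                // A kind this reader does not know cannot be represented here at all.
                throw new InvalidObjectException(String.format("invalid flags 0x%x", flags));
        }
    }

    public static void main(String[] args) throws InvalidObjectException {
        Object[] data = {"a", "b"};
        // Same kind, with and without a made-up hint bit set by a "future" writer.
        System.out.println(resolve(0x00000001, data));
        System.out.println(resolve(0x00010001, data));
        try {
            resolve(0x0000007f, data);  // a kind this reader has never heard of
        } catch (InvalidObjectException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a single byte and a sentinel such as 0xff, only the "reject" branch is available to older readers; the extra bits are what make the "ignore and still deserialize" branch possible without adding a field.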
participants (4)
- Chris Hegarty
- Peter Levart
- Stephen Colebourne
- Stuart Marks