The future of Serialization
I've noticed there's not much interest in improving Serialization on these lists. This makes me wonder if Java Serialization has lost relevance in recent years with the rise of Protocol Buffers, Apache Thrift and other means of data transfer over byte streams.
The burden of implementing Serializable can significantly hamper developers' efforts when refactoring; it's quite common for projects to make no guarantee regarding Serialization compatibility between releases. Implementing Serializable can also double project development hours, hamper future development and increase software maintenance costs.
Serialization also presents opportunities for attackers and has been responsible for a number of zero day exploits.
I don't know if isolates will be included with JDK 9 for Jigsaw, or whether ClassLoaders alone will provide isolation for modules. The ability to limit visibility and provide isolation of implementation classes, as well as providing limits on memory and threads for isolated modules, would also improve platform security.
Serialization may provide a means to hot upgrade modules, but more flexible options that don't cause serial data lock-in need to be developed.
Should Serializable eventually be deprecated? Should Serialization be disabled by default? Should a new mechanism be developed? If a new mechanism is developed, what about circular object relationships?
Regards,
Peter.
I've noticed there's not much interest in improving Serialization on these lists. This makes me wonder if Java Serialization has lost relevance in recent years with the rise of Protocol Buffers, Apache Thrift and other means of data transfer over byte streams.
I sense your frustration, but I think you may be reaching the wrong conclusion. The lack of response is probably not evidence that there's no interest in fixing serialization; it's that fixing serialization, with all the constraints that "fix" entails, is just really really hard, and it's much easier to complain about it (and even say "let's just get rid of it") than to fix it.
Should Serializable eventually be deprecated? Should Serialization be disabled by default? Should a new mechanism be developed? If a new mechanism is developed, what about circular object relationships?
As I delved into my own explorations of serialization, I started to realize why such a horrible approach was the one that was ultimately chosen; while serialization is horrible and awful and leaky and insecure and complex and brittle, it does address problems like cyclic data structures and independent evolution of subclass and superclass better than the "clean" models. My conclusion is, at best, a new mechanism would have to live side-by-side with the old one, since it could only handle 95% of the cases. It might handle those 95% much better -- more cleanly, securely, and allowing easier schema evolution -- but the hard cases are still there. Still, reducing the use of the horrible old mechanism may still be a worthy goal, even if it can't be killed outright.
On 11/08/2014 8:12 PM, Peter Firmstone wrote:
Brian,
Thanks for picking up on my frustration ;)
I have something in mind for Serializable2 to address cyclic data structures and the possibility of independent evolution of super and child classes, while retaining a relatively clean public API, with one optional private method. The methods and interfaces proposed are suitable for any alternative ObjectInput and ObjectOutput implementation.
An interface exists in Apache River called Startable; it has one method:
public void start() throws Exception;
It's called by a framework to allow an Object to start threads, publish "this" or throw an exception after construction. The intent is to allow an object to be immutable with final fields and be provided with a thread of execution after construction and before publication.
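The construct-then-start pattern described above can be sketched roughly as follows. This is only an illustration, not River's code: Startable is redeclared locally so the example compiles on its own, and Heartbeat, awaitTick and constructAndStart are hypothetical names invented for the sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Local stand-in for the single-method interface quoted above (assumption:
// redeclared here only so the sketch is self-contained).
interface Startable {
    void start() throws Exception;
}

// Hypothetical service: every field is final and "this" never escapes the constructor.
final class Heartbeat implements Startable {
    private final String name;
    private final CountDownLatch ticked = new CountDownLatch(1);

    Heartbeat(String name) {
        this.name = name; // no threads started and nothing published here
    }

    @Override
    public void start() throws Exception {
        // Runs only after the constructor has returned, so the worker thread
        // observes fully initialised final fields.
        Thread worker = new Thread(ticked::countDown, name + "-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Waits up to the given time for the worker thread to have run once.
    boolean awaitTick(long millis) {
        try {
            return ticked.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

final class StartableDemo {
    // Hypothetical framework step: construct first, start second, publish last.
    static Heartbeat constructAndStart(String name) {
        Heartbeat service = new Heartbeat(name);
        try {
            service.start();
        } catch (Exception e) {
            throw new IllegalStateException("start failed for " + name, e);
        }
        return service;
    }

    public static void main(String[] args) {
        Heartbeat service = constructAndStart("demo");
        System.out.println("worker ran: " + service.awaitTick(1000));
    }
}
```

The point of the split is that the object can keep all of its fields final while still getting a thread of execution before anyone else sees it.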
Something similar can be used to wire up circular relations; let me explain:
Every class that implements Serializable has one thing in common, the Serialization protocol, and every Object instance of a Serializable class has an arbitrary serial form.
I propose a final class representing SerialForm for an object, that cannot be extended, requires privilege to instantiate and also performs method guard security checks for all callers, with the exception of a calling class reading or writing its own serial form. SerialForm needs a parameter field key identity represented by the calling Class, the field name and the field's Class type; this key would be used for both writing and retrieving a field entry in SerialForm. SerialForm will also provide a method to advise if a field key contains a circular relation; any field entry in SerialForm that would contain a circular relation is not populated until after construction of the current object is complete.
An arbitrary Serializable2 Object instance may be composed of a hierarchy of classes, each belonging to a separate ProtectionDomain.
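One way to picture the field key described above is a small value class made of the owning Class, the field name and the field's type. The sketch below is an assumption-laden illustration: FieldKey, the put/get method names and the Base/Derived classes are all hypothetical, and the privilege and guard checks from the proposal are left out entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key: owning class + field name + field type.
final class FieldKey {
    private final Class<?> owner;
    private final String fieldName;
    private final Class<?> fieldType;

    FieldKey(Class<?> owner, String fieldName, Class<?> fieldType) {
        this.owner = Objects.requireNonNull(owner);
        this.fieldName = Objects.requireNonNull(fieldName);
        this.fieldType = Objects.requireNonNull(fieldType);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FieldKey)) return false;
        FieldKey k = (FieldKey) o;
        return owner == k.owner && fieldName.equals(k.fieldName) && fieldType == k.fieldType;
    }

    @Override
    public int hashCode() {
        return Objects.hash(owner, fieldName, fieldType);
    }
}

// Heavily simplified SerialForm (assumption): an in-memory map, no security
// checks, no stream I/O. Use wrapper types (Integer.class, not int.class).
final class SerialForm {
    private final Map<FieldKey, Object> entries = new HashMap<>();

    <T> void put(Class<?> owner, String fieldName, Class<T> fieldType, T value) {
        entries.put(new FieldKey(owner, fieldName, fieldType), value);
    }

    // Returns null when no entry was written under this exact key.
    <T> T get(Class<?> owner, String fieldName, Class<T> fieldType) {
        return fieldType.cast(entries.get(new FieldKey(owner, fieldName, fieldType)));
    }
}

// Two hypothetical classes in one hierarchy that both use a field called "id".
class Base { }
class Derived extends Base { }

final class FieldKeyDemo {
    public static void main(String[] args) {
        SerialForm form = new SerialForm();
        form.put(Base.class, "id", String.class, "base-id");
        form.put(Derived.class, "id", String.class, "derived-id");
        // Same field name, different owners: the entries do not collide.
        System.out.println(form.get(Base.class, "id", String.class));
        System.out.println(form.get(Derived.class, "id", String.class));
    }
}
```

Keying on the owning class is what would let each class in a hierarchy read and write only its own entries, even when field names repeat.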
For the following interface:
public interface Serializable2 {
void writeObject(SerialForm serial) throws IOException;
}
Implementers of Serializable2 must:
1. Implement writeObject.
2. Implement a constructor with the signature: (SerialForm serial).
Implementers that need to check invariants, delay throwing an Exception, publish "this" or set a circular reference after construction should:
3. Implement: private void readObjectNoData() throws InvalidObjectException;
Child class implementations should:
4. Call their superclass writeObject method and superclass constructor, but may call any superclass constructor or methods.
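The requirements above could look something like this in practice. It is a sketch under assumptions, not a reference implementation: SerialForm here is a throwaway in-memory stand-in keyed by owner class and field name, Shape, Circle and the copy helper are hypothetical, and the optional readObjectNoData() hook is omitted because these classes check their invariants in the constructor.

```java
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the proposed SerialForm (assumption).
final class SerialForm {
    private final Map<String, Object> entries = new HashMap<>();

    void put(Class<?> owner, String field, Object value) {
        entries.put(owner.getName() + '#' + field, value);
    }

    <T> T get(Class<?> owner, String field, Class<T> type) {
        return type.cast(entries.get(owner.getName() + '#' + field));
    }
}

// The interface as given in the message.
interface Serializable2 {
    void writeObject(SerialForm serial) throws IOException;
}

class Shape implements Serializable2 {
    private final String label;

    Shape(String label) { this.label = label; }

    // Required constructor signature: (SerialForm serial).
    Shape(SerialForm serial) throws InvalidObjectException {
        String value = serial.get(Shape.class, "label", String.class);
        if (value == null) throw new InvalidObjectException("label is required");
        this.label = value;
    }

    @Override
    public void writeObject(SerialForm serial) throws IOException {
        serial.put(Shape.class, "label", label);
    }

    String label() { return label; }
}

final class Circle extends Shape {
    private final double radius;

    Circle(String label, double radius) {
        super(label);
        this.radius = radius;
    }

    Circle(SerialForm serial) throws InvalidObjectException {
        super(serial); // the child calls the superclass (SerialForm) constructor
        Double value = serial.get(Circle.class, "radius", Double.class);
        if (value == null || value <= 0) throw new InvalidObjectException("radius must be positive");
        this.radius = value;
    }

    @Override
    public void writeObject(SerialForm serial) throws IOException {
        super.writeObject(serial); // the child calls the superclass writeObject
        serial.put(Circle.class, "radius", radius);
    }

    double radius() { return radius; }
}

final class Serializable2Demo {
    // Hypothetical round trip through a SerialForm, with no byte stream involved.
    static Circle copy(Circle original) {
        try {
            SerialForm form = new SerialForm();
            original.writeObject(form);
            return new Circle(form);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Circle c = copy(new Circle("unit", 1.0));
        System.out.println(c.label() + " " + c.radius());
    }
}
```

Because deserialization goes through an ordinary constructor, the fields stay final and invalid data is rejected before an instance exists.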
Compatibility and Evolution:
1. Fields can be included or omitted from SerialForm by an implementation without breaking compatibility, provided a null reference is accepted during deserialization.
2. Within a class hierarchy, all Serializable2-implementing superclass constructors have the same signature, so the superclass implementation can be substituted without breaking child class deserialization (provided this is the constructor used by the child class).
3. There is no serialVersionUID.
4. Child class Serializable2 implementations can extend a superclass that has no zero-arg constructor and doesn't itself implement Serializable2.
5. Child classes that do not override writeObject will not be serialized, so can effectively opt out.
6. Because implementations are required to implement public methods, there is no "Magic".
7. Serializable2 shouldn't extend Serializable, allowing classes to implement both interfaces for a period of time (for that reason the signature for readObjectNoData may need to be changed for Serializable2).
8. ObjectInputStream and ObjectOutputStream can be extended to support both implementations for compatibility; however, alternative stream implementations would be preferable for Serializable2 to avoid Serializable security issues. The new implementations should be possible to substitute because both types would use the same Stream Protocol, provided the classes being deserialized implement Serializable2.
My reasoning for retaining readObjectNoData() and for updating field entries in SerialForm that contain circular relations after construction is:
1. An object reference for the object currently being deserialized can be passed to another object's constructor (via a SerialForm instance) after the current Object's constructor completes, allowing safe publication of final field freezes that occur at the end of construction.
2. When the Serialization2 Framework becomes aware of an object that contains a circular relationship while that object is in the process of being deserialized, the second object will not be instantiated until after the constructor of the first object in the relationship completes. Data read in from the stream can be stored in a SerialForm without requiring object instantiation.
3. After construction completes, the object that has just been deserialized can retain a copy of its SerialForm and look up the field containing a circular relationship. The Serialization framework will update its SerialForm with the new object that holds a circular relationship, prior to calling readObjectNoData() on the first object.
4. If the developer of the implementing class is not aware of the possibility of a circular relationship, then the worst consequence is that a field will be set to null during construction; "this" will not escape.
5. The second Object, holding a link to an object that appears earlier in the stream, may not be aware that the object it holds a reference to also needs a reference to it. The first object will not obtain a reference to the second until both Object constructors have completed. The second object may not need to implement readObjectNoData().
6. readObjectNoData() needs to be called on every class belonging to a single Object's inheritance hierarchy, when defined, after all constructors have completed; it should be called in the order of superclass to child class.
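The ordering in points 1 to 5 can be walked through with a toy example. Everything here is an assumption made for illustration: SerialForm is a plain map, Parent and Child are hypothetical classes, the "framework" is a single static method, and readObjectNoData() is package-private rather than private so the sketch needs no reflection.

```java
import java.io.InvalidObjectException;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the proposed SerialForm (assumption).
final class SerialForm {
    private final Map<String, Object> entries = new HashMap<>();

    void put(String field, Object value) { entries.put(field, value); }

    <T> T get(String field, Class<T> type) { return type.cast(entries.get(field)); }
}

// First object in the stream; it refers to a Child that refers back to it.
final class Parent {
    private final String name;
    private final SerialForm form; // retained so the deferred entry can be read later
    private Child child;           // circular link, cannot be final

    Parent(SerialForm serial) {
        this.name = serial.get("name", String.class);
        this.child = serial.get("child", Child.class); // null here: entry is deferred
        this.form = serial;
    }

    // Called by the framework after all constructors have completed.
    void readObjectNoData() throws InvalidObjectException {
        Child deferred = form.get("child", Child.class);
        if (deferred == null) throw new InvalidObjectException("child was never supplied");
        this.child = deferred;
    }

    String name() { return name; }
    Child child() { return child; }
}

// Second object; it only ever sees a fully constructed Parent.
final class Child {
    private final Parent parent;

    Child(SerialForm serial) {
        this.parent = serial.get("parent", Parent.class);
    }

    Parent parent() { return parent; }
}

final class CircularDemo {
    // Hypothetical framework logic for one Parent/Child cycle.
    static Parent deserialize() {
        // Stream data is held in SerialForm instances; no objects exist yet.
        SerialForm parentForm = new SerialForm();
        parentForm.put("name", "root");
        SerialForm childForm = new SerialForm();

        // Construct the first object; its circular entry is not populated.
        Parent parent = new Parent(parentForm);

        // Its constructor has completed, so the reference can be handed on.
        childForm.put("parent", parent);
        Child child = new Child(childForm);

        // Update the first object's SerialForm, then call the hook.
        parentForm.put("child", child);
        try {
            parent.readObjectNoData();
        } catch (InvalidObjectException e) {
            throw new IllegalStateException("cycle could not be completed", e);
        }
        return parent;
    }

    public static void main(String[] args) {
        Parent parent = deserialize();
        System.out.println(parent.child().parent() == parent);
    }
}
```

Neither constructor ever receives a partially constructed object; the only cost is that the field on the first object cannot be final.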
Thoughts?
Regards,
Peter.
On Aug 9, 2014, at 7:20 PM, Brian Goetz <brian.goetz@Oracle.COM> wrote:
My conclusion is, at best, a new mechanism would have to live side-by-side with the old one, since it could only handle 95% of the cases. It might handle those 95% much better -- more cleanly, securely, and allowing easier schema evolution -- but the hard cases are still there. Still, reducing the use of the horrible old mechanism may still be a worthy goal, even if it can't be killed outright.
Also, many serialization-based libraries use sun.misc.Unsafe or sun.reflect.ReflectionFactory for various reasons (with backup plans if such classes are not available or accessible). As part of the future of serialization, I think we need to evaluate libraries such as XStream and Objenesis to see what unsafe/internal mechanisms can be replaced by functionally equivalent safe public mechanisms.
I have more questions than answers at the moment with regards to that :-(
Paul.
On 09/08/2014 06:56, Peter Firmstone wrote:
I've noticed there's not much interest in improving Serialization on these lists. This makes me wonder if Java Serialization has lost relevance in recent years with the rise of Protocol Buffers, Apache Thrift and other means of data transfer over byte streams.
Just to add to Brian's comments, I think part of it is that many people are busy with other things, preparing for JDK 9 for example. So I think there is a lot of support for investigation and proposals that would improve things, it's just that some people are too busy to respond.
I don't know if isolates will be included with JDK 9 for Jigsaw, or whether ClassLoaders alone will provide isolation for modules.
The ability to limit visibility and provide isolation of implementation classes as well as providing limits on memory and threads for isolated modules would also improve platform security.
If by "isolates" you mean JSR 121 then I think that would be well beyond the scope, as would resource management. This isn't really the thread to discuss how module boundaries will work, but just to say that class loaders and visibility can be weak when it comes to module boundaries. There are other options available, particularly when the ability to extend the access control rules is on the table. So I would suggest not making any assumptions here for now.
-Alan.
Thanks Alan, I can relate to time poverty :)
I might be assuming too much, but if there's interest in doing something with Serialization, I'd be interested in learning about plans or difficulties involved in deserialization and modules. It can be a little more difficult to find the correct ClassLoader to resolve classes during deserialization when ClassLoader relationships aren't hierarchical. Object streams can be annotated with module information to assist resolution.
On the subject of isolates, I found Ribbons interesting:
https://www.cs.purdue.edu/homes/peugster/Ribbons/RJ.pdf
https://www.cs.purdue.edu/homes/peugster/Ribbons/
Got any links to info on extending access control rules?
Regards,
Peter.
On 11/08/2014 13:06, Peter Firmstone wrote:
Thanks Alan, I can relate to time poverty :)
I might be assuming too much, but if there's interest in doing something with Serialization, I'd be interested in learning about plans or difficulties involved in deserialization and modules. It can be a little more difficult to find the correct ClassLoader to resolve classes during deserialization when ClassLoader relationships aren't hierarchical. Object streams can be annotated with module information to assist resolution.
The issues around visibility when deserializing are somewhat independent of modules. The usual way to deal with such matters is to override the resolveClass method. Another long-standing suggestion is for ObjectInputStream to define a new constructor that takes a class loader to avoid the stack walk to get the user-defined loader. It remains to be seen, but if we can avoid changing visibility then we don't change the status quo.
:
Got any links to info on extending access control rules?
Not yet, a future JSR will define the standard module system and there will be a corresponding JEP for the implementation.
-Alan.
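For readers unfamiliar with the resolveClass approach mentioned above, a minimal sketch follows. ObjectInputStream.resolveClass is the standard protected hook; the subclass name LoaderObjectInputStream and the roundTrip helper are hypothetical, and this shows the override route only, not the suggested new JDK constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Resolves classes against an explicitly supplied loader instead of relying
// on the default lookup performed by ObjectInputStream.
final class LoaderObjectInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    LoaderObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            // Fall back to the default behaviour, which also handles primitive type names.
            return super.resolveClass(desc);
        }
    }
}

final class ResolveClassDemo {
    // Serializes a value to a byte array and reads it back through the custom stream.
    static Object roundTrip(Serializable value, ClassLoader loader) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new LoaderObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), loader)) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> names = new ArrayList<>(Arrays.asList("a", "b"));
        System.out.println(roundTrip(names, ClassLoader.getSystemClassLoader()));
    }
}
```

Passing the loader in explicitly keeps the choice of loader in the caller's hands, which matters most when loaders are not arranged in a simple parent-child chain.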
Interesting, language features for modules won't necessarily involve ClassLoaders (my assumptions were based on existing systems), although you'd expect modules to have their own ProtectionDomain.
An alternative to isolates is separate processes with JVM class sharing enabled.
I'll keep an eye out for the JSR.
When is a better timeframe, roughly, to discuss Serializable?
Regards,
Peter.
On 12/08/2014 10:03, Peter Firmstone wrote:
Interesting, language features for modules won't necessarily involve ClassLoaders (my assumptions were based on existing systems), although you'd expect modules to have their own ProtectionDomain.
I think it would be reasonable to expect that you should be able to grant permissions to specific modules. This is something for a different discussion thread of course.
When is a better timeframe, roughly, to discuss Serializable?
I'm sure there isn't a best time that works for everyone that is interested in this topic. However, in terms of JDK 9 it seems early enough to do the exploration and prototypes, and get moving on proposals.
-Alan.
participants (4): Alan Bateman, Brian Goetz, Paul Sandoz, Peter Firmstone