The future of Serialization

Peter Firmstone peter.firmstone at zeus.net.au
Mon Aug 11 10:18:28 UTC 2014


On 11/08/2014 8:12 PM, Peter Firmstone wrote:
> Brian,
>
> Thanks for picking up on my frustration ;)
>
> I have something in mind for Serializable2 to address cyclic data 
> structures and the possibility of independent evolution of super and 
> child classes, while retaining a relatively clean public API, with one 
> optional private method.  The methods and interfaces proposed are 
> suitable for any alternative ObjectInput and ObjectOutput implementation.
>
> An interface exists in Apache River called Startable; it has one 
> method:
>
> public void start() throws Exception;
>
> It's called by a framework to allow an Object to start threads, 
> publish "this" or throw an exception after construction.  The intent 
> is to allow an object to be immutable with final fields and be 
> provided with a thread of execution after construction and before 
> publication.
>
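
A minimal sketch of the construct-then-start pattern described above. The Startable interface is redeclared locally so the example compiles on its own; HeartbeatService and StartableFramework are hypothetical names invented for illustration and are not part of Apache River.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Local copy of the one-method interface quoted above.
interface Startable {
    void start() throws Exception;
}

// Hypothetical service: all fields are final, and the constructor neither
// starts a thread nor lets "this" escape.
final class HeartbeatService implements Startable {
    private final String name;
    private final Thread worker;
    private final AtomicBoolean started = new AtomicBoolean(false);

    HeartbeatService(String name) {
        this.name = name;
        // The thread is created here but deliberately not started.
        this.worker = new Thread(() -> { /* periodic work would go here */ }, name);
        this.worker.setDaemon(true);
    }

    @Override
    public void start() throws Exception {
        // Runs after construction, so the final fields are already frozen.
        if (!started.compareAndSet(false, true)) {
            throw new IllegalStateException(name + " already started");
        }
        worker.start();
    }

    boolean isStarted() {
        return started.get();
    }
}

// Hypothetical framework hook: start() is called after construction and
// before the reference is handed to anyone else.
final class StartableFramework {
    static <T extends Startable> T startThenPublish(T service) throws Exception {
        service.start();
        return service;
    }
}
```

A caller would write something like StartableFramework.startThenPublish(new HeartbeatService("hb")); if start() throws, the object is never published.
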
> Something similar can be used to wire up circular relations, let me 
> explain:
>
> Every class that implements Serializable has one thing in common, the 
> Serialization protocol and every Object instance of a Serializable 
> class has an arbitrary serial form.
>
> I propose a final class representing SerialForm for an object, that 
> cannot be extended, requires privilege to instantiate and also 
> performs method guard security checks, for all callers with the 
> exception of a calling class reading or writing its own serial form.  
> SerialForm needs a parameter field key identity represented by the 
> calling Class, 

Sorry, that should read "field name", not "method name".

> the method name and the field's Class type, this key would be used for 
> both writing and retrieving a field entry in SerialForm. SerialForm 
> will also provide a method to advise if a field key contains a 
> circular relation.  Any field entry in SerialForm that would contain a 
> circular relation is not populated until after construction of the 
> current object is complete.
>
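
One way the keyed lookup described above might look, as a rough in-memory sketch. Every method name here (put, get, markCircular, isCircular) is an assumption made for illustration; the proposal does not define them. The privilege and guard checks are left out, and the owning class is passed explicitly rather than derived from the caller.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Sketch only: final so it cannot be extended, as the proposal asks.
final class SerialForm {

    // Key identity: owning class + field name + field type.
    private static final class Key {
        private final Class<?> owner;
        private final String fieldName;
        private final Class<?> fieldType;

        Key(Class<?> owner, String fieldName, Class<?> fieldType) {
            this.owner = Objects.requireNonNull(owner);
            this.fieldName = Objects.requireNonNull(fieldName);
            this.fieldType = Objects.requireNonNull(fieldType);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return owner == k.owner
                    && fieldName.equals(k.fieldName)
                    && fieldType == k.fieldType;
        }

        @Override
        public int hashCode() {
            return Objects.hash(owner, fieldName, fieldType);
        }
    }

    private final Map<Key, Object> entries = new HashMap<>();
    private final Set<Key> circular = new HashSet<>();

    // The same key is used for writing...
    <T> void put(Class<?> owner, String fieldName, Class<T> fieldType, T value) {
        entries.put(new Key(owner, fieldName, fieldType), value);
    }

    // ...and for retrieving; returns null when the entry is absent.
    // Reference types only in this sketch.
    <T> T get(Class<?> owner, String fieldName, Class<T> fieldType) {
        return fieldType.cast(entries.get(new Key(owner, fieldName, fieldType)));
    }

    // A framework would call this for entries it will populate only after
    // the current object's constructor has completed.
    void markCircular(Class<?> owner, String fieldName, Class<?> fieldType) {
        circular.add(new Key(owner, fieldName, fieldType));
    }

    boolean isCircular(Class<?> owner, String fieldName, Class<?> fieldType) {
        return circular.contains(new Key(owner, fieldName, fieldType));
    }
}
```

Including the owning class in the key means a superclass and a child class can each store a field with the same name without colliding.
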
> An arbitrary Serializable2 Object instance may be composed of a 
> hierarchy of classes, each belonging to a separate ProtectionDomain.
>
> For the following interface:
>
> public interface Serializable2 {
>
>     void writeObject(SerialForm serial) throws IOException;
>
> }
>
> Implementers of Serializable2 must:
>
>   1. Implement writeObject
>   2. Implement a constructor with the signature:  (SerialForm serial).
>
> Implementors that need to check invariants, delay throwing an 
> Exception, publish "this" or set a circular reference after 
> construction should:
>
>   3. Implement: private void readObjectNoData() throws
>      InvalidObjectException;
>
> Child class implementations should:
>
>   4. Call their super class writeObject method and superclass
>      constructor, but may call any super class constructor or methods.
>
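
The rules above can be sketched with a superclass and a child class. SerialForm here is a stripped-down stand-in with assumed put/get methods, and Account and SavingsAccount are hypothetical classes invented for the example; only the Serializable2 interface and the (SerialForm) constructor signature come from the proposal.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Stand-in for the proposed SerialForm; method names are assumptions.
final class SerialForm {
    private final Map<String, Object> entries = new HashMap<>();

    private static String key(Class<?> owner, String field, Class<?> type) {
        return owner.getName() + '#' + field + ':' + type.getName();
    }

    <T> void put(Class<?> owner, String field, Class<T> type, T value) {
        entries.put(key(owner, field, type), value);
    }

    <T> T get(Class<?> owner, String field, Class<T> type) {
        return type.cast(entries.get(key(owner, field, type)));
    }
}

interface Serializable2 {
    void writeObject(SerialForm serial) throws IOException;
}

class Account implements Serializable2 {
    private final String owner;

    Account(String owner) {
        this.owner = owner;
    }

    // Deserialization constructor with the required signature.
    Account(SerialForm serial) {
        this(serial.get(Account.class, "owner", String.class));
    }

    @Override
    public void writeObject(SerialForm serial) throws IOException {
        serial.put(Account.class, "owner", String.class, owner);
    }

    String owner() {
        return owner;
    }
}

final class SavingsAccount extends Account {
    private final Integer ratePercent;

    SavingsAccount(String owner, int ratePercent) {
        super(owner);
        this.ratePercent = ratePercent;
    }

    SavingsAccount(SerialForm serial) {
        super(serial); // superclass constructor with the same signature
        Integer r = serial.get(SavingsAccount.class, "ratePercent", Integer.class);
        this.ratePercent = (r == null) ? 0 : r; // tolerate an omitted field
    }

    @Override
    public void writeObject(SerialForm serial) throws IOException {
        super.writeObject(serial); // superclass writes its own fields
        serial.put(SavingsAccount.class, "ratePercent", Integer.class, ratePercent);
    }

    int ratePercent() {
        return ratePercent;
    }
}
```

Each class reads and writes only the entries keyed by its own Class, so Account can change its stored fields without SavingsAccount noticing.
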
> Compatibility and Evolution:
>
>   1. Fields can be included or omitted from SerialForm, by an
>      implementation, without breaking compatibility, provided a null
>      reference is accepted during deserialization.
>   2. Child classes in a hierarchy;  all Serializable2 implementing
>      superclass constructors have the same signature; the superclass
>      implementation can be substituted, without breaking child class
>      deserialization (provided this is the constructor used by the
>      child class).
>   3. There is no serialVersionUID.
>   4. Child class Serializable2 implementations can extend a superclass
>      without a zero arg constructor that doesn't itself implement
>      Serializable2.
>   5. Child classes that do not override writeObject will not be
>      serialized, so can effectively opt out.
>   6. Because implementations are required to implement public methods,
>      there is no "Magic".
>   7. Serializable2 shouldn't extend Serializable, allowing classes to
>      implement both interfaces for a period of time (for that reason
>      the signature for readObjectNoData may need to be changed for
>      Serializable2).
>   8. ObjectInputStream and ObjectOutputStream can be extended to
>      support both implementations for compatibility, however
>      alternative stream implementations would be preferable for
>      Serializable2 to avoid Serializable security issues.  The new
>      implementations should be possible to substitute because both
>      types would use the same Stream Protocol, provided the classes
>      being deserialized implement Serializable2.
>
>
> My reasoning for retaining readObjectNoData() and for updating field 
> entries in SerialForm that contain circular relations after 
> construction, is:
>
>   1. An object reference for the object currently being deserialized
>      can be passed to another object's constructor (via a SerialForm
>      instance) after the current Object's constructor completes,
>      allowing safe publication of final field freezes that occur at the
>      end of construction.
>   2. When the Serialization2 Framework becomes aware of an object that
>      contains a circular relationship while that object is in the
>      process of being deserialized, the second object will not be
>      instantiated until after the constructor of the first object in
>      the relationship completes.  Data read in from the stream can be
>      stored in a SerialForm without requiring object instantiation.
>   3. After construction completes, the object that has just been
>      deserialized can retain a copy of its SerialForm and look up the
>      field containing a circular relationship.  The Serialization
>      framework will update its SerialForm with the new object that
>      holds a circular relationship, prior to calling readObjectNoData()
>      on the first object.
>   4. If the developer of the implementing class is not aware of the
>      possibility of a circular relationship, then the worst consequence
>      is a field will be set to null during construction; "this" will
>      not escape.
>   5. The second Object holding a link to an object that appears earlier
>      in the stream may not be aware that the object it holds a
>      reference to also needs a reference to it.  The first object will
>      not obtain a reference to the second until both Object
>      constructors have completed.  The second object may not need to
>      implement readObjectNoData().
>   6. readObjectNoData() needs to be called on every class belonging to
>      a single Object's inheritance hierarchy, when defined, after all
>      constructors have completed.  It should be called in the order of
>      superclass to child class.
>
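
The ordering in points 1 to 5 can be walked through by hand, as a sketch. Owner and Pet are hypothetical classes, SerialForm is reduced to a map keyed by field name only, and CircularWiringSketch plays the part of the framework; how a real stream implementation would detect the cycle is not shown. The private readObjectNoData() hook is invoked reflectively here, which is an assumption about how a framework might reach it.

```java
import java.io.InvalidObjectException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Stand-in for the proposed SerialForm, keyed by field name only.
final class SerialForm {
    private final Map<String, Object> entries = new HashMap<>();

    void put(String field, Object value) {
        entries.put(field, value);
    }

    <T> T get(String field, Class<T> type) {
        return type.cast(entries.get(field));
    }
}

// First object in the stream: needs a reference back to Pet.
final class Owner {
    private final String name;
    private final SerialForm retained; // kept so the hook can look up "pet"
    private volatile Pet pet;          // circular field, set after construction

    Owner(SerialForm serial) {
        this.name = serial.get("name", String.class);
        this.pet = serial.get("pet", Pet.class); // not populated yet, so null
        this.retained = serial;
    }

    // Called by the framework once all constructors have completed.
    private void readObjectNoData() throws InvalidObjectException {
        Pet p = retained.get("pet", Pet.class);
        if (p == null) {
            throw new InvalidObjectException("pet missing for " + name);
        }
        this.pet = p;
    }

    Pet pet() {
        return pet;
    }
}

// Second object: receives a fully constructed Owner and needs no hook.
final class Pet {
    private final Owner owner;

    Pet(SerialForm serial) {
        this.owner = serial.get("owner", Owner.class);
    }

    Owner owner() {
        return owner;
    }
}

final class CircularWiringSketch {
    static Owner read() throws Exception {
        SerialForm ownerForm = new SerialForm();
        ownerForm.put("name", "alice");
        Owner owner = new Owner(ownerForm);  // circular entry left unpopulated

        SerialForm petForm = new SerialForm();
        petForm.put("owner", owner);         // safe: Owner's constructor is done
        Pet pet = new Pet(petForm);

        ownerForm.put("pet", pet);           // framework updates the first form
        Method hook = Owner.class.getDeclaredMethod("readObjectNoData");
        hook.setAccessible(true);
        hook.invoke(owner);                  // first object wires the back link
        return owner;
    }
}
```

With a class hierarchy, the framework would repeat the last step for each class that declares the hook, from superclass down to child class.
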
> Thoughts?
>
> Regards,
>
> Peter.
>
> On 10/08/2014 3:20 AM, Brian Goetz wrote:
>>> I've noticed there's not much interest in improving Serialization on
>>> these lists.  This makes me wonder if Java Serialization has lost
>>> relevance in recent years with the rise of Protocol Buffers, Apache
>>> Thrift and other means of data transfer over byte streams.
>>
>> I sense your frustration, but I think you may be reaching the wrong 
>> conclusion.  The lack of response is probably not evidence that 
>> there's no interest in fixing serialization; it's that fixing 
>> serialization, with all the constraints that "fix" entails, is just 
>> really really hard, and it's much easier to complain about it (and 
>> even say "let's just get rid of it") than to fix it.
>>
>>> Should Serializable eventually be deprecated? Should Serialization be
>>> disabled by default? Should a new mechanism be developed? If a new
>>> mechanism is developed, what about circular object relationships?
>>
>> As I delved into my own explorations of serialization, I started to 
>> realize why such a horrible approach was the one that was ultimately 
>> chosen; while serialization is horrible and awful and leaky and 
>> insecure and complex and brittle, it does address problems like 
>> cyclic data structures and independent evolution of subclass and 
>> superclass better than the "clean" models.
>>
>> My conclusion is, at best, a new mechanism would have to live 
>> side-by-side with the old one, since it could only handle 95% of the 
>> cases.  It might handle those 95% much better -- more cleanly, 
>> securely, and allowing easier schema evolution -- but the hard cases 
>> are still there.  Still, reducing the use of the horrible old 
>> mechanism may still be a worthy goal, even if it can't be killed 
>> outright.
>>
>


