The future of Serialization
Peter Firmstone
peter.firmstone at zeus.net.au
Mon Aug 11 10:12:14 UTC 2014
Brian,
Thanks for picking up on my frustration ;)
I have something in mind for Serializable2 to address cyclic data
structures and the possibility of independent evolution of super and
child classes, while retaining a relatively clean public api, with one
optional private method. The methods and interfaces proposed are
suitable for any alternative ObjectInput and ObjectOutput implementation.
An interface exists in Apache River, it's called Startable, it has one
method:
public void start() throws Exception;
It's called by a framework to allow an Object to start threads, publish
"this" or throw an exception after construction. The intent is to allow
an object to be immutable with final fields and be provided with a
thread of execution after construction and before publication.
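
A minimal sketch of the construct-then-start pattern described above. The
Startable interface here is a local stand-in with the same single method;
Heartbeat, StartableDemo and createAndStart are hypothetical names used only
for illustration, not River classes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Local stand-in mirroring the single-method interface quoted above.
interface Startable {
    void start() throws Exception;
}

// Hypothetical immutable service: all fields are final and nothing
// escapes from the constructor.
final class Heartbeat implements Startable {
    private final String name;
    private final CountDownLatch ran = new CountDownLatch(1);

    Heartbeat(String name) {
        this.name = name; // no threads started, "this" is not published
    }

    @Override
    public void start() throws Exception {
        // Construction has completed, so handing work to a thread is safe here.
        Thread t = new Thread(ran::countDown, name);
        t.start();
    }

    // Lets a caller observe that the started thread actually ran.
    boolean awaitRan() throws InterruptedException {
        return ran.await(1, TimeUnit.SECONDS);
    }
}

final class StartableDemo {
    // Hypothetical framework step: construct first, then call start().
    static Heartbeat createAndStart(String name) throws Exception {
        Heartbeat h = new Heartbeat(name);
        h.start();
        return h;
    }
}
```

The point of the split is that the constructor only assigns final fields,
while anything that could leak a partially built object is deferred to
start().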
Something similar can be used to wire up circular relations, let me
explain:
Every class that implements Serializable has one thing in common, the
Serialization protocol, and every Object instance of a Serializable class
has an arbitrary serial form.
I propose a final class representing the SerialForm of an object. It
cannot be extended, requires privilege to instantiate and also performs
method guard security checks for all callers, with the exception of a
calling class reading or writing its own serial form. SerialForm needs
a parameter field key whose identity is represented by the calling Class,
the method name and the field's Class type; this key would be used for
both writing and retrieving a field entry in SerialForm. SerialForm will
also provide a method to advise whether a field key contains a circular
relation. Any field entry in SerialForm that would contain a circular
relation is not populated until after construction of the current object
is complete.
An arbitrary Serializable2 Object instance may be composed of a
hierarchy of classes, each belonging to a separate ProtectionDomain.
For the following interface:
public interface Serializable2 {
void writeObject(SerialForm serial) throws IOException;
}
Implementers of Serializable2 must:
1. Implement writeObject
2. Implement a constructor with the signature: (SerialForm serial).
Implementors that need to check invariants, delay throwing an Exception,
publish "this" or set a circular reference after construction should:
4. Implement: private void readObjectNoData() throws
InvalidObjectException;
Child class implementations should:
5. Call their super class writeObject method and superclass
constructor, but may call any super class constructor or methods.
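
The requirements above could look roughly like this in code. This is only a
sketch: the proposal does not define SerialForm's methods, so the put/get
signatures and the string key below are assumptions, and the security checks
are left out entirely. Person and Employee are hypothetical example classes.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Assumed stand-in for SerialForm: entries are keyed by owning class,
// entry name and value type. No privilege or guard checks are modelled.
final class SerialForm {
    private final Map<String, Object> entries = new HashMap<>();

    private static String key(Class<?> owner, String name, Class<?> type) {
        return owner.getName() + '#' + name + ':' + type.getName();
    }

    <T> void put(Class<?> owner, String name, Class<T> type, T value) {
        entries.put(key(owner, name, type), value);
    }

    // Returns null when the writer omitted the entry.
    <T> T get(Class<?> owner, String name, Class<T> type) {
        return type.cast(entries.get(key(owner, name, type)));
    }
}

interface Serializable2 {
    void writeObject(SerialForm serial) throws IOException;
}

class Person implements Serializable2 {
    private final String name;

    Person(String name) {
        this.name = Objects.requireNonNull(name);
    }

    // Deserialization constructor with the required (SerialForm) signature.
    // Invariants are checked by the ordinary constructor it delegates to.
    Person(SerialForm serial) {
        this(serial.get(Person.class, "name", String.class));
    }

    @Override
    public void writeObject(SerialForm serial) throws IOException {
        serial.put(Person.class, "name", String.class, name);
    }

    String name() { return name; }
}

final class Employee extends Person {
    private final Integer id; // may be null if a writer omitted it

    Employee(String name, Integer id) {
        super(name);
        this.id = id;
    }

    Employee(SerialForm serial) {
        super(serial); // the superclass reads only its own entries
        this.id = serial.get(Employee.class, "id", Integer.class);
    }

    @Override
    public void writeObject(SerialForm serial) throws IOException {
        super.writeObject(serial); // superclass state first, then our own
        serial.put(Employee.class, "id", Integer.class, id);
    }

    Integer id() { return id; }
}
```

Because each class keys its entries by its own Class, Person and Employee can
change what they write without touching each other's entries.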
Compatibility and Evolution:
1. Fields can be included or omitted from SerialForm, by an
implementation, without breaking compatibility, provided a null
reference is accepted during deserialization.
2. In a class hierarchy, all Serializable2 implementing superclass
constructors have the same signature, so the superclass
implementation can be substituted without breaking child class
deserialization (provided this is the constructor used by the
child class).
3. There is no serialVersionUID.
4. Child class Serializable2 implementations can extend a superclass
without a zero arg constructor that doesn't itself implement
Serializable2.
5. Child classes that do not override writeObject will not be
serialized, so can effectively opt out.
6. Because implementations are required to implement public methods,
there is no "Magic".
7. Serializable2 shouldn't extend Serializable, allowing classes to
implement both interfaces for a period of time (for that reason
the signature for readObjectNoData may need to be changed for
Serializable2).
8. ObjectInputStream and ObjectOutputStream can be extended to
support both implementations for compatibility, however
alternative stream implementations would be preferable for
Serializable2 to avoid Serializable security issues. The new
implementations should be possible to substitute because both
types would use the same Stream Protocol, provided the classes
being deserialized implement Serializable2.
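
Point 1 of the list above (omitted fields arrive as null) might be handled
as in the following sketch. The map-backed SerialForm is a reduced stand-in
with assumed put/get methods, and Account, its "currency" entry and the
"unspecified" default are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class EvolutionDemo {
    // Assumed stand-in for SerialForm, reduced to a name-keyed map.
    static final class SerialForm {
        private final Map<String, Object> entries = new HashMap<>();

        void put(String name, Object value) { entries.put(name, value); }

        <T> T get(String name, Class<T> type) {
            return type.cast(entries.get(name)); // null if never written
        }
    }

    static final class Account {
        private final String owner;
        private final String currency; // entry added by a later class version

        Account(SerialForm serial) {
            String o = serial.get("owner", String.class);
            if (o == null) {
                throw new IllegalArgumentException("owner is required");
            }
            String c = serial.get("currency", String.class);
            this.owner = o;
            // Accept a null reference so forms written by older
            // implementations still deserialize.
            this.currency = (c != null) ? c : "unspecified";
        }

        void writeObject(SerialForm serial) {
            serial.put("owner", owner);
            serial.put("currency", currency);
        }

        String owner() { return owner; }
        String currency() { return currency; }
    }
}
```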
My reasoning for retaining readObjectNoData() and for updating field
entries in SerialForm that contain circular relations after
construction, is:
1. An object reference for the object currently being deserialized
can be passed to another object's constructor (via a SerialForm
instance) after the current Object's constructor completes,
allowing safe publication of final field freezes that occur at the
end of construction.
2. When the Serialization2 Framework becomes aware of an object that
contains a circular relationship while that object is in the
process of being deserialized, the second object will not be
instantiated until after the constructor of the first object in
the relationship completes. Data read in from the stream can be
stored in a SerialForm without requiring object instantiation.
3. After construction completes, the object that has just been
deserialized can retain a copy of its SerialForm and look up the
field containing a circular relationship. The Serialization
framework will update its SerialForm with the new object that
holds a circular relationship, prior to calling readObjectNoData()
on the first object.
4. If the developer of the implementing class is not aware of the
possibility of a circular relationship, then the worst consequence
is a field will be set to null during construction, "this" will
not escape.
5. The second Object holding a link to an object that appears earlier
in the stream, may not be aware that the object it holds a
reference to also needs a reference to it. The first object will
not obtain a reference to the second until both Object
constructors have completed. The second object may not need to
implement readObjectNoData().
6. readObjectNoData() needs to be called on every class belonging to
a single Object's inheritance hierarchy, when defined, after all
constructors have completed. It should be called in the order of
superclass to child class.
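
The ordering in points 1 to 5 can be sketched as below. Everything here is an
assumption made for illustration: the string-keyed SerialForm, the Node and
Leaf classes, and the deserialize() method that plays the role of the
framework. A real framework would need privileged or reflective access to the
private readObjectNoData(); the sketch avoids that by nesting all classes in
one outer class.

```java
import java.io.InvalidObjectException;
import java.util.HashMap;
import java.util.Map;

final class CircularDemo {
    // Assumed stand-in for SerialForm, reduced to a name-keyed map.
    static final class SerialForm {
        private final Map<String, Object> entries = new HashMap<>();

        void put(String name, Object value) { entries.put(name, value); }

        <T> T get(String name, Class<T> type) {
            return type.cast(entries.get(name));
        }
    }

    // First object in the stream; it needs a reference back from Leaf.
    static final class Node {
        private final String label;
        private final SerialForm serial; // retained for the later lookup
        private volatile Leaf leaf;      // circular field, so it cannot be final

        Node(SerialForm serial) {
            this.label = serial.get("label", String.class);
            this.serial = serial;
            // The circular entry is not populated yet, so this reads null.
            this.leaf = serial.get("leaf", Leaf.class);
        }

        private void readObjectNoData() throws InvalidObjectException {
            Leaf l = serial.get("leaf", Leaf.class);
            if (l == null) throw new InvalidObjectException("leaf missing");
            leaf = l;
        }

        String label() { return label; }
        Leaf leaf() { return leaf; }
    }

    // Second object; it only sees a fully constructed Node.
    static final class Leaf {
        private final Node owner;

        Leaf(SerialForm serial) {
            owner = serial.get("owner", Node.class);
        }

        Node owner() { return owner; }
    }

    // Plays the role of the framework for one Node/Leaf pair.
    static Node deserialize() throws InvalidObjectException {
        SerialForm nodeForm = new SerialForm();
        nodeForm.put("label", "root");
        Node node = new Node(nodeForm); // first constructor completes

        SerialForm leafForm = new SerialForm();
        leafForm.put("owner", node);    // constructed node handed to Leaf
        Leaf leaf = new Leaf(leafForm);

        nodeForm.put("leaf", leaf);     // update the first object's SerialForm
        node.readObjectNoData();        // then let it wire the back-reference
        return node;
    }
}
```

In this sketch "this" never escapes from either constructor, and Leaf needs
no readObjectNoData() because its reference to Node is available when it is
built.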
Thoughts?
Regards,
Peter.
On 10/08/2014 3:20 AM, Brian Goetz wrote:
>> I've noticed there's not much interest in improving Serialization on
>> these lists. This makes me wonder if java Serialization has lost
>> relevance in recent years with the rise of protocol buffers apache
>> thrift and other means of data transfer over byte streams.
>
> I sense your frustration, but I think you may be reaching the wrong
> conclusion. The lack of response is probably not evidence that
> there's no interest in fixing serialization; it's that fixing
> serialization, with all the constraints that "fix" entails, is just
> really really hard, and it's much easier to complain about it (and even
> say "let's just get rid of it") than to fix it.
>
>> Should Serializable eventually be deprecated? Should Serialization be
>> disabled by default? Should a new mechanism be developed? If a new
>> mechanism is developed, what about circular object relationships?
>
> As I delved into my own explorations of serialization, I started to
> realize why such a horrible approach was the one that was ultimately
> chosen; while serialization is horrible and awful and leaky and
> insecure and complex and brittle, it does address problems like cyclic
> data structures and independent evolution of subclass and superclass
> better than the "clean" models.
>
> My conclusion is, at best, a new mechanism would have to live
> side-by-side with the old one, since it could only handle 95% of the
> cases. It might handle those 95% much better -- more cleanly,
> securely, and allowing easier schema evolution -- but the hard cases
> are still there. Still, reducing the use of the horrible old
> mechanism may still be a worthy goal, even if it can't be killed
> outright.
>