A hole in the serialization spec

Stuart Marks stuart.marks at oracle.com
Mon Feb 17 07:17:31 UTC 2014


On 2/14/14 9:43 AM, David M. Lloyd wrote:
> On 02/14/2014 09:56 AM, David M. Lloyd wrote:
>> In the JDK, java.util.Date does not read/write fields.  Perhaps others
>> as well.  Given that the behavior is presently undefined, that means the
>> serialized representation of java.util.Date (and any other such
>> non-conforming classes) are also undefined.
>
> An interesting detail here - since Date doesn't have any non-transient fields,
> this happens to work out OK for a second reason (that defaultReadFields() would
> read nothing anyway) - however it still would break if a non-transient field
> were to be added.

Hi David,

(coming late to this party)

Thanks for pointing out these clauses in the serialization specification. I 
always knew that these methods "should" behave this way but I was unaware of the 
undefined qualification in the spec, and I was also unaware that even JDK 
classes like java.util.Date have readObject/writeObject methods that don't 
fulfil this requirement.

I also think you're right that these problems are widespread. A recent blog post 
on serialization [1] has some sample code whose readObject/writeObject methods 
don't fulfil this requirement either.

On the other hand, this requirement doesn't seem to appear in the javadoc 
anyplace that I can find. The class doc for java.io.Serializable is the most 
explicit, and it says,

     The writeObject method is responsible for writing the state of the
     object for its particular class so that the corresponding readObject
     method can restore it. The default mechanism for saving the Object's
     fields can be invoked by calling out.defaultWriteObject. The method
     does not need to concern itself with the state belonging to its
     superclasses or subclasses. State is saved by writing the individual
     fields to the ObjectOutputStream using the writeObject method or by
     using the methods for primitive data types supported by DataOutput.

     The readObject method is responsible for reading from the stream and
     restoring the classes fields. It may call in.defaultReadObject to
     invoke the default mechanism for restoring the object's non-static
     and non-transient fields. The defaultReadObject method uses
     information in the stream to assign the fields of the object saved
     in the stream with the correspondingly named fields in the current
     object. This handles the case when the class has evolved to add new
     fields. The method does not need to concern itself with the state
     belonging to its superclasses or subclasses. State is saved by
     writing the individual fields to the ObjectOutputStream using the
     writeObject method or by using the methods for primitive data types
     supported by DataOutput.

The wording here seems to imply that calling defaultWriteObject and 
defaultReadObject is optional.
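
For comparison, here is a minimal sketch of the pattern the specification seems to ask for: call defaultWriteObject/defaultReadObject first, then handle any extra data. The Reading and RoundTrip classes and their members are hypothetical names made up for illustration; they are not taken from the JDK or from the blog post.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class, for illustration only.
class Reading implements Serializable {
    private static final long serialVersionUID = 1L;

    String label;              // non-transient: handled by defaultWriteObject
    transient double celsius;  // transient: written by hand as extra data

    Reading(String label, double celsius) {
        this.label = label;
        this.celsius = celsius;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();  // field data first
        out.writeDouble(celsius);  // then the class-specific extra data
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();    // mirror of defaultWriteObject
        celsius = in.readDouble(); // read the extra data in the same order
    }
}

// Hypothetical helper that serializes an object to memory and reads it back.
class RoundTrip {
    static Object copy(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Reading copy = (Reading) copy(new Reading("lab", 21.5));
        System.out.println(copy.label + " " + copy.celsius);
    }
}
```

Both the default field data and the hand-written value survive the round trip here.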

It does look like the various bits of the specification could use some cleanup.

In your initial post, you said that problems with the serialization 
specification have caused user problems. Can you be more specific about 
what these problems were?

In another message earlier in this thread, you had made a few suggestions:

> 1) do nothing :(
> 2) start throwing (or writing) an exception in write/readObject when stream ops are performed without reading fields (maybe can be disabled with a sys prop or something)
> 3) leave fields cleared and risk protocol issues
> 4) silently start reading/writing empty field information (risks protocol issues)

I'd have to say that #2 is pretty close to a non-starter. Since the problem does 
appear to be widespread, a lot of software would start suffering this exception 
even if it otherwise seems to be behaving correctly. This is clearly a big 
behavioral incompatibility, and even if it could be mitigated with a system 
property, I'd question whether it was worthwhile.

#4 also seems to be a fairly large incompatibility. If a class's writeObject 
method is missing a defaultWriteObject call, it has a fairly stable behavior, 
although one that's defined by the implementation as opposed to the 
specification. (Although the specification isn't self-consistent, per the 
above.) Silently changing the bytes emitted in these cases would certainly cause 
incompatibilities with existing readObject methods that are unprepared to deal 
with them.
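
To make the case concrete, here is a rough sketch of a class whose writeObject/readObject methods skip the default calls entirely, loosely modelled on the java.util.Date situation described earlier in the thread. MillisStamp and StampDemo are hypothetical names, and the fact that this round-trips is a property of current implementations as far as I can tell, not something the specification promises.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class, for illustration only.
class MillisStamp implements Serializable {
    private static final long serialVersionUID = 1L;

    transient long millis;  // the only state; there are no non-transient fields

    MillisStamp(long millis) {
        this.millis = millis;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        // No defaultWriteObject() or writeFields() call here:
        // this is the case the specification leaves undefined.
        out.writeLong(millis);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        // Likewise no defaultReadObject() or readFields() call.
        millis = in.readLong();
    }
}

// Hypothetical helper that serializes an object to memory and reads it back.
class StampDemo {
    static Object copy(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MillisStamp copy = (MillisStamp) copy(new MillisStamp(1234567890123L));
        System.out.println(copy.millis);
    }
}
```

Existing readers of such a class expect exactly one long in the class's data, which is why silently adding field information to the stream could trip them up.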

#3 leads me to mention another area of the serialization specification that *is* 
well-defined, which is what occurs if fields are added to or removed from a class 
from one version to the next. This is covered in sections 5.6.1 and 5.6.2 of the 
spec. [2] Briefly, if the current object has fields for which values are not 
present in the serialization stream, those fields are initialized to their 
default values (zero, null, false). Does this have any bearing on the issues 
you're concerned about? (It doesn't say so very explicitly, but field data that 
appears in the serialized form is ignored if there is no corresponding field in 
the current class.)

Finally, another suggestion that might help with these issues is not to change 
the JDK, but to use static analysis tools such as FindBugs to help programmers 
identify problem code.

s'marks


[1] http://marxsoftware.blogspot.com/2014/02/serializing-java-objects-with-non.html

[2] http://docs.oracle.com/javase/7/docs/platform/serialization/spec/version.html#5172



