A hole in the serialization spec

David M. Lloyd david.lloyd at redhat.com
Mon Feb 17 15:50:47 UTC 2014


On 02/17/2014 05:42 AM, Chris Hegarty wrote:
> Let me try to summarize the issue, as I see it.
>
> Circling back to David's original quote from the Serialization spec [1]
>
> "The class's writeObject method, if implemented, is responsible for
> saving the state of the class. Either ObjectOutputStream's
> defaultWriteObject or writeFields method must be called once (and only
> once) before writing any optional data that will be needed by the
> corresponding readObject method to restore the state of the object; even
> if no optional data is written, defaultWriteObject or writeFields must
> still be invoked once. If defaultWriteObject or writeFields is not
> invoked once prior to the writing of optional data (if any), then the
> behavior of instance deserialization i_s undefined in cases where the
> ObjectInputStream cannot resolve the class which defined the writeObject
> method in question._"
>
> The underlying section above is most relevant. It is a qualification of
> the scenario where the behavior is undefined. I read it to mean; the
> behavior is undefined if, and only if, the OIS cannot resolve the class
> which defined the writeObject.  And this seems in line with David's
> description [2] (which I agree).

I think I see what you're getting at, or rather, what you're trying to 
get out of it.  You are trying to say that not persisting fields is only 
undefined if the other side's class goes away, based on §2.3. 
Unfortunately I don't think we can get away with this interpretation, as 
the read side of the specification at §3.4 isn't any less valid and it 
flat-out says that not reading fields is completely undefined, so having 
defined behavior on one end doesn't really help here.

But I think it does leave a little latitude for improvement.  If the 
current behavior could be specified, this will not affect any existing 
code, but will better inform those users who read this specification. 
In particular, making it clear that skipping these methods is 
semantically equivalent to writing a class with no non-transient fields 
would be a particularly useful clarification; I am coming to think that 
any class which does this should only have transient fields as this 
allows the default*/get/putFields methods to be added in later without a 
risk to wire compatibility.

With the specification clarified in this regard, it becomes possible to 
do two important things:

1) The API JavaDoc can be updated to be not only completely consistent 
with the specification, but also with the long-standing behavior of (at 
least) Sun and Oracle JDKs and OpenJDK, and also (I believe) other JDKs.

2) It allows all implementations to be able to guarantee 
interoperability for cases which are presently undefined yet are 
definitely pervasive, even within the JDK itself.

> "I think the specifics of the quote relate to this kind of class change;
> in particular, if a class is deleted from the hierarchy on the read
> side, and that class corresponds to the class that had the misbehaving
> writeObject, I suspect that things will break at that point as the read
> side will probably try to consume and discard the field information for
> that class, which will be missing (it will start reading the next class'
> fields instead I think)."
>
> My take on this is that the above writeObject undefined qualification is
> referring to a compatibility issue. Since removing a class from the
> hierarchy is a compatible change [3], then default read/write
> Object/Fields must be called, otherwise, if a class is removed from the
> hierarchy the behavior is undefined. In my testing I get
> StreamCorruptException, but I can see how this could behave differently,
> depending on the class hierarchy and actual serialization state.
>
> If the class defining the writeObject is resolvable, then the behavior
> is *not* undefined.
>
> Do we agree on what is actually undefined, and what is not?

Also, if the class which did not write fields is not resolvable, but its 
descriptor defines no fields anyway (either they're all transient or 
there is an empty serialPersistentFields defined), then the behavior is 
still predictable, "safe", and well-understood, though still technically 
undefined by spec.

Honestly I'm coming to the conclusion that the spec should just be 
updated to match the existing behavior.  At least this way, it is 
possible to establish and explain exactly what is happening and why, and 
what the effect is on the wire protocol, and how to cope with changes to 
the affected class hierarchies in a compatible way.

-- 
- DML



More information about the core-libs-dev mailing list