A hole in the serialization spec

David M. Lloyd david.lloyd at redhat.com
Mon Feb 17 15:36:19 UTC 2014


On 02/17/2014 01:17 AM, Stuart Marks wrote:
> On 2/14/14 9:43 AM, David M. Lloyd wrote:
>> On 02/14/2014 09:56 AM, David M. Lloyd wrote:
>>> In the JDK, java.util.Date does not read/write fields.  Perhaps others
>>> as well.  Given that the behavior is presently undefined, that means the
>>> serialized representation of java.util.Date (and any other such
>>> non-conforming classes) are also undefined.
>>
>> An interesting detail here - since Date doesn't have any non-transient
>> fields,
>> this happens to work out OK for a second reason (that
>> defaultReadFields() would
>> read nothing anyway) - however it still would break if a non-transient
>> field
>> were to be added.
>
> Hi David,
>
> (coming late to this party)

Thanks for replying, comments below.

> Thanks for pointing out these clauses in the serialization
> specification. I always knew that these methods "should" behave this way
> but I was unaware of the undefined qualification in the spec, and I was
> also unaware that even JDK classes like java.util.Date have
> readObject/writeObject methods that don't fulfil this requirement.

That's the case that blew up for us first, so that's how I knew about 
it.  For a little background, we (JBoss) have a custom serialization 
wire protocol which nevertheless follows the serialization spec as 
exactly as possible from an API point of view.  When I was studying the 
specification, I noted this bit and decided our action would be the #2 
option (throw an exception when this happens), which effectively let me 
decide what to do later.  This was when we discovered that 
java.util.Date has this issue.

However I noted at the time that since this class has no non-transient 
fields, allowing this case is actually a relatively low risk.  So we 
began allowing classes with no fields to skip the put/getFields step, 
while still throwing an exception if there are fields defined (at either 
end).  This got us past the java.util.Date case (which is a particularly 
problematic case given how pervasive it appears to be in user code, 
particularly Java EE code).

Recently I "fixed" the remaining cases in our implementation by doing 
the read/discard stuff I describe in the #4 option.  I had realized that 
doing so is backwards compatible with what we had been doing (basically 
replacing the exception with "safe" behavior).  This works out very well 
for us, as you can imagine. :)

The problem with doing this in the JDK is, as you know, that there 
are/will be serialized blobs out there that would not include field 
information that would then be deserialized with implementations that 
expect it, and vice-versa.  I wonder if perhaps it would be a good idea 
to make this behavior configurable though, for cases where both ends are 
under the same control.  Or, perhaps it could be a protocol version 
thing (though IIRC there isn't a great facility for 
forwards-compatibility in the event that an "old" deserializer reads a 
"new" stream - how was this done for enums I wonder?).

> I also think you're right that these problems are widespread. A recent
> blog post on serialization [1] has some sample code whose
> readObject/writeObject methods don't fulfil this requirement either.

Yeah I wouldn't be surprised.  We recently encountered at least one 
class in one of the Bouncy Castle security provider packages as well 
with this issue - except that class has fields.

> On the other hand, this requirement doesn't seem to appear in the
> javadoc anyplace that I can find.

Yeah this is quite unfortunate.

> In your initial post, you said that problems with the serialization
> specification that have caused user problems. Can you be more specific
> about what these problems were?

As I outlined above.  We want to provide a performant, and more 
importantly, a safe experience for users; our custom protocol enabled 
the former and our conservative behavior allowed the latter.  We have 
what turns out to be a quite good solution for our own case, but this 
doesn't help the general community much.

> In another message earlier in this thread, you had made a few suggestions:
>
>> 1) do nothing :(
>> 2) start throwing (or writing) an exception in write/readObject when
>> stream ops are performed without reading fields (maybe can be disabled
>> with a sys prop or something)
>> 3) leave fields cleared and risk protocol issues
>> 4) silently start reading/writing empty field information (risks
>> protocol issues)
>
[trim response for #2 and #4 which I agree with]
>
> #3 leads me to mention another area of the serialization specification
> that *is* well-defined, which is what occurs if fields are added or
> removed from one object version to the next. This is covered in sections
> 5.6.1 and 5.6.2 of the spec. [2] Briefly, if the current object has
> fields for which values are not present in the serialization stream,
> those fields are initialized to their default values (zero, null,
> false). Does this have any bearing on the issues you're concerned about?
> (It doesn't say so very explicitly, but field data that appears in the
> serialized form is ignored if there is no corresponding field in the
> current class.)

Yeah this is consistent with my understanding of the specification.  I 
think it is also implied in 3.4, "If the class being restored is not 
present in the stream being read, then its readObjectNoData method, if 
defined, is invoked (instead of readObject); otherwise, its fields are 
initialized to the appropriate default values. For further detail, see 
section 3.5."

Wire protocol aside, this requirement is what led to my #4 proposal: 
from an API perspective, the behavior is no different.  The fields are 
initialized to default zero values.  But it enables users to add get/put 
field operations later on without protocol corruption.

Of course this change would be essentially protocol-incompatible with 
the current Oracle/OpenJDK implementation, as was pointed out already in 
various ways, unless as I said there is some clever way that I can't 
immediately see to let this work.

> Finally, another suggestion that might help with these issues is not to
> change the JDK, but to use static analysis tools such as FindBugs to
> help programmers identify problem code.

Yeah, this is basically the status quo I think.  It would however be 
extra nice if the compiler could emit a warning at the least, instead of 
relying on external static analysis tools, given how central 
serialization is.  The necessary static analysis should be relatively 
simple, from what little I recall of the compiler internals.

One possible bad side-effect would be that people might "fix" their 
broken classes without realizing they've just broken their wire 
compatibility though.  Such a tool should probably emit a message to the 
effect of "classes without OOS.dWO/wF should only have transient fields" 
instead of "this class should have a OOS.dWO/wF call".  While adding 
transient to fields of such classes would technically change the 
serialized form of the class, that form is never used in any event, so 
it's an "okay" change, and decreases the serialized class descriptor 
size besides... and in the long term may even allow adding a dWO/dRO 
pair in the future, depending on the compatibility needs of the user for 
that class.

-- 
- DML



More information about the core-libs-dev mailing list