A hole in the serialization spec
David M. Lloyd
david.lloyd at redhat.com
Mon Feb 17 15:36:19 UTC 2014
On 02/17/2014 01:17 AM, Stuart Marks wrote:
> On 2/14/14 9:43 AM, David M. Lloyd wrote:
>> On 02/14/2014 09:56 AM, David M. Lloyd wrote:
>>> In the JDK, java.util.Date does not read/write fields. Perhaps others
>>> as well. Given that the behavior is presently undefined, that means the
>>> serialized representation of java.util.Date (and any other such
>>> non-conforming classes) are also undefined.
>>
>> An interesting detail here - since Date doesn't have any non-transient
>> fields,
>> this happens to work out OK for a second reason (that
>> defaultReadFields() would
>> read nothing anyway) - however it still would break if a non-transient
>> field
>> were to be added.
>
> Hi David,
>
> (coming late to this party)
Thanks for replying, comments below.
> Thanks for pointing out these clauses in the serialization
> specification. I always knew that these methods "should" behave this way
> but I was unaware of the undefined qualification in the spec, and I was
> also unaware that even JDK classes like java.util.Date have
> readObject/writeObject methods that don't fulfil this requirement.
That's the case that blew up for us first, so that's how I knew about
it. For a little background, we (JBoss) have a custom serialization
wire protocol which nevertheless follows the serialization spec as
exactly as possible from an API point of view. When I was studying the
specification, I noted this bit and decided our action would be the #2
option (throw an exception when this happens), which effectively let me
decide what to do later. This was when we discovered that
java.util.Date has this issue.
However I noted at the time that since this class has no non-transient
fields, allowing this case is actually a relatively low risk. So we
began allowing classes with no fields to skip the put/getFields step,
while still throwing an exception if there are fields defined (at either
end). This got us past the java.util.Date case (which is a particularly
problematic case given how pervasive it appears to be in user code,
particularly Java EE code).
Recently I "fixed" the remaining cases in our implementation by doing
the read/discard stuff I describe in the #4 option. I had realized that
doing so is backwards compatible with what we had been doing (basically
replacing the exception with "safe" behavior). This works out very well
for us, as you can imagine. :)
The problem with doing this in the JDK is, as you know, that there
are/will be serialized blobs out there that would not include field
information that would then be deserialized with implementations that
expect it, and vice-versa. I wonder if perhaps it would be a good idea
to make this behavior configurable though, for cases where both ends are
under the same control. Or, perhaps it could be a protocol version
thing (though IIRC there isn't a great facility for
forwards-compatibility in the event that an "old" deserializer reads a
"new" stream - how was this done for enums I wonder?).
> I also think you're right that these problems are widespread. A recent
> blog post on serialization [1] has some sample code whose
> readObject/writeObject methods don't fulfil this requirement either.
Yeah I wouldn't be surprised. We recently encountered at least one
class in one of the Bouncy Castle security provider packages as well
with this issue - except that class has fields.
> On the other hand, this requirement doesn't seem to appear in the
> javadoc anyplace that I can find.
Yeah this is quite unfortunate.
> In your initial post, you said that problems with the serialization
> specification that have caused user problems. Can you be more specific
> about what these problems were?
As I outlined above. We want to provide a performant, and more
importantly, a safe experience for users; our custom protocol enabled
the former and our conservative behavior allowed the latter. We have
what turns out to be a quite good solution for our own case, but this
doesn't help the general community much.
> In another message earlier in this thread, you had made a few suggestions:
>
>> 1) do nothing :(
>> 2) start throwing (or writing) an exception in write/readObject when
>> stream ops are performed without reading fields (maybe can be disabled
>> with a sys prop or something)
>> 3) leave fields cleared and risk protocol issues
>> 4) silently start reading/writing empty field information (risks
>> protocol issues)
>
[trim response for #2 and #4 which I agree with]
>
> #3 leads me to mention another area of the serialization specification
> that *is* well-defined, which is what occurs if fields are added or
> removed from one object version to the next. This is covered in sections
> 5.6.1 and 5.6.2 of the spec. [2] Briefly, if the current object has
> fields for which values are not present in the serialization stream,
> those fields are initialized to their default values (zero, null,
> false). Does this have any bearing on the issues you're concerned about?
> (It doesn't say so very explicitly, but field data that appears in the
> serialized form is ignored if there is no corresponding field in the
> current class.)
Yeah this is consistent with my understanding of the specification. I
think it is also implied in 3.4, "If the class being restored is not
present in the stream being read, then its readObjectNoData method, if
defined, is invoked (instead of readObject); otherwise, its fields are
initialized to the appropriate default values. For further detail, see
section 3.5."
Wire protocol aside, this requirement is what led to my #4 proposal:
from an API perspective, the behavior is no different. The fields are
initialized to default zero values. But it enables users to add get/put
field operations later on without protocol corruption.
Of course this change would be essentially protocol-incompatible with
the current Oracle/OpenJDK implementation, as was pointed out already in
various ways, unless as I said there is some clever way that I can't
immediately see to let this work.
> Finally, another suggestion that might help with these issues is not to
> change the JDK, but to use static analysis tools such as FindBugs to
> help programmers identify problem code.
Yeah, this is basically the status quo I think. It would however be
extra nice if the compiler could emit a warning at the least, instead of
relying on external static analysis tools, given how central
serialization is. The necessary static analysis should be relatively
simple, from what little I recall of the compiler internals.
One possible bad side-effect would be that people might "fix" their
broken classes without realizing they've just broken their wire
compatibility though. Such a tool should probably emit a message to the
effect of "classes without OOS.dWO/wF should only have transient fields"
instead of "this class should have a OOS.dWO/wF call". While adding
transient to fields of such classes would technically change the
serialized form of the class, that form is never used in any event, so
it's an "okay" change, and decreases the serialized class descriptor
size besides... and in the long term may even allow adding a dWO/dRO
pair in the future, depending on the compatibility needs of the user for
that class.
--
- DML
More information about the core-libs-dev
mailing list