Serialization problem
Osvaldo Doederlein
opinali at gmail.com
Sun Jan 31 17:42:30 UTC 2010
It's sad to see this issue of serialization vs. final resurface so many
times. I have complained about this myself a number of times. The 'final'
modifier is counter-intuitive as it doesn't really prohibit modification
(most developers don't know that even a 'private static final' field can be
updated by reflection or JNI, as explicitly allowed by the JLS). On top of
that, Serialization was introduced without enough care for final fields, so
we are effectively forced to drop 'final' for fields that require custom
deserialization. Now these problems will bite us much more often because
the immutable-object technique is being increasingly adopted, sometimes by
whole libraries, or even more radically by newer languages-for-the-JVM like
Clojure (and I guess these languages would love to translate their
semantically-immutable types into immutable JVM-level classes whenever
possible, e.g. when mutable state is not introduced by the compiler as
optimization around extra allocations).
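As a rough illustration of the point that 'final' does not really prohibit modification, here is a minimal sketch that overwrites a final instance field through reflection. The class and method names (FinalFieldDemo, Point, overwriteFinal) are made up for this example. The behavior depends on the JDK in use: newer releases may warn about or restrict this kind of write, so treat the sketch as a description of the long-standing default rather than a guarantee.

```java
import java.lang.reflect.Field;

public class FinalFieldDemo {

    // Hypothetical immutable-looking class used only for this demonstration.
    static final class Point {
        private final int x;

        Point(int x) {
            this.x = x; // assigned in the constructor, so not a compile-time constant
        }

        int x() {
            return x;
        }
    }

    /** Overwrites the final field reflectively and returns the value read back reflectively. */
    static int overwriteFinal(int initial, int replacement) {
        Point p = new Point(initial);
        try {
            Field f = Point.class.getDeclaredField("x");
            f.setAccessible(true);    // suppresses access checks, which also permits writes to final instance fields
            f.setInt(p, replacement); // the "final" field now holds a different value
            return f.getInt(p);       // read back through reflection to avoid relying on JIT behavior
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(overwriteFinal(1, 2));
    }
}
```

Reading the value back through the same Field object is deliberate: ordinary reads of a final field may be cached or folded by the compiler or JIT, which is exactly why such writes are fragile.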
My suggestion (big one I know, perhaps an idea for Java 8...) - add some
mechanism (annotation, type modifier, etc.) that allows us to fix/strengthen
the semantics of final and serialization (and more? some suggestions
below...), as follows:
1) Final fields are guaranteed immutable, forever, after construction. They
cannot be changed by magic reflection calls from trusted classes, or even by
JNI calls. (If it's too expensive to write-protect against arbitrary JNI
code, spec the result as 'undefined behavior', and perform the very
expensive check only with -Xcheck:jni.)
2) readObject() and other deserialization helpers can update final fields.
The "final freeze" is defined to happen only after deserialization is
complete. (If readObject() invokes some helper methods, even if these
methods are in the same class and only used by deserialization, they cannot
assign to final fields; no need to make this check complex.)
3) In final fields of array type, the array elements are also immutable
after the final-freeze. (I know this creates some challenges. Within the
same class, maybe we can make enough effort to ensure that the array field is
not assigned/aliased/reflected in a way that would allow modification to
elements. Then we can just decree that the array cannot escape its class;
so if one wants a getter for the array, it's mandatory to return a copy of
it. This technique - defensive copying - is already a best-practice, so it's
not really extra cost. And the JIT can always eliminate copying in cases
covered by Escape Analysis, e.g. in println(obj.getStuff()[5]), it's trivial
to see that the array copy performed by getStuff() can be avoided at this
particular callsite.)
4) If the class is Serializable, providing serialVersionUID is mandatory.
5) If the class overrides hashCode(), it must override equals() and
vice-versa.
6) If the class (or some of its superclasses except Object) doesn't override
hashCode(), a call to hashCode() throws an exception; that is, relying on
Object.hashCode() is banned.
7) Some interaction with JSR-305 (enforcing the semantics of its annotations
further - a real pluggable typesystem)?
8..) More?
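For comparison, here is a sketch of how several of these practices can be approximated by hand today, without the proposed language support. It is not part of the proposal: it uses the serialization proxy pattern (writeReplace/readResolve) so that all fields stay final and deserialization goes through the constructor, plus defensive copying of the array, an explicit serialVersionUID, and paired equals()/hashCode(). The names Samples, SerializedForm and roundTrip are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public final class Samples implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    private final int[] values; // never handed out directly
    private final int hash;     // derived cache, not part of the serialized form

    public Samples(String name, int[] values) {
        this.name = name;
        this.values = values.clone(); // defensive copy on the way in
        this.hash = 31 * name.hashCode() + Arrays.hashCode(this.values);
    }

    public String name() { return name; }

    public int[] values() { return values.clone(); } // defensive copy on the way out

    @Override public boolean equals(Object o) {
        if (!(o instanceof Samples)) return false;
        Samples other = (Samples) o;
        return name.equals(other.name) && Arrays.equals(values, other.values);
    }

    @Override public int hashCode() { return hash; }

    // Serialize a small proxy instead of this object.
    private Object writeReplace() { return new SerializedForm(name, values); }

    // Reject streams that try to bypass the proxy.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("serialization proxy required");
    }

    private static final class SerializedForm implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;
        private final int[] values;

        SerializedForm(String name, int[] values) {
            this.name = name;
            this.values = values;
        }

        // Rebuild through the public constructor, so final fields and the cache are set normally.
        private Object readResolve() { return new Samples(name, values); }
    }

    /** Serializes and deserializes in memory; only here to exercise the sketch. */
    static Samples roundTrip(Samples s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(s);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Samples) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The cost is the extra nested class and the copies, which is the kind of boilerplate the proposal above would like the platform to make unnecessary.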
The general idea is enforcing as much "Modern POJO Best-Practices" as
possible (without requiring extra code, so I don't propose things such as
mandating hashCode/equals to be overridden). This enforcement should be
hard-line, with detection of noncompliance at both runtime and (when
possible) compile-time. It should be robust enough (no possible
circumvention) so the security system could rely on it to enforce security
concerns without extra runtime checks, and the JIT optimizer could rely on
it to enable aggressive optimizations.
A+
Osvaldo
2010/1/31 Alan Bateman <Alan.Bateman at sun.com>
> Stephen Colebourne wrote:
>
>> I thought I'd raise an issue with serialization that I've had a problem
>> with more than once. Perhaps there is an obvious easy solution, but I can't
>> see it (I can see hard workarounds...)
>>
>> In JSR-310 we have lots of immutable classes. One of these stores four
>> fields:
>>
>> private final String name
>> private final Duration duration
>> private final List<PeriodField> periods
>> private final int hashCode
>>
>> For serialization, I only need to store the name, duration and element
>> zero from the periods list. (The rest of the period list is a cache derived
>> from the first element. Similarly, I want to cache the hash code in the
>> constructor as this could be performance critical.). Storing just these
>> fields can be done easily using writeObject()
>>
> In the JDK there are places that use Unsafe's putObjectVolatile to
> work around this. It's also possible to use reflection hacks in some cases.
> There is more discussion here:
> http://bugs.sun.com/view_bug.do?bug_id=6379948
>
> Doug Lea and the concurrency group were working on a Fences API that
> included a method for safe publication so that one can get the same effects
> as final for cases where it's not possible to declare a field as final.
>
> For the hashCode case above then perhaps it isn't necessary to compute
> the hash code in the constructor or when reconstituting the object. Instead
> perhaps the hashCode method could compute and set the hashCode field when it
> sees the value is 0 (no need to be volatile and shouldn't matter if more
> than one thread computes it).
>
> -Alan.
>
>
>
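The lazy hash code idea quoted above can be sketched roughly as follows. This is an illustration under assumptions, not code from JSR-310: the class name PeriodRule is invented, and a plain long stands in for the Duration field. The cache field is deliberately neither final nor volatile, so it needs no special handling during deserialization.

```java
public final class PeriodRule {
    private final String name;
    private final long seconds; // stand-in for the Duration field in the quoted example
    private int hash;           // 0 means "not computed yet"

    public PeriodRule(String name, long seconds) {
        this.name = name;
        this.seconds = seconds;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof PeriodRule)) return false;
        PeriodRule other = (PeriodRule) o;
        return seconds == other.seconds && name.equals(other.name);
    }

    @Override public int hashCode() {
        int h = hash; // read the field once into a local
        if (h == 0) {
            h = 31 * name.hashCode() + Long.hashCode(seconds);
            hash = h; // benign race: every thread computes the same value from final fields
        }
        return h;
    }
}
```

java.lang.String caches its hash code in the same style. One wrinkle is that a value whose hash happens to be 0 is recomputed on every call, which is harmless but not free.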