JEP 187: Serialization 2.0 & Serialization-aware constructors

Thu Jan 23 15:18:45 UTC 2014

Hi David,

This is a nice summary of how object deserialization works today, and it raises some interesting ideas around serialization-aware constructors.

There seems to be too much magic in the construction of deserialized objects. All the field values required to fully construct the object are known, or are in the stream, but the instance is constructed without taking them into account, and the fields are stuffed after the fact. If these values were made available to a “special” deserializer, the class being reconstructed would have the possibility of constructing itself in a more normal manner (assuming constructor cooperation up the class hierarchy).
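
To make the "stuffing after the fact" concrete, here is a minimal sketch of today's behaviour. The class names (BypassDemo, Base, Point) and the counters are mine, purely for illustration. On deserialization the no-arg constructor of the non-serializable superclass runs, but the validating constructor of the serializable class does not, even though the final field ends up populated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BypassDemo {
    // Non-serializable superclass: its no-arg constructor runs on deserialization.
    static class Base {
        static int baseCtorCalls = 0;
        Base() { baseCtorCalls++; }
    }

    // Serializable subclass: its constructor (and its validation) is skipped on deserialization.
    static class Point extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        static int pointCtorCalls = 0;
        final int x;

        Point(int x) {
            if (x < 0) throw new IllegalArgumentException("x must be >= 0");
            pointCtorCalls++;
            this.x = x;
        }
    }

    // Serialize to a byte array and read the object straight back.
    static Point roundTrip(Point p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Point) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = roundTrip(new Point(7));
        System.out.println("x = " + copy.x);
        System.out.println("Base constructor calls:  " + Base.baseCtorCalls);
        System.out.println("Point constructor calls: " + Point.pointCtorCalls);
    }
}
```

Run once, this should report two Base constructor calls (one for the original, one for the copy) but only one Point constructor call, with the final field x still set to 7 on the copy.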

-Chris. 

On 22 Jan 2014, at 16:33, David M. Lloyd <david.lloyd at redhat.com> wrote:

> On 01/13/2014 06:26 PM, mark.reinhold at oracle.com wrote:
>> Posted: http://openjdk.java.net/jeps/187
> 
> The concept of explicit deserialization constructors is interesting and is something I've explored a little bit in the context of JBoss Marshalling.
> 
> The way construction works today (simple version!), the framework will magic up a new Constructor instance which can construct a partially-initialized object.  By "partially initialized" I mean that only the classes in the non-serializable "top half" of the class hierarchy are initialized, subclass-first as always.  At this point it relies on the language constraints that require that the superclass be initialized as the first step (more or less) of construction, thus effectively reversing initialization order to be superclass-first.
> 
> Now at this point there is an object that was (more or less) initialized from the top (Object) down to the last non-serializable class in the hierarchy (which is often also Object, as it happens).  From here, the deserialization mechanism takes over, using stream information to acquire values and "stuff" them into fields (even final fields) using reflection, in superclass-first order.  Some reflection magic makes sure that final field publication works more or less as expected; some other magic ensures that sensible action is taken for certain types of differences in the sending and receiving class structures.  No initializers are ever invoked for these classes, though you can define a private readObject() method which is a close approximation (as long as you don't have final fields, else you're stuck using reflection too).
> 
> The idea with a serialization-aware constructor is that each serializable class constructor can read the stream information itself and initialize fields the "normal" way, with "normal" validation.
> 
> The simplest/most naive implementation of this is to simply pass in an ObjectInputStream to these constructors.  This approach seems to work fairly well actually, from the user's perspective: each constructor calls to the superclass first, then it acquires (for example) a GetField object for itself and then pulls field data out of it and populates its real fields, much like a readObject() method might do.
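
For reference, this is roughly what the readObject() flavour of that pattern looks like today with the existing GetField API. It is only a sketch: the Range class and its corruptForDemo() helper (which stands in for a tampered stream) are made up for illustration. A stream-taking constructor as described above would presumably do the same reads and checks, but could assign final fields directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class GetFieldDemo {
    static class Range implements Serializable {
        private static final long serialVersionUID = 1L;
        private int lo;   // non-final so that readObject() can assign them
        private int hi;

        Range(int lo, int hi) {
            if (lo > hi) throw new IllegalArgumentException("lo > hi");
            this.lo = lo;
            this.hi = hi;
        }

        int lo() { return lo; }
        int hi() { return hi; }

        // Illustration only: breaks the invariant, standing in for a tampered stream.
        void corruptForDemo() { lo = hi + 1; }

        // Pull the values out of the stream, validate, then populate the real fields.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            ObjectInputStream.GetField fields = in.readFields();
            int lo = fields.get("lo", 0);
            int hi = fields.get("hi", 0);
            if (lo > hi) throw new InvalidObjectException("lo > hi");
            this.lo = lo;
            this.hi = hi;
        }
    }

    // Serialize to a byte array and read the object straight back.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Range copy = (Range) roundTrip(new Range(1, 5));
        System.out.println(copy.lo() + ".." + copy.hi());
    }
}
```
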
> 
> The problem here is that the actual serialization implementation normally gets to hook in between calls to readObject(); it cannot do this for constructors, because each constructor calls the superclass' immediately in a chain.  The framework would have to examine the call stack to know who the actual caller is, and there is also the possibility that the constructors would abuse this contract in various ways, taking advantage of the framework's lack of control.
> 
> In an ideal world (for serialization implementations anyway), constructors would be wholly isolated, which would allow the framework to call each one in sequence with only its safely isolated bit of the stream.  But in the real world, this isn't really possible within the framework of the existing language.
> 
> One concept that might be interesting would be to introduce such isolated instance initializers which do not call up to the superclass but which otherwise follow the general constructor contract.  This would present a very simple solution from the perspective of serialization, though the complexity of such a solution is potentially great.
> 
> Another option is to establish a tighter API which constructors can consume.  The constructor would be able to read field information out of the API but only for its own class, possibly even enforced by call stack inspection.  The constructor would be contractually obligated to propagate the API object to the superclass; the framework would have to enforce that the propagation happened correctly for the class hierarchy (which it would have knowledge of), i.e. ensure the object didn't "cheat" by calling a non-serialization constructor for a serializable superclass.
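
A toy model of that second option, to show the shape of the contract. Everything here is hypothetical: FieldSource, fieldsFor(), construct() and the Animal/Dog/Cheater classes are invented for the sketch, no stream is involved, and the "enforcement" is a simple after-the-fact check that every class with data was consumed, rather than call stack inspection.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

public class FieldApiSketch {
    // Hypothetical API object handed down the constructor chain.
    static final class FieldSource {
        private final Map<Class<?>, Map<String, Object>> fieldsByClass;
        private final Set<Class<?>> consumed = new HashSet<>();

        FieldSource(Map<Class<?>, Map<String, Object>> fieldsByClass) {
            this.fieldsByClass = fieldsByClass;
        }

        // Hands out the fields of exactly one class and records that it asked.
        Map<String, Object> fieldsFor(Class<?> owner) {
            Map<String, Object> fields = fieldsByClass.get(owner);
            if (fields == null) throw new IllegalArgumentException("no data for " + owner);
            consumed.add(owner);
            return fields;
        }

        // Framework-side check: every class with data must have read it.
        void verifyAllConsumed() {
            if (!consumed.equals(fieldsByClass.keySet())) {
                throw new IllegalStateException("constructor chain skipped a class");
            }
        }
    }

    // Stand-in for the framework: run the constructor, then enforce propagation.
    static <T> T construct(FieldSource src, Function<FieldSource, T> ctor) {
        T obj = ctor.apply(src);
        src.verifyAllConsumed();
        return obj;
    }

    static class Animal {
        final String name;
        Animal(String name) { this.name = Objects.requireNonNull(name); }
        // Serialization-aware constructor reuses the normal validation.
        Animal(FieldSource src) { this((String) src.fieldsFor(Animal.class).get("name")); }
    }

    static class Dog extends Animal {
        final int legs;
        Dog(FieldSource src) {
            super(src);   // propagate the API object, as the contract requires
            int legs = (Integer) src.fieldsFor(Dog.class).get("legs");
            if (legs < 0) throw new IllegalArgumentException("legs must be >= 0");
            this.legs = legs;
        }
    }

    // "Cheats" by calling the ordinary constructor of its superclass.
    static class Cheater extends Animal {
        Cheater(FieldSource src) {
            super("forged");
            src.fieldsFor(Cheater.class);
        }
    }

    // Builds sample data for Animal plus one leaf class.
    static FieldSource source(String name, Class<?> leaf, String field, Object value) {
        Map<Class<?>, Map<String, Object>> data = new HashMap<>();
        data.put(Animal.class, Map.<String, Object>of("name", name));
        data.put(leaf, Map.<String, Object>of(field, value));
        return new FieldSource(data);
    }

    public static void main(String[] args) {
        Dog d = construct(source("Rex", Dog.class, "legs", 4), Dog::new);
        System.out.println(d.name + " has " + d.legs + " legs");
        try {
            construct(source("Rex", Cheater.class, "unused", 0), Cheater::new);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that in this toy the cheating object is only detected after it has been fully constructed, which is part of why the enforcement side looks like the hard bit.
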
> 
> Other ideas may be possible as well.  I found this to be an interesting problem when I was exploring it myself, and I still find it pretty interesting.
> -- 
> - DML