Explicit Serialization API and Security

Mon Jan 12 11:37:06 UTC 2015

On 08/01/15 20:10, Brian Goetz wrote:
>> 1) Validate invariants
>>
>>      A clear and easy to understand mechanism that can validate the
>> deserialized
>>      fields. Does not prevent the use of final fields, as the
>> serialization framework
>>      will be responsible for setting them. Something along the lines
>> of what David
>>      suggested:
>>
>>        private static void validate(GetField fields) {
>>            if (fields.getInt("lo") > fields.getInt("hi")) { ... }
>>       }
>>
>>      This could be a “special” method, or annotation driven. TBD.
>>
>>      Note: the validate method is static, so the object instance is
>> not required to
>>      be created before running the validation.
>
> Sort of...
>
> This is true if the fields participating in the invariant are
> primitives.  But if they're refs, what do you do?  What if you want to
> validate something like
>
>    count == list.size()   // fields are int count, List list
>
> ?  Then wouldn't GetField.getObject have to deserialize the object
> referred to by that field?

Yes it would.

For clarity, I would like to describe how things currently work.

  1) Allocate a new instance of the deserialized type.
  2) Call the first non-Serializable type's no-arg constructor
     ( may be j.l.Object ).
  3) For each type in the deserialized type's hierarchy, starting
     with the topmost ( closest to j.l.Object ),
    3a) create objects representing all field values for the type
        [this step is recursive and will go to 1 until all
         non-primitive types have been created ]
    3b)  [ holder for invariant validation ]
    3c) assign objects to their respective members of the
        containing instance

[ For simplicity, ignore cyclic references and readObjectXXX for now,
   I will address them separately. ]
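
The steps above can be observed with a small round trip. This is only a
sketch: Base, Range and CurrentOrderDemo are made-up names, and the log
exists purely to show which constructors run. It assumes default
serialization with no readObjectXXX methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical first non-Serializable supertype (step 2).
class Base {
    static final List<String> LOG = new ArrayList<>();
    Base() { LOG.add("Base()"); }
}

// Hypothetical Serializable type whose constructor checks lo <= hi.
class Range extends Base implements Serializable {
    private static final long serialVersionUID = 1L;
    final int lo;
    final int hi;

    Range(int lo, int hi) {
        LOG.add("Range(int,int)");
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }
}

class CurrentOrderDemo {
    // Serialize to a byte array and read the object straight back.
    static Range roundTrip(Range r) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Range) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Range original = new Range(1, 5);
        Base.LOG.clear();                    // forget the constructors run so far
        Range copy = roundTrip(original);
        System.out.println(Base.LOG);        // only Base() ran during deserialization
        System.out.println(copy.lo + ".." + copy.hi);
    }
}
```

Range's own constructor, and therefore its lo <= hi check, is never run
on the deserialized copy, even though the final fields are set.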

Without any user-visible side effects, that is, with no readObjectXXX methods, it
would appear that there is no reason why 1 & 2 must happen before 3a.
Since objects representing field values are created recursively, then 
all the objects representing the field values are created, per class in 
the hierarchy, before being assigned. If we have no readObjectXXX
methods, then the objects being created in 3a could be stored locally, 
repeating 3a as needed, and only assign after all types in the hierarchy 
have been walked. Essentially the sequence of steps could be, 3, 3a+, 
[3b], 1, 2, 3c+.

Given this, the invariant could be validated at 3b, without the need
for the creation of the containing object.
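
A rough sketch of the reordered sequence, outside the real serialization
framework: FieldValues is a made-up stand-in for a GetField-like holder
of the values produced in 3a, and Counted, validate and materialize are
hypothetical names. It uses Brian's count == list.size() example, so the
list has to be fully built before validation can run.

```java
import java.io.InvalidObjectException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for the field values created in step 3a.
final class FieldValues {
    private final Map<String, Object> values = new LinkedHashMap<>();

    FieldValues put(String name, Object value) {
        values.put(name, value);
        return this;
    }

    int getInt(String name) { return (Integer) values.get(name); }

    Object getObject(String name) { return values.get(name); }
}

final class Counted {
    final int count;
    final List<?> list;

    private Counted(int count, List<?> list) {
        this.count = count;
        this.list = list;
    }

    // Step 3b: static, so no Counted instance exists while this runs.
    static void validate(FieldValues fields) throws InvalidObjectException {
        List<?> list = (List<?>) fields.getObject("list");
        if (list == null || fields.getInt("count") != list.size()) {
            throw new InvalidObjectException("count != list.size()");
        }
    }

    // 3a has already happened; run 3b, and only then 1, 2 and 3c.
    static Counted materialize(FieldValues fields) throws InvalidObjectException {
        validate(fields);
        return new Counted(fields.getInt("count"), (List<?>) fields.getObject("list"));
    }
}

class ValidateFirstDemo {
    public static void main(String[] args) {
        FieldValues bad = new FieldValues()
                .put("count", 3)
                .put("list", Arrays.asList("a"));
        try {
            Counted.materialize(bad);
        } catch (InvalidObjectException e) {
            // No Counted instance was ever allocated for the bad data.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```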

Cyclic references: If we encounter a cyclic reference, then we can 
"de-optimize"; stop, create the required instances reachable in the 
graph, fill in whatever fields are currently known, then continue.
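
To make the cyclic case concrete, here is a minimal sketch with a
made-up Node type. While b's field is being read, it refers back to a,
so an instance of a must already exist even though a's own field has not
been assigned yet.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical node whose only field may point back into the graph.
class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    Node next;
}

class CycleDemo {
    // Serialize to a byte array and read the graph straight back.
    static Node roundTrip(Node n) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(n);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Node) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node a = new Node();
        Node b = new Node();
        a.next = b;
        b.next = a;                           // a -> b -> a
        Node copy = roundTrip(a);
        // The cycle is preserved in the copy.
        System.out.println(copy.next.next == copy);
    }
}
```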

readObjectXXX: Since these are instance methods then they must have 
visibility to any deserialized state in the super type. These can be 
handled similarly to cyclic references, but can be determined up front,
at step 3, rather than in the flow.
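
For comparison, a sketch of what a readObject method can do today using
ObjectInputStream.readFields(). Interval and IntervalDemo are made-up
names. Because readObject is an instance method, the Interval already
exists by the time the check runs, and the fields cannot be final since
readObject assigns them itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical type that validates lo <= hi inside readObject.
class Interval implements Serializable {
    private static final long serialVersionUID = 1L;
    private int lo;   // not final: assigned in readObject below
    private int hi;

    Interval(int lo, int hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }

    int lo() { return lo; }
    int hi() { return hi; }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        // Read the stream's field values without assigning them yet.
        ObjectInputStream.GetField fields = in.readFields();
        int lo = fields.get("lo", 0);
        int hi = fields.get("hi", 0);
        if (lo > hi) throw new InvalidObjectException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }
}

class IntervalDemo {
    static byte[] toBytes(Interval i) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(i);
        }
        return bytes.toByteArray();
    }

    static Interval fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Interval) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Interval copy = fromBytes(toBytes(new Interval(1, 5)));
        System.out.println(copy.lo() + ".." + copy.hi());
    }
}
```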

-Chris.