Explicit Serialization API and Security

Tue Jan 6 15:03:04 UTC 2015

Hi Chris,

On 01/06/2015 12:10 PM, Chris Hegarty wrote:
> On 6 Jan 2015, at 07:27, Peter Levart <peter.levart at gmail.com> wrote:
> Hi Brian,
>> On 01/05/2015 09:51 PM, Brian Goetz wrote:
>>> Thanks Peter(s): I think it's just about time to re-sync on the goals, since there are an infinitude of issues with serialization, only some of which we can address.
>>>
>>> To that end, let me toss out some of the results of our recent explorations with improving serialization, where we started out on something not unlike what Peter is suggesting, and dialed back.
>>>
>>> One thing we can do to positively improve the serialization experience is to reduce the set of situations where users have to write explicit readObject() methods.  In the JDK, we end up having to do so for all manner of security-motivated reasons, and such code is hard to write correctly, hard to validate, hard to audit, and hard to evolve.
>>>
>>> After an audit of our codebase (thanks Chris!), we identified some of the biggest offenders on this score:
>>>
>>> - Validating invariants
>>> - Ensuring confinement
>>>
>>> The latter comes up in cases where there's a field:
>>>
>>> class Foo {
>>>     private final Bar bar = new BarImpl();
>>> }
>>>
>>> which implicitly has an invariant: the Bar referenced by Foo.bar is a) exactly of type BarImpl and not some other subclass and b) there is exactly one reference to that BarImpl (assuming it doesn't escape Foo.)
>>>
>>> Note that both of these still use the default deserialization; having to override readObject() just to do some extra checking is error-prone and messy.  So I think the goal of a separate validation method which is called automatically is a good one, since it means we can ditch some percentage of otherwise-needless readObject implementations (which then have to jump through hoops to set final fields.)
>> readObject() can be used on both ends of the spectrum or anywhere in-between.
>> It can be used for explicit deserialization or just for validation of default deserialized state.

Well, that "in-between" is actually not really available (unless 
explicit (de)serialization part uses "block data mode", which is not 
very evolvable). But I think this could be enhanced (see below).

>> Would separate validation method give us anything more or simplify things? readObject() can be used just as:
>>
>> private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
>>     in.defaultReadObject();
>>     ...
>>     just validation
>>     ...
>> }
> One issue with the above pattern is that defaultReadObject will read and set the fields before the explicit validation code, ‘just validate’, is executed. So if validation fails, throws an appropriate exception, the object may be left in an inconsistent state.

Right.

What about the following addition to 
ObjectInputStream.GetField/ObjectOutputStream.PutField API. For example:

public class Interval implements Serializable {

     // current fields
     private final int lo, hi;

     private static final ObjectStreamField[] serialPersistentFields = {
         new ObjectStreamField("lo", Integer.TYPE),
         new ObjectStreamField("hi", Integer.TYPE),
         // legacy 'len' field
         new ObjectStreamField("len", Integer.TYPE)
     };

     private static void validate(int lo, int hi) {
         // invariant
         if (lo > hi)
             throw new IllegalArgumentException("lo:" + lo + " > hi:" + hi);
     }

     public Interval(int lo, int hi) {
         validate(lo, hi);
         this.lo = lo;
         this.hi = hi;
     }

     public int getLo() { return lo; }

     public int getHi() { return hi; }

     public int getLen() { return hi - lo; }

     private void readObject(ObjectInputStream in) throws IOException, 
ClassNotFoundException {
         ObjectInputStream.GetField fields = in.readFields(); // this 
already validates the types
         // validate 'lo' and 'hi' fields invariant
         int lo = fields.get("lo", 0);
         int hi = fields.get("hi", 0);
         validate(lo, hi);
         // validate legacy 'len' field
         int len = fields.get("len", hi - lo);
         if (len != hi - lo) throw new IllegalArgumentException("len:" + 
len + " is wrong");
         // set current fields from read data
         fields.defaultReadFields(); // this is new API!
     }

     private void writeObject(ObjectOutputStream out) throws IOException {
         ObjectOutputStream.PutField fields = out.putFields();
         // put current fields
         fields.defaultPutFields(); // this is new API!
         // put legacy len field
         fields.put("len", hi - lo);
         // write-out buffered puts
         out.writeFields();
     }
}

I've taken a quick look at what might be needed to do that, and it 
appears very simple:

http://cr.openjdk.java.net/~plevart/misc/ObjectIOStream.GetPutFields/webrev/

>
> One approach, with existing serialization, is to use the GetField API rather than defaultReadObject. Read the field values into locals and validate before setting. But this again falls foul for final fields.

The above addition to [Get|Put]Fields API is a mix of explicit 
validation and default setters which seems to cover final fields.

>
> As an aside: I have been looking at ObjectInputStream, and related classes, recently. The current default implementation of readObject sets non-primitive field values one at a time, without first checking that all their types are assignable. In many cases the setting of field values can be deferred until after they have been validated, at a minimum that the non-primitive types are assignable. Whole hierarchy failure atomicity in many cases, or at a minimum per class descriptor failure atomicity, can be achieved ( either all fields will be set, or contain their default values ). Preliminary webrev [1].
>
>  From this exploration, it seems like the natural place for an invariant checker is between the reading and reconstitution of field values, and the assigning of them. If the invariant checker could be called by the serialization mechanism then the users code can use final fields, without needing to muck around with reflection, and also have the added benefit of, in many cases, not potential leaving the object in an inconsistent state.
>
> To me, these benefits seems worthwhile, and justify a separate invariant checker mechanism.

The type checking for failure atomicity is a good addition to 
OIS.defaultReadObject() method behaviour. The separate invariant checker 
mechanism would have to expose field values before setting them to the 
fields of object being deserialized. The GetFields API lends itself to 
that purpose. Whether to reserve another private 'validateReadObject' 
special method signature for that purpose is a question. The above is an 
alternative that uses readObject() for both - default field assignment 
and explicit validation.

Regards, Peter

>
>>> I agree that being able to play nicely with final fields is also an important goal here.
>> Explicit deserialization and final fields don't play nicely together currently, yes. Sometimes data-racey-publication thread-safety is sacrificed just because explicit deserialization would complicate code too much by using reflection.
> Given my above comments, certainly in the case of invariant checking, we should be able to make this experience a little better ( confining the use of reflectively setting final fields and ensuring their safe publication in one place, the Serialization implementation ).
>
> -Chris.
>
> [1] http://cr.openjdk.java.net/~chegar/failureAtomicity/
>