Explicit Serialization API and Security

Tue Jan 6 11:10:09 UTC 2015

On 6 Jan 2015, at 07:27, Peter Levart <peter.levart at gmail.com> wrote:
Hi Brian,
> 
> On 01/05/2015 09:51 PM, Brian Goetz wrote:
>> Thanks Peter(s): I think it's just about time to re-sync on the goals, since there are an infinitude of issues with serialization, only some of which we can address.
>> 
>> To that end, let me toss out some of the results of our recent explorations with improving serialization, where we started out on something not unlike what Peter is suggesting, and dialed back.
>> 
>> One thing we can do to positively improve the serialization experience is to reduce the set of situations where users have to write explicit readObject() methods.  In the JDK, we end up having to do so for all manner of security-motivated reasons, and such code is hard to write correctly, hard to validate, hard to audit, and hard to evolve.
>> 
>> After an audit of our codebase (thanks Chris!), we identified some of the biggest offenders on this score:
>> 
>> - Validating invariants
>> - Ensuring confinement
>> 
>> The latter comes up in cases where there's a field:
>> 
>> class Foo {
>>    private final Bar bar = new BarImpl();
>> }
>> 
>> which implicitly has an invariant: the Bar referenced by Foo.bar is a) exactly of type BarImpl and not some other subclass and b) there is exactly one reference to that BarImpl (assuming it doesn't escape Foo.)
>> 
>> Note that both of these still use the default deserialization; having to override readObject() just to do some extra checking is error-prone and messy.  So I think the goal of a separate validation method which is called automatically is a good one, since it means we can ditch some percentage of otherwise-needless readObject implementations (which then have to jump through hoops to set final fields.)
> 
> readObject() can be used on both ends of the spectrum or anywhere in-between. It can be used for explicit deserialization or just for validation of default deserialized state. Would separate validation method give us anything more or simplify things? readObject() can be used just as:
> 
> private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
>    in.defaultReadObject();
>    ...
>    just validation
>    ...
> }

One issue with the above pattern is that defaultReadObject will read and set the fields before the explicit validation code, ‘just validate’, is executed. So if validation fails, throws an appropriate exception, the object may be left in an inconsistent state.

One approach, with existing serialization, is to use the GetField API rather than defaultReadObject. Read the field values into locals and validate before setting. But this again falls foul for final fields.

As an aside: I have been looking at ObjectInputStream, and related classes, recently. The current default implementation of readObject sets non-primitive field values one at a time, without first checking that all their types are assignable. In many cases the setting of field values can be deferred until after they have been validated, at a minimum that the non-primitive types are assignable. Whole hierarchy failure atomicity in many cases, or at a minimum per class descriptor failure atomicity, can be achieved ( either all fields will be set, or contain their default values ). Preliminary webrev [1].

From this exploration, it seems like the natural place for an invariant checker is between the reading and reconstitution of field values, and the assigning of them. If the invariant checker could be called by the serialization mechanism then the users code can use final fields, without needing to muck around with reflection, and also have the added benefit of, in many cases, not potential leaving the object in an inconsistent state.

To me, these benefits seems worthwhile, and justify a separate invariant checker mechanism.

>> I agree that being able to play nicely with final fields is also an important goal here.
> 
> Explicit deserialization and final fields don't play nicely together currently, yes. Sometimes data-racey-publication thread-safety is sacrificed just because explicit deserialization would complicate code too much by using reflection.

Given my above comments, certainly in the case of invariant checking, we should be able to make this experience a little better ( confining the use of reflectively setting final fields and ensuring their safe publication in one place, the Serialization implementation ).

-Chris.

[1] http://cr.openjdk.java.net/~chegar/failureAtomicity/

> Regards, Peter
> 
>> 
>> On 1/5/2015 9:11 AM, Peter Levart wrote:
>>> Hi Peter and others,
>>> 
>>> A assume your approach described here (to use constructors for
>>> deserialization) is motivated mainly by the fact that only constructors
>>> are allowed to set the final fields. Otherwise the explicit features of
>>> your ReadSerial API are more or less accessible even now by using
>>> "serialPersistentFields " static field for declaration of "serialized"
>>> fields, combined with ObjectOutputStream.PutField and
>>> ObjectInputStream.GetField API, accessible from writeObject/readObject
>>> methods. readObject() method already acts like a constructor, whith the
>>> following notable differences:
>>> 
>>> 1 - it is not chained explicitly to superclass (either in source or
>>> automatically by javac) like constructor, but is invoked by
>>> (de)serialization infrastructure in correct order for each class in the
>>> hierarchy that declares it.
>>> 2 - it is a normal method, so it can be called at any time by code in
>>> the declaring class or inner classes.
>>> 3 - it can't set final fields (without using clumsy reflection)
>>> 
>>> 
>>> The 1st point is actually in favor of readObject() method, since the
>>> (de)serialization API need not be caller-sensitive. The correct
>>> per-class context is established before each readObject call-back.
>>> 2nd and 3rd points are readObject drawbacks and are both interconnected.
>>> If readObject was not universally callable, then the restriction that it
>>> can't set final fields could be lifted.
>>> 
>>> Perhaps it is too late from compatibility perspective to change the
>>> readObject's 2nd and 3rd point now, but we could compatibly change the
>>> 1st point of a special kind of constructor.
>>> 
>>> Say that a constructor declared with exactly one parameter of type
>>> ObjectInputStream is treated specially by javac and reflection:
>>> - constructor code can't chain to superclass constructor
>>> - javac does not automatically generate default constructor chaining code
>>> - javac does not allow calling such constructor from any code
>>> - this is optional: javac does not allow calling instance methods and
>>> explicit use of 'this' keyword in deserialization constructor code
>>> (preventing escape of such object before de-serialization ends) - to be
>>> really safe, this would probably have to be accompanied with verifier
>>> checks too.
>>> - public reflection API does not allow invoking such constructor (only
>>> deserialization infrastructure can - like it can invoke the default
>>> constructor of a non-Serializable superclass on an already allocated
>>> instance of a sublclass)
>>> 
>>> No additional changes to VM are apparently necessary (except verifier).
>>> Declaring such constructor as an alternative to readObject() method is
>>> then possible to make deserialization more safe.
>>> 
>>> Regarding finalizability (and finalizer attacks):
>>> 
>>> The serialization specifies that for 1st (most specific)
>>> non-Serializable superclass of a Serializable subclass, default
>>> constructor is invoked to initialize the non-Serializable superclass
>>> state before deserializing subclass state. As each Serializable class
>>> has such a non-Serializable superclass (at least in the form of Object
>>> class) and normal constructors are chained, it is apparent that Object's
>>> default constructor is called before any deserialization of state
>>> begins. As soon as Object's constructor completes, the instance becomes
>>> eligible for finalization:
>>> 
>>>  "An object o is not finalizable until its constructor has invoked the
>>> constructor for Object on o and that invocation has completed
>>> successfully (that is, without throwing an exception).”
>>> 
>>> The rule of calling non-Serializable superclass default constructor
>>> could be weakened. If 1st (most specific) non-Serializable superclass of
>>> a Serializable subclass is Object, then it's constructor is not called.
>>> Obviously it is not needed since Object has no state to initialize. What
>>> we achieve by this is that such object does not become eligible for
>>> finalization just yet. Serialization infrastructure must then explicitly
>>> register such objects for finalization, but only after their
>>> de-serialization completes normally. This would prevent finalization
>>> attacks.
>>> 
>>> Regards, Peter
>>> 
>>> 
>>> On 01/05/2015 01:01 PM, Peter Firmstone wrote:
>>>> My mistake, thank you.
>>>> 
>>>> Peter.
>>>> 
>>>> On 5/01/2015 9:57 PM, David Holmes wrote:
>>>>> Hi Peter,
>>>>> 
>>>>> Did you send this to the list? I haven't seen it turn up yet.
>>>>> 
>>>>> David
>>>>> 
>>>>> On 5/01/2015 6:51 PM, Peter Firmstone wrote:
>>>>>> Thanks Dave,
>>>>>> 
>>>>>> That's another way of checking invariants, you could expose A's check
>>>>>> method but you'd also need a couple of additional static methods to
>>>>>> obtain integers upper and lower from ReadSerial, so that B can ensure
>>>>>> its invariant.
>>>>>> 
>>>>>> ReadSerial is caller sensitive so B can't obtain A's stream values via
>>>>>> ReadSerial and must do so via A's API.  This prevents the
>>>>>> publication of
>>>>>> A's implementation, you don't wan't B depending on A's serial form.
>>>>>> Instead A can reorder and rename it's fields and change their values,
>>>>>> have multiple serial forms and be able to remain compatible with all
>>>>>> forms relatively easily.
>>>>>> 
>>>>>> There are some benefits to using a temporary object instance of A; it
>>>>>> reduces the amount of code required, eg it would be a beneficial for
>>>>>> RMI
>>>>>> to minimise code downloads, the JVM is free to inline access to these
>>>>>> fields and the temporary instance of A never leaves the cpu cache, so
>>>>>> it's not likely to become a performance issue.  Well it might on real
>>>>>> time java :)
>>>>>> 
>>>>>> If circular relationships are mutable, or effectively mutable after
>>>>>> publication, then we could eventually deprecate the requirement to
>>>>>> support special treatment of final fields for Serializable.
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Peter.
>>>>>> 
>>>>>> On 5/01/2015 10:38 AM, David Holmes wrote:
>>>>>>> 
>>>>>>>>>      - I don't see how this invariant-checking mechanism can
>>>>>>>>> enforce
>>>>>>>>> invariants between superclass fields and subclass fields.   For
>>>>>>>>> example:
>>>>>>>>> 
>>>>>>>>> class A {
>>>>>>>>>            int lower, upper;   // invariant: lower <= upper
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> class B extends A {
>>>>>>>>>            int cur;   // invariant: lower <= cur <= upper
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> The serialization framework should only construct an objects fields
>>>>>>>> and make these available from ReadSerial, each class then calls a
>>>>>>>> static method before calling a superclass constructor that's
>>>>>>>> responsible for an objects creation, to prevent construction of the
>>>>>>>> object, if necessary.
>>>>>>>> 
>>>>>>>> Eg, some complexity, but bullet proof:
>>>>>>>> 
>>>>>>>> public class A (
>>>>>>>> 
>>>>>>>>     public final int lower, upper;
>>>>>>>> 
>>>>>>>>      private static boolean check(ReadSerial rs){
>>>>>>>>         if (rs.getInt("lower") > rs.getInt("upper"))
>>>>>>>>              throw new IllegalArgumentException(
>>>>>>>>                 "lower bound must be less than or equal to upper");
>>>>>>>>        return true;
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     public A(ReadSerial rs){
>>>>>>>>         this(check(rs), rs);
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     private A(boolean checked, ReadSerial rs){
>>>>>>>>         super();
>>>>>>>>         lower = rs.getInt("lower");
>>>>>>>>         upper = rs.getInt("upper");
>>>>>>>>     }
>>>>>>>> 
>>>>>>>> // other constructors omitted must also check invarients
>>>>>>>> }
>>>>>>>> 
>>>>>>>> class B extends A {
>>>>>>>> 
>>>>>>>>     public final int cur;
>>>>>>>> 
>>>>>>>>     private static ReadSerial check(ReadSerial rs) {
>>>>>>>>         A a = new A(rs);
>>>>>>> 
>>>>>>> 
>>>>>>> It doesn't seem practical in general to have to create an instance of
>>>>>>> your supertype to validate the passed in serialized form. Why not
>>>>>>> expose the check method?
>>>>>>> 
>>>>>>> David
>>>>>>> -----
>>>>>>> 
>>>>>>>>         int cur = rs.getInt("cur");
>>>>>>>>         if ( a.lower > cur || cur > a.upper )
>>>>>>>>              throw new IllegalArgumentException(
>>>>>>>>                  "cur outside lower and upper bounds");
>>>>>>>>         return rs;
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     public B(ReadSerial rs) {
>>>>>>>>         super(check(rs));
>>>>>>>>         cur = rs.getInt("cur");
>>>>>>>>     }
>>>>>>>> }
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
>