Explicit Serialization API and Security

Thu Jan 8 17:09:12 UTC 2015

Peter, David

I would like to see how far we can push the existing Serialization mechanism, before embarking on a road that may lead us to substantially different one. Whether that be constructor based, to otherwise ( we know we will need a replacement for Unsafe.allocateInstance, for third-party Serialization libraries, in the future ).

Just to summarise the potential improvements that have been discussed so far ( I can create JIRA issues for these ):

1) Validate invariants

    A clear and easy to understand mechanism that can validate the deserialized
    fields. Does not prevent the use of final fields, as the serialization framework
    will be responsible for setting them. Something along the lines of what David
    suggested: 

      private static void validate(GetField fields) {
          if (fields.getInt("lo") > fields.getInt("hi")) { ... }
     }

    This could be a “special” method, or annotation driven. TBD.

    Note: the validate method is static, so the object instance is not required to
    be created before running the validation.

2) Failure atomicity for defaultReadObject and readFields

    In many cases the setting of field values can be deferred until after they have
    been validated, at a minimum that the non-primitive types are assignable.
    Whole hierarchy failure atomicity in many cases ( where there are not
    “special readObjectXXX methods ), or at a minimum per class failure
    atomicity, can be achieved ( either all fields will be set, or contain their default
    values ).

    Coupled with 1) above this could work well.

3) Finalization

    It is clear that the finalization attack is an issue for deserialized objects.
    I think that a deserialized object should not be “finalizable” until after
    a certain point in its construction. I would like to investigate further the
    possibility of making the VM aware of the first no-args default constructor,
    or j.l.Object, being called by the serialization mechanism, and possibly
    treating it differently. Or maybe spin “field holder” classes for finalizable
    types being deserialized, and deserialze into them first. This doesn’t need
    to be ultra-performant.

4) Confinement

    A new mechanism to ensure that a deserialized field value is not shared,
    and loaded by the same loader, or parent of, the object being
    constructed’s loader.

    This will eliminate the need to define a readObject to essentially copy
    the deserialized field value, as it cannot always be trusted. And again
    the issue with final fields. Something along the lines of:

     class Foo {
         @Confined
         private final Bar bar = new BarImpl();
     }

    @Confined could be placed on fields, or possibly classes.

There may have been more, sorry if I missed anything.

-Chris.

On 8 Jan 2015, at 15:32, David M. Lloyd <david.lloyd at redhat.com> wrote:

> On 01/08/2015 02:32 AM, Peter Firmstone wrote:
>> Thank you all for participating in this discussion.
>> 
>> Initially a constructor signature for deserialization was proposed to
>> enforce invariants and encapsulation, however there appears to be
>> interest in using alternative methods, although they appear to be
>> improvements over the status quo, I'm having trouble working out how to:
>> 
>>    * Enforce intra class invariants with alternate proposals without
>>      breaking encapsulation.
>>    * How the static method proposal can be used to replace final fields
>>      referents without needing to implement readObject().
>>    * And how the alternative GetFields readObject() implementation can
>>      enforce invariants and prevent finalizer attack, while also
>>      allowing subclassing?
>> 
>> What started me down this path, is our project (Apache River) is heavily
>> dependant on Serialization, I wanted to create a secure
>> ObjectInputStream subclass that restricted deserialized classes to those
>> I had audited.  My aim was to allow deserialization to enforce security
>> using verification, similar to how an http server verifies its input.
> 
> Now we're getting down to brass tacks. :-)
> 
> Maybe I missed this part of the discussion, but can you elaborate on why ObjectInputValidation is inadequate for validating intra-class invariants?  I was thinking this is really the only way to effectively achieve intra-class validation, because (barring complex graph traversal algorithms) there isn't really any better way to guarantee that any given subgraph of the object graph is actually fully initialized.
> 
> As far as replacing final fields, I think the seed of the best idea came from Chris Hegarty - the problem isn't the readObject() method, the problem is that there's no way to initialize your final fields if you use it.  By adding a "defaultReadFields()" (name not important) method like he proposes, you can validate your single class.  But I don't see any reason why this couldn't be taken another step farther, and allow the readObject method to actually change, add, and remove fields from its internal map before doing so.  This would allow readObject to arbitrarily transform its content, and assign to final fields, while only minimally changing the API and implementation.
> 
>> Unfortunately before I can achieve the goal of secure deserialization,
>> there's a denial of service issue in ObjectInputStream: I'd also like to
>> propose a system property, to allow limiting the size of arrays, created
>> during deserialization.   Presently I can craft an inputstream to take
>> down the JVM, if I can do it, so can an attacker, I'd like to get that
>> fixed if possible.
> 
> I agree this would be a very good (and easy to implement) idea.  You can limit the underlying stream but that does you no good if the attacker can cause the construction of any number of very large objects before the end of stream is hit.
> 
> I wonder if this could be as simple as adding a protected method on ObjectInputStream in the vein of:
> 
>  protected boolean validateArrayCreation(Class<?> elementType, int dimension) {
>      return true;
>  }
> 
> Or elementType could be arrayType; I don't think it matters too much. Method returns false and InvalidObjectException is thrown. Implementations could do a simple global max array size, different max sizes by primitive vs ref, different by type, or even have a "budget" of heap and use a combination of this method and overridden resovleClass() to create a rough estimate of memory usage, failing at a heuristic threshold.
> 
>> I've attached a text file that contains three classes using the original
>> proposed serial constructor, A, B and C which have intra class
>> invariants that must be satiisfied.  C also contains a circular link and
>> implements an interface called Circular, the Serialization framework
>> calls it after construction to provide access to fields with circular
>> links.  Can the other proposals provide similar safety?
>> 
>> Notes:
>> 
>>   1. The class with the circular link is final and every method checks
>>      if invariants are satisfied.
>>   2. The method with @SerialConstructor annotation, is the only
>>      constructor called by the serialization frame work, other than the
>>      Circular interface method, there are no other methods that need to
>>      be implemented.
>>   3. Encapsulation is preserved, classes have full control over their
>>      invariants, their internal implementation and what's serialized.
> 
> I was previously firmly of the opinion that a serialization constructor is the best way to accomplish this, but this discussion thread has at least given me quite some doubt if not outright changed my mind, though I think there's still an unanswered question regarding validation (above).
> 
> If nothing else, this discussion has yielded some good ideas for enhancements to JBoss Marshalling. :-)
> 
> -- 
> - DML