RFR 9: JEP 290: Filter Incoming Serialization Data

Wed Jul 20 14:44:55 UTC 2016

Hi Peter,

On 7/20/2016 4:19 AM, Peter Levart wrote:
> Hi Roger,
>
> On first reading, I have the following thoughts:
>
> - The name "ObjectInputFilter" makes me think that it is a function 
> that "filters" the input stream (like a Predicate in 
> Stream::filter(Predicate)), but it is in fact a validator that 
> terminates deserialization on 1st rejection. So perhaps a different 
> name is in order - ObjectInputValidator ?
'Filterin'g is used in some internet use cases as blocking packets, 
connections, etc. but it is not perfect.
There is already a Validation concept in 
ObjectInputStream.registerValidation 
<http://download.java.net/java/jdk9/docs/api/java/io/ObjectInputStream.html#registerValidation-java.io.ObjectInputValidation-int->() 
that is a hook for validating
multiple objects after the graph is reconstructed.  And we would want to 
avoid any ambiguity in terminology.

I also considered 'audit' as a possible verb but found support for 'filter'.

>
> - I haven't found in the public javadocs, an explanation of what 
> happens when the filter returns ALLOWED, REJECTED or UNDECIDED. Docs 
> just say that the deserialization is terminated (on UNDECIDED too?) 
> but not with what exception (there is some explanation on 
> OIS::filterCheck, but this is a private method).
>
> - The crux of behavioral docs is on the OIS::setObjectInputFilter 
> method. I would expect it to be on the ObjectInputFilter class, but I 
> understand that OIS subclasses might have a different behavior. How do 
> they behave indeed? For example IIOPInputStream does not use the 
> filter, right?
oops, an omission.  As implemented in filterCheck, it should be 
documented  as throwing IllegalClassException
if the filter returns REJECTED or throws an exception.  The bias is 
slightly toward accepting inputs that
are not specifically identified as rejected.

IIOPInputStream could use the serialFilter, but the implementation would 
be different.
The conditions describing calls to the filter are still appropriate and 
could be implemented
by IIOPIS. Subclasses could be given flexibility in how or whether the 
filter is used.
I'll take another look at IIOPIS.

>
> - I had some trouble to precisely understand the behavior from the 
> docs alone. The following in OIS::setObjectInputFilter:
>
> 1174      * @implSpec
> 1175      * The filter, when {@code non-null}, is invoked during 
> {@linkplain #readObject()}
> 1176      * for each object (regular or class) in the stream including 
> the following:
> 1177      * <ul>
> 1178      *     <li>each object reference previously deserialized from 
> the stream,
> 1179      *     <li>each regular class,
> 1180      *     <li>each interface of a dynamic proxy and the dynamic 
> proxy class itself,
> 1181      *     <li>each array is filtered using the array size and 
> the type of the array,
> 1182      *     <li>each object replaced by its class' {@code 
> readResolve} method
> 1183      *         is filtered using the replacement object's class 
> and if it is an array, the length,
> 1184      *     <li>and each object replaced by {@linkplain 
> #resolveObject resolveObject}
> 1185      *         is filtered using the replacement object's class 
> and if it is an array, the length.
> 1186      * </ul>
> 1187      *
> 1188      * When  the {@link ObjectInputFilter#checkInput checkInput} 
> method is invoked
> 1189      * it is passed the current class, (null if no class),
>
> ...does not specify when the passed-in class might be "null". Reading 
> the implementation, I see it is null when a back reference to 
> previously deserialized object is read from stream, but javadocs are 
> not clear about that.
good point, that should be clarified in each of the cases.
>
> - I wonder if invoking the filter for each interface of a dynamic 
> proxy is necessary (other properties passed to the filter don't change 
> during iteration through the interfaces and each interface call-back 
> is not an indicator that an object is about to be read-in next). This 
> is not uniform with other objects where the filter is invoked only 
> once. Why is a dynamic proxy so special? If one wants to check the 
> proxy interfaces in the filter, she can obtain them manually:
>
> if (Proxy.isProxyClass(clazz)) {
>     for (Class<?> intf : class.getInterfaces()) {
>         ...
>     }
> }
It allows filters to be simpler, with fewer special cases to implement 
and therefore more reliable.
>
> - The docs might be more clear about when precisely the filter is 
> invoked (i.e. after the type of the object and possible length of 
> array or the back reference has already been read from the stream, but 
> the object state has not been read yet). This is important to 
> correctly interpret the streamBytes parameter. The docs might also be 
> more clear about when the nRefs is incremented (it says: "for each 
> call". Is it before or after the call?).
Some flexibility should be allowed in the implementation. Without extra 
overhead it will point into the middle
of some sequence in the stream, not to a typecode. The intention of 
streamBytes was to give
an indication that the stream was larger than expected, not a precise 
location in the stream.
Similarly with depth and nRefs, the filter is not likely to know exactly 
how big or how deep or how many
references are expected but can only check for values that are known to 
be out of bounds.
>
> - What is the purpose of the UNDECIDED return? I suspect it is meant 
> to be used in some filter implementation that delegates the validation 
> to some "parent" filter and respects its decision unless it is 
> UNDECIDED in which case it decides (or does not) on its own. Should 
> such strategy be mentioned in the docs to encourage inter-operable 
> filter implementations?
Yes,  some simple filters might be for purposes of black-listing or 
white-listing.
The pattern based filters, as produced by 
ObjectInputFilter.createFilter(patterns), can simply represent
white or black listing, but if none of the patterns match, it can only 
report UNDECIDED.

A custom filter, should check if there is a process-wide filter 
configured and invoke it first.
Returning its status unless it is UNDECIDED and in that case use its own 
logic to determine the status.

Definitely worthy of an @apiNote in ObjectInputFilter.

>
> - The call-back is invoked after the type of the object and possible 
> array length is read from stream but before the object's state is 
> read. Suppose that the object that is about to be read is either 
> Externalizable object or an object with a readObject() method(s) that 
> consume block data from the stream. This block data can be large. 
> Should there be a call-back to "announce" the block data too? (for 
> example, when the 'clazz' is null and the 'size' is 0, the call-back 
> reports a back-reference to a previously read object, but when the 
> 'clazz' is null and the 'size' > 0, it announces the 'size' bytes of 
> block data. Does this make sense?)
Interesting case, I'll take another look at that. Since block data 
records are <= 1024, a filter might not
have enough information to make an informed decision.  Those bytes would 
show up in
the stream bytes but not until the next object is read.
>
> That's it for the start. If I notice something else, I'll post again.

Thanks, Roger

>
> Regards, Peter
>
>
> On 07/19/2016 04:02 PM, Roger Riggs wrote:
>> Please review the design, implementation, and tests of JEP 290: 
>> Filter Incoming Serialization Data[1]
>>
>> It allows incoming streams of object-serialization data to be 
>> filtered in order to improve both security and robustness.
>> The JEP[1] has more detail on the background and scope.
>>
>> The core mechanism is a filter interface implemented by serialization 
>> clients and set on an |ObjectInputStream|. The filter is called 
>> during the deserialization process to validate the classes being 
>> deserialized, the sizes of arrays being created, and metrics 
>> describing stream length, stream depth, and number of references as 
>> the stream is being decoded.
>>
>> A process-wide filter can be configured that is applied to every 
>> ObjectInputStream.
>> The API of ObjectInputStream can be used to set a custom filter to 
>> supersede or augment the process-wide filter.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~rriggs/webrev-serial-filter-jdk9-8155760/
>>
>> SpecDiff:
>> http://cr.openjdk.java.net/~rriggs/filter-diffs/overview-summary.html
>>
>> Javadoc (subset)
>> http://cr.openjdk.java.net/~rriggs/filter-javadoc/java/io/ObjectInputStream.html 
>>
>> http://cr.openjdk.java.net/~rriggs/filter-javadoc/java/io/ObjectInputFilter.html 
>>
>>
>> Comments appreciated, Roger
>>
>> [1] JEP 290:   https://bugs.openjdk.java.net/browse/JDK-8154961
>>
>