RFR 9: JEP 290: Filter Incoming Serialization Data

Roger Riggs Roger.Riggs at Oracle.com
Fri Jul 22 20:55:55 UTC 2016


Hi Peter,

A filter callback based on block data records would inform the filter 
of progress in the stream more frequently, and the amount of data 
being consumed is already available from streamBytes.  That could 
improve detection of excessive data without complicating the 
interface; passing the size of individual block data records would 
not be all that useful.  It only helps in the case of a class with 
exclusively primitive data and poor checks on the data stream.  As 
far as I can see, it increases the number of filter callbacks quite a 
bit and adds little value.
The current thresholds (depth, number of references, array size, and 
stream size) can identify out-of-bounds conditions.  I don't want to 
argue that it can't happen, but we're definitely down the slippery 
slope, beyond what is needed, if the filter tries to expose the shape 
of the graph or which bytes go where.
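
For illustration, a minimal filter built on just those thresholds 
might look like the sketch below, written against the 
java.io.ObjectInputFilter API from JEP 290; the limit values are 
arbitrary examples, not recommendations.

    import java.io.ObjectInputFilter;

    // A minimal sketch: reject the stream as soon as any of the four
    // global thresholds is exceeded, otherwise leave the decision to
    // other filters.  The limits are arbitrary examples.
    public class ThresholdFilter implements ObjectInputFilter {
        @Override
        public Status checkInput(FilterInfo info) {
            if (info.depth() > 20                        // graph nesting depth
                    || info.references() > 1_000         // back-references seen
                    || info.arrayLength() > 10_000       // current array, -1 if none
                    || info.streamBytes() > 1_000_000) { // bytes consumed so far
                return Status.REJECTED;
            }
            return Status.UNDECIDED;
        }
    }

Such a filter can be installed per stream with 
ObjectInputStream.setObjectInputFilter or process-wide via 
ObjectInputFilter.Config.setSerialFilter.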

Roger



On 7/22/2016 3:00 AM, Peter Levart wrote:
> Hi Roger,
>
>
> On 07/21/2016 08:19 PM, Roger Riggs wrote:
>>>>> - The call-back is invoked after the type of the object and 
>>>>> possible array length is read from stream but before the object's 
>>>>> state is read. Suppose that the object that is about to be read is 
>>>>> either an Externalizable object or an object with readObject() 
>>>>> method(s) that consume block data from the stream. This block data 
>>>>> can be large. Should there be a call-back to "announce" the block 
>>>>> data too? (for example, when the 'clazz' is null and the 'size' is 
>>>>> 0, the call-back reports a back-reference to a previously read 
>>>>> object, but when the 'clazz' is null and the 'size' > 0, it 
>>>>> announces the 'size' bytes of block data. Does this make sense?)
>>>> Interesting case, I'll take another look at that.  Since block data 
>>>> records are <= 1024 bytes, a filter might not have enough 
>>>> information to make an informed decision.  Those bytes would show 
>>>> up in the stream bytes, but not until the next object is read.
>>>
>>> ...which could be too late. If the filter is also to be used as a 
>>> defense against forged streams that try to provoke DoS by triggering 
>>> frequent GCs and OutOfMemoryError(s), then such a call-back that 
>>> announces each block data record could help achieve that.
>> Individual block data lengths are not very informative, since block 
>> data can be segmented, but a cumulative (whole-stream) block data 
>> size, suitable for a callback at the start of each block data 
>> segment, might be useful.
>
> Is it possible to identify which block data records "belong" to a 
> particular object? If yes, then perhaps cumulative sum(s) of block 
> data sizes for a particular object could be passed to the call-back 
> together with the Class of the object the data belongs to (similar to 
> how an array is reported: the size would be the cumulative block data 
> size read so far, and the filter could distinguish such a callback 
> from an array callback by inspecting clazz.isArray()). In conjunction 
> with the cumulative size of the whole stream, which is already passed 
> now, I think this is enough to implement all kinds of safety belts.
>
> If you think that such an addition starts to become complex from a 
> combinatorial standpoint, then what about passing an additional enum 
> argument identifying a particular "event"? It would then be easy to 
> document the rest of the parameters for each event type...
>
> Regards, Peter
>
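
For concreteness, a rough sketch of the event-tagged call-back Peter 
describes above; the FilterEvent enum, the EventAwareFilter name, and 
the method signature are hypothetical illustrations of the proposal 
and are not part of JEP 290 as shipped.

    import java.io.ObjectInputFilter.Status;

    // Hypothetical shape of the proposal: an enum tells the filter what
    // the stream decoder just saw, so the meaning of 'size' (array
    // length, cumulative block-data bytes, etc.) can be documented
    // separately for each event kind.
    interface EventAwareFilter {
        enum FilterEvent { NEW_OBJECT, NEW_ARRAY, BACK_REFERENCE, BLOCK_DATA }

        Status checkInput(FilterEvent event, Class<?> clazz, long size,
                          long depth, long references, long streamBytes);
    }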


