RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v5]

Archie Cobbs acobbs at openjdk.org
Mon Jul 22 19:26:34 UTC 2024


On Mon, 22 Jul 2024 18:08:52 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:

>>> The term "extra" here feels somewhat open to interpretation. Specifically, "extra" sounds like something that is out of the ordinary, but not uncommon or wrong. It could be used when describing an optional feature in a format specification.
>>> 
>>> The API description referes to "unexpected data". Perhaps the word "unexpected" is a more precise and descriptive term to use in this option? So ignoreUnexpectedBytes or ignoreUnexpectedData.
>> 
>> This is a good point.. because it's actually a bit subtle what kind of data is being referred to here as being "ignored". Even "unexpected" doesn't quite capture it.
>> 
>> To be precise, here's what bytes the  `ignoreExtraBytes` parameter is going to cause to be ignored:
>> * In the case `allowConcatenation == false`, it really does refer to "extra" or "extraneous" data, that is: one or more bytes having any value appearing _after_ the end of the GZIP trailer frame.
>> * In the case `allowConcatenation == true`, it means a truncated or invalid GZIP _header frame_ appearing after the end of any GZIP trailer frame.
>> 
>> The second case means "unexpected" seems wrong because why would unexpected data in a header frame be treated any differently from unexpected data any other frame (which of course is not going to be ignored but instead will trigger an exception)?
>> 
>> So maybe this is just more evidence that we shouldn't use boolean parameters - because they're not really orthogonal.
>> 
>> What about something like:
>> 
>> 
>> public enum ConcatPolicy {
>>     DISALLOW_STRICT,    // one GZIP stream, nothing else
>>     DISALLOW_LENIENT,   // one GZIP stream, ignore any extra byte(s)
>>     ALLOW_STRICT,       // N GZIP streams, nothing else
>>     ALLOW_LENIENT;      // N GZIP streams, stop at, and ignore, an invalid header frame
>> }
>> 
>> The legacy behavior would of course correspond to `ALLOW_LENIENT`.
>
> Another thought:
> 
> Are we sure we would want to introduce a new mode now which does _not_ allow concatenation, but _does_ ignore trailing data?
> 
> If the ignore mode is really discouraged, why open a new mode of config allowing it?
> 
> In other words, could we instead (at least conceptually)  have the policies LEGACY, SINGLE_STREAM, CONCATENATED_STREAM where the latter two always reject trailing data?

> Are we sure we would want to introduce a new mode now which does not allow concatenation, but does ignore trailing data?

I don't see it as necessary. It's certainly not a case that anyone is currently relying on. So it's fine with me to omit it.

If are calling this a "concatenated stream policy" then we could have:

public enum ConcatPolicy {
    DISALLOW,
    ALLOW,
    ALLOW_LENIENT;
}

(I'm avoiding `LEGACY` which makes sense today but might make less sense N years from now.)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1687042483


More information about the core-libs-dev mailing list