RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v4]
Archie Cobbs
acobbs at openjdk.org
Mon Apr 15 22:15:24 UTC 2024
> `GZIPInputStream` supports reading data from multiple concatenated GZIP data streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In order to do this, after a GZIP trailer frame is read, it attempts to read a GZIP header frame and, if successful, proceeds onward to decompress the new stream. If the attempt to decode a GZIP header frame fails, or happens to trigger an `IOException`, it just ignores the trailing garbage and/or the `IOException` and returns EOF.
>
> There are several issues with this:
>
> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring trailing garbage are not documented, much less precisely specified.
> 2. Ignoring trailing garbage is dubious because it could easily hide errors or other data corruption that an application would rather be notified about. Moreover, the API claims that a `ZipException` will be thrown when corrupt data is read, but obviously that doesn't happen in the trailing garbage scenario (so N concatenated streams where the last one has a corrupted header frame is indistinguishable from N-1 valid streams).
> 3. There's no way to create a `GZIPInputStream` that does _not_ support stream concatenation.
>
> On the other hand, `GZIPInputStream` is an old class with lots of existing usage, so it's important to preserve the existing behavior, warts and all (note: my the definition of "existing behavior" here includes the bug fix in [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
>
> So this patch adds a new constructor that takes two new parameters `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is enabled by setting both to true; otherwise, they do what they sound like. In particular, when `allowTrailingGarbage` is false, then the underlying input must contain exactly one (if `allowConcatenation` is false) or exactly N (if `allowConcatenation` is true) concatenated GZIP data streams, otherwise an exception is guaranteed.
Archie Cobbs has updated the pull request incrementally with one additional commit since the last revision:
Relabel "trailing garbage" as "extra bytes" to sound less accusatory.
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/18385/files
- new: https://git.openjdk.org/jdk/pull/18385/files/32ff1abd..df302a62
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=03
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=18385&range=02-03
Stats: 31 lines in 2 files changed: 0 ins; 0 del; 31 mod
Patch: https://git.openjdk.org/jdk/pull/18385.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18385/head:pull/18385
PR: https://git.openjdk.org/jdk/pull/18385
More information about the core-libs-dev
mailing list