RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v8]

Archie Cobbs acobbs at openjdk.org
Tue Jul 30 18:09:35 UTC 2024


On Tue, 30 Jul 2024 17:35:33 GMT, Lance Andersen <lancea at openjdk.org> wrote:

> Based on the above, I am reluctant to change the current behavior given it appears to have been modeled after gzip/gunzip as well as WinZip.

That's a reasonable conclusion... and I have no problem with preserving the existing behavior if that's what we want. But in that case I would lobby that we should also provide some new way to configure a `GZIPInputStream` that guarantees reliable behavior.

The key question here is: "Exactly what current behavior of `new GZIPInputStream(in)` do we want to preserve?"

Let's start by assuming that we want your above test to pass. Putting that into words: "Given a single GZIP stream followed by trailing garbage, `new GZIPInputStream(in)` should successfully decode the GZIP stream and ignore the trailing garbage".
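
As a minimal sketch of that statement (the class and method names here are illustrative and not taken from the test under discussion), the following builds one GZIP stream, appends some non-GZIP bytes, and decodes the result with the existing constructor:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: class and method names are made up for this sketch.
class TrailingGarbageDemo {

    // Compress the text into a single GZIP stream.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Decode one GZIP stream that is followed by arbitrary non-GZIP bytes.
    static String decodeWithGarbage(String text, String garbage) throws IOException {
        ByteArrayOutputStream input = new ByteArrayOutputStream();
        input.write(gzip(text));
        input.write(garbage.getBytes(StandardCharsets.UTF_8));
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(input.toByteArray()))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decodeWithGarbage("hello", "trailing garbage"));
    }
}
```

With the behavior described in this thread, the call returns the original text and the appended bytes are silently dropped.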

Note however that what `new GZIPInputStream(in)` currently provides is stronger than that:
1. Trailing garbage is ignored
2. Any `IOException` thrown while reading trailing garbage is ignored
3. Concatenated streams are automatically decoded
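
The third item can be sketched the same way. This is only an illustration with made-up names; it concatenates two independently compressed GZIP streams and reads them through a single `GZIPInputStream`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: class and method names are made up for this sketch.
class ConcatenatedDemo {

    // Compress the text into a single GZIP stream.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Concatenate two separately compressed GZIP streams and decode them in one pass.
    static String decodeConcatenated(String first, String second) throws IOException {
        ByteArrayOutputStream input = new ByteArrayOutputStream();
        input.write(gzip(first));
        input.write(gzip(second));
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(input.toByteArray()))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decodeConcatenated("first member, ", "second member"));
    }
}
```

The caller sees one continuous stream of decompressed bytes and has no way to tell where the first GZIP stream ended.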

So we know we want to preserve 1. What about 2 and/or 3? Your thoughts?

My personal opinions:

* I think 2 is inherently bad and it should not be implemented in any variant
* I think 3 is not required _by default_, but one should be able to enable it somehow
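
To make the concern about 2 concrete, here is a rough sketch, assuming a JDK whose `GZIPInputStream` behaves as described in this thread. `FailingAfter` is a hypothetical stream invented for this example: it serves one complete GZIP stream, claims that more data is available, and then throws on the next read.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: class and method names are made up for this sketch.
class SwallowedExceptionDemo {

    // Hypothetical flaky source: serves the given bytes, then fails on any further read.
    static class FailingAfter extends InputStream {
        private final byte[] data;
        private int pos;
        boolean failed; // set when the simulated I/O error has been thrown

        FailingAfter(byte[] data) {
            this.data = data;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            if (pos >= data.length) {
                failed = true;
                throw new IOException("simulated I/O error after the GZIP stream");
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }

        @Override
        public int read() throws IOException {
            byte[] one = new byte[1];
            read(one, 0, 1); // either fills one byte or throws
            return one[0] & 0xff;
        }

        @Override
        public int available() {
            return 1; // always claim more data, so the decoder probes past the trailer
        }
    }

    // Compress the text into a single GZIP stream.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // True if the source threw an IOException that never reached the caller.
    static boolean errorWasSwallowed(String text) throws IOException {
        FailingAfter source = new FailingAfter(gzip(text));
        try (GZIPInputStream in = new GZIPInputStream(source)) {
            String decoded = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            return source.failed && decoded.equals(text);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("I/O error swallowed: " + errorWasSwallowed("payload"));
    }
}
```

If the exception is suppressed as described, the caller gets a clean end-of-stream and cannot distinguish "no more data" from "the underlying read failed".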

If we were to accept those opinions (preserving only 1), then we would end up at the same place as `GzipCompressorInputStream`:
* Underlying `IOException`s are never suppressed
* `new GZIPInputStream(in)` decodes only one GZIP stream and ignores any trailing garbage
* `new GZIPInputStream(in, true)` decodes concatenated streams; trailing garbage causes `IOException`

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2258919532


More information about the core-libs-dev mailing list