RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics

Eirik Bjørsnøs eirbjo at openjdk.org
Fri Aug 30 10:31:48 UTC 2024


On Fri, 30 Aug 2024 07:27:11 GMT, Eirik Bjørsnøs <eirbjo at openjdk.org> wrote:

> Please review this PR, which picks up on the excellent work done by @archiecobbs in #18385.
> 
> The proposed changes aim to solve two issues with the current `java.util.zip.GZIPInputStream`:
> 
> * The class parses multiple concatenated GZIP files as a single stream. This behavior is not documented in the API specification.
> * Any additional bytes following a trailer which do not form a valid header are discarded and the stream behaves as if the end of stream has been reached. This behavior is not documented in the API specification.
> 
> Testing:
> 
> * A new test `GZIPInputStreamConcat` verifies the behaviors being specified in this PR
> * A new test `GZIPInputStreamGzipCommand` verifies decompression of various GZIP files created using the `gzip` command.
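For illustration, here is a minimal sketch of the first behavior: two independently compressed GZIP files joined back to back decode as one continuous stream. The class and helper names (`GzipConcatDemo`, `gzip`, `gunzip`, `concat`) are hypothetical and not part of the PR or its tests; the sketch assumes JDK 9+ for `readAllBytes()` and an in-memory `ByteArrayInputStream` source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipConcatDemo {

    // Compress a string into one complete, standalone GZIP file (header, deflate data, trailer).
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Decompress everything GZIPInputStream yields until it reports end of stream.
    static String gunzip(byte[] data) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Join two byte arrays back to back, similar to `cat a.gz b.gz`.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] result = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, result, a.length, b.length);
        return result;
    }

    public static void main(String[] args) throws IOException {
        byte[] both = concat(gzip("Hello, "), gzip("World!"));
        System.out.println(gunzip(both)); // prints "Hello, World!"
    }
}
```

The caller never sees the boundary between the two files; the uncompressed data of both is delivered as a single stream.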

@LanceAndersen @jaikiran 

I have updated the API documentation in this PR, inspired by the following comment from @jaikiran in Archie's PR:

https://github.com/openjdk/jdk/pull/18385#issuecomment-2265378324

I aimed to keep this at a high level, avoiding any details of the GZIP file format and the parsing logic involved in the implementation:


 * <p>
 * The InputStream passed to the constructor of this class may represent a
 * single GZIP file or multiple consecutive GZIP files. When the end of a
 * GZIP file is immediately followed by a new GZIP file, this class continues
 * to decode compressed data into a single, concatenated stream of uncompressed
 * data. Otherwise, any additional trailing bytes following a GZIP file are
 * discarded as if the end of stream is reached.
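To illustrate the last sentence, here is a minimal sketch in which bytes that do not form a GZIP header (they do not start with the magic bytes 0x1f 0x8b) follow a complete GZIP file. The names (`GzipTrailingBytesDemo`, `gzipWithTrailingBytes`, `gunzip`) are hypothetical; the sketch assumes JDK 9+ and an in-memory source, and it only exercises the behavior described above rather than the implementation details.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipTrailingBytesDemo {

    // Build a complete GZIP file for `text`, then append `trailing` as raw, uncompressed bytes.
    static byte[] gzipWithTrailingBytes(String text, String trailing) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        // Closing a ByteArrayOutputStream has no effect, so we can keep writing after the trailer.
        bytes.write(trailing.getBytes(StandardCharsets.US_ASCII));
        return bytes.toByteArray();
    }

    // Decompress whatever GZIPInputStream produces before it reports end of stream.
    static String gunzip(byte[] data) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // "junk..." does not begin with 0x1f 0x8b, so it cannot be the start of a new GZIP file.
        byte[] data = gzipWithTrailingBytes("payload", "junk after the trailer");
        System.out.println(gunzip(data)); // prints "payload"
    }
}
```

No exception is thrown for the trailing bytes; the reader simply sees end of stream after the first file's data.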


What do you think?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20787#issuecomment-2320776918


More information about the core-libs-dev mailing list