RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v4]

Eirik Bjorsnos duke at openjdk.org
Sun Dec 17 13:51:45 UTC 2023


On Fri, 15 Dec 2023 21:13:07 GMT, Archie Cobbs <acobbs at openjdk.org> wrote:

>> `GZIPInputStream`, when looking for a concatenated stream, relies on what the underlying `InputStream` says is how many bytes are `available()`. But this is inappropriate because `InputStream.available()` is just an estimate and is allowed (for example) to always return zero.
>> 
>> The fix is to ignore what's `available()` and just proceed and see what happens. If fewer bytes are available than required, the attempt to extend to another stream is canceled just as it was before, e.g., when the next stream header couldn't be read.
>
> Archie Cobbs has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Address third round of review comments.

The current behavior of allowing/ignoring trailing malformed data seems to have a complicated history:

- GZIPInputStream was updated to throw ZipException instead of IOException on malformed GZIP data in [JDK-4263582](https://bugs.openjdk.org/browse/JDK-4263582).
- The ability to read concatenated GZ files was added in [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). Interestingly, this change also introduced the current behavior of ignoring any trailing malformed data in the stream.
- [JDK-7021870](https://bugs.openjdk.org/browse/JDK-7021870) fixed a bug where GZIPInputStream closed the underlying input stream. The change also introduced the test GZIPInZip, which verified that reading from a wrapped ZipInputStream does not close the stream.
- Some months later, GZIPInZip was updated to fix a test failure, but the change also modified the test to verify that malformed trailing data was ignored. The JBS issue is not available to me: [JDK-8023431](https://bugs.openjdk.org/browse/JDK-8023431)
- Soon after this, GZIPInZip was again updated to fix a test failure, this time removing the use of piped streams and threads. The JBS issue is not available to me: [JDK-8026756](https://bugs.openjdk.org/browse/JDK-8026756)

The current behavior of ignoring trailing malformed data does not seem to be specified in the API. On the contrary, the read methods are specified to throw ZipException for corrupt input data:


     * @throws    ZipException if the compressed input data is corrupt.
     * @throws    IOException if an I/O error has occurred.
     *
     */
    public int read(byte[] buf, int off, int len) throws IOException {


Not sure whether it is worthwhile to change this long-standing behavior of GZIPInputStream. But it could perhaps be noted somehow in the API documentation? (To be clear, that would be for a different PR/issue/CSR.)
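
For illustration, here is a minimal sketch of the behavior in question. It is not taken from the PR or from the GZIPInZip test; the class name `TrailingGarbageDemo`, its helper methods, and the `junk!` payload are all hypothetical. It writes one complete GZIP member, appends bytes that are not a valid GZIP header, and reads everything back. Based on the history above, I would expect the read to return the original data rather than throw ZipException, but that is exactly the unspecified part.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class TrailingGarbageDemo {

    /** Returns one complete GZIP member for data, followed by bytes that are not a GZIP header. */
    static byte[] gzipWithTrailingGarbage(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        }
        // Hypothetical trailing garbage: it does not start with the GZIP magic bytes 0x1f 0x8b.
        bytes.write("junk!".getBytes(StandardCharsets.US_ASCII));
        return bytes.toByteArray();
    }

    /** Reads the stream to the end; any ZipException or IOException propagates to the caller. */
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[512];
            int n;
            while ((n = gz.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello, gzip".getBytes(StandardCharsets.UTF_8);
        byte[] roundTripped = decompress(gzipWithTrailingGarbage(original));
        System.out.println("trailing garbage ignored: " + Arrays.equals(original, roundTripped));
    }
}
```

If the trailing data were instead a second valid GZIP member, the same `decompress` loop would return the concatenation of both payloads, which is the case JDK-4691425 was meant to support.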

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17113#issuecomment-1859177655

