Bug in GZIPInputStream.read() causing data loss

Louis Bergelson louisb at broadinstitute.org
Thu Dec 14 19:08:48 UTC 2023


Hello.  This is my first time posting here, so I apologize if this is the
wrong forum.  I wanted to bring up an issue in GZIPInputStream which
was first identified in 2011, confirmed as a bug, and then never resolved.

When reading certain GZIP files from certain types of InputStreams, the
GZIPInputStream can misidentify the end of the stream and close early,
resulting in silently truncated data.

You can see the bug report which has a detailed description here:
https://bugs.openjdk.org/browse/JDK-7036144

In short, it comes down to incorrect use of the (quite confusing)
InputStream.available() method to detect the end of stream.  This typically
works fine with local files, but with network streams, which might not have
bytes available at any given moment, it fails nondeterministically.
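
To illustrate why available() is the wrong test, here is a minimal sketch.
It is not the JDK code; the class AvailableDemo, the wrapper
ZeroAvailableStream, and the two counting helpers are hypothetical names
made up for this example.  The wrapper stands in for a network stream that
still has data but reports 0 bytes available, which the InputStream
contract permits.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableDemo {

    /**
     * Hypothetical stand-in for a network stream: data remains, but
     * available() reports 0, as the InputStream contract allows.
     */
    static final class ZeroAvailableStream extends FilterInputStream {
        ZeroAvailableStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0; // "nothing readable without blocking", not "end of stream"
        }
    }

    /** Incorrect: treats available() == 0 as end of stream. */
    static int countUsingAvailable(InputStream in) throws IOException {
        int count = 0;
        while (in.available() > 0) {
            if (in.read() < 0) {
                break;
            }
            count++;
        }
        return count;
    }

    /** Correct: only read() returning -1 signals end of stream. */
    static int countUsingRead(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        // Prints 0: the available()-based loop stops before reading anything.
        System.out.println(countUsingAvailable(
                new ZeroAvailableStream(new ByteArrayInputStream(data))));
        // Prints 100: the read()-based loop sees every byte.
        System.out.println(countUsingRead(
                new ZeroAvailableStream(new ByteArrayInputStream(data))));
    }
}
```

The same confusion, applied when deciding whether more compressed data
follows, would produce the kind of silent truncation described above.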

How could I go about getting this fixed?  I can contribute a patch or
additional examples if necessary.

Genomics data is typically encoded as block gzipped files, so this comes up
regularly and causes a lot of confusion.  The workaround is to just not use
GZIPInputStream.  It seems like a core Java class, though, so it would
be nice if it worked.

Thank you,
Louis