Bug in GZIPInputStream.read() causing data loss

Archie Cobbs archie.cobbs at gmail.com
Thu Dec 14 20:15:57 UTC 2023


Hi Louis,

At first glance this looks easy to fix. I've filed a draft PR (tests pending)
here: https://github.com/openjdk/jdk/pull/17113

-Archie

On Thu, Dec 14, 2023 at 1:10 PM Louis Bergelson <louisb at broadinstitute.org>
wrote:

> Hello.  This is my first time posting here, so I apologize if this is the
> wrong forum.  I wanted to bring up an issue in GZIPInputStream which
> was first identified in 2011, confirmed as a bug, and then never resolved.
>
> When reading certain GZIP files from certain types of InputStreams, the
> GZIPInputStream can misidentify the end of the stream and close early,
> resulting in silently truncated data.
>
> You can see the bug report which has a detailed description here:
> https://bugs.openjdk.org/browse/JDK-7036144
>
> In short it comes down to incorrect use of the (quite confusing)
> InputStream.available() method to detect the end of stream.  This typically
> works fine with local files, but with network streams that might not have
> bytes available at any given moment it fails nondeterministically.
>
> How could I go about getting this fixed?  I can contribute a patch or
> additional examples if necessary.
>
> Genomics data is typically encoded as block-gzipped files, so this comes
> up regularly and causes a lot of confusion.  The workaround is to just not
> use GZIPInputStream.  It seems like a core Java class, though, so it
> would be nice if it worked.
>
> Thank you,
> Louis
>


-- 
Archie L. Cobbs
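
The quoted message attributes the truncation to treating
InputStream.available() as an end-of-stream test. The sketch below is not the
GZIPInputStream code or the fix in the PR; it only illustrates the underlying
pitfall. available() returns an estimate of the bytes readable without
blocking, so it may be 0 while data remains, and only read() returning -1
signals end of stream. The class ZeroAvailableStream and both counting
methods are hypothetical names made up for this illustration; the wrapper
stands in for a network stream that has no bytes buffered at the moment it
is asked.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class AvailableIsNotEof {

    /** Hypothetical stand-in for a network stream: data exists, but none is promised without blocking. */
    static final class ZeroAvailableStream extends FilterInputStream {
        ZeroAvailableStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0; // legal under the InputStream contract; says nothing about end of stream
        }
    }

    /** Fragile: stops as soon as available() reports 0, even though more data may follow. */
    static int countUsingAvailable(InputStream in) throws IOException {
        int count = 0;
        while (in.available() > 0) {
            if (in.read() == -1) {
                break;
            }
            count++;
        }
        return count;
    }

    /** Robust: only read() returning -1 is treated as end of stream. */
    static int countUsingRead(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000];
        int fragile = countUsingAvailable(new ZeroAvailableStream(new ByteArrayInputStream(data)));
        int robust = countUsingRead(new ZeroAvailableStream(new ByteArrayInputStream(data)));
        System.out.println("available()-based count: " + fragile);
        System.out.println("read()-based count:      " + robust);
    }
}
```

With a plain ByteArrayInputStream both loops would count every byte, which
matches the observation in the quoted message that the problem tends to stay
hidden with local files and shows up with streams that may have nothing
buffered at a given moment.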


More information about the core-libs-dev mailing list