Bug in GZIPInputStream.read() causing data loss
Archie Cobbs
archie.cobbs at gmail.com
Thu Dec 14 20:15:57 UTC 2023
Hi Louis,
At first glance this looks easy to fix. I've filed a draft PR (pending
tests) here: https://github.com/openjdk/jdk/pull/17113
-Archie
On Thu, Dec 14, 2023 at 1:10 PM Louis Bergelson <louisb at broadinstitute.org>
wrote:
> Hello. This is my first time posting here, so I apologize if this is the
> wrong forum. I wanted to bring up an issue in GZIPInputStream which
> was first identified in 2011, confirmed as a bug, and then never resolved.
>
> When reading certain GZIP files from certain types of InputStreams, the
> GZIPInputStream can misidentify the end of the stream and close early,
> resulting in silently truncated data.
>
> You can see the bug report which has a detailed description here:
> https://bugs.openjdk.org/browse/JDK-7036144
>
> In short, it comes down to incorrect use of the (quite confusing)
> InputStream.available() method to detect the end of stream. This typically
> works fine with local files, but with network streams, which might not have
> bytes available at any given moment, it fails nondeterministically.
>
> How could I go about getting this fixed? I can contribute a patch or
> additional examples if necessary.
>
> Genomics data is typically encoded as block gzipped files, so this comes
> up regularly and causes a lot of confusion. The workaround is to just not
> use GZIPInputStream. It seems like a core Java class though, so it
> would be nice if it worked.
>
> Thank you,
> Louis
>
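For readers following along, here is a minimal Java sketch of the
available() pitfall described in the quoted message. It is an
illustration only, not the JDK's GZIPInputStream code. The names
AvailableDemo, StalledStream, readUsingAvailable and readUntilEof are
hypothetical, and StalledStream is an assumed stand-in for a network
stream that has more data coming but nothing readable without blocking.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class AvailableDemo {

    // Hypothetical stand-in for a network stream: data remains in the
    // underlying stream, but available() reports 0 bytes, as it may
    // when nothing can be read without blocking.
    static final class StalledStream extends FilterInputStream {
        StalledStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0;
        }
    }

    // Faulty pattern: treats available() == 0 as end of stream.
    static byte[] readUsingAvailable(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (in.available() > 0) {
            int b = in.read();
            if (b == -1) {
                break;
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    // Reliable pattern: only read() returning -1 signals end of stream.
    static byte[] readUntilEof(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4};
        byte[] viaAvailable =
                readUsingAvailable(new StalledStream(new ByteArrayInputStream(data)));
        byte[] viaEof =
                readUntilEof(new StalledStream(new ByteArrayInputStream(data)));
        System.out.println("available() loop read " + viaAvailable.length + " bytes");
        System.out.println("read() loop read " + viaEof.length + " bytes");
    }
}
```

With the stalled stream, the available()-based loop stops before reading
anything, while the read()-based loop returns all four bytes. The same
kind of early stop, applied to the check for a following GZIP member, is
what the bug report describes as silent truncation.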
--
Archie L. Cobbs
More information about the core-libs-dev mailing list