RFR: 7036144: GZIPInputStream readTrailer uses faulty available() test for end-of-stream [v6]
Jaikiran Pai
jpai at openjdk.org
Mon Feb 26 06:53:56 UTC 2024
On Mon, 5 Feb 2024 23:53:06 GMT, Archie Cobbs <acobbs at openjdk.org> wrote:
>> `GZIPInputStream`, when looking for a concatenated stream, relies on what the underlying `InputStream` says is how many bytes are `available()`. But this is inappropriate because `InputStream.available()` is just an estimate and is allowed (for example) to always return zero.
>>
>> The fix is to ignore what's `available()` and just proceed and see what happens. If fewer bytes are available than required, the attempt to extend to another stream is canceled just as it was before, e.g., when the next stream header couldn't be read.
>
> Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
> - Merge branch 'master' into JDK-7036144
> - Merge branch 'master' into JDK-7036144
> - Address third round of review comments.
> - Address second round of review comments.
> - Address review comments.
> - Fix bug in GZIPInputStream when underlying available() returns short.
Hello Archie, the proposal to stop relying on the underlying `InputStream`'s `available()` method when deciding whether to read additional bytes to detect the "next" header seems reasonable.
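As a minimal illustration of why an `available() > 0` test can't signal end-of-stream, the sketch below wraps a stream in a hypothetical `ZeroAvailableInputStream` (not a JDK class) whose `available()` always returns 0. The `InputStream` contract permits this, yet data is still readable.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class AvailableIsAnEstimate {

    /** Hypothetical wrapper: a legal InputStream that never reports buffered bytes. */
    static final class ZeroAvailableInputStream extends FilterInputStream {
        ZeroAvailableInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() {
            return 0; // permitted: available() is only an estimate
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ZeroAvailableInputStream(
                new ByteArrayInputStream(new byte[] {0x1f, (byte) 0x8b}));
        // An "available() > 0" check would conclude there is nothing left...
        System.out.println("available() = " + in.available()); // available() = 0
        // ...yet an actual read still returns data (the first GZIP magic byte).
        System.out.println("read() = " + in.read());           // read() = 31
    }
}
```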
What's being proposed here is that we go ahead and read a few additional bytes from the underlying stream to detect the presence or absence of a GZIP member header; if that attempt fails (with an `IOException`), we consider that we have reached the end of the GZIP stream and simply return.
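A minimal sketch of that "just try it" approach, assuming a simplified header check: the hypothetical `hasNextMember` helper never consults `available()`; it reads the two RFC 1952 magic bytes (ID1 = 0x1f, ID2 = 0x8b) and treats any `IOException` (a short read via `EOFException`, or a mismatch via `ZipException`) as the end of the GZIP stream. The real `GZIPInputStream` header parsing validates more fields than this.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipException;

final class NextMemberProbe {

    // GZIP member header magic bytes (ID1, ID2) from RFC 1952.
    private static final int ID1 = 0x1f;
    private static final int ID2 = 0x8b;

    /** Reads one unsigned byte, treating end-of-stream as an error. */
    private static int readUByte(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            throw new EOFException();
        }
        return b;
    }

    /** Throws if the next two bytes are not the GZIP header magic. */
    static void readMagic(InputStream in) throws IOException {
        if (readUByte(in) != ID1 || readUByte(in) != ID2) {
            throw new ZipException("Not in GZIP format");
        }
    }

    /**
     * Hypothetical helper: never asks available(); just attempts to read the
     * next header and treats any IOException as "no further member".
     */
    static boolean hasNextMember(InputStream in) {
        try {
            readMagic(in);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNextMember(new ByteArrayInputStream(new byte[] {0x1f, (byte) 0x8b}))); // true
        System.out.println(hasNextMember(new ByteArrayInputStream(new byte[] {0x1f})));              // false (EOF)
        System.out.println(hasNextMember(new ByteArrayInputStream(new byte[] {'h', 'i'})));          // false (not GZIP)
    }
}
```

Note that on the "false" paths the probed bytes have already been consumed from the stream, which is exactly the concern raised next.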
For this change, I think we also need to consider whether we should "unread" the bytes read from the underlying `InputStream` if they don't correspond to a "next" GZIP member header. That way, an underlying `InputStream` implemented to report `available()` as 0 once it knew the GZIP stream was done, while still having additional (non-GZIP) data to read, would continue to let callers read that data after this change. It's arguable whether we should have been doing that "unread" even with the existing `available() > 0` check, and the decision that comes out of https://bugs.openjdk.org/browse/JDK-8322256 might cover that.
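One way to picture the "unread" idea is a minimal sketch using `java.io.PushbackInputStream`. The helper `peekNextMember` is hypothetical: it reads up to two bytes, checks them against the GZIP magic, and always pushes them back, so trailing non-GZIP data remains readable by the caller. This is only an illustration under that assumption; the real `GZIPInputStream` may already hold some of the underlying stream's bytes in its internal inflater buffer, so an actual fix would likely need more than this.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class UnreadingProbe {

    // GZIP member header magic bytes (ID1, ID2) from RFC 1952.
    private static final int ID1 = 0x1f;
    private static final int ID2 = 0x8b;

    /**
     * Hypothetical helper: peeks at up to two bytes and reports whether they are
     * the GZIP header magic. The bytes are always pushed back, so the caller can
     * re-read them either as a header or as trailing non-GZIP data.
     */
    static boolean peekNextMember(PushbackInputStream in) throws IOException {
        byte[] buf = new byte[2];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r == -1) {
                break; // end of stream before a full magic was seen
            }
            n += r;
        }
        in.unread(buf, 0, n); // restore whatever was consumed
        return n == 2 && (buf[0] & 0xff) == ID1 && (buf[1] & 0xff) == ID2;
    }

    public static void main(String[] args) throws IOException {
        byte[] trailing = "hello".getBytes(StandardCharsets.US_ASCII);
        // Pushback capacity must be at least the number of bytes we may unread.
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(trailing), 2);
        System.out.println(peekNextMember(in));                                       // false
        System.out.println(new String(in.readAllBytes(), StandardCharsets.US_ASCII)); // hello
    }
}
```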
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17113#issuecomment-1963429706
More information about the core-libs-dev mailing list