RFR: 8374644: Regression in GZIPInputStream performance after JDK-7036144 [v3]
Jaikiran Pai
jpai at openjdk.org
Fri Jan 9 06:26:51 UTC 2026
On Thu, 8 Jan 2026 13:24:57 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:
>> Can I please get a review of this change which proposes to address a performance regression in `java.util.zip.GZIPInputStream`?
>>
>> In JDK 23, we updated `GZIPInputStream` to no longer rely on `InputStream.available()` method when determining if a `InputStream` consists of additional (concatenated) GZIP streams. That was done in https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of `InputStream.available()` with explicit call(s) to `InputStream.read()` to detect and read any additional concatenated GZIP stream header.
>>
>> Given how old that existing code in `GZIPInputStream` was, efforts were made to ensure that it wouldn't break existing applications, for the common use case. A release note too was added in an effort to highlight this change.
>>
>> Unfortunately, when reviewing and experimenting with this change I didn't realize that when we try to detect additional bytes in the stream which has already reached EOF, the `EOFException` that gets raised by this code could contribute to a performance degradation. The code does handle this exception appropriately, but it does mean that in the common case of a stream formed of just one GZIP stream, we would now end up constructing an `EOFException` in this code path.
>>
>> Construction of `Throwable` instances involves filling the stacktrace into that instance and is a relatively expensive operation. As a result, we ended up regressing the performance of this code path for the common use case of parsing a non-concatenated GZIP stream.
>>
>> The commit in this PR addresses this issue by introducing an alternate code path in the (private) `readHeader()` method to prevent an `EOFException` from being generated when the stream has already reached EOF and we are trying to determine if there's any additional concatenated GZIP stream header in the stream. This addresses the performance regression as well as lets us retain the changes that we had done in JDK-7036144 to prevent the usage of `InputStream.available()`.
>>
>> The reporter of this performance regression provided a JMH benchmark. I've run it against JDK 22 (the version that doesn't contain the JDK-7036144 change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) and with this change proposed in this PR. I can see that with this change, the performance numbers improve and go back to the JDK 22 levels. I haven't included that JMH benchmark in this PR and am looking for inputs on whether we should include it.
>>
>> No n...
>
> Jaikiran Pai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
> - merge latest from master branch
> - add a comment explaining the "true" value being passed to readHeader
> - review suggestion - assert not needed for now
> - add a comment to clarify the short value being read for GZIP magic header
> - use -1 to represent absence of a GZIP header, from readHeader() method
> - 8374644: Regression in GZIPInputStream performance after JDK-7036144
Thank you Alan and Lance for the reviews. tier1, tier2 and tier3 testing completed without issues. I'll go ahead and integrate this now.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/29092#issuecomment-3727396701
More information about the core-libs-dev
mailing list