RFR: 8374644: Regression in GZIPInputStream performance after JDK-7036144

Jaikiran Pai jpai at openjdk.org
Wed Jan 7 13:38:22 UTC 2026


Can I please get a review of this change which proposes to address a performance regression in `java.util.zip.GZIPInputStream`? 

In JDK 23, we updated `GZIPInputStream` to no longer rely on `InputStream.available()` method when determining if a `InputStream` consists of additional (concatenated) GZIP streams. That was done in https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of `InputStream.available()` with explicit call(s) to `InputStream.read()` to detect and read any additional concatenated GZIP stream header.

Given how old that existing code in `GZIPInputStream` was, efforts were made to ensure that it wouldn't break existing applications, for the common use case. A release note too was added in an effort to highlight this change.

Unfortunately, when reviewing and experimenting with this change I didn't realize that when we try to detect additional bytes in the stream which has already reached EOF, the `EOFException` that gets raised by this code could contribute to a performance degradation. The code does handle this exception appropriately, but it does mean that in the common case of a stream formed of just one GZIP stream, we would now end up constructing an `EOFException` in this code path.

Construction of `Throwable` instances involves filling the stacktrace into that instance and is a relatively expensive operation. As a result, we ended up regressing the performance of this code path for the common use case of parsing a non-concatenated GZIP stream.

The commit in this PR addresses this issue by introducing an alternate code path in the (private) `readHeader()` method to prevent an `EOFException` from being generated when the stream has already reached EOF and we are trying to determine if there's any additional concatenated GZIP stream header in the stream. This addresses the performance regression as well as lets us retain the changes that we had done in JDK-7036144 to prevent the usage of `InputStream.available()`.

The reporter of this performance regression provided a JMH benchmark. I've run it against JDK 22 (the version that doesn't contain the JDK-7036144 change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) and with this change proposed in this PR. I can see that with this change, the performance numbers improve and go back to the JDK 22 levels. I haven't included that JMH benchmark in this PR and am looking for inputs on whether we should include it.

No new regression test has been introduced and tier1, tier2 and tier3 testing is currently in progress.

-------------

Commit messages:
 - 8374644: Regression in GZIPInputStream performance after JDK-7036144

Changes: https://git.openjdk.org/jdk/pull/29092/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29092&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8374644
  Stats: 41 lines in 1 file changed: 33 ins; 1 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/29092.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29092/head:pull/29092

PR: https://git.openjdk.org/jdk/pull/29092


More information about the core-libs-dev mailing list