RFR: 8374644: Regression in GZIPInputStream performance after JDK-7036144 [v2]
Jaikiran Pai
jpai at openjdk.org
Wed Jan 7 14:08:02 UTC 2026
> Can I please get a review of this change which proposes to address a performance regression in `java.util.zip.GZIPInputStream`?
>
> In JDK 23, we updated `GZIPInputStream` to no longer rely on `InputStream.available()` method when determining if a `InputStream` consists of additional (concatenated) GZIP streams. That was done in https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of `InputStream.available()` with explicit call(s) to `InputStream.read()` to detect and read any additional concatenated GZIP stream header.
>
> Given how old that existing code in `GZIPInputStream` was, efforts were made to ensure that it wouldn't break existing applications, for the common use case. A release note too was added in an effort to highlight this change.
>
> Unfortunately, when reviewing and experimenting with this change I didn't realize that when we try to detect additional bytes in the stream which has already reached EOF, the `EOFException` that gets raised by this code could contribute to a performance degradation. The code does handle this exception appropriately, but it does mean that in the common case of a stream formed of just one GZIP stream, we would now end up constructing an `EOFException` in this code path.
>
> Construction of `Throwable` instances involves filling the stacktrace into that instance and is a relatively expensive operation. As a result, we ended up regressing the performance of this code path for the common use case of parsing a non-concatenated GZIP stream.
>
> The commit in this PR addresses this issue by introducing an alternate code path in the (private) `readHeader()` method to prevent an `EOFException` from being generated when the stream has already reached EOF and we are trying to determine if there's any additional concatenated GZIP stream header in the stream. This addresses the performance regression as well as lets us retain the changes that we had done in JDK-7036144 to prevent the usage of `InputStream.available()`.
>
> The reporter of this performance regression provided a JMH benchmark. I've run it against JDK 22 (the version that doesn't contain the JDK-7036144 change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) and with this change proposed in this PR. I can see that with this change, the performance numbers improve and go back to the JDK 22 levels. I haven't included that JMH benchmark in this PR and am looking for inputs on whether we should include it.
>
> No new regression test has been in...
Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision:
use -1 to represent absence of a GZIP header, from readHeader() method
-------------
Changes:
- all: https://git.openjdk.org/jdk/pull/29092/files
- new: https://git.openjdk.org/jdk/pull/29092/files/9c2cf68b..06e7b370
Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=29092&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=29092&range=00-01
Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/29092.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/29092/head:pull/29092
PR: https://git.openjdk.org/jdk/pull/29092
More information about the core-libs-dev
mailing list