RFR: 8374644: Regression in GZIPInputStream performance after JDK-7036144
Alan Bateman
alanb at openjdk.org
Wed Jan 7 13:50:10 UTC 2026
On Wed, 7 Jan 2026 13:27:43 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:
> Can I please get a review of this change which proposes to address a performance regression in `java.util.zip.GZIPInputStream`?
>
> In JDK 23, we updated `GZIPInputStream` to no longer rely on `InputStream.available()` method when determining if a `InputStream` consists of additional (concatenated) GZIP streams. That was done in https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of `InputStream.available()` with explicit call(s) to `InputStream.read()` to detect and read any additional concatenated GZIP stream header.
>
> Given how old that existing code in `GZIPInputStream` was, efforts were made to ensure that it wouldn't break existing applications, for the common use case. A release note too was added in an effort to highlight this change.
>
> Unfortunately, when reviewing and experimenting with this change I didn't realize that when we try to detect additional bytes in the stream which has already reached EOF, the `EOFException` that gets raised by this code could contribute to a performance degradation. The code does handle this exception appropriately, but it does mean that in the common case of a stream formed of just one GZIP stream, we would now end up constructing an `EOFException` in this code path.
>
> Construction of `Throwable` instances involves filling the stacktrace into that instance and is a relatively expensive operation. As a result, we ended up regressing the performance of this code path for the common use case of parsing a non-concatenated GZIP stream.
>
> The commit in this PR addresses this issue by introducing an alternate code path in the (private) `readHeader()` method to prevent an `EOFException` from being generated when the stream has already reached EOF and we are trying to determine if there's any additional concatenated GZIP stream header in the stream. This addresses the performance regression as well as lets us retain the changes that we had done in JDK-7036144 to prevent the usage of `InputStream.available()`.
>
> The reporter of this performance regression provided a JMH benchmark. I've run it against JDK 22 (the version that doesn't contain the JDK-7036144 change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) and with this change proposed in this PR. I can see that with this change, the performance numbers improve and go back to the JDK 22 levels. I haven't included that JMH benchmark in this PR and am looking for inputs on whether we should include it.
>
> No new regression test has been in...
src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 195:
> 193: * If failOnEOF is false and if the given InputStream has already
> 194: * reached EOF when this method was invoked, then this method returns
> 195: * 0 (indicating that there's no GZIP member header).
Have you tried using -1 for EOF instead of 0? I think would make it easier to understand at the use sites.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/29092#discussion_r2668528822
More information about the core-libs-dev
mailing list