RFR: 8374644: Regression in GZIPInputStream performance after JDK-7036144 [v2]

Lance Andersen lancea at openjdk.org
Wed Jan 7 15:47:53 UTC 2026


On Wed, 7 Jan 2026 14:08:02 GMT, Jaikiran Pai <jpai at openjdk.org> wrote:

>> Can I please get a review of this change which proposes to address a performance regression in `java.util.zip.GZIPInputStream`? 
>> 
>> In JDK 23, we updated `GZIPInputStream` to no longer rely on `InputStream.available()` method when determining if a `InputStream` consists of additional (concatenated) GZIP streams. That was done in https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of `InputStream.available()` with explicit call(s) to `InputStream.read()` to detect and read any additional concatenated GZIP stream header.
>> 
>> Given how old that existing code in `GZIPInputStream` was, efforts were made to ensure that it wouldn't break existing applications, for the common use case. A release note too was added in an effort to highlight this change.
>> 
>> Unfortunately, when reviewing and experimenting with this change I didn't realize that when we try to detect additional bytes in the stream which has already reached EOF, the `EOFException` that gets raised by this code could contribute to a performance degradation. The code does handle this exception appropriately, but it does mean that in the common case of a stream formed of just one GZIP stream, we would now end up constructing an `EOFException` in this code path.
>> 
>> Construction of `Throwable` instances involves filling the stacktrace into that instance and is a relatively expensive operation. As a result, we ended up regressing the performance of this code path for the common use case of parsing a non-concatenated GZIP stream.
>> 
>> The commit in this PR addresses this issue by introducing an alternate code path in the (private) `readHeader()` method to prevent an `EOFException` from being generated when the stream has already reached EOF and we are trying to determine if there's any additional concatenated GZIP stream header in the stream. This addresses the performance regression as well as lets us retain the changes that we had done in JDK-7036144 to prevent the usage of `InputStream.available()`.
>> 
>> The reporter of this performance regression provided a JMH benchmark. I've run it against JDK 22 (the version that doesn't contain the JDK-7036144 change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) and with this change proposed in this PR. I can see that with this change, the performance numbers improve and go back to the JDK 22 levels. I haven't included that JMH benchmark in this PR and am looking for inputs on whether we should include it.
>> 
>> No n...
>
> Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use -1 to represent absence of a GZIP header, from readHeader() method

Hi Jai,

Thank you for tackling.  Overall I think the changes make sense.

Additional comments that might make things a bit clearer for consideration

src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 82:

> 80:         usesDefaultInflater = true;
> 81:         try {
> 82:             readHeader(in, true);

Any consideration given to using an enum or constant to make it a bit clearer on what the true represents realizing there is a comment in the readHeader method.  Not a big deal either way

src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 219:

> 217:         } else {
> 218:             magic = readUShort(in);
> 219:         }

Understand what you are doing but perhaps consider adding more clarity to the comments as you are now  cobbling together the int which should contain the GZIP header magic number

src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 260:

> 258:         }
> 259:         crc.reset();
> 260:         assert n > 0 : "incorrect number of header bytes: " + n;

No problem with this but wondering if we should add this later and look at other places of interest

-------------

Marked as reviewed by lancea (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/29092#pullrequestreview-3635387991
PR Review Comment: https://git.openjdk.org/jdk/pull/29092#discussion_r2668878961
PR Review Comment: https://git.openjdk.org/jdk/pull/29092#discussion_r2668979389
PR Review Comment: https://git.openjdk.org/jdk/pull/29092#discussion_r2668951609


More information about the core-libs-dev mailing list