RFR: 8281962: Avoid unnecessary native calls in InflaterInputStream [v2]

Alan Bateman alanb at openjdk.java.net
Mon Feb 21 12:05:50 UTC 2022


On Thu, 17 Feb 2022 16:02:42 GMT, Volker Simonis <simonis at openjdk.org> wrote:

>> Currently, `InflaterInputStream::read()` first does a native call to the underlying zlib `inflate()` function and only afterwards checks if the inflater requires input (i.e. `Inflater::needsInput()`) or has finished inflating (`Inflater::finished()`). This leads to an unnecessary native call to `inflate()` when `InflaterInputStream::read()` is invoked for the very first time because `Inflater::fill()` hasn't been called and another unnecessary native call to detect the end of the stream. For small streams/files which completely fit into the output buffer passed to `InflaterInputStream::read()` we can therefore save two out of three native calls. The attached micro benchmark shows that this results in a 5%-10% performance improvement for zip files of sizes between 4096 to 256 bytes (notice that in common jars like Tomcat, Spring-Boot, Maven, Jackson, etc. about 60-80% of the classes are smaller than 4096 bytes).
>> 
>> 
>> before JDK-8281962
>> ------------------
>> Benchmark                                     (size)  Mode  Cnt  Score   Error  Units
>> InflaterInputStreams.inflaterInputStreamRead     256  avgt    5  2.571 ± 0.120  us/op
>> InflaterInputStreams.inflaterInputStreamRead     512  avgt    5  2.861 ± 0.064  us/op
>> InflaterInputStreams.inflaterInputStreamRead    4096  avgt    5  5.110 ± 0.278  us/op
>> 
>> after JDK-8281962
>> -----------------
>> Benchmark                                     (size)  Mode  Cnt  Score   Error  Units
>> InflaterInputStreams.inflaterInputStreamRead     256  avgt    5  2.332 ± 0.081  us/op
>> InflaterInputStreams.inflaterInputStreamRead     512  avgt    5  2.691 ± 0.293  us/op
>> InflaterInputStreams.inflaterInputStreamRead    4096  avgt    5  4.812 ± 1.038  us/op
>> 
>> 
>> Tested with the JTreg zip/jar/zipfs and the JCK zip/jar tests.
>> 
>> As a side effect, this change also fixes an issue with alternative zlib implementations like zlib-Chromium or zlib-Cloudflare which pad the inflated bytes with a specif byte pattern at the end of the output for debugging purpose. This breaks code patterns like the following:
>> 
>> 
>> int readcount = 0;
>> while ((bytesRead = inflaterInputStream.read(data, 0, bufferSize)) != -1) {
>>     outputStream.write(data, 0, bytesRead);
>>     readCount++;
>> }
>> if (readCount == 1) {
>>     return data;         //  <---- first bytes might be overwritten
>> }
>> return outputStream.toByteArray();
>> 
>> 
>> Even if the whole data fits into the `data` array during the first call to `inflaterInputStream.read()`, we still need a second call to `inflaterInputStream.read()` in order to detect the end of the inflater stream. If this second call calls into the native `inflate()` function of Cloudflare/Chromium's zlib version, this will still write some padding bytes at the beginning of the `data` array and overwrite the previously read data. This issue has been reported in Spring [1] and ASM [2] when using these libraries with Cloudflares zlib version (by setting `LD_LIBRARY_PATH` accordingly).
>> 
>> [1] https://github.com/spring-projects/spring-framework/issues/27429
>> [2] https://gitlab.ow2.org/asm/asm/-/issues/317955
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Changed hardcoded constant to JMH parmater and removed non-ASCII chars from comments

I think this change is okay. We also have to be super cautious when touching InflaterXXX and friends but I think one is okay.

-------------

Marked as reviewed by alanb (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/7492


More information about the core-libs-dev mailing list