RFR(S): 8242848: Improve performance of InflaterOutputStream.write()
Volker Simonis
volker.simonis at gmail.com
Wed Apr 15 17:48:03 UTC 2020
Hi,
can I please have a review for the following simple performance enhancement:
http://cr.openjdk.java.net/~simonis/webrevs/2020/8242848/
https://bugs.openjdk.java.net/browse/JDK-8242864
InflaterOutputStream has an internal buffer which is used for the
inflated data. This buffer can be configured to a custom size (the
default is 512 bytes) by using the specialized
"InflaterOutputStream(OutputStream out, Inflater infl, int bufLen)"
constructor. A bigger buffer, results in fewer native calls to
"write()" on the associated OutputStream and better overall
performance.
Unfortunately "InflaterOutputStream.write(byte[] b, int off, int len)"
unnecessarily chunks its own byte input buffer "b" into pieces of
maximum 512 bytes before feeding them to the underlying Inflater. This
unnecessarily increases the number of native calls to
Inflater.inflate() and limits the benefit of the configurable internal
buffer to a size of ~(512 * compression-ratio) bytes.
Instead, we could easily pass the complete input buffer "b" as input
to the underlying Inflater. This simplifies the code and results in up
to ~25% performance improvements for bigger internal buffers (see the
JMH benchmark attached to the bug).
Regarding the implementation, I initially wanted to completely remove
the "for (;;)" loop after removing the chunking of the input buffer.
But I think it is still necessary, because an InflaterOutputStream can
be created with a custom Inflater which already has some input data.
In that case, the Inflater will first consume its initial input data
in the first "for" loop iteration while "write()"s input buffer "b"
will only be consumed in the second "for" loop iteration.
While doing this change, I've also realized that all the streams in
java.util.zip (i.e. DeflaterInputStream, GZIPInputStream,
GZIPOutputStream, InflaterInputStream, DeflaterOutputStream) use an
internal byte buffer of 512 bytes by default. Looking at the benchmark
attached to JDK-8242864, I think that increasing this default to a
bigger size (e.g. 4096 bytes) will considerably speed up (up to 50%)
read and write operations on these streams when they are created with
the default buffer size. I think the value "512" is a legacy of old
times when memory was more precious :) so I've opened "JDK-8242864:
Increase the default, internal buffer size of the Streams in
java.util.zip" to track that as as separate issue. Do you think it
makes sense to increase that default value?
Thank you and best regards,
Volker
PS: do you think it makes sense to contribute the benchmark attached
to JDK-8242864 to the code-tools/mh-jdk-microbenchmarks [1] project?
[1] http://openjdk.java.net/projects/code-tools/jmh-jdk-microbenchmarks/
More information about the core-libs-dev
mailing list