RFR(S): 8242848: Improve performance of InflaterOutputStream.write()

Volker Simonis volker.simonis at gmail.com
Thu Apr 23 12:28:30 UTC 2020


On Thu, Apr 23, 2020 at 1:42 PM Laurent Bourgès
<bourges.laurent at gmail.com> wrote:
>
> Thank you Volker for sharing your results.
>
> It is amazing, to improve such widely used feature (zip -> jar, http compression...)
>

Watch out, there's mor eto come :)

> Congratulations.

Thank..

> Laurent
>
> Le jeu. 23 avr. 2020 à 12:19, Volker Simonis <volker.simonis at gmail.com> a écrit :
>>
>> On Thu, Apr 23, 2020 at 10:56 AM Laurent Bourgès
>> <bourges.laurent at gmail.com> wrote:
>> >
>> > Hi Volker,
>> >
>> > Could you give some benchmark results in the jbs bug to have an idea of the performance gain ?
>>
>> The results of a benchmark run are at the end of the microbenchmark
>> which is attached to https://bugs.openjdk.java.net/browse/JDK-8242848
>>
>> I've attached them here for your convenience. The first run is before,
>> the second run after applying this patch.
>>
>> E.g. for a user supplied buffer of size 16384, the time for inflation
>> decreases from "2.603 ± 0.404  ms/op" to "2.187 ± 0.126  ms/op" which
>> is an improvement of 16%. Obviously, increasing the default internal
>> buffer size from 512 to 16384 (which I want to do in
>> https://bugs.openjdk.java.net/browse/JDK-8242864) will get you even
>> bigger speedups if you use the default buffer size :)
>>
>> /output/jdk-opt/images/jdk/bin/java
>> ----------------------------------------------------------------------
>> Benchmark                        (scale)  (size)  Mode  Cnt  Score
>> Error  Units
>> InflaterOutputStreamWrite.write        1     512  avgt    3  5.378 ±
>> 0.301  ms/op
>> InflaterOutputStreamWrite.write        1    1024  avgt    3  3.917 ±
>> 0.165  ms/op
>> InflaterOutputStreamWrite.write        1    2048  avgt    3  3.158 ±
>> 0.097  ms/op
>> InflaterOutputStreamWrite.write        1    4096  avgt    3  2.707 ±
>> 0.138  ms/op
>> InflaterOutputStreamWrite.write        1    8192  avgt    3  2.600 ±
>> 0.399  ms/op
>> InflaterOutputStreamWrite.write        1   16384  avgt    3  2.603 ±
>> 0.404  ms/op
>> InflaterOutputStreamWrite.write        1   32768  avgt    3  2.622 ±
>> 0.211  ms/op
>> InflaterOutputStreamWrite.write        1   65536  avgt    3  2.605 ±
>> 0.170  ms/op
>> InflaterOutputStreamWrite.write        2     512  avgt    3  4.015 ±
>> 0.150  ms/op
>> InflaterOutputStreamWrite.write        2    1024  avgt    3  3.225 ±
>> 0.178  ms/op
>> InflaterOutputStreamWrite.write        2    2048  avgt    3  2.745 ±
>> 0.261  ms/op
>> InflaterOutputStreamWrite.write        2    4096  avgt    3  2.614 ±
>> 0.542  ms/op
>> InflaterOutputStreamWrite.write        2    8192  avgt    3  2.593 ±
>> 0.206  ms/op
>> InflaterOutputStreamWrite.write        2   16384  avgt    3  2.606 ±
>> 0.055  ms/op
>> InflaterOutputStreamWrite.write        2   32768  avgt    3  2.611 ±
>> 0.116  ms/op
>> InflaterOutputStreamWrite.write        2   65536  avgt    3  2.617 ±
>> 0.170  ms/op
>> InflaterOutputStreamWrite.write        4     512  avgt    3  3.376 ±
>> 0.599  ms/op
>> InflaterOutputStreamWrite.write        4    1024  avgt    3  2.840 ±
>> 0.155  ms/op
>> InflaterOutputStreamWrite.write        4    2048  avgt    3  2.633 ±
>> 0.550  ms/op
>> InflaterOutputStreamWrite.write        4    4096  avgt    3  2.598 ±
>> 0.166  ms/op
>> InflaterOutputStreamWrite.write        4    8192  avgt    3  2.602 ±
>> 0.054  ms/op
>> InflaterOutputStreamWrite.write        4   16384  avgt    3  2.601 ±
>> 0.039  ms/op
>> InflaterOutputStreamWrite.write        4   32768  avgt    3  2.639 ±
>> 0.020  ms/op
>> InflaterOutputStreamWrite.write        4   65536  avgt    3  2.619 ±
>> 0.260  ms/op
>> InflaterOutputStreamWrite.write        8     512  avgt    3  2.882 ±
>> 0.149  ms/op
>> InflaterOutputStreamWrite.write        8    1024  avgt    3  2.695 ±
>> 0.586  ms/op
>> InflaterOutputStreamWrite.write        8    2048  avgt    3  2.644 ±
>> 0.472  ms/op
>> InflaterOutputStreamWrite.write        8    4096  avgt    3  2.616 ±
>> 0.052  ms/op
>> InflaterOutputStreamWrite.write        8    8192  avgt    3  2.616 ±
>> 0.063  ms/op
>> InflaterOutputStreamWrite.write        8   16384  avgt    3  2.611 ±
>> 0.090  ms/op
>> InflaterOutputStreamWrite.write        8   32768  avgt    3  2.633 ±
>> 0.216  ms/op
>> InflaterOutputStreamWrite.write        8   65536  avgt    3  2.634 ±
>> 0.180  ms/op
>>
>> /priv/simonisv/output/jdk-opt-JDK-8242848/images/jdk/bin/java
>> ----------------------------------------------------------------------
>> Benchmark                        (scale)  (size)  Mode  Cnt  Score
>> Error  Units
>> InflaterOutputStreamWrite.write        1     512  avgt    3  5.388 ±
>> 0.349  ms/op
>> InflaterOutputStreamWrite.write        1    1024  avgt    3  3.799 ±
>> 0.093  ms/op
>> InflaterOutputStreamWrite.write        1    2048  avgt    3  2.994 ±
>> 0.023  ms/op
>> InflaterOutputStreamWrite.write        1    4096  avgt    3  2.583 ±
>> 0.159  ms/op
>> InflaterOutputStreamWrite.write        1    8192  avgt    3  2.325 ±
>> 0.345  ms/op
>> InflaterOutputStreamWrite.write        1   16384  avgt    3  2.187 ±
>> 0.126  ms/op
>> InflaterOutputStreamWrite.write        1   32768  avgt    3  2.073 ±
>> 0.083  ms/op
>> InflaterOutputStreamWrite.write        1   65536  avgt    3  2.007 ±
>> 0.153  ms/op
>> InflaterOutputStreamWrite.write        2     512  avgt    3  3.996 ±
>> 0.037  ms/op
>> InflaterOutputStreamWrite.write        2    1024  avgt    3  3.089 ±
>> 0.023  ms/op
>> InflaterOutputStreamWrite.write        2    2048  avgt    3  2.628 ±
>> 0.073  ms/op
>> InflaterOutputStreamWrite.write        2    4096  avgt    3  2.356 ±
>> 0.344  ms/op
>> InflaterOutputStreamWrite.write        2    8192  avgt    3  2.202 ±
>> 0.055  ms/op
>> InflaterOutputStreamWrite.write        2   16384  avgt    3  2.081 ±
>> 0.033  ms/op
>> InflaterOutputStreamWrite.write        2   32768  avgt    3  2.015 ±
>> 0.169  ms/op
>> InflaterOutputStreamWrite.write        2   65536  avgt    3  1.985 ±
>> 0.196  ms/op
>> InflaterOutputStreamWrite.write        4     512  avgt    3  3.325 ±
>> 0.920  ms/op
>> InflaterOutputStreamWrite.write        4    1024  avgt    3  2.740 ±
>> 0.156  ms/op
>> InflaterOutputStreamWrite.write        4    2048  avgt    3  2.415 ±
>> 0.370  ms/op
>> InflaterOutputStreamWrite.write        4    4096  avgt    3  2.250 ±
>> 0.012  ms/op
>> InflaterOutputStreamWrite.write        4    8192  avgt    3  2.115 ±
>> 0.085  ms/op
>> InflaterOutputStreamWrite.write        4   16384  avgt    3  2.042 ±
>> 0.099  ms/op
>> InflaterOutputStreamWrite.write        4   32768  avgt    3  1.988 ±
>> 0.185  ms/op
>> InflaterOutputStreamWrite.write        4   65536  avgt    3  1.975 ±
>> 0.171  ms/op
>> InflaterOutputStreamWrite.write        8     512  avgt    3  2.870 ±
>> 0.035  ms/op
>> InflaterOutputStreamWrite.write        8    1024  avgt    3  2.495 ±
>> 0.334  ms/op
>> InflaterOutputStreamWrite.write        8    2048  avgt    3  2.280 ±
>> 0.056  ms/op
>> InflaterOutputStreamWrite.write        8    4096  avgt    3  2.155 ±
>> 0.073  ms/op
>> InflaterOutputStreamWrite.write        8    8192  avgt    3  2.046 ±
>> 0.079  ms/op
>> InflaterOutputStreamWrite.write        8   16384  avgt    3  1.995 ±
>> 0.098  ms/op
>> InflaterOutputStreamWrite.write        8   32768  avgt    3  1.979 ±
>> 0.119  ms/op
>> InflaterOutputStreamWrite.write        8   65536  avgt    3  1.986 ±
>> 0.155  ms/op
>>
>> >
>> > Thanks,
>> > Laurent
>> >
>> > Le jeu. 23 avr. 2020 à 10:20, Langer, Christoph <christoph.langer at sap.com> a écrit :
>> >>
>> >> Hi Volker,
>> >>
>> >> since it's not yet pushed, I also went over the change and I like it. +1
>> >>
>> >> One little style nit caught my eye, which you could correct before pushing: The style of the if/else blocks in test/jdk/java/util/zip/DeflateIn_InflateOut.java in lines 48/49 and 59/60 does not match the other if/else blocks in the file. You should probably have the else on the line of the closing bracket before...
>> >>
>> >> Thanks
>> >> Christoph
>> >>
>> >>
>> >> > -----Original Message-----
>> >> > From: core-libs-dev <core-libs-dev-bounces at openjdk.java.net> On Behalf
>> >> > Of Volker Simonis
>> >> > Sent: Mittwoch, 22. April 2020 22:09
>> >> > To: Lance Andersen <lance.andersen at oracle.com>
>> >> > Cc: Java Core Libs <core-libs-dev at openjdk.java.net>; Vyom Tewari
>> >> > <vyom.tewari at oracle.com>
>> >> > Subject: Re: RFR(S): 8242848: Improve performance of
>> >> > InflaterOutputStream.write()
>> >> >
>> >> > On Tue, Apr 21, 2020 at 5:23 PM Lance Andersen
>> >> > <lance.andersen at oracle.com> wrote:
>> >> > >
>> >> > > Hi Volker,
>> >> > >
>> >> > > I think overall this looks OK.  I went through the older SCCS histories to see
>> >> > if I could figure out why they were using 512 for the input length but could
>> >> > not find anything that might shed some light for me.
>> >> > >
>> >> >
>> >> > Hi Lance,
>> >> >
>> >> > thanks a lot for digging in the old sources to review my change. It's
>> >> > great that we stil have people who can use SCCS :)
>> >> >
>> >> > > I am not sure you can guarantee that src.zip exists but others might be able
>> >> > to weigh in here.  What we have been trying to do going forward is to have
>> >> > the tests create  the zip files that it needs.  In some cases, we have created a
>> >> > byte array within the test which represents the zip and just write it out
>> >> > before the test begins.
>> >> > >
>> >> >
>> >> > Yes, the dependency on an external file was not nice, so I rewrote the
>> >> > benchmark to programmatically create a file which can be compressed by
>> >> > a factor of ~6:
>> >> >
>> >> > http://cr.openjdk.java.net/~simonis/webrevs/2020/8242848.02/
>> >> >
>> >> > Notice that this new version only changes the microbenchmark, all the
>> >> > other files are untouched.
>> >> >
>> >> > As everybody seemed to be happy with the change itself and the
>> >> > regression test, I'm now waiting for your and Clae's final review of
>> >> > the microbenchmark before pushing. Please let me know hat you think?
>> >> >
>> >> > Best regards,
>> >> > Volker
>> >> >
>> >> > > I am hoping others with more history might also chime in case they are
>> >> > aware of the history here.
>> >> > >
>> >> > > Thank you for helping improve the performance.
>> >> > >
>> >> > > Best
>> >> > > Lance
>> >> > >
>> >> > > On Apr 17, 2020, at 6:49 AM, Volker Simonis <volker.simonis at gmail.com>
>> >> > wrote:
>> >> > >
>> >> > > Thanks everybody for looking at this change!
>> >> > >
>> >> > > Please find an updated webrev (with a new test and micro-benchmark)
>> >> > > and my answers to your comments below:
>> >> > >
>> >> > > http://cr.openjdk.java.net/~simonis/webrevs/2020/8242848.01/
>> >> > >
>> >> > > On Thu, Apr 16, 2020 at 6:40 AM Vyom Tiwari <vyommani at gmail.com>
>> >> > wrote:
>> >> > >
>> >> > > Thanks for doing this, i think the below code change is not required .
>> >> > >
>> >> > > Please do let me know if i am not reading it correctly.
>> >> > >
>> >> > > if (inf.finished() || (len == 0)/* no more input */) {
>> >> > >
>> >> > > If you check the native code "inf.finished() will be true only of the
>> >> > corresponding native call inflate(strm, Z_PARTIAL_FLUSH) returns
>> >> > "Z_STREAM_END".
>> >> > >
>> >> > > After your code change  write may return even if finished() is not true.
>> >> > >
>> >> > >
>> >> > > Yes, that's true, but that's what we must do if there's no more input
>> >> > > available. Before my change this break on "len < 1" was in the "if
>> >> > > (inf.needsInput())" branch.
>> >> > >
>> >> > > On Thu, Apr 16, 2020 at 8:22 AM Thomas Stüfe
>> >> > <thomas.stuefe at gmail.com> wrote:
>> >> > >
>> >> > > 252                 // Check the decompressor
>> >> > > 253                 if (inf.finished() || (len == 0)/* no more input */) {
>> >> > > 254                     break;
>> >> > > 255                 }
>> >> > >
>> >> > > Not sure but I think this is wrong because now you bypass the followup
>> >> > handling of inf.needsDirectory.
>> >> > >
>> >> > > Inflater.inflate() returns 0 if either needsInput or needsDirectory. We have
>> >> > to ignore needsInput since we have no input anymore, but needsDirectory
>> >> > has to be handled, no?
>> >> > >
>> >> > >
>> >> > > You're absolutely right Thomas. Thanks for catching this! I've moved
>> >> > > the check for "needsDictionary" in front of the "finished() || len ==
>> >> > > 0" check.
>> >> > >
>> >> > > Unfortunately there is not very good test coverage for zip with preset
>> >> > > dictionaries (jtreg and submit repo passed without problems). So I
>> >> > > added a new test for this use case to "
>> >> > > test/jdk/java/util/zip/DeflateIn_InflateOut.java".
>> >> > >
>> >> > > On Thu, Apr 16, 2020 at 8:48 AM Thomas Stüfe
>> >> > <thomas.stuefe at gmail.com> wrote:
>> >> > >
>> >> > >
>> >> > > As for increasing the buffer size, it makes sense but IMHO needs a CSR and
>> >> > a release note.
>> >> > >
>> >> > >
>> >> > > I don't think so. This is an internal, implementation specific setting
>> >> > > which has never been exposed or documented before so I don't see why
>> >> > > we should document it now. But let's discuss this separately in the
>> >> > > corresponding JBS issue (see below).
>> >> > >
>> >> > > On Thu, Apr 16, 2020 at 1:18 PM Claes Redestad
>> >> > > <claes.redestad at oracle.com> wrote:
>> >> > >
>> >> > >
>> >> > > Hi Volker,
>> >> > >
>> >> > > On 2020-04-15 19:48, Volker Simonis wrote:
>> >> > >
>> >> > > While doing this change, I've also realized that all the streams in
>> >> > > java.util.zip (i.e. DeflaterInputStream, GZIPInputStream,
>> >> > > GZIPOutputStream, InflaterInputStream, DeflaterOutputStream) use an
>> >> > > internal byte buffer of 512 bytes by default. Looking at the benchmark
>> >> > > attached to JDK-8242864, I think that increasing this default to a
>> >> > > bigger size (e.g. 4096 bytes) will considerably speed up (up to 50%)
>> >> > > read and write operations on these streams when they are created with
>> >> > > the default buffer size. I think the value "512" is a legacy of old
>> >> > > times when memory was more precious:)  so  I've opened "JDK-8242864:
>> >> > > Increase the default, internal buffer size of the Streams in
>> >> > > java.util.zip" to track that as as separate issue. Do you think it
>> >> > > makes sense to increase that default value?
>> >> > >
>> >> > >
>> >> > > Seems reasonable. 8192 seems to be the buffer size we've been
>> >> > converging
>> >> > > on elsewhere (see InputStream, BufferedInputStream, Files, ..). I also
>> >> > >
>> >> > >
>> >> > > That seems reasonable. Alan commented on the JBS issue so we can
>> >> > > continue the discussion there.
>> >> > >
>> >> > > found an instance of 8096, which is either a typo or a clever
>> >> > > optimization to keep the array + array object header fit snugly within
>> >> > > 8Kb. You chose. :-)
>> >> > >
>> >> > >
>> >> > > I like how you try to be positive :)
>> >> > >
>> >> > >
>> >> > > Thank you and best regards,
>> >> > > Volker
>> >> > >
>> >> > > PS: do you think it makes sense to contribute the benchmark attached
>> >> > > to JDK-8242864 to the code-tools/mh-jdk-microbenchmarks [1] project?
>> >> > >
>> >> > > [1]http://openjdk.java.net/projects/code-tools/jmh-jdk-
>> >> > microbenchmarks/
>> >> > >
>> >> > >
>> >> > > I'd definitely welcome the micro as part of the patch under
>> >> > > test/micro/org/openjdk/bench/java/util/zip - additionally contributing
>> >> > >
>> >> > >
>> >> > > I knew that "jmh-jdk-microbenchmarks" has been copied to the jdk repo
>> >> > > but somehow I did found "jmh-jdk-microbenchmarks" before looking in
>> >> > > the obvious place :)
>> >> > >
>> >> > > So I've added the benchmark to the patch now. There's one thing I'm
>> >> > > not sure however. The benchmark requires a "big" (several 100k) with
>> >> > > good compression ratio (e.g. a large text file). I've decided to use a
>> >> > > big Java source file from "src.zip" but I'm not sure if "src.zip" is
>> >> > > always available in the jdk images which are used to run the
>> >> > > microbenchmarks. Do you think the test it is fine this way or do you
>> >> > > have a better idea? I saw that "ZipFind" uses "microbenchmarks.jar"
>> >> > > (i.e. the container of the test itself) but that file is already
>> >> > > compressed so the compression rate won't be that good.
>> >> > >
>> >> > > Another thing I couldn't figure out is a good way to skip the
>> >> > > benchmark when I realize that I can't load the expected file in the
>> >> > > "@Setup" method. Do you now anything better than just throwing an
>> >> > > exception?
>> >> > >
>> >> > > Thank you and best regards,
>> >> > > Volker
>> >> > >
>> >> > > to jmh-jdk-microbenchmarks could enable you to test the micro on 8 or
>> >> > > 11.
>> >> > >
>> >> > > /Claes
>> >> > >
>> >> > >
>> >> > >
>> >> > > Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>> >> > > Oracle Java Engineering
>> >> > > 1 Network Drive
>> >> > > Burlington, MA 01803
>> >> > > Lance.Andersen at oracle.com
>> >> > >
>> >> > >
>> >> > >


More information about the core-libs-dev mailing list