RFR(S): 8242848: Improve performance of InflaterOutputStream.write()

Wed Apr 22 20:08:49 UTC 2020

On Tue, Apr 21, 2020 at 5:23 PM Lance Andersen
<lance.andersen at oracle.com> wrote:
>
> Hi Volker,
>
> I think overall this looks OK.  I went through the older SCCS histories to see if I could figure out why they were using 512 for the input length but could not find anything that might shed some light for me.
>

Hi Lance,

thanks a lot for digging in the old sources to review my change. It's
great that we stil have people who can use SCCS :)

> I am not sure you can guarantee that src.zip exists but others might be able to weigh in here.  What we have been trying to do going forward is to have the tests create  the zip files that it needs.  In some cases, we have created a byte array within the test which represents the zip and just write it out before the test begins.
>

Yes, the dependency on an external file was not nice, so I rewrote the
benchmark to programmatically create a file which can be compressed by
a factor of ~6:

http://cr.openjdk.java.net/~simonis/webrevs/2020/8242848.02/

Notice that this new version only changes the microbenchmark, all the
other files are untouched.

As everybody seemed to be happy with the change itself and the
regression test, I'm now waiting for your and Clae's final review of
the microbenchmark before pushing. Please let me know hat you think?

Best regards,
Volker

> I am hoping others with more history might also chime in case they are aware of the history here.
>
> Thank you for helping improve the performance.
>
> Best
> Lance
>
> On Apr 17, 2020, at 6:49 AM, Volker Simonis <volker.simonis at gmail.com> wrote:
>
> Thanks everybody for looking at this change!
>
> Please find an updated webrev (with a new test and micro-benchmark)
> and my answers to your comments below:
>
> http://cr.openjdk.java.net/~simonis/webrevs/2020/8242848.01/
>
> On Thu, Apr 16, 2020 at 6:40 AM Vyom Tiwari <vyommani at gmail.com> wrote:
>
> Thanks for doing this, i think the below code change is not required .
>
> Please do let me know if i am not reading it correctly.
>
> if (inf.finished() || (len == 0)/* no more input */) {
>
> If you check the native code "inf.finished() will be true only of the corresponding native call inflate(strm, Z_PARTIAL_FLUSH) returns "Z_STREAM_END".
>
> After your code change  write may return even if finished() is not true.
>
>
> Yes, that's true, but that's what we must do if there's no more input
> available. Before my change this break on "len < 1" was in the "if
> (inf.needsInput())" branch.
>
> On Thu, Apr 16, 2020 at 8:22 AM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>
> 252                 // Check the decompressor
> 253                 if (inf.finished() || (len == 0)/* no more input */) {
> 254                     break;
> 255                 }
>
> Not sure but I think this is wrong because now you bypass the followup handling of inf.needsDirectory.
>
> Inflater.inflate() returns 0 if either needsInput or needsDirectory. We have to ignore needsInput since we have no input anymore, but needsDirectory has to be handled, no?
>
>
> You're absolutely right Thomas. Thanks for catching this! I've moved
> the check for "needsDictionary" in front of the "finished() || len ==
> 0" check.
>
> Unfortunately there is not very good test coverage for zip with preset
> dictionaries (jtreg and submit repo passed without problems). So I
> added a new test for this use case to "
> test/jdk/java/util/zip/DeflateIn_InflateOut.java".
>
> On Thu, Apr 16, 2020 at 8:48 AM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>
>
> As for increasing the buffer size, it makes sense but IMHO needs a CSR and a release note.
>
>
> I don't think so. This is an internal, implementation specific setting
> which has never been exposed or documented before so I don't see why
> we should document it now. But let's discuss this separately in the
> corresponding JBS issue (see below).
>
> On Thu, Apr 16, 2020 at 1:18 PM Claes Redestad
> <claes.redestad at oracle.com> wrote:
>
>
> Hi Volker,
>
> On 2020-04-15 19:48, Volker Simonis wrote:
>
> While doing this change, I've also realized that all the streams in
> java.util.zip (i.e. DeflaterInputStream, GZIPInputStream,
> GZIPOutputStream, InflaterInputStream, DeflaterOutputStream) use an
> internal byte buffer of 512 bytes by default. Looking at the benchmark
> attached to JDK-8242864, I think that increasing this default to a
> bigger size (e.g. 4096 bytes) will considerably speed up (up to 50%)
> read and write operations on these streams when they are created with
> the default buffer size. I think the value "512" is a legacy of old
> times when memory was more precious:)  so  I've opened "JDK-8242864:
> Increase the default, internal buffer size of the Streams in
> java.util.zip" to track that as as separate issue. Do you think it
> makes sense to increase that default value?
>
>
> Seems reasonable. 8192 seems to be the buffer size we've been converging
> on elsewhere (see InputStream, BufferedInputStream, Files, ..). I also
>
>
> That seems reasonable. Alan commented on the JBS issue so we can
> continue the discussion there.
>
> found an instance of 8096, which is either a typo or a clever
> optimization to keep the array + array object header fit snugly within
> 8Kb. You chose. :-)
>
>
> I like how you try to be positive :)
>
>
> Thank you and best regards,
> Volker
>
> PS: do you think it makes sense to contribute the benchmark attached
> to JDK-8242864 to the code-tools/mh-jdk-microbenchmarks [1] project?
>
> [1]http://openjdk.java.net/projects/code-tools/jmh-jdk-microbenchmarks/
>
>
> I'd definitely welcome the micro as part of the patch under
> test/micro/org/openjdk/bench/java/util/zip - additionally contributing
>
>
> I knew that "jmh-jdk-microbenchmarks" has been copied to the jdk repo
> but somehow I did found "jmh-jdk-microbenchmarks" before looking in
> the obvious place :)
>
> So I've added the benchmark to the patch now. There's one thing I'm
> not sure however. The benchmark requires a "big" (several 100k) with
> good compression ratio (e.g. a large text file). I've decided to use a
> big Java source file from "src.zip" but I'm not sure if "src.zip" is
> always available in the jdk images which are used to run the
> microbenchmarks. Do you think the test it is fine this way or do you
> have a better idea? I saw that "ZipFind" uses "microbenchmarks.jar"
> (i.e. the container of the test itself) but that file is already
> compressed so the compression rate won't be that good.
>
> Another thing I couldn't figure out is a good way to skip the
> benchmark when I realize that I can't load the expected file in the
> "@Setup" method. Do you now anything better than just throwing an
> exception?
>
> Thank you and best regards,
> Volker
>
> to jmh-jdk-microbenchmarks could enable you to test the micro on 8 or
> 11.
>
> /Claes
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com
>
>
>