RFR(S): 8242848: Improve performance of InflaterOutputStream.write()

Thu Apr 23 12:27:21 UTC 2020

On Thu, Apr 23, 2020 at 1:22 PM Lance Andersen <lance.andersen at oracle.com>
wrote:

> Hi Volker
>
> On Apr 22, 2020, at 4:08 PM, Volker Simonis <volker.simonis at gmail.com>
> wrote:
>
> On Tue, Apr 21, 2020 at 5:23 PM Lance Andersen
> <lance.andersen at oracle.com> wrote:
>
>
> Hi Volker,
>
> I think overall this looks OK.  I went through the older SCCS histories to
> see if I could figure out why they were using 512 for the input length but
> could not find anything that might shed some light for me.
>
>
> Hi Lance,
>
> thanks a lot for digging in the old sources to review my change. It's
> great that we stil have people who can use SCCS :)
>
>
> :-)
>
>
> I am not sure you can guarantee that src.zip exists but others might be
> able to weigh in here.  What we have been trying to do going forward is to
> have the tests create  the zip files that it needs.  In some cases, we have
> created a byte array within the test which represents the zip and just
> write it out before the test begins.
>
>
> Yes, the dependency on an external file was not nice, so I rewrote the
> benchmark to programmatically create a file which can be compressed by
> a factor of ~6:
>
> http://cr.openjdk.java.net/~simonis/webrevs/2020/8242848.02/
>
> Notice that this new version only changes the microbenchmark, all the
> other files are untouched.
>
> As everybody seemed to be happy with the change itself and the
> regression test, I'm now waiting for your and Clae's final review of
> the microbenchmark before pushing. Please let me know hat you think?
>
>
> I think you are good to go from my POV.
>

Thanks a lot for your help Lance.
I've pushed the change now.

Best regards,
Volker

> Thank you for your contribution to improving Zip performance.
>
> Best,
> Lance
>
>
> Best regards,
> Volker
>
> I am hoping others with more history might also chime in case they are
> aware of the history here.
>
> Thank you for helping improve the performance.
>
> Best
> Lance
>
> On Apr 17, 2020, at 6:49 AM, Volker Simonis <volker.simonis at gmail.com>
> wrote:
>
> Thanks everybody for looking at this change!
>
> Please find an updated webrev (with a new test and micro-benchmark)
> and my answers to your comments below:
>
> http://cr.openjdk.java.net/~simonis/webrevs/2020/8242848.01/
>
> On Thu, Apr 16, 2020 at 6:40 AM Vyom Tiwari <vyommani at gmail.com> wrote:
>
> Thanks for doing this, i think the below code change is not required .
>
> Please do let me know if i am not reading it correctly.
>
> if (inf.finished() || (len == 0)/* no more input */) {
>
> If you check the native code "inf.finished() will be true only of the
> corresponding native call inflate(strm, Z_PARTIAL_FLUSH) returns
> "Z_STREAM_END".
>
> After your code change  write may return even if finished() is not true.
>
>
> Yes, that's true, but that's what we must do if there's no more input
> available. Before my change this break on "len < 1" was in the "if
> (inf.needsInput())" branch.
>
> On Thu, Apr 16, 2020 at 8:22 AM Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
>
> 252                 // Check the decompressor
> 253                 if (inf.finished() || (len == 0)/* no more input */) {
> 254                     break;
> 255                 }
>
> Not sure but I think this is wrong because now you bypass the followup
> handling of inf.needsDirectory.
>
> Inflater.inflate() returns 0 if either needsInput or needsDirectory. We
> have to ignore needsInput since we have no input anymore, but
> needsDirectory has to be handled, no?
>
>
> You're absolutely right Thomas. Thanks for catching this! I've moved
> the check for "needsDictionary" in front of the "finished() || len ==
> 0" check.
>
> Unfortunately there is not very good test coverage for zip with preset
> dictionaries (jtreg and submit repo passed without problems). So I
> added a new test for this use case to "
> test/jdk/java/util/zip/DeflateIn_InflateOut.java".
>
> On Thu, Apr 16, 2020 at 8:48 AM Thomas Stüfe <thomas.stuefe at gmail.com>
> wrote:
>
>
> As for increasing the buffer size, it makes sense but IMHO needs a CSR and
> a release note.
>
>
> I don't think so. This is an internal, implementation specific setting
> which has never been exposed or documented before so I don't see why
> we should document it now. But let's discuss this separately in the
> corresponding JBS issue (see below).
>
> On Thu, Apr 16, 2020 at 1:18 PM Claes Redestad
> <claes.redestad at oracle.com> wrote:
>
>
> Hi Volker,
>
> On 2020-04-15 19:48, Volker Simonis wrote:
>
> While doing this change, I've also realized that all the streams in
> java.util.zip (i.e. DeflaterInputStream, GZIPInputStream,
> GZIPOutputStream, InflaterInputStream, DeflaterOutputStream) use an
> internal byte buffer of 512 bytes by default. Looking at the benchmark
> attached to JDK-8242864, I think that increasing this default to a
> bigger size (e.g. 4096 bytes) will considerably speed up (up to 50%)
> read and write operations on these streams when they are created with
> the default buffer size. I think the value "512" is a legacy of old
> times when memory was more precious:)  so  I've opened "JDK-8242864:
> Increase the default, internal buffer size of the Streams in
> java.util.zip" to track that as as separate issue. Do you think it
> makes sense to increase that default value?
>
>
> Seems reasonable. 8192 seems to be the buffer size we've been converging
> on elsewhere (see InputStream, BufferedInputStream, Files, ..). I also
>
>
> That seems reasonable. Alan commented on the JBS issue so we can
> continue the discussion there.
>
> found an instance of 8096, which is either a typo or a clever
> optimization to keep the array + array object header fit snugly within
> 8Kb. You chose. :-)
>
>
> I like how you try to be positive :)
>
>
> Thank you and best regards,
> Volker
>
> PS: do you think it makes sense to contribute the benchmark attached
> to JDK-8242864 to the code-tools/mh-jdk-microbenchmarks [1] project?
>
> [1]http://openjdk.java.net/projects/code-tools/jmh-jdk-microbenchmarks/
>
>
> I'd definitely welcome the micro as part of the patch under
> test/micro/org/openjdk/bench/java/util/zip - additionally contributing
>
>
> I knew that "jmh-jdk-microbenchmarks" has been copied to the jdk repo
> but somehow I did found "jmh-jdk-microbenchmarks" before looking in
> the obvious place :)
>
> So I've added the benchmark to the patch now. There's one thing I'm
> not sure however. The benchmark requires a "big" (several 100k) with
> good compression ratio (e.g. a large text file). I've decided to use a
> big Java source file from "src.zip" but I'm not sure if "src.zip" is
> always available in the jdk images which are used to run the
> microbenchmarks. Do you think the test it is fine this way or do you
> have a better idea? I saw that "ZipFind" uses "microbenchmarks.jar"
> (i.e. the container of the test itself) but that file is already
> compressed so the compression rate won't be that good.
>
> Another thing I couldn't figure out is a good way to skip the
> benchmark when I realize that I can't load the expected file in the
> "@Setup" method. Do you now anything better than just throwing an
> exception?
>
> Thank you and best regards,
> Volker
>
> to jmh-jdk-microbenchmarks could enable you to test the micro on 8 or
> 11.
>
> /Claes
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com
>
>
>
>
> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen|
> Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com
>
>
>
>