RFR(S): 8242848: Improve performance of InflaterOutputStream.write()

Fri Apr 17 10:49:49 UTC 2020

Thanks everybody for looking at this change!

Please find an updated webrev (with a new test and micro-benchmark)
and my answers to your comments below:

http://cr.openjdk.java.net/~simonis/webrevs/2020/8242848.01/

On Thu, Apr 16, 2020 at 6:40 AM Vyom Tiwari <vyommani at gmail.com> wrote:
> Thanks for doing this, i think the below code change is not required .
>
> Please do let me know if i am not reading it correctly.
>
> if (inf.finished() || (len == 0)/* no more input */) {
>
> If you check the native code "inf.finished() will be true only of the corresponding native call inflate(strm, Z_PARTIAL_FLUSH) returns "Z_STREAM_END".
>
> After your code change  write may return even if finished() is not true.

Yes, that's true, but that's what we must do if there's no more input
available. Before my change this break on "len < 1" was in the "if
(inf.needsInput())" branch.

On Thu, Apr 16, 2020 at 8:22 AM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>  252                 // Check the decompressor
>  253                 if (inf.finished() || (len == 0)/* no more input */) {
>  254                     break;
>  255                 }
>
> Not sure but I think this is wrong because now you bypass the followup handling of inf.needsDirectory.
>
> Inflater.inflate() returns 0 if either needsInput or needsDirectory. We have to ignore needsInput since we have no input anymore, but needsDirectory has to be handled, no?

You're absolutely right Thomas. Thanks for catching this! I've moved
the check for "needsDictionary" in front of the "finished() || len ==
0" check.

Unfortunately there is not very good test coverage for zip with preset
dictionaries (jtreg and submit repo passed without problems). So I
added a new test for this use case to "
test/jdk/java/util/zip/DeflateIn_InflateOut.java".

On Thu, Apr 16, 2020 at 8:48 AM Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
>
> As for increasing the buffer size, it makes sense but IMHO needs a CSR and a release note.

I don't think so. This is an internal, implementation specific setting
which has never been exposed or documented before so I don't see why
we should document it now. But let's discuss this separately in the
corresponding JBS issue (see below).

On Thu, Apr 16, 2020 at 1:18 PM Claes Redestad
<claes.redestad at oracle.com> wrote:
>
> Hi Volker,
>
> On 2020-04-15 19:48, Volker Simonis wrote:
> > While doing this change, I've also realized that all the streams in
> > java.util.zip (i.e. DeflaterInputStream, GZIPInputStream,
> > GZIPOutputStream, InflaterInputStream, DeflaterOutputStream) use an
> > internal byte buffer of 512 bytes by default. Looking at the benchmark
> > attached to JDK-8242864, I think that increasing this default to a
> > bigger size (e.g. 4096 bytes) will considerably speed up (up to 50%)
> > read and write operations on these streams when they are created with
> > the default buffer size. I think the value "512" is a legacy of old
> > times when memory was more precious:)  so  I've opened "JDK-8242864:
> > Increase the default, internal buffer size of the Streams in
> > java.util.zip" to track that as as separate issue. Do you think it
> > makes sense to increase that default value?
>
> Seems reasonable. 8192 seems to be the buffer size we've been converging
> on elsewhere (see InputStream, BufferedInputStream, Files, ..). I also

That seems reasonable. Alan commented on the JBS issue so we can
continue the discussion there.

> found an instance of 8096, which is either a typo or a clever
> optimization to keep the array + array object header fit snugly within
> 8Kb. You chose. :-)
>

I like how you try to be positive :)

> >
> > Thank you and best regards,
> > Volker
> >
> > PS: do you think it makes sense to contribute the benchmark attached
> > to JDK-8242864 to the code-tools/mh-jdk-microbenchmarks [1] project?
> >
> > [1]http://openjdk.java.net/projects/code-tools/jmh-jdk-microbenchmarks/
>
> I'd definitely welcome the micro as part of the patch under
> test/micro/org/openjdk/bench/java/util/zip - additionally contributing

I knew that "jmh-jdk-microbenchmarks" has been copied to the jdk repo
but somehow I did found "jmh-jdk-microbenchmarks" before looking in
the obvious place :)

So I've added the benchmark to the patch now. There's one thing I'm
not sure however. The benchmark requires a "big" (several 100k) with
good compression ratio (e.g. a large text file). I've decided to use a
big Java source file from "src.zip" but I'm not sure if "src.zip" is
always available in the jdk images which are used to run the
microbenchmarks. Do you think the test it is fine this way or do you
have a better idea? I saw that "ZipFind" uses "microbenchmarks.jar"
(i.e. the container of the test itself) but that file is already
compressed so the compression rate won't be that good.

Another thing I couldn't figure out is a good way to skip the
benchmark when I realize that I can't load the expected file in the
"@Setup" method. Do you now anything better than just throwing an
exception?

Thank you and best regards,
Volker

> to jmh-jdk-microbenchmarks could enable you to test the micro on 8 or
> 11.
>
> /Claes
>