Hi Claes, will you open a bug for this? Thanks Christoph
-----Original Message----- From: core-libs-dev <core-libs-dev-bounces@openjdk.java.net> On Behalf Of Lennart Börjeson Sent: Dienstag, 16. April 2019 09:05 To: Claes Redestad <claes.redestad@oracle.com> Cc: core-libs-dev@openjdk.java.net Subject: Re: ZipFileSystem performance regression
Very good, thank you!
Also note that the "new" implementation also requires *a lot* more heap, since all *uncompressed* file content is copied to the heap before deflating.
Best regards,
/Lennart Börjeson
15 apr. 2019 kl. 18:34 skrev Claes Redestad <claes.redestad@oracle.com>:
Hi Lennart,
I can reproduce this locally, and have narrowed down to https://bugs.openjdk.java.net/browse/JDK-8034802 as the cause.
As you say the compression is deferred to ZipFileSystem.close() now, whereas previously it happened eagerly. We will have to analyze the changes more in-depth to try and see why this is the case.
Thanks!
/Claes
On 2019-04-15 15:32, Lennart Börjeson wrote:
I have made a small command-line utility which creates zip archives by compressing the input files in parallel. I do this by calling Files.copy(input, zipOutputStream) in a parallel Stream over all input files. I have run this with Java 1.8, 9, 10, and 11, on both my local laptop and on server-class machines with up to 40 cores. It scales rather well and I get the expected speedup, i.e. roughly proportional to the number of cores. With OpenJDK 12, however, I get no speedup whatsoever. My understanding is that in JDK 12, the copy method simply transfers the contents of the input stream to a ByteArrayOutputStream, which is then deflated when I close the ZipFileSystem (by the ZipFileSystem.sync() method). Previously, the deflation was done when in the call to Files.copy, thus executed in parallel, and the final ZipFileSystem.close() didn't do anything much. (But I may of course be wrong. In case there's a simpler explanation and something I can remedy in my code, please let me know.) I have a small GitHub gist for the utility here: https://gist.github.com/lenborje/6d2f92430abe4ba881e3c5ff83736923 <https://gist.github.com/lenborje/6d2f92430abe4ba881e3c5ff83736923> Steps to reproduce (timings on my 8-core MacBook Pro): 1) Get the contents of the gist as a single file, Zip.java 2) Compile it using Java 8. $ export JAVA_HOME=<PATH-TO-JDK8-HOME> $ javac -encoding utf8 Zip.java 3) run on a sufficiently large number of files to exercise the parallelity: (I've used about 70 text files ca 60MB each) $ time java -Xmx6g Zip -p /tmp/test.zip <WHATEVER>/*.log. Working on ZIP FileSystem jar:file:/tmp/test.zip, using the options [PARALLEL] ... completed zip archive, now closing... done! real 0m35.558s user 3m58.134s sys 0m5.543s As is evident from the ratio between "user time" and "real time", all cores have been busy most of the time. (Running with JDK 9, 10, and 11 produces similar timings.) But running with JDK 12 defeats the parallelism: $ export JAVA_HOME=<PATH-TO-JDK12-HOME> $ rm /tmp/test.zip # From previous run $ time java -Xmx6g Zip -p /tmp/test.zip <WHATEVER>/*.log. Working on ZIP FileSystem jar:file:/tmp/test.zip, using the options [PARALLEL] ... completed zip archive, now closing... done! real 3m1.187s user 3m5.422s sys 0m12.396s Now there's almost no speedup. When observing the output, note that the ZipFileSystem.close() method is called immediately after the "now closing..." output, and "Done!" Is written when it returns, and when running with JDK 12 almost all running time is apparently spent there. I'm hoping the previous behaviour could somehow be restored, i.e. that deflation actually happens when I'm copying the input files to the ZipFileSystem, and not when I close it. Best regards, /Lennart Börjeson
12 apr. 2019 kl. 14:25 skrev Lennart Börjeson <lenborje@gmail.com>:
I've found what I believe is a rather severe performance regression in ZipFileSystem. 1.8 and 11 runs OK, 12 does not.
Is this the right forum to report such issues?
Best regards,
/Lennart Börjeson