performance updates to jar and zip

Xueming Shen xueming.shen at oracle.com
Wed Oct 26 23:19:57 UTC 2011


Hi Mike

It appears that the patch you provided throws an unexpected exception (attached at the
end of my email) when I tried it out on the latest JDK8 repository. Since I only did a
quick scan of your patch, I'm not sure what went wrong here.

As you "warned" us in your email, this patch includes lots of things that you are
obviously still trying out. I can see that it tries at least to

(1) support different compression levels, 0-9
(2) write Zip files in parallel
(3) offer various multi-threading strategies (-Z)
(4) use Files.walkFileTree instead of File.list
(5) add the -D option :-), which I would really not recommend
(6) make small optimizations in various places.

All of this makes the code a little hard to read and the resulting data hard to
compare. I would suggest dividing this proposal into separate pieces and working on
them one by one. For example, maybe we can try to solve the main puzzle, (2) + (3),
first, and then look at the other optimization opportunities.

To collect some data, I followed your lead and wrote a simple multi-threaded (MT)
implementation in the Jar Main class, as shown at

http://cr.openjdk.java.net/~sherman/mtjar2/webrev2/

which I guess is similar to what you are doing. It uses a "simple" strategy:

(1) A dedicated thread (from the ExecutorService thread pool) iterates over the file
    system tree to "collect" the target files, submits a "compression job" for each of
    these files to the ExecutorService, and keeps the returned Future (from the
    submission) in a queue, "elist".
(2) Threads from the ExecutorService use temporary buffer memory to read and compress
    the file in memory.
(3) The main thread polls the queue "elist", waits for each "compression job" to
    complete, and writes the result into the target ZipOutputStream.
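The three-step strategy above can be sketched with the standard java.util.concurrent and java.util.zip APIs. This is a minimal sketch under stated assumptions, not the webrev's code: the names PipelinedJar, jar and prepare are hypothetical, and because the public ZipOutputStream offers no way to accept already-deflated bytes, the pool threads here only read each file into memory and compute its CRC-32, and the entries are written STORED (as with cf0). The pipeline shape is the same: one pool thread walks the tree and submits a job per file, the Futures go into the "elist" queue in submission order, and the main thread takes them in that order and is the only writer to the ZipOutputStream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.Stream;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class PipelinedJar {
    // Result of one job: entry name, file contents and CRC-32, prepared off the main thread.
    static final class Prepared {
        final String name; final byte[] data; final long crc;
        Prepared(String name, byte[] data, long crc) { this.name = name; this.data = data; this.crc = crc; }
    }

    // Worker job (hypothetical): read the whole file into a temporary buffer and checksum it.
    static Prepared prepare(Path root, Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        String name = root.relativize(file).toString().replace('\\', '/');
        return new Prepared(name, data, crc.getValue());
    }

    // Writes every regular file under root to out; returns the number of entries written.
    public static int jar(Path root, OutputStream out, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        BlockingQueue<Future<Prepared>> elist = new LinkedBlockingQueue<>();
        // (1) One pool thread walks the tree and submits one job per file, in a stable order.
        Future<?> walker = pool.submit(() -> {
            try (Stream<Path> files = Files.walk(root)) {
                files.filter(Files::isRegularFile).sorted()
                     .forEach(p -> elist.add(pool.submit(() -> prepare(root, p))));
            } finally {
                elist.add(CompletableFuture.completedFuture(null)); // end-of-walk marker
            }
            return null;
        });
        int count = 0;
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            // (3) The main thread takes Futures in submission order and is the only writer.
            while (true) {
                Prepared p = elist.take().get(); // blocks until that job has completed
                if (p == null) break;            // end-of-walk marker
                ZipEntry e = new ZipEntry(p.name);
                e.setMethod(ZipEntry.STORED);    // STORED entries need size and CRC up front
                e.setSize(p.data.length);
                e.setCompressedSize(p.data.length);
                e.setCrc(p.crc);
                zos.putNextEntry(e);
                zos.write(p.data);
                zos.closeEntry();
                count++;
            }
            walker.get(); // surface any exception thrown by the directory walk
        } finally {
            pool.shutdownNow();
        }
        return count;
    }
}
```

For example, `PipelinedJar.jar(Paths.get("rtjar"), Files.newOutputStream(Paths.get("rt0.jar")), 3)` would use one pool thread for scanning and two for reading. Unlike the real jar tool, the sketch writes no directory entries and no manifest. Keeping a single writer avoids any locking around the output stream, which is also why the writer can become the bottleneck.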

The resulting data looks promising: I'm seeing the jar-ing speed double when jar-ing
rt.jar and a jdk7 binary tree on a "slow" but 4-core Linux VM (I have similar results
on a 2-core Linux machine as well).

java Jar cf   jdk.jar jdk1.7.0    Jar TotalTime:17278
java Jar cfT1 jdk.jar jdk1.7.0    Jar TotalTime:12345
java Jar cfT2 jdk.jar jdk1.7.0    Jar TotalTime:7559
java Jar cfT3 jdk.jar jdk1.7.0    Jar TotalTime:7572
java Jar cfT4 jdk.jar jdk1.7.0    Jar TotalTime:7801
java Jar cfT5 jdk.jar jdk1.7.0    Jar TotalTime:8112
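To put "doubled" into numbers, the speedup of each run is the plain cf TotalTime divided by that run's TotalTime. A throwaway calculation over the figures above (the class name Speedup is hypothetical) gives about 1.40x for cfT1, 2.29x for cfT2 and 2.28x for cfT3, falling to 2.13x for cfT5.

```java
public class Speedup {
    // Speedup of a run relative to the baseline: baseline time / run time.
    static double speedup(long baselineMs, long runMs) {
        return (double) baselineMs / runMs;
    }

    public static void main(String[] args) {
        // Jar TotalTime figures from the runs above; index 0 is the plain "cf" run.
        String[] opts = {"cf", "cfT1", "cfT2", "cfT3", "cfT4", "cfT5"};
        long[] totalTime = {17278, 12345, 7559, 7572, 7801, 8112};
        for (int i = 0; i < opts.length; i++) {
            System.out.printf("%-5s %6d  %.2fx%n", opts[i], totalTime[i], speedup(totalTime[0], totalTime[i]));
        }
    }
}
```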

The new "T" option means "n threads": the digit that follows it specifies the fixed
number of threads in the executor service's thread pool. It appears that we can
achieve the "best" result with only 3 threads in this configuration (cfT2: two pool
threads plus the main thread): one thread for scanning the file system, one thread
for compression, and the main thread for writing out. My guess is that having to
"write out" to a single file (the resulting jar) limits the potential benefit of
having more "compressing" threads.

I also tried to measure the "file scanning" speed with my mini-benchmark FIter:

  http://cr.openjdk.java.net/~sherman/mtjar2/FIter.java

Here are the "surprising" results.

  "nio" is the walkFileTree,
  "io" is the File.list()
  "io2" is the File.listFiles().

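nio's Files.walkFileTree is much faster than the "traditional" recursion + File.list():
about 3 times faster on Linux and about 8 times faster on Windows, and roughly 14 times
faster than File.listFiles() on Windows. Wow!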

Linux--------------------------------------------------------------------------
sherman at sherman-linux:~/Workspace/test$ java FIter ../jdk8_mtJar/src
java.io.File iteration
------------------
   nio.totalSize:137149279
         fileNum:12222
        checkSum:16122691809689000
            Time:85
------------------
   io.totalSize:137149279
         fileNum:12222
        checkSum:16122691809689000
           Time:269
------------------
  io2.totalSize:137149279
         fileNum:12222
        checkSum:16122691809689000
           Time:450

Windows7---------------------------------------------------------------------------------

$ /cygdrive/c/Program\ Files\ \(x86\)/Java/jdk1.7.0/bin/java FIter ../sqa/jdk8/src
java.io.File iteration
------------------
   nio.totalSize:136695871
         fileNum:12199
        checkSum:15997350823839479
            Time:323
------------------
   io.totalSize:136695871
         fileNum:12199
        checkSum:15997350823839479
           Time:2633
------------------
  io2.totalSize:136695871
         fileNum:12199
        checkSum:15997350823839479
           Time:4592
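One plausible factor behind the gap (an assumption, not something measured here) is that Files.walkFileTree delivers each file's BasicFileAttributes to the visitor, while the java.io version issues separate isDirectory()/isFile()/length() calls per entry. A minimal sketch of the two traversal styles, computing only total size and file count; the checkSum column is omitted, and WalkCompare is a hypothetical name, not the FIter code at the URL above:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class WalkCompare {
    // Totals gathered by one traversal strategy.
    static final class Totals { long size; int files; }

    // "nio": attributes arrive with each visit, so the visitor makes no extra per-file calls.
    static Totals nio(Path root) throws IOException {
        Totals t = new Totals();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) { t.size += attrs.size(); t.files++; }
                return FileVisitResult.CONTINUE;
            }
        });
        return t;
    }

    // "io": recursion over File.list(); each child needs isDirectory()/isFile()/length() calls.
    static Totals io(File dir, Totals t) {
        String[] names = dir.list();
        if (names == null) return t; // not a directory, or an I/O error
        for (String n : names) {
            File f = new File(dir, n);
            if (f.isDirectory()) io(f, t);
            else if (f.isFile()) { t.size += f.length(); t.files++; }
        }
        return t;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long t0 = System.nanoTime();
        Totals a = nio(root);
        long t1 = System.nanoTime();
        Totals b = io(root.toFile(), new Totals());
        long t2 = System.nanoTime();
        System.out.printf("nio: size=%d files=%d time=%dms%n", a.size, a.files, (t1 - t0) / 1_000_000);
        System.out.printf("io : size=%d files=%d time=%dms%n", b.size, b.files, (t2 - t1) / 1_000_000);
    }
}
```

In a single run like this the first traversal also warms the OS file cache for the second, so repeating the runs or swapping their order gives a fairer comparison. The two versions also treat symbolic links differently (walkFileTree does not follow them by default, File.isDirectory() does), so their totals only match on trees without links.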


----------------------------------------------------------------------

sherman at sherman-linux:~/Workspace/test$ ../jdk8_mtJar/build/linux-i586/bin/jar cf6DZ3 rt0.jar rtjar
duplicate path
java.util.zip.ZipException: duplicate entry: ../
     at java.util.zip.AbstractZipWriter.writeHeader(AbstractZipWriter.java:647)
     at java.util.zip.AbstractZipWriter.startWritingStored(AbstractZipWriter.java:384)
     at java.util.zip.AbstractZipWriter.writeWithResource(AbstractZipWriter.java:350)
     at java.util.zip.AbstractZipWriter.writeAll(AbstractZipWriter.java:273)
     at sun.tools.jar.Main$ZipOutputLoader2File.call(Main.java:410)
     at sun.tools.jar.Main$ZipOutputLoader2File.call(Main.java:350)
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
     at java.lang.Thread.run(Thread.java:722)
java.util.concurrent.ExecutionException: java.util.zip.ZipException: duplicate entry: ../
     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
     at java.util.concurrent.FutureTask.get(FutureTask.java:111)
     at sun.tools.jar.Main.waitFor(Main.java:810)
     at sun.tools.jar.Main.run(Main.java:679)
     at sun.tools.jar.Main.main(Main.java:1842)
Caused by: java.util.zip.ZipException: duplicate entry: ../

-Sherman

On 10/20/2011 3:55 PM, Mike Skells wrote:
> Hi All,
> I have some performance updates for the jar tool and for the Zip/Jar writing components, including some code to allow parallel writing of Jar and ZIP files (in java.util) 
>
> This work is not finished yet, but I would like to hear whether anyone has views on the direction it should take
> Currently it is a testbed for comparing different techniques, but largely based on the Jar utility
>
> The changes allow the work to be spread across multiple CPUs and optimise some of the code and I/O paths
>
> These comparative figures do not include the effect of the nio changes that I proposed in earlier emails
>
> Command line changes
> 0-9 - I have added support for specifying different compression levels (the existing jar command just allows default compression, or '0' for no compression)
> D - This allows all files to be written with the current date, rather than the file date (the conversion of the date to zip format is a CPU hog, and not needed in some use cases)
> Z0-5 - These select different parallel execution models; I would not expect this to be a production option
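For the 0-9 option, the public ZipOutputStream API already covers both cases: setMethod(STORED) for '0', and DEFLATED with setLevel(1..9) for the rest. A minimal sketch of how a level digit might be applied; LevelOption, configure and zipOne are hypothetical names, not the patch's code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class LevelOption {
    // Map a level digit '0'..'9' onto the ZipOutputStream; returns the numeric level.
    static int configure(ZipOutputStream zos, char digit) {
        int level = Character.digit(digit, 10);
        if (level < 0) throw new IllegalArgumentException("not a level digit: " + digit);
        if (level == 0) {
            zos.setMethod(ZipOutputStream.STORED);   // '0' keeps jar's meaning: no compression
        } else {
            zos.setMethod(ZipOutputStream.DEFLATED);
            zos.setLevel(level);                      // 1 = fastest ... 9 = best compression
        }
        return level;
    }

    // Build an in-memory zip holding a single entry, using the given level digit.
    static byte[] zipOne(String name, byte[] data, char digit) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            int level = configure(zos, digit);
            ZipEntry e = new ZipEntry(name);
            if (level == 0) {
                // STORED entries must have size, compressed size and CRC-32 set before writing
                CRC32 crc = new CRC32();
                crc.update(data, 0, data.length);
                e.setSize(data.length);
                e.setCompressedSize(data.length);
                e.setCrc(crc.getValue());
            }
            zos.putNextEntry(e);
            zos.write(data);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }
}
```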
>
> The test environment is a 4-core Intel Core 2 PC running Windows Vista 64-bit; the test case is jarring up the content of rt.jar into a jar file.
> Each test is repeated 6 times and the last 5 are averaged to produce the answers. Each test is run in a fresh VM.
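The measurement procedure described above (6 runs, each in a fresh VM, first run discarded, last 5 averaged) can be sketched as below. This is an assumption about the harness, not Mike's code: FreshVmTiming and its methods are hypothetical, and timing the whole child process includes VM startup, which may differ from how the "duration of the task" figures were actually taken.

```java
import java.io.IOException;
import java.util.List;

public class FreshVmTiming {
    // Run cmd in a fresh process `runs` times; returns each run's wall-clock time in ms.
    static long[] timeRuns(List<String> cmd, int runs) throws IOException, InterruptedException {
        long[] ms = new long[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            int rc = p.waitFor();
            ms[i] = (System.nanoTime() - t0) / 1_000_000;
            if (rc != 0) throw new IOException("run " + i + " exited with " + rc);
        }
        return ms;
    }

    // Discard the first run and average the rest, e.g. 6 runs -> mean of the last 5.
    static double meanSkippingFirst(long[] ms) {
        if (ms.length < 2) throw new IllegalArgumentException("need at least 2 runs");
        long sum = 0;
        for (int i = 1; i < ms.length; i++) sum += ms[i];
        return (double) sum / (ms.length - 1);
    }
}
```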
>
> The performance figures are below as a CSV. The last column is the duration of the task in ms.
>
> In summary, the existing jar utility takes 8.4 s (uncompressed) and 9.4 s (compressed) to complete, and this can be reduced to 1.6 s and 2.3 s respectively
>
> The different parallel algorithms are:
> 0 - none; all in one thread, as before
> 1 - file scanning on one core, 10 threads loading and buffering files, zip writing in a single thread using the existing ZipOutputStream
> 2 - file scanning on one core, 10 threads loading and buffering files, zip writing mostly multithreaded (e.g. parallel compression, single write to the output stream)
> 3 - as 2, but writes to a file rather than a stream
> 4 - as 2, but uses channels to write with direct buffers
> 5 - as 4, but using heap buffers
>
> Modes 3-5 include code to seek back and update incomplete headers, but this is not well tested
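Seeking back to update headers relies on a seekable destination: the local file header is written with placeholder CRC-32 and size fields, the data is streamed, and then the writer seeks back to fill those fields in. A minimal sketch of that technique for a single STORED entry, based on the ZIP format's fixed local-header layout (CRC-32 at offset 14, compressed size at 18, uncompressed size at 22). PatchHeader and writeStoredSingleEntry are hypothetical names; there is no ZIP64 support, and this is not the patch's code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class PatchHeader {
    // ZIP fields are little-endian; RandomAccessFile.writeShort/writeInt are big-endian.
    static void le16(RandomAccessFile f, int v) throws IOException { f.write(v & 0xff); f.write((v >>> 8) & 0xff); }
    static void le32(RandomAccessFile f, long v) throws IOException { le16(f, (int) (v & 0xffff)); le16(f, (int) ((v >>> 16) & 0xffff)); }

    static final int DOS_DATE = (1 << 5) | 1; // 1980-01-01 (month 1, day 1, year offset 0)

    static void writeStoredSingleEntry(String path, String name, byte[][] chunks) throws IOException {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile f = new RandomAccessFile(path, "rw")) {
            f.setLength(0);
            // Local file header with placeholder CRC-32 and sizes.
            le32(f, 0x04034b50L);
            le16(f, 10); le16(f, 0); le16(f, 0);  // version needed, flags, method 0 = STORED
            le16(f, 0); le16(f, DOS_DATE);        // DOS time, DOS date
            le32(f, 0); le32(f, 0); le32(f, 0);   // crc, compressed size, size: patched below
            le16(f, nameBytes.length); le16(f, 0); // name length, extra length
            f.write(nameBytes);
            // Stream the data without knowing its size or CRC-32 in advance.
            CRC32 crc = new CRC32();
            long size = 0;
            for (byte[] c : chunks) { f.write(c); crc.update(c, 0, c.length); size += c.length; }
            long cdStart = f.getFilePointer();
            // Seek back and complete the header.
            f.seek(14);
            le32(f, crc.getValue()); le32(f, size); le32(f, size);
            f.seek(cdStart);
            // Central directory header for the single entry.
            le32(f, 0x02014b50L);
            le16(f, 20); le16(f, 10); le16(f, 0); le16(f, 0); // made by, needed, flags, method
            le16(f, 0); le16(f, DOS_DATE);
            le32(f, crc.getValue()); le32(f, size); le32(f, size);
            le16(f, nameBytes.length); le16(f, 0); le16(f, 0); // name, extra, comment lengths
            le16(f, 0); le16(f, 0); le32(f, 0);   // disk start, internal attrs, external attrs
            le32(f, 0);                           // offset of the local header
            f.write(nameBytes);
            long cdSize = f.getFilePointer() - cdStart;
            // End of central directory record.
            le32(f, 0x06054b50L);
            le16(f, 0); le16(f, 0); le16(f, 1); le16(f, 1); // disks, entries on disk, total entries
            le32(f, cdSize); le32(f, cdStart);
            le16(f, 0);                           // comment length
        }
    }
}
```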
>
> C:\Program Files\Java\jdk1.6.0_24\bin\java.exe, C:\Program Files\Java\jdk1.6.0_24\lib\tools.jar, -cf0, java 1.6 rt -cf0, 8482
> C:\Program Files\Java\jdk1.6.0_24\bin\java.exe, C:\Program Files\Java\jdk1.6.0_24\lib\tools.jar, -cf, java 1.6 rt -cf, 9318
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Program Files\Java\jdk1.7.0\lib\tools.jar, -cf0, java 1.7 rt -cf0, 8497
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Program Files\Java\jdk1.7.0\lib\tools.jar, -cf, java 1.7 rt -cf, 9518
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Test\Archive\baseline.jar, -cf0, orig 1.7 rt -cf0, 8448
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Test\Archive\baseline.jar, -cf, orig 1.7 rt -cf, 9484
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0, project 1.7 rt -cf0, 3133
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0D, project 1.7 rt -cf0D, 2824
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0Z0, project 1.7 rt -cf0 parallel 0, 3026
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ0, project 1.7 rt -cf0D parallel 0, 2961
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ1, project 1.7 rt -cf0D parallel 1, 2022
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ2, project 1.7 rt -cf0D parallel 2, 1757
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ3, project 1.7 rt -cf0D parallel 3, 1632
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ4, project 1.7 rt -cf0D parallel 4, 1994
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ5, project 1.7 rt -cf0D parallel 5, 1978
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1, project 1.7 rt -cf1, 5237
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1D, project 1.7 rt -cf1D, 5073
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1Z0, project 1.7 rt -cf1 parallel 0, 5367
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ0, project 1.7 rt -cf1D parallel 0, 5002
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ1, project 1.7 rt -cf1D parallel 1, 5125
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ2, project 1.7 rt -cf1D parallel 2, 2257
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ3, project 1.7 rt -cf1D parallel 3, 2145
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ4, project 1.7 rt -cf1D parallel 4, 2505
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ5, project 1.7 rt -cf1D parallel 5, 2549
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf2, project 1.7 rt -cf2, 5371
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf3, project 1.7 rt -cf3, 5409
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf4, project 1.7 rt -cf4, 5778
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf5, project 1.7 rt -cf5, 5906
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6, project 1.7 rt -cf6, 6082
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf7, project 1.7 rt -cf7, 6070
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf8, project 1.7 rt -cf8, 6251
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9, project 1.7 rt -cf9, 6191
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6D, project 1.7 rt -cf6D, 5843
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6Z0, project 1.7 rt -cf6 parallel 0, 6095
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ0, project 1.7 rt -cf6D parallel 0, 5907
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ1, project 1.7 rt -cf6D parallel 1, 5957
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ2, project 1.7 rt -cf6D parallel 2, 2388
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ3, project 1.7 rt -cf6D parallel 3, 2351
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ4, project 1.7 rt -cf6D parallel 4, 2694
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ5, project 1.7 rt -cf6D parallel 5, 2830
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9D, project 1.7 rt -cf9D, 6134
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9Z0, project 1.7 rt -cf9 parallel 0, 6258
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ0, project 1.7 rt -cf9D parallel 0, 6066
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ1, project 1.7 rt -cf9D parallel 1, 6203
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ2, project 1.7 rt -cf9D parallel 2, 2490
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ3, project 1.7 rt -cf9D parallel 3, 2361
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ4, project 1.7 rt -cf9D parallel 4, 2788
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ5, project 1.7 rt -cf9D parallel 5, 2847
>
> regards
> Mike



More information about the core-libs-dev mailing list