performance updates to jar and zip
Xueming Shen
xueming.shen at oracle.com
Wed Oct 26 23:19:57 UTC 2011
Hi Mike,
It appears the patch you provided throws an unexpected exception (attached at the end of my email) when I tried it out on the latest JDK8 repository. Since I only did a quick scan of your patch, I'm not sure what went wrong here.
This patch includes lots of things that you are obviously still trying/testing, as you "warned" us in your email. I can see that it tries at least to
(1) support different compression levels 0-9
(2) write Zip files in parallel
(3) offer various multi-thread strategies (-Z)
(4) use Files.walkFileTree instead of File.list
(5) add the -D option :-), which I really would not recommend
(6) make small optimizations in various places,
which makes the code a little hard to read and the resulting data hard to compare.
I would suggest dividing this proposal into separate pieces and working on them one by one. For example, maybe we can try to solve the main puzzle, (2) + (3), first, and then look at the other optimization opportunities.
To collect some data, I followed your lead and wrote a simple MT (multi-threaded) support implementation in the Jar Main class, as shown at
http://cr.openjdk.java.net/~sherman/mtjar2/webrev2/
which I guess is similar to what you are doing. It uses a "simple" strategy:
(1) A dedicated thread (from the ExecutorService thread pool) iterates over the file system tree to "collect" the target files, submits a "compression job" for each of these files to the ExecutorService, and keeps the returned "Future" (from the submission) in a queue, "elist".
(2) Threads from the ExecutorService use temporary buffer memory to read and compress each file in memory.
(3) The main thread polls the queue "elist", waits for each "compression job" to complete, and writes the result into the target ZipOutputStream.
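For readers who want to see the shape of this strategy, here is a minimal Java sketch under stated assumptions; it is not the code in the webrev. MtJarSketch, Loaded and the END sentinel are hypothetical names. One deliberate difference: the public ZipOutputStream API has no way to accept data that was already deflated elsewhere, so in this sketch the workers only read each file into memory and the deflating still happens on the writer thread. Everything is buffered without a bound, which a real tool would need to limit.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of the scanner / workers / single-writer pipeline (not the webrev code).
public class MtJarSketch {

    // Result of one job: entry name plus the file's bytes, buffered in memory.
    static final class Loaded {
        final String name;
        final byte[] data;
        Loaded(String name, byte[] data) { this.name = name; this.data = data; }
    }

    // Sentinel the scanner enqueues after the last real job.
    private static final Future<Loaded> END = CompletableFuture.completedFuture(null);

    // "/"-separated zip entry name of file relative to root.
    static String entryName(Path root, Path file) {
        StringBuilder sb = new StringBuilder();
        for (Path part : root.relativize(file)) {
            if (sb.length() > 0) sb.append('/');
            sb.append(part);
        }
        return sb.toString();
    }

    // Zips every regular file under root into out; nThreads pool threads scan and load.
    static void zipTree(Path root, OutputStream out, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        BlockingQueue<Future<Loaded>> elist = new LinkedBlockingQueue<>();
        try {
            // (1) Scanner on a pool thread: one job per file, Futures queued in scan order.
            Future<?> scan = pool.submit(() -> {
                try {
                    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                        @Override
                        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                            if (attrs.isRegularFile()) {
                                // (2) Worker job: read the whole file into memory.
                                elist.add(pool.submit(() ->
                                        new Loaded(entryName(root, file), Files.readAllBytes(file))));
                            }
                            return FileVisitResult.CONTINUE;
                        }
                    });
                } finally {
                    elist.add(END); // always release the writer, even if the walk fails
                }
                return null;
            });
            // (3) Writer (calling thread): take jobs in order, wait for each, write it out.
            try (ZipOutputStream zos = new ZipOutputStream(out)) {
                for (Future<Loaded> f = elist.take(); f != END; f = elist.take()) {
                    Loaded e = f.get();
                    zos.putNextEntry(new ZipEntry(e.name));
                    zos.write(e.data); // deflation happens here, on the writer thread
                    zos.closeEntry();
                }
            }
            scan.get(); // surface any failure from the scanner
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MtJarSketch <dir> <out.zip> [poolThreads]
        int n = args.length > 2 ? Integer.parseInt(args[2]) : 2;
        try (OutputStream out = Files.newOutputStream(Paths.get(args[1]))) {
            zipTree(Paths.get(args[0]), out, n);
        }
    }
}
```

Taking the Futures in submission order keeps the entries in scan order no matter which worker finishes first; the cost is that one slow file stalls the writer until it is done.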
The resulting data looks promising: I'm seeing the jar-ing speed roughly double when jar-ing rt.jar and a JDK 7 binary tree on a "slow" but 4-core Linux VM (I see similar results on a 2-core Linux machine as well):
java Jar cf jdk.jar jdk1.7.0 Jar TotalTime:17278
java Jar cfT1 jdk.jar jdk1.7.0 Jar TotalTime:12345
java Jar cfT2 jdk.jar jdk1.7.0 Jar TotalTime:7559
java Jar cfT3 jdk.jar jdk1.7.0 Jar TotalTime:7572
java Jar cfT4 jdk.jar jdk1.7.0 Jar TotalTime:7801
java Jar cfT5 jdk.jar jdk1.7.0 Jar TotalTime:8112
The new "T" option stands for "n threads"; the digit that follows specifies the fixed number of threads in the executor service's thread pool. It appears that we achieve the "best" result with only 3 threads in this configuration (T2 plus the main thread): one thread for scanning the file system, one for compression, and the main thread for writing out. My guess is that having to "write out" to a single file (the resulting jar) limits the potential benefit of more "compressing" threads.
I also tried to measure the "file scanning" speed with my mini-benchmark FIter
http://cr.openjdk.java.net/~sherman/mtjar2/FIter.java
Here are the "surprising" results, where
"nio" is Files.walkFileTree,
"io" is recursion with File.list(), and
"io2" is recursion with File.listFiles().
NIO's Files.walkFileTree is much faster than the "traditional" recursion + File.list(): about 3 times faster on Linux and 8 times faster on Windows 7 (and about 14 times faster than File.listFiles() on Windows). Wow!
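The FIter source is only linked, not shown, so the following is a hypothetical Java re-creation of the three strategies; ScanSketch, Stats, nio, io and io2 are made-up names. A plausible, unverified explanation for the gap is that walkFileTree hands each file's attributes to the visitor as part of the walk (on Windows the directory listing itself carries them), while the File-based versions issue separate isDirectory()/isFile()/length() queries per entry. Timings from a single pass are also sensitive to the OS file cache, and symbolic links are treated differently (walkFileTree does not follow them by default), so counts can differ on trees that contain links.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical re-creation of the three scanning strategies; not the actual FIter.java.
public class ScanSketch {

    // File count and total size of the regular files found.
    static final class Stats {
        long files;
        long totalSize;
    }

    // "nio": one walk; size comes from the attributes the walk already fetched.
    static Stats nio(Path root) {
        final Stats s = new Stats();
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (attrs.isRegularFile()) {
                        s.files++;
                        s.totalSize += attrs.size();
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s;
    }

    // "io": recursion over File.list(); each child costs extra isDirectory/isFile/length queries.
    static void io(File dir, Stats s) {
        String[] names = dir.list();
        if (names == null) return; // not a directory, or an I/O error
        for (String name : names) {
            File f = new File(dir, name);
            if (f.isDirectory()) {
                io(f, s);
            } else if (f.isFile()) {
                s.files++;
                s.totalSize += f.length();
            }
        }
    }

    // "io2": same recursion, but File.listFiles() builds the File objects.
    static void io2(File dir, Stats s) {
        File[] children = dir.listFiles();
        if (children == null) return;
        for (File f : children) {
            if (f.isDirectory()) {
                io2(f, s);
            } else if (f.isFile()) {
                s.files++;
                s.totalSize += f.length();
            }
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long t0 = System.nanoTime();
        Stats a = nio(root);
        long t1 = System.nanoTime();
        Stats b = new Stats();
        io(root.toFile(), b);
        long t2 = System.nanoTime();
        Stats c = new Stats();
        io2(root.toFile(), c);
        long t3 = System.nanoTime();
        System.out.printf("nio: files=%d size=%d ms=%d%n", a.files, a.totalSize, (t1 - t0) / 1_000_000);
        System.out.printf("io:  files=%d size=%d ms=%d%n", b.files, b.totalSize, (t2 - t1) / 1_000_000);
        System.out.printf("io2: files=%d size=%d ms=%d%n", c.files, c.totalSize, (t3 - t2) / 1_000_000);
    }
}
```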
Linux--------------------------------------------------------------------------
sherman at sherman-linux:~/Workspace/test$ java FIter ../jdk8_mtJar/src
java.io.File iteration
------------------
nio.totalSize:137149279
fileNum:12222
checkSum:16122691809689000
Time:85
------------------
io.totalSize:137149279
fileNum:12222
checkSum:16122691809689000
Time:269
------------------
io2.totalSize:137149279
fileNum:12222
checkSum:16122691809689000
Time:450
Windows7---------------------------------------------------------------------------------
$ /cygdrive/c/Program\ Files\ \(x86\)/Java/jdk1.7.0/bin/java FIter ../sqa/jdk8/src
java.io.File iteration
------------------
nio.totalSize:136695871
fileNum:12199
checkSum:15997350823839479
Time:323
------------------
io.totalSize:136695871
fileNum:12199
checkSum:15997350823839479
Time:2633
------------------
io2.totalSize:136695871
fileNum:12199
checkSum:15997350823839479
Time:4592
----------------------------------------------------------------------
sherman at sherman-linux:~/Workspace/test$ ../jdk8_mtJar/build/linux-i586/bin/jar cf6DZ3 rt0.jar rtjar
duplicate path
java.util.zip.ZipException: duplicate entry: ../
    at java.util.zip.AbstractZipWriter.writeHeader(AbstractZipWriter.java:647)
    at java.util.zip.AbstractZipWriter.startWritingStored(AbstractZipWriter.java:384)
    at java.util.zip.AbstractZipWriter.writeWithResource(AbstractZipWriter.java:350)
    at java.util.zip.AbstractZipWriter.writeAll(AbstractZipWriter.java:273)
    at sun.tools.jar.Main$ZipOutputLoader2File.call(Main.java:410)
    at sun.tools.jar.Main$ZipOutputLoader2File.call(Main.java:350)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
java.util.concurrent.ExecutionException: java.util.zip.ZipException: duplicate entry: ../
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
    at java.util.concurrent.FutureTask.get(FutureTask.java:111)
    at sun.tools.jar.Main.waitFor(Main.java:810)
    at sun.tools.jar.Main.run(Main.java:679)
    at sun.tools.jar.Main.main(Main.java:1842)
Caused by: java.util.zip.ZipException: duplicate entry: ../
-Sherman
On 10/20/2011 3:55 PM, Mike Skells wrote:
> Hi All,
> I have some performance updates for the jar tool and for the Zip/Jar writing components, including some code to allow parallel writing of Jar and ZIP files (in java.util).
>
> This work is not finished yet, but I would like to know whether anyone has views on the shape it should take.
> Currently it is a testbed for comparing different techniques, largely based on the Jar utility.
>
> The changes allow the work to be spread across multiple CPUs, and they optimise some of the code and I/O paths.
>
> These comparative figures do not include the effect of the NIO changes that I proposed in earlier emails.
>
> Command line changes
> 0-9 - I have added support for specifying different compression levels (the existing jar command only allows default compression, or '0' for no compression)
> D - this writes all files with the current date rather than each file's own date (converting the date to zip format is a CPU hog, and not needed in some use cases)
> Z0-5 - these select different parallel execution models; I would not expect this to be a production qualifier
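A rough sketch of what the first two options mean in terms of the public java.util.zip API follows; it is an illustration under assumptions, not code from the patch. JarOptionsSketch, zipOne and dosTime are hypothetical names. Here level 0 maps to STORED (as with the existing "jar cf0") and 1-9 map to ZipOutputStream.setLevel; whether the patch treats 0 the same way is not stated. dosTime shows the kind of packing a zip header timestamp needs, which is the per-entry conversion the D option is meant to avoid.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.LocalDateTime;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of the "0-9" and "D" ideas using only public java.util.zip API.
public class JarOptionsSketch {

    // MS-DOS date/time packing stored in zip headers: years since 1980, 2-second resolution.
    static long dosTime(LocalDateTime t) {
        return ((long) (t.getYear() - 1980) << 25)
                | ((long) t.getMonthValue() << 21)
                | ((long) t.getDayOfMonth() << 16)
                | ((long) t.getHour() << 11)
                | ((long) t.getMinute() << 5)
                | (t.getSecond() >> 1);
    }

    // Zips one entry. Level 0 stores it uncompressed, like "jar cf0"; 1-9 become Deflater levels.
    // Passing the same timeMillis for every entry is roughly what a "D" option amounts to.
    static byte[] zipOne(String name, byte[] data, int level, long timeMillis) {
        if (level < 0 || level > 9) throw new IllegalArgumentException("level " + level);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            ZipEntry e = new ZipEntry(name);
            if (level == 0) {
                // STORED entries must carry size and CRC-32 before the data is written.
                CRC32 crc = new CRC32();
                crc.update(data, 0, data.length);
                e.setMethod(ZipEntry.STORED);
                e.setSize(data.length);
                e.setCompressedSize(data.length);
                e.setCrc(crc.getValue());
            } else {
                zos.setLevel(level); // applies to subsequent DEFLATED entries
            }
            e.setTime(timeMillis); // written to the header as a DOS date/time
            zos.putNextEntry(e);
            zos.write(data);
            zos.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bos.toByteArray(); // complete archive: zos is closed at this point
    }
}
```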
>
> The test environment is a 4-core Intel Core 2 PC running Windows Vista 64-bit; the test case is jar-ing up the content of rt.jar into a jar file.
> Each test is repeated 6 times and the last 5 runs are averaged to produce the results. Each test is run in a fresh VM.
>
> The performance figures are below as a CSV. The last column is the duration of the task in ms.
>
> In summary, the existing jar utility takes 8.4 s (uncompressed) and 9.4 s (compressed) to complete, and this can be reduced to 1.6 s and 2.3 s respectively.
>
> The different parallel algorithms are
> 0 - none: all in one thread, as before
> 1 - file scanning on one core, 10 threads loading and buffering files, zip writing in a single thread using the existing ZipOutputStream
> 2 - file scanning on one core, 10 threads loading and buffering files, zip writing mostly multithreaded (e.g. parallel compression, single write to the output stream)
> 3 - as 2, but writes to a file rather than a stream
> 4 - as 2, but uses channels to write with direct buffers
> 5 - as 4, but uses heap buffers
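The key difference between algorithms 1 and 2 is where the deflating happens. The sketch below (hypothetical names; not the patch's AbstractZipWriter) shows the per-entry work that can move off the writer thread: a CRC-32 plus a raw deflate stream (Deflater with nowrap = true, the form stored inside zip entries) buffered in memory. A writer consuming these results would then have to emit the local and central headers itself, since ZipOutputStream always deflates what it is given; that part is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

// Hypothetical sketch of "parallel compression, single write"; not the patch's code.
public class ParallelDeflateSketch {

    // Everything a zip writer needs for one DEFLATED entry.
    static final class Compressed {
        final long crc;
        final long size;
        final byte[] raw; // raw deflate data (no zlib header), as stored in a zip entry
        Compressed(long crc, long size, byte[] raw) { this.crc = crc; this.size = size; this.raw = raw; }
    }

    // CPU-heavy part, safe to run on any thread: CRC-32 plus raw deflate into memory.
    static Compressed compress(byte[] data, int level) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        Deflater def = new Deflater(level, true); // nowrap = true: no zlib header/trailer
        try {
            def.setInput(data);
            def.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!def.finished()) {
                int n = def.deflate(buf);
                out.write(buf, 0, n);
            }
            return new Compressed(crc.getValue(), data.length, out.toByteArray());
        } finally {
            def.end(); // release native zlib memory promptly
        }
    }

    // Compresses all inputs on nThreads threads; results come back in input order.
    static List<Compressed> compressAll(List<byte[]> inputs, int level, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Compressed>> futures = new ArrayList<>();
            for (byte[] in : inputs) {
                futures.add(pool.submit(() -> compress(in, level)));
            }
            List<Compressed> results = new ArrayList<>();
            for (Future<Compressed> f : futures) {
                results.add(f.get()); // a single writer would write each result here, in order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```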
>
> Modes 3-5 include code in the zip writer to seek back and update incomplete headers, but this is not well tested.
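The header-patching idea in modes 3-5 can be illustrated with a hypothetical sketch (PatchHeaderSketch and writeEntry are made-up names): write the local header with zero CRC and sizes, stream the data while computing them, then patch those fields in place with a positional FileChannel.write. On a plain, non-seekable stream ZipOutputStream cannot do this, and instead appends a data descriptor after a DEFLATED entry whose sizes were not known up front. The sketch writes only one local header plus stored data, with no central directory and no ZIP64 handling, so its output is not a complete zip file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of "write first, patch the header afterwards" on a seekable channel.
// Only one zip-style local header + STORED data is written; there is no central directory.
public class PatchHeaderSketch {

    static final int LOC_SIG = 0x04034b50; // local file header signature
    static final int LOC_CRC = 14;         // CRC-32 offset; compressed size at 18, size at 22

    // Writes one entry at the channel's current position and returns the header's offset.
    static long writeEntry(FileChannel ch, String name, InputStream in) throws IOException {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        long headerPos = ch.position();
        ByteBuffer h = ByteBuffer.allocate(30 + nameBytes.length).order(ByteOrder.LITTLE_ENDIAN);
        h.putInt(LOC_SIG)
         .putShort((short) 10)           // version needed to extract (1.0)
         .putShort((short) 0)            // general purpose flags
         .putShort((short) 0)            // method 0 = STORED
         .putShort((short) 0)            // DOS time (left zero in this sketch)
         .putShort((short) 0)            // DOS date (left zero in this sketch)
         .putInt(0).putInt(0).putInt(0)  // CRC-32, compressed size, size: not known yet
         .putShort((short) nameBytes.length)
         .putShort((short) 0)            // extra field length
         .put(nameBytes);
        h.flip();
        while (h.hasRemaining()) ch.write(h);

        // Stream the data; CRC and length are only known once the input is exhausted.
        CRC32 crc = new CRC32();
        long size = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) > 0) {
            crc.update(buf, 0, n);
            size += n;
            ByteBuffer chunk = ByteBuffer.wrap(buf, 0, n);
            while (chunk.hasRemaining()) ch.write(chunk);
        }

        // Seek back and fill in the fields; a positional write leaves ch.position() unchanged.
        ByteBuffer patch = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        patch.putInt((int) crc.getValue()).putInt((int) size).putInt((int) size);
        patch.flip();
        long at = headerPos + LOC_CRC;
        while (patch.hasRemaining()) at += ch.write(patch, at);
        return headerPos;
    }
}
```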
>
> java executable, jar tool classpath, jar options, description, duration (ms)
> C:\Program Files\Java\jdk1.6.0_24\bin\java.exe, C:\Program Files\Java\jdk1.6.0_24\lib\tools.jar, -cf0, java 1.6 rt -cf0, 8482
> C:\Program Files\Java\jdk1.6.0_24\bin\java.exe, C:\Program Files\Java\jdk1.6.0_24\lib\tools.jar, -cf, java 1.6 rt -cf, 9318
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Program Files\Java\jdk1.7.0\lib\tools.jar, -cf0, java 1.7 rt -cf0, 8497
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Program Files\Java\jdk1.7.0\lib\tools.jar, -cf, java 1.7 rt -cf, 9518
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Test\Archive\baseline.jar, -cf0, orig 1.7 rt -cf0, 8448
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Test\Archive\baseline.jar, -cf, orig 1.7 rt -cf, 9484
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0, project 1.7 rt -cf0, 3133
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0D, project 1.7 rt -cf0D, 2824
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0Z0, project 1.7 rt -cf0 parallel 0, 3026
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ0, project 1.7 rt -cf0D parallel 0, 2961
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ1, project 1.7 rt -cf0D parallel 1, 2022
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ2, project 1.7 rt -cf0D parallel 2, 1757
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ3, project 1.7 rt -cf0D parallel 3, 1632
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ4, project 1.7 rt -cf0D parallel 4, 1994
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ5, project 1.7 rt -cf0D parallel 5, 1978
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1, project 1.7 rt -cf1, 5237
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1D, project 1.7 rt -cf1D, 5073
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1Z0, project 1.7 rt -cf1 parallel 0, 5367
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ0, project 1.7 rt -cf1D parallel 0, 5002
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ1, project 1.7 rt -cf1D parallel 1, 5125
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ2, project 1.7 rt -cf1D parallel 2, 2257
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ3, project 1.7 rt -cf1D parallel 3, 2145
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ4, project 1.7 rt -cf1D parallel 4, 2505
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ5, project 1.7 rt -cf1D parallel 5, 2549
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf2, project 1.7 rt -cf2, 5371
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf3, project 1.7 rt -cf3, 5409
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf4, project 1.7 rt -cf4, 5778
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf5, project 1.7 rt -cf5, 5906
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6, project 1.7 rt -cf6, 6082
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf7, project 1.7 rt -cf7, 6070
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf8, project 1.7 rt -cf8, 6251
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9, project 1.7 rt -cf9, 6191
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6D, project 1.7 rt -cf6D, 5843
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6Z0, project 1.7 rt -cf6 parallel 0, 6095
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ0, project 1.7 rt -cf6D parallel 0, 5907
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ1, project 1.7 rt -cf6D parallel 1, 5957
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ2, project 1.7 rt -cf6D parallel 2, 2388
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ3, project 1.7 rt -cf6D parallel 3, 2351
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ4, project 1.7 rt -cf6D parallel 4, 2694
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ5, project 1.7 rt -cf6D parallel 5, 2830
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9D, project 1.7 rt -cf9D, 6134
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9Z0, project 1.7 rt -cf9 parallel 0, 6258
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ0, project 1.7 rt -cf9D parallel 0, 6066
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ1, project 1.7 rt -cf9D parallel 1, 6203
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ2, project 1.7 rt -cf9D parallel 2, 2490
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ3, project 1.7 rt -cf9D parallel 3, 2361
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ4, project 1.7 rt -cf9D parallel 4, 2788
> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ5, project 1.7 rt -cf9D parallel 5, 2847
>
> regards
> Mike
More information about the core-libs-dev mailing list