performance updates to jar and zip
Mike Skells
mike.skells at talk21.com
Thu Oct 27 23:07:50 UTC 2011
Hi Sherman,
I think that you will get significant benefit from generating the data structures in the background threads.
I think that if you profile the usage you will see that the generation of the header information is the dominant cost.
That is why I parallelised the writing process.
There are several bottlenecks, such as the encoding of the name and (although you dismiss it) the calculation of the DOS time format, which is a CPU hog (hence the -D qualifier). I think that it is about 10% of the overall CPU load.
This is, by the way, pretty much in line with the extraction feature below, added in Java 6, so I can't see that there is a great reason against it.
After all, why spend time storing information that (in most use cases) is not read, either because the jar utility does not by default maintain it, or because most jar files are
probably not expanded anyway?
    /**
     * If true, maintain compatibility with JDK releases prior to 6.0 by
     * timestamping extracted files with the time at which they are extracted.
     * Default is to use the time given in the archive.
     */
    private static final boolean useExtractionTime =
        Boolean.getBoolean("sun.tools.jar.useExtractionTime");
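To make the DOS time cost concrete, here is a minimal sketch of the kind of conversion involved: each entry's timestamp has to be broken into local calendar fields and packed into the 32-bit MS-DOS date/time layout used in ZIP headers. The class and method names (DosTimeSketch, toDosTime) are hypothetical, and this is not the JDK's internal implementation; it only illustrates the per-entry work that the -D option would avoid.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

/** Illustrative only: packs a timestamp into the MS-DOS date/time layout. */
final class DosTimeSketch {

    /** Packs already-split calendar fields (month is 1-12) into 32 bits. */
    static long toDosTime(int year, int month, int day,
                          int hour, int minute, int second) {
        if (year < 1980) {
            // The format cannot represent earlier dates; clamp to 1980-01-01 00:00:00.
            return (1L << 21) | (1L << 16);
        }
        return ((long) (year - 1980) << 25)
             | ((long) month << 21)
             | ((long) day << 16)
             | ((long) hour << 11)
             | ((long) minute << 5)
             | (second >> 1);          // DOS time has 2-second resolution
    }

    /** Splits epoch milliseconds into fields in the given zone, then packs them. */
    static long toDosTime(long epochMillis, TimeZone zone) {
        Calendar c = new GregorianCalendar(zone);
        c.setTimeInMillis(epochMillis);
        return toDosTime(c.get(Calendar.YEAR),
                         c.get(Calendar.MONTH) + 1,   // Calendar months are 0-based
                         c.get(Calendar.DAY_OF_MONTH),
                         c.get(Calendar.HOUR_OF_DAY),
                         c.get(Calendar.MINUTE),
                         c.get(Calendar.SECOND));
    }
}
```

The bit packing itself is cheap; the calendar field extraction is the part that has to run once per entry, which is presumably where the time goes.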
Here are the times that I get running the code that you wrote on my setup:
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf, cf, 10279
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT1, cfT1, 9652
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT2, cfT2, 6139
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT3, cfT3, 5683
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT4, cfT4, 6102
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT5, cfT5, 6172
I think that the reason that it tails off in performance is rather that you are overloading the system with the background threads. You have many threads (i.e. > cores) loading the files, and they are contending for the CPU,
and the writer thread is not getting its share of time. So with 3 threads plus the initial file scanning and the writer thread, there are more threads than can be serviced.
Consider introducing an ArrayBlockingQueue for both the scanning -> compression and the compression -> writing hand-offs,
and also getting rid of the CPU-bound (until the scanner gets going) polling like:
    while(true) {
        Object o = elist.poll();
        if (o == null)
            continue;
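A minimal sketch of the blocking alternative, assuming a single producer (the scanner) and a single consumer (the writer). The class name, the END sentinel and the use of plain strings as work items are all hypothetical stand-ins; the point is only that put() and take() park the thread instead of spinning on poll().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Illustrative only: bounded hand-off between a scanner and a writer. */
final class BlockingHandoffSketch {

    // Unique end-of-stream marker, compared by identity (hypothetical convention).
    private static final String END = new String("END");

    static List<String> run(List<String> names, int capacity)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<String>(capacity);

        Thread scanner = new Thread(() -> {
            try {
                for (String name : names) {
                    queue.put(name);      // blocks while the queue is full
                }
                queue.put(END);           // tell the consumer there is no more work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        scanner.start();

        List<String> written = new ArrayList<String>();
        while (true) {
            String name = queue.take();   // sleeps until an item arrives; no busy loop
            if (name == END) {
                break;
            }
            written.add(name);            // stand-in for writing the entry
        }
        scanner.join();
        return written;
    }
}
```

Because the queue is bounded, a slow consumer also throttles the producer, which is the same back-pressure that would be wanted between compression and writing.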
I don't think that you have the separation of the loading and storing sorted out. The code adds the future to elist, and the worker thread reads it whether or not it has completed,
so sometimes the loading is done on the background thread before the main thread reads it, and sometimes it blocks, even when other jobs have completed. So I think that a completion queue
works better for this. It will complicate the END processing though.
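As a rough illustration of the completion queue idea, the sketch below uses java.util.concurrent.ExecutorCompletionService, which hands results back in the order they finish rather than the order they were submitted. The class name and the string "compression" are hypothetical placeholders, not code from either patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative only: the writer consumes whichever job finishes first. */
final class CompletionQueueSketch {

    static List<String> compressAll(List<String> names, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletionService<String> done =
                    new ExecutorCompletionService<String>(pool);
            for (String name : names) {
                // Stand-in for "read the file and compress it into a buffer".
                done.submit(() -> "compressed:" + name);
            }
            List<String> written = new ArrayList<String>();
            for (int i = 0; i < names.size(); i++) {
                // take() returns the next *completed* future, so the writer
                // never waits on a slow job while finished ones sit idle.
                written.add(done.take().get());
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }
}
```

Here the writer knows the job count up front; when the scanner is still discovering files, the end-of-input signal has to be carried separately, which is the END complication mentioned above. Entries are also written in completion order rather than scan order.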
If I am reading the code correctly I think that there are potential memory issues.
An unlimited number of jobs is submitted to the executor; while it only executes T jobs at a time, the jobs may still queue up in elist, and each job can buffer 50MB of data. If the writing of the output is too slow, you could run out of memory.
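One way to cap the memory, sketched under the assumption that a scanner thread submits jobs and the main thread writes results: take a permit from a Semaphore before each submission and give it back only after the result has been written. The class name, the dummy 1KB buffers and the peak counter are hypothetical and exist only to make the bound observable.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: limits how many buffered results can exist at once. */
final class BoundedInFlightSketch {

    /** Runs dummy jobs and returns the largest number in flight at any moment. */
    static int run(int jobs, int threads, int maxInFlight) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Semaphore permits = new Semaphore(maxInFlight);
        BlockingQueue<Future<byte[]>> elist = new LinkedBlockingQueue<Future<byte[]>>();
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Callable<byte[]> job = () -> new byte[1024];   // stand-in for load + compress

        Thread scanner = new Thread(() -> {
            try {
                for (int i = 0; i < jobs; i++) {
                    permits.acquire();                 // wait until the writer frees a slot
                    peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    elist.put(pool.submit(job));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        scanner.setDaemon(true);   // do not keep the JVM alive if the writer fails
        scanner.start();

        try {
            for (int i = 0; i < jobs; i++) {
                elist.take().get();                    // stand-in for writing the entry
                inFlight.decrementAndGet();
                permits.release();                     // the buffer is no longer needed
            }
            scanner.join();
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }
}
```

With a limit of N and a per-job buffer of up to 50MB, the worst case is then roughly N x 50MB instead of being unbounded.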
Lines 666 and 672 (both return statements) should, I think, be continue;
With T1 there is no effective pipelining as I see it. The scanning thread has to complete before the loading thread can start (as there is only 1 CPU). So with the blocking thread model we have to start at 2 threads, as otherwise it may deadlock itself.
Here are the times with a blocking queue (and minor changes caused or implied by a blocking queue):
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf, cf, 10274
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT2, cfT2, 7201
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT3, cfT3, 5836
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT4, cfT4, 5884
C:\Program Files\Java\jdk1.7.0\bin\java.exe -Xbootclasspath/p:, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cfT5, cfT5, 5890
I tried to reproduce the exception that you have, but I can't.
I don't have a Java 8 install on this machine, or Unix.
It does seem very strange that there is a file being written as "../" in the first place though (let alone a duplicate).
I didn't think that any of the API would return ../
Is it only on Z3 that this error occurs?
I will install a Java 8 with the patch, but it will be at the start of next week.
regards
Mike
>________________________________
>From: Xueming Shen <xueming.shen at oracle.com>
>To: core-libs-dev at openjdk.java.net
>Sent: Thursday, 27 October 2011, 0:19
>Subject: Re: performance updates to jar and zip
>
>Hi Mark
>
>It appears the patch you provided throws unexpected exception (attached at the end of my
>email) when I tried it out on the latest JDK8 repository. Since I only did a quick scan of your
>patch, I'm not sure what went wrong here.
>
>This patch includes lots of stuff that obviously you are trying/testing on, as you "warned" us in
>your email, I can see at least it tries to
>
>(1) to support different compression level 0-9
>(2) parallel Zip file writing
>(3) with various m-thread strategy -Z
>(4) Files.walkFileTree instead of File.list
>(5) the -D :-) which I would really not recommend to do
>(6) small optimization in various places.
>
>which makes the code a little hard to read and the resulting data hard to compare with.
>I would suggest to divide this proposal to separate pieces and work on them one by one,
>for example maybe we can try to solve the main puzzle (2) + (3) first, and then the other
>optimization opportunities.
>
>To collect some data, I followed your lead to write a simple MT support implementation
>in Jar Main class as showed at
>
>http://cr.openjdk.java.net/~sherman/mtjar2/webrev2/
>
>which I guess is similar to what you are doing. It uses a "simple" strategy
>
>(1) A dedicated thread (from the ExecutorService thread pool) to iterate the file system
> tree to "collect" the target files, submit a "compression job" for each of these files
> to the ExecutorService and keep the returned "Future" (from the submission) in a
> queue "elist".
>(2) Threads from ExecutorService to use temporary buffer memory to read and compress
> the file in memory.
>(3) The main thread is polling the queue "elist", waiting for the "compression job" to
> complete and write the result into the target ZipOutputStream.
>
>The resulting data looks promising, I'm seeing the jar-ing speed doubled when jar-ing
>the rt.jar and a jdk7 binary tree, on a "slow" but 4-core linux vm machine (I have the
>similar result on a 2-core linux as well)
>
>java Jar cf jdk.jar jdk1.7.0 Jar TotalTime:17278
>java Jar cfT1 jdk.jar jdk1.7.0 Jar TotalTime:12345
>java Jar cfT2 jdk.jar jdk1.7.0 Jar TotalTime:7559
>java Jar cfT3 jdk.jar jdk1.7.0 Jar TotalTime:7572
>java Jar cfT4 jdk.jar jdk1.7.0 Jar TotalTime:7801
>java Jar cfT5 jdk.jar jdk1.7.0 Jar TotalTime:8112
>
>The new "T" option for "n-thread", the digit number followed is to specify the
>fixed thread number for the executor service's thread pool. It appears that we can
>achieve the "best" result with only 3 threads in this configuration. One thread for
>scanning the file system, one thread for the compression and the main thread for
>the writing out. My guess is that the fact we have to "write out" to a single file
>(the resulting jar) limits the potential benefit of having more "compressing" threads.
>
>I also tried to measure the "file scanning" speed in my mini-benchmark FIter
>
>http://cr.openjdk.java.net/~sherman/mtjar2/FIter.java
>
>Here are the "surprising" results.
>
>"nio" is the walkFileTree,
>"io" is the File.list()
>"io2" is the File.listFiles().
>
>The nio's Files.walkFileTree is 15 times faster than the "traditional" recursion+File.list().
>wow!
>
>Linux--------------------------------------------------------------------------
>sherman at sherman-linux:~/Workspace/test$ java FIter ../jdk8_mtJar/src
>java.io.File iteration
>------------------
> nio.totalSize:137149279
> fileNum:12222
> checkSum:16122691809689000
> Time:85
>------------------
> io.totalSize:137149279
> fileNum:12222
> checkSum:16122691809689000
> Time:269
>------------------
>io2.totalSize:137149279
> fileNum:12222
> checkSum:16122691809689000
> Time:450
>
>Windows7---------------------------------------------------------------------------------
>
>$ /cygdrive/c/Program\ Files\ \(x86\)/Java/jdk1.7.0/bin/java FIter ../sqa/jdk8/src
>java.io.File iteration
>------------------
> nio.totalSize:136695871
> fileNum:12199
> checkSum:15997350823839479
> Time:323
>------------------
> io.totalSize:136695871
> fileNum:12199
> checkSum:15997350823839479
> Time:2633
>------------------
>io2.totalSize:136695871
> fileNum:12199
> checkSum:15997350823839479
> Time:4592
>
>
>----------------------------------------------------------------------
>
>sherman at sherman-linux:~/Workspace/test$ ../jdk8_mtJar/build/linux-i586/bin/jar cf6DZ3 rt0.jar rtjar
>duplicate path
>java.util.zip.ZipException: duplicate entry: ../
> at java.util.zip.AbstractZipWriter.writeHeader(AbstractZipWriter.java:647)
> at java.util.zip.AbstractZipWriter.startWritingStored(AbstractZipWriter.java:384)
> at java.util.zip.AbstractZipWriter.writeWithResource(AbstractZipWriter.java:350)
> at java.util.zip.AbstractZipWriter.writeAll(AbstractZipWriter.java:273)
> at sun.tools.jar.Main$ZipOutputLoader2File.call(Main.java:410)
> at sun.tools.jar.Main$ZipOutputLoader2File.call(Main.java:350)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
>java.util.concurrent.ExecutionException: java.util.zip.ZipException: duplicate entry: ../
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
> at java.util.concurrent.FutureTask.get(FutureTask.java:111)
> at sun.tools.jar.Main.waitFor(Main.java:810)
> at sun.tools.jar.Main.run(Main.java:679)
> at sun.tools.jar.Main.main(Main.java:1842)
>Caused by: java.util.zip.ZipException: duplicate entry: ../
>
>-Sherman
>
>On 10/20/2011 3:55 PM, Mike Skells wrote:
>> Hi All,
>> I have some performance updates for the jar tool and for the Zip/Jar writing components, including some code to allow parallel writing of Jar and ZIP files (in java.util)
>> This work is not finished as yet but I am looking to see if anyone has any views as to the shape this should move in
>> Currently it is a testbed for comparing different techniques, but largely based on the Jar utility
>>
>> The changes allow the work to be spread across multiple CPUs and optimise some of the code and I/O paths
>>
>> These comparative figures do not include the effect of the nio changes that I proposed in earlier emails
>>
>> Command line changes
>> 0-9 - I have added support for specifying different compression levels (the existing jar command just allows default compression or '0' for no compression)
>> D - This allows the files to all be written with the date of now, rather than the file date (the conversion of the date to zip format is a CPU hog, and not needed in some use-cases)
>> Z0-5 - these are the different mechanisms to allow different parallel execution models - I would not expect this to be a production qualifier
>>
>> The test environment is a 4 core Intel core2 pc running windows vista 64, the test case is jaring up the content of rt.jar to a jar file. Each test is repeated 6 times and the last 5 are averaged to produce the answers. Each test is run in a fresh VM
>>
>> The performance figures are below as a CSV. The last column is the duration of the task in ms.
>>
>> In summary the existing jar utility takes (for uncompressed, compressed) 8.4 , 9.4 seconds to complete and this can be reduced to 1.6, 2.3 seconds
>> The different parallel algorithms are:
>> 0 - none, all in one thread as before
>> 1 - file scanning in one core, 10 threads loading and buffering files, zip writing in a single thread using the existing ZipOutputStream
>> 2 - file scanning in one core, 10 threads loading and buffering files, zip writing mostly multithreaded (e.g. parallel compression, single write to the output stream)
>> 3 - as 2 but writes to a file rather than a stream
>> 4 - as 2 but uses channels to write with direct buffers
>> 5 - as 4 but using heap buffers
>>
>> 3-5 have the zip capability in the code to seek and update headers that are incomplete, but this is not much tested
>>
>>
>>
>> C:\Program Files\Java\jdk1.6.0_24\bin\java.exe, C:\Program Files\Java\jdk1.6.0_24\lib\tools.jar, -cf0, java 1.6 rt -cf0, 8482
>> C:\Program Files\Java\jdk1.6.0_24\bin\java.exe, C:\Program Files\Java\jdk1.6.0_24\lib\tools.jar, -cf, java 1.6 rt -cf, 9318
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Program Files\Java\jdk1.7.0\lib\tools.jar, -cf0, java 1.7 rt -cf0, 8497
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Program Files\Java\jdk1.7.0\lib\tools.jar, -cf, java 1.7 rt -cf, 9518
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Test\Archive\baseline.jar, -cf0, orig 1.7 rt -cf0, 8448
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\Test\Archive\baseline.jar, -cf, orig 1.7 rt -cf, 9484
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0, project 1.7 rt -cf0, 3133
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0D, project 1.7 rt -cf0D, 2824
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0Z0, project 1.7 rt -cf0 parallel 0, 3026
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ0, project 1.7 rt -cf0D parallel 0, 2961
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ1, project 1.7 rt -cf0D parallel 1, 2022
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ2, project 1.7 rt -cf0D parallel 2, 1757
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ3, project 1.7 rt -cf0D parallel 3, 1632
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ4, project 1.7 rt -cf0D parallel 4, 1994
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf0DZ5, project 1.7 rt -cf0D parallel 5, 1978
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1, project 1.7 rt -cf1, 5237
>>
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1D, project 1.7 rt -cf1D, 5073
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1Z0, project 1.7 rt -cf1 parallel 0, 5367
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ0, project 1.7 rt -cf1D parallel 0, 5002
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ1, project 1.7 rt -cf1D parallel 1, 5125
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ2, project 1.7 rt -cf1D parallel 2, 2257
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ3, project 1.7 rt -cf1D parallel 3, 2145
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ4, project 1.7 rt -cf1D parallel 4, 2505
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf1DZ5, project 1.7 rt -cf1D parallel 5, 2549
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf2, project 1.7 rt -cf2, 5371
>>
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf3, project 1.7 rt -cf3, 5409
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf4, project 1.7 rt -cf4, 5778
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf5, project 1.7 rt -cf5, 5906
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6, project 1.7 rt -cf6, 6082
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf7, project 1.7 rt -cf7, 6070
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf8, project 1.7 rt -cf8, 6251
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9, project 1.7 rt -cf9, 6191
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6D, project 1.7 rt -cf6D, 5843
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6Z0, project 1.7 rt -cf6 parallel 0, 6095
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ0, project 1.7 rt -cf6D parallel 0, 5907
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ1, project 1.7 rt -cf6D parallel 1, 5957
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ2, project 1.7 rt -cf6D parallel 2, 2388
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ3, project 1.7 rt -cf6D parallel 3, 2351
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ4, project 1.7 rt -cf6D parallel 4, 2694
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf6DZ5, project 1.7 rt -cf6D parallel 5, 2830
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9D, project 1.7 rt -cf9D, 6134
>>
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9Z0, project 1.7 rt -cf9 parallel 0, 6258
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ0, project 1.7 rt -cf9D parallel 0, 6066
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ1, project 1.7 rt -cf9D parallel 1, 6203
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ2, project 1.7 rt -cf9D parallel 2, 2490
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ3, project 1.7 rt -cf9D parallel 3, 2361
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ4, project 1.7 rt -cf9D parallel 4, 2788
>> C:\Program Files\Java\jdk1.7.0\bin\java.exe, C:\NetBeansProjects\JavaProject1\dist\javaproject1.jar, -cf9DZ5, project 1.7 rt -cf9D parallel 5, 2847
>>
>> regards
>> Mike
>
>
>