proposal to optimise the performance of the Jar utility

Wed Apr 6 18:04:25 UTC 2011

Hi Mike,

We are in the final stage of JDK 7 development, so work like this is
unlikely to get in at the last minute. I have filed a CR/RFE to track
this issue; we can use this CR as the starting point for the discussion
and target a JDK 7 update release or JDK 8.

7034403: proposal to optimise the performance of the Jar utility

Thanks,
Sherman

On 04/05/2011 04:42 PM, mike.skells at talk21.com wrote:
> Hi,
> Not sure if this is too late for Java 7, but I have made some optimisations for a
> client to improve the performance of the jar utility in their environment, and
> would like to promote them into the main code base.
>
> The optimisations that I have performed are:
>
> 1. Allowing the Jar utility to use other compression levels (currently it
> allows default (5) only)
> 2. Multi-threading and pipelining the file information and access
> 3. Multi-threading the compression and file writing
>
> A little background:
> As part of the development process where I work, they regularly jar the content
> of the working projects as part of the distribution to remote systems. This is a
> large and complex code base of 6 million LOC and growing. The Jar file ends up
> compressed to approx 100 MB; uncompressed, the jar size is approx 245 MB, about 4-5
> times the size of rt.jar.
>
> I was looking at ways to improve the performance, as this activity occurs several
> times a day for dozens of developers.
>
> In essence, when creating a new jar file the jar utility is single threaded
> and staged. Forgive me if this is an oversimplification.
>
> First it works out all of the files that are specified, buffering the file
> names (IO bound).
> Then it iterates through the files and, for each file, loads the file
> information and then the file content, sending it to a JarOutputStream (CPU
> bound or IO bound depending on the IO speed).
>
> The JarOutputStream has a compression of 0 (just store) or 5 (the default), and
> the jar writing is single threaded by the design of the JarOutputStream
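
A minimal sketch of what a configurable level looks like at the stream level: JarOutputStream inherits setLevel from ZipOutputStream, so the deflate level can be chosen per stream. The names JarLevelDemo, jarWithLevel and sampleData are made up for illustration; they are not part of the jar tool or of the patch described in this mail.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class JarLevelDemo {
    /** Writes a single-entry jar into memory at the given deflate level (0..9). */
    static byte[] jarWithLevel(String entryName, byte[] content, int level) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(sink)) {
            jar.setLevel(level);                       // inherited from ZipOutputStream
            jar.putNextEntry(new JarEntry(entryName));
            jar.write(content);
            jar.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Repetitive, source-like text so that deflate has something to work with. */
    static byte[] sampleData(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("public void method").append(i).append("() { return; }\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Comparing jarWithLevel(name, data, 1) with jarWithLevel(name, data, 9) on representative input is one way to measure the size versus CPU trade-off for a given code base before picking a level.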
>
> The process of creating a Jar took about 20 seconds on Windows with the help
> of an SSD, and considerably longer without one, and was CPU bound to one CPU
> core.
>
> ----
> The changes that I made were:
> 1. Allow different compression levels (for us a compression level of 1 increases
> the file size of the Jar to 110 MB but reduces the CPU load in compression to
> approx 30% of what it was (rough estimate))
> 2. Pipelining the file access
> 2.1    one thread is started for each file root (-C on the Jar command line),
> which scans for files and places the file information into a blocking queue (Q1),
> which I set to an arbitrary size of 200 items
> 2.2    one thread pool of 10 threads reads the file information from the queue
> (Q1) and buffers the file content up to a specified size (again I specified an
> arbitrary size limit of 25K for a file), and places the buffered content into a
> queue (Q2) (again an arbitrary size of 10 items)
> 2.3    one thread takes the file content from Q2 and compresses it or checksums
> it and adds it to the JarOutputStream. This process is single threaded due to
> the design of the JarOutputStream
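
The three stages in 2.1 to 2.3 can be sketched roughly as below. This is not the code from the patch: JarPipelineSketch and its members are hypothetical names, it handles a single root, and it reads each file fully into memory instead of applying the 25K buffering limit. The queue capacities (200 and 10) are the ones quoted above; the number of reader threads is a parameter.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Stream;

class JarPipelineSketch {
    /** File content loaded by a reader thread, ready for the writer. */
    static final class Loaded {
        final String name;
        final byte[] bytes;
        Loaded(String name, byte[] bytes) { this.name = name; this.bytes = bytes; }
    }

    // End-of-stream markers, compared by identity.
    private static final Path END_OF_SCAN = Paths.get("");
    private static final Loaded END_OF_READ = new Loaded(null, null);

    static void createJar(Path root, OutputStream out, int readers) throws Exception {
        BlockingQueue<Path> q1 = new ArrayBlockingQueue<>(200);   // scanner -> readers
        BlockingQueue<Loaded> q2 = new ArrayBlockingQueue<>(10);  // readers -> writer
        AtomicReference<Exception> failure = new AtomicReference<>();
        ExecutorService pool = Executors.newFixedThreadPool(readers + 1);
        try (JarOutputStream jar = new JarOutputStream(new BufferedOutputStream(out, 1 << 16))) {
            // Stage 1: one scanner thread walks the root and queues every regular file.
            pool.submit(() -> {
                try (Stream<Path> files = Files.walk(root)) {
                    Iterator<Path> it = files.filter(Files::isRegularFile).iterator();
                    while (it.hasNext()) q1.put(it.next());
                } catch (IOException | UncheckedIOException e) {
                    failure.compareAndSet(null, e);
                }
                for (int i = 0; i < readers; i++) q1.put(END_OF_SCAN); // one marker per reader
                return null;
            });
            // Stage 2: reader threads load file content into memory.
            for (int i = 0; i < readers; i++) {
                pool.submit(() -> {
                    for (Path p = q1.take(); p != END_OF_SCAN; p = q1.take()) {
                        try {
                            String name = root.relativize(p).toString()
                                    .replace(File.separatorChar, '/');
                            q2.put(new Loaded(name, Files.readAllBytes(p)));
                        } catch (IOException e) {
                            failure.compareAndSet(null, e);
                        }
                    }
                    q2.put(END_OF_READ);
                    return null;
                });
            }
            // Stage 3: the calling thread is the only one touching the JarOutputStream.
            for (int finished = 0; finished < readers; ) {
                Loaded f = q2.take();
                if (f == END_OF_READ) { finished++; continue; }
                jar.putNextEntry(new JarEntry(f.name));
                jar.write(f.bytes);
                jar.closeEntry();
            }
        } finally {
            pool.shutdownNow();
        }
        if (failure.get() != null) throw failure.get();
    }
}
```

Because the readers finish in whatever order the disk allows, entries land in the jar in a nondeterministic order, and this sketch writes no manifest. The bounded queues are what keep memory use in check: a fast scanner or fast readers simply block until the single writer catches up.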
>
> Some other minor performance gain came from increasing the buffer on the
> output stream to reduce the IO load.
>
> The end result is that the process takes approx 5 seconds in the same
> configuration.
>
> The above has been in use in a production configuration for a few months now.
>
> As a home project I have completed some enhancements to the JarOutputStream, and
> produced a JarWriter that allows multiple threads to work concurrently deflating
> or calculating checksums, which seems to test OK for the test cases that I have
> generated, and successfully loads my quad core home dev machine on all cores.
> Each thread allocates a buffer and compresses a file into the
> buffer, only blocking other threads when the buffer is written to the output
> (which is after the compression is complete, unless the file is too large to
> compress into the buffer).
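
The JarWriter itself is not shown in this mail, so the following is only a sketch of the general idea it describes: each task deflates one file into its own buffer with its own Deflater and computes the CRC, and only the hand-off of finished buffers is serialised. ParallelDeflateSketch, Compressed, deflate and deflateAll are hypothetical names, and the step that would emit the zip local headers and central directory is deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

class ParallelDeflateSketch {
    /** One deflated file plus the metadata a zip entry header would need. */
    static final class Compressed {
        final String name;
        final byte[] data;   // raw deflate stream, no zlib header
        final long crc;      // CRC-32 of the uncompressed content
        final int rawSize;
        Compressed(String name, byte[] data, long crc, int rawSize) {
            this.name = name; this.data = data; this.crc = crc; this.rawSize = rawSize;
        }
    }

    /** Deflates one file into a private buffer; safe to call from many threads. */
    static Compressed deflate(String name, byte[] raw, int level) {
        CRC32 crc = new CRC32();
        crc.update(raw);
        Deflater deflater = new Deflater(level, true);  // nowrap = true, as zip entries use
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream buf = new ByteArrayOutputStream(Math.max(64, raw.length / 2));
            byte[] chunk = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                buf.write(chunk, 0, n);
            }
            return new Compressed(name, buf.toByteArray(), crc.getValue(), raw.length);
        } finally {
            deflater.end();  // release native memory promptly
        }
    }

    /** Compresses all files concurrently and returns the results in input order. */
    static List<Compressed> deflateAll(Map<String, byte[]> files, int threads, int level)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Compressed>> pending = new ArrayList<>();
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                pending.add(pool.submit(() -> deflate(e.getKey(), e.getValue(), level)));
            }
            List<Compressed> done = new ArrayList<>();
            for (Future<Compressed> f : pending) {
                done.add(f.get());  // a real writer would emit the entry here, one at a time
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the design is that the expensive work (deflate and CRC) shares no state between threads, so the only contention is at the moment a finished buffer is appended to the output.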
>
> This JarWriter is not API compatible with the JarOutputStream; it is not a
> stream. It allows the programmer to write a record based on the file information
> and an input stream, and is thread-safe. It is not a drop-in replacement for
> JarOutputStream.
> I am not an expert in the Zip file format, but much of the code from
> ZipOutputStream is unchanged, just restructured.
> ---
> I did think that there is some scope for improvement that I have not looked at:
> a. thresholding of file size for compression (very small files don't compress
> well)
> b. some file types don't compress well (e.g. png, jpeg, as they have been
> compressed already)
> c. using NIO to parallelise the loading of the file information or content
> d. some pre-charging of the deflater dictionary (e.g. a class file contains the
> strings of the class name and packages), but this would make the format
> incompatible with zip, and require changes to the JVM to be useful, and is a
> long way from my comfort zone, or skill set. This would reduce the file size
> --
> What is the view of the readers? Is this something, or at least some parts of
> this, that could be incorporated into Java 7, or is this too late in the dev cycle?
>
> regards
>
> Mike