proposal to optimise the performance of the Jar utility

Xueming Shen xueming.shen at oracle.com
Thu Apr 14 06:12:26 UTC 2011


  On 4/13/2011 4:55 PM, mike.skells at talk21.com wrote:
> Hi Mike,
> apart from compression 0 it doesn't make a lot of difference as far
> as I can see. As I posted earlier, converting the file name to lower
> case seems (according to the NB profiler) to take more time than the
> compression

File.hashCode() is slow on Windows (yes, we have to go down to 
String.toLowerCase() to be "correct", and yes, it's slow); we have 
this comment in Win32FileSystem.hashCode() just for that :-)

     public int hashCode(File f) {
         /* Could make this more efficient: String.hashCodeIgnoreCase */
         return f.getPath().toLowerCase(Locale.ENGLISH).hashCode() ^ 1234321;
     }
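
If we ever do the "String.hashCodeIgnoreCase" idea from that comment, a
minimal sketch might look like the following (hashCodeIgnoreCase is the
hypothetical helper; a per-char Character.toLowerCase matches
toLowerCase(Locale.ENGLISH) for the ASCII paths that dominate here, but
not for every Unicode sequence, so treat it as an approximation, not a
drop-in fix):

     public int hashCodeIgnoreCase(File f) {
         // hash the path without allocating a lowercased copy
         String path = f.getPath();
         int h = 0;
         for (int i = 0; i < path.length(); i++) {
             h = 31 * h + Character.toLowerCase(path.charAt(i));
         }
         return h ^ 1234321;
     }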

I tried using an ArrayList for the "entries" (instead of a
LinkedHashSet), as shown below. (Yes, it's totally fine to use a List
for "creating"; "updating" needs a little more work, but let's put
"updating" aside for now.)

     // All files need to be added/updated.
     //Set<File> entries = new LinkedHashSet<File>();
     List<File> entries = new ArrayList<File>();
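
For "updating", the duplicate check would still be needed; one cheap
way is to key it off the path string so that File.hashCode() is never
called. A rough sketch, not a patch I'm proposing:

     List<File> entries = new ArrayList<File>();
     Set<String> seen = new HashSet<String>();

     void add(File f) {
         // note: a String key is case sensitive, unlike File on Windows
         if (seen.add(f.getPath()))    // true only for a new path
             entries.add(f);
     }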

and ran cf0 on rtjar, a directory with rt.jar extracted into it (time
measured by System.currentTimeMillis()):

[List]
$ ../jdk1.7.0/bin/java Main cf0 foo.jar rtjar
expand=406, create=10703
expand=406, create=10797
expand=407, create=11046

[Set]
$ ../jdk1.7.0/bin/java Main cf0 foo.jar rtjar
expand=469, create=10703
expand=469, create=10688
expand=469, create=10656
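
The timing is nothing fancy, just currentTimeMillis() deltas around
the two phases in Main, roughly:

     long t0 = System.currentTimeMillis();
     // ... "expand": walk the roots and build "entries" ...
     long t1 = System.currentTimeMillis();
     // ... "create": write foo.jar ...
     long t2 = System.currentTimeMillis();
     System.out.println("expand=" + (t1 - t0) + ", create=" + (t2 - t1));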

The "List" version is indeed about 15% faster than the "Set" version, in 
"expanding", but
"expanding" is only about 5% of the over all "creating" time (I've not 
tried to measure
yet, but I will not be surprised to find that we probably spend more 
time on i/o than deflating)

So it is nice to have, but does not appear to bring us some thing 
"significant".

-Sherman



>
> Bearing in mind that I have tweaked my system so that I am accessing
> from cache only, the elapsed time is pretty much the single-core
> time. Here are the results
>
> the results I have re-run today (hence the delay)
> in csv format
> ---
> ,cf,cf0,cf1,cf2,cf3,cf4,cf5,cf6,cf7,cf8,cf9
> 1.6.0_24 (1.6 vm),10255,9751,,,,,,,,,
> java 1.7 release candidate,10498,9663,,,,,,,,,
> orig ,10481,9707,,,,,,,,,
> buffer CRC32 data,,7398,9490,9641,9565,10038,10142,10366,10310,10536,10440
> ---
> this run shows a 25% improvement in cf0 times (9707 ms vs 7398 ms
> when the tweaks are applied)
> cf1 (compression level 1) takes 9490 and cf9 takes 10440, so not a
> huge difference
>
>
> Regards
>
> Mike
>
>
>
> ------------------------------------------------------------------------
> *From:* Mike Duigou <mike.duigou at oracle.com>
> *To:* mike.skells at talk21.com
> *Cc:* Xueming Shen <xueming.shen at oracle.com>; core-libs-dev Libs 
> <core-libs-dev at openjdk.java.net>
> *Sent:* Wednesday, 13 April, 2011 23:00:26
> *Subject:* Re: proposal to optimise the performance of the Jar utility
>
> Mike, can you share the results of performance testing at various 
> compression levels? Is there much difference between the levels or an 
> apparent "sweet spot"?
>
> As low-hanging fruit for jdk 7 it might be worth considering raising 
> the default compression level from 5 to 6 (the zlib default). Raising 
> the level from 5 to 6 entails (by today's standards) very modest 
> increases in the amount of memory and effort used (16Kb of additional 
> buffer space for compression). In general zlib reflects size choices 
> that are almost 20 years old, and there may be no measurable benefit 
> to using the lower compression levels.
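>
> Concretely the change is a single setter on the stream; a minimal
> illustration (not the actual jar-tool code):
>
>     import java.io.FileOutputStream;
>     import java.util.zip.ZipOutputStream;
>
>     ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("foo.jar"));
>     zos.setLevel(6);   // 0 = store ... 9 = best; 6 is the zlib default
>     // ... write entries as usual ...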
>
> Mike (also)
>
>
> On Apr 12 2011, at 17:48, mike.skells at talk21.com wrote:
>
> > Hi Sherman,
> > I have had a quick look at the current code to see what 'low
> > hanging fruit' there is. I appreciate that parallelizing the
> > command in its entirety may not be feasible for the first release
> >
> > The tests that I have run are jarring the content of the 1.7 rt.jar
> > with varying compression levels. Each jar is run as a child process
> > 6 times and the average of the last 5 is taken. Tests are pretty
> > much CPU bound on a single core
> >
> > 1. The performance of the cf0 (create file with no compression)
> > path can be improved for the general case if the file is buffered.
> > In my test env (windows 7 64bit) this equates to a 17% performance
> > improvement in my tests. In the existing code the file is read
> > twice, once to calc the CRC and once to write the file to the Jar
> > file. This change would buffer a single small file at a time
> > (size < 100K)
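> >
> > A sketch of the single-read idea (readFully is a made-up helper;
> > note that a STORED entry needs its size and CRC set before
> > putNextEntry):
> >
> >     CRC32 crc = new CRC32();
> >     byte[] content = readFully(file);    // one read, small files only
> >     crc.update(content);
> >     entry.setMethod(ZipEntry.STORED);
> >     entry.setSize(content.length);
> >     entry.setCrc(crc.getValue());
> >     zos.putNextEntry(entry);
> >     zos.write(content);
> >     zos.closeEntry();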
> >
> > 2. It is also a trivial fix to allow different compression levels,
> > rather than stored and default
> >
> > After that it is harder to gain performance improvements without
> > structural change or more discussion
> >
> > 3. The largest saving after that is in the management of the
> > 'entries' Set, as the hashCode of the File is expensive (this may
> > not apply to other filesystems). The management of this set seems
> > to account for more cpu than the Deflater!
> > I cannot see the reason for this data being collected. I am
> > probably missing the obvious ...
> >
> > 4. After that there is just the parallelisation of the jar utility
> > and the underlying stream
> >
> > What is the best way to proceed?
> >
> > regards
> >
> > Mike
> >
> >
> >
> > ________________________________
> > From: Xueming Shen <xueming.shen at oracle.com>
> > To: core-libs-dev at openjdk.java.net
> > Sent: Wednesday, 6 April, 2011 19:04:25
> > Subject: Re: proposal to optimise the performance of the Jar utility
> >
> > Hi Mike,
> >
> > We are in the final stage of JDK7 development; work like this is
> > unlikely to get in at the last minute. I have filed a CR/RFE to
> > track this issue; we can use this CR as the starting point for the
> > discussion and target some jdk7 update release or JDK8.
> >
> > 7034403: proposal to optimise the performance of the Jar utility
> >
> > Thanks,
> > Sherman
> >
> >
> > On 04/05/2011 04:42 PM, mike.skells at talk21.com wrote:
> >> Hi,
> >> Not sure if this is too late for Java 7, but I have made some
> >> optimisations for a client to improve the performance of the jar
> >> utility in their environment, and would like to promote them into
> >> the main code base
> >>
> >> The optimisations that I have performed are
> >>
> >> 1. Allowing the Jar utility to have other compression levels
> >> (currently it allows default (5) only)
> >> 2. Multi-threading and pipelining the file information and access
> >> 3. Multi-threading the compression and file writing
> >>
> >> A little background
> >> As part of the development process where I work, they regularly
> >> jar the content of the working projects as part of the
> >> distribution to remote systems. This is a large and complex code
> >> base of 6 million LOC and growing. The Jar file ends up compressed
> >> to approx 100Mb; uncompressed the jar size is approx 245Mb, about
> >> 4-5 times the size of rt.jar.
> >>
> >> I was looking at ways to improve the performance as this activity
> >> occurs several times a day for dozens of developers
> >>
> >> In essence, when compressing a new jar file the jar utility is
> >> single threaded and staged. Forgive me if this is an
> >> oversimplification:
> >>
> >> first it works out all of the files that are specified, buffering
> >> the file names (IO bound)
> >> then it iterates through the files, and for each file it loads the
> >> file information and then the file content, sending it to a
> >> JarOutputStream (CPU bound or IO bound depending on the IO speed)
> >>
> >> The JarOutputStream has a compression level of 0 (just store) or 5
> >> (the default), and the jar writing is single threaded by the
> >> design of the JarOutputStream
> >>
> >> The process of creating a Jar took about 20 seconds on windows
> >> with the help of an SSD, and considerably longer without one, and
> >> was CPU bound to one CPU core
> >>
> >> ----
> >> The changes that I made were
> >> 1. Allow different compression levels (for us a compression level
> >> of 1 increases the file size of the Jar to 110 Mb but reduces the
> >> CPU load in compression to approx 30% of what it was (rough
> >> estimate))
> >> 2. pipelining the file access (a rough sketch of this pipeline
> >> follows the list)
> >> 2.1   one thread is started for each file root (-C on the Jar
> >> command line), which scans for files and places the file
> >> information into a blocking queue (Q1), which I set to an
> >> arbitrary size of 200 items
> >> 2.2   one thread pool of 10 threads reads the file information
> >> from the queue (Q1), buffers the file content up to a specified
> >> size (again I specified an arbitrary limit of 25K for a file), and
> >> places the buffered content into a queue (Q2) (again an arbitrary
> >> size of 10 items)
> >> 2.3   one thread takes the file content from Q2 and compresses or
> >> checksums it and adds it to the JarOutputStream. This process is
> >> single threaded due to the design of the JarOutputStream
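> >>
> >> A minimal sketch of that pipeline (FileInfo and BufferedEntry are
> >> made-up names, not classes from the patch):
> >>
> >>     import java.util.concurrent.ArrayBlockingQueue;
> >>     import java.util.concurrent.BlockingQueue;
> >>
> >>     BlockingQueue<FileInfo> q1 = new ArrayBlockingQueue<FileInfo>(200);
> >>     BlockingQueue<BufferedEntry> q2 = new ArrayBlockingQueue<BufferedEntry>(10);
> >>     // scanner threads (one per -C root): walk the tree, q1.put(info)
> >>     // reader pool (10 threads): q1.take(), read up to 25K, q2.put(entry)
> >>     // writer thread: q2.take(), deflate or checksum, write to the
> >>     // single JarOutputStream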
> >>
> >> some other minor performance gain occurred by increasing the
> >> buffer on the output stream to reduce the IO load
> >>
> >> The end result is that the process takes approx 5 seconds in the
> >> same configuration
> >>
> >> The above has been in use in a production configuration for a few
> >> months now
> >>
> >> As a home project I have completed some enhancements to the
> >> JarOutputStream, and produced a JarWriter that allows multiple
> >> threads to work concurrently deflating or calculating checksums,
> >> which seems to test OK for the test cases that I have generated,
> >> and successfully loads my quad core home dev machine on all cores.
> >> Each thread allocates a buffer, and the thread compresses a file
> >> into the buffer, only blocking other threads when the buffer is
> >> written to the output (which is after the compression is complete,
> >> unless the file is too large to compress)
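> >>
> >> The shape of the concurrency, roughly (deflateInto and writeRecord
> >> are illustrative names, not the real method signatures):
> >>
> >>     byte[] buf = localBuffer.get();        // ThreadLocal<byte[]>, one per worker
> >>     int n = deflateInto(buf, file);        // compress with no lock held
> >>     synchronized (writer) {
> >>         writer.writeRecord(entry, buf, n); // only the copy-out is serialized
> >>     }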
> >>
> >> This JarWriter is not API compatible with the JarOutputStream; it
> >> is not a stream. It allows the programmer to write a record based
> >> on the file information and an input stream, and is threadsafe. It
> >> is not a drop-in replacement for JarOutputStream
> >> I am not an expert in the ZIP file format, but much of the code
> >> from ZipOutputStream is unchanged, just restructured
> >> ---
> >> I did think that there is some scope for improvement that I have
> >> not looked at (a sketch for (a) and (b) follows the list):
> >> a. thresholding of file size for compression (very small files
> >> don't compress well)
> >> b. some file types don't compress well (e.g. png, jpeg) as they
> >> have been compressed already
> >> c. using NIO to parallelise the loading of the file information or
> >> content
> >> d. some pre-charging of the deflater dictionary (e.g. a class file
> >> contains the strings of the class name and packages), but this
> >> would make the format incompatible with zip, require changes to
> >> the JVM to be useful, and is a long way from my comfort zone, or
> >> skill set. This would reduce the file size
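> >>
> >> For (a) and (b) the check could be as simple as the following
> >> (threshold and extension list are made-up examples; a STORED entry
> >> also needs its size and CRC set before it is written):
> >>
> >>     boolean store = file.length() < 512
> >>             || name.endsWith(".png") || name.endsWith(".jpg");
> >>     entry.setMethod(store ? ZipEntry.STORED : ZipEntry.DEFLATED);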
> >>
> >> --
> >> What is the view of the readers? Is this something, or at least
> >> are there some parts of this, that could be incorporated into
> >> Java 7, or is this too late in the dev cycle?
> >>
> >> regards
> >>
> >> Mike
>



