proposal to optimise the performance of the Jar utility

mike.skells at talk21.com
Thu Apr 14 08:54:22 UTC 2011


Hi Sherman,
I agree that the update case is different, but I always thought that your
new zip file system would be a better way to do this than unpacking and
repacking the jar, which is what happens now, isn't it? Forgive me if I am
wrong, but I have never looked at the update code in detail.

I will check up on the timings that I get, but it will be after the
weekend. They may have been skewed by the overhead of the profiler: I was
running with full instrumentation for those runs.
Just to be clear, I was saying that the CPU time taken by the hashcode was
greater than the CPU time taken in deflate. I am attempting to eliminate
any real IO time from the timings.

I have also started on pipelining the 'expand' and 'create' code sections
so that the expand time can be hidden entirely, given no IO contention and
another core available. In this case no list is stored, just a queue of
files to be added, roughly as sketched below.
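
A minimal sketch of that shape (names are illustrative, not the actual
patch; error handling and the update path are ignored):

    import java.io.File;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // 'expand' feeds a bounded queue that 'create' drains, so the
    // directory scan overlaps the writing instead of preceding it.
    class OverlappedCreate {
        static final File EOF = new File("");   // end-of-scan marker

        public static void main(String[] args) throws Exception {
            final File root = new File(args[0]);
            final BlockingQueue<File> q = new ArrayBlockingQueue<File>(200);

            new Thread(new Runnable() { public void run() {
                try { expand(root, q); q.put(EOF); }
                catch (InterruptedException e) { }
            }}).start();

            for (File f = q.take(); f != EOF; f = q.take()) {
                // 'create': compress/checksum f and write the jar entry here
            }
        }

        static void expand(File f, BlockingQueue<File> q)
                throws InterruptedException {
            File[] kids = f.listFiles();
            if (kids == null) { q.put(f); return; }   // an ordinary file
            for (File k : kids) expand(k, q);
        }
    }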

For the cf0 case the quick win is not scanning each file twice (once for
the CRC32 and once for the content), which in my tests takes a couple of
seconds off the create time.
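
The single-scan stored path would look something like this (a sketch only;
the names are mine, and large files would need a fallback to the existing
two-pass code):

    import java.io.*;
    import java.util.zip.*;

    class StoredEntryWriter {
        // read the file once, take the CRC from the in-memory copy,
        // then write that same copy out as the entry data
        static void writeStored(ZipOutputStream zos, File f, String name)
                throws IOException {
            byte[] content = new byte[(int) f.length()];
            DataInputStream in = new DataInputStream(new FileInputStream(f));
            try { in.readFully(content); } finally { in.close(); }

            CRC32 crc = new CRC32();
            crc.update(content);

            ZipEntry e = new ZipEntry(name);
            e.setMethod(ZipEntry.STORED);  // stored entries carry size+crc up front
            e.setSize(content.length);
            e.setCompressedSize(content.length);
            e.setCrc(crc.getValue());

            zos.putNextEntry(e);
            zos.write(content);
            zos.closeEntry();
        }
    }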

regards

Mike



________________________________
From: Xueming Shen <xueming.shen at oracle.com>
To: mike.skells at talk21.com
Cc: core-libs-dev Libs <core-libs-dev at openjdk.java.net>
Sent: Thursday, 14 April, 2011 7:12:26
Subject: Re: proposal to optimise the performance of the Jar utility

 On 4/13/2011 4:55 PM, mike.skells at talk21.com wrote: 
Hi Mike,
>apart from compression 0 it doesn't make a lot of difference as far as I
>see it. As I posted earlier, converting the file name to lower case seems
>(according to the NetBeans profiler) to take more time than the
>compression
File.hashCode() is slow on Windows (yes, we have to go down to
String.toLowerCase() to be "correct" and yes, it's slow); we have the
comment in Win32FileSystem.hashCode() just for this :-)

    public int hashCode(File f) {
        /* Could make this more efficient: String.hashCodeIgnoreCase */
        return f.getPath().toLowerCase(Locale.ENGLISH).hashCode() ^ 1234321;
    }
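
Something along the lines of the comment might be (a sketch; note that
Character.toLowerCase() does not match String.toLowerCase(Locale.ENGLISH)
for every character, so staying "correct" needs care):

    // hash the lower-cased chars directly; no lower-cased copy is allocated
    public int hashCodeIgnoreCase(File f) {
        String path = f.getPath();
        int h = 0;
        for (int i = 0; i < path.length(); i++) {
            h = 31 * h + Character.toLowerCase(path.charAt(i));
        }
        return h ^ 1234321;
    }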

I tried to use ArrayList for the "entries" (instead of LinkedHashSet), as
shown below (yes, it's totally fine to use a List for "creating"; it needs
a little more work for "updating", but let's put "updating" aside for now)

    // All files need to be added/updated.
    //Set<File> entries = new LinkedHashSet<File>();
    List<File> entries = new ArrayList<File>();

and ran cf0 on rtjar, a directory with rt.jar extracted (time measured by
System.currentTimeMillis()):
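
(nothing fancier than bracketing the two phases, along these lines, where
expand() and create() stand for the corresponding phases of the test Main)

    long t0 = System.currentTimeMillis();
    expand();    // gather the entries
    long t1 = System.currentTimeMillis();
    create();    // write foo.jar
    long t2 = System.currentTimeMillis();
    System.out.println("expand=" + (t1 - t0) + ", create=" + (t2 - t1));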

[List]
$ ../jdk1.7.0/bin/java Main cf0 foo.jar rtjar
expand=406, create=10703
expand=406, create=10797
expand=407, create=11046

[Set]
$ ../jdk1.7.0/bin/java Main cf0 foo.jar rtjar
expand=469, create=10703
expand=469, create=10688
expand=469, create=10656

The "List" version is indeed about 15% faster than the "Set" version in
"expanding", but "expanding" is only about 5% of the overall "creating"
time (I've not tried to measure it yet, but I will not be surprised to
find that we probably spend more time on i/o than deflating).

So it is nice to have, but it does not appear to bring us something
"significant".

-Sherman

>
>Bearing in mind that I have tweaked my system so that I am accessing
>from cache only, so the elapsed time is pretty much the single core time,
>here are the results
>
>
>the results I have re-run today (hence the delay), in csv format
>---
>,cf,cf0,cf1,cf2,cf3,cf4,cf5,cf6,cf7,cf8,cf9
>1.6.0_24 (1.6 vm),10255,9751,,,,,,,,,
>java 1.7 release candinate,10498,9663,,,,,,,,,
>orig ,10481,9707,,,,,,,,,
>buffer CRC32 data,,7398,9490,9641,9565,10038,10142,10366,10310,10536,10440
>---
>this run shows a 25% improvement in cf0 times (9703 ms vs 7398 ms when
>the tweaks are applied).
>cf1 (compression level 1) takes 9490 ms and cf9 takes 10440 ms, so not a
>huge difference between the levels.
>
>Regards
>
>
>Mike
>
________________________________
From: Mike Duigou <mike.duigou at oracle.com>
>To: mike.skells at talk21.com
>Cc: Xueming Shen <xueming.shen at oracle.com>; core-libs-dev Libs
><core-libs-dev at openjdk.java.net>
>Sent: Wednesday, 13 April, 2011 23:00:26
>Subject: Re: proposal to optimise the performance of the Jar utility
>
>Mike, can you share the results of performance testing at various
>compression levels? Is there much difference between the levels, or an
>apparent "sweet spot"?
>
>For low hanging fruit for jdk 7 it might be worth considering raising
>the default compression level from 5 to 6 (the zlib default). Raising the
>level from 5 to 6 entails (by today's standards) a very modest increase
>in the amount of memory and effort used (16Kb of additional buffer space
>for compression). In general zlib reflects size choices that are almost
>20 years old, and it may be of no measurable benefit to be using the
>lower compression levels.
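>
>The change itself is tiny; on the stream it is just setLevel() (a sketch,
>with 6 standing in for whatever default is chosen):
>
>    JarOutputStream jos = new JarOutputStream(new FileOutputStream("foo.jar"));
>    jos.setLevel(6);  // the zlib default, up from the current 5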
>
>Mike (also) 
>
>
>On Apr 12 2011, at 17:48, mike.skells at talk21.com wrote:
>
>> Hi Sherman,
>> I have had a quick look at the current code to see what 'low hanging
>> fruit' there is. I appreciate that parallelizing the command in its
>> entirety may not be feasible for the first release.
>> 
>> The tests that I have run are jarring the content of the 1.7 rt.jar
>> with varying compression levels. Each jar is run as a child process 6
>> times and the average of the last 5 runs is taken. The tests are pretty
>> much CPU bound on a single core.
>> 
>> 1. The performance of the cf0 (create file with no compression) path
>> can be improved for the general case if the file is buffered.
>> In my test env (windows 7 64bit) this equates to a 17% performance
>> improvement in my tests. In the existing code the file is read twice,
>> once to calc the CRC and once to write the file to the Jar file. This
>> change would buffer a single small file at a time (size < 100K).
>> 
>> 2. It is also a trivial fix to allow different compression levels,
>> rather than just stored and the default.
>> 
>> After that it is harder to gain performance improvements without
>> structural change or more discussion.
>> 
>> 3. The largest saving after that is in the management of the 'entries'
>> Set, as the hashcode of a File is expensive (this may not apply to
>> other filesystems); the management of this map seems to account for
>> more cpu than the Deflater!
>> I cannot see the reason for this data being collected. I am probably
>> missing the obvious ...
>> 
>> 4. After that there is just the parallelisation of the jar utility and
>> the underlying stream.
>> 
>> What is the best way to proceed?
>> 
>> regards
>> 
>> Mike
>> 
>> 
>> 
>> ________________________________
>> From: Xueming Shen <xueming.shen at oracle.com>
>> To: core-libs-dev at openjdk.java.net
>> Sent: Wednesday, 6 April, 2011 19:04:25
>> Subject: Re: proposal to optimise the performance of the Jar utility
>> 
>> Hi Mike,
>> 
>> We are in the final stage of the JDK7 development; work like this is
>> unlikely to get in at the last minute. I have filed a CR/RFE to track
>> this issue; we can use this CR as the starting point for the discussion
>> and target some jdk7 update release or JDK8.
>> 
>> 7034403: proposal to optimise the performance of the             Jar utility
>> 
>> Thanks,
>> Sherman
>> 
>> 
>> On 04/05/2011 04:42 PM, mike.skells at talk21.com wrote:
>>> Hi,
>>> Not sure if this is too late for Java 7, but I have made some
>>> optimisations for a client to improve the performance of the jar
>>> utility in their environment, and would like to promote them into the
>>> main code base.
>>> 
>>> The optimisations that I have performed are
>>> 
>>> 1. Allowing the Jar utility to have other compression levels
>>> (currently it allows the default (5) only)
>>> 2. Multi-threading and pipelining the file information and access
>>> 3. Multi-threading the compression and file writing
>>> 
>>> A little background
>>> As part of the development process where I work, developers regularly
>>> jar the content of the working projects as part of the distribution to
>>> remote systems. This is a large and complex code base of 6 million LOC
>>> and growing. The Jar file ends up compressed to approx 100Mb;
>>> uncompressed, the jar size is approx 245Mb, about 4-5 times the size
>>> of rt.jar.
>>> 
>>> I was looking at ways to improve the performance, as this activity
>>> occurs several times a day for dozens of developers.
>>> 
>>> In essence, when compressing a new jar file the jar utility is single
>>> threaded and staged. Forgive me if this is an oversimplification:
>>> 
>>> first it works out all of the files that are specified, buffering the
>>> file names (IO bound);
>>> then it iterates through the files and, for each file, loads the file
>>> information and then the file content, sending it to a JarOutputStream
>>> (CPU bound or IO bound depending on the IO speed).
>>> 
>>> The JarOutputStream has a compression level of 0 (just store) or 5
>>> (the default), and the jar writing is single threaded by the design of
>>> the JarOutputStream.
>>> 
>>> The process of creating a Jar took about 20 seconds on windows with
>>> the help of an SSD, and considerably longer without one, and was CPU
>>> bound to one CPU core.
>>> 
>>> ----
>>> The changes that I made were
>>> 1. Allow different compression levels (for us a compression level of 1
>>> increases the file size of the Jar to 110 Mb but reduces the CPU load
>>> in compression to approx 30% of what it was (rough estimate))
>>> 2. Pipelining the file access, as sketched after this list:
>>> 2.1 one thread is started for each file root (-C on the Jar command
>>> line), which scans for files and places the file information into a
>>> blocking queue (Q1), which I set to an arbitrary size of 200 items
>>> 2.2 one thread pool of 10 threads reads the file information from the
>>> queue (Q1), buffers the file content up to a specified size (again I
>>> specified an arbitrary limit of 25K per file), and places the buffered
>>> content into a queue (Q2) (again an arbitrary size of 10 items)
>>> 2.3 one thread takes the file content from Q2, compresses or checksums
>>> it, and adds it to the JarOutputStream. This process is single
>>> threaded due to the design of the JarOutputStream.
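>>> 
>>> In outline the wiring is roughly this (a sketch only: one root, no
>>> error handling, and the actual entry writing is elided):
>>> 
>>>     import java.io.*;
>>>     import java.util.concurrent.*;
>>> 
>>>     class PipelineSketch {
>>>         static class Buffered { File file; byte[] content; }
>>> 
>>>         static final File SCAN_EOF = new File("");       // end-of-scan marker
>>>         static final Buffered READ_EOF = new Buffered(); // end-of-read marker
>>> 
>>>         public static void main(String[] args) throws Exception {
>>>             final BlockingQueue<File> q1 = new ArrayBlockingQueue<File>(200);
>>>             final BlockingQueue<Buffered> q2 = new ArrayBlockingQueue<Buffered>(10);
>>>             final File root = new File(args[0]);
>>> 
>>>             // 2.1 one scanner thread per -C root (just one root here)
>>>             new Thread(new Runnable() { public void run() {
>>>                 try { scan(root, q1); q1.put(SCAN_EOF); }
>>>                 catch (InterruptedException e) { }
>>>             }}).start();
>>> 
>>>             // 2.2 a pool of 10 readers: q1 -> buffered content -> q2
>>>             ExecutorService readers = Executors.newFixedThreadPool(10);
>>>             final CountDownLatch done = new CountDownLatch(10);
>>>             for (int i = 0; i < 10; i++) {
>>>                 readers.submit(new Runnable() { public void run() {
>>>                     try {
>>>                         for (File f = q1.take(); f != SCAN_EOF; f = q1.take()) {
>>>                             Buffered b = new Buffered();
>>>                             b.file = f;
>>>                             b.content = readUpTo(f, 25 * 1024);
>>>                             q2.put(b);
>>>                         }
>>>                         q1.put(SCAN_EOF);  // hand the marker to the next reader
>>>                     } catch (Exception e) {
>>>                     } finally { done.countDown(); }
>>>                 }});
>>>             }
>>>             readers.shutdown();
>>>             new Thread(new Runnable() { public void run() {
>>>                 try { done.await(); q2.put(READ_EOF); }
>>>                 catch (InterruptedException e) { }
>>>             }}).start();
>>> 
>>>             // 2.3 the single writer: q2 -> deflate/checksum -> jar
>>>             for (Buffered b = q2.take(); b != READ_EOF; b = q2.take()) {
>>>                 // write b as one entry to the JarOutputStream here
>>>             }
>>>         }
>>> 
>>>         static void scan(File f, BlockingQueue<File> q) throws InterruptedException {
>>>             File[] kids = f.listFiles();
>>>             if (kids == null) { q.put(f); return; }  // an ordinary file
>>>             for (File k : kids) scan(k, q);
>>>         }
>>> 
>>>         // files over 'max' would really be streamed by the writer; elided
>>>         static byte[] readUpTo(File f, int max) throws IOException {
>>>             byte[] b = new byte[(int) Math.min(f.length(), max)];
>>>             DataInputStream in = new DataInputStream(new FileInputStream(f));
>>>             try { in.readFully(b); } finally { in.close(); }
>>>             return b;
>>>         }
>>>     }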
>>> 
>>> Some other minor performance gain came from increasing the buffer on
>>> the output stream to reduce the IO load.
>>> 
>>> The end result is that the process takes approx 5 seconds in the same
>>> configuration.
>>> 
>>> The above has been in use in the production configuration for a few
>>> months now.
>>> 
>>> As a home project I have completed some enhancements to the
>>> JarOutputStream and produced a JarWriter that allows multiple threads
>>> to work concurrently deflating or calculating checksums. It seems to
>>> test OK for the test cases that I have generated, and successfully
>>> loads my quad core home dev machine on all cores.
>>> Each thread allocates a buffer, and the thread compresses a file into
>>> the buffer, only blocking other threads when the buffer is written to
>>> the output (which is after the compression is complete, unless the
>>> file is too large to compress in memory).
>>> 
>>> This JarWriter is not API compatible with the JarOutputStream; it is
>>> not a stream. It allows the programmer to write a record based on the
>>> file information and an input stream, and is threadsafe. It is not a
>>> drop-in replacement for JarOutputStream.
>>> I am not an expert in the Zip file format, but much of the code from
>>> ZipOutputStream is unchanged, just restructured.
>>> ---
>>> I did think that there is some scope for improvement that I have not
>>> looked at:
>>> a. thresholding of file size for compression (very small files don't
>>> compress well; see the sketch after this list)
>>> b. some file types don't compress well (e.g. png, jpeg) as they have
>>> been compressed already
>>> c. using NIO to parallelise the loading of the file information or
>>> content
>>> d. some pre-charging of the deflater dictionary (e.g. a class file
>>> contains the strings of the class name and packages), but this would
>>> make the format incompatible with zip, would require changes to the
>>> JVM to be useful, and is a long way from my comfort zone, or skill
>>> set. This would reduce the file size.
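>>> 
>>> For (a) and (b) the check could be as simple as this (a sketch; the
>>> threshold and extension list are illustrative, and it assumes
>>> java.io.File and java.util.Locale are imported):
>>> 
>>>     // files failing this test would be STORED rather than DEFLATED
>>>     static boolean worthDeflating(File f) {
>>>         if (f.length() < 64) return false;  // (a) tiny: zip overhead outweighs any saving
>>>         String n = f.getName().toLowerCase(Locale.ENGLISH);
>>>         return !(n.endsWith(".png") || n.endsWith(".jpg")
>>>               || n.endsWith(".jpeg") || n.endsWith(".gif")
>>>               || n.endsWith(".zip") || n.endsWith(".jar"));  // (b) already compressed
>>>     }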
>>> 
>>> --
>>> What is the view of the readers? Is this something, or at least some
>>> parts of this, that could be incorporated into Java 7, or is this too
>>> late in the dev cycle?
>>> 
>>> regards
>>> 
>>> Mike
>
>