Miscellaneous improvements to "jar".

Martin Buchholz martinrb at google.com
Fri Jun 26 17:11:28 UTC 2009


On Fri, Jun 26, 2009 at 09:31, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:

>  Martin, thanks for taking the time.
>
> On 26.06.2009 15:53, Martin Buchholz wrote:
>
>
>
> On Fri, Jun 26, 2009 at 01:37, Ulf Zibis <Ulf.Zibis at gmx.de> wrote:
>
>> 1. Hopefully some volunteer would be found to fix
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6818737
>> before JDK7 API-freeze.
>> Especially if the jar is not compressed, as in a normal JDK
>> installation, reading entries from the jar should be much faster through
>> java.nio.channels than via BufferedInputStream.
>
>
> The way to motivate us around here
> is to provide the prototype implementation that
> demonstrates the speedup.
>
>
> Sorry, I'm not the specialist on how to provide NIO buffers from native
> memory, and first, I will finish my work on charsets.
>
> Motivation:
> Xueming states:
> *"dat" based uses less disk space, but it has larger startup time, reading
> an additional "big" dat file during class loading/initializing actually
> takes much longer time than I expected (I hit the extreme when I worked on
> the EUC_TW, which I make the size only 30% of the existing one, but startup
> is a disaster regression, ...
> *
>

I'm surprised.  I would expect startup to actually be faster.
I assume we're only reading the bytes that are necessary.
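
A minimal sketch of what "only reading the bytes that are necessary" could look like with java.nio.channels, assuming the data sits uncompressed in a plain file at a known offset. The class SliceReader and the method readSlice are hypothetical names for illustration; this is not how the JDK loads charset data, and it makes no claim about relative speed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SliceReader {
    // Reads exactly `length` bytes starting at `offset`, using positional
    // reads so that no other part of the file is requested.
    static byte[] readSlice(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            long pos = offset;
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);   // does not move the channel position
                if (n < 0) {
                    throw new EOFException("file ends before offset + length");
                }
                pos += n;
            }
            return buf.array();
        }
    }
}
```

A prototype that demonstrates a speedup would still have to time a reader like this against the existing stream-based path on real data such as charsets.jar.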


>
> If loading x bytes from a dat file via getResourceAsStream() takes much
> longer than loading x+30% bytes from a class file, processing the UTF-8
> conversion, and instantiating and initializing additional Class objects,
> then I strongly presume there is a good chance of significantly
> improving read speed from an uncompressed jar file (here charsets.jar) by
> using direct channels or some other means. I presume that speeding up
> reads from jar files would be a big performance win for the whole JDK, as
> it is a very common task in the JVM's daily work.
>
>
>
>
>>
>>
>>
>>> While benchmarking, I discovered to my horror that the simple
>>>
>>> jar cf /tmp/t1 ...
>>> jar i /tmp/t1
>>>
>>> fails, because it tries to create the replacement jar in "." and then
>>> rename() it into place.  Oops... Better refactor out the code that
>>> puts the replacement temp file in the same directory.
>>> Better write some tests for that, too.
>>>
>>
>>  2. I don't like refactoring out code that is used only once, merely
>> to better "comment" what the 2 lines are doing.
>> It blows up the code, and following it demands annoying scrolling.
>> Better to add a comment to the original code.
>>
>
> The original code created temp files in *two* places,
> and did it differently.
>
>
> Oops, at my first search on your code I only found *one* usage of
> createTempFileInSameDirectoryAs(). Did you add the 2nd later?
> But there is only one usage of directoryOf(). Shouldn't you inline this?
>

This is modern software engineering.
We are all encouraged to write many small methods.



>
>   I think the name
> createTempFileInSameDirectoryAs
> makes the current code much clearer.
>
>
> Yes, this is pretty clear, but the cost is 19 lines against 2+2, plus
> forcing the reader into annoying scrolling.
> Thinking about directoryOf(), I guess that following this strategy you would
> find tens of locations in Main.java where you could refactor code out into
> small, self-explaining methods, but wouldn't this end up as a mess of
> unreadable, blown-up code?
>

Find suitable abstractions and refactor them into a separate piece of code.

The win is a lot bigger if you make the new abstractions
public supported parts of the API,
but that is harder with the JDK.


>    Also, JITs tend to be very good at inlining.
>
>
> (... after some loops), yes, I know
>
>
>
>>
>> 3. What happens if the original file is named exactly "jartmp"?
>> I think you should instead add ".tmp" at the end of the filename, and remove
>> it later.
>> Does your new code work with the following?
>> jar cf /jartmp/t1 ...
>> jar i /jartmp/t1
>>
>
> File.createTempFile doesn't literally create a file named jartmp.
> That's only the prefix.  And it promises to return
> a freshly created empty file.
>
>
> Now I understand better. I just wondered why merely renaming "tmp" to
> "jartmp" would resolve your bug. I didn't notice the 2nd location, where
> the wrong "." was used for the dir name.
>

The renaming jar -> jartmp is not significant.
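
For illustration, the approach discussed above might be sketched as follows. The names createTempFileInSameDirectoryAs and directoryOf come from this thread, but the bodies shown here are assumptions, not the code in the patch, and replaceContents is a hypothetical helper. The rename step uses java.nio.file.Files.move, a newer API than File.renameTo.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

final class SameDirTempFile {
    // Directory containing the file; "." when the path has no parent part.
    static File directoryOf(File file) {
        File parent = file.getParentFile();
        return (parent != null) ? parent : new File(".");
    }

    // File.createTempFile treats "jartmp" as a prefix only and returns a
    // freshly created, empty file with a unique name in the given directory.
    static File createTempFileInSameDirectoryAs(File file) throws IOException {
        return File.createTempFile("jartmp", null, directoryOf(file));
    }

    // Hypothetical helper: write the new contents beside the target, then
    // move the temp file over it. Keeping both in one directory normally
    // keeps the move within a single file system.
    static void replaceContents(File target, byte[] contents) throws IOException {
        File tmp = createTempFileInSameDirectoryAs(target);
        try {
            Files.write(tmp.toPath(), contents);
            Files.move(tmp.toPath(), target.toPath(),
                       StandardCopyOption.REPLACE_EXISTING);
        } finally {
            tmp.delete(); // no-op after a successful move
        }
    }
}
```

With this shape, a target such as /tmp/t1 gets its temp file in /tmp rather than in ".", and a directory that happens to be named jartmp poses no problem, because the prefix is only part of the generated file name.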

Martin


>
> -Ulf
>
>


More information about the core-libs-dev mailing list