RFR: Faster ZipFile.getEntry()/entries()

Xueming Shen xueming.shen at oracle.com
Wed May 21 21:55:28 UTC 2014


On 05/21/2014 02:31 PM, Bernd Eckenfels wrote:
> Am Wed, 21 May 2014 14:19:13 -0700
> schrieb Xueming Shen<xueming.shen at oracle.com>:
>
>> And java implementation also brings in the benefits of better memory
>> usage (all memory allocated in java heap), no more expensive jni
>> invocations...
>>
>> Opinion/comments are appreciated.
> I had ZIP native code related crashes in the past. I suspected they all
> have been due to memory pressure (malloc returning null). But I

Most of the reported crashes are due to the scenario in which a zip/jar file is updated by
someone else while it is still in use, combined with mmap. For performance reasons the
implementation only does a reasonable sanity check up front, when it reads in the central
directory, and then operates on the "assumption" that the "data" is correct. So if the content
is changed later, the access may crash the VM instead of throwing an exception at some
point. Currently we have a system property, "sun.zip.disableMemoryMapping", to disable the
mmap usage as a workaround for such scenarios. The mmap was "useful" for sharing the jar
file content (among JVMs) back in the old days, when we actually mmap-ed the whole jar file,
but we no longer do that; only the central directory is mmap-ed now.
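
As an illustration of the workaround, here is a minimal sketch of how the property might be
passed and checked. The class name ZipMmapCheck, the jar name and the main class in the
command line are hypothetical. The helper assumes the property counts as "set" when its value
is empty or "true", and it only reports the setting; it does not change how ZipFile behaves.
On JDK builds where ZipFile does not use the native mmap path the property may have no effect.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the JDK or of the change under review.
// Assumed launch command (app.jar and Main are placeholders):
//   java -Dsun.zip.disableMemoryMapping=true -cp app.jar Main
final class ZipMmapCheck {

    // Assumption: an empty value or "true" means mmap usage is disabled.
    static boolean mmapDisabled() {
        String v = System.getProperty("sun.zip.disableMemoryMapping");
        return v != null && (v.isEmpty() || v.equalsIgnoreCase("true"));
    }

    // Writes a small zip file with the given entry names, for demonstration.
    static File createSample(String... names) {
        try {
            File f = File.createTempFile("mmapcheck", ".zip");
            f.deleteOnExit();
            try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(f))) {
                for (String name : names) {
                    zos.putNextEntry(new ZipEntry(name));
                    zos.write(name.getBytes(StandardCharsets.UTF_8));
                    zos.closeEntry();
                }
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens the file with java.util.zip.ZipFile and returns the entry names
    // in central directory order.
    static List<String> listEntries(File f) {
        try (ZipFile zf = new ZipFile(f)) {
            List<String> names = new ArrayList<>();
            for (ZipEntry e : Collections.list(zf.entries())) {
                names.add(e.getName());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File f = createSample("a.txt", "b.txt");
        System.out.println("mmap disabled: " + mmapDisabled());
        System.out.println("entries: " + listEntries(f));
    }
}
```

Note that the property does not make in-place updates of an open jar safe; it only avoids
the mmap-related crash described above.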


> expected that to be fixed meanwhile? I mean it's not impossible to have
> robust C code, or?
>
> Anyway, having said that - is there a performance comparison? What was
> the reason for that native part in the first place?
>

We will still have to keep the native implementation, even with a Java implementation. The
JVM needs a native version to access the "jar" files to start with (all classes are stored in
jar/zip format). So I believe the idea back then was to share that native implementation.

I have a small micro-benchmark test case at
http://cr.openjdk.java.net/~sherman/zipfile_jj/TestZipFile.java

to do some measurements. It appears the Java version is slightly slower at central
directory initialization, but much faster at iterating the entries.
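
For readers who want to try a comparison of this kind, the following is a rough sketch of
such a measurement. It is not the TestZipFile.java linked above; the class ZipTimingSketch,
its methods and the entry naming scheme are made up for illustration. It times opening a
ZipFile (which includes reading the central directory), iterating entries(), and looking up
each name with getEntry(). It is a single-shot timing with no warm-up, so the numbers are
only indicative.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical timing sketch, not the benchmark referenced in this mail.
final class ZipTimingSketch {

    // Entry naming scheme shared by createZip() and measure().
    static String nameOf(int i) {
        return "dir/entry" + i + ".txt";
    }

    // Creates a temporary zip file with n small entries (n must be > 0).
    static File createZip(int n) {
        try {
            File f = File.createTempFile("ziptiming", ".zip");
            f.deleteOnExit();
            try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(f))) {
                for (int i = 0; i < n; i++) {
                    zos.putNextEntry(new ZipEntry(nameOf(i)));
                    zos.write(("payload " + i).getBytes(StandardCharsets.UTF_8));
                    zos.closeEntry();
                }
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns {openNanos, iterateNanos, lookupNanos, entryCount, lookupHits}.
    static long[] measure(File f) {
        try {
            long t0 = System.nanoTime();
            ZipFile zf = new ZipFile(f);          // reads the central directory
            long t1 = System.nanoTime();
            try {
                long count = 0;
                Enumeration<? extends ZipEntry> en = zf.entries();
                while (en.hasMoreElements()) {    // iterate all entries
                    en.nextElement();
                    count++;
                }
                long t2 = System.nanoTime();
                long hits = 0;
                for (int i = 0; i < count; i++) { // look up each entry by name
                    if (zf.getEntry(nameOf(i)) != null) {
                        hits++;
                    }
                }
                long t3 = System.nanoTime();
                return new long[] { t1 - t0, t2 - t1, t3 - t2, count, hits };
            } finally {
                zf.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = measure(createZip(10000));
        System.out.println("open (ns):     " + r[0]);
        System.out.println("entries (ns):  " + r[1]);
        System.out.println("getEntry (ns): " + r[2]);
        System.out.println("entries seen:  " + r[3] + ", lookups found: " + r[4]);
    }
}
```

For numbers worth comparing, the measurement would need to be repeated after warm-up and
run against both implementations with the same zip file.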

-Sherman




More information about the core-libs-dev mailing list