URLClassPath does unnecessary linear search through every jar and jar entry to find resource

Sat Sep 19 08:04:12 UTC 2015

Hi Peter,

Thanks for your response! And sorry for my late reply
I was wondering why zip_util was using the hash of the name... missed
the line "idx = zip->table[hsh % zip->tablelen];" before the loop.
My bad, I should have looked closer into it and realized it was a hash
before claiming it was doing a linear search for every entry

I've not used jmh before, but believe I set it up properly and ran the
benchmark with/without your patch.
I copied over rt.jar (~62MB) and replaced idea.jar with the next
largest jar file I could find (~18MB); I'm assuming the specific
contents don't matter
I get similar results to yours:

Original
Benchmark                  (_zipTuple)  Mode  Cnt     Score     Error  Units
ZipFileBench.getEntry    rt.jar:rt.jar  avgt    8  1153.436 ± 261.162  ns/op
ZipFileBench.getEntry  rt.jar:idea.jar  avgt    8   515.975 ±  14.451  ns/op
ZipFileBench.getEntry  idea.jar:rt.jar  avgt    8   486.993 ±  25.501  ns/op

Patched
Benchmark                  (_zipTuple)  Mode  Cnt     Score    Error  Units
ZipFileBench.getEntry    rt.jar:rt.jar  avgt    8  1162.392 ± 18.514  ns/op
ZipFileBench.getEntry  rt.jar:idea.jar  avgt    8   360.032 ±  9.854  ns/op
ZipFileBench.getEntry  idea.jar:rt.jar  avgt    8   320.842 ± 11.421  ns/op

---
In my situation, I notice about 30-40ms improvements in classloading
time, measured using sun.misc.PerfCounter.getFindClassTime() (part of
the vanilla JVM, didn't have to add this)

Regarding the cache, since the second level is nolonger needed, I
removed the cache resource name -> jar entry
The performance is about the same as when I had the second map (which
was redundant as JarFile#getEntry is ~constant time anyways)
There's about 200-250ms improvement in classloading/find class time

However, I find with different workloads the performance varies,
although it seems even in the worst case when every single jar on the
classpath is opened (and processed and contents added to the map
resource -> which jar/jarloader), it's only ~5-8ms slower than the
control (vanilla JVM)
I haven't found a case where the cache adds notable overhead, but I'll
try to get back soon with more data

Best regards,
Adrian

On Tue, Sep 15, 2015 at 11:31 AM, Peter Levart <peter.levart at gmail.com> wrote:
> Hi Adrian,
>
> I looked briefly at your code and claims and have some comments inline...
>
>
> On 09/13/2015 11:08 AM, Adrian wrote:
>>
>> Hi,
>>
>> I posted about this earlier (few weeks ago), and got some responses
>> about concerns which I addressed in my last email, though I didn't
>> hear back about it.
>> My apologies if I shouldn't be sending this; I'm not sure what the
>> protocol is about this stuff
>>
>> Classloading on a standard Java application with jars on the classpath
>> currently does a linear search through every jar on the classpath, and
>> every entry in a jar, for every class loaded.
>> As URLClassPath searches for an entry/resource/class, it's possible to
>> cache each entry it encounters -> where to find it, so in the future
>> if a resource has already been seen we don't need to repeat the ~2d
>> search
>>
>> Original thread (august list):
>>
>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/035009.html
>> Last message (september list):
>>
>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-September/035016.html
>>
>> I got 3 responses: 2 concerning changes to jars at runtime
>> (invalidating cache), and 1 saying you're not supposed to modify jars
>> at runtime (can confirm via source code, and manual testing - it
>> crashes the JVM)
>>
>> In my last message I addressed
>> - jars being modified (which you're not supposed to do; the current
>> classloader does not handle this) or the classpath changing (only
>> possible if you make public fields/methods via reflection, and this
>> could easily be handled gracefully)
>> - some details of the finding resource process (e.g. the meta index,
>> when the cache for jar entries can't be used because of the semantics
>> of other loaders/types of entries on the classpath)
>> - a reference implementation of caching that I believe is simple and
>> compliant with existing functionality
>> - some basic numbers on performance
>>
>> ---
>> So in this email I wanted to explain the problem again, hopefully more
>> clearly now
>>
>> URLClassPath is used by URLClassLoader to find classes, though it
>> could be used for finding any resource on a classpath.
>> URLClassPath keeps an array of URLs, which are typically folders or
>> jars on the local filesystem.
>> They can be http or ftp or other files, but that's not
>> relevant/doesn't affect this problem
>>
>> To find a resource/class (URLClassPath#getResource), it:
>> 1. Loops through the URLs in order
>> 2. Creates Loader objects for each URL lazily (URLClassPath$FileLoader
>> or URLClassPath$JarLoader). So if the Loader for the first URL finds
>> all the resources, Loaders for the remaining entries on the classpath
>> are never created/looked at
>> 3. Calls Loader#getResource and returns the resource if found
>> (otherwise keep searching)
>>
>> URLClassPath$JarLoader creates its corresponding JarFile either in the
>> constructor or in getResource (depending on the meta index - the
>> details I explained in my last email I won't repeat)
>> When a JarFile is created, it opens the jar file on the file system,
>> reads the central directory of the jar/zip file, and creates an
>> internal linked list with all its entries
>
>
> It's not actually a linked list, but a hash table. See
> jdk/src/java.base/share/native/libzip/zip_util.[c,h] ...
>
> the jzfile.table is an array of jint mapping (entry-name-hash-code %
> table.length) -> index into jzfile.entries
> the jzfile.entries is an array of jzcell which represent heads of hash
> buckets that are linked with jzcell.next
>
> So look-ups into individual jar/zip files should be O(1). But each look-up
> does cross the Java/JNI boundary, so it has a fixed JNI overhead too. If
> there are lots of jar files in class path, you pay for the unsuccessful
> hash-table look-ups before finding the resource. This overhead is not that
> big, but increases with the number of jar files in class path. Each look-up
> into class path is therefore O(N) where N = # of jar files...
>
>
>>
>> JarFile objects are immutable; you can only open them for read/delete
>> (see constructor API
>>
>> http://docs.oracle.com/javase/7/docs/api/java/util/jar/JarFile.html#JarFile(java.io.File,%20boolean,%20int)
>> ), they do not detect if the file has been modified externally, and
>> you only "append" or "delete" entries by creating a new jar
>> (JarOutputStream)
>>
>> When URLClassPath$JarLoader#getResource calls JarFile#getEntry; in the
>> C code it searches through the linked list
>> (jdk/src/share/native/java/util/zip/zip_util.c, ZIP_GetEntry - jar
>> files are just zip files, and the java JarFile object just extends
>> ZipFile)
>>
>> Since the order in which jar files and jar entries are searched is
>> invariant, we can create a map of resource -> first jar which contains
>> it
>>
>> However, we don't want to introduce additional overhead.
>> When a JarFile is created, it already builds the internal linked list
>> - it's O(number of entries)
>> I propose that after the JarLoader creates the JarFile, it iterates
>> through its entries and adds them to the cache (if the map does not
>> already contain the resource, i.e. an earlier jar contains the
>> resource)
>> This adds a small overhead when instantiating loaders - but creating
>> the JarLoader/JarFile is still technically O(number of entries), and
>> now getResource is constant time, instead of requiring a linear search
>> through every jar and the linked list of entries for each jar
>> (O(number of jars * entries/jar))
>
>
> Have you measured your entry iteration and cache initialization overhead?
> When iterating over JarEntries, the code invokes 10 native methods for each
> returned JarEntry:
>
>         long jzentry = getNextEntry(jzfile, i++)
>         getEntryFlag(jzentry);
>         getEntryBytes(jzentry, JZENTRY_NAME)
>         getEntryTime(jzentry)
>         getEntryCrc(jzentry)
>         getEntrySize(jzentry)
>         getEntryCSize(jzentry)
>         getEntryMethod(jzentry)
>         getEntryBytes(jzentry, JZENTRY_EXTRA)
>         getEntryBytes(jzentry, JZENTRY_COMMENT)
>
> ...it uses CharsetDecoder 1 or 2 times to decode the name and optional
> comment of each JarEntry, it allocates several java objects...
>
> So I have a feeling that initializing your cache when JarFiles are
> constructed, would increase start-up time and not decrease it. It may pay-of
> on the long run though. For achieving short start-up time, JDK tries to be
> as lazy as possible. Your cache goes against that.
>
> ZipFile native code tries hard to keep the memory usage down for maintaining
> the look-up hash table. It only keeps hash codes in the table, with offsets
> into memory mapped ZIP central directory for each entry. Your proposed
> java-side cache is also a hash table, but it's memory footprint is much
> bigger.
>
> I have made a little experiment to see if JNI overhead for negative answers
> from ZipFile.getEntry(name) is really that big. I modified
> ZipFile.java/ZipFile.c to expose access to a look-up hash table that is
> maintained in native code to the java side via two direct ByteBuffers. I
> screen each ZipFile.getEntry(name) with a probe that gives a negative answer
> for majority of negative lookups without invoking JNI methods. Here are the
> changes I made for this experiment:
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/ZipFile.getEntry/webrev.01/
>
> I have tested the changed JDK with the following JMH benchmark:
>
> http://cr.openjdk.java.net/~plevart/jdk9-dev/ZipFile.getEntry/ZipFileBench.java
>
> This benchmark measures the ZipFile.getEntry(name) method on 2 jar files
> (rt.jar ~20K entries, idea.jar ~40K entries) looking up entries in 1st jar
> file with names of the entries taken from the 2nd jar file:
>
> Original:
>
> Benchmark                        (_zipTuple)  Mode  Samples Score    Error
> Units
> j.t.ZipFileBench.getEntry      rt.jar:rt.jar  avgt        8  721.536 ±
> 17.344  ns/op
> j.t.ZipFileBench.getEntry    rt.jar:idea.jar  avgt        8  501.451 ±
> 10.298  ns/op
> j.t.ZipFileBench.getEntry    idea.jar:rt.jar  avgt        8  423.513 ±
> 23.268  ns/op
>
> Patched:
>
> Benchmark                        (_zipTuple)  Mode  Samples Score     Error
> Units
> j.t.ZipFileBench.getEntry      rt.jar:rt.jar  avgt        8  743.281 ±
> 13.467  ns/op
> j.t.ZipFileBench.getEntry    rt.jar:idea.jar  avgt        8  392.569 ±
> 7.710  ns/op
> j.t.ZipFileBench.getEntry    idea.jar:rt.jar  avgt        8  333.249 ±
> 14.069  ns/op
>
>
> The rt.jar:rt.jar therefore gives the score for successful look-ups, while
> rt.jar:idea.jar and idea.jar:rt.jar give the score for unsuccessful
> look-ups.
>
> The screening of JNI call improves unsuccessful lookup by 20-25% while not
> hurting much the successful lookup. The successful look-up could be improved
> so that it wouldn't have any overhead by utilizing the result of the
> screening probe that must now be re-computed in native code once more (see
> TODO).
>
> So is this worth the trouble? I don't know.
>
> Adrian, does this patch improve your situation (don't forget to set the
> system property -Dsun.zip.useNativeTableProbe=true to enable the feature)?
> This patch should not hurt startup performance, as it does not maintain or
> initialize any additional data structures (besides 2 ByteBuffer objects per
> ZipFile that only expose native memory to Java code).
>
> Regards, Peter
>
>
>>
>> There are several caveats when the cache cannot be used with non-jar
>> URLs on the classpath, and the meta index, but I explain them in my
>> last email along with comments in the reference implementation
>>
>> ---
>> Regarding modified jars:
>> - moved/renamed: the file handle is still valid and it doesn't affect
>> the JVM/classloading
>> - deleted: the file handle is still valid and it doesn't affect the
>> JVM/classloading
>> - modified: the JVM crashes
>>
>> The first two may not be intuitive, but remember that file handles
>> point to files; not paths on the filesystem.
>> So even though a jar appears renamed in the shell, the java process
>> has opened a file, somewhere in the c implementation of file objects
>> it has the file handle, and when the JVM does the system call read on
>> the file handle say to read the class from the jar file, it all works
>> fine
>> For what it's worth, here's a stack overflow answer as "source":
>>
>> http://stackoverflow.com/questions/2028874/what-happens-to-an-open-file-handler-on-linux-if-the-pointed-file-gets-moved-de
>>
>> ---
>> There is a protected method URLClassLoader#addURL which appends a URL
>> to the classpath.
>> People could use reflection to make it public.
>> Because jars are opened lazily and the cache is also built lazily
>> whenever a jar is opened, it doesn't matter if paths are appended
>>
>> Regarding people making extensive use of reflection to modify the
>> order of entries on the classpath, I believe that's irrelevant as
>> that's clearly not the semantic of URLClassLoader/URLClassPath.
>> People who need custom classloading rules create custom classloaders;
>> that's the purpose of classloaders being extensible
>>
>> ---
>> Anyways, I hope this was discussion worthy.
>> I've looked much into this and believe I haven't missed anything, but
>> if someone knows why it hasn't/can't be done any insight would be
>> appreciated!
>> Alan from the last email thread said "There was a lot of experiments
>> and prototypes in the past along these lines" - are the results
>> public?
>> He also mentioned improving classloading in Java's upcoming module
>> system (originally planned for Java 7, currently delayed to Java 9),
>> but I believe the algorithmic complexity and performance of
>> URLClassLoader could be improved without complicated changes
>>
>> Please let me know what you think, and thanks for your time!
>>
>> Best regards,
>> Adrian
>
>