Cache which Java classes are in a jar when opening the jar for the first time during classloading

Mon Aug 31 03:02:38 UTC 2015

Hi Adrian,

It's possible for jar files to be modified while the JVM is running. Is
there some facility for detecting that an archive was modified and
invalidating the cache accordingly?
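One way such a cache could notice a modified archive is to record the jar's length and last-modified time when its entry names are read, and re-read the central directory whenever either value changes. This is only a sketch under my own assumptions, not something HotSpot or the proposal is known to do. StampedJarIndex and JarFixtures are hypothetical names; File#length(), File#lastModified() and JarFile#entries() are standard APIs.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical cache of one jar's entry names, invalidated when the file's
// length or last-modified time no longer matches the values seen at build time.
final class StampedJarIndex {
    private final File jar;
    private long stampedLength = -1;
    private long stampedModified = -1;
    private Set<String> names = Collections.emptySet();
    int rebuilds; // how many times the central directory has been (re-)read

    StampedJarIndex(File jar) {
        this.jar = jar;
    }

    boolean contains(String entryName) {
        refreshIfStale();
        return names.contains(entryName);
    }

    private void refreshIfStale() {
        // Read the stamp before the entries, so a change made while we are
        // reading still produces a mismatch on the next call.
        long length = jar.length();
        long modified = jar.lastModified();
        if (length == stampedLength && modified == stampedModified) {
            return; // looks unchanged: reuse the cached names
        }
        Set<String> fresh = new HashSet<>();
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> e = jf.entries();
            while (e.hasMoreElements()) {
                fresh.add(e.nextElement().getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        names = fresh;
        stampedLength = length;
        stampedModified = modified;
        rebuilds++;
    }
}

// Hypothetical test helper: writes a jar containing empty entries with the given names.
final class JarFixtures {
    static File write(File target, String... entryNames) {
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(target))) {
            for (String name : entryNames) {
                out.putNextEntry(new JarEntry(name));
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return target;
    }

    static File temp(String... entryNames) {
        try {
            File f = File.createTempFile("fixture", ".jar");
            f.deleteOnExit();
            return write(f, entryNames);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Timestamps have limited resolution on some file systems, so a same-size rewrite within one timestamp tick could go unnoticed; checksumming the central directory would be more robust, at the cost of a read on every check.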

Also, I wonder how class data sharing might interact with this, though I'll
admit that I don't know much about HotSpot (I use the IBM JVM).

On Sun, Aug 30, 2015, 18:20 Adrian <withoutpointk at gmail.com> wrote:

> Hello,
>
> I have been looking through the JVM source related to class loading.
> URLClassLoader#findClass calls URLClassPath#getResource, and
> URLClassPath creates a "loader" for every entry on the classpath (e.g.
> one JarLoader per jar file).
>
> In getResource, it loops through all its loaders in order,
> instantiating them lazily.
> For example, it will only create a JarLoader and open a jar file
> somewhere "farther along" the classpath if it did not find the
> resource in any of the earlier jars.
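The lazy, in-order search described above can be modelled in a few lines. This is a simplified sketch, not the JDK code: FakeLoader and LinearClassPath are hypothetical names, each "jar" is just a set of entry names, and the probes counter exists only to make the per-lookup cost visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for one classpath entry (e.g. a JarLoader):
// only the set of entry names the jar contains.
final class FakeLoader {
    final String jarName;
    final Set<String> entries;

    FakeLoader(String jarName, Set<String> entries) {
        this.jarName = jarName;
        this.entries = entries;
    }
}

// Simplified model of the in-order, lazily-opening search in
// URLClassPath#getResource (an illustration, not the JDK implementation).
final class LinearClassPath {
    private final List<Supplier<FakeLoader>> classpath; // loader factories, in classpath order
    private final List<FakeLoader> opened = new ArrayList<>(); // always a prefix of classpath
    int probes; // number of per-jar "do you contain X?" checks performed so far

    LinearClassPath(List<Supplier<FakeLoader>> classpath) {
        this.classpath = classpath;
    }

    // Returns the name of the first jar that contains the resource, or null.
    String getResource(String name) {
        for (int i = 0; i < classpath.size(); i++) {
            if (i == opened.size()) {
                opened.add(classpath.get(i).get()); // open the next jar only when needed
            }
            FakeLoader loader = opened.get(i);
            probes++;
            if (loader.entries.contains(name)) {
                return loader.jarName;
            }
        }
        return null;
    }

    int openedCount() {
        return opened.size();
    }
}
```

With four jars, looking up a class that lives in the third one opens and probes three jars, while a name that is in no jar probes every entry on the classpath, and repeats that work on every later lookup of the same name.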
>
> As a result, URLClassLoader#findClass and URLClassPath#getResource do
> a linear search over the classpath entries every time they need to
> load a resource.
>
> For a jar file that has an index in META-INF (INDEX.LIST), the
> corresponding loader can at least tell right away whether the jar
> contains a class.
> If not, it searches an internal lookup structure built from the zip
> file's central directory (see
> jdk/src/share/native/java/util/zip/zip_util.c, ZIP_GetEntry; if you
> follow the call hierarchy from URLClassPath$JarLoader#getResource, you
> end up at this function).
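The per-jar question each loader answers during this search can be asked from ordinary code with JarFile#getJarEntry, which looks the name up in the central directory read when the jar was opened (in the JDK sources the thread refers to, that call chain is the one ending in ZIP_GetEntry). This is a sketch: JarProbe and JarFixtures are hypothetical names, and unlike URLClassPath, which keeps its JarFile open, it reopens the jar on every call for simplicity.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

final class JarProbe {
    // "com.example.Foo" -> "com/example/Foo.class": the entry name a class has inside a jar.
    static String entryName(String binaryClassName) {
        return binaryClassName.replace('.', '/') + ".class";
    }

    // True if the jar's central directory lists the class file.
    // (Reopens the jar on each call; a real loader keeps it open.)
    static boolean containsClass(File jar, String binaryClassName) {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getJarEntry(entryName(binaryClassName)) != null;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}

// Hypothetical test helper: writes a temporary jar with empty entries of the given names.
final class JarFixtures {
    static File temp(String... entryNames) {
        try {
            File f = File.createTempFile("fixture", ".jar");
            f.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(f))) {
                for (String name : entryNames) {
                    out.putNextEntry(new JarEntry(name));
                    out.closeEntry();
                }
            }
            return f;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```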
>
> If the classpath is ordered well (the majority of the classes are in
> the first few jars), there is not much overhead.
> However, when classes are spread across multiple jars along the
> classpath, the JVM spends nontrivial time searching through all of
> them.
>
> One possible "solution" would be to create a map from every resource
> to the jar/jar loader that contains it whenever a jar file is opened.
> This can be done by iterating over JarFile#entries(), which just reads
> the central directory of the jar/zip file (which is read anyway to
> create some additional data structures when a jar/zip file is opened).
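The map-building step could look roughly like this, assuming it runs once per jar, in classpath order, at the moment the jar is opened. EntryIndexer and JarFixtures are hypothetical names. putIfAbsent keeps a resource that appears in several jars mapped to the earliest one, matching the first-match order of the linear search.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

final class EntryIndexer {
    // Walks the jar's central directory via JarFile#entries() and records the jar
    // as owner of every entry not already claimed by an earlier jar.
    // (Opens and closes the jar here; a real JarLoader would keep it open.)
    static void recordEntries(File jar, Map<String, File> owner) {
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> e = jf.entries();
            while (e.hasMoreElements()) {
                owner.putIfAbsent(e.nextElement().getName(), jar);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Convenience for the example: index several jars in classpath order. A real
    // loader would call recordEntries lazily, as each jar is first opened.
    static Map<String, File> indexInOrder(List<File> jarsInClasspathOrder) {
        Map<String, File> owner = new HashMap<>();
        for (File jar : jarsInClasspathOrder) {
            recordEntries(jar, owner);
        }
        return owner;
    }
}

// Hypothetical test helper: writes a temporary jar with empty entries of the given names.
final class JarFixtures {
    static File temp(String... entryNames) {
        try {
            File f = File.createTempFile("fixture", ".jar");
            f.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(f))) {
                for (String name : entryNames) {
                    out.putNextEntry(new JarEntry(name));
                    out.closeEntry();
                }
            }
            return f;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```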
>
> I implemented this to try it out, and for a Java program with ~1800
> classes it improved the find-class time (taken from
> sun.misc.PerfCounter.getFindClassTime()) from ~1.4 s to ~1 s.
>
> I tried to think of reasons why this was not done already; looking
> through the code, I believe the semantics of the loaders remain the
> same.
> There is a memory overhead for storing this map of resources -> jar
> files/loaders, but it improves the lookup complexity from O(number of
> jars on the classpath) to O(1) (an expected-time hash lookup).
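One way to see why the loader semantics can stay the same is to keep the lazy opening and simply consult the map first. In this hypothetical sketch (FakeLoader and IndexedClassPath are illustrative names, each "jar" is a set of entry names), a map hit is always the earliest jar containing the name, because the opened jars form a prefix of the classpath and were indexed in order. On a miss it keeps opening and indexing jars one at a time, exactly where the linear search would have looked next.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for one classpath entry: only its entry names.
final class FakeLoader {
    final String jarName;
    final Set<String> entries;

    FakeLoader(String jarName, Set<String> entries) {
        this.jarName = jarName;
        this.entries = entries;
    }
}

// Map-first variant of the lazy search: resource name -> first opened jar containing it.
final class IndexedClassPath {
    private final List<Supplier<FakeLoader>> classpath; // loader factories, in classpath order
    private final Map<String, FakeLoader> owner = new HashMap<>();
    private int nextToOpen; // jars [0, nextToOpen) are opened and indexed

    IndexedClassPath(List<Supplier<FakeLoader>> classpath) {
        this.classpath = classpath;
    }

    // Returns the name of the first jar that contains the resource, or null.
    String getResource(String name) {
        FakeLoader hit = owner.get(name); // expected O(1) hash lookup
        while (hit == null && nextToOpen < classpath.size()) {
            FakeLoader loader = classpath.get(nextToOpen++).get(); // open next jar lazily
            for (String entry : loader.entries) {
                owner.putIfAbsent(entry, loader); // earlier jars keep precedence
            }
            hit = owner.get(name);
        }
        return hit == null ? null : hit.jarName;
    }

    int openedCount() {
        return nextToOpen;
    }
}
```

The per-jar cost moves to each jar's first open (one pass over its entries plus one map slot per entry, which is the memory overhead mentioned above), and once every jar has been opened, even a lookup for a missing name is a single map probe.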
>
> I would appreciate any feedback/insight on whether this would be a
> good change, or on why it is the way it currently is.
> Thank you!
>
> Best regards,
> Adrian
>