Parallel ClassLoading space optimizations

Peter Levart peter.levart at gmail.com
Mon Feb 4 08:42:57 UTC 2013


Hi David,

I might have something usable, but I just wanted to verify some things 
beforehand. What I investigated was simply keeping a cache of lock
objects, weakly referenced, in a map. In your blog:

* 
https://blogs.oracle.com/dholmes/entry/parallel_classloading_revisited_fully_concurrent

...you describe this as a 3rd alternative:

    3. Reduce the lifetime of lock objects so that entries are removed
    from the map when no longer needed (eg remove after loading, use
    weak references to the lock objects and cleanup the map periodically).


...but later you preclude this option:

    Similarly we might reason that we can remove a mapping (and the
    lock object) because the class is already loaded, but this would
    again violate the specification because it can be reasoned that the
    following assertion should hold true:

       Object lock1 = loader.getClassLoadingLock(name);
       loader.loadClass(name);
       Object lock2 = loader.getClassLoadingLock(name);
       assert lock1 == lock2;

    Without modifying the specification, or at least doing some
    creative wordsmithing on it, options 1 and 3 are precluded.


When using WeakReferences to cache lock objects, the above assertion
would still hold true, wouldn't it? As long as lock1 is strongly
reachable from the caller's frame, the WeakReference in the cache cannot
be cleared, so the second lookup must return the same object.

I cannot think of any reasonable 3rd party ClassLoader code that would
behave differently with lock objects strongly referenced for the entire
VM lifetime vs. temporarily weakly referenced and eventually re-created
if needed. Only code that does one of the following could observe a
difference:

  * call .toString() or .hashCode() on a lock object and keep the result
    somewhere without also keeping the lock object itself, to compare later
  * wrap a lock object in a WeakReference and observe whether the
    reference gets cleared

Is that a reasonable assumption to justify continuing in this direction?
If the semantics are acceptable, then the only thing the solution still
has to prove is (space and time) performance, right?
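
To illustrate the 2nd bullet above: a hypothetical probe like the
following (LockProbe and ProbeLoader are made-up names, only there to
get at the protected getClassLoadingLock) is about the only kind of code
that could tell the two schemes apart:

     import java.lang.ref.WeakReference;

     public class LockProbe {

         // must be parallel capable, otherwise getClassLoadingLock
         // simply returns the loader itself and nothing is cached per-name
         static class ProbeLoader extends ClassLoader {
             static { registerAsParallelCapable(); }
             Object lockFor(String name) {
                 return getClassLoadingLock(name);
             }
         }

         public static void main(String[] args) {
             ProbeLoader loader = new ProbeLoader();
             WeakReference<Object> ref =
                 new WeakReference<>(loader.lockFor("some.NeverLoadedClass"));
             System.gc();
             // with the current strongly referenced parallelLockMap this
             // always prints false; with a weak cache it may print true
             // once the lock is no longer strongly reachable
             System.out.println(ref.get() == null);
         }
     }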

Here's a preliminary illustration of what can be achieved space-wise.
This is a test that attempts to load all the classes from rt.jar.
The situation we have now (using -Xms256m -Xmx256m and 32-bit addresses):

...At the beginning of main()

Total memory: 257294336 bytes
Free  memory: 251920320 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@3d4eac69: 7936 bytes
Deep size of sun.misc.Launcher$AppClassLoader@55f96302: 30848 bytes
Deep size of both: 38784 bytes (reference)

...Attempted to load: 18558 classes in: 1964.55825 ms

Total memory: 257294336 bytes
Free  memory: 227314112 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@3d4eac69: 1162184 bytes
Deep size of sun.misc.Launcher$AppClassLoader@55f96302: 2215216 bytes
Deep size of both: 3377400 bytes (difference to reference: 3338616 bytes)

...Performing gc()

...Loading class: test.TestClassLoader$Last (to trigger expunging)

Total memory: 260440064 bytes
Free  memory: 193163368 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@3d4eac69: 1162328 bytes
Deep size of sun.misc.Launcher$AppClassLoader@55f96302: 2215408 bytes
Deep size of both: 3377736 bytes (difference to reference: 3338952 bytes)


vs. having lock objects weakly referenced and doing expunging work at
each request for a lock:

...At the beginning of main()

Total memory: 257294336 bytes
Free  memory: 251920320 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@75b84c92: 9584 bytes
Deep size of sun.misc.Launcher$AppClassLoader@42a57993: 33960 bytes
Deep size of both: 43544 bytes (reference)
Lock stats...
       create: 108
   return old: 0
      replace: 0
      expunge: 0

...Attempted to load: 18558 classes in: 2005.14628 ms

Total memory: 257294336 bytes
Free  memory: 187198776 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@75b84c92: 572768 bytes
Deep size of sun.misc.Launcher$AppClassLoader@42a57993: 1122976 bytes
Deep size of both: 1695744 bytes (difference to reference: 1652200 bytes)
Lock stats...
       create: 37302
   return old: 201
      replace: 0
      expunge: 25893

...Performing gc()

...Loading class: test.TestClassLoader$Last (to trigger expunging)

Total memory: 257294336 bytes
Free  memory: 238693336 bytes
Deep size of sun.misc.Launcher$ExtClassLoader@75b84c92: 78944 bytes
Deep size of sun.misc.Launcher$AppClassLoader@42a57993: 168512 bytes
Deep size of both: 247456 bytes (difference to reference: 203912 bytes)
Lock stats...
       create: 2
   return old: 0
      replace: 0
      expunge: 11517


... as can be seen from this particular use case, there's approx. 20%
storage overhead for the locks because of the WeakReference indirection
(at the beginning of main(), before any expunging kicks in), and there
seems to be a negligible overhead of about 2% in total class-loading
time. After that we see that (since this is a single-threaded example)
re-use of a lock for a class that is already (being) loaded is rare (I
assume only explicit requests like Class.forName trigger that event in
this example). At the end, almost all locks are eventually released,
which frees 3+ MB of heap space.

Here's a piece of code for obtaining locks (coded as a subclass of 
ConcurrentHashMap for performance reasons):

     public Object getOrCreate(K key) {
         // the most common situation is that the key is new,
         // so optimize the fast-path accordingly
         Object lock = new Object();
         LockRef<K> ref = new LockRef<>(key, lock, refQueue);
         expungeStaleEntries();
         for (; ; ) {
             @SuppressWarnings("unchecked")
             LockRef<K> oldRef = (LockRef<K>) super.putIfAbsent(key, ref);
             if (oldRef == null) {
                 if (keepStats) createCount.increment();
                 return lock;
             }
             else {
                 Object oldLock = oldRef.get();
                 if (oldLock != null) {
                     if (keepStats) returnOldCount.increment();
                     return oldLock;
                 }
                 else if (super.replace(key, oldRef, ref)) {
                     if (keepStats) replaceCount.increment();
                     return lock;
                 }
             }
         }
     }

     // remove stale mappings whose lock objects have already been GC-ed
     private void expungeStaleEntries() {
         LockRef<K> ref;
         while ((ref = (LockRef<K>) refQueue.poll()) != null) {
             super.remove(ref.key, ref);
             if (keepStats) expungeCount.increment();
         }
     }
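
And for completeness, here's a minimal sketch of the declarations the
above snippet assumes (the class name, the stats property and the use of
LongAdder are just placeholders for this sketch; the value type is plain
Object, which is why getOrCreate() needs the unchecked cast):

     import java.lang.ref.ReferenceQueue;
     import java.lang.ref.WeakReference;
     import java.util.concurrent.ConcurrentHashMap;
     import java.util.concurrent.atomic.LongAdder;

     class ClassLoadingLockMap<K> extends ConcurrentHashMap<K, Object> {

         // WeakReference to the lock Object which remembers its key so
         // the stale mapping can be removed once the lock is collected
         static final class LockRef<K> extends WeakReference<Object> {
             final K key;
             LockRef(K key, Object lock, ReferenceQueue<Object> queue) {
                 super(lock, queue);
                 this.key = key;
             }
         }

         private final ReferenceQueue<Object> refQueue = new ReferenceQueue<>();

         // optional statistics (the property name is just for this sketch)
         private final boolean keepStats = Boolean.getBoolean("classLoadingLockMap.stats");
         private final LongAdder createCount = new LongAdder();
         private final LongAdder returnOldCount = new LongAdder();
         private final LongAdder replaceCount = new LongAdder();
         private final LongAdder expungeCount = new LongAdder();

         // ... getOrCreate(K) and expungeStaleEntries() as quoted above
     }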


Do you think this is something worth pursuing further?


Regards, Peter


On 02/01/2013 05:01 AM, David Holmes wrote:
> Hi Peter,
>
> On 31/01/2013 11:07 PM, Peter Levart wrote:
>> Hi David,
>>
>> Could the parallel classloading be at least space optimized somehow in
>> the JDK8 timeframe if there was a solution ready?
>
> If there is something that does not impact any of the existing 
> specified semantics regarding the classloader lock object then it may 
> be possible to work it into an 8 update if not 8 itself. But all the 
> suggestions I've seen for reducing the memory usage also alter the 
> semantics in someway.
>
> However, a key part of the concurrent classloader proposal was that it 
> didn't change the behaviour of any existing classloaders outside the 
> core JDK. Anything that changes existing behaviour has a much higher 
> compatibility bar to get over.
>
> David
> -----



