Parallel ClassLoading space optimizations

Mon Feb 4 11:50:53 UTC 2013

On 4/02/2013 9:11 PM, David Holmes wrote:
> On 4/02/2013 6:42 PM, Peter Levart wrote:
>> I might have something usable, but I just wanted to verify some things
>> beforehand. What I investigated was simply keeping a cache of locks in
>> a map, weakly referenced. In your blog:
>>
>> *
>> https://blogs.oracle.com/dholmes/entry/parallel_classloading_revisited_fully_concurrent
>>
>>
>> ...you describe this as a 3rd alternative:
>>
>> /3. Reduce the lifetime of lock objects so that entries are removed
>> from the map when no longer needed (eg remove after loading,*use
>> weak references* to the lock objects and cleanup the map periodically)./
>>
>>
>> ...but later you preclude this option:
>>
>> /Similarly we might reason that we can remove a mapping (and the
>> lock object) because the class is already loaded, but this would
>> again violate the specification because it can be reasoned that the
>> following assertion should hold true: /
>>
>> |
>> Object lock1 = loader.getClassLoadingLock(name);
>> loader.loadClass(name);
>> Object lock2 = loader.getClassLoadingLock(name);
>> assert lock1 == lock2;
>> |
>>
>> /Without modifying the specification, or at least doing some
>> creative wordsmithing on it, options 1 and 3 are precluded. /
>>
>>
>> When using WeakReferences to cache lock Objects, the above assertion
>> would still hold true, wouldn't it?
>
> No. The WeakReference can be cleared causing lock2 to be a different
> object.

As Peter has pointed out to me, this is of course not correct. The only
way you can write the assert is to have maintained strong references to
the lock objects, hence the WeakReference cannot be cleared.
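
To make that reasoning concrete, here is a minimal sketch, assuming a
hypothetical loader that holds its lock objects only through
WeakReferences. WeakLockDemo and sameLockWhileStronglyHeld are
illustrative names, not JDK code, and the loadClass call from the blog
assertion is replaced by a System.gc() hint to give the collector a
chance to run.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a class loader that references its locks weakly.
class WeakLockDemo {
    private final ConcurrentHashMap<String, WeakReference<Object>> locks =
            new ConcurrentHashMap<>();

    // Returns the live lock for 'name', creating one only if none is reachable.
    Object getClassLoadingLock(String name) {
        for (;;) {
            Object fresh = new Object();
            WeakReference<Object> ref = new WeakReference<>(fresh);
            WeakReference<Object> old = locks.putIfAbsent(name, ref);
            if (old == null) return fresh;            // first request for this name
            Object existing = old.get();
            if (existing != null) return existing;    // lock is still alive
            if (locks.replace(name, old, ref)) return fresh; // stale entry replaced
        }
    }

    // Mirrors the assertion from the blog: lock1 stays strongly reachable
    // because it is compared afterwards, so its WeakReference is not cleared.
    static boolean sameLockWhileStronglyHeld() {
        WeakLockDemo loader = new WeakLockDemo();
        Object lock1 = loader.getClassLoadingLock("com.example.Foo");
        System.gc(); // only a hint; lock1 is still strongly held here
        Object lock2 = loader.getClassLoadingLock("com.example.Foo");
        return lock1 == lock2;
    }

    public static void main(String[] args) {
        System.out.println(sameLockWhileStronglyHeld());
    }
}
```

Any code able to evaluate lock1 == lock2 must still hold lock1, and
that strong reference is exactly what keeps the weak entry alive.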

I need to re-consider this approach as my previous thoughts on it were 
flawed.

David
-----

>> I can not think of any reasonable 3rd party ClassLoader code that would
>> behave differently when having lock objects strongly referenced for the
>> entire VM lifetime vs. having them temporarily weakly referenced and
>> eventually recreated if needed. For example, only code that does the
>> following things can see a difference:
>>
>> * use .toString or .hashCode on lock object and keep it somewhere
>> without also keeping the lock object itself to use it later
>> * wrap a lock object into a WeakReference and observe reference being
>> cleared or not
>
> But that is the point - we don't know what any actual external
> classloader does with the classloading lock, so we can't just
> arbitrarily change the existing specification without being very sure
> about the implications for making that change.
>
> Such a change would have to have been proposed well before now so there
> was time to evaluate the impact.
>
> David
> -----
>
>> Is that a reasonable assumption to continue in this direction? If the
>> semantics are reasonably OK, then all the solution has to prove is
>> (space and time) performance, right?
>>
>> Here's a preliminary illustration of what can be achieved space-wise.
>> This is a test that attempts to load all the classes from the rt.jar.
>> The situation we have now (using -Xms256m -Xmx256m and 32bit addresses):
>>
>> ...At the beginning of main()
>>
>> Total memory: 257294336 bytes
>> Free memory: 251920320 bytes
>> Deep size of sun.misc.Launcher$ExtClassLoader at 3d4eac69: 7936 bytes
>> Deep size of sun.misc.Launcher$AppClassLoader at 55f96302: 30848 bytes
>> Deep size of both: 38784 bytes (reference)
>>
>> ...Attempted to load: 18558 classes in: 1964.55825 ms
>>
>> Total memory: 257294336 bytes
>> Free memory: 227314112 bytes
>> Deep size of sun.misc.Launcher$ExtClassLoader at 3d4eac69: 1162184 bytes
>> Deep size of sun.misc.Launcher$AppClassLoader at 55f96302: 2215216 bytes
>> Deep size of both: 3377400 bytes (difference to reference: 3338616 bytes)
>>
>> ...Performing gc()
>>
>> ...Loading class: test.TestClassLoader$Last (to trigger expunging)
>>
>> Total memory: 260440064 bytes
>> Free memory: 193163368 bytes
>> Deep size of sun.misc.Launcher$ExtClassLoader at 3d4eac69: 1162328 bytes
>> Deep size of sun.misc.Launcher$AppClassLoader at 55f96302: 2215408 bytes
>> Deep size of both: 3377736 bytes (difference to reference: 3338952 bytes)
>>
>>
>> vs. having lock objects weakly referenced and doing expunging work at
>> each request for a lock:
>>
>> ...At the beginning of main()
>>
>> Total memory: 257294336 bytes
>> Free memory: 251920320 bytes
>> Deep size of sun.misc.Launcher$ExtClassLoader at 75b84c92: 9584 bytes
>> Deep size of sun.misc.Launcher$AppClassLoader at 42a57993: 33960 bytes
>> Deep size of both: 43544 bytes (reference)
>> Lock stats...
>> create: 108
>> return old: 0
>> replace: 0
>> expunge: 0
>>
>> ...Attempted to load: 18558 classes in: 2005.14628 ms
>>
>> Total memory: 257294336 bytes
>> Free memory: 187198776 bytes
>> Deep size of sun.misc.Launcher$ExtClassLoader at 75b84c92: 572768 bytes
>> Deep size of sun.misc.Launcher$AppClassLoader at 42a57993: 1122976 bytes
>> Deep size of both: 1695744 bytes (difference to reference: 1652200 bytes)
>> Lock stats...
>> create: 37302
>> return old: 201
>> replace: 0
>> expunge: 25893
>>
>> ...Performing gc()
>>
>> ...Loading class: test.TestClassLoader$Last (to trigger expunging)
>>
>> Total memory: 257294336 bytes
>> Free memory: 238693336 bytes
>> Deep size of sun.misc.Launcher$ExtClassLoader at 75b84c92: 78944 bytes
>> Deep size of sun.misc.Launcher$AppClassLoader at 42a57993: 168512 bytes
>> Deep size of both: 247456 bytes (difference to reference: 203912 bytes)
>> Lock stats...
>> create: 2
>> return old: 0
>> replace: 0
>> expunge: 11517
>>
>>
>> ... as can be seen from this particular use case, there's approx. 20%
>> overhead of storage for locks because of WeakReference indirection (at
>> the beginning of main() before any expunging kicks in) and it seems
>> there's a negligible overhead of about 2% in performance when
>> considering total time of loading classes. After that we see that (since
>> this is a single-threaded example) re-use of lock for a class that is
>> already (being) loaded is rare (I assume only explicit requests like
>> Class.forName trigger that event in this example). At the end, almost
>> all locks are eventually released, which frees 3MB+ heap space.
>>
>> Here's a piece of code for obtaining locks (coded as a subclass of
>> ConcurrentHashMap for performance reasons):
>>
>>     public Object getOrCreate(K key) {
>>         // the most common situation is that the key is new, so
>>         // optimize fast-path accordingly
>>         Object lock = new Object();
>>         LockRef<K> ref = new LockRef<>(key, lock, refQueue);
>>         expungeStaleEntries();
>>         for (; ; ) {
>>             @SuppressWarnings("unchecked")
>>             LockRef<K> oldRef = (LockRef<K>) super.putIfAbsent(key, ref);
>>             if (oldRef == null) {
>>                 if (keepStats) createCount.increment();
>>                 return lock;
>>             }
>>             else {
>>                 Object oldLock = oldRef.get();
>>                 if (oldLock != null) {
>>                     if (keepStats) returnOldCount.increment();
>>                     return oldLock;
>>                 }
>>                 else if (super.replace(key, oldRef, ref)) {
>>                     if (keepStats) replaceCount.increment();
>>                     return lock;
>>                 }
>>             }
>>         }
>>     }
>>
>>     private void expungeStaleEntries() {
>>         LockRef<K> ref;
>>         while ((ref = (LockRef<K>) refQueue.poll()) != null) {
>>             super.remove(ref.key, ref);
>>             if (keepStats) expungeCount.increment();
>>         }
>>     }
>>
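
The quoted fragment depends on a LockRef class and a surrounding map
subclass that are not shown. A minimal self-contained sketch of the
same idea follows. It is an assumption-laden reconstruction, not
Peter's actual code: it uses composition instead of subclassing
ConcurrentHashMap, drops the statistics counters, and the names
WeakLockCache and size() are hypothetical.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache of lock objects that are only weakly referenced.
class WeakLockCache<K> {
    // Weak reference to a lock that remembers its key, so the map entry
    // can be removed once the lock has been garbage collected.
    private static final class LockRef<K> extends WeakReference<Object> {
        final K key;
        LockRef(K key, Object lock, ReferenceQueue<Object> queue) {
            super(lock, queue);
            this.key = key;
        }
    }

    private final ConcurrentHashMap<K, LockRef<K>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<Object> refQueue = new ReferenceQueue<>();

    // Returns the live lock for 'key', or installs and returns a new one.
    Object getOrCreate(K key) {
        expungeStaleEntries();
        Object lock = new Object();
        LockRef<K> ref = new LockRef<>(key, lock, refQueue);
        for (;;) {
            LockRef<K> oldRef = map.putIfAbsent(key, ref);
            if (oldRef == null) return lock;               // key was new
            Object oldLock = oldRef.get();
            if (oldLock != null) return oldLock;           // reuse the live lock
            if (map.replace(key, oldRef, ref)) return lock; // swap out cleared ref
            // lost a race with another thread: retry
        }
    }

    // Number of entries currently in the map, including not-yet-expunged ones.
    int size() {
        return map.size();
    }

    // Removes entries whose lock has been collected. The two-argument remove
    // leaves the entry alone if the key was already re-mapped to a newer ref.
    @SuppressWarnings("unchecked")
    private void expungeStaleEntries() {
        Reference<?> ref;
        while ((ref = refQueue.poll()) != null) {
            LockRef<K> stale = (LockRef<K>) ref;
            map.remove(stale.key, stale);
        }
    }
}
```

As long as a caller keeps the returned lock, repeated getOrCreate calls
for the same key return that same object. Entries only become eligible
for removal after the last strong reference to the lock is dropped.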
>>
>> Do you think this is something worth pursuing further?
>>
>>
>> Regards, Peter
>>
>>
>> On 02/01/2013 05:01 AM, David Holmes wrote:
>>> Hi Peter,
>>>
>>> On 31/01/2013 11:07 PM, Peter Levart wrote:
>>>> Hi David,
>>>>
>>>> Could the parallel classloading be at least space optimized somehow in
>>>> the JDK8 timeframe if there was a solution ready?
>>>
>>> If there is something that does not impact any of the existing
>>> specified semantics regarding the classloader lock object then it may
>>> be possible to work it into an 8 update if not 8 itself. But all the
>>> suggestions I've seen for reducing the memory usage also alter the
>>> semantics in some way.
>>>
>>> However, a key part of the concurrent classloader proposal was that it
>>> didn't change the behaviour of any existing classloaders outside the
>>> core JDK. Anything that changes existing behaviour has a much higher
>>> compatibility bar to get over.
>>>
>>> David
>>> -----
>>