RFR (S/M) JDK-8010196 NPG: Internal Error: Metaspace allocation lock -- possible deadlock

Coleen Phillimore coleen.phillimore at oracle.com
Tue Apr 9 03:58:11 UTC 2013


Minor correction below.

On 4/8/2013 11:20 PM, Coleen Phillimore wrote:
> Mikael,
> This looks good including the refactoring.
>
> On 4/8/2013 8:17 PM, David Holmes wrote:
>> Hi Mikael,
>>
>> This approach sounds (and looks) reasonable to me, but I'll leav eit 
>> to others to officially Review.
>>
>> One concern I did have was whether the initialization of the array 
>> could fail and what that would mean for how this particular cld was 
>> used. That led me to the following code which looks suspect to me:
>>
>> // ClassLoaderData into the java/lang/ClassLoader object as a hidden 
>> field
>> ClassLoaderData* ClassLoaderDataGraph::add(ClassLoaderData** 
>> cld_addr, Handle loader, TRAPS) {
>>   ...
>>   ClassLoaderData* cld = new ClassLoaderData(loader, is_anonymous);
>>
>>   if (cld_addr != NULL) {
>>     // First, Atomically set it
>>     ClassLoaderData* old = (ClassLoaderData*) 
>> Atomic::cmpxchg_ptr(cld, cld_addr, NULL);
>>     if (old != NULL) {
>>       delete cld;
>>       // Returns the data.
>>       return old;
>>     }
>>   }
>>
>>   // We won the race, and therefore the task of adding the data to 
>> the list of
>>   // class loader data
>>   do {
>>     cld->set_next(next);
>>     ClassLoaderData* exchanged = 
>> (ClassLoaderData*)Atomic::cmpxchg_ptr(cld, list_head, next);
>>     if (exchanged == next) {
>>       ...
>>       // Create dependencies after the CLD is added to the list. 
>> Otherwise,
>>       // the GC GC will not find the CLD and the _class_loader field 
>> will
>>       // not be updated.
>>       cld->init_dependencies(CHECK_NULL);
>>       return cld;
>>     }
>>     next = exchanged;
>>   } while (true);
>> }
>>
>> So it explicitly states that we create dependencies after adding to 
>> the list, but if creating the dependencies fails we return null, but 
>> leave the new cld in the list. Is that a valid thing to do ??
>>
>
> Not really.  If we hit OOM here there isn't a good recovery. We 
> shouldn't return NULL in any case.   I will file a bug.  In the 
> meantime, Mikael can you change it to CATCH_NULL and fix the comment 
> above (remove "the GC" before "GC")?

David is right, it can return null with the pending exception but should 
be removed from the list and from the class_loader hidden field in this 
case.   This function is dangerous because it has unhandled oops (the 
address passed in).  For now, I think you can leave it Mikael.  It'll 
throw the OOM anyway.

Coleen

>
> Coleen
>
>
>
>
>> David
>> -----
>>
>> On 8/04/2013 11:49 PM, Mikael Gerdin wrote:
>>> Hi
>>>
>>> The problem is that when running the G1 garbage collector a call to
>>> objArrayOopDesc::obj_at_put can in rare cases cause the G1 dirty card
>>> queue buffer to fill up and this will cause G1 to take
>>> DirtyCardQ_CBL_mon/16 to return the full buffer and get a new one.
>>>
>>> Adding dependencies to a ClassLoaderData is currently protected by the
>>> per-metaspace "Metaspace allocation lock"/5 (which protects more than
>>> just allocations).
>>>
>>> Because I want to avoid messing around with the lock ordering I suggest
>>> that we use an ObjectLocker on the _dependencies oop in 
>>> ClassLoaderData.
>>> _dependencies is a 2-element objArrayOop, in effect it's an ad-hoc
>>> linked list of ClassLoaders/Classes which must be kept alive by this 
>>> CLD.
>>>
>>> Using an ObjectLocker on the head element of the linked list should be
>>> at least as good as using the metaspace_lock(). There should not be any
>>> new deadlock issues with VM mutexes since any application thread could
>>> use a similar synchronized linked list construct.
>>>
>>> As a bonus I suggest that we factor out the dependency handling from
>>> ClassLoaderData into a inner class: ClassLoaderData::Dependencies and
>>> let Dependencies manage the head of the linked list by itself. This
>>> should make it more clear that the dependency list is not guarded by 
>>> the
>>> metaspace lock anymore.
>>>
>>> Buglinks:
>>> https://jbs.oracle.com/bugs/browse/JDK-8010196
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8010196
>>>
>>> Testing:
>>> default jprt run
>>> jdk/test/java/lang/invoke/BigArityTest.java (which could reproduce the
>>> issue with -XX:+UseG1GC -XX:G1UpdateBufferSize=1)
>>>
>>> Webrevs:
>>> Incremental:
>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-fix/webrev
>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-factor-out/webrev
>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-rename/webrev
>>> Everything:
>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0/webrev
>>>
>>> Thanks
>>> /Mikael
>




More information about the hotspot-gc-dev mailing list