RFR (S/M) JDK-8010196 NPG: Internal Error: Metaspace allocation lock -- possible deadlock

Coleen Phillimore coleen.phillimore at oracle.com
Tue Apr 9 17:47:08 UTC 2013


OK.
Coleen

On 04/09/2013 11:19 AM, Mikael Gerdin wrote:
>
>
> On 2013-04-09 16:48, Coleen Phillimore wrote:
>>
>> On 04/09/2013 04:54 AM, Mikael Gerdin wrote:
>>>
>>>
>>> On 2013-04-09 06:13, David Holmes wrote:
>>>> On 9/04/2013 1:58 PM, Coleen Phillimore wrote:
>>>>>
>>>>> Minor correction below.
>>>>>
>>>>> On 4/8/2013 11:20 PM, Coleen Phillimore wrote:
>>>>>> Mikael,
>>>>>> This looks good including the refactoring.
>>>>>>
>>>>>> On 4/8/2013 8:17 PM, David Holmes wrote:
>>>>>>> Hi Mikael,
>>>>>>>
>>>>>>> This approach sounds (and looks) reasonable to me, but I'll leav 
>>>>>>> eit
>>>>>>> to others to officially Review.
>>>>>>>
>>>>>>> One concern I did have was whether the initialization of the array
>>>>>>> could fail and what that would mean for how this particular cld was
>>>>>>> used. That led me to the following code which looks suspect to me:
>>>>>>>
>>>>>>> // ClassLoaderData into the java/lang/ClassLoader object as a 
>>>>>>> hidden
>>>>>>> field
>>>>>>> ClassLoaderData* ClassLoaderDataGraph::add(ClassLoaderData**
>>>>>>> cld_addr, Handle loader, TRAPS) {
>>>>>>>   ...
>>>>>>>   ClassLoaderData* cld = new ClassLoaderData(loader, is_anonymous);
>>>>>>>
>>>>>>>   if (cld_addr != NULL) {
>>>>>>>     // First, Atomically set it
>>>>>>>     ClassLoaderData* old = (ClassLoaderData*)
>>>>>>> Atomic::cmpxchg_ptr(cld, cld_addr, NULL);
>>>>>>>     if (old != NULL) {
>>>>>>>       delete cld;
>>>>>>>       // Returns the data.
>>>>>>>       return old;
>>>>>>>     }
>>>>>>>   }
>>>>>>>
>>>>>>>   // We won the race, and therefore the task of adding the data to
>>>>>>> the list of
>>>>>>>   // class loader data
>>>>>>>   do {
>>>>>>>     cld->set_next(next);
>>>>>>>     ClassLoaderData* exchanged =
>>>>>>> (ClassLoaderData*)Atomic::cmpxchg_ptr(cld, list_head, next);
>>>>>>>     if (exchanged == next) {
>>>>>>>       ...
>>>>>>>       // Create dependencies after the CLD is added to the list.
>>>>>>> Otherwise,
>>>>>>>       // the GC GC will not find the CLD and the _class_loader 
>>>>>>> field
>>>>>>> will
>>>>>>>       // not be updated.
>>>>>>>       cld->init_dependencies(CHECK_NULL);
>>>>>>>       return cld;
>>>>>>>     }
>>>>>>>     next = exchanged;
>>>>>>>   } while (true);
>>>>>>> }
>>>>>>>
>>>>>>> So it explicitly states that we create dependencies after adding to
>>>>>>> the list, but if creating the dependencies fails we return null, 
>>>>>>> but
>>>>>>> leave the new cld in the list. Is that a valid thing to do ??
>>>>>>>
>>>>>>
>>>>>> Not really.  If we hit OOM here there isn't a good recovery. We
>>>>>> shouldn't return NULL in any case.   I will file a bug. In the
>>>>>> meantime, Mikael can you change it to CATCH_NULL and fix the comment
>>>>>> above (remove "the GC" before "GC")?
>>>>>
>>>>> David is right, it can return null with the pending exception but
>>>>> should
>>>>> be removed from the list and from the class_loader hidden field in 
>>>>> this
>>>>> case.   This function is dangerous because it has unhandled oops (the
>>>>> address passed in).  For now, I think you can leave it Mikael.  It'll
>>>>> throw the OOM anyway.
>>>>
>>>> So getting back to the original fix my concern is whether after
>>>> init_dependencies fails, can we ever call add_dependency for that cld
>>>> and so attempt to lock on a NULL object?
>>>
>>> If _dependencies is NULL when we call record_dependency I have a hard
>>> time seeing how the current code would avoid crashing.
>>>
>>> In fact, it looks pretty likely that we'll crash if we ever fail
>>> init_dependencies.
>>
>> If we fail init_dependencies, we return a NULL CLD to the caller with
>> the exception pending.  That'll do a CHECK_NULL() and the OOM will pop
>> back out to some caller in Java.   But the CLD will still be attached to
>> the class_loader() oop.
>>
>> It should be removed from the CLD graph and unattached to the class
>> loader oop.  Unfortunately the former needs more locking than this CAS
>> that we have.
>>
>> Do you want to fix this for your bug fix or file a new bug?
>
> It would be good if we fix this with a new bug.
> I'm already doing both a small refactor and a bug fix in the same one :)
>
> /Mikael
>
>>
>> Coleen
>>
>>>
>>>
>>> If we survive all the way to record_dependency the current version of
>>> the code looks like it will assert twice in debug builds and then SEGV
>>> at the call to last->obj_at{_put} in locked_add_dependency.
>>>
>>> /Mikael
>>>
>>>>
>>>> David
>>>>
>>>>> Coleen
>>>>>
>>>>>>
>>>>>> Coleen
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>> On 8/04/2013 11:49 PM, Mikael Gerdin wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> The problem is that when running the G1 garbage collector a 
>>>>>>>> call to
>>>>>>>> objArrayOopDesc::obj_at_put can in rare cases cause the G1 dirty
>>>>>>>> card
>>>>>>>> queue buffer to fill up and this will cause G1 to take
>>>>>>>> DirtyCardQ_CBL_mon/16 to return the full buffer and get a new one.
>>>>>>>>
>>>>>>>> Adding dependencies to a ClassLoaderData is currently protected
>>>>>>>> by the
>>>>>>>> per-metaspace "Metaspace allocation lock"/5 (which protects more
>>>>>>>> than
>>>>>>>> just allocations).
>>>>>>>>
>>>>>>>> Because I want to avoid messing around with the lock ordering I
>>>>>>>> suggest
>>>>>>>> that we use an ObjectLocker on the _dependencies oop in
>>>>>>>> ClassLoaderData.
>>>>>>>> _dependencies is a 2-element objArrayOop, in effect it's an ad-hoc
>>>>>>>> linked list of ClassLoaders/Classes which must be kept alive by 
>>>>>>>> this
>>>>>>>> CLD.
>>>>>>>>
>>>>>>>> Using an ObjectLocker on the head element of the linked list
>>>>>>>> should be
>>>>>>>> at least as good as using the metaspace_lock(). There should 
>>>>>>>> not be
>>>>>>>> any
>>>>>>>> new deadlock issues with VM mutexes since any application thread
>>>>>>>> could
>>>>>>>> use a similar synchronized linked list construct.
>>>>>>>>
>>>>>>>> As a bonus I suggest that we factor out the dependency handling 
>>>>>>>> from
>>>>>>>> ClassLoaderData into a inner class: ClassLoaderData::Dependencies
>>>>>>>> and
>>>>>>>> let Dependencies manage the head of the linked list by itself. 
>>>>>>>> This
>>>>>>>> should make it more clear that the dependency list is not 
>>>>>>>> guarded by
>>>>>>>> the
>>>>>>>> metaspace lock anymore.
>>>>>>>>
>>>>>>>> Buglinks:
>>>>>>>> https://jbs.oracle.com/bugs/browse/JDK-8010196
>>>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8010196
>>>>>>>>
>>>>>>>> Testing:
>>>>>>>> default jprt run
>>>>>>>> jdk/test/java/lang/invoke/BigArityTest.java (which could
>>>>>>>> reproduce the
>>>>>>>> issue with -XX:+UseG1GC -XX:G1UpdateBufferSize=1)
>>>>>>>>
>>>>>>>> Webrevs:
>>>>>>>> Incremental:
>>>>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-fix/webrev
>>>>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-factor-out/webrev 
>>>>>>>>
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-rename/webrev
>>>>>>>> Everything:
>>>>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0/webrev
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> /Mikael
>>>>>>
>>>>>
>>




More information about the hotspot-gc-dev mailing list