RFR (S/M) JDK-8010196 NPG: Internal Error: Metaspace allocation lock -- possible deadlock

Coleen Phillimore coleen.phillimore at oracle.com
Tue Apr 9 07:48:09 PDT 2013


On 04/09/2013 04:54 AM, Mikael Gerdin wrote:
>
>
> On 2013-04-09 06:13, David Holmes wrote:
>> On 9/04/2013 1:58 PM, Coleen Phillimore wrote:
>>>
>>> Minor correction below.
>>>
>>> On 4/8/2013 11:20 PM, Coleen Phillimore wrote:
>>>> Mikael,
>>>> This looks good including the refactoring.
>>>>
>>>> On 4/8/2013 8:17 PM, David Holmes wrote:
>>>>> Hi Mikael,
>>>>>
>>>>> This approach sounds (and looks) reasonable to me, but I'll leav eit
>>>>> to others to officially Review.
>>>>>
>>>>> One concern I did have was whether the initialization of the array
>>>>> could fail and what that would mean for how this particular cld was
>>>>> used. That led me to the following code which looks suspect to me:
>>>>>
>>>>> // ClassLoaderData into the java/lang/ClassLoader object as a hidden
>>>>> field
>>>>> ClassLoaderData* ClassLoaderDataGraph::add(ClassLoaderData**
>>>>> cld_addr, Handle loader, TRAPS) {
>>>>>   ...
>>>>>   ClassLoaderData* cld = new ClassLoaderData(loader, is_anonymous);
>>>>>
>>>>>   if (cld_addr != NULL) {
>>>>>     // First, Atomically set it
>>>>>     ClassLoaderData* old = (ClassLoaderData*)
>>>>> Atomic::cmpxchg_ptr(cld, cld_addr, NULL);
>>>>>     if (old != NULL) {
>>>>>       delete cld;
>>>>>       // Returns the data.
>>>>>       return old;
>>>>>     }
>>>>>   }
>>>>>
>>>>>   // We won the race, and therefore the task of adding the data to
>>>>> the list of
>>>>>   // class loader data
>>>>>   do {
>>>>>     cld->set_next(next);
>>>>>     ClassLoaderData* exchanged =
>>>>> (ClassLoaderData*)Atomic::cmpxchg_ptr(cld, list_head, next);
>>>>>     if (exchanged == next) {
>>>>>       ...
>>>>>       // Create dependencies after the CLD is added to the list.
>>>>> Otherwise,
>>>>>       // the GC GC will not find the CLD and the _class_loader field
>>>>> will
>>>>>       // not be updated.
>>>>>       cld->init_dependencies(CHECK_NULL);
>>>>>       return cld;
>>>>>     }
>>>>>     next = exchanged;
>>>>>   } while (true);
>>>>> }
>>>>>
>>>>> So it explicitly states that we create dependencies after adding to
>>>>> the list, but if creating the dependencies fails we return null, but
>>>>> leave the new cld in the list. Is that a valid thing to do ??
>>>>>
>>>>
>>>> Not really.  If we hit OOM here there isn't a good recovery. We
>>>> shouldn't return NULL in any case.   I will file a bug.  In the
>>>> meantime, Mikael can you change it to CATCH_NULL and fix the comment
>>>> above (remove "the GC" before "GC")?
>>>
>>> David is right, it can return null with the pending exception but 
>>> should
>>> be removed from the list and from the class_loader hidden field in this
>>> case.   This function is dangerous because it has unhandled oops (the
>>> address passed in).  For now, I think you can leave it Mikael.  It'll
>>> throw the OOM anyway.
>>
>> So getting back to the original fix my concern is whether after
>> init_dependencies fails, can we ever call add_dependency for that cld
>> and so attempt to lock on a NULL object?
>
> If _dependencies is NULL when we call record_dependency I have a hard 
> time seeing how the current code would avoid crashing.
>
> In fact, it looks pretty likely that we'll crash if we ever fail 
> init_dependencies.

If we fail init_dependencies, we return a NULL CLD to the caller with 
the exception pending.  That'll do a CHECK_NULL() and the OOM will pop 
back out to some caller in Java.   But the CLD will still be attached to 
the class_loader() oop.

It should be removed from the CLD graph and unattached to the class 
loader oop.  Unfortunately the former needs more locking than this CAS 
that we have.

Do you want to fix this for your bug fix or file a new bug?

Coleen

>
>
> If we survive all the way to record_dependency the current version of 
> the code looks like it will assert twice in debug builds and then SEGV 
> at the call to last->obj_at{_put} in locked_add_dependency.
>
> /Mikael
>
>>
>> David
>>
>>> Coleen
>>>
>>>>
>>>> Coleen
>>>>
>>>>
>>>>
>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>> On 8/04/2013 11:49 PM, Mikael Gerdin wrote:
>>>>>> Hi
>>>>>>
>>>>>> The problem is that when running the G1 garbage collector a call to
>>>>>> objArrayOopDesc::obj_at_put can in rare cases cause the G1 dirty 
>>>>>> card
>>>>>> queue buffer to fill up and this will cause G1 to take
>>>>>> DirtyCardQ_CBL_mon/16 to return the full buffer and get a new one.
>>>>>>
>>>>>> Adding dependencies to a ClassLoaderData is currently protected 
>>>>>> by the
>>>>>> per-metaspace "Metaspace allocation lock"/5 (which protects more 
>>>>>> than
>>>>>> just allocations).
>>>>>>
>>>>>> Because I want to avoid messing around with the lock ordering I
>>>>>> suggest
>>>>>> that we use an ObjectLocker on the _dependencies oop in
>>>>>> ClassLoaderData.
>>>>>> _dependencies is a 2-element objArrayOop, in effect it's an ad-hoc
>>>>>> linked list of ClassLoaders/Classes which must be kept alive by this
>>>>>> CLD.
>>>>>>
>>>>>> Using an ObjectLocker on the head element of the linked list 
>>>>>> should be
>>>>>> at least as good as using the metaspace_lock(). There should not be
>>>>>> any
>>>>>> new deadlock issues with VM mutexes since any application thread 
>>>>>> could
>>>>>> use a similar synchronized linked list construct.
>>>>>>
>>>>>> As a bonus I suggest that we factor out the dependency handling from
>>>>>> ClassLoaderData into a inner class: ClassLoaderData::Dependencies 
>>>>>> and
>>>>>> let Dependencies manage the head of the linked list by itself. This
>>>>>> should make it more clear that the dependency list is not guarded by
>>>>>> the
>>>>>> metaspace lock anymore.
>>>>>>
>>>>>> Buglinks:
>>>>>> https://jbs.oracle.com/bugs/browse/JDK-8010196
>>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8010196
>>>>>>
>>>>>> Testing:
>>>>>> default jprt run
>>>>>> jdk/test/java/lang/invoke/BigArityTest.java (which could 
>>>>>> reproduce the
>>>>>> issue with -XX:+UseG1GC -XX:G1UpdateBufferSize=1)
>>>>>>
>>>>>> Webrevs:
>>>>>> Incremental:
>>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-fix/webrev
>>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-factor-out/webrev 
>>>>>>
>>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-rename/webrev
>>>>>> Everything:
>>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0/webrev
>>>>>>
>>>>>> Thanks
>>>>>> /Mikael
>>>>
>>>



More information about the hotspot-runtime-dev mailing list