RFR (S/M) JDK-8010196 NPG: Internal Error: Metaspace allocation lock -- possible deadlock

Mikael Gerdin mikael.gerdin at oracle.com
Tue Apr 9 01:54:19 PDT 2013



On 2013-04-09 06:13, David Holmes wrote:
> On 9/04/2013 1:58 PM, Coleen Phillimore wrote:
>>
>> Minor correction below.
>>
>> On 4/8/2013 11:20 PM, Coleen Phillimore wrote:
>>> Mikael,
>>> This looks good including the refactoring.
>>>
>>> On 4/8/2013 8:17 PM, David Holmes wrote:
>>>> Hi Mikael,
>>>>
>>>> This approach sounds (and looks) reasonable to me, but I'll leav eit
>>>> to others to officially Review.
>>>>
>>>> One concern I did have was whether the initialization of the array
>>>> could fail and what that would mean for how this particular cld was
>>>> used. That led me to the following code which looks suspect to me:
>>>>
>>>> // ClassLoaderData into the java/lang/ClassLoader object as a hidden
>>>> field
>>>> ClassLoaderData* ClassLoaderDataGraph::add(ClassLoaderData**
>>>> cld_addr, Handle loader, TRAPS) {
>>>>   ...
>>>>   ClassLoaderData* cld = new ClassLoaderData(loader, is_anonymous);
>>>>
>>>>   if (cld_addr != NULL) {
>>>>     // First, Atomically set it
>>>>     ClassLoaderData* old = (ClassLoaderData*)
>>>> Atomic::cmpxchg_ptr(cld, cld_addr, NULL);
>>>>     if (old != NULL) {
>>>>       delete cld;
>>>>       // Returns the data.
>>>>       return old;
>>>>     }
>>>>   }
>>>>
>>>>   // We won the race, and therefore the task of adding the data to
>>>> the list of
>>>>   // class loader data
>>>>   do {
>>>>     cld->set_next(next);
>>>>     ClassLoaderData* exchanged =
>>>> (ClassLoaderData*)Atomic::cmpxchg_ptr(cld, list_head, next);
>>>>     if (exchanged == next) {
>>>>       ...
>>>>       // Create dependencies after the CLD is added to the list.
>>>> Otherwise,
>>>>       // the GC GC will not find the CLD and the _class_loader field
>>>> will
>>>>       // not be updated.
>>>>       cld->init_dependencies(CHECK_NULL);
>>>>       return cld;
>>>>     }
>>>>     next = exchanged;
>>>>   } while (true);
>>>> }
>>>>
>>>> So it explicitly states that we create dependencies after adding to
>>>> the list, but if creating the dependencies fails we return null, but
>>>> leave the new cld in the list. Is that a valid thing to do ??
>>>>
>>>
>>> Not really.  If we hit OOM here there isn't a good recovery. We
>>> shouldn't return NULL in any case.   I will file a bug.  In the
>>> meantime, Mikael can you change it to CATCH_NULL and fix the comment
>>> above (remove "the GC" before "GC")?
>>
>> David is right, it can return null with the pending exception but should
>> be removed from the list and from the class_loader hidden field in this
>> case.   This function is dangerous because it has unhandled oops (the
>> address passed in).  For now, I think you can leave it Mikael.  It'll
>> throw the OOM anyway.
>
> So getting back to the original fix my concern is whether after
> init_dependencies fails, can we ever call add_dependency for that cld
> and so attempt to lock on a NULL object?

If _dependencies is NULL when we call record_dependency I have a hard 
time seeing how the current code would avoid crashing.

In fact, it looks pretty likely that we'll crash if we ever fail 
init_dependencies.


If we survive all the way to record_dependency the current version of 
the code looks like it will assert twice in debug builds and then SEGV 
at the call to last->obj_at{_put} in locked_add_dependency.

/Mikael

>
> David
>
>> Coleen
>>
>>>
>>> Coleen
>>>
>>>
>>>
>>>
>>>> David
>>>> -----
>>>>
>>>> On 8/04/2013 11:49 PM, Mikael Gerdin wrote:
>>>>> Hi
>>>>>
>>>>> The problem is that when running the G1 garbage collector a call to
>>>>> objArrayOopDesc::obj_at_put can in rare cases cause the G1 dirty card
>>>>> queue buffer to fill up and this will cause G1 to take
>>>>> DirtyCardQ_CBL_mon/16 to return the full buffer and get a new one.
>>>>>
>>>>> Adding dependencies to a ClassLoaderData is currently protected by the
>>>>> per-metaspace "Metaspace allocation lock"/5 (which protects more than
>>>>> just allocations).
>>>>>
>>>>> Because I want to avoid messing around with the lock ordering I
>>>>> suggest
>>>>> that we use an ObjectLocker on the _dependencies oop in
>>>>> ClassLoaderData.
>>>>> _dependencies is a 2-element objArrayOop, in effect it's an ad-hoc
>>>>> linked list of ClassLoaders/Classes which must be kept alive by this
>>>>> CLD.
>>>>>
>>>>> Using an ObjectLocker on the head element of the linked list should be
>>>>> at least as good as using the metaspace_lock(). There should not be
>>>>> any
>>>>> new deadlock issues with VM mutexes since any application thread could
>>>>> use a similar synchronized linked list construct.
>>>>>
>>>>> As a bonus I suggest that we factor out the dependency handling from
>>>>> ClassLoaderData into a inner class: ClassLoaderData::Dependencies and
>>>>> let Dependencies manage the head of the linked list by itself. This
>>>>> should make it more clear that the dependency list is not guarded by
>>>>> the
>>>>> metaspace lock anymore.
>>>>>
>>>>> Buglinks:
>>>>> https://jbs.oracle.com/bugs/browse/JDK-8010196
>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8010196
>>>>>
>>>>> Testing:
>>>>> default jprt run
>>>>> jdk/test/java/lang/invoke/BigArityTest.java (which could reproduce the
>>>>> issue with -XX:+UseG1GC -XX:G1UpdateBufferSize=1)
>>>>>
>>>>> Webrevs:
>>>>> Incremental:
>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-fix/webrev
>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-factor-out/webrev
>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-rename/webrev
>>>>> Everything:
>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0/webrev
>>>>>
>>>>> Thanks
>>>>> /Mikael
>>>
>>


More information about the hotspot-runtime-dev mailing list