RFR (S/M) JDK-8010196 NPG: Internal Error: Metaspace allocation lock -- possible deadlock
Mikael Gerdin
mikael.gerdin at oracle.com
Tue Apr 9 08:54:19 UTC 2013
On 2013-04-09 06:13, David Holmes wrote:
> On 9/04/2013 1:58 PM, Coleen Phillimore wrote:
>>
>> Minor correction below.
>>
>> On 4/8/2013 11:20 PM, Coleen Phillimore wrote:
>>> Mikael,
>>> This looks good including the refactoring.
>>>
>>> On 4/8/2013 8:17 PM, David Holmes wrote:
>>>> Hi Mikael,
>>>>
>>>> This approach sounds (and looks) reasonable to me, but I'll leav eit
>>>> to others to officially Review.
>>>>
>>>> One concern I did have was whether the initialization of the array
>>>> could fail and what that would mean for how this particular cld was
>>>> used. That led me to the following code which looks suspect to me:
>>>>
>>>> // ClassLoaderData into the java/lang/ClassLoader object as a hidden
>>>> field
>>>> ClassLoaderData* ClassLoaderDataGraph::add(ClassLoaderData**
>>>> cld_addr, Handle loader, TRAPS) {
>>>> ...
>>>> ClassLoaderData* cld = new ClassLoaderData(loader, is_anonymous);
>>>>
>>>> if (cld_addr != NULL) {
>>>> // First, Atomically set it
>>>> ClassLoaderData* old = (ClassLoaderData*)
>>>> Atomic::cmpxchg_ptr(cld, cld_addr, NULL);
>>>> if (old != NULL) {
>>>> delete cld;
>>>> // Returns the data.
>>>> return old;
>>>> }
>>>> }
>>>>
>>>> // We won the race, and therefore the task of adding the data to
>>>> the list of
>>>> // class loader data
>>>> do {
>>>> cld->set_next(next);
>>>> ClassLoaderData* exchanged =
>>>> (ClassLoaderData*)Atomic::cmpxchg_ptr(cld, list_head, next);
>>>> if (exchanged == next) {
>>>> ...
>>>> // Create dependencies after the CLD is added to the list.
>>>> Otherwise,
>>>> // the GC GC will not find the CLD and the _class_loader field
>>>> will
>>>> // not be updated.
>>>> cld->init_dependencies(CHECK_NULL);
>>>> return cld;
>>>> }
>>>> next = exchanged;
>>>> } while (true);
>>>> }
>>>>
>>>> So it explicitly states that we create dependencies after adding to
>>>> the list, but if creating the dependencies fails we return null, but
>>>> leave the new cld in the list. Is that a valid thing to do ??
>>>>
>>>
>>> Not really. If we hit OOM here there isn't a good recovery. We
>>> shouldn't return NULL in any case. I will file a bug. In the
>>> meantime, Mikael can you change it to CATCH_NULL and fix the comment
>>> above (remove "the GC" before "GC")?
>>
>> David is right, it can return null with the pending exception but should
>> be removed from the list and from the class_loader hidden field in this
>> case. This function is dangerous because it has unhandled oops (the
>> address passed in). For now, I think you can leave it Mikael. It'll
>> throw the OOM anyway.
>
> So getting back to the original fix my concern is whether after
> init_dependencies fails, can we ever call add_dependency for that cld
> and so attempt to lock on a NULL object?
If _dependencies is NULL when we call record_dependency I have a hard
time seeing how the current code would avoid crashing.
In fact, it looks pretty likely that we'll crash if we ever fail
init_dependencies.
If we survive all the way to record_dependency the current version of
the code looks like it will assert twice in debug builds and then SEGV
at the call to last->obj_at{_put} in locked_add_dependency.
/Mikael
>
> David
>
>> Coleen
>>
>>>
>>> Coleen
>>>
>>>
>>>
>>>
>>>> David
>>>> -----
>>>>
>>>> On 8/04/2013 11:49 PM, Mikael Gerdin wrote:
>>>>> Hi
>>>>>
>>>>> The problem is that when running the G1 garbage collector a call to
>>>>> objArrayOopDesc::obj_at_put can in rare cases cause the G1 dirty card
>>>>> queue buffer to fill up and this will cause G1 to take
>>>>> DirtyCardQ_CBL_mon/16 to return the full buffer and get a new one.
>>>>>
>>>>> Adding dependencies to a ClassLoaderData is currently protected by the
>>>>> per-metaspace "Metaspace allocation lock"/5 (which protects more than
>>>>> just allocations).
>>>>>
>>>>> Because I want to avoid messing around with the lock ordering I
>>>>> suggest
>>>>> that we use an ObjectLocker on the _dependencies oop in
>>>>> ClassLoaderData.
>>>>> _dependencies is a 2-element objArrayOop, in effect it's an ad-hoc
>>>>> linked list of ClassLoaders/Classes which must be kept alive by this
>>>>> CLD.
>>>>>
>>>>> Using an ObjectLocker on the head element of the linked list should be
>>>>> at least as good as using the metaspace_lock(). There should not be
>>>>> any
>>>>> new deadlock issues with VM mutexes since any application thread could
>>>>> use a similar synchronized linked list construct.
>>>>>
>>>>> As a bonus I suggest that we factor out the dependency handling from
>>>>> ClassLoaderData into a inner class: ClassLoaderData::Dependencies and
>>>>> let Dependencies manage the head of the linked list by itself. This
>>>>> should make it more clear that the dependency list is not guarded by
>>>>> the
>>>>> metaspace lock anymore.
>>>>>
>>>>> Buglinks:
>>>>> https://jbs.oracle.com/bugs/browse/JDK-8010196
>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8010196
>>>>>
>>>>> Testing:
>>>>> default jprt run
>>>>> jdk/test/java/lang/invoke/BigArityTest.java (which could reproduce the
>>>>> issue with -XX:+UseG1GC -XX:G1UpdateBufferSize=1)
>>>>>
>>>>> Webrevs:
>>>>> Incremental:
>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-fix/webrev
>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-factor-out/webrev
>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0-rename/webrev
>>>>> Everything:
>>>>> http://cr.openjdk.java.net/~mgerdin/8010196/webrev.0/webrev
>>>>>
>>>>> Thanks
>>>>> /Mikael
>>>
>>
More information about the hotspot-gc-dev
mailing list