RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking

Erik Österlund erik.osterlund at oracle.com
Wed Feb 15 15:22:44 UTC 2017


Hi Erik,

I looked at the memory ordering aspect and I think it looks good as I 
discussed with you, now that the unnecessary synchronization on the 
immutable next value has been removed.

Thanks,
/Erik

On 2017-02-15 16:07, Erik Helin wrote:
> On 02/15/2017 02:48 AM, David Holmes wrote:
>> Hi Erik,
>
> Hi David,
>
> thanks for having a look! Please see new patches at:
> - incremental: http://cr.openjdk.java.net/~ehelin/8168914/00-01/
> - full: http://cr.openjdk.java.net/~ehelin/8168914/01/
>
>> oops_do can miss an oop that is in the process of being added - is that
>> a problem?
>
> Nope, that is not an issue. Even if `add_handle` and `oops_do` where 
> synchronized with each other (for example by using lock), you can 
> still have the scenario where a GC thread first runs (and finishes) 
> `oops_do` and then `add_handle` is called (a linearized execution).
>
> This scenario is something the concurrent marking algorithms already 
> should be prepared to handle.
>
>>  140       if (c->data[i] != NULL) {
>>  141         f->do_oop(&c->data[i]);
>>  142       }
>>
>> Given a slot can be nulled out concurrently at any time, is it worth
>> does this NULL check? The OopClosure will have to do its own NULL check
>> anyway.
>
> Sure, this was mostly to be a bit consistent with JNIHandleBlock (it 
> does the same NULL check). Think of it as an optimization, there is no 
> reason to call `f->do_oop` if `c->data[i] == NULL`. Do you think it 
> reads better if the NULL check is removed? I don't have a strong 
> opinion on this one.
>
>>  144     c = (Chunk*) OrderAccess::load_ptr_acquire((volatile
>> intptr_t*)&c->next);
>>
>> This doesn't need to be a load-acquire. You already loaded 'c' via
>> load-acquire of _head (or chained therefrom) and its next field is set
>> prior to the setting of the _head that you read.
>
> Agree. I just discussed this with Erik Ö as well, and we all agree on 
> this. I also removed the `volatile` specifier for `next`.
>
>>  624 jobject ClassLoaderData::add_handle(Handle h) {
>>  625   MutexLockerEx ml(metaspace_lock(), 
>> Mutex::_no_safepoint_check_flag);
>>  626   return (jobject) _handles.add(h());
>>  627 }
>>  628
>>  629 void ClassLoaderData::remove_handle_unsafe(jobject h) {
>>  630   assert(_handles.contains((oop*) h), "Got unexpected handle "
>> PTR_FORMAT, p2i((oop*) h));
>>  631   *((oop*) h) = NULL;
>>  632 }
>>
>> I'm a bit unclear on the typing here. Previously we use a JNI Handle
>> which was a jobject. But now we've simply stored an oop into a slot in
>> an array. We pass the address of that slot back as a jobject, but is it
>> really?
>
> So, this is a bit confusing, and I discussed this with Coleen a few 
> days ago. jobject was used originally used because that is what 
> `JNIHandleBlock::allocate_handle` returns. Using something other than 
> jobject means updating `ModuleEntry` and in particular users 
> `ModuleEntry::_pd`. This is not something we are keen on doing right 
> now for JDK 9, and Coleen is planning to clean up how the runtime 
> handles oops for 10 (or at least improve the situation).
>
> I don't know what jobject is meant to represent, in the code it is 
> just an empty class (an opaque type).
>
> ClassLoaderData::add_handle could just as well return an oop*, but 
> AFAIU we try to avoid oops and oop* in the runtime code (because that 
> might mean that some code forgot to tell the GC about the oop). Maybe 
> I'm wrong here?
>
> > I would also expect contains to take an oop not an oop* - it
> > seems to be asking "is this an address of a slot in our
> > ChunkedHandleList" rather than asking "is this oop in our
> > ChunkedHandleList". ??
>
> I actually wanted the assert to check whether the passed jobject 
> (which really is an oop*) actually originated from a call to 
> ClassLoaderData::add_handle. I don't want any code to pass a jobject 
> to ClassLoaderData::remove_handle_unsafe that doesn't come from a call 
> to ClassLoaderData::add_handle.
>
>> A few minor comments:
>>
>> Copyrights need updating.
>
> Aaah, it is that time of the year again. Fixed.
>
>> src/share/vm/classfile/classLoaderData.hpp
>>
>> 177     // Only on thread ...
>>
>> Typo: on -> one
>
> Thanks, fixed.
>
>> src/share/vm/classfile/moduleEntry.cpp
>>
>> 90   // Create a JNI handle for the shared ProtectionDomain and save it
>> atomically.
>>  91   // If someone beats us setting the _pd cache, the created JNI
>> handle is destroyed.
>>
>> These are no longer JNI Handles.
>
> Thanks, I updated the comment.
>
> Thanks,
> Erik
>
>> Thanks,
>> David
>> -----
>>
>>> Testing:
>>> - Tier 1 (aka JPRT), 2, 3, 4, 5.
>>>
>>> I would appreciate quite a few reviews on this patch, it contains a 
>>> nice
>>> mixture of:
>>> - me venturing into the runtime code :)
>>> - lock free code
>>> - unreproducible bug (even though I'm sure of the root cause)
>>>
>>> Thanks for everyone participating in the discussions around this bug 
>>> and
>>> potential solutions: Volker, Coleen, Mikael G, StefanK, Erik Ö and
>>> Jiangli.
>>>
>>> Thanks!
>>> Erik
>>>
>>> [0]: https://bugs.openjdk.java.net/browse/JDK-8168914



More information about the hotspot-dev mailing list