RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking
Erik Österlund
erik.osterlund at oracle.com
Wed Feb 15 15:22:44 UTC 2017
Hi Erik,
I looked at the memory ordering aspect and I think it looks good as I
discussed with you, now that the unnecessary synchronization on the
immutable next value has been removed.
Thanks,
/Erik
On 2017-02-15 16:07, Erik Helin wrote:
> On 02/15/2017 02:48 AM, David Holmes wrote:
>> Hi Erik,
>
> Hi David,
>
> thanks for having a look! Please see new patches at:
> - incremental: http://cr.openjdk.java.net/~ehelin/8168914/00-01/
> - full: http://cr.openjdk.java.net/~ehelin/8168914/01/
>
>> oops_do can miss an oop that is in the process of being added - is that
>> a problem?
>
> Nope, that is not an issue. Even if `add_handle` and `oops_do` where
> synchronized with each other (for example by using lock), you can
> still have the scenario where a GC thread first runs (and finishes)
> `oops_do` and then `add_handle` is called (a linearized execution).
>
> This scenario is something the concurrent marking algorithms already
> should be prepared to handle.
>
>> 140 if (c->data[i] != NULL) {
>> 141 f->do_oop(&c->data[i]);
>> 142 }
>>
>> Given a slot can be nulled out concurrently at any time, is it worth
>> does this NULL check? The OopClosure will have to do its own NULL check
>> anyway.
>
> Sure, this was mostly to be a bit consistent with JNIHandleBlock (it
> does the same NULL check). Think of it as an optimization, there is no
> reason to call `f->do_oop` if `c->data[i] == NULL`. Do you think it
> reads better if the NULL check is removed? I don't have a strong
> opinion on this one.
>
>> 144 c = (Chunk*) OrderAccess::load_ptr_acquire((volatile
>> intptr_t*)&c->next);
>>
>> This doesn't need to be a load-acquire. You already loaded 'c' via
>> load-acquire of _head (or chained therefrom) and its next field is set
>> prior to the setting of the _head that you read.
>
> Agree. I just discussed this with Erik Ö as well, and we all agree on
> this. I also removed the `volatile` specifier for `next`.
>
>> 624 jobject ClassLoaderData::add_handle(Handle h) {
>> 625 MutexLockerEx ml(metaspace_lock(),
>> Mutex::_no_safepoint_check_flag);
>> 626 return (jobject) _handles.add(h());
>> 627 }
>> 628
>> 629 void ClassLoaderData::remove_handle_unsafe(jobject h) {
>> 630 assert(_handles.contains((oop*) h), "Got unexpected handle "
>> PTR_FORMAT, p2i((oop*) h));
>> 631 *((oop*) h) = NULL;
>> 632 }
>>
>> I'm a bit unclear on the typing here. Previously we use a JNI Handle
>> which was a jobject. But now we've simply stored an oop into a slot in
>> an array. We pass the address of that slot back as a jobject, but is it
>> really?
>
> So, this is a bit confusing, and I discussed this with Coleen a few
> days ago. jobject was used originally used because that is what
> `JNIHandleBlock::allocate_handle` returns. Using something other than
> jobject means updating `ModuleEntry` and in particular users
> `ModuleEntry::_pd`. This is not something we are keen on doing right
> now for JDK 9, and Coleen is planning to clean up how the runtime
> handles oops for 10 (or at least improve the situation).
>
> I don't know what jobject is meant to represent, in the code it is
> just an empty class (an opaque type).
>
> ClassLoaderData::add_handle could just as well return an oop*, but
> AFAIU we try to avoid oops and oop* in the runtime code (because that
> might mean that some code forgot to tell the GC about the oop). Maybe
> I'm wrong here?
>
> > I would also expect contains to take an oop not an oop* - it
> > seems to be asking "is this an address of a slot in our
> > ChunkedHandleList" rather than asking "is this oop in our
> > ChunkedHandleList". ??
>
> I actually wanted the assert to check whether the passed jobject
> (which really is an oop*) actually originated from a call to
> ClassLoaderData::add_handle. I don't want any code to pass a jobject
> to ClassLoaderData::remove_handle_unsafe that doesn't come from a call
> to ClassLoaderData::add_handle.
>
>> A few minor comments:
>>
>> Copyrights need updating.
>
> Aaah, it is that time of the year again. Fixed.
>
>> src/share/vm/classfile/classLoaderData.hpp
>>
>> 177 // Only on thread ...
>>
>> Typo: on -> one
>
> Thanks, fixed.
>
>> src/share/vm/classfile/moduleEntry.cpp
>>
>> 90 // Create a JNI handle for the shared ProtectionDomain and save it
>> atomically.
>> 91 // If someone beats us setting the _pd cache, the created JNI
>> handle is destroyed.
>>
>> These are no longer JNI Handles.
>
> Thanks, I updated the comment.
>
> Thanks,
> Erik
>
>> Thanks,
>> David
>> -----
>>
>>> Testing:
>>> - Tier 1 (aka JPRT), 2, 3, 4, 5.
>>>
>>> I would appreciate quite a few reviews on this patch, it contains a
>>> nice
>>> mixture of:
>>> - me venturing into the runtime code :)
>>> - lock free code
>>> - unreproducible bug (even though I'm sure of the root cause)
>>>
>>> Thanks for everyone participating in the discussions around this bug
>>> and
>>> potential solutions: Volker, Coleen, Mikael G, StefanK, Erik Ö and
>>> Jiangli.
>>>
>>> Thanks!
>>> Erik
>>>
>>> [0]: https://bugs.openjdk.java.net/browse/JDK-8168914
More information about the hotspot-dev
mailing list