RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking
David Holmes
david.holmes at oracle.com
Mon Feb 20 22:52:14 UTC 2017
Hi Erik,
On 21/02/2017 1:47 AM, Erik Helin wrote:
> Ok, lets see if we can wrap this up :) I just uploaded a new version, 03:
> - incremental: http://cr.openjdk.java.net/~ehelin/8168914/02-03/
> - full: http://cr.openjdk.java.net/~ehelin/8168914/03/
>
> The following changes have been made to the original patch:
>
> 00 -> 01:
> - Updated copyright year
> - Fixed typo in comment
> - Updated stale comment
> - Removed unnecessary load_aquire when loading c->next
>
> 01 -> 02:
> - ChunkedHandleList now derives ValueObj instead of CHeapObj
> - Both ChunkedHandleList and Chunk are now classes instead of structs
> - Changed capacity of chunk to 32 from 64 (same capacity as used by
> JNIHandleBlock)
>
> 02 -> 03:
> - Only use load_acquire when reading the size for the "head" chunk
Sorry but I don't like the way this is done - the conditional may end up
being more expensive than the unnecessary load-acquire. Unrolling the
first loop iteration, as per the email discussion, is a better way to go
IMO.
And as Kim indicated you shouldn't need any casts for the OrderAccess
ops as you should be using the "void*" versions.
> - Fix grammar in comment
> - Refer to the correct lock in comment
>
> AFAICS (scanning through all the emails) there are now only three
> remaining comments:
> - Thomas: can we assert in ChunkedHandeList::add that the correct mutex
> is locked?
> - Then we would have to pass the mutex along as a parameter to `add`,
Why do you need to pass it in?
oop* ClassLoaderData::ChunkedHandleList::add(oop o) {
assert(ClassLoaderData::metaspace_lock()->owned_by_self(), "should
own lock");
Thanks,
David
> which to me seems unnecessary. Then ChunkedHandleList::add might just
> as well lock the mutex instead of asserting. But thanks for
> highlight this, I noticed I referred to the wrong lock in the comment
> :)
>
> - Coleen: can we remove the word unsafe from
> ClassLoaderData::remove_handle_unsafe?
> - I'd like to keep it until we design a better remove_handle function.
> There is only one user of remove_handle_unsafe, so changing the name
> in the future should be easy.
>
> - Kim: can we have something like `convert_x_to_y` instead of
> *((oop*) op) = NULL?
> - I would like to defer that to another patch, but I agree that it is
> rather nasty. Mostly because I don't know what the "right" API should
> look like, since this is the first time in run into this problem.
>
> Thanks,
> Erik
>
> On 02/14/2017 05:40 PM, Erik Helin wrote:
>> Hi all,
>>
>> this patch aims to solve the bug [0] by making it safe for a GC to
>> concurrently traverse the oops in a ClassLoaderData.
>>
>> The problem right now is that ClassLoaderData::_handles is a
>> JNIHandleBlock. It is not safe for one thread to iterate over the oops
>> in a JNIHandleBlock while another thread concurrently adds a new oop to
>> the JNIHandleBlock.
>>
>> My proposed solution is to replace JNIHandleBlock with another data
>> structure for ClassLoaderData. ClassLoaderData only needs a place to
>> store oops that a GC can iterate over, it has no use for the rest of the
>> methods in the JNIHandleBlock class. I have implemented a minimal
>> chunked linked list data structure (internal to ClassLoaderData) with
>> only two public methods:
>> - add (only executed by one thread at a time)
>> - oops_do (can be executed by any number of threads, possibly
>> concurrently with a call to `add`)
>>
>> ChunkedHandleList uses load_acquire and release_store to synchronize
>> access to the underlying chunks (arrays). Since ChunkedHandleList
>> utilizes the (rather) minimal requirements of its user
>> (ClassLoaderData), I decided to keep the class internal to
>> ClassLoaderData for now. If other find it useful elsewhere, the we can
>> try to create a more generalized version (this is trickier than it
>> appears at first sight, I tried ;)).
>>
>> I also changed ClassLoaderData::remove_handle to write NULL to the oop*
>> (that is represented by a jobject), instead of using a sentinel oop as
>> done by JNIHandleBlock. The GC closures should be prepared to handle a
>> field in a Java object becoming NULL, so this should be fine.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8168914
>>
>> Webrev:
>> http://cr.openjdk.java.net/~ehelin/8168914/00/
>>
>> Testing:
>> - Tier 1 (aka JPRT), 2, 3, 4, 5.
>>
>> I would appreciate quite a few reviews on this patch, it contains a nice
>> mixture of:
>> - me venturing into the runtime code :)
>> - lock free code
>> - unreproducible bug (even though I'm sure of the root cause)
>>
>> Thanks for everyone participating in the discussions around this bug and
>> potential solutions: Volker, Coleen, Mikael G, StefanK, Erik Ö and Jiangli.
>>
>> Thanks!
>> Erik
>>
>> [0]: https://bugs.openjdk.java.net/browse/JDK-8168914
More information about the hotspot-dev
mailing list