RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking

Erik Helin erik.helin at oracle.com
Thu Feb 16 14:13:53 UTC 2017


On 02/15/2017 10:08 PM, coleen.phillimore at oracle.com wrote:
> Hi Erik,

Hey Coleen, thanks for reviewing! Please see new webrevs at:
- incremental: http://cr.openjdk.java.net/~ehelin/8168914/01-02/
- full: http://cr.openjdk.java.net/~ehelin/8168914/02/

> I thought you were going to hide the declaration of class
> ChunkedHandleList in classLoaderData.cpp since it is only used internally?
>
> And have a field in ClassLoaderData:   ChunkedHandleList*
> _chunked_handle_list;

I didn't want to have a pointer to ChunkedHandleList just because I want 
to "hide" the implementation, I prefer to embed an instance (even 
thought that means I have to declare the struct/class in the .hpp file).

> The class name "Chunk" is so overused!

Haha, I agree that it is quite commonly used :) Fortunately there is 
some context here: Chunk is an inner class/struct to ChunkedHandleList. 
That hopefully gives the reader enough insight to distinguish this Chunk 
from other Chunks in the VM.

> Here ChunkedHandleList should not be a subtype of CHeapObj<mtClass>.
> It's a ValueObj.

Agree, this is a left-over from an earlier state of the patch. Thanks 
for noticing!

> I don't think it should be a struct either because even though it's
> internal to ClassLoaderData, the fields could still be accessed outside
> the class.

Agree, I cleaned that up a bit. Thanks.

> Why is
>
> 629 void ClassLoaderData::remove_handle_unsafe(jobject h) {
>
>
> it called unsafe?  It is safe to zero the element in the block, isn't it?

Because code using remove_handle_unsafe should be cautious. It is only 
safe to write NULL to the *((oop*) op) if:
- the jobject originates from a call to ClassLoaderData::add_handle
   (the assert verifies this).
- *op is the sole remaining pointer to **op (the object).
- The native memory backing the chunked linked list is not freed.

It is unfortunate that we even need to have the remove_handle_unsafe at 
all. An alternative is to let the only code using remove_handle_unsafe 
(ModuleEntry) do *((oop*) op) = NULL on its own.

Would you prefer that?

> One other reason to make it internal is that the CAPACITY of 64 is way
> too big for the ClassLoaderData of anonymous classes which I think will
> only have presently 1 (or 2 maybe) oops in it.   And there are zillions
> of them.

Oops, thanks for catching this. I intended to use the same capacity as 
JNIHandleBlock, which uses 32 (don't recall where that 64 came from). As 
for using a smaller size, that seems to be a separate patch? For now I 
would prefer to keep the characteristics the same as for JNIHandleBlock.

Thanks,
Erik

> Thanks,
> Coleen
>
>
> On 2/14/17 11:40 AM, Erik Helin wrote:
>> Hi all,
>>
>> this patch aims to solve the bug [0] by making it safe for a GC to
>> concurrently traverse the oops in a ClassLoaderData.
>>
>> The problem right now is that ClassLoaderData::_handles is a
>> JNIHandleBlock. It is not safe for one thread to iterate over the oops
>> in a JNIHandleBlock while another thread concurrently adds a new oop
>> to the JNIHandleBlock.
>>
>> My proposed solution is to replace JNIHandleBlock with another data
>> structure for ClassLoaderData. ClassLoaderData only needs a place to
>> store oops that a GC can iterate over, it has no use for the rest of
>> the methods in the JNIHandleBlock class. I have implemented a minimal
>> chunked linked list data structure (internal to ClassLoaderData) with
>> only two public methods:
>> - add (only executed by one thread at a time)
>> - oops_do (can be executed by any number of threads, possibly
>>            concurrently with a call to `add`)
>>
>> ChunkedHandleList uses load_acquire and release_store to synchronize
>> access to the underlying chunks (arrays). Since ChunkedHandleList
>> utilizes the (rather) minimal requirements of its user
>> (ClassLoaderData), I decided to keep the class internal to
>> ClassLoaderData for now. If other find it useful elsewhere, the we can
>> try to create a more generalized version (this is trickier than it
>> appears at first sight, I tried ;)).
>>
>> I also changed ClassLoaderData::remove_handle to write NULL to the
>> oop* (that is represented by a jobject), instead of using a sentinel
>> oop as done by JNIHandleBlock. The GC closures should be prepared to
>> handle a field in a Java object becoming NULL, so this should be fine.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8168914
>>
>> Webrev:
>> http://cr.openjdk.java.net/~ehelin/8168914/00/
>>
>> Testing:
>> - Tier 1 (aka JPRT), 2, 3, 4, 5.
>>
>> I would appreciate quite a few reviews on this patch, it contains a
>> nice mixture of:
>> - me venturing into the runtime code :)
>> - lock free code
>> - unreproducible bug (even though I'm sure of the root cause)
>>
>> Thanks for everyone participating in the discussions around this bug
>> and potential solutions: Volker, Coleen, Mikael G, StefanK, Erik Ö and
>> Jiangli.
>>
>> Thanks!
>> Erik
>>
>> [0]: https://bugs.openjdk.java.net/browse/JDK-8168914
>


More information about the hotspot-dev mailing list