RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Thu Feb 16 18:14:01 UTC 2017
On 2/16/17 9:13 AM, Erik Helin wrote:
> On 02/15/2017 10:08 PM, coleen.phillimore at oracle.com wrote:
>> Hi Erik,
>
> Hey Coleen, thanks for reviewing! Please see new webrevs at:
> - incremental: http://cr.openjdk.java.net/~ehelin/8168914/01-02/
> - full: http://cr.openjdk.java.net/~ehelin/8168914/02/
>
>> I thought you were going to hide the declaration of class
>> ChunkedHandleList in classLoaderData.cpp since it is only used
>> internally?
>>
>> And have a field in ClassLoaderData: ChunkedHandleList*
>> _chunked_handle_list;
>
> I didn't want to have a pointer to ChunkedHandleList just because I
> want to "hide" the implementation, I prefer to embed an instance (even
> thought that means I have to declare the struct/class in the .hpp file).
My preference is different but this is fine.
>
>> The class name "Chunk" is so overused!
>
> Haha, I agree that it is quite commonly used :) Fortunately there is
> some context here: Chunk is an inner class/struct to
> ChunkedHandleList. That hopefully gives the reader enough insight to
> distinguish this Chunk from other Chunks in the VM.
>
>> Here ChunkedHandleList should not be a subtype of CHeapObj<mtClass>.
>> It's a ValueObj.
>
> Agree, this is a left-over from an earlier state of the patch. Thanks
> for noticing!
>
>> I don't think it should be a struct either because even though it's
>> internal to ClassLoaderData, the fields could still be accessed outside
>> the class.
>
> Agree, I cleaned that up a bit. Thanks.
>
>> Why is
>>
>> 629 void ClassLoaderData::remove_handle_unsafe(jobject h) {
>>
>>
>> it called unsafe? It is safe to zero the element in the block, isn't
>> it?
>
> Because code using remove_handle_unsafe should be cautious. It is only
> safe to write NULL to the *((oop*) op) if:
> - the jobject originates from a call to ClassLoaderData::add_handle
> (the assert verifies this).
> - *op is the sole remaining pointer to **op (the object).
> - The native memory backing the chunked linked list is not freed.
>
> It is unfortunate that we even need to have the remove_handle_unsafe
> at all. An alternative is to let the only code using
> remove_handle_unsafe (ModuleEntry) do *((oop*) op) = NULL on its own.
>
> Would you prefer that?
>
I answered this in my last mail.
>> One other reason to make it internal is that the CAPACITY of 64 is way
>> too big for the ClassLoaderData of anonymous classes which I think will
>> only have presently 1 (or 2 maybe) oops in it. And there are zillions
>> of them.
>
> Oops, thanks for catching this. I intended to use the same capacity as
> JNIHandleBlock, which uses 32 (don't recall where that 64 came from).
> As for using a smaller size, that seems to be a separate patch? For
> now I would prefer to keep the characteristics the same as for
> JNIHandleBlock.
If the whole ChunkHandleList class was internal, then you could pick 10
or something. I guess we'll see if anyone notices footprint and fix it
then. JNIHandleBlock was 32 so this won't increase footprint.
Thanks - the new webrev looks good.
Coleen
>
>
> Thanks,
> Erik
>
>> Thanks,
>> Coleen
>>
>>
>> On 2/14/17 11:40 AM, Erik Helin wrote:
>>> Hi all,
>>>
>>> this patch aims to solve the bug [0] by making it safe for a GC to
>>> concurrently traverse the oops in a ClassLoaderData.
>>>
>>> The problem right now is that ClassLoaderData::_handles is a
>>> JNIHandleBlock. It is not safe for one thread to iterate over the oops
>>> in a JNIHandleBlock while another thread concurrently adds a new oop
>>> to the JNIHandleBlock.
>>>
>>> My proposed solution is to replace JNIHandleBlock with another data
>>> structure for ClassLoaderData. ClassLoaderData only needs a place to
>>> store oops that a GC can iterate over, it has no use for the rest of
>>> the methods in the JNIHandleBlock class. I have implemented a minimal
>>> chunked linked list data structure (internal to ClassLoaderData) with
>>> only two public methods:
>>> - add (only executed by one thread at a time)
>>> - oops_do (can be executed by any number of threads, possibly
>>> concurrently with a call to `add`)
>>>
>>> ChunkedHandleList uses load_acquire and release_store to synchronize
>>> access to the underlying chunks (arrays). Since ChunkedHandleList
>>> utilizes the (rather) minimal requirements of its user
>>> (ClassLoaderData), I decided to keep the class internal to
>>> ClassLoaderData for now. If other find it useful elsewhere, the we can
>>> try to create a more generalized version (this is trickier than it
>>> appears at first sight, I tried ;)).
>>>
>>> I also changed ClassLoaderData::remove_handle to write NULL to the
>>> oop* (that is represented by a jobject), instead of using a sentinel
>>> oop as done by JNIHandleBlock. The GC closures should be prepared to
>>> handle a field in a Java object becoming NULL, so this should be fine.
>>>
>>> Bug:
>>> https://bugs.openjdk.java.net/browse/JDK-8168914
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~ehelin/8168914/00/
>>>
>>> Testing:
>>> - Tier 1 (aka JPRT), 2, 3, 4, 5.
>>>
>>> I would appreciate quite a few reviews on this patch, it contains a
>>> nice mixture of:
>>> - me venturing into the runtime code :)
>>> - lock free code
>>> - unreproducible bug (even though I'm sure of the root cause)
>>>
>>> Thanks for everyone participating in the discussions around this bug
>>> and potential solutions: Volker, Coleen, Mikael G, StefanK, Erik Ö and
>>> Jiangli.
>>>
>>> Thanks!
>>> Erik
>>>
>>> [0]: https://bugs.openjdk.java.net/browse/JDK-8168914
>>
More information about the hotspot-dev
mailing list