RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking

Erik Helin erik.helin at oracle.com
Tue Feb 14 16:40:55 UTC 2017


Hi all,

this patch aims to solve the bug [0] by making it safe for a GC to 
concurrently traverse the oops in a ClassLoaderData.

The problem right now is that ClassLoaderData::_handles is a 
JNIHandleBlock. It is not safe for one thread to iterate over the oops 
in a JNIHandleBlock while another thread concurrently adds a new oop to 
the JNIHandleBlock.

My proposed solution is to replace JNIHandleBlock with another data 
structure for ClassLoaderData. ClassLoaderData only needs a place to 
store oops that a GC can iterate over, it has no use for the rest of the 
methods in the JNIHandleBlock class. I have implemented a minimal 
chunked linked list data structure (internal to ClassLoaderData) with 
only two public methods:
- add (only executed by one thread at a time)
- oops_do (can be executed by any number of threads, possibly
            concurrently with a call to `add`)

ChunkedHandleList uses load_acquire and release_store to synchronize 
access to the underlying chunks (arrays). Since ChunkedHandleList 
utilizes the (rather) minimal requirements of its user 
(ClassLoaderData), I decided to keep the class internal to 
ClassLoaderData for now. If other find it useful elsewhere, the we can 
try to create a more generalized version (this is trickier than it 
appears at first sight, I tried ;)).

I also changed ClassLoaderData::remove_handle to write NULL to the oop* 
(that is represented by a jobject), instead of using a sentinel oop as 
done by JNIHandleBlock. The GC closures should be prepared to handle a 
field in a Java object becoming NULL, so this should be fine.

Bug:
https://bugs.openjdk.java.net/browse/JDK-8168914

Webrev:
http://cr.openjdk.java.net/~ehelin/8168914/00/

Testing:
- Tier 1 (aka JPRT), 2, 3, 4, 5.

I would appreciate quite a few reviews on this patch, it contains a nice 
mixture of:
- me venturing into the runtime code :)
- lock free code
- unreproducible bug (even though I'm sure of the root cause)

Thanks for everyone participating in the discussions around this bug and 
potential solutions: Volker, Coleen, Mikael G, StefanK, Erik Ö and Jiangli.

Thanks!
Erik

[0]: https://bugs.openjdk.java.net/browse/JDK-8168914


More information about the hotspot-dev mailing list