RFR: 8168914: Crash in ClassLoaderData/JNIHandleBlock::oops_do during concurrent marking

Erik Helin erik.helin at oracle.com
Mon Feb 20 15:47:38 UTC 2017


Ok, lets see if we can wrap this up :)  I just uploaded a new version, 03:
- incremental: http://cr.openjdk.java.net/~ehelin/8168914/02-03/
- full: http://cr.openjdk.java.net/~ehelin/8168914/03/

The following changes have been made to the original patch:

00 -> 01:
- Updated copyright year
- Fixed typo in comment
- Updated stale comment
- Removed unnecessary load_aquire when loading c->next

01 -> 02:
- ChunkedHandleList now derives ValueObj instead of CHeapObj
- Both ChunkedHandleList and Chunk are now classes instead of structs
- Changed capacity of chunk to 32 from 64 (same capacity as used by
  JNIHandleBlock)

02 -> 03:
- Only use load_acquire when reading the size for the "head" chunk
- Fix grammar in comment
- Refer to the correct lock in comment

AFAICS (scanning through all the emails) there are now only three
remaining comments:
- Thomas: can we assert in ChunkedHandeList::add that the correct mutex
          is locked?
- Then we would have to pass the mutex along as a parameter to `add`,
  which to me seems unnecessary. Then ChunkedHandleList::add might just
  as well lock the mutex instead of asserting. But thanks for
  highlight this, I noticed I referred to the wrong lock in the comment
  :)

- Coleen: can we remove the word unsafe from
          ClassLoaderData::remove_handle_unsafe?
- I'd like to keep it until we design a better remove_handle function.
  There is only one user of remove_handle_unsafe, so changing the name
  in the future should be easy.

- Kim: can we have something like `convert_x_to_y` instead of
       *((oop*) op) = NULL?
- I would like to defer that to another patch, but I agree that it is
  rather nasty. Mostly because I don't know what the "right" API should
  look like, since this is the first time in run into this problem.

Thanks,
Erik

On 02/14/2017 05:40 PM, Erik Helin wrote:
> Hi all,
> 
> this patch aims to solve the bug [0] by making it safe for a GC to
> concurrently traverse the oops in a ClassLoaderData.
> 
> The problem right now is that ClassLoaderData::_handles is a
> JNIHandleBlock. It is not safe for one thread to iterate over the oops
> in a JNIHandleBlock while another thread concurrently adds a new oop to
> the JNIHandleBlock.
> 
> My proposed solution is to replace JNIHandleBlock with another data
> structure for ClassLoaderData. ClassLoaderData only needs a place to
> store oops that a GC can iterate over, it has no use for the rest of the
> methods in the JNIHandleBlock class. I have implemented a minimal
> chunked linked list data structure (internal to ClassLoaderData) with
> only two public methods:
> - add (only executed by one thread at a time)
> - oops_do (can be executed by any number of threads, possibly
>            concurrently with a call to `add`)
> 
> ChunkedHandleList uses load_acquire and release_store to synchronize
> access to the underlying chunks (arrays). Since ChunkedHandleList
> utilizes the (rather) minimal requirements of its user
> (ClassLoaderData), I decided to keep the class internal to
> ClassLoaderData for now. If other find it useful elsewhere, the we can
> try to create a more generalized version (this is trickier than it
> appears at first sight, I tried ;)).
> 
> I also changed ClassLoaderData::remove_handle to write NULL to the oop*
> (that is represented by a jobject), instead of using a sentinel oop as
> done by JNIHandleBlock. The GC closures should be prepared to handle a
> field in a Java object becoming NULL, so this should be fine.
> 
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8168914
> 
> Webrev:
> http://cr.openjdk.java.net/~ehelin/8168914/00/
> 
> Testing:
> - Tier 1 (aka JPRT), 2, 3, 4, 5.
> 
> I would appreciate quite a few reviews on this patch, it contains a nice
> mixture of:
> - me venturing into the runtime code :)
> - lock free code
> - unreproducible bug (even though I'm sure of the root cause)
> 
> Thanks for everyone participating in the discussions around this bug and
> potential solutions: Volker, Coleen, Mikael G, StefanK, Erik Ö and Jiangli.
> 
> Thanks!
> Erik
> 
> [0]: https://bugs.openjdk.java.net/browse/JDK-8168914


More information about the hotspot-dev mailing list