RFR (s): JDK-8013129 Possible deadlock with Metaspace locks due to mixed usage of safepoint aware and non-safepoint aware locking
    Mikael Gerdin 
    mikael.gerdin at oracle.com
       
    Thu Apr 25 02:10:16 PDT 2013
    
    
  
Hi,
Problem:
We've seen some hangs in the GC nightly testing recently.
When I looked at the minidump files from some of those hangs they looked 
like safepoint deadlocks where one thread was parked in 
Mutex::lock_without_safepoint_check on one of the "Metaspace allocation 
locks".
Both of the hangs I investigated also had threads trying to lock the 
same Metaspace lock, but with the safepoint check enabled, because they 
originated from Metaspace::deallocate.
I believe this issue was brought to the surface by the change to 
allocate MethodCounters on demand and potentially deallocate them when 
racing, since that change causes more frequent calls to 
Metaspace::deallocate when not at a safepoint.
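
To illustrate the allocate-then-deallocate-on-race pattern described 
above, here is a minimal toy sketch. It is not the HotSpot code: 
ToyMethod, ToyCounters, toy_build_counters and toy_deallocations are 
hypothetical names, and the use of a compare-and-swap to install the 
counters is my assumption about how such a race would typically be 
resolved. The point is only that the losing thread frees its freshly 
allocated object outside of a safepoint.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins; these are NOT the HotSpot types.
struct ToyCounters { int invocation_count = 0; };

struct ToyMethod {
  std::atomic<ToyCounters*> counters{nullptr};
};

// Counts how often the losing side of a race had to free its allocation.
std::atomic<int> toy_deallocations{0};

// Allocate counters on demand. If another thread installed counters first,
// free ours (the analogue of calling Metaspace::deallocate outside a
// safepoint) and return the winner's object.
ToyCounters* toy_build_counters(ToyMethod* m) {
  assert(m != nullptr);
  ToyCounters* fresh = new ToyCounters();
  ToyCounters* expected = nullptr;
  if (!m->counters.compare_exchange_strong(expected, fresh)) {
    // Lost the race: 'expected' now holds the already-installed pointer.
    delete fresh;
    ++toy_deallocations;
    return expected;
  }
  return fresh;
}
```

Calling toy_build_counters twice on the same ToyMethod exercises the 
losing path deterministically: the second call allocates, fails to 
install, and frees its allocation.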
I was able to reproduce the hang after about an hour by running an 
instrumented build where MethodCounters are allocated and then 
unconditionally deallocated on each entry to Method::build_method_counters.
I can't describe the failure mode in detail since I'm not familiar with 
the Mutex code, but I can imagine that the locking state machine breaks 
when we take completely different code paths for the same Mutex.
Suggested fix:
My suggested fix is to change Metaspace::deallocate to take the lock 
with Mutex::_no_safepoint_check_flag.
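
The idea behind the fix can be sketched with a small toy model. This is 
not the HotSpot Mutex and not the webrev: ToyMutex, ToyLocker and the 
toy_* functions are hypothetical, and that the allocation path uses the 
non-checking flag is an assumption based on the thread seen parked in 
Mutex::lock_without_safepoint_check. The model only records which 
locking mode each acquisition used, so that mixed usage on one lock can 
be detected.

```cpp
#include <cassert>
#include <mutex>

// Toy stand-in for a lock with two acquisition modes; NOT HotSpot's Mutex.
class ToyMutex {
 public:
  enum SafepointCheckFlag { _safepoint_check_flag, _no_safepoint_check_flag };

  void lock(SafepointCheckFlag flag) {
    _m.lock();
    // Record the mode while holding the lock.
    if (flag == _safepoint_check_flag) { ++_checked; } else { ++_unchecked; }
  }
  void unlock() { _m.unlock(); }

  // True if this lock has been taken through both code paths.
  bool mixed_usage() const { return _checked > 0 && _unchecked > 0; }

 private:
  std::mutex _m;
  int _checked = 0;
  int _unchecked = 0;
};

// RAII helper (hypothetical): acquires in the given mode, releases on exit.
class ToyLocker {
 public:
  ToyLocker(ToyMutex* m, ToyMutex::SafepointCheckFlag f) : _m(m) { _m->lock(f); }
  ~ToyLocker() { _m->unlock(); }
 private:
  ToyMutex* _m;
};

// Assumed behaviour of the allocation path: no safepoint check.
void toy_allocate(ToyMutex* lock) {
  ToyLocker ml(lock, ToyMutex::_no_safepoint_check_flag);
}

// Before the fix: deallocate takes the same lock WITH a safepoint check.
void toy_deallocate_before(ToyMutex* lock) {
  ToyLocker ml(lock, ToyMutex::_safepoint_check_flag);
}

// After the fix: deallocate uses the same mode as the allocation path.
void toy_deallocate_after(ToyMutex* lock) {
  ToyLocker ml(lock, ToyMutex::_no_safepoint_check_flag);
}
```

In this model, calling toy_allocate followed by toy_deallocate_before 
leaves mixed_usage() true, while pairing it with toy_deallocate_after 
keeps every acquisition of the lock on a single code path.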
With my fix applied, I ran the test that had reproduced the failure 
overnight, and the hang did not occur.
I also ran the parallel class loading tests and nashorn's 
test262parallel for good measure.
Webrev: http://cr.openjdk.java.net/~mgerdin/8013129/webrev.0/
JBS: https://jbs.oracle.com/bugs/browse/JDK-8013129
bugs.sun.com: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8013129
/Mikael
    
    