RFR (s): JDK-8013129 Possible deadlock with Metaspace locks due to mixed usage of safepoint aware and non-safepoint aware locking

Mikael Gerdin mikael.gerdin at oracle.com
Thu Apr 25 12:03:44 UTC 2013


Coleen,

On 04/25/2013 01:57 PM, Coleen Phillimore wrote:
>
> Mikael,
>
> I believe this change is correct.  I think if we elide safepoint checks
> for a lock sometimes, we always have to elide safepoint checks for that
> lock.  David will correct me if I'm wrong.

I also believe that the change is correct.
And I hope that David will chime in on this one.

/Mikael

>
> Coleen
>
> On 4/25/2013 5:10 AM, Mikael Gerdin wrote:
>> Hi,
>>
>> Problem:
>> We've seen some hangs in the GC nightly testing recently.
>> When I looked at the minidump files from some of those hangs they
>> looked like safepoint deadlocks where one thread was parked in
>> Mutex::lock_without_safepoint_check on one of the "Metaspace
>> allocation locks".
>>
>> Both of the hangs I investigated also had threads trying to lock the
>> same Metaspace lock but without eliding safepoint checks because they
>> were originating from Metaspace::deallocate.
>>
>> I believe this issue was brought to the surface by the change to
>> allocate MethodCounters on demand and potentially deallocate them
>> when threads race, since that leads to more frequent calls to
>> Metaspace::deallocate when not at a safepoint.
>>
>> I was able to reproduce the hang after about an hour by running an
>> instrumented build where MethodCounters are allocated and then
>> unconditionally deallocated on each entry to
>> Method::build_method_counters.
>>
>> I can't describe the failure mode in detail since I'm not familiar
>> with the Mutex code, but I can imagine that the locking state machine
>> breaks when we take completely different code paths for the same
>> Mutex.
>>
>> Suggested fix:
>> My suggested fix is to change Metaspace::deallocate to take the lock
>> with Mutex::_no_safepoint_check_flag.
>>
>> With my fix applied, I ran the test that had reproduced the failure
>> overnight, and the hang did not recur.
>> I also ran the parallel class loading tests and nashorn's
>> test262parallel for good measure.
>>
>> Webrev: http://cr.openjdk.java.net/~mgerdin/8013129/webrev.0/
>> JBS: https://jbs.oracle.com/bugs/browse/JDK-8013129
>> bugs.sun.com: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8013129
>>
>> /Mikael
>
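
The rule discussed in this thread (a given lock is either always taken
with safepoint checks or always taken without them) can be illustrated
with a small toy model. This is only a sketch and is not HotSpot code:
CheckedLock, SafepointCheck and mixes_modes are hypothetical names, and
the "safepoint check" is reduced to a mode tag that the lock remembers
from its first acquisition. It shows how mixed usage on one lock could
be detected, assuming that consistency per lock is the invariant we want.

```cpp
#include <mutex>
#include <stdexcept>

// Toy model only, not HotSpot code. All names here are hypothetical.
enum class SafepointCheck { kUnset, kAlways, kNever };

// A lock that remembers how it was first acquired and rejects any
// later acquisition that uses the other mode.
class CheckedLock {
 public:
  void lock_with_safepoint_check()    { acquire(SafepointCheck::kAlways); }
  void lock_without_safepoint_check() { acquire(SafepointCheck::kNever); }
  void unlock() { mutex_.unlock(); }

 private:
  void acquire(SafepointCheck mode) {
    {
      // Guard the bookkeeping separately from the lock being modelled.
      std::lock_guard<std::mutex> guard(state_);
      if (mode_ == SafepointCheck::kUnset) {
        mode_ = mode;  // first acquisition fixes the mode for this lock
      } else if (mode_ != mode) {
        throw std::logic_error("mixed safepoint-check modes on one lock");
      }
    }
    mutex_.lock();  // only reached when the mode is consistent
  }

  std::mutex state_;
  std::mutex mutex_;
  SafepointCheck mode_ = SafepointCheck::kUnset;
};

// Returns true if the two paths below use the same lock inconsistently.
// The first path elides the check, like the parked thread in the dumps.
// The second path stands in for deallocate, before and after the fix.
bool mixes_modes(bool deallocate_elides_check) {
  CheckedLock lock;
  try {
    lock.lock_without_safepoint_check();
    lock.unlock();
    if (deallocate_elides_check) {
      lock.lock_without_safepoint_check();  // behaviour with the fix
    } else {
      lock.lock_with_safepoint_check();     // behaviour before the fix
    }
    lock.unlock();
    return false;
  } catch (const std::logic_error&) {
    return true;
  }
}
```

In this model, mixes_modes(false) reports the inconsistency that the
suggested fix removes, while mixes_modes(true) does not. The model does
not reproduce the actual deadlock; it only checks the invariant.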



