RFR (XS) 8010723 - possible deadlock with SystemDictionary_lock

Ioi Lam ioi.lam at oracle.com
Fri Mar 29 09:07:32 PDT 2013


Karen & Coleen,

Thanks for the review.

As I re-read the code and then re-read again, it seems that:


[1] If the class is already being loaded in another thread, a
     place holder with LOAD_INSTANCE key will exist

[2] We will wait for SystemDictionary_lock only with this condition:

     if (class_loader.is_null()) {
         SystemDictionary_lock->wait();
     } else {
         // case 2: traditional with broken classloader lock. wait on first
         // requestor.
         double_lock_wait(lockObject, THREAD); /* calls
SystemDictionary_lock->wait() */
     }

Question: does this mean that for non-boot, parallel-capable class 
loaders, two threads
will both load the same classfile, and only one of them would succeed in 
putting
the loaded Klass into the SystemDictionary?

Anyway, I've changed the comment into the following. Is it OK?

     // This must be done outside of the SystemDictionary_lock to
     // avoid deadlock.
     //
     // Note that Klass::restore_unshareable_info (called via
     // load_instance_class above) is also called outside
     // of SystemDictionary_lock. All other threads that are
     // trying to load this class in the boot classloader
     // are already blocked above at
     // if (class_loader.is_null()) {SystemDictionary_lock->wait();}
     //
     // This need to be re-thought when parallel-capable non-boot
     // classloaders are supported by CDS (today they're not).


Thanks
- Ioi

On 03/29/2013 07:22 AM, Coleen Phillimore wrote:
>
> Hi Ioi,
>
> I think the fact that this class is in the placeholder table with 
> LOAD_INSTANCE key is what makes this remove_unshareable_info() thread 
> safe, not the SystemDictionary_lock->wait() call.   Since we remove 
> the LOAD_INSTANCE entry below, changing the comment to say this will 
> prevent someone from rearranging this code.
> Also we shouldn't put bug numbers in the code, except for very rare 
> cases, so this one should be removed.
>
> The logic looks correct.
> Thanks,
> Coleen
>
>
>
> On 3/29/2013 10:00 AM, Karen Kinnear wrote:
>> Ioi,
>>
>> I take it back, this fix works fine. A bootstrap class loading is 
>> waiting on
>> the SystemDictionary_lock and the oldprobe->instance_load_in_progress().
>>
>> So, the remove_unshareable_info and restore_unshareable_info today
>> are safe because they are single-threaded within the 
>> instance_load_in_progress
>> specifically for the boot loader.
>>
>> Might want to add that to the comment. Other parallel class loaders 
>> have to be
>> multi-threaded through load_instance to prevent deadlocks so if we 
>> use a shared
>> archive for them, we would need a different solution.
>>
>> thanks,
>> Karen
>>
>> On Mar 29, 2013, at 9:06 AM, Karen Kinnear wrote:
>>
>>> Ioi,
>>>
>>> Thank you for jumping on this so quickly. I totally appreciate the 
>>> detailed comments
>>> in the code.
>>>
>>> The comment about the the other threads waiting on the SD_lock is 
>>> not accurate -
>>> there is a wait above, but there is a notify in find_or_define which 
>>> means that other
>>> threads will get through.
>>>
>>> Does the solution depend on this?
>>>
>>> thanks,
>>> Karen
>>>
>>> On Mar 28, 2013, at 6:17 PM, Ioi Lam wrote:
>>>
>>>> Please review:
>>>>
>>>> http://cr.openjdk.java.net/~iklam/8010723/sysdic_lock_001/
>>>>
>>>> Bug: fatal error: acquiring lock Metaspace allocation lock/5 out
>>>> of order with lock SystemDictionary_lock/4 -- possible deadlock
>>>>
>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8010723
>>>> https://jbs.oracle.com/bugs/browse/JDK-8010723
>>>>
>>>> Summary of fix:
>>>>
>>>>     This crash is caused by the patch in JDK-8007725
>>>> http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/1fc4d4768b90
>>>>
>>>>     The code was calling Klass::remove_unshareable_info()(from within
>>>> SystemDictionary::clean_up_shared_class)while
>>>>     holding SystemDictionary_lock. This is unnecessary and possibly
>>>>     unsafe.
>>>>
>>>>     I changed the code to call Klass::remove_unshareable_info()outside
>>>>     of the SystemDictionary_lock. This is OK
>>>>     because Klass::restore_unshareable_info()is also being called 
>>>> outside
>>>>     of SystemDictionary_lock.
>>>>
>>>> Tests:
>>>>
>>>>     I am running the UTE stress tests now.
>>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20130329/696764af/attachment-0001.html 


More information about the hotspot-runtime-dev mailing list