RFR (XS) 8010723 - possible deadlock with SystemDictionary_lock

Coleen Phillimore coleen.phillimore at oracle.com
Fri Mar 29 10:38:38 PDT 2013


Ok, I see the code now.  Your comment only mentions #2 but #1 is also 
necessary to keep another class from loading this one.  I'm still 
worried that someone could not know and move this cleanup down after the 
placeholder is removed.   See below.

On 3/29/2013 12:07 PM, Ioi Lam wrote:
> Karen & Coleen,
>
> Thanks for the review.
>
> As I re-read the code and then re-read again, it seems that:
>
>
> [1] If the class is already being loaded in another thread, a
>     place holder with LOAD_INSTANCE key will exist
>
> [2] We will wait for SystemDictionary_lock only with this condition:
>
>     if (class_loader.is_null()) {
>         SystemDictionary_lock->wait();
>     } else {
>         // case 2: traditional with broken classloader lock. wait on first
>         // requestor.
>         double_lock_wait(lockObject, THREAD); /* calls
> SystemDictionary_lock->wait() */
>     }
>
> Question: does this mean that for non-boot, parallel-capable class 
> loaders, two threads
> will both load the same classfile, and only one of them would succeed 
> in putting
> the loaded Klass into the SystemDictionary?
>
> Anyway, I've changed the comment into the following. Is it OK?
>
>     // This must be done outside of the SystemDictionary_lock to
>     // avoid deadlock.
>     //
>     // Note that Klass::restore_unshareable_info (called via
>     // load_instance_class above) is also called outside
>     // of SystemDictionary_lock. All other threads that are
>     // trying to load this class in the boot classloader
>     // are already blocked above at
>     // if (class_loader.is_null()) {SystemDictionary_lock->wait();}
How about

       // Other threads are blocked from loading this class because they are
       // are waiting on the SystemDictionary_lock until this thread removes
       // the placeholder below.

thanks,
Coleen
>     //
>     // This need to be re-thought when parallel-capable non-boot
>     // classloaders are supported by CDS (today they're not).
>
>
> Thanks
> - Ioi
>
> On 03/29/2013 07:22 AM, Coleen Phillimore wrote:
>>
>> Hi Ioi,
>>
>> I think the fact that this class is in the placeholder table with 
>> LOAD_INSTANCE key is what makes this remove_unshareable_info() thread 
>> safe, not the SystemDictionary_lock->wait() call.   Since we remove 
>> the LOAD_INSTANCE entry below, changing the comment to say this will 
>> prevent someone from rearranging this code.
>> Also we shouldn't put bug numbers in the code, except for very rare 
>> cases, so this one should be removed.
>>
>> The logic looks correct.
>> Thanks,
>> Coleen
>>
>>
>>
>> On 3/29/2013 10:00 AM, Karen Kinnear wrote:
>>> Ioi,
>>>
>>> I take it back, this fix works fine. A bootstrap class loading is 
>>> waiting on
>>> the SystemDictionary_lock and the oldprobe->instance_load_in_progress().
>>>
>>> So, the remove_unshareable_info and restore_unshareable_info today
>>> are safe because they are single-threaded within the 
>>> instance_load_in_progress
>>> specifically for the boot loader.
>>>
>>> Might want to add that to the comment. Other parallel class loaders 
>>> have to be
>>> multi-threaded through load_instance to prevent deadlocks so if we 
>>> use a shared
>>> archive for them, we would need a different solution.
>>>
>>> thanks,
>>> Karen
>>>
>>> On Mar 29, 2013, at 9:06 AM, Karen Kinnear wrote:
>>>
>>>> Ioi,
>>>>
>>>> Thank you for jumping on this so quickly. I totally appreciate the 
>>>> detailed comments
>>>> in the code.
>>>>
>>>> The comment about the the other threads waiting on the SD_lock is 
>>>> not accurate -
>>>> there is a wait above, but there is a notify in find_or_define 
>>>> which means that other
>>>> threads will get through.
>>>>
>>>> Does the solution depend on this?
>>>>
>>>> thanks,
>>>> Karen
>>>>
>>>> On Mar 28, 2013, at 6:17 PM, Ioi Lam wrote:
>>>>
>>>>> Please review:
>>>>>
>>>>> http://cr.openjdk.java.net/~iklam/8010723/sysdic_lock_001/
>>>>>
>>>>> Bug: fatal error: acquiring lock Metaspace allocation lock/5 out
>>>>> of order with lock SystemDictionary_lock/4 -- possible deadlock
>>>>>
>>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8010723
>>>>> https://jbs.oracle.com/bugs/browse/JDK-8010723
>>>>>
>>>>> Summary of fix:
>>>>>
>>>>>     This crash is caused by the patch in JDK-8007725
>>>>> http://hg.openjdk.java.net/hsx/hsx25/hotspot/rev/1fc4d4768b90
>>>>>
>>>>>     The code was calling Klass::remove_unshareable_info()(from within
>>>>> SystemDictionary::clean_up_shared_class)while
>>>>>     holding SystemDictionary_lock. This is unnecessary and possibly
>>>>>     unsafe.
>>>>>
>>>>>     I changed the code to call Klass::remove_unshareable_info()outside
>>>>>     of the SystemDictionary_lock. This is OK
>>>>>     because Klass::restore_unshareable_info()is also being called 
>>>>> outside
>>>>>     of SystemDictionary_lock.
>>>>>
>>>>> Tests:
>>>>>
>>>>>     I am running the UTE stress tests now.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/attachments/20130329/9023c3c9/attachment.html 


More information about the hotspot-runtime-dev mailing list