RFR (S) 8213092: Add more runtime locks for concurrent class unloading

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Thu Nov 15 15:26:13 UTC 2018


The changes for 8213751: ClassLoaderDataGraph::cld_do() should sometimes 
require CLDG_lock
add a NoSafepointVerifier when iterating over the CLDG.

This change has been tested on top of that change.

open webrev at http://cr.openjdk.java.net/~coleenp/8213092.03/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8213092

This change is needed for concurrent class unloading.

Please review both.  Any further changes for 
https://bugs.openjdk.java.net/browse/JDK-8213150 can be discussed in 
that CR.

Thanks,
Coleen

On 10/31/18 7:55 AM, coleen.phillimore at oracle.com wrote:
>
> Thank you Erik for answering David's email.
>
> In the updated version of this patch: 
> http://cr.openjdk.java.net/~coleenp/8213092.02/webrev
> please note as Martin pointed out, I had the is_concurrent logic 
> backwards in SystemDictionary::do_unloading().
>
> The JavaThreads OR the GC threads contend on 
> ClassLoaderDataGraph_lock.  The STS joiner logic that I cut/pasted 
> assures that safepoints are not taken while holding the lock.  So the 
> new code is actually more correct than what we have.
>
> There is an existing problem that you're right to worry about in the 
> VM thread doing unloading.  Consider:
>
> JavaThread - takes out ClassLoaderDataGraph_lock (setting owner, 
> ignore sneaking) to iterate over the CLDG, then safepoints while 
> holding the lock
> VMThread at safepoint does either 1 or two things:
>      1. Calls the HeapDumper or JFR and tries to iterate over the 
> CLDG, but is blocked at the safepoint.
>      2.  Does class unloading and breaks the CLDG links that the 
> iterator is using.
>
> Both of these are currently existing possibilities, which I'd like to 
> address in https://bugs.openjdk.java.net/browse/JDK-8213150
>
> But maybe that bug is too general, or the title is too general. It 
> would be nice if this wasn't such a minefield.
>
> I think Erik should come talk to us about the STS joiner logic.
>
> Thanks,
> Coleen
>
>
> On 10/31/18 4:42 AM, David Holmes wrote:
>> Hi Erik,
>>
>> So the magic STS Joiner ensures everything works okay. But how can I 
>> tell from the code that you are in fact using the STS Joiner? Can it be:
>>
>> assert_locked_by_sts_joiner_or_safepoint(...);
>>
>> ? Is that at all possible?
>>
>> I don't know when SuspendibleThreadSet came into play or exactly how 
>> it operates.
>>
>> Thanks,
>> David
>>
>> On 31/10/2018 6:19 PM, Erik Österlund wrote:
>>> Hi David,
>>>
>>> On 2018-10-31 01:56, David Holmes wrote:
>>>> On 30/10/2018 11:16 PM, coleen.phillimore at oracle.com wrote:
>>>>> On 10/30/18 8:36 AM, David Holmes wrote:
>>>>>> On 30/10/2018 10:24 PM, coleen.phillimore at oracle.com wrote:
>>>>>>> On 10/30/18 12:17 AM, David Holmes wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> On 30/10/2018 1:45 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>> Summary: Add locks for calling CLDG::purge concurrently as 
>>>>>>>>> well and for calling SystemDictionary::do_unloading concurrently.
>>>>>>>>>
>>>>>>>>> Ran linux-x64 tier1-6 through mach5 and hotspot/jtreg/runtime 
>>>>>>>>> tests, which include the module tests.
>>>>>>>>>
>>>>>>>>> open webrev at 
>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8213092.01/webrev
>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8213092
>>>>>>>>
>>>>>>>> So ... all the locks covered by an assert_locked_or_safepoint, 
>>>>>>>> or which are acquired by the VMThread at a safepoint, must 
>>>>>>>> never be held by a JavaThread if it could reach a safepoint 
>>>>>>>> whilst that lock is held - else we could deadlock. So can we 
>>>>>>>> check that with NoSafepointVerifiers?
>>>>>>>
>>>>>>> Actually I think this is not possible to add NSV. You can 
>>>>>>> acquire the ClassLoaderDataGraph_lock and then the Module_lock. 
>>>>>>> The latter would check for a safepoint also for a Java thread. 
>>>>>>> This is currently done for Jvmti and JFR, but not in other code 
>>>>>>> that I can see.  I don't actually know how to fix this problem.
>>>>>>
>>>>>> This seems risky. If a JavaThread can hold the CLDG_lock while 
>>>>>> blocked at a safepoint (acquiring the Module_lock), then what is 
>>>>>> to stop the VMThread from hitting one of these sections of code 
>>>>>> protected by locked_or_safepoint and then proceeding into what is 
>>>>>> effectively a critical section (by virtue of there being a 
>>>>>> safepoint) when the JavaThread is itself in the midst of a 
>>>>>> critical section? Do we actively take steps to prevent this 
>>>>>> somehow, or to make it safe for the VMThread to proceed?
>>>>>
>>>>> No we don't.  I think we have this problem today (not introduced 
>>>>> or made worse by this patch).
>>>>
>>>> It is made worse by this patch. Before this patch you have code 
>>>> only executed by the VMThread at a safepoint. After the patch you 
>>>> have code that may be executed by the VMThread at a safepoint or by 
>>>> another thread under a lock (and there may or may not also be a 
>>>> safepoint). And this allows the VMThread to execute the same code 
>>>> at the same as a lock-holding thread executes it!
>>>
>>> The scenario you describe can not happen. The code you are referring 
>>> to is executed either by the VM thread in a safepoint, or by a 
>>> concurrent GC thread outside of a safepoint, holding the STS joiner, 
>>> and hence effectively blocking concurrent safepoints from entering 
>>> the scope. The two can never run concurrent to each other.
>>>
>>> I tried going down the path of not using the STS joiner in an 
>>> earlier prototype, but it required major rewrites due to too many 
>>> assumptions being violated. And yet I could not convince myself what 
>>> I was doing was safe, due to possibly strange interactions with 
>>> crazy safepoint operations, like redefinition. So I dropped that and 
>>> decided to protect these operations with the STS joiner instead.
>>>
>>>>> I'll file an bug to fix it and hopefully add detection for this.  
>>>>> I think we don't need to take CLDG_lock in a safepoint and should 
>>>>> prevent doing so, but things like this are more reliable to do 
>>>>> with computers than visual inspection.  Can you add your 
>>>>> suggestions to my RFE? 
>>>>> https://bugs.openjdk.java.net/browse/JDK-8213150
>>>>
>>>> This needs to be fixed before enabling the current patch IMHO. At 
>>>> least the two usages you mention in your follow up email need some 
>>>> kind of guard to ensure a safepoint can't be reached while the lock 
>>>> is held. Or the VMThread must also acquire the lock(s) - though if 
>>>> NJTs are involved there is a risk of breakage due to lock sneaking!
>>>
>>> These locks are taken under the STS joiner, specifically to dodge 
>>> the issues you refer to. The STS joiner makes sure that the locks 
>>> you are taking do not suffer from bad lock sneaking interactions, 
>>> because they do not take the locks concurrent to the VM thread. So 
>>> we have the safety we need. I agree that the CLDG lock does not need 
>>> to be taken by the VM thread in a safepoint, simply because it is 
>>> unnecessary and risky. But that is orthogonal to this patch.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> David
>>>> -----
>>>>
>>>>> The CLDG lock can be shared by non-java threads and java threads, 
>>>>> which is the point. There may be other locks though.
>>>>
>>>>
>>>>
>>>>>>
>>>>>>>
>>>>>>> The locks added in this patch set though are for the 
>>>>>>> NonJavaThreads, who do not do safepoint checks.  The 
>>>>>>> NonJavaThreads that acquire these locks use the STS joiner 
>>>>>>> mechanism which disallows safepoints while being held (and since 
>>>>>>> they are non Java threads, do not check for safepoints themselves).
>>>>>>>
>>>>>>> This is how it's going to look for the ZGC caller:
>>>>>>>
>>>>>>> {
>>>>>>> SuspendibleThreadSetJoiner sts_joiner;
>>>>>>> // Unlink the classes.
>>>>>>> MutexLockerEx ml(ClassLoaderDataGraph_lock);
>>>>>>> unloading_occurred = 
>>>>>>> SystemDictionary::do_unloading(ZStatPhase::timer(),
>>>>>>> true /* do_cleaning */););
>>>>>>> }
>>>>>>
>>>>>> Somehow I missed the creation/invention of the STS joiner mechanism.
>>>>>>
>>>>>
>>>>> Me too!   It's in gc/shared but it's really runtime code, except 
>>>>> people in runtime didn't know about it because it's used by GC 
>>>>> threads.
>>>>>
>>>>> Let me know if you have more questions and can review this code.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>>> David
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Further, are these locks acquired by non-JavaThreads such that 
>>>>>>>> the VMThread may be delayed whilst a safepoint is active?
>>>>>>>
>>>>>>> Yes, theoretically they could delay the VMThread from getting to 
>>>>>>> a safepoint or doing its work while in a safepoint but the 
>>>>>>> threads that take out these locks only hold them for short 
>>>>>>> durations.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>
>>>>>
>



More information about the hotspot-runtime-dev mailing list