RFR (S) 8213092: Add more runtime locks for concurrent class unloading

Erik Österlund erik.osterlund at oracle.com
Fri Nov 16 14:52:45 UTC 2018


Hi Coleen,

Looks good, maybe just swap the names of ml2 and ml1 so that ml1 comes 
first and ml2 after. But I don't need to see another webrev for that.

/Erik

On 2018-11-15 16:26, coleen.phillimore at oracle.com wrote:
> 
> The changes for 8213751: ClassLoaderDataGraph::cld_do() should sometimes 
> require CLDG_lock
> add a NoSafepointVerifier when iterating over the CLDG.
> 
> This change has been tested on top of that change.
> 
> open webrev at http://cr.openjdk.java.net/~coleenp/8213092.03/webrev
> bug link https://bugs.openjdk.java.net/browse/JDK-8213092
> 
> This change is needed for concurrent class unloading.
> 
> Please review both.  Any further changes for 
> https://bugs.openjdk.java.net/browse/JDK-8213150 can be discussed in 
> that CR.
> 
> Thanks,
> Coleen
> 
> On 10/31/18 7:55 AM, coleen.phillimore at oracle.com wrote:
>>
>> Thank you Erik for answering David's email.
>>
>> In the updated version of this patch: 
>> http://cr.openjdk.java.net/~coleenp/8213092.02/webrev
>> please note as Martin pointed out, I had the is_concurrent logic 
>> backwards in SystemDictionary::do_unloading().
>>
>> The JavaThreads OR the GC threads contend on 
>> ClassLoaderDataGraph_lock.  The STS joiner logic that I cut/pasted 
>> assures that safepoints are not taken while holding the lock.  So the 
>> new code is actually more correct than what we have.
>>
>> There is an existing problem that you're right to worry about in the 
>> VM thread doing unloading.  Consider:
>>
>> JavaThread - takes out ClassLoaderDataGraph_lock (setting owner, 
>> ignore sneaking) to iterate over the CLDG, then safepoints while 
>> holding the lock
>> VMThread at safepoint does either 1 or two things:
>>      1. Calls the HeapDumper or JFR and tries to iterate over the 
>> CLDG, but is blocked at the safepoint.
>>      2.  Does class unloading and breaks the CLDG links that the 
>> iterator is using.
>>
>> Both of these are currently existing possibilities, which I'd like to 
>> address in https://bugs.openjdk.java.net/browse/JDK-8213150
>>
>> But maybe that bug is too general, or the title is too general. It 
>> would be nice if this wasn't such a minefield.
>>
>> I think Erik should come talk to us about the STS joiner logic.
>>
>> Thanks,
>> Coleen
>>
>>
>> On 10/31/18 4:42 AM, David Holmes wrote:
>>> Hi Erik,
>>>
>>> So the magic STS Joiner ensures everything works okay. But how can I 
>>> tell from the code that you are in fact using the STS Joiner? Can it be:
>>>
>>> assert_locked_by_sts_joiner_or_safepoint(...);
>>>
>>> ? Is that at all possible?
>>>
>>> I don't know when SuspendibleThreadSet came into play or exactly how 
>>> it operates.
>>>
>>> Thanks,
>>> David
>>>
>>> On 31/10/2018 6:19 PM, Erik Österlund wrote:
>>>> Hi David,
>>>>
>>>> On 2018-10-31 01:56, David Holmes wrote:
>>>>> On 30/10/2018 11:16 PM, coleen.phillimore at oracle.com wrote:
>>>>>> On 10/30/18 8:36 AM, David Holmes wrote:
>>>>>>> On 30/10/2018 10:24 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>> On 10/30/18 12:17 AM, David Holmes wrote:
>>>>>>>>> Hi Coleen,
>>>>>>>>>
>>>>>>>>> On 30/10/2018 1:45 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>>> Summary: Add locks for calling CLDG::purge concurrently as 
>>>>>>>>>> well and for calling SystemDictionary::do_unloading concurrently.
>>>>>>>>>>
>>>>>>>>>> Ran linux-x64 tier1-6 through mach5 and hotspot/jtreg/runtime 
>>>>>>>>>> tests, which include the module tests.
>>>>>>>>>>
>>>>>>>>>> open webrev at 
>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8213092.01/webrev
>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8213092
>>>>>>>>>
>>>>>>>>> So ... all the locks covered by an assert_locked_or_safepoint, 
>>>>>>>>> or which are acquired by the VMThread at a safepoint, must 
>>>>>>>>> never be held by a JavaThread if it could reach a safepoint 
>>>>>>>>> whilst that lock is held - else we could deadlock. So can we 
>>>>>>>>> check that with NoSafepointVerifiers?
>>>>>>>>
>>>>>>>> Actually I think this is not possible to add NSV. You can 
>>>>>>>> acquire the ClassLoaderDataGraph_lock and then the Module_lock. 
>>>>>>>> The latter would check for a safepoint also for a Java thread. 
>>>>>>>> This is currently done for Jvmti and JFR, but not in other code 
>>>>>>>> that I can see.  I don't actually know how to fix this problem.
>>>>>>>
>>>>>>> This seems risky. If a JavaThread can hold the CLDG_lock while 
>>>>>>> blocked at a safepoint (acquiring the Module_lock), then what is 
>>>>>>> to stop the VMThread from hitting one of these sections of code 
>>>>>>> protected by locked_or_safepoint and then proceeding into what is 
>>>>>>> effectively a critical section (by virtue of there being a 
>>>>>>> safepoint) when the JavaThread is itself in the midst of a 
>>>>>>> critical section? Do we actively take steps to prevent this 
>>>>>>> somehow, or to make it safe for the VMThread to proceed?
>>>>>>
>>>>>> No we don't.  I think we have this problem today (not introduced 
>>>>>> or made worse by this patch).
>>>>>
>>>>> It is made worse by this patch. Before this patch you have code 
>>>>> only executed by the VMThread at a safepoint. After the patch you 
>>>>> have code that may be executed by the VMThread at a safepoint or by 
>>>>> another thread under a lock (and there may or may not also be a 
>>>>> safepoint). And this allows the VMThread to execute the same code 
>>>>> at the same as a lock-holding thread executes it!
>>>>
>>>> The scenario you describe can not happen. The code you are referring 
>>>> to is executed either by the VM thread in a safepoint, or by a 
>>>> concurrent GC thread outside of a safepoint, holding the STS joiner, 
>>>> and hence effectively blocking concurrent safepoints from entering 
>>>> the scope. The two can never run concurrent to each other.
>>>>
>>>> I tried going down the path of not using the STS joiner in an 
>>>> earlier prototype, but it required major rewrites due to too many 
>>>> assumptions being violated. And yet I could not convince myself what 
>>>> I was doing was safe, due to possibly strange interactions with 
>>>> crazy safepoint operations, like redefinition. So I dropped that and 
>>>> decided to protect these operations with the STS joiner instead.
>>>>
>>>>>> I'll file an bug to fix it and hopefully add detection for this. I 
>>>>>> think we don't need to take CLDG_lock in a safepoint and should 
>>>>>> prevent doing so, but things like this are more reliable to do 
>>>>>> with computers than visual inspection.  Can you add your 
>>>>>> suggestions to my RFE? 
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8213150
>>>>>
>>>>> This needs to be fixed before enabling the current patch IMHO. At 
>>>>> least the two usages you mention in your follow up email need some 
>>>>> kind of guard to ensure a safepoint can't be reached while the lock 
>>>>> is held. Or the VMThread must also acquire the lock(s) - though if 
>>>>> NJTs are involved there is a risk of breakage due to lock sneaking!
>>>>
>>>> These locks are taken under the STS joiner, specifically to dodge 
>>>> the issues you refer to. The STS joiner makes sure that the locks 
>>>> you are taking do not suffer from bad lock sneaking interactions, 
>>>> because they do not take the locks concurrent to the VM thread. So 
>>>> we have the safety we need. I agree that the CLDG lock does not need 
>>>> to be taken by the VM thread in a safepoint, simply because it is 
>>>> unnecessary and risky. But that is orthogonal to this patch.
>>>>
>>>> Thanks,
>>>> /Erik
>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> The CLDG lock can be shared by non-java threads and java threads, 
>>>>>> which is the point. There may be other locks though.
>>>>>
>>>>>
>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> The locks added in this patch set though are for the 
>>>>>>>> NonJavaThreads, who do not do safepoint checks.  The 
>>>>>>>> NonJavaThreads that acquire these locks use the STS joiner 
>>>>>>>> mechanism which disallows safepoints while being held (and since 
>>>>>>>> they are non Java threads, do not check for safepoints themselves).
>>>>>>>>
>>>>>>>> This is how it's going to look for the ZGC caller:
>>>>>>>>
>>>>>>>> {
>>>>>>>> SuspendibleThreadSetJoiner sts_joiner;
>>>>>>>> // Unlink the classes.
>>>>>>>> MutexLockerEx ml(ClassLoaderDataGraph_lock);
>>>>>>>> unloading_occurred = 
>>>>>>>> SystemDictionary::do_unloading(ZStatPhase::timer(),
>>>>>>>> true /* do_cleaning */););
>>>>>>>> }
>>>>>>>
>>>>>>> Somehow I missed the creation/invention of the STS joiner mechanism.
>>>>>>>
>>>>>>
>>>>>> Me too!   It's in gc/shared but it's really runtime code, except 
>>>>>> people in runtime didn't know about it because it's used by GC 
>>>>>> threads.
>>>>>>
>>>>>> Let me know if you have more questions and can review this code.
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Further, are these locks acquired by non-JavaThreads such that 
>>>>>>>>> the VMThread may be delayed whilst a safepoint is active?
>>>>>>>>
>>>>>>>> Yes, theoretically they could delay the VMThread from getting to 
>>>>>>>> a safepoint or doing its work while in a safepoint but the 
>>>>>>>> threads that take out these locks only hold them for short 
>>>>>>>> durations.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Coleen
>>>>>>>>
>>>>>>
>>
> 


More information about the hotspot-runtime-dev mailing list