RFR (S) 8213092: Add more runtime locks for concurrent class unloading

coleen.phillimore at oracle.com coleen.phillimore at oracle.com
Fri Nov 16 16:22:45 UTC 2018


Thanks Erik and Harold.

David, I'm checking this in.  If you find a further bug in this, after 
adding the NSV, please file a CR for it.

thanks,
Coleen

On 11/16/18 10:42 AM, Harold David Seigel wrote:
> Hi Coleen,
>
> It looks good to me, also.
>
> Thanks, Harold
>
> On 11/16/2018 9:52 AM, Erik Österlund wrote:
>> Hi Coleen,
>>
>> Looks good, maybe just swap the names of ml2 and ml1 so that ml1 
>> comes first and ml2 after. But I don't need to see another webrev for 
>> that.
>>
>> /Erik
>>
>> On 2018-11-15 16:26, coleen.phillimore at oracle.com wrote:
>>>
>>> The changes for 8213751: ClassLoaderDataGraph::cld_do() should 
>>> sometimes require CLDG_lock
>>> add a NoSafepointVerifier when iterating over the CLDG.
>>>
>>> This change has been tested on top of that change.
>>>
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8213092.03/webrev
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8213092
>>>
>>> This change is needed for concurrent class unloading.
>>>
>>> Please review both.  Any further changes for 
>>> https://bugs.openjdk.java.net/browse/JDK-8213150 can be discussed in 
>>> that CR.
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 10/31/18 7:55 AM, coleen.phillimore at oracle.com wrote:
>>>>
>>>> Thank you Erik for answering David's email.
>>>>
>>>> In the updated version of this patch: 
>>>> http://cr.openjdk.java.net/~coleenp/8213092.02/webrev
>>>> please note as Martin pointed out, I had the is_concurrent logic 
>>>> backwards in SystemDictionary::do_unloading().
>>>>
>>>> The JavaThreads OR the GC threads contend on 
>>>> ClassLoaderDataGraph_lock.  The STS joiner logic that I cut/pasted 
>>>> assures that safepoints are not taken while holding the lock.  So 
>>>> the new code is actually more correct than what we have.
>>>>
>>>> There is an existing problem that you're right to worry about in 
>>>> the VM thread doing unloading.  Consider:
>>>>
>>>> JavaThread - takes out ClassLoaderDataGraph_lock (setting owner, 
>>>> ignore sneaking) to iterate over the CLDG, then safepoints while 
>>>> holding the lock
>>>> VMThread at safepoint does either 1 or two things:
>>>>      1. Calls the HeapDumper or JFR and tries to iterate over the 
>>>> CLDG, but is blocked at the safepoint.
>>>>      2.  Does class unloading and breaks the CLDG links that the 
>>>> iterator is using.
>>>>
>>>> Both of these are currently existing possibilities, which I'd like 
>>>> to address in https://bugs.openjdk.java.net/browse/JDK-8213150
>>>>
>>>> But maybe that bug is too general, or the title is too general. It 
>>>> would be nice if this wasn't such a minefield.
>>>>
>>>> I think Erik should come talk to us about the STS joiner logic.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>>
>>>> On 10/31/18 4:42 AM, David Holmes wrote:
>>>>> Hi Erik,
>>>>>
>>>>> So the magic STS Joiner ensures everything works okay. But how can 
>>>>> I tell from the code that you are in fact using the STS Joiner? 
>>>>> Can it be:
>>>>>
>>>>> assert_locked_by_sts_joiner_or_safepoint(...);
>>>>>
>>>>> ? Is that at all possible?
>>>>>
>>>>> I don't know when SuspendibleThreadSet came into play or exactly 
>>>>> how it operates.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 31/10/2018 6:19 PM, Erik Österlund wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> On 2018-10-31 01:56, David Holmes wrote:
>>>>>>> On 30/10/2018 11:16 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>> On 10/30/18 8:36 AM, David Holmes wrote:
>>>>>>>>> On 30/10/2018 10:24 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>>> On 10/30/18 12:17 AM, David Holmes wrote:
>>>>>>>>>>> Hi Coleen,
>>>>>>>>>>>
>>>>>>>>>>> On 30/10/2018 1:45 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>>>>> Summary: Add locks for calling CLDG::purge concurrently as 
>>>>>>>>>>>> well and for calling SystemDictionary::do_unloading 
>>>>>>>>>>>> concurrently.
>>>>>>>>>>>>
>>>>>>>>>>>> Ran linux-x64 tier1-6 through mach5 and 
>>>>>>>>>>>> hotspot/jtreg/runtime tests, which include the module tests.
>>>>>>>>>>>>
>>>>>>>>>>>> open webrev at 
>>>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8213092.01/webrev
>>>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8213092
>>>>>>>>>>>
>>>>>>>>>>> So ... all the locks covered by an 
>>>>>>>>>>> assert_locked_or_safepoint, or which are acquired by the 
>>>>>>>>>>> VMThread at a safepoint, must never be held by a JavaThread 
>>>>>>>>>>> if it could reach a safepoint whilst that lock is held - 
>>>>>>>>>>> else we could deadlock. So can we check that with 
>>>>>>>>>>> NoSafepointVerifiers?
>>>>>>>>>>
>>>>>>>>>> Actually I think this is not possible to add NSV. You can 
>>>>>>>>>> acquire the ClassLoaderDataGraph_lock and then the 
>>>>>>>>>> Module_lock. The latter would check for a safepoint also for 
>>>>>>>>>> a Java thread. This is currently done for Jvmti and JFR, but 
>>>>>>>>>> not in other code that I can see.  I don't actually know how 
>>>>>>>>>> to fix this problem.
>>>>>>>>>
>>>>>>>>> This seems risky. If a JavaThread can hold the CLDG_lock while 
>>>>>>>>> blocked at a safepoint (acquiring the Module_lock), then what 
>>>>>>>>> is to stop the VMThread from hitting one of these sections of 
>>>>>>>>> code protected by locked_or_safepoint and then proceeding into 
>>>>>>>>> what is effectively a critical section (by virtue of there 
>>>>>>>>> being a safepoint) when the JavaThread is itself in the midst 
>>>>>>>>> of a critical section? Do we actively take steps to prevent 
>>>>>>>>> this somehow, or to make it safe for the VMThread to proceed?
>>>>>>>>
>>>>>>>> No we don't.  I think we have this problem today (not 
>>>>>>>> introduced or made worse by this patch).
>>>>>>>
>>>>>>> It is made worse by this patch. Before this patch you have code 
>>>>>>> only executed by the VMThread at a safepoint. After the patch 
>>>>>>> you have code that may be executed by the VMThread at a 
>>>>>>> safepoint or by another thread under a lock (and there may or 
>>>>>>> may not also be a safepoint). And this allows the VMThread to 
>>>>>>> execute the same code at the same as a lock-holding thread 
>>>>>>> executes it!
>>>>>>
>>>>>> The scenario you describe can not happen. The code you are 
>>>>>> referring to is executed either by the VM thread in a safepoint, 
>>>>>> or by a concurrent GC thread outside of a safepoint, holding the 
>>>>>> STS joiner, and hence effectively blocking concurrent safepoints 
>>>>>> from entering the scope. The two can never run concurrent to each 
>>>>>> other.
>>>>>>
>>>>>> I tried going down the path of not using the STS joiner in an 
>>>>>> earlier prototype, but it required major rewrites due to too many 
>>>>>> assumptions being violated. And yet I could not convince myself 
>>>>>> what I was doing was safe, due to possibly strange interactions 
>>>>>> with crazy safepoint operations, like redefinition. So I dropped 
>>>>>> that and decided to protect these operations with the STS joiner 
>>>>>> instead.
>>>>>>
>>>>>>>> I'll file an bug to fix it and hopefully add detection for 
>>>>>>>> this. I think we don't need to take CLDG_lock in a safepoint 
>>>>>>>> and should prevent doing so, but things like this are more 
>>>>>>>> reliable to do with computers than visual inspection. Can you 
>>>>>>>> add your suggestions to my RFE? 
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8213150
>>>>>>>
>>>>>>> This needs to be fixed before enabling the current patch IMHO. 
>>>>>>> At least the two usages you mention in your follow up email need 
>>>>>>> some kind of guard to ensure a safepoint can't be reached while 
>>>>>>> the lock is held. Or the VMThread must also acquire the lock(s) 
>>>>>>> - though if NJTs are involved there is a risk of breakage due to 
>>>>>>> lock sneaking!
>>>>>>
>>>>>> These locks are taken under the STS joiner, specifically to dodge 
>>>>>> the issues you refer to. The STS joiner makes sure that the locks 
>>>>>> you are taking do not suffer from bad lock sneaking interactions, 
>>>>>> because they do not take the locks concurrent to the VM thread. 
>>>>>> So we have the safety we need. I agree that the CLDG lock does 
>>>>>> not need to be taken by the VM thread in a safepoint, simply 
>>>>>> because it is unnecessary and risky. But that is orthogonal to 
>>>>>> this patch.
>>>>>>
>>>>>> Thanks,
>>>>>> /Erik
>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> The CLDG lock can be shared by non-java threads and java 
>>>>>>>> threads, which is the point. There may be other locks though.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The locks added in this patch set though are for the 
>>>>>>>>>> NonJavaThreads, who do not do safepoint checks.  The 
>>>>>>>>>> NonJavaThreads that acquire these locks use the STS joiner 
>>>>>>>>>> mechanism which disallows safepoints while being held (and 
>>>>>>>>>> since they are non Java threads, do not check for safepoints 
>>>>>>>>>> themselves).
>>>>>>>>>>
>>>>>>>>>> This is how it's going to look for the ZGC caller:
>>>>>>>>>>
>>>>>>>>>> {
>>>>>>>>>> SuspendibleThreadSetJoiner sts_joiner;
>>>>>>>>>> // Unlink the classes.
>>>>>>>>>> MutexLockerEx ml(ClassLoaderDataGraph_lock);
>>>>>>>>>> unloading_occurred = 
>>>>>>>>>> SystemDictionary::do_unloading(ZStatPhase::timer(),
>>>>>>>>>> true /* do_cleaning */););
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Somehow I missed the creation/invention of the STS joiner 
>>>>>>>>> mechanism.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Me too!   It's in gc/shared but it's really runtime code, 
>>>>>>>> except people in runtime didn't know about it because it's used 
>>>>>>>> by GC threads.
>>>>>>>>
>>>>>>>> Let me know if you have more questions and can review this code.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Further, are these locks acquired by non-JavaThreads such 
>>>>>>>>>>> that the VMThread may be delayed whilst a safepoint is active?
>>>>>>>>>>
>>>>>>>>>> Yes, theoretically they could delay the VMThread from getting 
>>>>>>>>>> to a safepoint or doing its work while in a safepoint but the 
>>>>>>>>>> threads that take out these locks only hold them for short 
>>>>>>>>>> durations.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Coleen
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Coleen
>>>>>>>>>>
>>>>>>>>
>>>>
>>>
>



More information about the hotspot-runtime-dev mailing list