RFR (S) 8213092: Add more runtime locks for concurrent class unloading

Harold David Seigel harold.seigel at oracle.com
Fri Nov 16 15:42:13 UTC 2018


Hi Coleen,

It looks good to me, also.

Thanks, Harold

On 11/16/2018 9:52 AM, Erik Österlund wrote:
> Hi Coleen,
>
> Looks good, maybe just swap the names of ml2 and ml1 so that ml1 comes 
> first and ml2 after. But I don't need to see another webrev for that.
>
> /Erik
>
> On 2018-11-15 16:26, coleen.phillimore at oracle.com wrote:
>>
>> The changes for 8213751: ClassLoaderDataGraph::cld_do() should 
>> sometimes require CLDG_lock
>> add a NoSafepointVerifier when iterating over the CLDG.
>>
>> This change has been tested on top of that change.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8213092.03/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8213092
>>
>> This change is needed for concurrent class unloading.
>>
>> Please review both.  Any further changes for 
>> https://bugs.openjdk.java.net/browse/JDK-8213150 can be discussed in 
>> that CR.
>>
>> Thanks,
>> Coleen
>>
>> On 10/31/18 7:55 AM, coleen.phillimore at oracle.com wrote:
>>>
>>> Thank you Erik for answering David's email.
>>>
>>> In the updated version of this patch: 
>>> http://cr.openjdk.java.net/~coleenp/8213092.02/webrev
>>> please note as Martin pointed out, I had the is_concurrent logic 
>>> backwards in SystemDictionary::do_unloading().
>>>
>>> The JavaThreads OR the GC threads contend on 
>>> ClassLoaderDataGraph_lock.  The STS joiner logic that I cut/pasted 
>>> assures that safepoints are not taken while holding the lock.  So 
>>> the new code is actually more correct than what we have.
>>>
>>> There is an existing problem that you're right to worry about in the 
>>> VM thread doing unloading.  Consider:
>>>
>>> JavaThread - takes out ClassLoaderDataGraph_lock (setting owner, 
>>> ignore sneaking) to iterate over the CLDG, then safepoints while 
>>> holding the lock
>>> VMThread at safepoint does either 1 or two things:
>>>      1. Calls the HeapDumper or JFR and tries to iterate over the 
>>> CLDG, but is blocked at the safepoint.
>>>      2.  Does class unloading and breaks the CLDG links that the 
>>> iterator is using.
>>>
>>> Both of these are currently existing possibilities, which I'd like 
>>> to address in https://bugs.openjdk.java.net/browse/JDK-8213150
>>>
>>> But maybe that bug is too general, or the title is too general. It 
>>> would be nice if this wasn't such a minefield.
>>>
>>> I think Erik should come talk to us about the STS joiner logic.
>>>
>>> Thanks,
>>> Coleen
>>>
>>>
>>> On 10/31/18 4:42 AM, David Holmes wrote:
>>>> Hi Erik,
>>>>
>>>> So the magic STS Joiner ensures everything works okay. But how can 
>>>> I tell from the code that you are in fact using the STS Joiner? Can 
>>>> it be:
>>>>
>>>> assert_locked_by_sts_joiner_or_safepoint(...);
>>>>
>>>> ? Is that at all possible?
>>>>
>>>> I don't know when SuspendibleThreadSet came into play or exactly 
>>>> how it operates.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 31/10/2018 6:19 PM, Erik Österlund wrote:
>>>>> Hi David,
>>>>>
>>>>> On 2018-10-31 01:56, David Holmes wrote:
>>>>>> On 30/10/2018 11:16 PM, coleen.phillimore at oracle.com wrote:
>>>>>>> On 10/30/18 8:36 AM, David Holmes wrote:
>>>>>>>> On 30/10/2018 10:24 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>> On 10/30/18 12:17 AM, David Holmes wrote:
>>>>>>>>>> Hi Coleen,
>>>>>>>>>>
>>>>>>>>>> On 30/10/2018 1:45 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>>>> Summary: Add locks for calling CLDG::purge concurrently as 
>>>>>>>>>>> well and for calling SystemDictionary::do_unloading 
>>>>>>>>>>> concurrently.
>>>>>>>>>>>
>>>>>>>>>>> Ran linux-x64 tier1-6 through mach5 and 
>>>>>>>>>>> hotspot/jtreg/runtime tests, which include the module tests.
>>>>>>>>>>>
>>>>>>>>>>> open webrev at 
>>>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8213092.01/webrev
>>>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8213092
>>>>>>>>>>
>>>>>>>>>> So ... all the locks covered by an 
>>>>>>>>>> assert_locked_or_safepoint, or which are acquired by the 
>>>>>>>>>> VMThread at a safepoint, must never be held by a JavaThread 
>>>>>>>>>> if it could reach a safepoint whilst that lock is held - else 
>>>>>>>>>> we could deadlock. So can we check that with 
>>>>>>>>>> NoSafepointVerifiers?
>>>>>>>>>
>>>>>>>>> Actually I think this is not possible to add NSV. You can 
>>>>>>>>> acquire the ClassLoaderDataGraph_lock and then the 
>>>>>>>>> Module_lock. The latter would check for a safepoint also for a 
>>>>>>>>> Java thread. This is currently done for Jvmti and JFR, but not 
>>>>>>>>> in other code that I can see.  I don't actually know how to 
>>>>>>>>> fix this problem.
>>>>>>>>
>>>>>>>> This seems risky. If a JavaThread can hold the CLDG_lock while 
>>>>>>>> blocked at a safepoint (acquiring the Module_lock), then what 
>>>>>>>> is to stop the VMThread from hitting one of these sections of 
>>>>>>>> code protected by locked_or_safepoint and then proceeding into 
>>>>>>>> what is effectively a critical section (by virtue of there 
>>>>>>>> being a safepoint) when the JavaThread is itself in the midst 
>>>>>>>> of a critical section? Do we actively take steps to prevent 
>>>>>>>> this somehow, or to make it safe for the VMThread to proceed?
>>>>>>>
>>>>>>> No we don't.  I think we have this problem today (not introduced 
>>>>>>> or made worse by this patch).
>>>>>>
>>>>>> It is made worse by this patch. Before this patch you have code 
>>>>>> only executed by the VMThread at a safepoint. After the patch you 
>>>>>> have code that may be executed by the VMThread at a safepoint or 
>>>>>> by another thread under a lock (and there may or may not also be 
>>>>>> a safepoint). And this allows the VMThread to execute the same 
>>>>>> code at the same as a lock-holding thread executes it!
>>>>>
>>>>> The scenario you describe can not happen. The code you are 
>>>>> referring to is executed either by the VM thread in a safepoint, 
>>>>> or by a concurrent GC thread outside of a safepoint, holding the 
>>>>> STS joiner, and hence effectively blocking concurrent safepoints 
>>>>> from entering the scope. The two can never run concurrent to each 
>>>>> other.
>>>>>
>>>>> I tried going down the path of not using the STS joiner in an 
>>>>> earlier prototype, but it required major rewrites due to too many 
>>>>> assumptions being violated. And yet I could not convince myself 
>>>>> what I was doing was safe, due to possibly strange interactions 
>>>>> with crazy safepoint operations, like redefinition. So I dropped 
>>>>> that and decided to protect these operations with the STS joiner 
>>>>> instead.
>>>>>
>>>>>>> I'll file an bug to fix it and hopefully add detection for this. 
>>>>>>> I think we don't need to take CLDG_lock in a safepoint and 
>>>>>>> should prevent doing so, but things like this are more reliable 
>>>>>>> to do with computers than visual inspection. Can you add your 
>>>>>>> suggestions to my RFE? 
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8213150
>>>>>>
>>>>>> This needs to be fixed before enabling the current patch IMHO. At 
>>>>>> least the two usages you mention in your follow up email need 
>>>>>> some kind of guard to ensure a safepoint can't be reached while 
>>>>>> the lock is held. Or the VMThread must also acquire the lock(s) - 
>>>>>> though if NJTs are involved there is a risk of breakage due to 
>>>>>> lock sneaking!
>>>>>
>>>>> These locks are taken under the STS joiner, specifically to dodge 
>>>>> the issues you refer to. The STS joiner makes sure that the locks 
>>>>> you are taking do not suffer from bad lock sneaking interactions, 
>>>>> because they do not take the locks concurrent to the VM thread. So 
>>>>> we have the safety we need. I agree that the CLDG lock does not 
>>>>> need to be taken by the VM thread in a safepoint, simply because 
>>>>> it is unnecessary and risky. But that is orthogonal to this patch.
>>>>>
>>>>> Thanks,
>>>>> /Erik
>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> The CLDG lock can be shared by non-java threads and java 
>>>>>>> threads, which is the point. There may be other locks though.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> The locks added in this patch set though are for the 
>>>>>>>>> NonJavaThreads, who do not do safepoint checks.  The 
>>>>>>>>> NonJavaThreads that acquire these locks use the STS joiner 
>>>>>>>>> mechanism which disallows safepoints while being held (and 
>>>>>>>>> since they are non Java threads, do not check for safepoints 
>>>>>>>>> themselves).
>>>>>>>>>
>>>>>>>>> This is how it's going to look for the ZGC caller:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> SuspendibleThreadSetJoiner sts_joiner;
>>>>>>>>> // Unlink the classes.
>>>>>>>>> MutexLockerEx ml(ClassLoaderDataGraph_lock);
>>>>>>>>> unloading_occurred = 
>>>>>>>>> SystemDictionary::do_unloading(ZStatPhase::timer(),
>>>>>>>>> true /* do_cleaning */););
>>>>>>>>> }
>>>>>>>>
>>>>>>>> Somehow I missed the creation/invention of the STS joiner 
>>>>>>>> mechanism.
>>>>>>>>
>>>>>>>
>>>>>>> Me too!   It's in gc/shared but it's really runtime code, except 
>>>>>>> people in runtime didn't know about it because it's used by GC 
>>>>>>> threads.
>>>>>>>
>>>>>>> Let me know if you have more questions and can review this code.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Further, are these locks acquired by non-JavaThreads such 
>>>>>>>>>> that the VMThread may be delayed whilst a safepoint is active?
>>>>>>>>>
>>>>>>>>> Yes, theoretically they could delay the VMThread from getting 
>>>>>>>>> to a safepoint or doing its work while in a safepoint but the 
>>>>>>>>> threads that take out these locks only hold them for short 
>>>>>>>>> durations.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Coleen
>>>>>>>>>
>>>>>>>
>>>
>>



More information about the hotspot-runtime-dev mailing list