RFR (S) 8213092: Add more runtime locks for concurrent class unloading

Wed Oct 31 11:55:45 UTC 2018

Thank you Erik for answering David's email.

In the updated version of this patch: 
http://cr.openjdk.java.net/~coleenp/8213092.02/webrev
please note as Martin pointed out, I had the is_concurrent logic 
backwards in SystemDictionary::do_unloading().

The JavaThreads OR the GC threads contend on ClassLoaderDataGraph_lock.  
The STS joiner logic that I cut/pasted assures that safepoints are not 
taken while holding the lock.  So the new code is actually more correct 
than what we have.

There is an existing problem that you're right to worry about in the VM 
thread doing unloading.  Consider:

JavaThread - takes out ClassLoaderDataGraph_lock (setting owner, ignore 
sneaking) to iterate over the CLDG, then safepoints while holding the lock
VMThread at safepoint does either 1 or two things:
      1. Calls the HeapDumper or JFR and tries to iterate over the CLDG, 
but is blocked at the safepoint.
      2.  Does class unloading and breaks the CLDG links that the 
iterator is using.

Both of these are currently existing possibilities, which I'd like to 
address in https://bugs.openjdk.java.net/browse/JDK-8213150

But maybe that bug is too general, or the title is too general.  It 
would be nice if this wasn't such a minefield.

I think Erik should come talk to us about the STS joiner logic.

Thanks,
Coleen

On 10/31/18 4:42 AM, David Holmes wrote:
> Hi Erik,
>
> So the magic STS Joiner ensures everything works okay. But how can I 
> tell from the code that you are in fact using the STS Joiner? Can it be:
>
> assert_locked_by_sts_joiner_or_safepoint(...);
>
> ? Is that at all possible?
>
> I don't know when SuspendibleThreadSet came into play or exactly how 
> it operates.
>
> Thanks,
> David
>
> On 31/10/2018 6:19 PM, Erik Österlund wrote:
>> Hi David,
>>
>> On 2018-10-31 01:56, David Holmes wrote:
>>> On 30/10/2018 11:16 PM, coleen.phillimore at oracle.com wrote:
>>>> On 10/30/18 8:36 AM, David Holmes wrote:
>>>>> On 30/10/2018 10:24 PM, coleen.phillimore at oracle.com wrote:
>>>>>> On 10/30/18 12:17 AM, David Holmes wrote:
>>>>>>> Hi Coleen,
>>>>>>>
>>>>>>> On 30/10/2018 1:45 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>> Summary: Add locks for calling CLDG::purge concurrently as well 
>>>>>>>> and for calling SystemDictionary::do_unloading concurrently.
>>>>>>>>
>>>>>>>> Ran linux-x64 tier1-6 through mach5 and hotspot/jtreg/runtime 
>>>>>>>> tests, which include the module tests.
>>>>>>>>
>>>>>>>> open webrev at 
>>>>>>>> http://cr.openjdk.java.net/~coleenp/8213092.01/webrev
>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8213092
>>>>>>>
>>>>>>> So ... all the locks covered by an assert_locked_or_safepoint, 
>>>>>>> or which are acquired by the VMThread at a safepoint, must never 
>>>>>>> be held by a JavaThread if it could reach a safepoint whilst 
>>>>>>> that lock is held - else we could deadlock. So can we check that 
>>>>>>> with NoSafepointVerifiers?
>>>>>>
>>>>>> Actually I think this is not possible to add NSV.   You can 
>>>>>> acquire the ClassLoaderDataGraph_lock and then the Module_lock. 
>>>>>> The latter would check for a safepoint also for a Java thread. 
>>>>>> This is currently done for Jvmti and JFR, but not in other code 
>>>>>> that I can see.  I don't actually know how to fix this problem.
>>>>>
>>>>> This seems risky. If a JavaThread can hold the CLDG_lock while 
>>>>> blocked at a safepoint (acquiring the Module_lock), then what is 
>>>>> to stop the VMThread from hitting one of these sections of code 
>>>>> protected by locked_or_safepoint and then proceeding into what is 
>>>>> effectively a critical section (by virtue of there being a 
>>>>> safepoint) when the JavaThread is itself in the midst of a 
>>>>> critical section? Do we actively take steps to prevent this 
>>>>> somehow, or to make it safe for the VMThread to proceed?
>>>>
>>>> No we don't.  I think we have this problem today (not introduced or 
>>>> made worse by this patch).
>>>
>>> It is made worse by this patch. Before this patch you have code only 
>>> executed by the VMThread at a safepoint. After the patch you have 
>>> code that may be executed by the VMThread at a safepoint or by 
>>> another thread under a lock (and there may or may not also be a 
>>> safepoint). And this allows the VMThread to execute the same code at 
>>> the same as a lock-holding thread executes it!
>>
>> The scenario you describe can not happen. The code you are referring 
>> to is executed either by the VM thread in a safepoint, or by a 
>> concurrent GC thread outside of a safepoint, holding the STS joiner, 
>> and hence effectively blocking concurrent safepoints from entering 
>> the scope. The two can never run concurrent to each other.
>>
>> I tried going down the path of not using the STS joiner in an earlier 
>> prototype, but it required major rewrites due to too many assumptions 
>> being violated. And yet I could not convince myself what I was doing 
>> was safe, due to possibly strange interactions with crazy safepoint 
>> operations, like redefinition. So I dropped that and decided to 
>> protect these operations with the STS joiner instead.
>>
>>>> I'll file an bug to fix it and hopefully add detection for this.  I 
>>>> think we don't need to take CLDG_lock in a safepoint and should 
>>>> prevent doing so, but things like this are more reliable to do with 
>>>> computers than visual inspection.  Can you add your suggestions to 
>>>> my RFE? https://bugs.openjdk.java.net/browse/JDK-8213150
>>>
>>> This needs to be fixed before enabling the current patch IMHO. At 
>>> least the two usages you mention in your follow up email need some 
>>> kind of guard to ensure a safepoint can't be reached while the lock 
>>> is held. Or the VMThread must also acquire the lock(s) - though if 
>>> NJTs are involved there is a risk of breakage due to lock sneaking!
>>
>> These locks are taken under the STS joiner, specifically to dodge the 
>> issues you refer to. The STS joiner makes sure that the locks you are 
>> taking do not suffer from bad lock sneaking interactions, because 
>> they do not take the locks concurrent to the VM thread. So we have 
>> the safety we need. I agree that the CLDG lock does not need to be 
>> taken by the VM thread in a safepoint, simply because it is 
>> unnecessary and risky. But that is orthogonal to this patch.
>>
>> Thanks,
>> /Erik
>>
>>> David
>>> -----
>>>
>>>> The CLDG lock can be shared by non-java threads and java threads, 
>>>> which is the point. There may be other locks though.
>>>
>>>
>>>
>>>>>
>>>>>>
>>>>>> The locks added in this patch set though are for the 
>>>>>> NonJavaThreads, who do not do safepoint checks.  The 
>>>>>> NonJavaThreads that acquire these locks use the STS joiner 
>>>>>> mechanism which disallows safepoints while being held (and since 
>>>>>> they are non Java threads, do not check for safepoints themselves).
>>>>>>
>>>>>> This is how it's going to look for the ZGC caller:
>>>>>>
>>>>>> {
>>>>>> SuspendibleThreadSetJoiner sts_joiner;
>>>>>> // Unlink the classes.
>>>>>> MutexLockerEx ml(ClassLoaderDataGraph_lock);
>>>>>> unloading_occurred = 
>>>>>> SystemDictionary::do_unloading(ZStatPhase::timer(),
>>>>>> true /* do_cleaning */););
>>>>>> }
>>>>>
>>>>> Somehow I missed the creation/invention of the STS joiner mechanism.
>>>>>
>>>>
>>>> Me too!   It's in gc/shared but it's really runtime code, except 
>>>> people in runtime didn't know about it because it's used by GC 
>>>> threads.
>>>>
>>>> Let me know if you have more questions and can review this code.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>>> David
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Further, are these locks acquired by non-JavaThreads such that 
>>>>>>> the VMThread may be delayed whilst a safepoint is active?
>>>>>>
>>>>>> Yes, theoretically they could delay the VMThread from getting to 
>>>>>> a safepoint or doing its work while in a safepoint but the 
>>>>>> threads that take out these locks only hold them for short 
>>>>>> durations.
>>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Coleen
>>>>>>
>>>>