RFR (S) 8213092: Add more runtime locks for concurrent class unloading
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Thu Nov 15 15:26:13 UTC 2018
The changes for 8213751: ClassLoaderDataGraph::cld_do() should sometimes
require CLDG_lock
add a NoSafepointVerifier when iterating over the CLDG.
This change has been tested on top of that change.
open webrev at http://cr.openjdk.java.net/~coleenp/8213092.03/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8213092
This change is needed for concurrent class unloading.
Please review both. Any further changes for
https://bugs.openjdk.java.net/browse/JDK-8213150 can be discussed in
that CR.
Thanks,
Coleen
On 10/31/18 7:55 AM, coleen.phillimore at oracle.com wrote:
>
> Thank you Erik for answering David's email.
>
> In the updated version of this patch:
> http://cr.openjdk.java.net/~coleenp/8213092.02/webrev
> please note as Martin pointed out, I had the is_concurrent logic
> backwards in SystemDictionary::do_unloading().
>
> The JavaThreads OR the GC threads contend on
> ClassLoaderDataGraph_lock. The STS joiner logic that I cut/pasted
> assures that safepoints are not taken while holding the lock. So the
> new code is actually more correct than what we have.
>
> There is an existing problem that you're right to worry about in the
> VM thread doing unloading. Consider:
>
> JavaThread - takes out ClassLoaderDataGraph_lock (setting owner,
> ignore sneaking) to iterate over the CLDG, then safepoints while
> holding the lock
> VMThread at safepoint does either 1 or two things:
> 1. Calls the HeapDumper or JFR and tries to iterate over the
> CLDG, but is blocked at the safepoint.
> 2. Does class unloading and breaks the CLDG links that the
> iterator is using.
>
> Both of these are currently existing possibilities, which I'd like to
> address in https://bugs.openjdk.java.net/browse/JDK-8213150
>
> But maybe that bug is too general, or the title is too general. It
> would be nice if this wasn't such a minefield.
>
> I think Erik should come talk to us about the STS joiner logic.
>
> Thanks,
> Coleen
>
>
> On 10/31/18 4:42 AM, David Holmes wrote:
>> Hi Erik,
>>
>> So the magic STS Joiner ensures everything works okay. But how can I
>> tell from the code that you are in fact using the STS Joiner? Can it be:
>>
>> assert_locked_by_sts_joiner_or_safepoint(...);
>>
>> ? Is that at all possible?
>>
>> I don't know when SuspendibleThreadSet came into play or exactly how
>> it operates.
>>
>> Thanks,
>> David
>>
>> On 31/10/2018 6:19 PM, Erik Österlund wrote:
>>> Hi David,
>>>
>>> On 2018-10-31 01:56, David Holmes wrote:
>>>> On 30/10/2018 11:16 PM, coleen.phillimore at oracle.com wrote:
>>>>> On 10/30/18 8:36 AM, David Holmes wrote:
>>>>>> On 30/10/2018 10:24 PM, coleen.phillimore at oracle.com wrote:
>>>>>>> On 10/30/18 12:17 AM, David Holmes wrote:
>>>>>>>> Hi Coleen,
>>>>>>>>
>>>>>>>> On 30/10/2018 1:45 PM, coleen.phillimore at oracle.com wrote:
>>>>>>>>> Summary: Add locks for calling CLDG::purge concurrently as
>>>>>>>>> well and for calling SystemDictionary::do_unloading concurrently.
>>>>>>>>>
>>>>>>>>> Ran linux-x64 tier1-6 through mach5 and hotspot/jtreg/runtime
>>>>>>>>> tests, which include the module tests.
>>>>>>>>>
>>>>>>>>> open webrev at
>>>>>>>>> http://cr.openjdk.java.net/~coleenp/8213092.01/webrev
>>>>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8213092
>>>>>>>>
>>>>>>>> So ... all the locks covered by an assert_locked_or_safepoint,
>>>>>>>> or which are acquired by the VMThread at a safepoint, must
>>>>>>>> never be held by a JavaThread if it could reach a safepoint
>>>>>>>> whilst that lock is held - else we could deadlock. So can we
>>>>>>>> check that with NoSafepointVerifiers?
>>>>>>>
>>>>>>> Actually I think this is not possible to add NSV. You can
>>>>>>> acquire the ClassLoaderDataGraph_lock and then the Module_lock.
>>>>>>> The latter would check for a safepoint also for a Java thread.
>>>>>>> This is currently done for Jvmti and JFR, but not in other code
>>>>>>> that I can see. I don't actually know how to fix this problem.
>>>>>>
>>>>>> This seems risky. If a JavaThread can hold the CLDG_lock while
>>>>>> blocked at a safepoint (acquiring the Module_lock), then what is
>>>>>> to stop the VMThread from hitting one of these sections of code
>>>>>> protected by locked_or_safepoint and then proceeding into what is
>>>>>> effectively a critical section (by virtue of there being a
>>>>>> safepoint) when the JavaThread is itself in the midst of a
>>>>>> critical section? Do we actively take steps to prevent this
>>>>>> somehow, or to make it safe for the VMThread to proceed?
>>>>>
>>>>> No we don't. I think we have this problem today (not introduced
>>>>> or made worse by this patch).
>>>>
>>>> It is made worse by this patch. Before this patch you have code
>>>> only executed by the VMThread at a safepoint. After the patch you
>>>> have code that may be executed by the VMThread at a safepoint or by
>>>> another thread under a lock (and there may or may not also be a
>>>> safepoint). And this allows the VMThread to execute the same code
>>>> at the same as a lock-holding thread executes it!
>>>
>>> The scenario you describe can not happen. The code you are referring
>>> to is executed either by the VM thread in a safepoint, or by a
>>> concurrent GC thread outside of a safepoint, holding the STS joiner,
>>> and hence effectively blocking concurrent safepoints from entering
>>> the scope. The two can never run concurrent to each other.
>>>
>>> I tried going down the path of not using the STS joiner in an
>>> earlier prototype, but it required major rewrites due to too many
>>> assumptions being violated. And yet I could not convince myself what
>>> I was doing was safe, due to possibly strange interactions with
>>> crazy safepoint operations, like redefinition. So I dropped that and
>>> decided to protect these operations with the STS joiner instead.
>>>
>>>>> I'll file an bug to fix it and hopefully add detection for this.
>>>>> I think we don't need to take CLDG_lock in a safepoint and should
>>>>> prevent doing so, but things like this are more reliable to do
>>>>> with computers than visual inspection. Can you add your
>>>>> suggestions to my RFE?
>>>>> https://bugs.openjdk.java.net/browse/JDK-8213150
>>>>
>>>> This needs to be fixed before enabling the current patch IMHO. At
>>>> least the two usages you mention in your follow up email need some
>>>> kind of guard to ensure a safepoint can't be reached while the lock
>>>> is held. Or the VMThread must also acquire the lock(s) - though if
>>>> NJTs are involved there is a risk of breakage due to lock sneaking!
>>>
>>> These locks are taken under the STS joiner, specifically to dodge
>>> the issues you refer to. The STS joiner makes sure that the locks
>>> you are taking do not suffer from bad lock sneaking interactions,
>>> because they do not take the locks concurrent to the VM thread. So
>>> we have the safety we need. I agree that the CLDG lock does not need
>>> to be taken by the VM thread in a safepoint, simply because it is
>>> unnecessary and risky. But that is orthogonal to this patch.
>>>
>>> Thanks,
>>> /Erik
>>>
>>>> David
>>>> -----
>>>>
>>>>> The CLDG lock can be shared by non-java threads and java threads,
>>>>> which is the point. There may be other locks though.
>>>>
>>>>
>>>>
>>>>>>
>>>>>>>
>>>>>>> The locks added in this patch set though are for the
>>>>>>> NonJavaThreads, who do not do safepoint checks. The
>>>>>>> NonJavaThreads that acquire these locks use the STS joiner
>>>>>>> mechanism which disallows safepoints while being held (and since
>>>>>>> they are non Java threads, do not check for safepoints themselves).
>>>>>>>
>>>>>>> This is how it's going to look for the ZGC caller:
>>>>>>>
>>>>>>> {
>>>>>>> SuspendibleThreadSetJoiner sts_joiner;
>>>>>>> // Unlink the classes.
>>>>>>> MutexLockerEx ml(ClassLoaderDataGraph_lock);
>>>>>>> unloading_occurred =
>>>>>>> SystemDictionary::do_unloading(ZStatPhase::timer(),
>>>>>>> true /* do_cleaning */););
>>>>>>> }
>>>>>>
>>>>>> Somehow I missed the creation/invention of the STS joiner mechanism.
>>>>>>
>>>>>
>>>>> Me too! It's in gc/shared but it's really runtime code, except
>>>>> people in runtime didn't know about it because it's used by GC
>>>>> threads.
>>>>>
>>>>> Let me know if you have more questions and can review this code.
>>>>>
>>>>> Thanks,
>>>>> Coleen
>>>>>
>>>>>> David
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Further, are these locks acquired by non-JavaThreads such that
>>>>>>>> the VMThread may be delayed whilst a safepoint is active?
>>>>>>>
>>>>>>> Yes, theoretically they could delay the VMThread from getting to
>>>>>>> a safepoint or doing its work while in a safepoint but the
>>>>>>> threads that take out these locks only hold them for short
>>>>>>> durations.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Coleen
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Coleen
>>>>>>>
>>>>>
>
More information about the hotspot-runtime-dev
mailing list