RFR (S) 8213092: Add more runtime locks for concurrent class unloading
Erik Österlund
erik.osterlund at oracle.com
Wed Oct 31 08:19:21 UTC 2018
Hi David,
On 2018-10-31 01:56, David Holmes wrote:
> On 30/10/2018 11:16 PM, coleen.phillimore at oracle.com wrote:
>> On 10/30/18 8:36 AM, David Holmes wrote:
>>> On 30/10/2018 10:24 PM, coleen.phillimore at oracle.com wrote:
>>>> On 10/30/18 12:17 AM, David Holmes wrote:
>>>>> Hi Coleen,
>>>>>
>>>>> On 30/10/2018 1:45 PM, coleen.phillimore at oracle.com wrote:
>>>>>> Summary: Add locks for calling CLDG::purge concurrently as well
>>>>>> and for calling SystemDictionary::do_unloading concurrently.
>>>>>>
>>>>>> Ran linux-x64 tier1-6 through mach5 and hotspot/jtreg/runtime
>>>>>> tests, which include the module tests.
>>>>>>
>>>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8213092.01/webrev
>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8213092
>>>>>
>>>>> So ... all the locks covered by an assert_locked_or_safepoint, or
>>>>> which are acquired by the VMThread at a safepoint, must never be
>>>>> held by a JavaThread if it could reach a safepoint whilst that lock
>>>>> is held - else we could deadlock. So can we check that with
>>>>> NoSafepointVerifiers?
>>>>
>>>> Actually I think this is not possible to add NSV. You can acquire
>>>> the ClassLoaderDataGraph_lock and then the Module_lock. The latter
>>>> would check for a safepoint also for a Java thread. This is
>>>> currently done for Jvmti and JFR, but not in other code that I can
>>>> see. I don't actually know how to fix this problem.
>>>
>>> This seems risky. If a JavaThread can hold the CLDG_lock while
>>> blocked at a safepoint (acquiring the Module_lock), then what is to
>>> stop the VMThread from hitting one of these sections of code
>>> protected by locked_or_safepoint and then proceeding into what is
>>> effectively a critical section (by virtue of there being a safepoint)
>>> when the JavaThread is itself in the midst of a critical section? Do
>>> we actively take steps to prevent this somehow, or to make it safe
>>> for the VMThread to proceed?
>>
>> No we don't. I think we have this problem today (not introduced or
>> made worse by this patch).
>
> It is made worse by this patch. Before this patch you have code only
> executed by the VMThread at a safepoint. After the patch you have code
> that may be executed by the VMThread at a safepoint or by another thread
> under a lock (and there may or may not also be a safepoint). And this
> allows the VMThread to execute the same code at the same as a
> lock-holding thread executes it!
The scenario you describe can not happen. The code you are referring to
is executed either by the VM thread in a safepoint, or by a concurrent
GC thread outside of a safepoint, holding the STS joiner, and hence
effectively blocking concurrent safepoints from entering the scope. The
two can never run concurrent to each other.
I tried going down the path of not using the STS joiner in an earlier
prototype, but it required major rewrites due to too many assumptions
being violated. And yet I could not convince myself what I was doing was
safe, due to possibly strange interactions with crazy safepoint
operations, like redefinition. So I dropped that and decided to protect
these operations with the STS joiner instead.
>> I'll file an bug to fix it and hopefully add detection for this. I
>> think we don't need to take CLDG_lock in a safepoint and should
>> prevent doing so, but things like this are more reliable to do with
>> computers than visual inspection. Can you add your suggestions to my
>> RFE? https://bugs.openjdk.java.net/browse/JDK-8213150
>
> This needs to be fixed before enabling the current patch IMHO. At least
> the two usages you mention in your follow up email need some kind of
> guard to ensure a safepoint can't be reached while the lock is held. Or
> the VMThread must also acquire the lock(s) - though if NJTs are involved
> there is a risk of breakage due to lock sneaking!
These locks are taken under the STS joiner, specifically to dodge the
issues you refer to. The STS joiner makes sure that the locks you are
taking do not suffer from bad lock sneaking interactions, because they
do not take the locks concurrent to the VM thread. So we have the safety
we need. I agree that the CLDG lock does not need to be taken by the VM
thread in a safepoint, simply because it is unnecessary and risky. But
that is orthogonal to this patch.
Thanks,
/Erik
> David
> -----
>
>> The CLDG lock can be shared by non-java threads and java threads,
>> which is the point. There may be other locks though.
>
>
>
>>>
>>>>
>>>> The locks added in this patch set though are for the NonJavaThreads,
>>>> who do not do safepoint checks. The NonJavaThreads that acquire
>>>> these locks use the STS joiner mechanism which disallows safepoints
>>>> while being held (and since they are non Java threads, do not check
>>>> for safepoints themselves).
>>>>
>>>> This is how it's going to look for the ZGC caller:
>>>>
>>>> {
>>>> SuspendibleThreadSetJoiner sts_joiner;
>>>> // Unlink the classes.
>>>> MutexLockerEx ml(ClassLoaderDataGraph_lock);
>>>> unloading_occurred =
>>>> SystemDictionary::do_unloading(ZStatPhase::timer(),
>>>> true /* do_cleaning */););
>>>> }
>>>
>>> Somehow I missed the creation/invention of the STS joiner mechanism.
>>>
>>
>> Me too! It's in gc/shared but it's really runtime code, except
>> people in runtime didn't know about it because it's used by GC threads.
>>
>> Let me know if you have more questions and can review this code.
>>
>> Thanks,
>> Coleen
>>
>>> David
>>>
>>>>
>>>>>
>>>>> Further, are these locks acquired by non-JavaThreads such that the
>>>>> VMThread may be delayed whilst a safepoint is active?
>>>>
>>>> Yes, theoretically they could delay the VMThread from getting to a
>>>> safepoint or doing its work while in a safepoint but the threads
>>>> that take out these locks only hold them for short durations.
>>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>> Thanks,
>>>>>> Coleen
>>>>
>>
More information about the hotspot-runtime-dev
mailing list