RFR: 8274282: Clarify special wait assert [v2]
coleen.phillimore at oracle.com
coleen.phillimore at oracle.com
Tue Oct 5 12:31:34 UTC 2021
On 10/4/21 7:06 PM, David Holmes wrote:
> On 4/10/2021 11:16 pm, Coleen Phillimore wrote:
>> On Fri, 1 Oct 2021 16:46:33 GMT, Patricio Chilano Mateo
>> <pchilanomate at openjdk.org> wrote:
>>
>>>> src/hotspot/share/runtime/mutex.cpp line 388:
>>>>
>>>>> 386: Mutex* least =
>>>>> get_least_ranked_lock_besides_this(locks_owned);
>>>>> 387: // We enforce not holding locks of rank nosafepoint or
>>>>> lower while waiting for JavaThreads,
>>>>> 388: // because the held lock has a NoSafepointVerifier so
>>>>> waiting on a lock will block out safepoints.
>>>>
>>>> The NSV is not the reason, it is a mechanism to detect violation
>>>> of the rules. The reason we must not block in a JavaThread is
>>>> because it will block without a safepoint check and without
>>>> becoming safepoint-safe, thus blocking out safepoints while the
>>>> wait is in progress.
>>>>
>>>> Also note that this applies to any wait() on a nosafepoint lock,
>>>> not just one done while holding a safepoint lock.
>>>
>>> I can probably shed some light on the origin of this comment since
>>> it came from discussions I had with Coleen. We were looking at cases
>>> like the Service_lock, where the ServiceThread waits on a lock with
>>> rank <= nosafepoint but within the context of a ThreadBlockInVM. If
>>> a JT would call wait() on a nosafepoint lock and we detect it holds
>>> some other nosafepoint lock, then that would imply there was no
>>> immediate TBIVM, like with the Service_lock case, because that would
>>> have asserted due to the NSV and so we know the call will block
>>> safepoints (assuming no manual changes to _thread_blocked that would
>>> bypass the check_possible_safepoint() call or some weird use of the
>>> TBIVM like: TBIVM, lock(a), lock(b), wait(b), where a and b are
>>> nosafepoint locks). But as you pointed out the issue of this JT
>>> blocking safepoints is a consequence of waiting on nosafepoint
>>> locks, regardless of which other locks the JT is holding. It just
>>> happens that if the JT is holding other nosafepoint locks then we
>> already know this will block out safepoints (ruling out those
>> behaviors described above).
>>>
>>> I also just realized that waiting while holding a nosafepoint lock
>>> could also block out safepoints due to other JT's being blocked in a
>>> safepoint-unsafe state while trying to lock that monitor. So even if
>>> the waiting JT did a manual transition to _thread_blocked or used
>>> the TBIVM in some weird way, safepoints could still be blocked out
>>> due to those other JT's.
>>>
>>> But in any case, if we want to verify if a wait() call might block
>>> safepoints, we should verify all cases where this JT will wait in an
>>> unsafe state, and that means checking whether the current lock is
>>> nosafepoint and the current state of the JT: "if
>>> thread->is_Java_thread() && this->rank <= Mutex::nosafepoint &&
>>> (thread->thread_state() != _thread_blocked)". Today that would
>>> assert since we have cases where a JT waits on a nosafepoint lock
>>> without being _thread_blocked, mainly os::create_thread() and
>>> JfrThreadSampler::on_javathread_suspend() from some tests I run.
>>> VMThread::wait_for_vm_thread_exit() and
>>> ThreadsSMRSupport::wait_until_not_protected() also assert but in
>>> those cases the JT was already removed from the JT list so it
>>> wouldn't be blocking safepoints.
>>
>> Thank you for the explanation and further experiments @pchilano .
>>
>>> VMThread::wait_for_vm_thread_exit() and
>>> ThreadsSMRSupport::wait_until_not_protected() also assert but in
>>> those cases the JT was already removed from the JT list so it
>>> wouldn't be blocking safepoints.
>>
>> I wonder if the check should be is_active_Java_thread instead?
>>
>> The Service_lock, EscapeBarrier_lock, MonitorDeflation_lock,
>> Notification_lock, and CodeSweeper_lock are held with a TBIVM (same
>> mechanism). These locks are fairly low level locks and don't check
>> for safepoints.
>>
>> I think the problematic scenario that Patricio is referring to is:
>> Thread1: TBIVM (state == _thread_blocked), comes out of wait so owns
>> Service_lock, does things while safepoint safe
>> Thread2: waiting on Service_lock, so blocking safepoint
>> But then Thread1 will come back to wait when done with things,
>> Thread2 will proceed then get to it's safepoint check when done with
>> eg. Service_lock.
>
> The ServiceThread releases the Service_lock before doing any
> significant work and returns to _thread_in_vm.
Yes except in the oops_do functions, the Service_lock is held while
walking the static deferred JVMTI events. But the Service_lock waiting
in TBIVM is safe.
Coleen
>
> David
> -----
>
>> -------------
>>
>> PR: https://git.openjdk.java.net/jdk/pull/5684
>>
More information about the hotspot-runtime-dev
mailing list