RFR: 8274282: Clarify special wait assert [v2]

Tue Oct 5 12:31:34 UTC 2021

On 10/4/21 7:06 PM, David Holmes wrote:
> On 4/10/2021 11:16 pm, Coleen Phillimore wrote:
>> On Fri, 1 Oct 2021 16:46:33 GMT, Patricio Chilano Mateo 
>> <pchilanomate at openjdk.org> wrote:
>>
>>>> src/hotspot/share/runtime/mutex.cpp line 388:
>>>>
>>>>> 386:     Mutex* least = 
>>>>> get_least_ranked_lock_besides_this(locks_owned);
>>>>> 387:     // We enforce not holding locks of rank nosafepoint or 
>>>>> lower while waiting for JavaThreads,
>>>>> 388:     // because the held lock has a NoSafepointVerifier so 
>>>>> waiting on a lock will block out safepoints.
>>>>
>>>> The NSV is not the reason, it is  a mechanism to detect violation 
>>>> of the rules. The reason we must not block in a JavaThread is 
>>>> because it will block without a safepoint check and without 
>>>> becoming safepoint-safe, thus blocking out safepoints while the 
>>>> wait is in progress.
>>>>
>>>> Also note that this applies to any wait() on a nosafepoint lock, 
>>>> not just one done while holding a safepoint lock.
>>>
>>> I can probably shed some light on the origin of this comment since 
>>> it came from discussions I had with Coleen. We were looking at cases 
>>> like the Service_lock, where the ServiceThread waits on a lock with 
>>> rank <= nosafepoint but within the context of a ThreadBlockInVM. If 
>>> a JT would call wait() on a nosafepoint lock and we detect it holds 
>>> some other nosafepoint lock, then that would imply there was no 
>>> immediate TBIVM, like with the Service_lock case, because that would 
>>> have asserted due to the NSV and so we know the call will block 
>>> safepoints (assuming no manual changes to _thread_blocked that would 
>>> bypass the check_possible_safepoint() call or some weird use of the 
>>> TBIVM like: TBIVM, lock(a), lock(b), wait(b), where a and b are 
>>> nosafepoint locks). But as you pointed out the issue of this JT 
>>> blocking safepoints is a consequence of waiting on nosafepoint 
>>> locks, regardless of which other locks the JT is holding. It just 
>>> happens that if the JT is holding other nosafepoint locks then we
>>    already know this will block out safepoints (ruling out those 
>> behaviors described above).
>>>
>>> I also just realized that waiting while holding a nosafepoint lock 
>>> could also block out safepoints due to other JT's being blocked in a 
>>> safepoint-unsafe state while trying to lock that monitor. So even if 
>>> the waiting JT did a manual transition to _thread_blocked or used 
>>> the TBIVM in some weird way, safepoints could still be blocked out 
>>> due to those other JT's.
>>>
>>> But in any case, if we want to verify if a wait() call might block 
>>> safepoints, we should verify all cases where this JT will wait in an 
>>> unsafe state, and that means checking whether the current lock is 
>>> nosafepoint and the current state of the JT: "if 
>>> thread->is_Java_thread() && this->rank <= Mutex::nosafepoint && 
>>> (thread->thread_state() != _thread_blocked)". Today that would 
>>> assert since we have cases where a JT waits on a nosafepoint lock 
>>> without being _thread_blocked, mainly os::create_thread() and 
>>> JfrThreadSampler::on_javathread_suspend() from some tests I run. 
>>> VMThread::wait_for_vm_thread_exit() and 
>>> ThreadsSMRSupport::wait_until_not_protected() also assert but in 
>>> those cases the JT was already removed from the JT list so it 
>>> wouldn't be blocking safepoints.
>>
>> Thank you for the explanation and further experiments @pchilano .
>>
>>> VMThread::wait_for_vm_thread_exit() and 
>>> ThreadsSMRSupport::wait_until_not_protected() also assert but in 
>>> those cases the JT was already removed from the JT list so it 
>>> wouldn't be blocking safepoints.
>>
>> I wonder if the check should be is_active_Java_thread instead?
>>
>> The Service_lock, EscapeBarrier_lock, MonitorDeflation_lock, 
>> Notification_lock, and CodeSweeper_lock are held with a TBIVM (same 
>> mechanism).  These locks are fairly low level locks and don't check 
>> for safepoints.
>>
>> I think the problematic scenario that Patricio is referring to is:
>> Thread1: TBIVM (state == _thread_blocked), comes out of wait so owns 
>> Service_lock, does things while safepoint safe
>> Thread2: waiting on Service_lock, so blocking safepoint
>> But then Thread1 will come back to wait when done with things, 
>> Thread2 will proceed then get to it's safepoint check when done with 
>> eg. Service_lock.
>
> The ServiceThread releases the Service_lock before doing any 
> significant work and returns to _thread_in_vm.

Yes except in the oops_do functions, the Service_lock is held while 
walking the static deferred JVMTI events.  But the Service_lock waiting 
in TBIVM is safe.

Coleen

>
> David
> -----
>
>> -------------
>>
>> PR: https://git.openjdk.java.net/jdk/pull/5684
>>