RFR: 8274282: Clarify special wait assert [v2]

Fri Oct 1 16:49:34 UTC 2021

On Wed, 29 Sep 2021 04:26:10 GMT, David Holmes <dholmes at openjdk.org> wrote:

>> Coleen Phillimore has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
>
> src/hotspot/share/runtime/mutex.cpp line 388:
> 
>> 386:     Mutex* least = get_least_ranked_lock_besides_this(locks_owned);
>> 387:     // We enforce not holding locks of rank nosafepoint or lower while waiting for JavaThreads,
>> 388:     // because the held lock has a NoSafepointVerifier so waiting on a lock will block out safepoints.
> 
> The NSV is not the reason, it is  a mechanism to detect violation of the rules. The reason we must not block in a JavaThread is because it will block without a safepoint check and without becoming safepoint-safe, thus blocking out safepoints while the wait is in progress.
> 
> Also note that this applies to any wait() on a nosafepoint lock, not just one done while holding a safepoint lock.

I can probably shed some light on the origin of this comment since it came from discussions I had with Coleen. We were looking at cases like the Service_lock, where the ServiceThread waits on a lock with rank <= nosafepoint but within the context of a ThreadBlockInVM. If a JT would call wait() on a nosafepoint lock and we detect it holds some other nosafepoint lock, then that would imply there was no immediate TBIVM, like with the Service_lock case, because that would have asserted due to the NSV and so we know the call will block safepoints (assuming no manual changes to _thread_blocked that would bypass the check_possible_safepoint() call or some weird use of the TBIVM like: TBIVM, lock(a), lock(b), wait(b), where a and b are nosafepoint locks). But as you pointed out the issue of this JT blocking safepoints is a consequence of waiting on nosafepoint locks, regardless of which other locks the JT is holding. It just happens that if the JT is holding other nosafepoint locks then we a
 lready know this will block out safepoints (ruling out those behaviors described above).

I also just realized that waiting while holding a nosafepoint lock could also block out safepoints due to other JT's being blocked in a safepoint-unsafe state while trying to lock that monitor. So even if the waiting JT did a manual transition to _thread_blocked or used the TBIVM in some weird way, safepoints could still be blocked out due to those other JT's.

But in any case, if we want to verify if a wait() call might block safepoints, we should verify all cases where this JT will wait in an unsafe state, and that means checking whether the current lock is nosafepoint and the current state of the JT: "if thread->is_Java_thread() && this->rank <= Mutex::nosafepoint && (thread->thread_state() != _thread_blocked)". Today that would assert since we have cases where a JT waits on a nosafepoint lock without being _thread_blocked, mainly os::create_thread() and JfrThreadSampler::on_javathread_suspend() from some tests I run. VMThread::wait_for_vm_thread_exit() and ThreadsSMRSupport::wait_until_not_protected() also assert but in those cases the JT was already removed from the JT list so it wouldn't be blocking safepoints.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5684