RFR: 8257831: Suspend with handshakes [v2]

Tue Mar 30 21:16:37 UTC 2021

On Fri, 26 Mar 2021 13:21:43 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> A suspend request is done by handshaking thread target thread(s). When executing the handshake operation we know the target mutator thread is in a dormant state (as in safepoint safe state). We have a guarantee that it will check it's poll before leaving the dormant state. To stop the thread from leaving the the dormant state we install a second asynchronous handshake to be executed by the targeted thread. The asynchronous handshake will wait on a monitor while the thread is suspended. The target thread cannot not leave the dormant state without a resume request.
>> 
>> Per thread suspend requests are naturally serialized by the per thread HandshakeState lock (we can only execute one handshake at a time per thread).
>> Instead of having a separate lock we use this to our advantage and use HandshakeState lock for serializing access to the suspend flag and for wait/notify. 
>> 
>> Suspend:
>> Requesting thread -> synchronous handshake -> target thread
>> Inside synchronus handshake (HandshakeState lock is locked while
>> executing any handshake):
>> 	- Set suspended flag
>> 	- Install asynchronous handshake
>> 
>> Target thread -> tries to leave dormant state -> Executes handshakes
>> Target only executes asynchronous handshake:
>> 	- While suspended
>> 	- Go to blocked
>> 	- Wait on HandshakeState lock
>> 
>> Resume:
>> Resuming thread:
>> 	- Lock HandshakeState lock
>> 	- Clear suspended flag
>> 	- Notify HandshakeState lock
>> 	- Unlock HandshakeState lock
>> 
>> The "suspend requested" flag is an optimization, without it a dormant thread could be suspended and resumed many times and each would add a new asynchronous handshake. Suspend requested flag means there is already an asynchronous suspend handshake in queue which can be re-used, only the suspend flag needs to be set.
>> 
>> ----
>> Some code can be simplified or done in a smarter way but I refrained from doing such changes instead tried to keep existing code as is as far as possible. This concerns especially raw monitors.
>> 
>> ----
>> Regarding the changed test, the documentation says:
>> "If the calling thread is specified in the request_list array, this function will not return until some other thread resumes it."
>> 
>> But the code:
>>   LOG("suspendTestedThreads: before JVMTI SuspendThreadList");
>>   err = jvmti->SuspendThreadList(threads_count, threads, results);
>>   ...
>>   // Allow the Main thread to inspect the result of tested threads suspension	
>>   agent_unlock(jni);
>>   
>> The thread will never return from SuspendThreadList until resumed, so it cannot unlock with agent_unlock().
>> Thus main thread is stuck forever on:
>>   // Block until the suspender thread competes the tested threads suspension	
>>   agent_lock(jni);
>> 
>> And never checks and resumes the threads. So I removed that lock instead just sleep and check until all thread have the expected suspended state.
>> 
>> ----
>> 
>> This version already contains updates after pre-review comments from @dcubed-ojdk, @pchilano, @coleenp.
>> (Pre-review comments here:
>> https://github.com/openjdk/jdk/pull/2625)
>> 
>> ---- 
>> Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and
>> combinations like running with -XX:ZCollectionInterval=0.01 -
>> XX:ZFragmentationLimit=0.
>> Running above some of above concurrently (load ~240), slow debug,
>> etc...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
> 
>  - Merge branch 'master' into SuspendInHandshake
>  - 8257831: Suspend with handshake (review baseline)

Hi Robbin,

this is a great simplification. Excellent work! :)

Thanks, Richard.

src/hotspot/share/prims/jvmtiRawMonitor.cpp line 318:

> 316:   if (self->is_Java_thread()) {
> 317:     jt = self->as_Java_thread();
> 318:     while (true) {

>From the PR description:

> Some code can be simplified or done in a smarter way but I refrained from doing such changes instead tried to keep existing code as is as far as possible. This concerns especially raw monitors.

I read this section but then forgot about it. So you can skip the following comment.

Possible follow-up: It could be worth the attempt to make use of the generic state transition classes provided by interface.hpp. Directly I don't see why this wouldn't be feasible and I doubt the [old comment in JvmtiEnv::RawMonitorEnter()](https://github.com/reinrich/jdk/blob/aefc1560b51f0ce96d8f5ce396ba0d2fe08fd650/src/hotspot/share/prims/jvmtiEnv.cpp#L3208-L3214).
This would make the custom suspend logic here superfluous.
If ThreadBlockInVM was changed to take a generic callback for unlocking, then this could replace the custom logic in L360-L374.

src/hotspot/share/prims/jvmtiRawMonitor.cpp line 323:

> 321:       // Suspend request flag can only be set in handshakes.
> 322:       // By blocking handshakes, suspend request flag cannot change its value.
> 323:       if (!jt->handshake_state()->is_suspend_requested()) {

Shouldn't `jt->handshake_state()->is_suspend_requested()` be replaced with `jt->is_suspended()`? See also comment on declaration of `_suspend_requested`.

src/hotspot/share/prims/jvmtiRawMonitor.cpp line 364:

> 362:     for (;;) {
> 363:       simple_enter(jt);
> 364:       if (!SafepointMechanism::should_process(jt)) {

It seems to be likely that the condition is false in the first loop iteration and the thread has to do another iteration even if not suspended. Wouldn't it be ok to break from the loop if `!jt->is_suspended()`?
I'm asking because in ObjectMonitor::enter() L414 there is similar code and the condition there is `SafepointMechanism::should_process(current) && current->suspend_request_pending()`

src/hotspot/share/prims/jvmtiRawMonitor.cpp line 421:

> 419:     guarantee(jt->thread_state() == _thread_in_native, "invariant");
> 420:     for (;;) {
> 421:       if (!SafepointMechanism::should_process(jt)) {

Potential follow-up enhancement: there's a very similar for(;;) loop in JvmtiRawMonitor::raw_enter(). It looks as if it was possible to merge both loops into just one loop in simple_enter().

src/hotspot/share/runtime/handshake.cpp line 485:

> 483:       } else {
> 484:         // Asynchronous may block so they may not execute ~PreserveExceptionMark before safepointing
> 485:         // in outer loop.

Sorry, I don't understand the comment.

src/hotspot/share/runtime/handshake.cpp line 486:

> 484:         // Asynchronous may block so they may not execute ~PreserveExceptionMark before safepointing
> 485:         // in outer loop.
> 486:         op->do_handshake(_handshakee);

Maybe add PauseNoSafepointVerifier to document that the current thread can transition to a safe state?

src/hotspot/share/runtime/handshake.cpp line 492:

> 490:       }
> 491:     } else {
> 492:       return true;

`true` as return value means that the caller does _not_ need to check again for a safepoint, correct? Maybe this could be added as comment to the method declaration?

src/hotspot/share/runtime/handshake.cpp line 617:

> 615:     {
> 616:       MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
> 617:       if (!is_suspend_requested()) {

The current thread is _thread_in_vm. Can it be suspended then? For a suspend it has to be in a safe thread state which it cannot leave while suspended. So it must have reached _thread_in_vm not suspended and it cannot be suspended while in that state.

src/hotspot/share/runtime/handshake.cpp line 715:

> 713: bool HandshakeState::resume() {
> 714:   // Can't be suspended if there is no suspend request.
> 715:   if (!is_suspend_requested()) {

Shouldn't `is_suspend_requested()` be replaced with `is_suspended()`? See also comment on declaration of `_suspend_requested`.

src/hotspot/share/runtime/handshake.cpp line 720:

> 718:   MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
> 719:   // Can't be suspended if there is no suspend request.
> 720:   if (!is_suspend_requested()) {

Shouldn't `is_suspend_requested()` be replaced with `is_suspended()`? See also comment on declaration of `_suspend_requested`.

src/hotspot/share/runtime/handshake.hpp line 142:

> 140:  private:
> 141:   volatile bool _suspended;
> 142:   volatile bool _suspend_requested;

According to the PR description `_suspend_requested` is an optimization.

> The "suspend requested" flag is an optimization, without it a dormant thread
> could be suspended and resumed many times and each would add a new
> asynchronous handshake. Suspend requested flag means there is already an
> asynchronous suspend handshake in queue which can be re-used, only the suspend
> flag needs to be set.

I think there are a few places where _suspend_requested is queried by mistake instead of _suspended. Maybe it would help to prevent this if _suspend_requested was renamed to something that better describes its purpose, e.g. _has_async_suspend_handshake or similar.

src/hotspot/share/runtime/mutex.cpp line 244:

> 242:     wait_status = _lock.wait(timeout);
> 243:     in_flight_mutex = this;  // save for ~ThreadBlockInVM
> 244: 

Empty line can be removed.

src/hotspot/share/runtime/mutexLocker.hpp line 260:

> 258: 
> 259:   bool wait(int64_t timeout = 0,
> 260:             bool as_suspend_equivalent = !Mutex::_as_suspend_equivalent_flag) {

The declaration of Mutex::_as_suspend_equivalent_flag should be removed. You might want to grep for 'equivalent' to find more that can be removed.

src/hotspot/share/runtime/objectMonitor.cpp line 415:

> 413:       current->set_thread_state_fence(_thread_blocked_trans);
> 414:       if (SafepointMechanism::should_process(current) &&
> 415:         current->suspend_request_pending()) {

Shouldn't `suspend_request_pending()` be replaced with `is_suspended()`? See also comment on declaration of `_suspend_requested`.
And isn't checking `SafepointMechanism::should_process(current)` redundant?

src/hotspot/share/runtime/objectMonitor.cpp line 973:

> 971:       if (SafepointMechanism::should_process(current)) {
> 972:         if (_succ == current) {
> 973:             _succ = NULL;

IIUC then this is now not only executed if a thread is suspended but also when there's a safepoint / handshake. I tried to understand the effect of this but failed with timeout ;)

src/hotspot/share/runtime/os.cpp line 874:

> 872: 
> 873: void os::start_thread(Thread* thread) {
> 874:   if (thread->is_Java_thread()) {

Then and else blocks seem to do the very same things.

src/hotspot/share/runtime/safepointMechanism.cpp line 101:

> 99:         !thread->handshake_state()->process_by_self()) {
> 100:       need_rechecking = true;
> 101:     }

What about this version
    if (thread->handshake_state()->should_process()) {
      need_rechecking = !thread->handshake_state()->process_by_self();
    }
or even 
    need_rechecking =
        thread->handshake_state()->should_process() && !thread->handshake_state()->process_by_self();
With the latter you could eliminate L82
      need_rechecking = false;
Also I'd find it more natural if `process_by_self()` could return true if rechecking is needed.

src/hotspot/share/runtime/sweeper.cpp line 276:

> 274: 
> 275:     ThreadBlockInVM tbivm(thread);
> 276:     thread->java_suspend_self();

In the baseline version of NMethodSweeper::handle_safepoint_request() there is a call `thread->java_suspend_self()`. This call will immediately return, won't it? Do you know what the purpose of this call was? The destructor of ThreadBlockInVM will block the current thread for the safepoint. So `java_suspend_self()` seems redundant.

-------------

Changes requested by rrich (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3191