RFR: 8257831: Suspend with handshakes [v10]

David Holmes dholmes at openjdk.java.net
Mon Apr 19 06:19:00 UTC 2021


On Fri, 16 Apr 2021 06:44:06 GMT, Robbin Ehn <rehn at openjdk.org> wrote:

>> A suspend request is done by handshaking thread target thread(s). When executing the handshake operation we know the target mutator thread is in a dormant state (as in safepoint safe state). We have a guarantee that it will check it's poll before leaving the dormant state. To stop the thread from leaving the the dormant state we install a second asynchronous handshake to be executed by the targeted thread. The asynchronous handshake will wait on a monitor while the thread is suspended. The target thread cannot not leave the dormant state without a resume request.
>> 
>> Per thread suspend requests are naturally serialized by the per thread HandshakeState lock (we can only execute one handshake at a time per thread).
>> Instead of having a separate lock we use this to our advantage and use HandshakeState lock for serializing access to the suspend flag and for wait/notify. 
>> 
>> Suspend:
>> Requesting thread -> synchronous handshake -> target thread
>> Inside synchronus handshake (HandshakeState lock is locked while
>> executing any handshake):
>> 	- Set suspended flag
>> 	- Install asynchronous handshake
>> 
>> Target thread -> tries to leave dormant state -> Executes handshakes
>> Target only executes asynchronous handshake:
>> 	- While suspended
>> 	- Go to blocked
>> 	- Wait on HandshakeState lock
>> 
>> Resume:
>> Resuming thread:
>> 	- Lock HandshakeState lock
>> 	- Clear suspended flag
>> 	- Notify HandshakeState lock
>> 	- Unlock HandshakeState lock
>> 
>> The "suspend requested" flag is an optimization, without it a dormant thread could be suspended and resumed many times and each would add a new asynchronous handshake. Suspend requested flag means there is already an asynchronous suspend handshake in queue which can be re-used, only the suspend flag needs to be set.
>> 
>> ----
>> Some code can be simplified or done in a smarter way but I refrained from doing such changes instead tried to keep existing code as is as far as possible. This concerns especially raw monitors.
>> 
>> ----
>> Regarding the changed test, the documentation says:
>> "If the calling thread is specified in the request_list array, this function will not return until some other thread resumes it."
>> 
>> But the code:
>>   LOG("suspendTestedThreads: before JVMTI SuspendThreadList");
>>   err = jvmti->SuspendThreadList(threads_count, threads, results);
>>   ...
>>   // Allow the Main thread to inspect the result of tested threads suspension	
>>   agent_unlock(jni);
>>   
>> The thread will never return from SuspendThreadList until resumed, so it cannot unlock with agent_unlock().
>> Thus main thread is stuck forever on:
>>   // Block until the suspender thread competes the tested threads suspension	
>>   agent_lock(jni);
>> 
>> And never checks and resumes the threads. So I removed that lock instead just sleep and check until all thread have the expected suspended state.
>> 
>> ----
>> 
>> This version already contains updates after pre-review comments from @dcubed-ojdk, @pchilano, @coleenp.
>> (Pre-review comments here:
>> https://github.com/openjdk/jdk/pull/2625)
>> 
>> ---- 
>> Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and
>> combinations like running with -XX:ZCollectionInterval=0.01 -
>> XX:ZFragmentationLimit=0.
>> Running above some of above concurrently (load ~240), slow debug,
>> etc...
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fixed nits

Hi Robbin,

I have gone through everything again in detail. A few comments on comments and a couple of details I'm still not really clear on.

A couple of small renaming requests, but otherwise this is good to go - with the caveat of a future RFE to clear up the suspend-while-locking implementations so things are more cleanly encapsulated.

Thanks,
David

src/hotspot/share/runtime/handshake.cpp line 630:

> 628: // This is the closure that prevents a suspended JavaThread from
> 629: // escaping the suspend request.
> 630: class ThreadSuspensionHandshake : public AsyncHandshakeClosure {

I still find ThreadSuspensionHandShake versus SuspendThreadHandshake to be too similarly named and with no obvious way to remember which one is which. Perhaps this one could be called ThreadSelfSuspensionHandshake instead?

src/hotspot/share/runtime/handshake.cpp line 636:

> 634:     JavaThread* target = thr->as_Java_thread();
> 635:     target->handshake_state()->do_self_suspend();
> 636:   }

As this must be the current thread we are dealing with can we rename the variables to indicate that.

src/hotspot/share/runtime/handshake.hpp line 128:

> 126:   // must take slow path, process_by_self(), if _lock is held.
> 127:   bool should_process() {
> 128:     // When doing thread suspension the holder of the _lock

Potentially this could be any async handshake operation - right? It seems odd to talk specifically about thread suspension in the core of the handshake code.

src/hotspot/share/runtime/handshake.hpp line 131:

> 129:     // can add an asynchronous handshake to queue.
> 130:     // To make sure it is seen by the handshakee, the handshakee must first
> 131:     // check the _lock, if held go to slow path.

..., _and_ if held ...

src/hotspot/share/runtime/handshake.hpp line 133:

> 131:     // check the _lock, if held go to slow path.
> 132:     // Since the handshakee is unsafe if _lock gets locked after this check
> 133:     // we know another thread cannot process any handshakes.

I can't quite understand this comment. I'm not sure what thread is calling this method and when, in relation to what the handshakee may be doing.

src/hotspot/share/runtime/handshake.hpp line 134:

> 132:     // Since the handshakee is unsafe if _lock gets locked after this check
> 133:     // we know another thread cannot process any handshakes.
> 134:     // Now we can check queue if there is anything we should process.

// Now we can check the queue to see if there is anything we should process.

src/hotspot/share/runtime/handshake.hpp line 157:

> 155:   Thread* active_handshaker() const { return _active_handshaker; }
> 156: 
> 157:   // Suspend/resume support

Taking a step back I can't help but think that this is entirely the wrong place to have the suspend/resume API and supporting code. It is only here because we re-use the HandshakeState _lock monitor right? If we introduced a new thread-suspension monitor instead then this code would all reside somewhere else - probably in the JavaThread class.

src/hotspot/share/runtime/mutex.hpp line 70:

> 68:        tty            = access         +   2,
> 69:        special        = tty            +   3,
> 70:        oopstorage     = special        +   3,

Why +3? I'm assuming to keep the same numeric value as before, but that doesn't seem necessary and just seems to add to the mystery of why these ranks are defined in such a strange way.

src/hotspot/share/runtime/thread.cpp line 1444:

> 1442:     }
> 1443: 
> 1444:     // The careful dance between thread suspension and exit is handled here.

:) It isn't really a "careful dance" any more, as it is very straight forward. exit() can't race with a suspend request as such because we won't hit a handshake polling point.

src/hotspot/share/runtime/thread.cpp line 1793:

> 1791: bool JavaThread::java_suspend() {
> 1792:   ThreadsListHandle tlh;
> 1793:   if (!tlh.includes(this)) {

Pre-existing: this check bothers me. Surely we should just be asserting "this" thread is in a TL as we are supposed to have ensured that before we even got this far. Future RFE.

src/hotspot/share/runtime/thread.cpp line 1814:

> 1812: // state before returning are performed. The current thread is required to
> 1813: // change to _thread_blocked in order to be seen to be safepoint/handshake safe
> 1814: // whilst suspended and only after becoming handshake safe, the other thread can

It is still not at all clear to me what suspended has to do with the execution of this method. The new code just ignores thread suspension. It seems to me that the old code could also have ignored suspension if the checks in handle_special_runtime_exit_condition had be reordered.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Thread.java line 83:

> 81:   }
> 82: 
> 83:   public boolean isExtSuspended() {

A new comment to avoid the "resolved" conversation. I'm still not sure you can just delete this API from the SA. @sspitsyn do you know what the obligations are with respect to the S.A here?

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3191


More information about the hotspot-runtime-dev mailing list