RFR: 8257831: Suspend with handshakes [v8]
Daniel D.Daugherty
dcubed at openjdk.java.net
Wed Apr 14 16:22:58 UTC 2021
On Wed, 14 Apr 2021 06:48:40 GMT, Robbin Ehn <rehn at openjdk.org> wrote:
>> A suspend request is done by handshaking thread target thread(s). When executing the handshake operation we know the target mutator thread is in a dormant state (as in safepoint safe state). We have a guarantee that it will check it's poll before leaving the dormant state. To stop the thread from leaving the the dormant state we install a second asynchronous handshake to be executed by the targeted thread. The asynchronous handshake will wait on a monitor while the thread is suspended. The target thread cannot not leave the dormant state without a resume request.
>>
>> Per thread suspend requests are naturally serialized by the per thread HandshakeState lock (we can only execute one handshake at a time per thread).
>> Instead of having a separate lock we use this to our advantage and use HandshakeState lock for serializing access to the suspend flag and for wait/notify.
>>
>> Suspend:
>> Requesting thread -> synchronous handshake -> target thread
>> Inside synchronus handshake (HandshakeState lock is locked while
>> executing any handshake):
>> - Set suspended flag
>> - Install asynchronous handshake
>>
>> Target thread -> tries to leave dormant state -> Executes handshakes
>> Target only executes asynchronous handshake:
>> - While suspended
>> - Go to blocked
>> - Wait on HandshakeState lock
>>
>> Resume:
>> Resuming thread:
>> - Lock HandshakeState lock
>> - Clear suspended flag
>> - Notify HandshakeState lock
>> - Unlock HandshakeState lock
>>
>> The "suspend requested" flag is an optimization, without it a dormant thread could be suspended and resumed many times and each would add a new asynchronous handshake. Suspend requested flag means there is already an asynchronous suspend handshake in queue which can be re-used, only the suspend flag needs to be set.
>>
>> ----
>> Some code can be simplified or done in a smarter way but I refrained from doing such changes instead tried to keep existing code as is as far as possible. This concerns especially raw monitors.
>>
>> ----
>> Regarding the changed test, the documentation says:
>> "If the calling thread is specified in the request_list array, this function will not return until some other thread resumes it."
>>
>> But the code:
>> LOG("suspendTestedThreads: before JVMTI SuspendThreadList");
>> err = jvmti->SuspendThreadList(threads_count, threads, results);
>> ...
>> // Allow the Main thread to inspect the result of tested threads suspension
>> agent_unlock(jni);
>>
>> The thread will never return from SuspendThreadList until resumed, so it cannot unlock with agent_unlock().
>> Thus main thread is stuck forever on:
>> // Block until the suspender thread competes the tested threads suspension
>> agent_lock(jni);
>>
>> And never checks and resumes the threads. So I removed that lock instead just sleep and check until all thread have the expected suspended state.
>>
>> ----
>>
>> This version already contains updates after pre-review comments from @dcubed-ojdk, @pchilano, @coleenp.
>> (Pre-review comments here:
>> https://github.com/openjdk/jdk/pull/2625)
>>
>> ----
>> Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and
>> combinations like running with -XX:ZCollectionInterval=0.01 -
>> XX:ZFragmentationLimit=0.
>> Running above some of above concurrently (load ~240), slow debug,
>> etc...
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
> Fixed flag undef dep + spelling error
Reviewed v07 incremental. Still looks good.
Update: Did a crawl thru review of v07 full from the bottom up. I'm hoping
the change in review order results in less "change blindness" since I've
reviewed so many times. There are a few left overs that still need to be
address. There are a couple buried in "resolved comments" where it looks
like the issue that I raised is not actually resolved.
src/hotspot/share/runtime/handshake.hpp line 99:
> 97: // but we need to check for a safepoint before.
> 98: // (this is due to a suspension handshake which put the JavaThread in blocked
> 99: // state so a safepoint may be in-progress)
nit: s/(this/(This/
nit: s/in-progress)/in-progress.)/
src/hotspot/share/runtime/handshake.hpp line 166:
> 164: // thread gets suspended again (after a resume)
> 165: // and we have not yet processed it.
> 166: bool _async_suspend_handshake;
Does this need to be `volatile`?
src/hotspot/share/runtime/handshake.hpp line 177:
> 175: void set_suspended(bool to) { return Atomic::store(&_suspended, to); }
> 176: bool has_async_suspend_handshake() { return _async_suspend_handshake; }
> 177: void set_async_suspend_handshake(bool to) { _async_suspend_handshake = to; }
Does this need to be an Atomic::store?
src/hotspot/share/runtime/objectMonitor.cpp line 424:
> 422: // thread that suspended us.
> 423: _recursions = 0;
> 424: _succ = NULL;
You might want to add a comment here:
// Don't need a full fence after clearing successor here because of the call to exit().
-------------
Marked as reviewed by dcubed (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3191
More information about the serviceability-dev
mailing list