RFR (S): 8218446: SuspendAtExit hangs
David Holmes
david.holmes at oracle.com
Mon Mar 18 05:42:51 UTC 2019
Bug: https://bugs.openjdk.java.net/browse/JDK-8218446
webrev: http://cr.openjdk.java.net/~dholmes/8218446/webrev/
This test, when run standalone, unintentionally exposed a long-standing bug
in the Suspend/Resume protocol. There are lots of details in the bug
report, but basically if a thread encounters a suspend request along this path:
JavaThread::check_safepoint_and_suspend_for_native_trans
-> SafepointMechanism::block_if_requested_slow
-> Safepoint::block() // unblocks after safepoint
-> JavaThread::handle_special_runtime_exit_condition
the thread is in _thread_in_native_trans state, and the code it executes
explicitly leaves it in that state when calling java_suspend_self():
// Because thread is external suspended the safepoint code will count
// thread as at a safepoint. This can be odd because we can be here
// as _thread_in_Java which would normally transition to _thread_blocked
// at a safepoint. We would like to mark the thread as _thread_blocked
// before calling java_suspend_self like all other callers of it but
// we must then observe proper safepoint protocol. (We can't leave
// _thread_blocked with a safepoint in progress). However we can be
// here as _thread_in_native_trans so we can't use a normal transition
// constructor/destructor pair because they assert on that type of
// transition. We could do something like:
//
//   JavaThreadState state = thread_state();
//   set_thread_state(_thread_in_vm);
//   {
//     ThreadBlockInVM tbivm(this);
//     java_suspend_self()
//   }
//   set_thread_state(_thread_in_vm_trans);
//   if (safepoint) block;
//   set_thread_state(state);
//
// but that is pretty messy. Instead we just go with the way the
// code has worked before and note that this is the only path to
// java_suspend_self that doesn't put the thread in _thread_blocked
// mode.
Unfortunately, the thread that issues the suspend() is looping inside
is_ext_suspend_completed(), waiting for the target thread to move out of
the trans state (to _thread_blocked):
// We wait for the thread to transition to a more usable state.
for (int i = 1; i <= SuspendRetryCount; i++) {
  SR_lock()->wait(!Thread::current()->is_Java_thread(), i * delay);
  // check the actual thread state instead of what we saved above
  if (thread_state() != _thread_in_native_trans) {
    // the thread has transitioned to another thread state so
    // try all the checks (except this one) one more time.
    do_trans_retry = true;
    break;
  }
}
After ~6.375 seconds we will exit the loop regardless and then take the
VM to a safepoint to "force" suspension of the target thread (which was
actually suspended anyway). In the test we issue back-to-back
suspend()/resume() up to 10000 times which means we can hit this 6+
second delay frequently (test augmentation showed delays of ~7000 seconds).
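The ~6.375 second figure follows from the growing wait in the loop above:
iteration i waits i * delay, so the total is delay * (1 + 2 + ... +
SuspendRetryCount). A minimal sketch of that arithmetic, assuming
SuspendRetryCount is 50 and delay is 5 ms (values picked here because they
reproduce the figure; neither is stated in this mail), with a hypothetical
helper total_retry_wait_ms:

```cpp
// Worst-case time spent in the retry loop when the target thread never
// leaves _thread_in_native_trans. The i-th wait lasts i * delay_ms, so the
// total is delay_ms * n * (n + 1) / 2 for n retries.
// retry_count and delay_ms stand in for SuspendRetryCount and the loop's
// 'delay'; the values 50 and 5 below are assumptions, not from the mail.
constexpr long total_retry_wait_ms(int retry_count, int delay_ms) {
  return static_cast<long>(delay_ms) * retry_count * (retry_count + 1) / 2;
}

// 5 ms * (50 * 51 / 2) = 6375 ms, i.e. about 6.375 seconds.
static_assert(total_retry_wait_ms(50, 5) == 6375,
              "50 retries with a 5 ms step add up to 6.375 s");
```

Any SR_lock() notification that ends a wait early would shorten this, so
the value is an upper bound on the time spent in the loop.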
The fix is quite simple: we put the thread in the _thread_blocked state
exactly as we already do for the suspend path in
JavaThread::check_special_condition_for_native_trans.
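To illustrate the idea (this is a toy model, not the webrev code), the
sketch below marks a thread as blocked for the duration of its
self-suspension and restores the saved state afterwards. ToyThread,
ToyState, toy_suspend_as_blocked and toy_suspender_sees_progress are all
hypothetical names; the callback stands in for java_suspend_self().

```cpp
#include <cassert>
#include <functional>

// Simplified stand-ins for the two thread states discussed above.
enum ToyState { TOY_IN_NATIVE_TRANS, TOY_BLOCKED };

struct ToyThread {
  ToyState state = TOY_IN_NATIVE_TRANS;
};

// Mirrors the suspender's check "thread_state() != _thread_in_native_trans".
inline bool toy_suspender_sees_progress(const ToyThread& t) {
  return t.state != TOY_IN_NATIVE_TRANS;
}

// Runs self_suspend with the thread marked blocked, then restores the state
// the thread arrived in. Real code must also observe the safepoint protocol
// before leaving the blocked state; that part is not modelled here.
inline void toy_suspend_as_blocked(ToyThread& t,
                                   const std::function<void()>& self_suspend) {
  const ToyState saved = t.state;
  t.state = TOY_BLOCKED;   // a polling suspender now sees a usable state
  self_suspend();
  t.state = saved;
}

// Returns true if the state check succeeded while the thread was suspended.
inline bool toy_demo() {
  ToyThread t;
  bool seen_blocked = false;
  toy_suspend_as_blocked(t, [&] { seen_blocked = toy_suspender_sees_progress(t); });
  assert(t.state == TOY_IN_NATIVE_TRANS);  // original state is restored
  return seen_blocked;
}
```

In this model the suspender's state check succeeds as soon as the target
has self-suspended, so the retry loop has no reason to run to completion.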
With that fix SuspendAtExit no longer appears to hang.
Additional testing: (all tests that use suspend() just for good measure)
(in progress)
- hotspot
  - vmTestbase/nsk/jdi
  - runtime/Thread/SuspendAtExit.java
  - runtime/handshake/HandshakeWalkSuspendExitTest.java
  - runtime/jni/terminatedThread/TestTerminatedThread.java
  - vmTestbase/nsk/jvmti/GetThreadState/thrstat002/
- jdk
  - com/sun/jdi/PopAsynchronousTest.java
  - java/nio/channels/SocketChannel/SendUrgentData.java
  - java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java
  - java/lang/ThreadGroup/Suspend.java
Plus mach5 tiers 1-3.
Thanks,
David