RFR (S): 8218446: SuspendAtExit hangs

David Holmes david.holmes at oracle.com
Mon Mar 18 05:42:51 UTC 2019


Bug: https://bugs.openjdk.java.net/browse/JDK-8218446
webrev: http://cr.openjdk.java.net/~dholmes/8218446/webrev/

This test, when run standalone, unintentionally exposed a long-standing bug 
in the Suspend/Resume protocol. There are lots of details in the bug 
report, but basically, if a suspend request is encountered along this path:

JavaThread::check_safepoint_and_suspend_for_native_trans
  -> SafepointMechanism::block_if_requested_slow
     -> Safepoint::block()  // unblocks after safepoint
       -> JavaThread::handle_special_runtime_exit_condition

the thread is in the _thread_in_native_trans state, and the code it executes 
explicitly leaves it in that state when calling java_suspend_self():

     // Because thread is external suspended the safepoint code will count
     // thread as at a safepoint. This can be odd because we can be here
     // as _thread_in_Java which would normally transition to _thread_blocked
     // at a safepoint. We would like to mark the thread as _thread_blocked
     // before calling java_suspend_self like all other callers of it but
     // we must then observe proper safepoint protocol. (We can't leave
     // _thread_blocked with a safepoint in progress). However we can be
     // here as _thread_in_native_trans so we can't use a normal transition
     // constructor/destructor pair because they assert on that type of
     // transition. We could do something like:
     //
     // JavaThreadState state = thread_state();
     // set_thread_state(_thread_in_vm);
     // {
     //   ThreadBlockInVM tbivm(this);
     //   java_suspend_self()
     // }
     // set_thread_state(_thread_in_vm_trans);
     // if (safepoint) block;
     // set_thread_state(state);
     //
     // but that is pretty messy. Instead we just go with the way the
     // code has worked before and note that this is the only path to
     // java_suspend_self that doesn't put the thread in _thread_blocked
     // mode.

Unfortunately, the thread that issued the suspend() is looping inside 
is_ext_suspend_completed(), waiting for the target to move out of the 
trans state (to _thread_blocked):

       // We wait for the thread to transition to a more usable state.
       for (int i = 1; i <= SuspendRetryCount; i++) {
         SR_lock()->wait(!Thread::current()->is_Java_thread(), i * delay);
         // check the actual thread state instead of what we saved above
         if (thread_state() != _thread_in_native_trans) {
           // the thread has transitioned to another thread state so
           // try all the checks (except this one) one more time.
           do_trans_retry = true;
           break;
         }
       }

After ~6.375 seconds we exit the loop regardless and then take the VM 
to a safepoint to "force" suspension of the target thread (which was 
actually suspended anyway). The test issues back-to-back 
suspend()/resume() calls up to 10000 times, which means we can hit this 
6+ second delay frequently (augmenting the test showed delays of ~7000 
seconds).
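The ~6.375 second figure follows from the linear backoff in the loop 
above: the i-th wait lasts i * delay, so the worst case is 
delay * n(n+1)/2. A small C++ check, assuming the default values 
SuspendRetryCount = 50 and SuspendRetryDelay = 5 ms (these reproduce the 
quoted figure; worst_case_wait is a hypothetical helper, not HotSpot code):

```cpp
#include <chrono>

// Hypothetical helper: total time the suspender can spend in the retry loop
// if the target never leaves _thread_in_native_trans. The i-th wait lasts
// i * delay, so the sum is delay * (1 + 2 + ... + count)
//                        = delay * count * (count + 1) / 2.
constexpr std::chrono::milliseconds worst_case_wait(long long count,
                                                    std::chrono::milliseconds delay) {
  return delay * (count * (count + 1) / 2);
}

// Assumed defaults: SuspendRetryCount = 50, SuspendRetryDelay = 5 ms.
static_assert(worst_case_wait(50, std::chrono::milliseconds(5)).count() == 6375,
              "50 retries with a 5 ms base delay add up to 6.375 seconds");
```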

The fix is quite simple: we put the thread in the _thread_blocked state 
exactly as we already do for the suspend path in 
JavaThread::check_special_condition_for_native_trans.
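To illustrate why moving the target to _thread_blocked before it 
self-suspends lets the suspender finish early, here is a toy model in 
standard C++. It is not HotSpot code: the enum, struct and function 
names are hypothetical, a std::mutex/std::condition_variable pair stands 
in for SR_lock(), and only the state check in the retry loop is modelled 
(no safepoints, no real suspend flags).

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Toy model only: hypothetical names, loosely mirroring HotSpot thread states.
enum class State { in_native_trans, blocked };

struct Target {
  std::mutex m;                  // stands in for SR_lock()
  std::condition_variable cv;
  State state = State::in_native_trans;
  bool suspended = false;
  bool resumed = false;
};

// Runs on the target thread. With mark_blocked == false it self-suspends while
// still "in native trans" (the old behaviour); with true it first moves to
// "blocked", as the fix does.
void self_suspend(Target& t, bool mark_blocked) {
  std::unique_lock<std::mutex> lk(t.m);
  if (mark_blocked) t.state = State::blocked;
  t.suspended = true;
  t.cv.notify_all();
  t.cv.wait(lk, [&] { return t.resumed; });  // parked until resumed
  t.state = State::in_native_trans;          // restore the original state
}

// Runs on the suspender. Mirrors the retry loop quoted earlier: the i-th wait
// lasts up to i * delay, then the target's state is re-checked.
// Returns true if a state change was observed before the retries ran out.
bool suspender_sees_state_change(bool mark_blocked, int retry_count,
                                 std::chrono::milliseconds delay) {
  Target t;
  std::thread target(self_suspend, std::ref(t), mark_blocked);
  bool changed = false;
  {
    std::unique_lock<std::mutex> lk(t.m);
    t.cv.wait(lk, [&] { return t.suspended; });  // target has self-suspended
    for (int i = 1; i <= retry_count && !changed; i++) {
      t.cv.wait_for(lk, i * delay);
      changed = (t.state != State::in_native_trans);
    }
    t.resumed = true;  // resume the target regardless of the outcome
  }
  t.cv.notify_all();
  target.join();
  return changed;
}
```

With mark_blocked = false the suspender never sees a state change and 
waits out every retry (1 + 2 + ... + retry_count delays); with 
mark_blocked = true it sees _thread_blocked after the first wait.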

With that fix, SuspendAtExit no longer appears to hang.

Additional testing (in progress; all tests that use suspend(), just for 
good measure):
   - hotspot
     - vmTestbase/nsk/jdi
     - runtime/Thread/SuspendAtExit.java
     - runtime/handshake/HandshakeWalkSuspendExitTest.java
     - runtime/jni/terminatedThread/TestTerminatedThread.java
     - vmTestbase/nsk/jvmti/GetThreadState/thrstat002/
   - jdk
     - com/sun/jdi/PopAsynchronousTest.java
     - java/nio/channels/SocketChannel/SendUrgentData.java
     - java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java
     - java/lang/ThreadGroup/Suspend.java

Plus mach5 tiers 1-3.

Thanks,
David


More information about the hotspot-runtime-dev mailing list