RFR (S): 8218446: SuspendAtExit hangs

David Holmes david.holmes at oracle.com
Tue Mar 19 00:03:12 UTC 2019


Hi Dan,

Thanks for looking at this.

On 19/03/2019 2:31 am, Daniel D. Daugherty wrote:
> On 3/18/19 1:42 AM, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8218446
>> webrev: http://cr.openjdk.java.net/~dholmes/8218446/webrev/
> 
> src/hotspot/share/runtime/thread.cpp
>      Setting the thread state to _thread_blocked is only part of the
>      solution. You should also mark the thread as suspend equivalent:
> 
>                 // We mark this thread_blocked state as a 
> suspend-equivalent so
>                 // that a caller to is_ext_suspend_completed() won't be 
> confused.
>                 // The suspend-equivalent state is cleared by 
> java_suspend_self().
>                 set_suspend_equivalent();
> 
>      L2285:     set_thread_state(_thread_blocked);
> 
>      By doing that you'll hit this logic in is_ext_suspend_completed():
> 
>        if (save_state == _thread_blocked && is_suspend_equivalent()) {
>          // If the thread's state is _thread_blocked and this blocking
>          // condition is known to be equivalent to a suspend, then we can
>          // consider the thread to be externally suspended. This means that
>          // the code that sets _thread_blocked has been modified to do
>          // self-suspension if the blocking condition releases. We also
>          // used to check for CONDVAR_WAIT here, but that is now covered by
>          // the _thread_blocked with self-suspension check.
>          //
>          // Return true since we wouldn't be here unless there was still an
>          // external suspend request.
>          *bits |= 0x00001000;
>          return true;
> 
>      This will allow is_ext_suspend_completed() to return true without
>      waiting for java_suspend_self() to grab the SR_lock and complete
>      the self-suspension. Your change to _thread_blocked allowed you
>      to avoid the retry loop in is_ext_suspend_completed() which solved
>      the long delay problem. However, you can make it a little bit
>      faster by marking the thread as suspend equivalent also.

I agree this could make it a little bit faster in a particular race but 
I actually think the existing code that you copied above from 
JavaThread::check_safepoint_and_suspend_for_native_trans is "wrong". 
Semantically this is not a "suspend-equivalent" condition it is the 
actual suspend condition! You only set suspend-equivalent when blocking 
somewhere else, as per the comment you quote above:

      // If the thread's state is _thread_blocked and this blocking
      // condition is known to be equivalent to a suspend, then we can
      // consider the thread to be externally suspended. This means that
      // the code that sets _thread_blocked has been modified to do
      // self-suspension if the blocking condition releases.

Here the "blocking condition" _is_ the suspend so it's not going to 
release such that we then need to do self-suspension.

Calling set_suspend_equivalent() before calling java_suspend_self() is 
semantically pointless given that java_suspend_self() will clear it 
again. So the code you quoted that allows us to take a quick exit in 
is_ext_suspend_completed() will only apply for the few instructions 
between setting the state to _thread_blocked and clearing the 
suspend-equivalent bit. (Note that normally** in this case the SR_lock 
will be free when the suspendee needs to grab it in java_suspend_self()).

** I'm ignoring multiple threads issuing t.suspend() at the same time as 
no real code does that.

That all said if you really want me to add this I will, as it makes the 
two sections of code consistent even if not necessary (and potentially 
confusing).

> More (history related) comments below...
> 
> 
>> This test, when run standalone, unintentionally exposed a long time 
>> bug in the Suspend/Resume protocol. There are lots of details in the 
>> bug report, but basically if you encountered a suspend request along 
>> this path:
>>
>> JavaThread::check_safepoint_and_suspend_for_native_trans
>>  -> SafepointMechanism::block_if_requested_slow
>>     -> Safepoint::block()  // unblocks after safepoint
>>       -> JavaThread::handle_special_runtime_exit_condition
>>
>> the thread is in _thread_in_native_trans state, and the code it 
>> executes explicitly leaves it in that state when calling 
>> java_suspend_self():
>>
>>     // Because thread is external suspended the safepoint code will count
>>     // thread as at a safepoint. This can be odd because we can be here
>>     // as _thread_in_Java which would normally transition to 
>> _thread_blocked
>>     // at a safepoint. We would like to mark the thread as 
>> _thread_blocked
>>     // before calling java_suspend_self like all other callers of it but
>>     // we must then observe proper safepoint protocol. (We can't leave
>>     // _thread_blocked with a safepoint in progress). However we can be
>>     // here as _thread_in_native_trans so we can't use a normal 
>> transition
>>     // constructor/destructor pair because they assert on that type of
>>     // transition. We could do something like:
>>     //
>>     // JavaThreadState state = thread_state();
>>     // set_thread_state(_thread_in_vm);
>>     // {
>>     //   ThreadBlockInVM tbivm(this);
>>     //   java_suspend_self()
>>     // }
>>     // set_thread_state(_thread_in_vm_trans);
>>     // if (safepoint) block;
>>     // set_thread_state(state);
>>     //
>>     // but that is pretty messy. Instead we just go with the way the
>>     // code has worked before and note that this is the only path to
>>     // java_suspend_self that doesn't put the thread in _thread_blocked
>>     // mode.
> 
> The above comment block has been bothering me while I have been
> tracking this bug... I finally went spelunking and found that
> it was added back in 2003 by this delta:
> 
> $ sp -r1.672 src/share/vm/runtime/thread.cpp
> src/share/vm/runtime/SCCS/s.thread.cpp:
> 
> D 1.672 03/05/28 12:27:17 sgoldman 1767 1764    00037/00027/03293
> MRs:
> COMMENTS:
> 4800175 -  remove StateSaver as they are obsolete. Removed deopt suspend 
> code.
> Have java_suspend_self restore the original thread state the thread had 
> on entry.
> 
> Based on reading that bug report and my email archive for that bug,
> that bug came up during my JVM/PI stress testing in my lab in
> Colorado. I suspect that we didn't see this particular timeout
> issue since we didn't have the SuspendAtExit test program until
> I added it in the Thread-SMR project.
> 
> 
>> unfortunately the thread that issues the suspend() is looping inside 
>> is_ext_suspend_completed() waiting for it to move out of the trans 
>> state (to _thread_blocked):
>>
>>       // We wait for the thread to transition to a more usable state.
>>       for (int i = 1; i <= SuspendRetryCount; i++) {
>> SR_lock()->wait(!Thread::current()->is_Java_thread(), i * delay);
>>         // check the actual thread state instead of what we saved above
>>         if (thread_state() != _thread_in_native_trans) {
>>           // the thread has transitioned to another thread state so
>>           // try all the checks (except this one) one more time.
>>           do_trans_retry = true;
>>           break;
>>         }
>>      }
>>
>> After ~6.375 seconds we will exit the loop regardless and then take 
>> the VM to a safepoint to "force" suspension of the target thread 
>> (which was actually suspended anyway). In the test we issue 
>> back-to-back suspend()/resume() up to 10000 times which means we can 
>> hit this 6+ second delay frequently (test augmentation showed delays 
>> of ~7000 seconds).
>>
>> The fix is quite simple: we put the thread in the _thread_blocked 
>> state exactly as we already do for the suspend path in 
>> JavaThread::check_special_condition_for_native_trans.
> 
> Very nice job chasing this down. Thanks!

Thanks. To be honest the failure mode make it blatantly obvious where 
the problem was. :)

Cheers,
David
-----

> Dan
> 
> 
>>
>> With that fix SuspendAt Exit no longer appears to hang.
>>
>> Additional testing: (all tests that use suspend() just for good 
>> measure) (in progress)
>>   - hotspot
>>     - vmtestbase/nsk/jdi
>>     - runtime/Thread/SuspendAtExit.java
>>     - runtime/handshake/HandshakeWalkSuspendExitTest.java
>>     - runtime/jni/terminatedThread/TestTerminatedThread.java
>>     - vmTestbase/nsk/jvmti/GetThreadState/thrstat002/
>>   - jdk
>>     - com/sun/jdi/PopAsynchronousTest.java
>>     - java/nio/channels/SocketChannel/SendUrgentData.java
>>     - java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java
>>     - java/lang/ThreadGroup/Suspend.java
>>
>> Plus mach5 tiers 1-3.
>>
>> Thanks,
>> David
> 


More information about the hotspot-runtime-dev mailing list