RFR: 8316397: StackTrace/Suspended/GetStackTraceSuspendedStressTest.java failed with: SingleStep event is NOT expected [v4]

Serguei Spitsyn sspitsyn at openjdk.org
Thu Apr 3 10:15:55 UTC 2025


On Thu, 6 Mar 2025 00:39:53 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

>> I don't think we should block suspend requests in all cases of being `_thread_blocked`, but `get_jvmti_thread_state` is a special case. We either need to have a general suspension suppression mechanism (which I thought already existed - but may it was in old code), or we need to special-case the acquisition of the `Thread_State_lock`.
>
>> Let me see if I have the sequence of events correct:
>>  - we have a thread leaving Java to enter the VM, in this case to post a single-step event, which is not suspended
>>   - when trying to acquire the JvmtiThreadState_lock the thread blocks and becomes handshake/safepoint-safe and is then suspended
>>   - the thread acquires the lock and goes to post the event but because it has now been suspended it is an error to do so
> 
> This is almost right but let me add a couple of details.
> A suspend request was processed/registered/accepted at the handshake/safepoint-safe point but physical suspension in the `ThreadSelfSuspensionHandshake` has not happened yet which created a problem. Also, the problem is not that the thread goes to post the event. Suspension and event posting are racy by design. The problem is that the `SingleStep` events for this thread are also enabled at the handshake/safepoint-safe point after the suspend request was processed/registered/accepted.
> 
>> So what we need is a way to disable suspension so that the transient/temporary switch to _thread_blocked does not allow a suspend request to be processed. I think that is preferable to trying to find where to add a place to check for suspension and actually suspend it.
> 
> I'm also thinking toward this direction. The best way to fix this would be to disallow processing/accepting new suspend requests in `MutexLocker mu(JvmtiThreadState_lock)` which is executed in context of `get_jvmti_thread_state()` call:
> 
> inline JvmtiThreadState* JvmtiThreadState::state_for(JavaThread *thread, Handle thread_handle) {
>   // In a case of unmounted virtual thread the thread can be null.
>   JvmtiThreadState* state = thread_handle == nullptr ? thread->jvmti_thread_state() :
>                                                 java_lang_Thread::jvmti_thread_state(thread_handle());
>   if (state == nullptr) {
>     MutexLocker mu(JvmtiThreadState_lock);
>   . . .
> 
> Unfortunately, it looks like it is tricky and risky to do.

David, we had an internal Zoom meeting with Patricio and agreed on a couple of points (I hope, Patricio will fix me if needed).
#1:
The main (biggest) problem is the call to `JvmtiEventController::thread_started()` in the function `get_jvmti_thread_state()`. The function `JvmtiThreadState::state_for()` is called there if a mounted virtual thread is the current thread and any thread-filtered event is enabled globally. The function `JvmtiThreadState::state_for()` has this line: `MutexLocker mu(JvmtiThreadState_lock);`

The `JvmtiThreadState_lock` allows safepointing. It is where a JVMTI suspend request for this thread has been processed/registered. The `SingleStep` events for this thread are also enabled after the suspend request was processed/registered. However, suspends are not allowed by this `MutexLocker`.
This is very common code path for any thread-filtered event (e.g. the event can be enabled for a specific thread).

#2:
This PR fixes this exact issue with this update:

JvmtiThreadState* JvmtiExport::get_jvmti_thread_state(JavaThread *thread) {
  assert(thread == JavaThread::current(), "must be current thread");
  if (thread->is_vthread_mounted() && thread->jvmti_thread_state() == nullptr) {
    JvmtiEventController::thread_started(thread);
+    if (thread->is_suspended()) {
+      // suspend here if there is a suspend request
+      ThreadBlockInVM tbivm(thread);
+    }
  }
  . . .


But we also agreed to add a tweak for the function `post_dynamic_code_generated_while_holding_locks()` to disallow suspension in this context. Then it will look like this:

+JvmtiThreadState* JvmtiExport::get_jvmti_thread_state(JavaThread *thread, bool allow_suspend) {
  assert(thread == JavaThread::current(), "must be current thread");
  if (thread->is_vthread_mounted() && thread->jvmti_thread_state() == nullptr) {
    JvmtiEventController::thread_started(thread);
+    if (allow_suspend && thread->is_suspended()) {
+      // suspend here if there is a suspend request
+      ThreadBlockInVM tbivm(thread, true /* allow suspend */);
+    }
  }
  . . .

void JvmtiExport::post_dynamic_code_generated_while_holding_locks(const char* name,
                                                                  address code_begin, address code_end)
{
  . . .
  JvmtiThreadState *state = get_jvmti_thread_state(thread, false /* allow_suspend */);
  . . .


I have already pushed the minor tweak above.

#3:
We also agreed to investigate more case where a similar situation can occur withing a context/scope of all the`JvmtiExport::post_*` function calls.  One interesting case was found:

bool InstanceKlass::link_class_impl(TRAPS) {
  . . .
      if (Universe::is_fully_initialized()) {
        DeoptimizationScope deopt_scope;
        {
          // Now mark all code that assumes the class is not linked.
          // Set state under the Compile_lock also.
          MutexLocker ml(THREAD, Compile_lock);                                         <== !!!

          set_init_state(linked);
          CodeCache::mark_dependents_on(&deopt_scope, this);
        }
        // Perform the deopt handshake outside Compile_lock.
        deopt_scope.deoptimize_marked();
      } else {
        set_init_state(linked);
      }
      if (JvmtiExport::should_post_class_prepare()) {
        JvmtiExport::post_class_prepare(THREAD, this);                            <== !!!
      }
    }
  }
  return true;
}

It feels like more thoughts are needed here.
I'm thinking it would be okay to file a separate bug for this additional investigation.
Most likely we do not have a good test coverage for this, and so, no tests are failing but we may want to add it in the future. At the moment, I'm not sure of it is worth it or not.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23490#discussion_r2026660121


More information about the hotspot-runtime-dev mailing list