RFR: 8316397: StackTrace/Suspended/GetStackTraceSuspendedStressTest.java failed with: SingleStep event is NOT expected

Mon Feb 10 05:15:15 UTC 2025

On Thu, 6 Feb 2025 10:45:29 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:

> The JVMTI functions `SuspendThread()`, `SuspendThreadList()` and `SuspendAllVirtualThreads()` use the runtime function `HandshakeState::suspend()` and the `SuspendThreadHandshake` class to suspend the `JavaThread` of a mounted virtual thread. They work under the protection of `JvmtiVTMSTransitionDisabler`, which keeps the association between the virtual thread and its `JavaThread` stable. `HandshakeState::suspend()` creates an instance of `SuspendThreadHandshake`, executes it synchronously and then returns. `SuspendThreadHandshake::do_thread()` in turn creates an instance of `ThreadSelfSuspensionHandshake` (a subclass of `AsyncHandshakeClosure`) to force the handshakee to self-suspend asynchronously. `HandshakeState::suspend()` waits neither for the target thread to actually self-suspend nor for it to reach a safe thread state that can be treated as suspend-equivalent. This is a problem because the target virtual thread's activity can still be observed after `SuspendThread()` and the related JVMTI functions have returned. For instance, some `SingleStep` events can be posted.
> The fix is to wait in `HandshakeState::suspend()` for the target handshakee to reach a safe thread state. This is done for the virtual thread case only; suspension of normal platform threads is unchanged.
> 
> Testing:
>  - Ran mach5 tiers 1-6
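The mechanism described above can be modelled with standard C++ threads. The sketch below is a hypothetical, heavily simplified analogue, not HotSpot code: `SuspendableWorker`, `request_suspend()`, `wait_until_suspended()` and the step counter are illustrative names. `request_suspend()` plays the role of the asynchronous self-suspension request, which returns before the target has actually stopped, so the target can still execute "steps" afterwards. `suspend()` adds the kind of wait the PR proposes, blocking until the target reports that it has parked.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model (not HotSpot code): a worker thread that only honours a
// suspend request when it reaches a poll point, loosely mirroring how an
// asynchronous self-suspension request is processed by the target thread.
class SuspendableWorker {
 public:
  SuspendableWorker() : thread_([this] { run(); }) {}

  ~SuspendableWorker() {
    {
      std::lock_guard<std::mutex> g(m_);
      stop_ = true;
      suspend_requested_ = false;
    }
    cv_.notify_all();
    thread_.join();
  }

  // Analogue of the current behaviour: post the request and return at once.
  // The worker may still execute steps after this returns.
  void request_suspend() {
    std::lock_guard<std::mutex> g(m_);
    suspend_requested_ = true;
  }

  // Analogue of the proposed wait: block until the worker reports that it has
  // actually parked. The predicate is checked under the lock before blocking,
  // so a notification sent before we start waiting cannot be lost.
  void wait_until_suspended() {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return suspended_; });
  }

  void suspend() {
    request_suspend();
    wait_until_suspended();
  }

  void resume() {
    {
      std::lock_guard<std::mutex> g(m_);
      suspend_requested_ = false;
    }
    cv_.notify_all();
  }

  long steps() const { return steps_.load(); }

 private:
  void run() {
    std::unique_lock<std::mutex> lk(m_);
    while (!stop_) {
      if (suspend_requested_) {
        suspended_ = true;   // state change made while holding m_
        cv_.notify_all();    // wake any wait_until_suspended() caller
        cv_.wait(lk, [this] { return !suspend_requested_ || stop_; });
        suspended_ = false;
        continue;
      }
      lk.unlock();
      steps_.fetch_add(1);   // "observable activity", e.g. a posted event
      std::this_thread::yield();
      lk.lock();             // poll point: re-check for a pending request
    }
  }

  std::mutex m_;
  std::condition_variable cv_;
  bool suspend_requested_ = false;
  bool suspended_ = false;
  bool stop_ = false;
  std::atomic<long> steps_{0};
  std::thread thread_;  // declared last so the other members are initialised first
};
```

Once `suspend()` returns, the step count stays frozen until `resume()` is called; calling `request_suspend()` alone gives no such guarantee. Keeping the request and the wait as separate operations lets callers that do not need the stronger guarantee skip the wait; whether the wait belongs inside the suspend operation itself is the design question the reply below raises.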

I am not at all sure about this. Why are virtual threads different to platform threads here? 

My recollection is that the handshake API deliberately does not wait for the suspension to occur, and that there is a separate mechanism for code that needs to wait: in the old API we had `JvmtiEnv::is_thread_fully_suspended`.

src/hotspot/share/runtime/handshake.cpp line 804:

> 802:         MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
> 803:         _lock.wait_without_safepoint_check(1);
> 804:       }

This would normally be incorrect: you do not hold the lock around the state change, so you may miss the notification. It is possible that in this case the overall handshake protocol prevents that from happening, but I cannot easily determine that.
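The comment concerns the classic lost-wakeup race. The minimal sketch below uses standard C++ with hypothetical names (`state_lock`, `target_safe`, `publish_safe_state()`, `wait_for_safe_state()`); it is not the HotSpot handshake code. It shows one common way to close the race when the state is published through an atomic variable without holding the waiter's lock: the notifier briefly acquires that lock between the store and the notify. The quoted snippet instead uses a short timed wait; assuming the enclosing loop (not visible in the quoted lines) re-checks the target's state, that bounds the delay caused by a missed notification rather than preventing the miss.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical names throughout; this is not the HotSpot handshake code.
// The target publishes its state through an atomic, without holding the
// waiter's lock, which is the situation described in the review comment.
std::mutex state_lock;
std::condition_variable state_cv;
std::atomic<bool> target_safe{false};

// Notifier side. A plain store followed by notify_all() would be racy: the
// waiter could evaluate the predicate (false) while holding state_lock, the
// store and notify could then run, and only afterwards would the waiter
// block, missing the wakeup. Briefly acquiring state_lock between the store
// and the notify closes that window: either the waiter has not yet checked
// the predicate (and will see true), or it is already blocked in wait()
// and receives the notification.
void publish_safe_state() {
  target_safe.store(true);
  { std::lock_guard<std::mutex> g(state_lock); }  // empty critical section
  state_cv.notify_all();
}

// Waiter side: the predicate is always evaluated while holding state_lock.
void wait_for_safe_state() {
  std::unique_lock<std::mutex> lk(state_lock);
  state_cv.wait(lk, [] { return target_safe.load(); });
}
```

Doing the state change itself inside the critical section is the more conventional fix; the empty critical section is useful when the state must stay lock-free for other readers.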

-------------

PR Review: https://git.openjdk.org/jdk/pull/23490#pullrequestreview-2604703300
PR Review Comment: https://git.openjdk.org/jdk/pull/23490#discussion_r1948403444