RFR: 8316397: StackTrace/Suspended/GetStackTraceSuspendedStressTest.java failed with: SingleStep event is NOT expected
Patricio Chilano Mateo
pchilanomate at openjdk.org
Tue Feb 11 19:23:17 UTC 2025
On Thu, 6 Feb 2025 10:45:29 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:
> The JVMTI functions `SuspendThread()`, `SuspendThreadList()` and `SuspendAllVirtualThreads()` use the runtime function `HandshakeState::suspend()` and the `SuspendThreadHandshake` class to suspend the `JavaThread` of a mounted virtual thread. They work under the protection of the `JvmtiVTMSTransitionDisabler` to keep the association of the virtual thread with its `JavaThread` stable. The function `HandshakeState::suspend()` creates an instance of `SuspendThreadHandshake`, executes it synchronously and then just returns. `SuspendThreadHandshake::do_thread()` in turn creates an instance of `ThreadSelfSuspensionHandshake` (a subclass of `AsyncHandshakeClosure`) to force the handshakee's self-suspension asynchronously. `HandshakeState::suspend()` does not wait for the target thread to actually self-suspend, nor for it to reach a safe thread state that can be treated as suspend-equivalent. This creates problems because the target virtual thread's activity can be observable after the JVMTI `SuspendThread()` and the other functions have returned. For instance, some `SingleStep` events can be posted.
> The fix is to wait in the `HandshakeState::suspend()` for the target handshakee to reach a safe thread state. This is done for the virtual thread case only. The suspension of normal platform threads remains the same.
>
> Testing:
> - Ran mach5 tiers 1-6
Now, I was able to reproduce the crash and found the problem. The target is being suspended while creating the JvmtiThreadState in `JvmtiExport::at_single_stepping_point()`. It is found in the `_thread_blocked` state because it is blocked trying to acquire `JvmtiThreadState_lock`. We never process a suspend handshake when coming out of the blocked state though, since we can be holding VM monitors, so the target can continue executing until the next transition to Java or transition out of native. The agent then enables single stepping notifications for the target, and the target reaches `JvmtiExport::post_single_step()` and posts the event before the notifications are disabled again. It seems this issue can happen with other events too; we probably just don't have tests for them.
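To make the interleaving above easier to follow, here is a minimal single-threaded sketch that replays it step by step. It is a toy model only: `ModelState`, `TargetModel`, `request_suspend()` and `replay_blocked_race()` are hypothetical names invented for illustration, not HotSpot code, and the real handshake machinery is far more involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are not HotSpot classes.
enum class ModelState { Blocked, InVM, InJava };

struct TargetModel {
    ModelState state = ModelState::InVM;
    bool suspend_pending = false;      // async self-suspend has been requested
    bool suspended = false;            // target has actually self-suspended
    bool single_step_enabled = false;  // agent's event notification mode
    std::vector<std::string> events;   // events delivered to the agent

    // Leaving the blocked state does not process the pending suspend,
    // because the thread may be holding VM monitors at this point.
    void leave_blocked() {
        assert(state == ModelState::Blocked);
        state = ModelState::InVM;
    }

    // Stand-in for JvmtiExport::post_single_step().
    void post_single_step() {
        if (single_step_enabled && !suspended) events.push_back("SingleStep");
    }

    // The pending suspend is only honored on the transition to Java.
    void transition_to_java() {
        state = ModelState::InJava;
        if (suspend_pending) { suspended = true; suspend_pending = false; }
    }
};

// Stand-in for the agent's suspend call: it returns once the target is
// observed in a state that is treated as safe (here: Blocked).
bool request_suspend(TargetModel& t) {
    t.suspend_pending = true;
    return t.state == ModelState::Blocked;
}

std::vector<std::string> replay_blocked_race() {
    TargetModel t;
    t.state = ModelState::Blocked;       // waiting for JvmtiThreadState_lock
    bool returned = request_suspend(t);  // agent believes target is suspended
    assert(returned);
    t.single_step_enabled = true;        // agent enables SingleStep
    t.leave_blocked();                   // lock acquired, no suspend check
    t.post_single_step();                // the event escapes here
    t.transition_to_java();              // target finally self-suspends
    return t.events;
}
```

Calling `replay_blocked_race()` returns a list containing one `"SingleStep"` entry, i.e. the event is delivered after the suspend request has already returned to the agent.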
I thought we could add a suspend check before making JVMTI callbacks. But although that would fix this issue, there is still always a race due to the `JvmtiJavaThreadEventTransition` object, since after switching to native a suspend request will succeed. So if the test instead enabled single stepping first (or some other event) and then suspended, we could still see the callback after suspending the target. Note that this last race can also happen with platform threads.
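The remaining race can be sketched the same way. Again this is a hypothetical single-threaded model, assuming only what is described above: a suspend check placed before the callback passes, the thread then switches to native (the role played by `JvmtiJavaThreadEventTransition`), and a suspend request arriving at that point succeeds immediately. The names `CallbackTarget`, `agent_suspend()` and `replay_callback_race()` are made up for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: hypothetical names, not HotSpot code.
enum class CallbackState { InVM, InNative };

struct CallbackTarget {
    CallbackState state = CallbackState::InVM;
    bool suspend_requested = false;
    std::vector<std::string> log;  // ordered record of what the agent observes
};

// A suspend request is treated as complete as soon as the target is in native.
bool agent_suspend(CallbackTarget& t) {
    t.suspend_requested = true;
    bool ok = (t.state == CallbackState::InNative);
    if (ok) t.log.push_back("suspend returned");
    return ok;
}

std::vector<std::string> replay_callback_race() {
    CallbackTarget t;
    // The event is already enabled. A suspend check before the callback
    // would pass here, because no suspend has been requested yet.
    assert(!t.suspend_requested);
    // Switch to native before invoking the agent callback.
    t.state = CallbackState::InNative;
    // The agent's suspend request arrives now and succeeds right away.
    bool ok = agent_suspend(t);
    assert(ok);
    // The callback still runs, after the suspend call has returned.
    t.log.push_back("SingleStep callback");
    return t.log;
}
```

The returned log has `"suspend returned"` followed by `"SingleStep callback"`, which is the ordering the agent would not expect. Nothing in this model depends on the thread being virtual, which matches the note that platform threads are affected too.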
-------------
PR Comment: https://git.openjdk.org/jdk/pull/23490#issuecomment-2651851154
More information about the hotspot-runtime-dev
mailing list