RFR: 8316397: StackTrace/Suspended/GetStackTraceSuspendedStressTest.java failed with: SingleStep event is NOT expected
Serguei Spitsyn
sspitsyn at openjdk.org
Wed Feb 12 09:30:10 UTC 2025
On Mon, 10 Feb 2025 05:11:41 GMT, David Holmes <dholmes at openjdk.org> wrote:
>> The JVMTI functions `SuspendThread()`, `SuspendThreadList()` and `SuspendAllVirtualThreads()` use the runtime `HandshakeState::suspend()` function and the `SuspendThreadHandshake` class to suspend the `JavaThread` of a mounted virtual thread. They work under the protection of `JvmtiVTMSTransitionDisabler` to keep the association of the virtual thread with its `JavaThread` stable. The function `HandshakeState::suspend()` creates an instance of `SuspendThreadHandshake`, executes it synchronously and then just returns. `SuspendThreadHandshake::do_thread()` in turn creates an instance of `ThreadSelfSuspensionHandshake` (a subclass of `AsyncHandshakeClosure`) to force the handshakee's self-suspension asynchronously. `HandshakeState::suspend()` does not wait for the target thread to actually self-suspend, nor for it to reach a safe thread state that can be treated as suspend-equivalent. This creates problems, as the target virtual thread's activity can be observed after the JVMTI `SuspendThread()` and the other functions have returned. For instance, some `SingleStep` events can be posted.
>> The fix is to wait in the `HandshakeState::suspend()` for the target handshakee to reach a safe thread state. This is done for the virtual thread case only. The suspension of normal platform threads remains the same.
>>
>> Testing:
>> - Ran mach5 tiers 1-6
>
> src/hotspot/share/runtime/handshake.cpp line 804:
>
>> 802: MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
>> 803: _lock.wait_without_safepoint_check(1);
>> 804: }
>
> This would normally be incorrectly coded as you do not hold the lock around the state-change and so you may miss the notification. However, it is possible in this case that the overall handshake protocol prevents that from happening, but I cannot easily determine that.
Thank you for the comment, David.
You are right. That is why the wait uses a timeout: `_lock.wait_without_safepoint_check(1);`
But this is not fully correct either.
I see that Patricio also disagreed with my hack.
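For illustration only, here is a minimal standalone sketch of the usual pattern David refers to: the state change and the notification both happen under the lock, and the waiter re-checks a predicate under the same lock, so no timeout is needed to recover from a missed notification. It uses `std::mutex` and `std::condition_variable` rather than HotSpot's `Monitor`, and `SuspendState`, `mark_suspended()`, `wait_until_suspended()` and `run_demo()` are hypothetical names, not code from the PR.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the handshakee's state; this is not HotSpot code.
struct SuspendState {
  std::mutex lock;
  std::condition_variable cv;
  bool suspended = false;  // guarded by lock
};

// Target side: change the state and notify while holding the lock, so a
// waiter cannot observe the old state and then miss the notification.
void mark_suspended(SuspendState& s) {
  std::lock_guard<std::mutex> guard(s.lock);
  s.suspended = true;
  s.cv.notify_all();
}

// Requester side: the predicate is checked under the lock before blocking
// and after every wakeup, so the wait returns at once if the state has
// already changed.
void wait_until_suspended(SuspendState& s) {
  std::unique_lock<std::mutex> guard(s.lock);
  s.cv.wait(guard, [&s] { return s.suspended; });
}

// Runs one requester/target pair and returns the final value of the flag.
bool run_demo() {
  SuspendState s;
  std::thread target([&s] { mark_suspended(s); });
  wait_until_suspended(s);
  target.join();
  std::lock_guard<std::mutex> guard(s.lock);
  return s.suspended;
}
```

With this shape the order in which the two threads run does not matter. Whether the handshake protocol allows the state change to be made under `_lock` in `HandshakeState::suspend()` is a separate question that this sketch does not answer.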
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/23490#discussion_r1952280789