RFR: 8311218: fatal error: stuck in JvmtiVTMSTransitionDisabler::VTMS_transition_disable [v3]
Leonid Mesnik
lmesnik at openjdk.org
Wed Dec 13 00:04:44 UTC 2023
On Fri, 8 Dec 2023 11:54:40 GMT, Serguei Spitsyn <sspitsyn at openjdk.org> wrote:
>> This fix is for JDK 23 but the intention is to back port it to 22 in RDP-1 time frame.
>> It is fixing a deadlock issue between `VirtualThread` class critical sections with the `interruptLock` (in methods: `unpark()`, `interrupt()`, `getAndClearInterrupt()`, `threadState()`, `toString()`), `JvmtiVTMSTransitionDisabler` and JVMTI `Suspend/Resume` mechanisms.
>> The deadlocking scenario is well described by Patricio in a bug report comment.
>> In simple words, a virtual thread should not be suspended during 'interruptLock' critical sections.
>>
>> The fix is to record that a virtual thread is in a critical section (`JavaThread`'s `_in_critical_section` bit) by notifying the VM/JVMTI about begin/end of critical section.
>> This bit is used in `HandshakeState::get_op_for_self()` to filter out any `HandshakeOperation` if a target `JavaThread` is in a critical section.
>>
>> Some of new notifications with `notifyJvmtiSync()` method is on a performance critical path. It is why this method has been intrincified.
>>
>> New test was developed by Patricio:
>> `test/hotspot/jtreg/serviceability/jvmti/vthread/SuspendWithInterruptLock`
>> The test is very nice as it reliably in 100% reproduces the deadlock without the fix.
>> The test is never failing with this fix.
>>
>> Testing:
>> - tested with newly added test: `test/hotspot/jtreg/serviceability/jvmti/vthread/SuspendWithInterruptLock`
>> - tested with mach5 tiers 1-6
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
> review: (1) rename notifyJvmti method; (2) add try-final statements to VirtualThread methods
Changes requested by lmesnik (Reviewer).
src/hotspot/share/prims/jvm.cpp line 4013:
> 4011: // Notification from VirtualThread about entering/exiting sync critical section.
> 4012: // Needed to avoid deadlocks with JVMTI suspend mechanism.
> 4013: JVM_ENTRY(void, JVM_VirtualThreadCriticalLock(JNIEnv* env, jobject vthread, jboolean enter))
the jobject vthread is not used. Can't be the method made static to reduce the number of arguments?
It is the performance-critical code, I don't know if it is optimized by C2.
src/hotspot/share/runtime/javaThread.hpp line 320:
> 318: bool _is_in_VTMS_transition; // thread is in virtual thread mount state transition
> 319: bool _is_in_tmp_VTMS_transition; // thread is in temporary virtual thread mount state transition
> 320: bool _is_in_critical_section; // thread is in a locking critical section
might make sense to add a comment, that his variable Is changed/read only by current thread and no sync is needed.
src/java.base/share/classes/java/lang/VirtualThread.java line 1164:
> 1162:
> 1163: @IntrinsicCandidate
> 1164: private native void notifyJvmtiCriticalLock(boolean enter);
The name is confusing to me, the CriticalLock looks like it is the section is critical and might be taken by a single thread only. Or it's just unclear what is critical here.
However, the purpose is to disable suspend
Wouldn't be 'notifyJvmtiSuspendLock notifyJvmtiDisableSuspend' better name here?
or comment what critical means here.
test/hotspot/jtreg/serviceability/jvmti/vthread/SuspendWithInterruptLock/SuspendWithInterruptLock.java line 30:
> 28: * @requires vm.continuations
> 29: * @library /testlibrary
> 30: * @run main/othervm -Xint SuspendWithInterruptLock
Doesn't it make sense to add a testcase without -Xint also? Just to give stress testing with compilation.
test/hotspot/jtreg/serviceability/jvmti/vthread/SuspendWithInterruptLock/SuspendWithInterruptLock.java line 36:
> 34:
> 35: public class SuspendWithInterruptLock {
> 36: static boolean done;
done is accessed from different threads, should be volatile.
test/hotspot/jtreg/serviceability/jvmti/vthread/SuspendWithInterruptLock/SuspendWithInterruptLock.java line 54:
> 52: Thread.yield();
> 53: }
> 54: done = true;
I think it is better to use done to stop all threads and set it to true in the main thread after some time. So you could be sure that the yielder hadn't been completed before the suspender started. But it is just proposal.
-------------
PR Review: https://git.openjdk.org/jdk/pull/17011#pullrequestreview-1778571090
PR Review Comment: https://git.openjdk.org/jdk/pull/17011#discussion_r1424694672
PR Review Comment: https://git.openjdk.org/jdk/pull/17011#discussion_r1424697179
PR Review Comment: https://git.openjdk.org/jdk/pull/17011#discussion_r1424687810
PR Review Comment: https://git.openjdk.org/jdk/pull/17011#discussion_r1424662055
PR Review Comment: https://git.openjdk.org/jdk/pull/17011#discussion_r1424663078
PR Review Comment: https://git.openjdk.org/jdk/pull/17011#discussion_r1424683585
More information about the serviceability-dev
mailing list