RFR: 8324868: debug agent does not properly handle interrupts of a virtual thread
Chris Plummer
cjplummer at openjdk.org
Fri Feb 23 19:11:13 UTC 2024
On Fri, 23 Feb 2024 19:06:59 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:
> Fix numerous debug agent issues and testing deficiencies related to Thread.interrupt(). As a first read you should look at the description in the CR. More details in first comment below.
>
> Also, this PR is dependent on the fix for [JDK-8325187](https://bugs.openjdk.org/browse/JDK-8325187). The test might generate GHA failures in the meantime.
>
> Tested with all svc tier1, tier2, and tier5 tests.
If Thread.interrupt() is called on a thread that is currently in the debug agent handling a JVMTI event, it can result in a RawMonitorWait() call returning with JVMTI_ERROR_INTERRUPT. This can also happen if the interrupt was already pending when the JVMTI event came in. When this happens the debug agent sets the ThreadNode->pendingInterrupt flag and re-enters the RawMonitorWait(). When the debug agent is done dealing with the event, it calls threadControl_onEventHandlerExit(), which calls doPendingTasks(), which will call JVMTI InterruptThread() to reset the interrupt status of the thread. There were a few bugs in this code related to virtual threads.
1st issue is that the call to JVMTI InterruptThread() resulted in an upcall to the java Thread.interrupt() while while holding threadLock. Problems arise when during this upcall the debug agent is re-entered due to an event. The case I ran into was due to single stepping, which was enabled by the InterruptHangTest.java. When the debug agent is re-entered, it grabs the handlerLock. This creates a deadlock potential because it already holds the threadLock, and should be grabbing the handlerLock before the threadLock. An indeed a deadlock did happen because another thread had already grabbed the handlerLock and was blocked trying to grab the threadLock.
My first attempt to fix this was to simply make threadControl_onEventHandlerExit() always grab both the threadLock and the handlerLock. Therefore when the debug agent was re-entered during the InterruptThread() call, it would already have both locks and not have a locking order issue. This seemed to work, but I didn't like the idea of the debug agent being re-entered with locks already held, so I reworked the code to no longer need the locks held during the InterruptThread() call.
A 2nd issue was found in threadControl_interrupt(). It is called when the ThreadReference.Interrupt JDWP command is issued. If the thread is currently handling an event, it is suppose to set ThreadNode->pendingInterrupt so the interrupt can be raised after handling the event is complete. However, it was calling findThread(&runningThreads), which means it would always fail to find a virtual thread. This ended up not really being an issue, because it would just call InterruptThread() immediatly as a result. As it turns out this is ok even if the thread is already handling an event because as described above, the debug agent already has code in place for this (the RawMonitorWait() retry code). So rather than fix the findThread() call I just made it so this code now always calls InterruptThread(), even if the thread is currently handling an event. Note this is tested by the new InterruptHangTest.java "remote" mode, which uses ThreadReference.interrupt().
A 3rd issue was in threadControl_currentThread(), which is used by handleInterrupt() to decide if threadControl_setPendingInterrupt() should be called to setup the deferred resetting of the interrupt. The problem was that threadControl_currentThread() was using findThread(&runningThreads), which won't find virtual threads. As a result of this, threadControl_setPendingInterrupt() was never being called for virtual threads, so the interrupted was lost. It could have been fixed by using findRunningThread(), but I decided to just instead use JVMTI to get the current thread.
A 4th issue was threadControl_setPendingInterrupt(), which actually has multiple issues itself. The first is the comment, which is dated and no longer correct, so I removed it. Related to that is the assert, which I wasn't sure why it wasn't being triggered for virtual threads. The threadControl_currentThread() issue above ended up being the reason. Once that was fixed, the assert was triggered and needed to be removed. The 3rd is another incorrect call to findThread(&runningThreads), which won't find virtual threads. This was fixed to instead use findRunningThread(), which will find virtual threads. I also removed the check for a NULL node and instead added an assert since it should always be found.
The test has been reworked to have 3 modes now and do a much better job of testing. Read the long comment in the test for details.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17989#issuecomment-1961849739
More information about the serviceability-dev
mailing list