RFR: 8274687: JDWP can deadlock VM if Java thread waits in blockOnDebuggerSuspend
Chris Plummer
cjplummer at openjdk.java.net
Tue Oct 5 20:32:07 UTC 2021
On Tue, 5 Oct 2021 13:54:46 GMT, Richard Reingruber <rrich at openjdk.org> wrote:
>> src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 750:
>>
>>> 748: while (node && node->suspendCount > 0) {
>>> 749: /* Resume requires the event handlerLock so we have to release it */
>>> 750: eventHandler_unlock();
>>
>> I'm not so sure this is always safe. It might be fine in the context of resetting the connection, but not during normal debug agent operations. It allows for another event to be processed when the lock is suppose to keep event processing serialized. What happens for example if we hit the `Thread.resume()` breakpoint again on a different thread?
>>
>> It might be best to limit doing this only when `threadControl_reset()` is currently executing (you could set a flag there), although it seems more like a band aid than a proper fix. I could imagine there might still be scenarios were releasing the lock during reset might be problematic, although probably extremely unlikely to ever be noticed.
>
>> I'm not so sure this is always safe. It might be fine in the context of
>> resetting the connection, but not during normal debug agent operations. It
>> allows for another event to be processed when the lock is suppose to keep
>> event processing serialized. What happens for example if we hit the
>> `Thread.resume()` breakpoint again on a different thread?
>
> I agree that this is probably not safe. E.g. when releasing handlerLock the
> handler chain for EI_BREAKPOINT could be modified which is being iterated in
> the caller function event_callback().
>
> https://github.com/openjdk/jdk/blob/8609ea55acdcc203408f58f7bf96ea9228aef613/src/jdk.jdwp.agent/share/native/libjdwp/eventHandler.c#L647-L673
>
> I should have checked before.
>
>> It might be best to limit doing this only when `threadControl_reset()` is
>> currently executing (you could set a flag there), although it seems more like
>> a band aid than a proper fix. I could imagine there might still be scenarios
>> were releasing the lock during reset might be problematic, although probably
>> extremely unlikely to ever be noticed.
>
> I looked at threadControl_resumeThread(). It appears to me that resuming a
> thread is not possible while some thread is waiting in blockOnDebuggerSuspend()
> because threadControl_resumeThread() locks handlerLock.
>
> https://github.com/openjdk/jdk/blob/32811026ce5ecb1d27d835eac33de9ccbd51fcbf/src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c#L1485
>
> Am I missing something?
>
> Maybe the code in handleAppResumeBreakpoint() could moved to doPendingTasks()?
Regarding threadControl_resumeThread() it does appear that it would block, as would threadControl_resumeAll(), which seems problematic in that blockOnDebuggerSuspend() won't exit until the suspendCount == 0. So it's unclear to me how this is suppose to work if a resume() can't be done.
-------------
PR: https://git.openjdk.java.net/jdk/pull/5805
More information about the serviceability-dev
mailing list