RFR: 8274687: JDWP deadlocks if some Java thread reaches wait in blockOnDebuggerSuspend [v7]
Richard Reingruber
rrich at openjdk.java.net
Wed Oct 20 07:13:08 UTC 2021
On Tue, 19 Oct 2021 20:21:59 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:
>> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Call blockOnDebuggerSuspend() after setup of the resumer tracking.
>
> src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 772:
>
>> 770: debugMonitorExit(threadLock);
>> 771: eventHandler_lock(); /* for proper lock order */
>> 772: debugMonitorEnter(threadLock);
>
> Somewhere we need to mention that `trackAppResume()` exits with threadLock still being held, although with my recommended changes below this would no longer be the case.
If we cannot release `threadLock` before calling `doPendingTasks()` then I would suggest to comment that the caller of `trackAppResume()` is expected to hold `threadLock` and that it will be temporarily released.
> src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 807:
>
>> 805: }
>> 806:
>> 807: eventHandler_unlock();
>
> This is still a bit odd looking because we grabbed this lock for lock ordering purposes, but are now releasing it out of order. But it's starting to sink in with me that the root of our problem here is that lock order dictates grabbing handlerLock first, and we need to eventually wait on threadLock but with handlerLock released, which suggest the lock ordering is reversed. There is probably some design bug here that results in that, but I'd hate to mess with the ordering of these two locks to try to fix it. Maybe a 3rd lock would help (wait on some new lock instead of threadLock). We could grab that lock first in `doPendingTasks()`, and not have to exit `trackAppResume()` with threadLock held, but again this might be more risk than it is worth.
>
> So it we aren't going to change the lock ordering or introduce a 3rd lock, then my suggestion would be (after making the above suggested change to release threadLock before calling `threadControl_onEventHandlerExit()`) to move the locking and unlocking into `doPendingTasks()`. At the start of the `node->handlingAppResume` block, grab handlerLock and threadLock, and explain that handlerLock is being grabbed because trackAppResume() will (indirectly) try to grab it, and grabbing it before threadLock is necessary for lock ordering. Then grab threadLock and hold onto it until the end of the block. Between the `trackAppResume()` and `blockOnDebuggerSuspend()` calls, release handlerLock and explain that it will not be needed anymore, and has to be released before calling `blockOnDebuggerSuspend()`.
To me it does not look odd. I'd think it is ok to release locks out of order.
IMHO adding a new third lock to be waited on in `doPendingTasks()` would make things even more complicated. Accesses to `suspendCount` are synchronized through `threadLock`. Waiting on that lock for change of `suspendCount` is straightforward really.
-------------
PR: https://git.openjdk.java.net/jdk/pull/5849
More information about the serviceability-dev
mailing list