RFR: 8274687: JDWP deadlocks if some Java thread reaches wait in blockOnDebuggerSuspend [v7]

Chris Plummer cjplummer at openjdk.java.net
Thu Oct 21 03:37:09 UTC 2021


On Thu, 21 Oct 2021 03:24:40 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

>> But fields of ThreadNode "node" (aka "resumer") are read and also modified in `doPendingTasks()` and also in `threadControl_onEventHandlerExit()`. IMHO this needs to be synchronized with threadLock. At least it is not obvious that the synchronization is not needed.
>
> You already release threadLock temporarily in `trackAppResume()` and you used to release it temporarily in doPendingTasks(). I'm suggesting to instead release it in `threadControl_onEventHandlerExit()` before calling `doPendingTasks()` and then grab it again (after first grabbing the handlerLock) in `doPendingTasks()` before calling `trackAppResume()`. What can be modified in the ThreadNode during that time that you are concerned about? threadLock would still be held when `blockOnDebuggerSuspend()` is called.  The only interesting lines that are currently executed while holding the threadLock but no longer would be are:
> 
> 2193     if (node->handlingAppResume) {
> 2194         jthread resumer = node->thread;
> 2195         jthread resumee = getResumee(resumer);
> 
> I don't think any of those can change, because only the currently executing thread can change them (when the Thread.resume() breakpoint is hit).
> 
> The suspendCount is only meaningful when in `blockOnDebuggerSuspend()`, and we'll be under the threadLock by that point.

BTW, normally I would not suggest less locking simply because I *thought* it might not be needed. However, you already temporarily release and then regrab the lock. This means there is a period of time where the locking is not needed. What I've suggested is to better define (and expand) that period of time so the locking can be made cleaner. I think the actual expansion of time where the lock would no longer be held is pretty small, with a well defined set of code being executed that to me does not appear to require the lock, and if it did require the lock, then probably releasing the lock in the first place is a bug.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5849


More information about the serviceability-dev mailing list