RFR: 8274687: JDWP deadlocks if some Java thread reaches wait in blockOnDebuggerSuspend [v4]

Chris Plummer cjplummer at openjdk.java.net
Thu Oct 14 18:36:52 UTC 2021


On Thu, 14 Oct 2021 07:44:36 GMT, Richard Reingruber <rrich at openjdk.org> wrote:

>> Ok, so you need to hold threadLock from the time `blockOnDebuggerSuspend()` is done waiting on it until after `trackAppResume()` is done, and since `trackAppResume()` needs to grab the handlerLock, and you need to grab the handerLock before the threadLock, you need to jump through some lock grabbing/release hoops. However, you are in fact releasing the threadLock for a short time in `blockOnDebuggerSuspend()` after the wait completes. Doesn't this expose the resumee to potential suspending after the wait has completed and before `trackAppResume()` has been called?
>
>> Ok, so you need to hold threadLock from the time `blockOnDebuggerSuspend()` is done waiting on it until after `trackAppResume()` is done, and since `trackAppResume()` needs to grab the handlerLock, and you need to grab the handerLock before the threadLock, you need to jump through some lock grabbing/release hoops.
> 
> That's correct.
> 
>> However, you are in fact releasing the threadLock for a short time in `blockOnDebuggerSuspend()` after the wait completes. Doesn't this expose the resumee to potential suspending after the wait has completed and before `trackAppResume()` has been called?
> 
> That's correct too. I wouldn't see an issue with it though. I think this is even a preexisting condition. The current thread can lose the race grabbing threadLock after it was notified to the debugger trying to suspend again (if there wasn't the deadlock of course) and consequently suspendCount can (again) be greater than 0 right after the wait. In that case we simply retry.

Ok. So you just need to reacquire the threadLock before the `findThread()` call and before exiting the while loop, and hold it until after the `trackAppResume()` call. I guess it ok then. But this exiting of the handlerLock here and in `blockOnDebuggerSuspend()` is always going to arouse suspicion for anyone that reads the code. The first question will be if the lock does not need to be held here, why grab it in the first place? We know the answer is this lock ordering issue that turns up when `trackAppResume()` is later called, but this will be far from obvious to the reader. It might be possible to make this a bit clearer with some code restructuring, like maybe combining `blockOnDebuggerSuspend()` and `trackAppResume()` into one function (I'm not sure that this will actually help, but maybe something to consider), but at the very least we need some comments calling out why handlerLock is held in the first place and why it is ok to release it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5849


More information about the serviceability-dev mailing list