RFR: 8274687: JDWP deadlocks if some Java thread reaches wait in blockOnDebuggerSuspend [v7]

Chris Plummer cjplummer at openjdk.java.net
Tue Oct 19 20:47:55 UTC 2021


On Mon, 18 Oct 2021 20:02:21 GMT, Richard Reingruber <rrich at openjdk.org> wrote:

>> This change fixes deadlocks described in the JBS-bug by:
>> 
>> * Releasing `handlerLock` before waiting on `threadLock` in `blockOnDebuggerSuspend()`
>> 
>> * Notifying on `threadLock` in `threadControl_reset()`
>> 
>> Also the actions in handleAppResumeBreakpoint() are moved/deferred until
>> doPendingTasks() runs. This is necessary because an event handler must not
>> release handlerLock first and foremost because handlers are called while
>> iterating the handler chain for an event type which is protected by handlerLock
>> (see https://github.com/openjdk/jdk/pull/5805)
>> 
>> The first commit delays the cleanup actions after leaving the loop in
>> `debugLoop_run()`. It allows to reproduce the deadlock running the dispose003
>> test with the command
>> 
>> 
>> make run-test TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003
>> 
>> 
>> The second commit adds a new test that reproduces the deadlock when calling
>> threadControl_resumeThread() while a thread is waiting in
>> blockOnDebuggerSuspend().
>> 
>> The third commit contains the fix described above. With it the deadlocks
>> cannot be reproduced anymore.
>> 
>> The forth commit removes the diagnostic code introduced with the first commit again.
>> 
>> The fix passed
>> 
>> test/hotspot/jtreg/serviceability/jdwp
>> test/jdk/com/sun/jdi
>> test/hotspot/jtreg/vmTestbase/nsk/jdwp
>> test/hotspot/jtreg/vmTestbase/nsk/jdi
>
> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Call blockOnDebuggerSuspend() after setup of the resumer tracking.

src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 770:

> 768:      * handlerLock which has to be acquired before threadLock.
> 769:      */
> 770:     debugMonitorExit(threadLock);

I think we can avoid the exit here. threadLock was grabbed in `threadControl_onEventHandlerExit()`. It probably should be released before calling `doPendingTasks()`. My suggestion would be to first release it right after the `findThread()` call, and then in the `ei == EI_THREAD_END` block grab it again at the start of the block and release at the end of the block.

src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 771:

> 769:      */
> 770:     debugMonitorExit(threadLock);
> 771:     eventHandler_lock(); /* for proper lock order */

"for proper lock order during eventHandler_createInternalThreadOnly() calls"

src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 772:

> 770:     debugMonitorExit(threadLock);
> 771:     eventHandler_lock(); /* for proper lock order */
> 772:     debugMonitorEnter(threadLock);

Somewhere we need to mention that `trackAppResume()` exits with threadLock still being held, although with my recommended changes below this would no longer be the case.

src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 807:

> 805:     }
> 806: 
> 807:     eventHandler_unlock();

This is still a bit odd looking because we grabbed this lock for lock ordering purposes, but are now releasing it out of order. But it's starting to sink in with me that the root of our problem here is that lock order dictates grabbing handlerLock first, and we need to eventually wait on threadLock but with handlerLock released, which suggest the lock ordering is reversed. There is probably some design bug here that results in that, but I'd hate to mess with the ordering of these two locks to try to fix it. Maybe a 3rd lock would help (wait on some new lock instead of threadLock). We could grab that lock first in `doPendingTasks()`, and not have to exit `trackAppResume()` with threadLock held, but again this might be more risk than it is worth.

So it we aren't going to change the lock ordering or introduce a 3rd lock, then my suggestion would be (after making the above suggested change to release threadLock before calling `threadControl_onEventHandlerExit()`) to move the locking and unlocking into `doPendingTasks()`. At the start of the `node->handlingAppResume` block, grab handlerLock and threadLock, and explain that handlerLock is being grabbed because trackAppResume() will (indirectly) try to grab it, and grabbing it before threadLock is necessary for lock ordering. Then grab threadLock and hold onto it until the end of the block. Between the `trackAppResume()` and `blockOnDebuggerSuspend()` calls, release handlerLock and explain that it will not be needed anymore, and has to be released before calling `blockOnDebuggerSuspend()`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5849


More information about the serviceability-dev mailing list