RFR: 8274687: JDWP deadlocks if some Java thread reaches wait in blockOnDebuggerSuspend [v4]

Chris Plummer cjplummer at openjdk.java.net
Wed Oct 13 06:47:51 UTC 2021


On Tue, 12 Oct 2021 08:03:22 GMT, Richard Reingruber <rrich at openjdk.org> wrote:

>> This change fixes deadlocks described in the JBS-bug by:
>> 
>> * Releasing `handlerLock` before waiting on `threadLock` in `blockOnDebuggerSuspend()`
>> 
>> * Notifying on `threadLock` in `threadControl_reset()`
>> 
>> Also the actions in handleAppResumeBreakpoint() are moved/deferred until
>> doPendingTasks() runs. This is necessary because an event handler must not
>> release handlerLock first and foremost because handlers are called while
>> iterating the handler chain for an event type which is protected by handlerLock
>> (see https://github.com/openjdk/jdk/pull/5805)
>> 
>> The first commit delays the cleanup actions after leaving the loop in
>> `debugLoop_run()`. It allows to reproduce the deadlock running the dispose003
>> test with the command
>> 
>> 
>> make run-test TEST=test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003
>> 
>> 
>> The second commit adds a new test that reproduces the deadlock when calling
>> threadControl_resumeThread() while a thread is waiting in
>> blockOnDebuggerSuspend().
>> 
>> The third commit contains the fix described above. With it the deadlocks
>> cannot be reproduced anymore.
>> 
>> The forth commit removes the diagnostic code introduced with the first commit again.
>> 
>> The fix passed
>> 
>> test/hotspot/jtreg/serviceability/jdwp
>> test/jdk/com/sun/jdi
>> test/hotspot/jtreg/vmTestbase/nsk/jdwp
>> test/hotspot/jtreg/vmTestbase/nsk/jdi
>
> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Improve @summary section of test.

src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 2200:

> 2198:         /* trackAppResume() needs handlerLock */
> 2199:         debugMonitorExit(threadLock);
> 2200:         eventHandler_lock(); /* for proper lock order */

Is it still necessary for the handlerLock to be held when calling `blockOnDebuggerSuspend()` (presuming you don't also add the new handlerLock related code in it)? It seems that only the threadLock is needed so it can then wait on it.

The main thing you've done to fix this issue is defer the `blockOnDebuggerSuspend()` to be done after `event_callback()` has released the handlerLock, and to make it so `blockOnDebuggerSuspend()` does not block while holding the handlerLock, so is there really any reason to be grabbing it at all?

src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c line 2220:

> 2218:              * suspends a thread it will remain suspended.
> 2219:              */
> 2220:             trackAppResume(resumer);

`trackAppResume()` doesn't really need the handlerLock. However, it will grab it when calling `eventHandler_createInternalThreadOnly()`. Since you want to make sure it is grabbed before threadLock in order to preserve lock ordering, that complicates things if we decided not to grab the handlerLock in the code above. What I'm now thinking is all that is really needed is to hold the threadLock around the call to `blockOnDebuggerSuspend()`, or better yet just grab it from within `blockOnDebuggerSuspend()`. You probably don't need to do anything with handlerLock or threadLock inside of `doPendingTasks()`.

test/jdk/com/sun/jdi/ResumeAfterThreadResumeCallTest.java line 81:

> 79:         System.out.println();
> 80:         System.out.println("###(Target,"+ threadName +") " + m);
> 81:         System.out.println();

I'm not sure why you have the two extra `println()` calls. Seems to just add unneeded blank lines in the log file.

test/jdk/com/sun/jdi/ResumeAfterThreadResumeCallTest.java line 100:

> 98:         // "resumee" is suspended now because of the breakpoint
> 99:         // Calling Thread.resume() will block this thread.
> 100: 

no need for empty line here

test/jdk/com/sun/jdi/ResumeAfterThreadResumeCallTest.java line 159:

> 157:         ThreadReference resumee = bpe.thread();
> 158:         ObjectReference resumeeThreadObj = resumee.frame(1).thisObject();
> 159:         printStack(resumee);

As long as you're printing stacks, I think the stack of the main thread would be useful here, but you need to suspend it first. I don't think that interferes with the test.
``` 
        mainThread.suspend();
        printStack(mainThread);
        mainThread.resume();

test/jdk/com/sun/jdi/ResumeAfterThreadResumeCallTest.java line 165:

> 163:         setField(resumeeThreadObj, "reachedBreakpoint", vm().mirrorOf(true));
> 164: 
> 165:         log("Sleeping 500ms shows that the main thread is blocked calling Thread.resume() on \"resumee\" Thread.");

"shows" -> "so"

test/jdk/com/sun/jdi/ResumeAfterThreadResumeCallTest.java line 167:

> 165:         log("Sleeping 500ms shows that the main thread is blocked calling Thread.resume() on \"resumee\" Thread.");
> 166:         Thread.sleep(500);
> 167:         log("After sleep.");

And after line 167 is also a good place to print the main thread's stack.

test/jdk/com/sun/jdi/ResumeAfterThreadResumeCallTest.java line 176:

> 174:             mainThreadReturnedFromResumeCall = ((PrimitiveValue) v).booleanValue();
> 175:             if (!resumedResumee) {
> 176:                 // main thread should be still blocked.

"...should still be blocked"

test/jdk/com/sun/jdi/ResumeAfterThreadResumeCallTest.java line 222:

> 220:         System.out.println();
> 221:         System.out.println("###(Debugger) " + m);
> 222:         System.out.println();

Same here with the extra `printlin()` calls

-------------

PR: https://git.openjdk.java.net/jdk/pull/5849


More information about the serviceability-dev mailing list