RFR: 8072701: resume001 failed due to ERROR: timeout for waiting for a BreakpintEvent

Chris Plummer cjplummer at openjdk.org
Wed Jul 10 20:07:46 UTC 2024


On Wed, 10 Jul 2024 06:46:56 GMT, Chris Plummer <cjplummer at openjdk.org> wrote:

>> test/hotspot/jtreg/vmTestbase/nsk/jdi/ThreadReference/resume/resume001.java line 356:
>> 
>>> 354:                 }
>>> 355: 
>>> 356:                 // We need to resume the main thread because thread2 might be blocked on it,
>> 
>> This does not look correct to me.
>> This is the last test scenario - thread2.resume should resumes the thread while vm is suspended.
>> thread2 should not be blocked on main thread.
>> Looking at the debuggee I suppose the blocking is possible during logging. I'd suggest to update the debugee and remove any logging between breakpoints 2 and 3
>
>> This does not look correct to me.
>> This is the last test scenario - thread2.resume should resumes the thread while vm is suspended.
>> thread2 should not be blocked on main thread.
>> Looking at the debuggee I suppose the blocking is possible during logging. I'd suggest to update the debugee and remove any logging between breakpoints 2 and 3
> 
> It looks like the debuggee gets as far as the following:
> 
> 
>     public void runt2() {
>         log("method 'runt2' enter 1");
>         i1++;
>         i2--;
>         log("method 'run2t' exit");
>         return;
>     }
> 
> 
> It prints the first log and hits a breakpoint setup on the 2nd line. The debugger resumes thread2 after this, but we never see the 2nd log. Whenever we see this failure, the following logs from the mainThread are always delayed (by a lot):
> 
> 
> debugee.stderr> **> mainThread: mainThread is out of: synchronized (lockingObject) {
> debugee.stderr> **> mainThread: waiting for an instruction from the debugger ...
> 
> 
> I think this delay is resulting in the the mainThread being in the middle of doing one of these logs when the vm.suspend() is done. This leaves mainThread suspended while holding a lock needed for doing logging (logging is just a simple System.err.prinln()). I'm trying to prove this by getting a debuggee thread dump when getting the 3rd Breakpoint event times out, but for some reason once I added this code I could no longer reproduce the problem (still trying though).
> 
> I don't like the idea of avoiding this issue by getting rid of certain problematic logging. It seems error prone. Someone could add some new logging in the future. I'll see if there is a better solution than the vm.resume(). Perhaps I could just resume mainThread. However, I think with virtual threads I/O can be dependent on other threads like an "unparker" thread.
> 
> Another solution might be to have the debugger and debuggee do an additional handshake so we can guarantee that mainThread is done with these two log statements. Currently when we get to the 2nd log statement, that just means the debuggee is waiting for a "quit" command from the debugger. We could at this point have the debuggee send a command to the debugger, and have the debugger wait for this command before it does the vm.suspend().

I was finally able to reproduce the issue with the stack dumping support in place. mainThread is in the middle of printing the 1st of the two logs mention above. mainThead is suspended and is holding a println related lock, and thread2 is stuck on the 2nd log call in runt2 awaiting the same lock.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20088#discussion_r1672682495


More information about the serviceability-dev mailing list