RFR: JDK-4903717: nsk/jdi/ThreadReference/isSuspended/issuspended002 failing

Alex Menkov alexey.menkov at oracle.com
Fri Mar 1 22:56:47 UTC 2019



On 03/01/2019 04:15, Gary Adams wrote:
> I will move the logging statement up.
> 
> Leaving the "docontinue" in the synchronized block
> will guarantee the debugger suspends the debuggee
> main thread without any chance of interference
> with logging commands.
> 
> I'm sure this test was left with verbose logging enabled in the
> hopes of catching more details about the failing sequence of
> events. In this case it was the excessive logging that was causing
> the test to hang.
> 
> Now that the test can run reliably, I think we should turn off
> the verbose logging.

I think it would be better to keep the logging verbose to simplify 
analysis if the test fails in the future.

--alex

> 
> On 2/28/19, 4:44 PM, Alex Menkov wrote:
>> Hi Gary,
>>
>> Now "mainThread is out of: synchronized (lockingObject) {" record is 
>> logged at the end of the scenario and that makes it useless.
>>
>> I think it would be better to call
>> pipe.println("docontinue") after logging of this message:
>>
>>   log1("mainThread is out of: synchronized (lockingObject) {");
>> +  pipe.println("docontinue");
>> +  // wait until the scenario is completed to avoid deadlocks on logging
>> +  synchronized (Threadissuspended002a.waitnotifyObj) {
>> +    Threadissuspended002a.waitnotifyObj.wait();
>> +  }
>>
>> --alex
>>
>> On 02/27/2019 02:29, gary.adams at oracle.com wrote:
>>> Revised webrev : http://cr.openjdk.java.net/~gadams/4903717/webrev.02/
>>>
>>> To prevent the main thread from holding the logging lock when
>>> the debugger suspends the debuggee, an explicit wait/notify is used
>>> so the test thread can safely issue log messages while the main thread
>>> remains suspended.
>>>
>>> Testing is in progress.
>>>
>>> On 2/26/19 3:49 PM, gary.adams at oracle.com wrote:
>>>> New fix will be coming tomorrow...
>>>>
>>>> I dumped the main and thread2 stacks at the
>>>> point of the timeout and found they were
>>>> both contending for the java.io.PrintStream
>>>> writeln() synchronization block.
>>>>
>>>> While writing log messages main thread held the
>>>> lock and was suspended and thread2 was blocked
>>>> trying to report it's progress. So thread2 never reaches the
>>>> breakpoint.
>>>>
>>>> java.nio.Buffer:292 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> java.nio.CharBuffer:1260 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> java.nio.CharBuffer:265 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> java.nio.Buffer:223 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> java.nio.CharBuffer:284 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> java.nio.HeapCharBuffer:77 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> java.nio.CharBuffer:396 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> sun.nio.cs.StreamEncoder:280 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> sun.nio.cs.StreamEncoder:125 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> java.io.OutputStreamWriter:211 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> java.io.BufferedWriter:120 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> *java.io.PrintStream:653 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)*
>>>> java.io.PrintStream:958 in thread instance of 
>>>> java.lang.Thread(name='main', id=1)
>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002a:49 in thread 
>>>> instance of java.lang.Thread(name='main', id=1)
>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002a:124 in thread 
>>>> instance of java.lang.Thread(name='main', id=1)
>>>>
>>>>
>>>> *java.io.PrintStream:650 in thread instance of 
>>>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', 
>>>> id=589)*
>>>> java.io.PrintStream:958 in thread instance of 
>>>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', 
>>>> id=589)
>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002a:53 in thread 
>>>> instance of 
>>>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', 
>>>> id=589)
>>>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a:203 in 
>>>> thread instance of 
>>>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', 
>>>> id=589)
>>>>
>>>> On 2/22/19 11:21 PM, serguei.spitsyn at oracle.com wrote:
>>>>> Hi Gary,
>>>>>
>>>>> Thank you for cosmetic changes.
>>>>>
>>>>> One thing to notice there is no guard against spurious wakeup for 
>>>>> this:
>>>>>    108 Threadissuspended002a.waitnotifyObj.wait();
>>>>>
>>>>> But I doubt it causes these timeouts as they are very well 
>>>>> reproducible.
>>>>>
>>>>> This time I looked more to the debugger side.
>>>>> The code is poorly structured, it is hard to understand what it is 
>>>>> doing.
>>>>>
>>>>> >   - resumed main thread after breakpoint enabled and thread2
>>>>>      double resume
>>>>>
>>>>> Could you, please, explain more how does this help and why timeouts 
>>>>> are intermittent?
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>> On 2/22/19 11:52 AM, Gary Adams wrote:
>>>>>> After 1000 successful testruns, it seems like a reasonable 
>>>>>> solution is
>>>>>> to resume the main thread to resolve any locks holding back the
>>>>>> thread2 reaching the desired breakpoint.
>>>>>>
>>>>>>   Updated webrev: 
>>>>>> http://cr.openjdk.java.net/~gadams/4903717/webrev.01/
>>>>>>
>>>>>> Changes in this iteration:
>>>>>>   - cosmetic changes
>>>>>>      - spaces around operators
>>>>>>      - removed extra spaces around parens
>>>>>>      - removed extra blank lines
>>>>>>      - fixed typo in ERRROR
>>>>>>      - fixed curl braces indenting
>>>>>>   - resumed main thread after breakpoint enabled and thread2
>>>>>>      double resume
>>>>>>
>>>>>>
>>>>>> On 2/22/19, 6:19 AM, gary.adams at oracle.com wrote:
>>>>>>> I'll make a pass over the code for cosmetic fixes after the test
>>>>>>> is working correctly.
>>>>>>>
>>>>>>> I think the purpose of the lockingObject is to ensure the debugger
>>>>>>> has time to enable the breakpoint and perform the double resume
>>>>>>> before debuggee thread2 is released to run to the expected 
>>>>>>> breakpoint.
>>>>>>>
>>>>>>> I'm assuming the timeout and break point location are correct, 
>>>>>>> because
>>>>>>> the test passes most of the time. I'll dump the values to make sure.
>>>>>>>
>>>>>>> I'll also try to capture a thread dump at the time of the timeout,
>>>>>>> to see if the debuggee is hung somewhere unexpected.
>>>>>>>
>>>>>>> The test leaves the main thread suspended while thread2 is
>>>>>>> is resumed. I'll see if resuming the main thread unblocks thread2.
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> On 2/21/19 10:18 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>>> Hi Gary,
>>>>>>>>
>>>>>>>> I do not see any cause of this timeout yet.
>>>>>>>> However, the synchronization on the lockingObject looks 
>>>>>>>> completely wasteful.
>>>>>>>> Moreover, it adds unreasonable complexity and makes the test 
>>>>>>>> harder to understand.
>>>>>>>>
>>>>>>>> So, I'd suggested to do a small cleanup that includes:
>>>>>>>>   - removing the lockingObject field and sync around it
>>>>>>>>   - fix wrong indents in the fragment:
>>>>>>>>   161         synchronized (waitnotifyObj)                    {
>>>>>>>>   162             log("entered into block:  synchronized 
>>>>>>>> (waitnotifyObj)");
>>>>>>>>   163             waitnotifyObj.notify();                     }
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Serguei
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/21/19 10:57, gary.adams at oracle.com wrote:
>>>>>>>>> The commented out "docontinue" was done to match the style
>>>>>>>>> used in the other tests. I'll remove it for this test.
>>>>>>>>>
>>>>>>>>> In any event, the test still is failing 8 / 500 testruns on
>>>>>>>>> macosx and windows debug builds.
>>>>>>>>>
>>>>>>>>> The change is not sufficient.
>>>>>>>>>
>>>>>>>>> Here's a recent failure. You can see in the log that the 
>>>>>>>>> debugger did
>>>>>>>>> wait until the new "docontinue" arrives outside the lockingObject
>>>>>>>>> synchronized block.
>>>>>>>>>
>>>>>>>>> It appears with the main thread suspended and the thread2 
>>>>>>>>> double resumed,
>>>>>>>>> it still does not continue executing up to the breakpoint in 
>>>>>>>>> thread2.
>>>>>>>>> After the eventQueue.remove(timeout), when the vm is resumed
>>>>>>>>> you can see the debuggee messages expected leading up to the 
>>>>>>>>> breakpoint.
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> --> debugger:       suspending VM with vm.suspend();
>>>>>>>>> *debugee.stderr> **> mainThread: mainThread is out of: 
>>>>>>>>> synchronized (lockingObject) {*
>>>>>>>>> --> debugger:          thread2.suspendCount() == 2
>>>>>>>>> --> debugger:      :   thread2.isSuspended()
>>>>>>>>> --> debugger:          mainThread.suspendCount() == 1
>>>>>>>>> --> debugger:      :   mainThread.isSuspended()
>>>>>>>>> --> debugger:       double resuming thread2 with thread2.resume();
>>>>>>>>> --> debugger:          thread2.suspendCount() == 1
>>>>>>>>> --> debugger:      :   thread2.isSuspended()
>>>>>>>>> --> debugger:      enabling breakpRequest1
>>>>>>>>> --> debugger:          thread2.suspendCount() == 0
>>>>>>>>> --> debugger:      :  !thread2.isAtBreakpoint() before call to 
>>>>>>>>> breakpoint()
>>>>>>>>> --> debugger:        waiting for BreakpointEvent
>>>>>>>>> *--> debugger: new: eventSet = eventQueue.remove();*
>>>>>>>>> # ERROR: ##> debugger: ERROR:  timeout for waiting for a 
>>>>>>>>> BreakpintEvent
>>>>>>>>> The following stacktrace is for failure analysis.
>>>>>>>>> nsk.share.TestFailure: ##> debugger: ERROR:  timeout for 
>>>>>>>>> waiting for a BreakpintEvent
>>>>>>>>>     at nsk.share.Log.logExceptionForFailureAnalysis(Log.java:428)
>>>>>>>>>     at nsk.share.Log.complain(Log.java:399)
>>>>>>>>>     at 
>>>>>>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002.log3(issuspended002.java:104) 
>>>>>>>>>
>>>>>>>>>     at 
>>>>>>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002.breakpoint(issuspended002.java:555) 
>>>>>>>>>
>>>>>>>>>     at 
>>>>>>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002.runThis(issuspended002.java:389) 
>>>>>>>>>
>>>>>>>>>     at 
>>>>>>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002.run(issuspended002.java:90) 
>>>>>>>>>
>>>>>>>>>     at 
>>>>>>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002.main(issuspended002.java:85) 
>>>>>>>>>
>>>>>>>>>     at 
>>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
>>>>>>>>> Method)
>>>>>>>>>     at 
>>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
>>>>>>>>>
>>>>>>>>>     at 
>>>>>>>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
>>>>>>>>>
>>>>>>>>>     at java.base/java.lang.reflect.Method.invoke(Method.java:567)
>>>>>>>>>     at 
>>>>>>>>> PropertyResolvingWrapper.main(PropertyResolvingWrapper.java:104)
>>>>>>>>>     at 
>>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
>>>>>>>>> Method)
>>>>>>>>>     at 
>>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
>>>>>>>>>
>>>>>>>>>     at 
>>>>>>>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
>>>>>>>>>
>>>>>>>>>     at java.base/java.lang.reflect.Method.invoke(Method.java:567)
>>>>>>>>>     at 
>>>>>>>>> com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) 
>>>>>>>>>
>>>>>>>>>     at java.base/java.lang.Thread.run(Thread.java:835)
>>>>>>>>> *--> debugger: resuming both second and main thread 
>>>>>>>>> debugee.stderr> **> mainThread: waiting for an instruction from 
>>>>>>>>> the debugger ... --> debugger: the end of testing 
>>>>>>>>> debugee.stderr> **> thread2: entered into block: synchronized 
>>>>>>>>> (lockingObject) debugee.stderr> **> thread2: exited from block: 
>>>>>>>>> synchronized (lockingObject) debugee.stderr> **> thread2: call 
>>>>>>>>> to the method 'runt1' debugee.stderr> **> thread2: method 
>>>>>>>>> 'runt1' enter*
>>>>>>>>> --> debugger:      : returned string is 'checkend'
>>>>>>>>> debugee.stderr> **> mainThread: waiting for an instruction from 
>>>>>>>>> the debugger ...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2/20/19 5:15 PM, Alex Menkov wrote:
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> But could you please remove old "pipe.println("docontinue")" 
>>>>>>>>>> statement instead of commenting it out (no new webrev is 
>>>>>>>>>> required)
>>>>>>>>>>
>>>>>>>>>> --alex
>>>>>>>>>>
>>>>>>>>>> On 02/20/2019 09:58, Chris Plummer wrote:
>>>>>>>>>>> Looks good.
>>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>> On 2/20/19 7:57 AM, Gary Adams wrote:
>>>>>>>>>>>> The issuspended002 has been on the ProblemList since the 
>>>>>>>>>>>> tests were moved to
>>>>>>>>>>>> the open repos. The proposed changeset applies the same fix 
>>>>>>>>>>>> that was used in issuspended001.
>>>>>>>>>>>>
>>>>>>>>>>>> The current test fails when the debuggee replies with the 
>>>>>>>>>>>> "docontinue" response,
>>>>>>>>>>>> while it is still holding a lock in a syncrhonized block. 
>>>>>>>>>>>> When the debugger then
>>>>>>>>>>>> suspends the vm the debuggee test thread fails to proceed to 
>>>>>>>>>>>> the expected breakpoint.
>>>>>>>>>>>> Testing in progress.
>>>>>>>>>>>>
>>>>>>>>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-4903717
>>>>>>>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/4903717/webrev.00/
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
> 


More information about the serviceability-dev mailing list