RFR: JDK-4903717: nsk/jdi/ThreadReference/isSuspended/issuspended002 failing

Gary Adams gary.adams at oracle.com
Fri Mar 1 12:15:51 UTC 2019


I will move the logging statement up.

Leaving the "docontinue" in the synchronized block
will guarantee the debugger suspends the debuggee
main thread without any chance of interference
with logging commands.

I'm sure this test was left with verbose logging enabled in the
hopes of catching more details about the failing sequence of
events. In this case it was the excessive logging that was causing
the test to hang.

Now that the test can run reliably, I think we should turn off
the verbose logging.

On 2/28/19, 4:44 PM, Alex Menkov wrote:
> Hi Gary,
>
> Now "mainThread is out of: synchronized (lockingObject) {" record is 
> logged at the end of the scenario and that makes it useless.
>
> I think it would be better to call
> pipe.println("docontinue") after logging of this message:
>
>   log1("mainThread is out of: synchronized (lockingObject) {");
> +  pipe.println("docontinue");
> +  // wait until the scenario is completed to avoid deadlocks on logging
> +  synchronized (Threadissuspended002a.waitnotifyObj) {
> +    Threadissuspended002a.waitnotifyObj.wait();
> +  }
>
> --alex
>
> On 02/27/2019 02:29, gary.adams at oracle.com wrote:
>> Revised webrev : http://cr.openjdk.java.net/~gadams/4903717/webrev.02/
>>
>> To prevent the main thread from holding the logging lock when
>> the debugger suspends the debuggee, an explicit wait/notify is used
>> so the test thread can safely issue log messages while the main thread
>> remains suspended.
>>
>> Testing is in progress.
>>
>> On 2/26/19 3:49 PM, gary.adams at oracle.com wrote:
>>> New fix will be coming tomorrow...
>>>
>>> I dumped the main and thread2 stacks at the
>>> point of the timeout and found they were
>>> both contending for the java.io.PrintStream
>>> writeln() synchronization block.
>>>
>>> While writing log messages main thread held the
>>> lock and was suspended and thread2 was blocked
>>> trying to report it's progress. So thread2 never reaches the
>>> breakpoint.
>>>
>>> java.nio.Buffer:292 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> java.nio.CharBuffer:1260 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> java.nio.CharBuffer:265 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> java.nio.Buffer:223 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> java.nio.CharBuffer:284 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> java.nio.HeapCharBuffer:77 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> java.nio.CharBuffer:396 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> sun.nio.cs.StreamEncoder:280 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> sun.nio.cs.StreamEncoder:125 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> java.io.OutputStreamWriter:211 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> java.io.BufferedWriter:120 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> *java.io.PrintStream:653 in thread instance of 
>>> java.lang.Thread(name='main', id=1)*
>>> java.io.PrintStream:958 in thread instance of 
>>> java.lang.Thread(name='main', id=1)
>>> nsk.jdi.ThreadReference.isSuspended.issuspended002a:49 in thread 
>>> instance of java.lang.Thread(name='main', id=1)
>>> nsk.jdi.ThreadReference.isSuspended.issuspended002a:124 in thread 
>>> instance of java.lang.Thread(name='main', id=1)
>>>
>>>
>>> *java.io.PrintStream:650 in thread instance of 
>>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', 
>>> id=589)*
>>> java.io.PrintStream:958 in thread instance of 
>>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', 
>>> id=589)
>>> nsk.jdi.ThreadReference.isSuspended.issuspended002a:53 in thread 
>>> instance of 
>>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', 
>>> id=589)
>>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a:203 in 
>>> thread instance of 
>>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', 
>>> id=589)
>>>
>>> On 2/22/19 11:21 PM, serguei.spitsyn at oracle.com wrote:
>>>> Hi Gary,
>>>>
>>>> Thank you for cosmetic changes.
>>>>
>>>> One thing to notice there is no guard against spurious wakeup for 
>>>> this:
>>>>    108 Threadissuspended002a.waitnotifyObj.wait();
>>>>
>>>> But I doubt it causes these timeouts as they are very well 
>>>> reproducible.
>>>>
>>>> This time I looked more to the debugger side.
>>>> The code is poorly structured, it is hard to understand what it is 
>>>> doing.
>>>>
>>>> >   - resumed main thread after breakpoint enabled and thread2
>>>>      double resume
>>>>
>>>> Could you, please, explain more how does this help and why timeouts 
>>>> are intermittent?
>>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>>
>>>> On 2/22/19 11:52 AM, Gary Adams wrote:
>>>>> After 1000 successful testruns, it seems like a reasonable 
>>>>> solution is
>>>>> to resume the main thread to resolve any locks holding back the
>>>>> thread2 reaching the desired breakpoint.
>>>>>
>>>>>   Updated webrev: 
>>>>> http://cr.openjdk.java.net/~gadams/4903717/webrev.01/
>>>>>
>>>>> Changes in this iteration:
>>>>>   - cosmetic changes
>>>>>      - spaces around operators
>>>>>      - removed extra spaces around parens
>>>>>      - removed extra blank lines
>>>>>      - fixed typo in ERRROR
>>>>>      - fixed curl braces indenting
>>>>>   - resumed main thread after breakpoint enabled and thread2
>>>>>      double resume
>>>>>
>>>>>
>>>>> On 2/22/19, 6:19 AM, gary.adams at oracle.com wrote:
>>>>>> I'll make a pass over the code for cosmetic fixes after the test
>>>>>> is working correctly.
>>>>>>
>>>>>> I think the purpose of the lockingObject is to ensure the debugger
>>>>>> has time to enable the breakpoint and perform the double resume
>>>>>> before debuggee thread2 is released to run to the expected 
>>>>>> breakpoint.
>>>>>>
>>>>>> I'm assuming the timeout and break point location are correct, 
>>>>>> because
>>>>>> the test passes most of the time. I'll dump the values to make sure.
>>>>>>
>>>>>> I'll also try to capture a thread dump at the time of the timeout,
>>>>>> to see if the debuggee is hung somewhere unexpected.
>>>>>>
>>>>>> The test leaves the main thread suspended while thread2 is
>>>>>> is resumed. I'll see if resuming the main thread unblocks thread2.
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> On 2/21/19 10:18 PM, serguei.spitsyn at oracle.com wrote:
>>>>>>> Hi Gary,
>>>>>>>
>>>>>>> I do not see any cause of this timeout yet.
>>>>>>> However, the synchronization on the lockingObject looks 
>>>>>>> completely wasteful.
>>>>>>> Moreover, it adds unreasonable complexity and makes the test 
>>>>>>> harder to understand.
>>>>>>>
>>>>>>> So, I'd suggested to do a small cleanup that includes:
>>>>>>>   - removing the lockingObject field and sync around it
>>>>>>>   - fix wrong indents in the fragment:
>>>>>>>   161         synchronized (waitnotifyObj)                    {
>>>>>>>   162             log("entered into block:  synchronized 
>>>>>>> (waitnotifyObj)");
>>>>>>>   163             waitnotifyObj.notify();                     }
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Serguei
>>>>>>>
>>>>>>>
>>>>>>> On 2/21/19 10:57, gary.adams at oracle.com wrote:
>>>>>>>> The commented out "docontinue" was done to match the style
>>>>>>>> used in the other tests. I'll remove it for this test.
>>>>>>>>
>>>>>>>> In any event, the test still is failing 8 / 500 testruns on
>>>>>>>> macosx and windows debug builds.
>>>>>>>>
>>>>>>>> The change is not sufficient.
>>>>>>>>
>>>>>>>> Here's a recent failure. You can see in the log that the 
>>>>>>>> debugger did
>>>>>>>> wait until the new "docontinue" arrives outside the lockingObject
>>>>>>>> synchronized block.
>>>>>>>>
>>>>>>>> It appears with the main thread suspended and the thread2 
>>>>>>>> double resumed,
>>>>>>>> it still does not continue executing up to the breakpoint in 
>>>>>>>> thread2.
>>>>>>>> After the eventQueue.remove(timeout), when the vm is resumed
>>>>>>>> you can see the debuggee messages expected leading up to the 
>>>>>>>> breakpoint.
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>> --> debugger:       suspending VM with vm.suspend();
>>>>>>>> *debugee.stderr> **> mainThread: mainThread is out of: 
>>>>>>>> synchronized (lockingObject) {*
>>>>>>>> --> debugger:          thread2.suspendCount() == 2
>>>>>>>> --> debugger:      :   thread2.isSuspended()
>>>>>>>> --> debugger:          mainThread.suspendCount() == 1
>>>>>>>> --> debugger:      :   mainThread.isSuspended()
>>>>>>>> --> debugger:       double resuming thread2 with thread2.resume();
>>>>>>>> --> debugger:          thread2.suspendCount() == 1
>>>>>>>> --> debugger:      :   thread2.isSuspended()
>>>>>>>> --> debugger:      enabling breakpRequest1
>>>>>>>> --> debugger:          thread2.suspendCount() == 0
>>>>>>>> --> debugger:      :  !thread2.isAtBreakpoint() before call to 
>>>>>>>> breakpoint()
>>>>>>>> --> debugger:        waiting for BreakpointEvent
>>>>>>>> *--> debugger: new: eventSet = eventQueue.remove();*
>>>>>>>> # ERROR: ##> debugger: ERROR:  timeout for waiting for a 
>>>>>>>> BreakpintEvent
>>>>>>>> The following stacktrace is for failure analysis.
>>>>>>>> nsk.share.TestFailure: ##> debugger: ERROR:  timeout for 
>>>>>>>> waiting for a BreakpintEvent
>>>>>>>>     at nsk.share.Log.logExceptionForFailureAnalysis(Log.java:428)
>>>>>>>>     at nsk.share.Log.complain(Log.java:399)
>>>>>>>>     at 
>>>>>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002.log3(issuspended002.java:104)
>>>>>>>>     at 
>>>>>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002.breakpoint(issuspended002.java:555)
>>>>>>>>     at 
>>>>>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002.runThis(issuspended002.java:389)
>>>>>>>>     at 
>>>>>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002.run(issuspended002.java:90)
>>>>>>>>     at 
>>>>>>>> nsk.jdi.ThreadReference.isSuspended.issuspended002.main(issuspended002.java:85)
>>>>>>>>     at 
>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
>>>>>>>> Method)
>>>>>>>>     at 
>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>>     at 
>>>>>>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>     at java.base/java.lang.reflect.Method.invoke(Method.java:567)
>>>>>>>>     at 
>>>>>>>> PropertyResolvingWrapper.main(PropertyResolvingWrapper.java:104)
>>>>>>>>     at 
>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
>>>>>>>> Method)
>>>>>>>>     at 
>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>>     at 
>>>>>>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>     at java.base/java.lang.reflect.Method.invoke(Method.java:567)
>>>>>>>>     at 
>>>>>>>> com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
>>>>>>>>     at java.base/java.lang.Thread.run(Thread.java:835)
>>>>>>>> *--> debugger: resuming both second and main thread 
>>>>>>>> debugee.stderr> **> mainThread: waiting for an instruction from 
>>>>>>>> the debugger ... --> debugger: the end of testing 
>>>>>>>> debugee.stderr> **> thread2: entered into block: synchronized 
>>>>>>>> (lockingObject) debugee.stderr> **> thread2: exited from block: 
>>>>>>>> synchronized (lockingObject) debugee.stderr> **> thread2: call 
>>>>>>>> to the method 'runt1' debugee.stderr> **> thread2: method 
>>>>>>>> 'runt1' enter*
>>>>>>>> --> debugger:      : returned string is 'checkend'
>>>>>>>> debugee.stderr> **> mainThread: waiting for an instruction from 
>>>>>>>> the debugger ...
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/20/19 5:15 PM, Alex Menkov wrote:
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> But could you please remove old "pipe.println("docontinue")" 
>>>>>>>>> statement instead of commenting it out (no new webrev is 
>>>>>>>>> required)
>>>>>>>>>
>>>>>>>>> --alex
>>>>>>>>>
>>>>>>>>> On 02/20/2019 09:58, Chris Plummer wrote:
>>>>>>>>>> Looks good.
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>> On 2/20/19 7:57 AM, Gary Adams wrote:
>>>>>>>>>>> The issuspended002 has been on the ProblemList since the 
>>>>>>>>>>> tests were moved to
>>>>>>>>>>> the open repos. The proposed changeset applies the same fix 
>>>>>>>>>>> that was used in issuspended001.
>>>>>>>>>>>
>>>>>>>>>>> The current test fails when the debuggee replies with the 
>>>>>>>>>>> "docontinue" response,
>>>>>>>>>>> while it is still holding a lock in a syncrhonized block. 
>>>>>>>>>>> When the debugger then
>>>>>>>>>>> suspends the vm the debuggee test thread fails to proceed to 
>>>>>>>>>>> the expected breakpoint.
>>>>>>>>>>> Testing in progress.
>>>>>>>>>>>
>>>>>>>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-4903717
>>>>>>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/4903717/webrev.00/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>



More information about the serviceability-dev mailing list