RFR: JDK-4903717: nsk/jdi/ThreadReference/isSuspended/issuspended002 failing

gary.adams at oracle.com gary.adams at oracle.com
Wed Feb 27 10:29:02 UTC 2019


Revised webrev : http://cr.openjdk.java.net/~gadams/4903717/webrev.02/

To prevent the main thread from holding the logging lock when
the debugger suspends the debuggee, an explicit wait/notify is used
so the test thread can safely issue log messages while the main thread
remains suspended.

Testing is in progress.

On 2/26/19 3:49 PM, gary.adams at oracle.com wrote:
> New fix will be coming tomorrow...
>
> I dumped the main and thread2 stacks at the
> point of the timeout and found they were
> both contending for the java.io.PrintStream
> writeln() synchronization block.
>
> While writing log messages main thread held the
> lock and was suspended and thread2 was blocked
> trying to report it's progress. So thread2 never reaches the
> breakpoint.
>
> java.nio.Buffer:292 in thread instance of java.lang.Thread(name='main', id=1)
> java.nio.CharBuffer:1260 in thread instance of java.lang.Thread(name='main', id=1)
> java.nio.CharBuffer:265 in thread instance of java.lang.Thread(name='main', id=1)
> java.nio.Buffer:223 in thread instance of java.lang.Thread(name='main', id=1)
> java.nio.CharBuffer:284 in thread instance of java.lang.Thread(name='main', id=1)
> java.nio.HeapCharBuffer:77 in thread instance of java.lang.Thread(name='main', id=1)
> java.nio.CharBuffer:396 in thread instance of java.lang.Thread(name='main', id=1)
> sun.nio.cs.StreamEncoder:280 in thread instance of java.lang.Thread(name='main', id=1)
> sun.nio.cs.StreamEncoder:125 in thread instance of java.lang.Thread(name='main', id=1)
> java.io.OutputStreamWriter:211 in thread instance of java.lang.Thread(name='main', id=1)
> java.io.BufferedWriter:120 in thread instance of java.lang.Thread(name='main', id=1)
> *java.io.PrintStream:653 in thread instance of 
> java.lang.Thread(name='main', id=1)*
> java.io.PrintStream:958 in thread instance of java.lang.Thread(name='main', id=1)
> nsk.jdi.ThreadReference.isSuspended.issuspended002a:49 in thread instance of java.lang.Thread(name='main', id=1)
> nsk.jdi.ThreadReference.isSuspended.issuspended002a:124 in thread instance of java.lang.Thread(name='main', id=1)
>
>
> *java.io.PrintStream:650 in thread instance of 
> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', 
> id=589)*
> java.io.PrintStream:958 in thread instance of nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', id=589)
> nsk.jdi.ThreadReference.isSuspended.issuspended002a:53 in thread instance of nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', id=589)
> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a:203 in thread instance of nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', id=589)
>
> On 2/22/19 11:21 PM, serguei.spitsyn at oracle.com wrote:
>> Hi Gary,
>>
>> Thank you for cosmetic changes.
>>
>> One thing to notice there is no guard against spurious wakeup for this:
>>    108 Threadissuspended002a.waitnotifyObj.wait();
>>
>> But I doubt it causes these timeouts as they are very well reproducible.
>>
>> This time I looked more to the debugger side.
>> The code is poorly structured, it is hard to understand what it is doing.
>>
>> >   - resumed main thread after breakpoint enabled and thread2
>>      double resume
>>
>> Could you, please, explain more how does this help and why timeouts 
>> are intermittent?
>>
>> Thanks,
>> Serguei
>>
>>
>> On 2/22/19 11:52 AM, Gary Adams wrote:
>>> After 1000 successful testruns, it seems like a reasonable solution is
>>> to resume the main thread to resolve any locks holding back the
>>> thread2 reaching the desired breakpoint.
>>>
>>>   Updated webrev: http://cr.openjdk.java.net/~gadams/4903717/webrev.01/
>>>
>>> Changes in this iteration:
>>>   - cosmetic changes
>>>      - spaces around operators
>>>      - removed extra spaces around parens
>>>      - removed extra blank lines
>>>      - fixed typo in ERRROR
>>>      - fixed curl braces indenting
>>>   - resumed main thread after breakpoint enabled and thread2
>>>      double resume
>>>
>>>
>>> On 2/22/19, 6:19 AM, gary.adams at oracle.com wrote:
>>>> I'll make a pass over the code for cosmetic fixes after the test
>>>> is working correctly.
>>>>
>>>> I think the purpose of the lockingObject is to ensure the debugger
>>>> has time to enable the breakpoint and perform the double resume
>>>> before debuggee thread2 is released to run to the expected breakpoint.
>>>>
>>>> I'm assuming the timeout and break point location are correct, because
>>>> the test passes most of the time. I'll dump the values to make sure.
>>>>
>>>> I'll also try to capture a thread dump at the time of the timeout,
>>>> to see if the debuggee is hung somewhere unexpected.
>>>>
>>>> The test leaves the main thread suspended while thread2 is
>>>> is resumed. I'll see if resuming the main thread unblocks thread2.
>>>>
>>>> ...
>>>>
>>>> On 2/21/19 10:18 PM, serguei.spitsyn at oracle.com wrote:
>>>>> Hi Gary,
>>>>>
>>>>> I do not see any cause of this timeout yet.
>>>>> However, the synchronization on the lockingObject looks completely 
>>>>> wasteful.
>>>>> Moreover, it adds unreasonable complexity and makes the test 
>>>>> harder to understand.
>>>>>
>>>>> So, I'd suggested to do a small cleanup that includes:
>>>>>   - removing the lockingObject field and sync around it
>>>>>   - fix wrong indents in the fragment:
>>>>>   161         synchronized (waitnotifyObj)                    {
>>>>>   162             log("entered into block:  synchronized (waitnotifyObj)");
>>>>>   163             waitnotifyObj.notify();                     }
>>>>>
>>>>> Thanks,
>>>>> Serguei
>>>>>
>>>>>
>>>>> On 2/21/19 10:57, gary.adams at oracle.com wrote:
>>>>>> The commented out "docontinue" was done to match the style
>>>>>> used in the other tests. I'll remove it for this test.
>>>>>>
>>>>>> In any event, the test still is failing 8 / 500 testruns on
>>>>>> macosx and windows debug builds.
>>>>>>
>>>>>> The change is not sufficient.
>>>>>>
>>>>>> Here's a recent failure. You can see in the log that the debugger did
>>>>>> wait until the new "docontinue" arrives outside the lockingObject
>>>>>> synchronized block.
>>>>>>
>>>>>> It appears with the main thread suspended and the thread2 double 
>>>>>> resumed,
>>>>>> it still does not continue executing up to the breakpoint in thread2.
>>>>>> After the eventQueue.remove(timeout), when the vm is resumed
>>>>>> you can see the debuggee messages expected leading up to the 
>>>>>> breakpoint.
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> --> debugger:       suspending VM with vm.suspend();
>>>>>> *debugee.stderr> **> mainThread: mainThread is out of: 
>>>>>> synchronized (lockingObject) {*
>>>>>> --> debugger:          thread2.suspendCount() == 2
>>>>>> --> debugger:      :   thread2.isSuspended()
>>>>>> --> debugger:          mainThread.suspendCount() == 1
>>>>>> --> debugger:      :   mainThread.isSuspended()
>>>>>> --> debugger:       double resuming thread2 with thread2.resume();
>>>>>> --> debugger:          thread2.suspendCount() == 1
>>>>>> --> debugger:      :   thread2.isSuspended()
>>>>>> --> debugger:      enabling breakpRequest1
>>>>>> --> debugger:          thread2.suspendCount() == 0
>>>>>> --> debugger:      :  !thread2.isAtBreakpoint() before call to breakpoint()
>>>>>> --> debugger:        waiting for BreakpointEvent
>>>>>> *--> debugger: new: eventSet = eventQueue.remove();*
>>>>>> # ERROR: ##> debugger: ERROR:  timeout for waiting for a BreakpintEvent
>>>>>> The following stacktrace is for failure analysis.
>>>>>> nsk.share.TestFailure: ##> debugger: ERROR:  timeout for waiting for a BreakpintEvent
>>>>>> 	at nsk.share.Log.logExceptionForFailureAnalysis(Log.java:428)
>>>>>> 	at nsk.share.Log.complain(Log.java:399)
>>>>>> 	at nsk.jdi.ThreadReference.isSuspended.issuspended002.log3(issuspended002.java:104)
>>>>>> 	at nsk.jdi.ThreadReference.isSuspended.issuspended002.breakpoint(issuspended002.java:555)
>>>>>> 	at nsk.jdi.ThreadReference.isSuspended.issuspended002.runThis(issuspended002.java:389)
>>>>>> 	at nsk.jdi.ThreadReference.isSuspended.issuspended002.run(issuspended002.java:90)
>>>>>> 	at nsk.jdi.ThreadReference.isSuspended.issuspended002.main(issuspended002.java:85)
>>>>>> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>> 	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
>>>>>> 	at PropertyResolvingWrapper.main(PropertyResolvingWrapper.java:104)
>>>>>> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>> 	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
>>>>>> 	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
>>>>>> 	at java.base/java.lang.Thread.run(Thread.java:835)
>>>>>> *--> debugger: resuming both second and main thread 
>>>>>> debugee.stderr> **> mainThread: waiting for an instruction from 
>>>>>> the debugger ... --> debugger: the end of testing debugee.stderr> 
>>>>>> **> thread2: entered into block: synchronized (lockingObject) 
>>>>>> debugee.stderr> **> thread2: exited from block: synchronized 
>>>>>> (lockingObject) debugee.stderr> **> thread2: call to the method 
>>>>>> 'runt1' debugee.stderr> **> thread2: method 'runt1' enter*
>>>>>> --> debugger:      : returned string is 'checkend'
>>>>>> debugee.stderr> **> mainThread: waiting for an instruction from the debugger ...
>>>>>>
>>>>>>
>>>>>> On 2/20/19 5:15 PM, Alex Menkov wrote:
>>>>>>> +1
>>>>>>>
>>>>>>> But could you please remove old "pipe.println("docontinue")" 
>>>>>>> statement instead of commenting it out (no new webrev is required)
>>>>>>>
>>>>>>> --alex
>>>>>>>
>>>>>>> On 02/20/2019 09:58, Chris Plummer wrote:
>>>>>>>> Looks good.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 2/20/19 7:57 AM, Gary Adams wrote:
>>>>>>>>> The issuspended002 has been on the ProblemList since the tests 
>>>>>>>>> were moved to
>>>>>>>>> the open repos. The proposed changeset applies the same fix 
>>>>>>>>> that was used in issuspended001.
>>>>>>>>>
>>>>>>>>> The current test fails when the debuggee replies with the 
>>>>>>>>> "docontinue" response,
>>>>>>>>> while it is still holding a lock in a syncrhonized block. When 
>>>>>>>>> the debugger then
>>>>>>>>> suspends the vm the debuggee test thread fails to proceed to 
>>>>>>>>> the expected breakpoint.
>>>>>>>>> Testing in progress.
>>>>>>>>>
>>>>>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-4903717
>>>>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/4903717/webrev.00/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20190227/5b63793d/attachment-0001.html>


More information about the serviceability-dev mailing list