RFR: JDK-4903717: nsk/jdi/ThreadReference/isSuspended/issuspended002 failing
Alex Menkov
alexey.menkov at oracle.com
Thu Feb 28 21:44:26 UTC 2019
Hi Gary,
Now "mainThread is out of: synchronized (lockingObject) {" record is
logged at the end of the scenario and that makes it useless.
I think it would be better to call
pipe.println("docontinue") after logging of this message:
log1("mainThread is out of: synchronized (lockingObject) {");
+ pipe.println("docontinue");
+ // wait until the scenario is completed to avoid deadlocks on logging
+ synchronized (Threadissuspended002a.waitnotifyObj) {
+ Threadissuspended002a.waitnotifyObj.wait();
+ }
--alex
On 02/27/2019 02:29, gary.adams at oracle.com wrote:
> Revised webrev : http://cr.openjdk.java.net/~gadams/4903717/webrev.02/
>
> To prevent the main thread from holding the logging lock when
> the debugger suspends the debuggee, an explicit wait/notify is used
> so the test thread can safely issue log messages while the main thread
> remains suspended.
>
> Testing is in progress.
>
> On 2/26/19 3:49 PM, gary.adams at oracle.com wrote:
>> New fix will be coming tomorrow...
>>
>> I dumped the main and thread2 stacks at the
>> point of the timeout and found they were
>> both contending for the java.io.PrintStream
>> writeln() synchronization block.
>>
>> While writing log messages main thread held the
>> lock and was suspended and thread2 was blocked
>> trying to report it's progress. So thread2 never reaches the
>> breakpoint.
>>
>> java.nio.Buffer:292 in thread instance of java.lang.Thread(name='main', id=1)
>> java.nio.CharBuffer:1260 in thread instance of java.lang.Thread(name='main', id=1)
>> java.nio.CharBuffer:265 in thread instance of java.lang.Thread(name='main', id=1)
>> java.nio.Buffer:223 in thread instance of java.lang.Thread(name='main', id=1)
>> java.nio.CharBuffer:284 in thread instance of java.lang.Thread(name='main', id=1)
>> java.nio.HeapCharBuffer:77 in thread instance of java.lang.Thread(name='main', id=1)
>> java.nio.CharBuffer:396 in thread instance of java.lang.Thread(name='main', id=1)
>> sun.nio.cs.StreamEncoder:280 in thread instance of java.lang.Thread(name='main', id=1)
>> sun.nio.cs.StreamEncoder:125 in thread instance of java.lang.Thread(name='main', id=1)
>> java.io.OutputStreamWriter:211 in thread instance of java.lang.Thread(name='main', id=1)
>> java.io.BufferedWriter:120 in thread instance of java.lang.Thread(name='main', id=1)
>> *java.io.PrintStream:653 in thread instance of
>> java.lang.Thread(name='main', id=1)*
>> java.io.PrintStream:958 in thread instance of java.lang.Thread(name='main', id=1)
>> nsk.jdi.ThreadReference.isSuspended.issuspended002a:49 in thread instance of java.lang.Thread(name='main', id=1)
>> nsk.jdi.ThreadReference.isSuspended.issuspended002a:124 in thread instance of java.lang.Thread(name='main', id=1)
>>
>>
>> *java.io.PrintStream:650 in thread instance of
>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread',
>> id=589)*
>> java.io.PrintStream:958 in thread instance of nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', id=589)
>> nsk.jdi.ThreadReference.isSuspended.issuspended002a:53 in thread instance of nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', id=589)
>> nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a:203 in thread instance of nsk.jdi.ThreadReference.isSuspended.Threadissuspended002a(name='testedThread', id=589)
>>
>> On 2/22/19 11:21 PM, serguei.spitsyn at oracle.com wrote:
>>> Hi Gary,
>>>
>>> Thank you for cosmetic changes.
>>>
>>> One thing to notice there is no guard against spurious wakeup for this:
>>> 108 Threadissuspended002a.waitnotifyObj.wait();
>>>
>>> But I doubt it causes these timeouts as they are very well reproducible.
>>>
>>> This time I looked more to the debugger side.
>>> The code is poorly structured, it is hard to understand what it is doing.
>>>
>>> > - resumed main thread after breakpoint enabled and thread2
>>> double resume
>>>
>>> Could you, please, explain more how does this help and why timeouts
>>> are intermittent?
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 2/22/19 11:52 AM, Gary Adams wrote:
>>>> After 1000 successful testruns, it seems like a reasonable solution is
>>>> to resume the main thread to resolve any locks holding back the
>>>> thread2 reaching the desired breakpoint.
>>>>
>>>> Updated webrev: http://cr.openjdk.java.net/~gadams/4903717/webrev.01/
>>>>
>>>> Changes in this iteration:
>>>> - cosmetic changes
>>>> - spaces around operators
>>>> - removed extra spaces around parens
>>>> - removed extra blank lines
>>>> - fixed typo in ERRROR
>>>> - fixed curl braces indenting
>>>> - resumed main thread after breakpoint enabled and thread2
>>>> double resume
>>>>
>>>>
>>>> On 2/22/19, 6:19 AM, gary.adams at oracle.com wrote:
>>>>> I'll make a pass over the code for cosmetic fixes after the test
>>>>> is working correctly.
>>>>>
>>>>> I think the purpose of the lockingObject is to ensure the debugger
>>>>> has time to enable the breakpoint and perform the double resume
>>>>> before debuggee thread2 is released to run to the expected breakpoint.
>>>>>
>>>>> I'm assuming the timeout and break point location are correct, because
>>>>> the test passes most of the time. I'll dump the values to make sure.
>>>>>
>>>>> I'll also try to capture a thread dump at the time of the timeout,
>>>>> to see if the debuggee is hung somewhere unexpected.
>>>>>
>>>>> The test leaves the main thread suspended while thread2 is
>>>>> is resumed. I'll see if resuming the main thread unblocks thread2.
>>>>>
>>>>> ...
>>>>>
>>>>> On 2/21/19 10:18 PM, serguei.spitsyn at oracle.com wrote:
>>>>>> Hi Gary,
>>>>>>
>>>>>> I do not see any cause of this timeout yet.
>>>>>> However, the synchronization on the lockingObject looks completely
>>>>>> wasteful.
>>>>>> Moreover, it adds unreasonable complexity and makes the test
>>>>>> harder to understand.
>>>>>>
>>>>>> So, I'd suggested to do a small cleanup that includes:
>>>>>> - removing the lockingObject field and sync around it
>>>>>> - fix wrong indents in the fragment:
>>>>>> 161 synchronized (waitnotifyObj) {
>>>>>> 162 log("entered into block: synchronized (waitnotifyObj)");
>>>>>> 163 waitnotifyObj.notify(); }
>>>>>>
>>>>>> Thanks,
>>>>>> Serguei
>>>>>>
>>>>>>
>>>>>> On 2/21/19 10:57, gary.adams at oracle.com wrote:
>>>>>>> The commented out "docontinue" was done to match the style
>>>>>>> used in the other tests. I'll remove it for this test.
>>>>>>>
>>>>>>> In any event, the test still is failing 8 / 500 testruns on
>>>>>>> macosx and windows debug builds.
>>>>>>>
>>>>>>> The change is not sufficient.
>>>>>>>
>>>>>>> Here's a recent failure. You can see in the log that the debugger did
>>>>>>> wait until the new "docontinue" arrives outside the lockingObject
>>>>>>> synchronized block.
>>>>>>>
>>>>>>> It appears with the main thread suspended and the thread2 double
>>>>>>> resumed,
>>>>>>> it still does not continue executing up to the breakpoint in thread2.
>>>>>>> After the eventQueue.remove(timeout), when the vm is resumed
>>>>>>> you can see the debuggee messages expected leading up to the
>>>>>>> breakpoint.
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> --> debugger: suspending VM with vm.suspend();
>>>>>>> *debugee.stderr> **> mainThread: mainThread is out of:
>>>>>>> synchronized (lockingObject) {*
>>>>>>> --> debugger: thread2.suspendCount() == 2
>>>>>>> --> debugger: : thread2.isSuspended()
>>>>>>> --> debugger: mainThread.suspendCount() == 1
>>>>>>> --> debugger: : mainThread.isSuspended()
>>>>>>> --> debugger: double resuming thread2 with thread2.resume();
>>>>>>> --> debugger: thread2.suspendCount() == 1
>>>>>>> --> debugger: : thread2.isSuspended()
>>>>>>> --> debugger: enabling breakpRequest1
>>>>>>> --> debugger: thread2.suspendCount() == 0
>>>>>>> --> debugger: : !thread2.isAtBreakpoint() before call to breakpoint()
>>>>>>> --> debugger: waiting for BreakpointEvent
>>>>>>> *--> debugger: new: eventSet = eventQueue.remove();*
>>>>>>> # ERROR: ##> debugger: ERROR: timeout for waiting for a BreakpintEvent
>>>>>>> The following stacktrace is for failure analysis.
>>>>>>> nsk.share.TestFailure: ##> debugger: ERROR: timeout for waiting for a BreakpintEvent
>>>>>>> at nsk.share.Log.logExceptionForFailureAnalysis(Log.java:428)
>>>>>>> at nsk.share.Log.complain(Log.java:399)
>>>>>>> at nsk.jdi.ThreadReference.isSuspended.issuspended002.log3(issuspended002.java:104)
>>>>>>> at nsk.jdi.ThreadReference.isSuspended.issuspended002.breakpoint(issuspended002.java:555)
>>>>>>> at nsk.jdi.ThreadReference.isSuspended.issuspended002.runThis(issuspended002.java:389)
>>>>>>> at nsk.jdi.ThreadReference.isSuspended.issuspended002.run(issuspended002.java:90)
>>>>>>> at nsk.jdi.ThreadReference.isSuspended.issuspended002.main(issuspended002.java:85)
>>>>>>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>> at java.base/java.lang.reflect.Method.invoke(Method.java:567)
>>>>>>> at PropertyResolvingWrapper.main(PropertyResolvingWrapper.java:104)
>>>>>>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>> at java.base/java.lang.reflect.Method.invoke(Method.java:567)
>>>>>>> at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
>>>>>>> at java.base/java.lang.Thread.run(Thread.java:835)
>>>>>>> *--> debugger: resuming both second and main thread
>>>>>>> debugee.stderr> **> mainThread: waiting for an instruction from
>>>>>>> the debugger ... --> debugger: the end of testing debugee.stderr>
>>>>>>> **> thread2: entered into block: synchronized (lockingObject)
>>>>>>> debugee.stderr> **> thread2: exited from block: synchronized
>>>>>>> (lockingObject) debugee.stderr> **> thread2: call to the method
>>>>>>> 'runt1' debugee.stderr> **> thread2: method 'runt1' enter*
>>>>>>> --> debugger: : returned string is 'checkend'
>>>>>>> debugee.stderr> **> mainThread: waiting for an instruction from the debugger ...
>>>>>>>
>>>>>>>
>>>>>>> On 2/20/19 5:15 PM, Alex Menkov wrote:
>>>>>>>> +1
>>>>>>>>
>>>>>>>> But could you please remove old "pipe.println("docontinue")"
>>>>>>>> statement instead of commenting it out (no new webrev is required)
>>>>>>>>
>>>>>>>> --alex
>>>>>>>>
>>>>>>>> On 02/20/2019 09:58, Chris Plummer wrote:
>>>>>>>>> Looks good.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> On 2/20/19 7:57 AM, Gary Adams wrote:
>>>>>>>>>> The issuspended002 has been on the ProblemList since the tests
>>>>>>>>>> were moved to
>>>>>>>>>> the open repos. The proposed changeset applies the same fix
>>>>>>>>>> that was used in issuspended001.
>>>>>>>>>>
>>>>>>>>>> The current test fails when the debuggee replies with the
>>>>>>>>>> "docontinue" response,
>>>>>>>>>> while it is still holding a lock in a syncrhonized block. When
>>>>>>>>>> the debugger then
>>>>>>>>>> suspends the vm the debuggee test thread fails to proceed to
>>>>>>>>>> the expected breakpoint.
>>>>>>>>>> Testing in progress.
>>>>>>>>>>
>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-4903717
>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~gadams/4903717/webrev.00/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the serviceability-dev
mailing list