RFR: JDK-8208473: [TESTBUG] nsk/jdb/exclude/exclude001/exclude001.java is timing out on solaris-sparc again

Chris Plummer chris.plummer at oracle.com
Fri Sep 28 18:15:41 UTC 2018


Yep. My suggestion was to use this version of receiveReply(), but also 
have receiveReply() not call waitForPrompt() if count was 0. I think it 
would be good to test with this change in place, but with a smaller 
timeout so you can reproduce the timeout error and test this 
receiveReply() call. Other than that the changes look fine to me.
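
A minimal sketch of the suggested behavior, under stated assumptions: the class name ReplySketch, the StringBuilder buffer, and the promptWaits counter are hypothetical stand-ins, not the real Jdb.java internals, and IllegalArgumentException is used here in place of TestBug. Only the control flow matters: a count of 0 returns what is already buffered without waiting, and any other count still goes through waitForPrompt() with its existing count <= 0 check.

```java
class ReplySketch {
    // Hypothetical stand-in for the debugger output collected so far.
    private final StringBuilder buffer = new StringBuilder();
    // Counts how often we would have blocked waiting for a prompt.
    int promptWaits = 0;

    void append(String s) {
        buffer.append(s);
    }

    // Stand-in for waitForPrompt(): keeps rejecting a non-positive count.
    void waitForPrompt(int startPos, boolean compoundPromptOnly, int count) {
        if (count <= 0) {
            // Assumption: the real code throws TestBug here.
            throw new IllegalArgumentException(
                "Wrong number of prompts count in waitForPrompt(): " + count);
        }
        promptWaits++; // the real method would block here until prompts arrive
    }

    // A count of 0 means "dump the current buffer, do not wait for a prompt".
    String[] receiveReply(int startPos, boolean compoundPromptOnly, int count) {
        if (count != 0) {
            waitForPrompt(startPos, compoundPromptOnly, count);
        }
        String text = buffer.substring(startPos);
        return text.isEmpty() ? new String[0] : text.split("\n");
    }

    public static void main(String[] args) {
        ReplySketch jdb = new ReplySketch();
        jdb.append("reply line\nMyThread-0[1] ");
        String[] lines = jdb.receiveReply(0, false, 0);
        System.out.println(lines.length + " lines, prompt waits = " + jdb.promptWaits);
    }
}
```

With this shape the timeout path can call receiveReply(startPos, false, 0) without tripping the "Wrong number of prompts count" check, while a negative count is still reported as a test bug.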

thanks,

Chris

On 9/28/18 11:04 AM, Alex Menkov wrote:
> Hi Gary,
>
> receiveReply(startPos, false, 0)
> calls
> waitForPrompt(startPos, compoundPromptOnly, count);
>
> and waitForPrompt has:
>         if (count <= 0) {
>             throw new TestBug("Wrong number of prompts count in Jdb.waitForPrompt(): " + count);
>         }
>
> So we will get a "Wrong number of prompts count" failure?
>
> --alex
>
> On 09/28/2018 04:47, Gary Adams wrote:
>> Revised webrev:
>>
>>    Webrev: http://cr.openjdk.java.net/~gadams/8208473/webrev.01/
>>
>> The final fix includes
>>      - updated the timeout for the test (should handle sparc debug 
>> slowness)
>>      - wait for explicit prompts from cont command (avoids confusion 
>> from "int[2]")
>>      - fixed a typo in an exclude pattern ("jdk.*")
>>      - on wait for message timeout, don't wait for prompt
>>         when dumping the current buffer
>>
>> Should have another reviewer in addition to Chris.
>>
>> On 9/27/18, 3:12 PM, Chris Plummer wrote:
>>> The extra check after timing out doesn't seem like it should help. 
>>> You've already called findMessage() 2100 times at 200ms intervals. 
>>> Why would one more call after that help? I think it might be the 
>>> receiveReply() call that is fixing it. It does a waitForPrompt(), so 
>>> this probably gives us another 420000 ms for the prompt to come in. 
>>> This call to receiveReply() is actually a bug itself since we are 
>>> doing it just to print the current buffer, not the buffer after 
>>> waiting for a prompt to come in.
>>>
>>> In any case, looks like this prompt is taking more than 420200 
>>> milliseconds to come in, but does eventually come in, and extra 
>>> waiting in receiveReply() is what is causing you to eventually see 
>>> the prompt. I think bumping up the timeout to 600 and the waittime 
>>> to 10 is the proper fix here.
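
The numbers above can be checked with a small sketch of the polling loop shape shown in the quoted Jdb.java diff. The class name WaitLoopSketch and the simulate() helper are hypothetical, and the wait time of 7 minutes is an assumption inferred from the 420000 ms figure rather than stated in the thread. Because the loop condition adds delta to total before comparing it with max, the loop body runs 2100 times and the total reported in the failure message overshoots max by one delta.

```java
class WaitLoopSketch {
    // Returns {iterations, total} for a loop of the form
    // while ((total += delta) <= max) { ... }
    static long[] simulate(long waitTimeMinutes, long deltaMs) {
        long max = waitTimeMinutes * 60 * 1000; // maximum time to wait, in ms
        long total = 0;                         // time accounted for so far
        long iterations = 0;
        while ((total += deltaMs) <= max) {
            iterations++; // the real loop searches the buffer, then waits delta ms
        }
        return new long[] { iterations, total };
    }

    public static void main(String[] args) {
        // Assumption: wait time of 7 minutes, matching the 420000 ms figure.
        long[] current = simulate(7, 200);
        System.out.println(current[0] + " iterations, reported total = " + current[1] + " ms");

        // Proposed wait time of 10 minutes.
        long[] proposed = simulate(10, 200);
        System.out.println(proposed[0] + " iterations, reported total = " + proposed[1] + " ms");
    }
}
```

Under these assumptions the first line printed is "2100 iterations, reported total = 420200 ms", which matches the 420200 milliseconds in the failure message; a wait time of 10 gives 3000 iterations.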
>>>
>>> And to address the receiveReply() issue, I'd suggest calling it 
>>> using receiveReply(startPos, false, 0), where 0 is the prompt count, 
>>> and have receiveReply() not wait for a prompt when the count is 0.
>>>
>>> Chris
>>>
>>> On 9/27/18 11:44 AM, Gary Adams wrote:
>>>> Speaking of not being bullet proof, during testing of the fix to
>>>> wait for a specific prompt an intermittent failure was observed.
>>>> ...
>>>>
>>>> Sending command: trace methods 0x2a9
>>>> reply[0]: MyThread-0[1]
>>>> Sending command: cont
>>>> WARNING: message not recieved: MyThread-0[1]
>>>> Remaining debugger output follows:
>>>> reply[0]:>
>>>> reply[1]: Method exited: return value =<void value>, "thread=MyThread-0", nsk.jdb.exclude.exclude001.MyThread.run(), line=93 bci=14
>>>> reply[2]: 93        }
>>>> reply[3]:
>>>> reply[4]: MyThread-0[1]
>>>> # ERROR: Caught unexpected exception while executing the test: nsk.share.Failure: Expected message not received during 420200 milliseconds:
>>>> ...
>>>>
>>>> The wait for message times out looking for "MyThread-0[1]".
>>>> A WARNING is printed and the "remaining debugger output"
>>>> shows that "MyThread-0[1]" is in the buffer.
>>>>
>>>> I'm still investigating why the message match is not found.
>>>>
>>>> Adding a final check before failing the wait for message
>>>> seems to work around the problem.
>>>>
>>>> diff --git a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>> --- a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>> +++ b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>> @@ -515,10 +515,11 @@
>>>>          long delta = 200; // time in milliseconds to wait at every iteration.
>>>>          long total = 0;    // total time has waited.
>>>>          long max = getLauncher().getJdbArgumentHandler().getWaitTime() * 60 * 1000; // maximum time to wait.
>>>> +        int found = 0;
>>>>
>>>>          Object dummy = new Object();
>>>>          while ((total += delta) <= max) {
>>>> -            int found = 0;
>>>> +            found = 0;
>>>>
>>>>              // search for message
>>>>              {
>>>> @@ -553,6 +554,12 @@
>>>>          log.display("WARNING: message not recieved: " + message);
>>>>          log.display("Remaining debugger output follows:");
>>>>          receiveReply(startPos);
>>>> +
>>>> +        // One last chance
>>>> +        found = findMessage(startPos, message);
>>>> +        if (found > 0) {
>>>> +            return found;
>>>> +        }
>>>>          throw new Failure("Expected message not received during " + total + " milliseconds:"
>>>>                              + "\n\t" + message);
>>>>      }
>>>>
>>>>
>>>> On 9/20/18, 5:47 PM, Chris Plummer wrote:
>>>>> Looks good. Still not bullet proof, but I'm not sure it's possible 
>>>>> to write tests like this in a way that will work no matter what 
>>>>> output is produced by the method enter/exit events.
>>>>>
>>>>> Chris
>>>>>
>>>>> On 9/20/18 10:59 AM, Gary Adams wrote:
>>>>>> The test failure has been traced to "int[2]" being
>>>>>> misrecognized as a compound prompt.  This caused a cont
>>>>>> command to be sent prematurely.
>>>>>>
>>>>>> The proposed fix waits for the correct prompt before
>>>>>> advancing to the next command.
>>>>>>
>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8208473/webrev/
>>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8208473
>>>>>>
>>>>>> Testing is in progress.
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>




More information about the serviceability-dev mailing list