RFR: JDK-8208473: [TESTBUG] nsk/jdb/exclude/exclude001/exclude001.java is timing out on solaris-sparc again

Gary Adams gary.adams at oracle.com
Fri Sep 28 11:47:42 UTC 2018


Revised webrev:

   Webrev: http://cr.openjdk.java.net/~gadams/8208473/webrev.01/

The final fix includes
     - updated the timeout for the test (should handle sparc debug slowness)
     - wait for explicit prompts from cont command (avoids confusion 
from "int[2]")
     - fixed a typo in an exclude pattern ("jdk.*")
     - on wait for message timeout, don't wait for prompt
        when dumping current

Should have another reviewer in addition to Chris.

On 9/27/18, 3:12 PM, Chris Plummer wrote:
> The extra check after timing out doesn't seem like it should help. 
> You've already called findMessage() 2100 times at 200ms intervals. Why 
> would one more call after that help? I think it might be the 
> receiveReply() call that is fixing it. It does a waitForPrompt(), so 
> this probably gives us another 420000 ms for the prompt to come in. 
> This call to receiveReply() is actually a bug itself since we are 
> doing it just to print the current buffer, not the buffer after 
> waiting for a prompt to come in.
>
> In any case, looks like this prompt is taking more than 420200 
> milliseconds to come in, but does eventually come in, and extra 
> waiting in receiveReply() is what is causing you to eventually see the 
> prompt. I think bumping up the timeout to 600 and the waittime to 10 
> is the proper fix here.
>
> And to address the receiveReply() issue, I'd suggest calling it using 
> receiveReply(startPos, false, 0), where 0 is the prompt count, and 
> have receiveReply() not wait for a prompt when the count is 0.
>
> Chris
>
> On 9/27/18 11:44 AM, Gary Adams wrote:
>> Speaking of not being bullet proof, during testing of the fix to
>> wait for a specific prompt an intermittent failure was observed.
>> ...
>>
>> Sending command: trace methods 0x2a9
>> reply[0]: MyThread-0[1]
>> Sending command: cont
>> WARNING: message not recieved: MyThread-0[1]
>> Remaining debugger output follows:
>> reply[0]:>
>> reply[1]: Method exited: return value =<void value>, 
>> "thread=MyThread-0", nsk.jdb.exclude.exclude001.MyThread.run(), 
>> line=93 bci=14
>> reply[2]: 93        }
>> reply[3]:
>> reply[4]: MyThread-0[1]
>> # ERROR: Caught unexpected exception while executing the test: 
>> nsk.share.Failure: Expected message not received during 420200 
>> milliseconds:
>> ...
>>
>> The wait for message times out looking for "MyThread-0[1]".
>> A WARNING is printed and the "remaining debugger output"
>> shows that "MyThread-0[1]" is in the buffer.
>>
>> I'm still investigating why the message match is not found.
>>
>> Adding a final check before failing the wait for message
>> seems to workaround the problem.
>>
>> diff --git a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java 
>> b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>> --- a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>> +++ b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>> @@ -515,10 +515,11 @@
>>          long delta = 200; // time in milliseconds to wait at every 
>> iteration.
>>          long total = 0;    // total time has waited.
>>          long max = 
>> getLauncher().getJdbArgumentHandler().getWaitTime() * 60 * 1000; // 
>> maximum time to wait.
>> +        int found = 0;
>>
>>          Object dummy = new Object();
>>          while ((total += delta) <= max) {
>> -            int found = 0;
>> +            found = 0;
>>
>>              // search for message
>>              {
>> @@ -553,6 +554,12 @@
>>          log.display("WARNING: message not recieved: " + message);
>>          log.display("Remaining debugger output follows:");
>>          receiveReply(startPos);
>> +
>> +        // One last chance
>> +        found = findMessage(startPos, message);
>> +        if (found > 0) {
>> +            return found;
>> +        }
>>          throw new Failure("Expected message not received during " + 
>> total + " milliseconds:"
>>                              + "\n\t" + message);
>>      }
>>
>>
>> On 9/20/18, 5:47 PM, Chris Plummer wrote:
>>> Looks good. Still not bullet proof, but I'm not sure it's possible 
>>> to write tests like this in a way that will work no matter what 
>>> output is produced by the method enter/exit events.
>>>
>>> Chris
>>>
>>> On 9/20/18 10:59 AM, Gary Adams wrote:
>>>> The test failure has been identified due to the "int[2]"
>>>> being misrecognized as a compound prompt.  This caused a cont
>>>> command to be sent prematurely.
>>>>
>>>> The proposed fix waits for the correct prompt before
>>>> advancing to the next command.
>>>>
>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8208473/webrev/
>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8208473
>>>>
>>>> Testing is in progress.
>>>
>>>
>>>
>>
>
>



More information about the serviceability-dev mailing list