RFR: JDK-8208473: [TESTBUG] nsk/jdb/exclude/exclude001/exclude001.java is timing out on solaris-sparc again

Chris Plummer chris.plummer at oracle.com
Fri Sep 28 18:23:48 UTC 2018


I'm fine with keeping the Jdb changes out of this exclude001 fix, but I 
don't see what it has to do with kill002. IIRC the bug with that test is 
that it is only waiting for the first of two prompts that come in, so 
sometimes it returns the reply too early.

Chris

On 9/28/18 11:20 AM, Gary Adams wrote:
> I think I'd pref to just submit the exclude001 fix at this time 
> without the
> Jdb.java changes.
>
> We can revisit this wait for message diagnostic clean up along with 
> the kill002
> fix.
>
> On 9/28/18, 2:15 PM, Chris Plummer wrote:
>> Yep. My suggestion was to use this version of receiveReply(), but 
>> also have receiveReply() not call waitForPrompt() if count was 0. I 
>> think it would be good to test with this change in place, but with a 
>> smaller timeout so you can reproduce the timeout error and test this 
>> receiveReply() call. Other than that the changes look fine to me.
>>
>> thanks,
>>
>> Chris
>>
>> On 9/28/18 11:04 AM, Alex Menkov wrote:
>>> Hi Gary,
>>>
>>> receiveReply(startPos, false, 0)
>>> calls
>>> waitForPrompt(startPos, compoundPromptOnly, count);
>>>
>>> and waitForPrompt has:
>>>         if (count <= 0) {
>>>             throw new TestBug("Wrong number of prompts count in 
>>> Jdb.waitForPrompt(): " + count);
>>>         }
>>>
>>> So We will get "Wrong number of prompts count" failure?
>>>
>>> --alex
>>>
>>> On 09/28/2018 04:47, Gary Adams wrote:
>>>> Revised webrev:
>>>>
>>>>    Webrev: http://cr.openjdk.java.net/~gadams/8208473/webrev.01/
>>>>
>>>> The final fix includes
>>>>      - updated the timeout for the test (should handle sparc debug 
>>>> slowness)
>>>>      - wait for explicit prompts from cont command (avoids 
>>>> confusion from "int[2]")
>>>>      - fixed a typo in an exclude pattern ("jdk.*")
>>>>      - on wait for message timeout, don't wait for prompt
>>>>         when dumping current
>>>>
>>>> Should have another reviewer in addition to Chris.
>>>>
>>>> On 9/27/18, 3:12 PM, Chris Plummer wrote:
>>>>> The extra check after timing out doesn't seem like it should help. 
>>>>> You've already called findMessage() 2100 times at 200ms intervals. 
>>>>> Why would one more call after that help? I think it might be the 
>>>>> receiveReply() call that is fixing it. It does a waitForPrompt(), 
>>>>> so this probably gives us another 420000 ms for the prompt to come 
>>>>> in. This call to receiveReply() is actually a bug itself since we 
>>>>> are doing it just to print the current buffer, not the buffer 
>>>>> after waiting for a prompt to come in.
>>>>>
>>>>> In any case, looks like this prompt is taking more than 420200 
>>>>> milliseconds to come in, but does eventually come in, and extra 
>>>>> waiting in receiveReply() is what is causing you to eventually see 
>>>>> the prompt. I think bumping up the timeout to 600 and the waittime 
>>>>> to 10 is the proper fix here.
>>>>>
>>>>> And to address the receiveReply() issue, I'd suggest calling it 
>>>>> using receiveReply(startPos, false, 0), where 0 is the prompt 
>>>>> count, and have receiveReply() not wait for a prompt when the 
>>>>> count is 0.
>>>>>
>>>>> Chris
>>>>>
>>>>> On 9/27/18 11:44 AM, Gary Adams wrote:
>>>>>> Speaking of not being bullet proof, during testing of the fix to
>>>>>> wait for a specific prompt an intermittent failure was observed.
>>>>>> ...
>>>>>>
>>>>>> Sending command: trace methods 0x2a9
>>>>>> reply[0]: MyThread-0[1]
>>>>>> Sending command: cont
>>>>>> WARNING: message not recieved: MyThread-0[1]
>>>>>> Remaining debugger output follows:
>>>>>> reply[0]:>
>>>>>> reply[1]: Method exited: return value =<void value>, 
>>>>>> "thread=MyThread-0", nsk.jdb.exclude.exclude001.MyThread.run(), 
>>>>>> line=93 bci=14
>>>>>> reply[2]: 93        }
>>>>>> reply[3]:
>>>>>> reply[4]: MyThread-0[1]
>>>>>> # ERROR: Caught unexpected exception while executing the test: 
>>>>>> nsk.share.Failure: Expected message not received during 420200 
>>>>>> milliseconds:
>>>>>> ...
>>>>>>
>>>>>> The wait for message times out looking for "MyThread-0[1]".
>>>>>> A WARNING is printed and the "remaining debugger output"
>>>>>> shows that "MyThread-0[1]" is in the buffer.
>>>>>>
>>>>>> I'm still investigating why the message match is not found.
>>>>>>
>>>>>> Adding a final check before failing the wait for message
>>>>>> seems to workaround the problem.
>>>>>>
>>>>>> diff --git a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java 
>>>>>> b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>>>> --- a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>>>> +++ b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>>>> @@ -515,10 +515,11 @@
>>>>>>          long delta = 200; // time in milliseconds to wait at 
>>>>>> every iteration.
>>>>>>          long total = 0;    // total time has waited.
>>>>>>          long max = 
>>>>>> getLauncher().getJdbArgumentHandler().getWaitTime() * 60 * 1000; 
>>>>>> // maximum time to wait.
>>>>>> +        int found = 0;
>>>>>>
>>>>>>          Object dummy = new Object();
>>>>>>          while ((total += delta) <= max) {
>>>>>> -            int found = 0;
>>>>>> +            found = 0;
>>>>>>
>>>>>>              // search for message
>>>>>>              {
>>>>>> @@ -553,6 +554,12 @@
>>>>>>          log.display("WARNING: message not recieved: " + message);
>>>>>>          log.display("Remaining debugger output follows:");
>>>>>>          receiveReply(startPos);
>>>>>> +
>>>>>> +        // One last chance
>>>>>> +        found = findMessage(startPos, message);
>>>>>> +        if (found > 0) {
>>>>>> +            return found;
>>>>>> +        }
>>>>>>          throw new Failure("Expected message not received during 
>>>>>> " + total + " milliseconds:"
>>>>>>                              + "\n\t" + message);
>>>>>>      }
>>>>>>
>>>>>>
>>>>>> On 9/20/18, 5:47 PM, Chris Plummer wrote:
>>>>>>> Looks good. Still not bullet proof, but I'm not sure it's 
>>>>>>> possible to write tests like this in a way that will work no 
>>>>>>> matter what output is produced by the method enter/exit events.
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On 9/20/18 10:59 AM, Gary Adams wrote:
>>>>>>>> The test failure has been identified due to the "int[2]"
>>>>>>>> being misrecognized as a compound prompt.  This caused a cont
>>>>>>>> command to be sent prematurely.
>>>>>>>>
>>>>>>>> The proposed fix waits for the correct prompt before
>>>>>>>> advancing to the next command.
>>>>>>>>
>>>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8208473/webrev/
>>>>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8208473
>>>>>>>>
>>>>>>>> Testing is in progress.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>
>>
>




More information about the serviceability-dev mailing list