RFR: JDK-8208473: [TESTBUG] nsk/jdb/exclude/exclude001/exclude001.java is timing out on solaris-sparc again

Gary Adams gary.adams at oracle.com
Fri Sep 28 18:29:25 UTC 2018


The kill002 logs I saw returned after the matching message e.g. "xxx killed"
When the pending output was dumped it included next prompt,
but the next prompt was not returned. This could have been a
similar situation to the exclude timeout, which then ended up
doing a wait for prompt as part of gathering the pending
output.

On 9/28/18, 2:23 PM, Chris Plummer wrote:
> I'm fine with keeping the Jdb changes out of this exclude001 fix, but 
> I don't see what it has to do with kill002. IIRC the bug with that 
> test is that it is only waiting for the first of two prompts that come 
> in, so sometimes it returns the reply too early.
>
> Chris
>
> On 9/28/18 11:20 AM, Gary Adams wrote:
>> I think I'd pref to just submit the exclude001 fix at this time 
>> without the
>> Jdb.java changes.
>>
>> We can revisit this wait for message diagnostic clean up along with 
>> the kill002
>> fix.
>>
>> On 9/28/18, 2:15 PM, Chris Plummer wrote:
>>> Yep. My suggestion was to use this version of receiveReply(), but 
>>> also have receiveReply() not call waitForPrompt() if count was 0. I 
>>> think it would be good to test with this change in place, but with a 
>>> smaller timeout so you can reproduce the timeout error and test this 
>>> receiveReply() call. Other than that the changes look fine to me.
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>> On 9/28/18 11:04 AM, Alex Menkov wrote:
>>>> Hi Gary,
>>>>
>>>> receiveReply(startPos, false, 0)
>>>> calls
>>>> waitForPrompt(startPos, compoundPromptOnly, count);
>>>>
>>>> and waitForPrompt has:
>>>>         if (count <= 0) {
>>>>             throw new TestBug("Wrong number of prompts count in 
>>>> Jdb.waitForPrompt(): " + count);
>>>>         }
>>>>
>>>> So We will get "Wrong number of prompts count" failure?
>>>>
>>>> --alex
>>>>
>>>> On 09/28/2018 04:47, Gary Adams wrote:
>>>>> Revised webrev:
>>>>>
>>>>>    Webrev: http://cr.openjdk.java.net/~gadams/8208473/webrev.01/
>>>>>
>>>>> The final fix includes
>>>>>      - updated the timeout for the test (should handle sparc debug 
>>>>> slowness)
>>>>>      - wait for explicit prompts from cont command (avoids 
>>>>> confusion from "int[2]")
>>>>>      - fixed a typo in an exclude pattern ("jdk.*")
>>>>>      - on wait for message timeout, don't wait for prompt
>>>>>         when dumping current
>>>>>
>>>>> Should have another reviewer in addition to Chris.
>>>>>
>>>>> On 9/27/18, 3:12 PM, Chris Plummer wrote:
>>>>>> The extra check after timing out doesn't seem like it should 
>>>>>> help. You've already called findMessage() 2100 times at 200ms 
>>>>>> intervals. Why would one more call after that help? I think it 
>>>>>> might be the receiveReply() call that is fixing it. It does a 
>>>>>> waitForPrompt(), so this probably gives us another 420000 ms for 
>>>>>> the prompt to come in. This call to receiveReply() is actually a 
>>>>>> bug itself since we are doing it just to print the current 
>>>>>> buffer, not the buffer after waiting for a prompt to come in.
>>>>>>
>>>>>> In any case, looks like this prompt is taking more than 420200 
>>>>>> milliseconds to come in, but does eventually come in, and extra 
>>>>>> waiting in receiveReply() is what is causing you to eventually 
>>>>>> see the prompt. I think bumping up the timeout to 600 and the 
>>>>>> waittime to 10 is the proper fix here.
>>>>>>
>>>>>> And to address the receiveReply() issue, I'd suggest calling it 
>>>>>> using receiveReply(startPos, false, 0), where 0 is the prompt 
>>>>>> count, and have receiveReply() not wait for a prompt when the 
>>>>>> count is 0.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On 9/27/18 11:44 AM, Gary Adams wrote:
>>>>>>> Speaking of not being bullet proof, during testing of the fix to
>>>>>>> wait for a specific prompt an intermittent failure was observed.
>>>>>>> ...
>>>>>>>
>>>>>>> Sending command: trace methods 0x2a9
>>>>>>> reply[0]: MyThread-0[1]
>>>>>>> Sending command: cont
>>>>>>> WARNING: message not recieved: MyThread-0[1]
>>>>>>> Remaining debugger output follows:
>>>>>>> reply[0]:>
>>>>>>> reply[1]: Method exited: return value =<void value>, 
>>>>>>> "thread=MyThread-0", nsk.jdb.exclude.exclude001.MyThread.run(), 
>>>>>>> line=93 bci=14
>>>>>>> reply[2]: 93        }
>>>>>>> reply[3]:
>>>>>>> reply[4]: MyThread-0[1]
>>>>>>> # ERROR: Caught unexpected exception while executing the test: 
>>>>>>> nsk.share.Failure: Expected message not received during 420200 
>>>>>>> milliseconds:
>>>>>>> ...
>>>>>>>
>>>>>>> The wait for message times out looking for "MyThread-0[1]".
>>>>>>> A WARNING is printed and the "remaining debugger output"
>>>>>>> shows that "MyThread-0[1]" is in the buffer.
>>>>>>>
>>>>>>> I'm still investigating why the message match is not found.
>>>>>>>
>>>>>>> Adding a final check before failing the wait for message
>>>>>>> seems to workaround the problem.
>>>>>>>
>>>>>>> diff --git 
>>>>>>> a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java 
>>>>>>> b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>>>>> --- a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>>>>> +++ b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>>>>> @@ -515,10 +515,11 @@
>>>>>>>          long delta = 200; // time in milliseconds to wait at 
>>>>>>> every iteration.
>>>>>>>          long total = 0;    // total time has waited.
>>>>>>>          long max = 
>>>>>>> getLauncher().getJdbArgumentHandler().getWaitTime() * 60 * 1000; 
>>>>>>> // maximum time to wait.
>>>>>>> +        int found = 0;
>>>>>>>
>>>>>>>          Object dummy = new Object();
>>>>>>>          while ((total += delta) <= max) {
>>>>>>> -            int found = 0;
>>>>>>> +            found = 0;
>>>>>>>
>>>>>>>              // search for message
>>>>>>>              {
>>>>>>> @@ -553,6 +554,12 @@
>>>>>>>          log.display("WARNING: message not recieved: " + message);
>>>>>>>          log.display("Remaining debugger output follows:");
>>>>>>>          receiveReply(startPos);
>>>>>>> +
>>>>>>> +        // One last chance
>>>>>>> +        found = findMessage(startPos, message);
>>>>>>> +        if (found > 0) {
>>>>>>> +            return found;
>>>>>>> +        }
>>>>>>>          throw new Failure("Expected message not received during 
>>>>>>> " + total + " milliseconds:"
>>>>>>>                              + "\n\t" + message);
>>>>>>>      }
>>>>>>>
>>>>>>>
>>>>>>> On 9/20/18, 5:47 PM, Chris Plummer wrote:
>>>>>>>> Looks good. Still not bullet proof, but I'm not sure it's 
>>>>>>>> possible to write tests like this in a way that will work no 
>>>>>>>> matter what output is produced by the method enter/exit events.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 9/20/18 10:59 AM, Gary Adams wrote:
>>>>>>>>> The test failure has been identified due to the "int[2]"
>>>>>>>>> being misrecognized as a compound prompt.  This caused a cont
>>>>>>>>> command to be sent prematurely.
>>>>>>>>>
>>>>>>>>> The proposed fix waits for the correct prompt before
>>>>>>>>> advancing to the next command.
>>>>>>>>>
>>>>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8208473/webrev/
>>>>>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8208473
>>>>>>>>>
>>>>>>>>> Testing is in progress.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>
>
>



More information about the serviceability-dev mailing list