RFR: JDK-8208473: [TESTBUG] nsk/jdb/exclude/exclude001/exclude001.java is timing out on solaris-sparc again
Gary Adams
gary.adams at oracle.com
Fri Sep 28 18:29:25 UTC 2018
The kill002 logs I saw returned after the matching message e.g. "xxx killed"
When the pending output was dumped it included next prompt,
but the next prompt was not returned. This could have been a
similar situation to the exclude timeout, which then ended up
doing a wait for prompt as part of gathering the pending
output.
On 9/28/18, 2:23 PM, Chris Plummer wrote:
> I'm fine with keeping the Jdb changes out of this exclude001 fix, but
> I don't see what it has to do with kill002. IIRC the bug with that
> test is that it is only waiting for the first of two prompts that come
> in, so sometimes it returns the reply too early.
>
> Chris
>
> On 9/28/18 11:20 AM, Gary Adams wrote:
>> I think I'd pref to just submit the exclude001 fix at this time
>> without the
>> Jdb.java changes.
>>
>> We can revisit this wait for message diagnostic clean up along with
>> the kill002
>> fix.
>>
>> On 9/28/18, 2:15 PM, Chris Plummer wrote:
>>> Yep. My suggestion was to use this version of receiveReply(), but
>>> also have receiveReply() not call waitForPrompt() if count was 0. I
>>> think it would be good to test with this change in place, but with a
>>> smaller timeout so you can reproduce the timeout error and test this
>>> receiveReply() call. Other than that the changes look fine to me.
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>> On 9/28/18 11:04 AM, Alex Menkov wrote:
>>>> Hi Gary,
>>>>
>>>> receiveReply(startPos, false, 0)
>>>> calls
>>>> waitForPrompt(startPos, compoundPromptOnly, count);
>>>>
>>>> and waitForPrompt has:
>>>> if (count <= 0) {
>>>> throw new TestBug("Wrong number of prompts count in
>>>> Jdb.waitForPrompt(): " + count);
>>>> }
>>>>
>>>> So We will get "Wrong number of prompts count" failure?
>>>>
>>>> --alex
>>>>
>>>> On 09/28/2018 04:47, Gary Adams wrote:
>>>>> Revised webrev:
>>>>>
>>>>> Webrev: http://cr.openjdk.java.net/~gadams/8208473/webrev.01/
>>>>>
>>>>> The final fix includes
>>>>> - updated the timeout for the test (should handle sparc debug
>>>>> slowness)
>>>>> - wait for explicit prompts from cont command (avoids
>>>>> confusion from "int[2]")
>>>>> - fixed a typo in an exclude pattern ("jdk.*")
>>>>> - on wait for message timeout, don't wait for prompt
>>>>> when dumping current
>>>>>
>>>>> Should have another reviewer in addition to Chris.
>>>>>
>>>>> On 9/27/18, 3:12 PM, Chris Plummer wrote:
>>>>>> The extra check after timing out doesn't seem like it should
>>>>>> help. You've already called findMessage() 2100 times at 200ms
>>>>>> intervals. Why would one more call after that help? I think it
>>>>>> might be the receiveReply() call that is fixing it. It does a
>>>>>> waitForPrompt(), so this probably gives us another 420000 ms for
>>>>>> the prompt to come in. This call to receiveReply() is actually a
>>>>>> bug itself since we are doing it just to print the current
>>>>>> buffer, not the buffer after waiting for a prompt to come in.
>>>>>>
>>>>>> In any case, looks like this prompt is taking more than 420200
>>>>>> milliseconds to come in, but does eventually come in, and extra
>>>>>> waiting in receiveReply() is what is causing you to eventually
>>>>>> see the prompt. I think bumping up the timeout to 600 and the
>>>>>> waittime to 10 is the proper fix here.
>>>>>>
>>>>>> And to address the receiveReply() issue, I'd suggest calling it
>>>>>> using receiveReply(startPos, false, 0), where 0 is the prompt
>>>>>> count, and have receiveReply() not wait for a prompt when the
>>>>>> count is 0.
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On 9/27/18 11:44 AM, Gary Adams wrote:
>>>>>>> Speaking of not being bullet proof, during testing of the fix to
>>>>>>> wait for a specific prompt an intermittent failure was observed.
>>>>>>> ...
>>>>>>>
>>>>>>> Sending command: trace methods 0x2a9
>>>>>>> reply[0]: MyThread-0[1]
>>>>>>> Sending command: cont
>>>>>>> WARNING: message not recieved: MyThread-0[1]
>>>>>>> Remaining debugger output follows:
>>>>>>> reply[0]:>
>>>>>>> reply[1]: Method exited: return value =<void value>,
>>>>>>> "thread=MyThread-0", nsk.jdb.exclude.exclude001.MyThread.run(),
>>>>>>> line=93 bci=14
>>>>>>> reply[2]: 93 }
>>>>>>> reply[3]:
>>>>>>> reply[4]: MyThread-0[1]
>>>>>>> # ERROR: Caught unexpected exception while executing the test:
>>>>>>> nsk.share.Failure: Expected message not received during 420200
>>>>>>> milliseconds:
>>>>>>> ...
>>>>>>>
>>>>>>> The wait for message times out looking for "MyThread-0[1]".
>>>>>>> A WARNING is printed and the "remaining debugger output"
>>>>>>> shows that "MyThread-0[1]" is in the buffer.
>>>>>>>
>>>>>>> I'm still investigating why the message match is not found.
>>>>>>>
>>>>>>> Adding a final check before failing the wait for message
>>>>>>> seems to workaround the problem.
>>>>>>>
>>>>>>> diff --git
>>>>>>> a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>>>>> b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>>>>> --- a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>>>>> +++ b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>>>>> @@ -515,10 +515,11 @@
>>>>>>> long delta = 200; // time in milliseconds to wait at
>>>>>>> every iteration.
>>>>>>> long total = 0; // total time has waited.
>>>>>>> long max =
>>>>>>> getLauncher().getJdbArgumentHandler().getWaitTime() * 60 * 1000;
>>>>>>> // maximum time to wait.
>>>>>>> + int found = 0;
>>>>>>>
>>>>>>> Object dummy = new Object();
>>>>>>> while ((total += delta) <= max) {
>>>>>>> - int found = 0;
>>>>>>> + found = 0;
>>>>>>>
>>>>>>> // search for message
>>>>>>> {
>>>>>>> @@ -553,6 +554,12 @@
>>>>>>> log.display("WARNING: message not recieved: " + message);
>>>>>>> log.display("Remaining debugger output follows:");
>>>>>>> receiveReply(startPos);
>>>>>>> +
>>>>>>> + // One last chance
>>>>>>> + found = findMessage(startPos, message);
>>>>>>> + if (found > 0) {
>>>>>>> + return found;
>>>>>>> + }
>>>>>>> throw new Failure("Expected message not received during
>>>>>>> " + total + " milliseconds:"
>>>>>>> + "\n\t" + message);
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> On 9/20/18, 5:47 PM, Chris Plummer wrote:
>>>>>>>> Looks good. Still not bullet proof, but I'm not sure it's
>>>>>>>> possible to write tests like this in a way that will work no
>>>>>>>> matter what output is produced by the method enter/exit events.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On 9/20/18 10:59 AM, Gary Adams wrote:
>>>>>>>>> The test failure has been identified due to the "int[2]"
>>>>>>>>> being misrecognized as a compound prompt. This caused a cont
>>>>>>>>> command to be sent prematurely.
>>>>>>>>>
>>>>>>>>> The proposed fix waits for the correct prompt before
>>>>>>>>> advancing to the next command.
>>>>>>>>>
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~gadams/8208473/webrev/
>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8208473
>>>>>>>>>
>>>>>>>>> Testing is in progress.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>
>
>
More information about the serviceability-dev
mailing list