RFR: JDK-8208473: [TESTBUG] nsk/jdb/exclude/exclude001/exclude001.java is timing out on solaris-sparc again
Chris Plummer
chris.plummer at oracle.com
Thu Sep 27 19:12:12 UTC 2018
The extra check after timing out doesn't seem like it should help.
You've already called findMessage() 2100 times at 200ms intervals. Why
would one more call after that help? I think it might be the
receiveReply() call that is fixing it. It does a waitForPrompt(), so
this probably gives us another 420000 ms for the prompt to come in. This
call to receiveReply() is actually a bug itself since we are doing it
just to print the current buffer, not the buffer after waiting for a
prompt to come in.
In any case, it looks like this prompt is taking more than 420200
milliseconds to come in, but does eventually come in, and the extra
waiting in receiveReply() is what causes you to eventually see it. I
think bumping up the timeout to 600 and the waittime to 10 is the proper
fix here.
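To make the arithmetic concrete, here is a rough standalone sketch of the wait loop's timing. It is not the code in Jdb.java: the class and method names are made up, the per-iteration wait is left out, and the current waittime of 7 minutes is only inferred from the 420200 figure in the log.

```java
public class WaitLoopTiming {
    /**
     * Mirrors the shape of the loop in the diff: while ((total += delta) <= max).
     * Returns {number of buffer searches, value of total when the loop exits}.
     */
    static long[] simulate(long waitTimeMinutes, long deltaMs) {
        long max = waitTimeMinutes * 60 * 1000; // maximum time to wait, in ms
        long total = 0;                         // time accounted for so far, in ms
        long polls = 0;                         // how many times the buffer is searched
        while ((total += deltaMs) <= max) {
            polls++; // the real loop searches the buffer here and then waits for delta
        }
        return new long[] {polls, total};
    }

    public static void main(String[] args) {
        long[] current = simulate(7, 200);   // assumed current waittime
        long[] bumped = simulate(10, 200);   // waittime suggested above
        System.out.println(current[0] + " polls, total=" + current[1] + " ms");
        System.out.println(bumped[0] + " polls, total=" + bumped[1] + " ms");
    }
}
```

With a waittime of 7 this gives 2100 searches and a final total of 420200, which matches the number in the failure message. A waittime of 10 gives 3000 searches.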
And to address the receiveReply() issue, I'd suggest calling it using
receiveReply(startPos, false, 0), where 0 is the prompt count, and have
receiveReply() not wait for a prompt when the count is 0.
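Something along these lines is what I have in mind. This is only a sketch with a stand-in buffer, not the real Jdb class: the name of the boolean parameter, the promptWaits counter, and the stubbed waitForPrompt() are all assumptions for illustration.

```java
public class ReplySketch {
    // Stands in for the jdb output captured so far.
    private final StringBuilder buffer = new StringBuilder();

    // Counts how often we would have blocked waiting for a prompt (illustration only).
    int promptWaits = 0;

    void append(String output) {
        buffer.append(output);
    }

    // Hypothetical stub; the real method would block until a prompt arrives or time out.
    private void waitForPrompt(int startPos, boolean compoundPromptOnly, int count) {
        promptWaits++;
    }

    /** Returns the buffered lines from startPos; waits for a prompt only when count > 0. */
    String[] receiveReply(int startPos, boolean compoundPromptOnly, int count) {
        if (count > 0) {
            waitForPrompt(startPos, compoundPromptOnly, count);
        }
        String reply = buffer.substring(startPos);
        return reply.isEmpty() ? new String[0] : reply.split("\n");
    }

    public static void main(String[] args) {
        ReplySketch jdb = new ReplySketch();
        jdb.append("Method exited: ...\nMyThread-0[1] ");
        // A count of 0 just dumps the current buffer without waiting.
        for (String line : jdb.receiveReply(0, false, 0)) {
            System.out.println("reply: " + line);
        }
    }
}
```

The warning path would then print whatever is in the buffer at the moment of the timeout, without quietly adding another wait.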
Chris
On 9/27/18 11:44 AM, Gary Adams wrote:
> Speaking of not being bullet proof, during testing of the fix to
> wait for a specific prompt an intermittent failure was observed.
> ...
>
> Sending command: trace methods 0x2a9
> reply[0]: MyThread-0[1]
> Sending command: cont
> WARNING: message not recieved: MyThread-0[1]
> Remaining debugger output follows:
> reply[0]:>
> reply[1]: Method exited: return value =<void value>,
> "thread=MyThread-0", nsk.jdb.exclude.exclude001.MyThread.run(),
> line=93 bci=14
> reply[2]: 93 }
> reply[3]:
> reply[4]: MyThread-0[1]
> # ERROR: Caught unexpected exception while executing the test:
> nsk.share.Failure: Expected message not received during 420200
> milliseconds:
> ...
>
> The wait for message times out looking for "MyThread-0[1]".
> A WARNING is printed and the "remaining debugger output"
> shows that "MyThread-0[1]" is in the buffer.
>
> I'm still investigating why the message match is not found.
>
> Adding a final check before failing the wait for message
> seems to work around the problem.
>
> diff --git a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
> --- a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
> +++ b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
> @@ -515,10 +515,11 @@
> long delta = 200; // time in milliseconds to wait at every iteration.
> long total = 0; // total time has waited.
> long max = getLauncher().getJdbArgumentHandler().getWaitTime() * 60 * 1000; // maximum time to wait.
> + int found = 0;
>
> Object dummy = new Object();
> while ((total += delta) <= max) {
> - int found = 0;
> + found = 0;
>
> // search for message
> {
> @@ -553,6 +554,12 @@
> log.display("WARNING: message not recieved: " + message);
> log.display("Remaining debugger output follows:");
> receiveReply(startPos);
> +
> + // One last chance
> + found = findMessage(startPos, message);
> + if (found > 0) {
> + return found;
> + }
> throw new Failure("Expected message not received during " +
> total + " milliseconds:"
> + "\n\t" + message);
> }
>
>
> On 9/20/18, 5:47 PM, Chris Plummer wrote:
>> Looks good. Still not bullet proof, but I'm not sure it's possible to
>> write tests like this in a way that will work no matter what output
>> is produced by the method enter/exit events.
>>
>> Chris
>>
>> On 9/20/18 10:59 AM, Gary Adams wrote:
>>> The test failure has been traced to the "int[2]" being
>>> misrecognized as a compound prompt. This caused a cont
>>> command to be sent prematurely.
>>>
>>> The proposed fix waits for the correct prompt before
>>> advancing to the next command.
>>>
>>> Webrev: http://cr.openjdk.java.net/~gadams/8208473/webrev/
>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8208473
>>>
>>> Testing is in progress.
>>
>>
>>
>