RFR: JDK-8169718: nsk/jdb/locals/locals002: ERROR: Cannot find boolVar with expected value: false
Gary Adams
gary.adams at oracle.com
Wed Jun 6 18:52:27 UTC 2018
On 6/6/18, 2:20 PM, Chris Plummer wrote:
> If that were the case, shouldn't the delay be added to sendCommand()
> instead?
>
> I'm still a bit unsure about how this sequence is suppose to work in
> the first place:
>
> sendCommand(command);
> return receiveReply(startPos, compoundPromptOnly, count);
Send command "cont"
Receive the reply "main[1]"
At this point the delay allows the event handler to run in the debuggee
process.
>
> I guess the assumption is that the first prompt will always be a
> simple one? Is that the case? If so, then the expectation is that we
> will see:
>
> main[1] cont
> >
> "Breakpoint hit:" <location>
> main[1]
>
> But then we can still see, even after your fix:
>
> main[1] cont
> "Breakpoint hit:" > <location>
> main[1]
I'm not clear on what you are saying in this example.
>
> However, I think maybe the issue is different for step commands, which
> seem to result in two compound prompts. Remember your example:
>
>> Here's another recent failure with eval001
>> 1. cont and Break hit
>> 2. step and Step completed (partial reply)
>> 3. eval and error with second half of Step completed
>>
>> *====================================*
>> Sending command: cont
>>
>> reply[0]: >
>> reply[1]:
>> Breakpoint hit: "thread=main",
>> nsk.jdb.eval.eval001.eval001a.lastBreak(), line=23 bci=0
>> reply[2]:
>> 23 static void lastBreak () {}
>> reply[3]:
>>
>> reply[4]: *
>> ====================================
>>
>> *main[1]
>> Sending command: step
>>
>> reply[0]:
>> reply[1]:
>> Step completed: main[1]
>> *====================================*
>> Sending command: eval nsk.jdb.eval.eval001.eval001a.myStaticField
>> *reply[0]: "thread=main", nsk.jdb.eval.eval001.eval001a.runIt(),
>> line=36 bci=53*
>> reply[1]:
>> 36 }
>> reply[2]:
>>
>> reply[3]:
>>
>> *====================================*
>> main[1]
>> # ERROR: jdb failed to report value of expression:
>> nsk.jdb.eval.eval001.eval001a.myStaticField
>> <snip>
>> # ERROR: expected : -2147483648 ;
>> *# ERROR: reported: "thread=main",
>> nsk.jdb.eval.eval001.eval001a.runIt(), line=36 bci=53*
>>
>>
>> Finally the sequence from the jdb.session file. Notice the main[1] prompt
>> embedded in the Step completed reply.
>>
>> *main[1]* cont
>> >
>> Breakpoint hit: "thread=main",
>> nsk.jdb.eval.eval001.eval001a.lastBreak(), line=23 bci=0
>> 23 static void lastBreak () {}
>>
>> *main[1] *step
>>
>> Step completed: *main[1]* "thread=main",
>> nsk.jdb.eval.eval001.eval001a.runIt(), line=36 bci=53
>> 36 }
>>
>> *main[1]* eval nsk.jdb.eval.eval001.eval001a.myStaticField
> The part that uses "cont" is fine. The problems started when the step
> command was issued. Rather than receiving:
>
> Step completed: "thread=main", nsk.jdb.eval.eval001.eval001a.runIt(),
> line=36 bci=53
>
> We got:
>
> Step completed: main[1]
> *====================================*
> Sending command: eval nsk.jdb.eval.eval001.eval001a.myStaticField
> reply[0]: "thread=main", nsk.jdb.eval.eval001.eval001a.runIt(),
> line=36 bci=53
>
> I don't see your change addressing this. The step command was
> considered to be done as soon as the main[] prompt arrived, even
> though it came right in the middle of the full "Step completed: ..."
> message. Your delay won't change this. Yes, it will make sure the rest
> of the reply has arrived, but the part after main[1] will not be
> considered part of the step reply because it will terminate at the
> main[1] prompt.
With the proposed change to delay after receiving a reply after command,
we would see
*====================================*
Sending command: cont
reply[0]: >
*==================================== delay after command*
Breakpoint hit: "thread=main",
nsk.jdb.eval.eval001.eval001a.lastBreak(), line=23 bci=0
23 static void lastBreak () {}*
*main[1]
*====================================*
Sending command: step
main[1]
*==================================== delay after command*
Step completed: *"thread=main", nsk.jdb.eval.eval001.eval001a.runIt(),
line=36 bci=53*
36 }
*====================================*
Sending command: eval nsk.jdb.eval.eval001.eval001a.myStaticField
main[1]
*==================================== delay after command*
...
I'll try some testing over night.
>
> Chris
>
> On 6/6/18 10:44 AM, Gary Adams wrote:
>> On 6/6/18, 1:26 PM, Chris Plummer wrote:
>>> Hi Gary,
>>>
>>> I thought the issue was having 1 command like "cont" or "step"
>>> producing two prompts, and the first prompt ending up in the middle
>>> of the output for the breakpoint. So instead of:
>>>
>>> main[1] cont
>>> main[1]
>>> "Breakpoint hit:" <location>
>>> main[1]
>>>
>>> We sometimes get:
>>>
>>> main[1] cont
>>> "Breakpoint hit:" main[1] <location>
>>> main[1]
>> If you look at the commands sent :
>>
>> stop at location1
>> stop at location2
>> cont
>> cont
>>
>> the expectation is that breakpoint will be encountered and then
>> the test will continue til another breakpoint is encountered.
>>
>> The problem only occurs because the commands are sent
>> so quickly a race condition is introduced between the
>> command processing and the event handling in the debuggee
>> process. Since prompts are written both from the command
>> thread and the event handler, the output can end up interspersed.
>>
>>>
>>> I don't think a delay on the receiving end like you have added can
>>> effect this. You need to make sure the main[1] produced for the
>>> "cont" command is in the output stream before handling the
>>> breakpoint event. Dan's solution was a bit different in that it
>>> actually delayed reaching the breakpoint. I don't think your
>>> solution does the same.
>> My goal is to delay the sending of the next command,
>> because the debuggee is not actually ready for the command.
>>>
>>> Chris
>>>
>>> On 6/6/18 10:07 AM, Gary Adams wrote:
>>>> It may be sufficient when the command prompt is delivered
>>>> to introduce a brief pause to allow an asynchronous event to
>>>> be processed before another command can be sent. e.g.
>>>> throttling the stream of commands being sent
>>>>
>>>> diff --git a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>> b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>> --- a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>> +++ b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>> @@ -348,7 +348,17 @@
>>>>
>>>> int startPos = stdoutBuffer.length();
>>>> sendCommand(command);
>>>> - return receiveReply(startPos, compoundPromptOnly, count);
>>>> + String[] reply = receiveReply(startPos,
>>>> compoundPromptOnly, count);
>>>> +
>>>> + if (asyncCommand(command)) {
>>>> + // delay to allow debuggee time to process async command
>>>> + try {
>>>> + Thread.sleep(100);
>>>> + } catch (InterruptedException e) {
>>>> + // ignored
>>>> + }
>>>> + }
>>>> + return reply;
>>>> }
>>>>
>>>>
>>>> On 6/6/18, 11:51 AM, Daniel D. Daugherty wrote:
>>>>> The two prompts are for two different things so I don't think
>>>>> that removing one of the prompts is the right solution either.
>>>>>
>>>>> In the case of an asynchronous command like 'cont', you get
>>>>> the '<prompt=after-cont-command>' right away because the command
>>>>> processor is ready for another command. You will also get the
>>>>> '<prompt-after-breakpoint-event-location1>' prompt when the
>>>>> breakpoint is hit because that identifies the thread that posted
>>>>> the breakpoint event. If it takes a long time to hit the breakpoint,
>>>>> then it will be a long time before the breakpoint prompt is seen.
>>>>> Likewise, if the breakpoint is hit quickly, then the breakpoint
>>>>> prompt will be seen more quickly and will sometimes interfere with
>>>>> the command processor prompt. That's normal for an async command.
>>>>>
>>>>> One possible way to fix this kind of issue is to structure the
>>>>> test flow to slightly delay the target thread so it takes longer
>>>>> to get to the breakpoint. '<prompt=after-cont-command>' will have
>>>>> a better chance to be printed before (and not colliding with) the
>>>>> '<prompt-after-breakpoint-event-location1>' prompt.
>>>>>
>>>>> So if the test program has:
>>>>>
>>>>> <some line of code>; // breakpoint here
>>>>>
>>>>> change it to something like:
>>>>>
>>>>> try { Thread.sleep(100); } catch (InterruptedException ie) {};
>>>>> <some line of code>; // breakpoint here
>>>>>
>>>>> Of course, I'm assuming here that the test driver code is looking
>>>>> for the breakpoint to be hit (and not for
>>>>> '<prompt=after-cont-command>').
>>>>> Jerry and I did find a few jdb based tests looking for the wrong
>>>>> prompt...
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 6/2/18 7:02 AM, gary.adams at oracle.com wrote:
>>>>>> On 6/1/18 6:54 PM, Chris Plummer wrote:
>>>>>>> Hi Gary,
>>>>>>>
>>>>>>> I'm a bit unclear on one point. According to your description,
>>>>>>> should we be seeing two prompts for commands like "cont", and
>>>>>>> "step"? When you say we are expecting:
>>>>>>>
>>>>>>> "Breakpoint hit:" <location>
>>>>>>> <prompt>
>>>>>>>
>>>>>>> Isn't it really:
>>>>>>>
>>>>>>> <prompt>
>>>>>>> "Breakpoint hit:" <location>
>>>>>>> <prompt>
>>>>>> Let me stretch out the log a little bit
>>>>>> The <prompt> is main[1] or Thread-1[1], etc.
>>>>>> and not distinguished for command or event completion.
>>>>>>
>>>>>> <prompt>
>>>>>> stop at <location1>
>>>>>> <prompt-after-stop-command>
>>>>>> stop at <location2>
>>>>>> <prompt-after-stop-command>
>>>>>> cont
>>>>>> <prompt=after-cont-command>
>>>>>> cont
>>>>>> "Breakpoint hit:" <location1>
>>>>>> <prompt-after-cont-command>
>>>>>> "Breakpoint hit:"
>>>>>> <prompt-after-breakpoint-event-location1>
>>>>>> <location2>
>>>>>> <prompt-after-breakpoint-event-location2>
>>>>>>
>>>>>> The source of the command or event prompts
>>>>>> was verified with a stacktrace in the printPrompt()
>>>>>> method.
>>>>>>>
>>>>>>> I figure this must be the case, because your fix removes the
>>>>>>> printing of a prompt, and we still need one prompt, so there
>>>>>>> must have been two to start with. I think in the above the last
>>>>>>> <prompt> is being produced sequentially right after the
>>>>>>> "Breakpoint hit:" output, so is never an issue. The first
>>>>>>> <prompt> comes from the printPrompt() call after issuing the
>>>>>>> "cont" command, and it's possible to hit the breakpoint and
>>>>>>> start producing more output before printPrompt() has completed,
>>>>>>> thus causing the problem you are seeing.
>>>>>> printPrompt() is a very quick operation and is not being
>>>>>> interrupted.
>>>>>>
>>>>>> But the output from the event handler is produced incrementally.
>>>>>> - event header, e.g. "Breakpont hit:", "Step completed:", etc.
>>>>>> - from vmInterrupted()
>>>>>> - printCurrentLocation()
>>>>>> - monitor command output (if any)
>>>>>> - printPrompt()
>>>>>>>
>>>>>>> There seems to be some purpose to printing this first prompt and
>>>>>>> I'm not so sure eliminating it is the right thing to do. I think
>>>>>>> it might be there to indicate that the command has been
>>>>>>> initiated, because there could be some delay before the
>>>>>>> breakpoint is hit. So the issue then is how to make sure this
>>>>>>> prompt appears in the input stream before any of the "Step
>>>>>>> completed:" or "Breakpoint hit:" output. Possibly event
>>>>>>> processing should be blocked until executeCommand() completes.
>>>>>> The primary purposes of prompts is for "user interactive" nudge
>>>>>> to type another command.
>>>>>> But the failures are from tests that are using the presence of a
>>>>>> prompt as an ACK of the
>>>>>> last command. This "prompt-protocol", does not include any
>>>>>> sequence numbers ,
>>>>>> does not pair prompt to command or event, and does not block during
>>>>>> any multipart message formation.
>>>>>>
>>>>>> To prevent the error we are observing we would have to buffer all
>>>>>> of the event output
>>>>>>
>>>>>> "Breakpoint hit:" <location> <monitored-commands>
>>>>>> <prompt for breakpoint event>
>>>>>>
>>>>>> Another way to look at the current failure is to say the event
>>>>>> handler output
>>>>>> has been interrupted by a command sent prematurely. This would never
>>>>>> be seen at user speeds typing in commands.
>>>>>>
>>>>>> I should also mention any other output from the user application
>>>>>> can also
>>>>>> interfere with the command and event output streams. I have seen
>>>>>> stray prompts in the logs from System.out.println(). That was the
>>>>>> reason earlier I was looking at changing the tests to look for
>>>>>> specific prompts before continuing to the next command.
>>>>>>
>>>>>> \gra
>>>>>>> thanks,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On 5/26/18 3:50 AM, gary.adams at oracle.com wrote:
>>>>>>>> This is a review request for a previously closed test bug.
>>>>>>>> The test was recently moved to the open repos, and the
>>>>>>>> proposed fix is in the open code.
>>>>>>>>
>>>>>>>> Webrev: http://cr.openjdk.java.net/~gadams/8169718/webrev/
>>>>>>>>
>>>>>>>>
>>>>>>>> -------- Forwarded Message --------
>>>>>>>> Subject: RFR: JDK-8169718: nsk/jdb/locals/locals002: ERROR:
>>>>>>>> Cannot find boolVar with expected value: false
>>>>>>>> Date: Fri, 25 May 2018 11:35:10 -0400
>>>>>>>> From: Gary Adams <gary.adams at oracle.com>
>>>>>>>> Reply-To: gary.adams at oracle.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The jdb tests use stdin to send commands to a jdb process
>>>>>>>> and parses the stdout to determine if a command was
>>>>>>>> successful and when the process is prompting for new commands
>>>>>>>> to be sent.
>>>>>>>>
>>>>>>>> Some commands are synchronous, so when the command is completed
>>>>>>>> a new prompt is sent back immediately.
>>>>>>>>
>>>>>>>> Some commands are asynchronous, so there could be a delay
>>>>>>>> until a breakpoint is reached. The event handler then sends a
>>>>>>>> prompt
>>>>>>>> when the application thread is stopped and new jdb commands can
>>>>>>>> be sent.
>>>>>>>>
>>>>>>>> The problem causing the intermittent failures was a corruption
>>>>>>>> in the
>>>>>>>> output stream when prompts were being sent at the wrong times.
>>>>>>>>
>>>>>>>> Instead of receiving
>>>>>>>> "Breakpoint hit:" <location>
>>>>>>>> <prompt>
>>>>>>>>
>>>>>>>> the log contained
>>>>>>>> "Breakpoint hit:" <prompt> <location>
>>>>>>>>
>>>>>>>> Once out of sync, jdb commands were being sent prematurely
>>>>>>>> and the wrong values were being compared against expected
>>>>>>>> behavior.
>>>>>>>> The simple fix proposed here recognizes that commands like "cont",
>>>>>>>> "step" and "next" are asynchronous commands and should not send
>>>>>>>> back
>>>>>>>> a prompt immediately. Instead. the event handler will deliver
>>>>>>>> the next prompt
>>>>>>>> when the next "Breakpoint hit:" or "Step completed:" state
>>>>>>>> change occurs.
>>>>>>>>
>>>>>>>> The bulk of the testing was done on windows-x64-debug builds
>>>>>>>> where the
>>>>>>>> intermittent failures were observed in ~5 in 1000 testruns. The
>>>>>>>> fix has
>>>>>>>> also been tested on linux-x64-debug, solaris-sparcv9-debug,
>>>>>>>> and macosx-x64-debug, even though the failures have never been
>>>>>>>> reported
>>>>>>>> against those platforms.
>>>>>>>>
>>>>>>>> Failures have been observed in many of the nsk/jdb tests with
>>>>>>>> similar corrupted
>>>>>>>> output streams, but never directly associated with this issue
>>>>>>>> before.
>>>>>>>>
>>>>>>>> redefine001, caught_exception002, locals002, eval001, next001,
>>>>>>>> stop_at003, step002, print002, trace001, step_up001, read001,
>>>>>>>> clear004, kill001, set001
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20180606/e1d1f290/attachment-0001.html>
More information about the serviceability-dev
mailing list