RFR: JDK-8169718: nsk/jdb/locals/locals002: ERROR: Cannot find boolVar with expected value: false

Wed Jun 6 18:52:27 UTC 2018

On 6/6/18, 2:20 PM, Chris Plummer wrote:
> If that were the case, shouldn't the delay be added to sendCommand() 
> instead?
>
> I'm still a bit unsure about how this sequence is suppose to work in 
> the first place:
>
>          sendCommand(command);
>          return receiveReply(startPos, compoundPromptOnly, count);
Send command "cont"
Receive the reply "main[1]"

At this point the delay allows the event handler to run in the debuggee 
process.
>
> I guess the assumption is that the first prompt will always be a 
> simple one? Is that the case? If so, then the expectation is that we 
> will see:
>
> main[1] cont
> >
> "Breakpoint hit:" <location>
> main[1]
>
> But then we can still see, even after your fix:
>
> main[1] cont
> "Breakpoint hit:" > <location>
> main[1]

I'm not clear on what you are saying in this example.
>
> However, I think maybe the issue is different for step commands, which 
> seem to result in two compound prompts. Remember your example:
>
>> Here's another recent failure with eval001
>>  1. cont and Break hit
>>  2. step and Step completed (partial reply)
>>  3. eval and error with second half of Step completed
>>
>> *====================================*
>> Sending command: cont
>>
>> reply[0]: >
>> reply[1]:
>> Breakpoint hit: "thread=main", 
>> nsk.jdb.eval.eval001.eval001a.lastBreak(), line=23 bci=0
>> reply[2]:
>> 23        static void lastBreak () {}
>> reply[3]:
>>
>> reply[4]: *
>> ====================================
>>
>> *main[1]
>> Sending command: step
>>
>> reply[0]:
>> reply[1]:
>> Step completed: main[1]
>> *====================================*
>> Sending command: eval nsk.jdb.eval.eval001.eval001a.myStaticField
>> *reply[0]: "thread=main", nsk.jdb.eval.eval001.eval001a.runIt(), 
>> line=36 bci=53*
>> reply[1]:
>> 36            }
>> reply[2]:
>>
>> reply[3]:
>>
>> *====================================*
>> main[1]
>> # ERROR: jdb failed to report value of expression: 
>> nsk.jdb.eval.eval001.eval001a.myStaticField
>> <snip>
>> # ERROR: expected : -2147483648 ;
>> *# ERROR: reported: "thread=main", 
>> nsk.jdb.eval.eval001.eval001a.runIt(), line=36 bci=53*
>>
>>
>> Finally the sequence from the jdb.session file. Notice the main[1] prompt
>> embedded in the Step completed reply.
>>
>> *main[1]* cont
>> >
>> Breakpoint hit: "thread=main", 
>> nsk.jdb.eval.eval001.eval001a.lastBreak(), line=23 bci=0
>> 23        static void lastBreak () {}
>>
>> *main[1] *step
>>
>> Step completed: *main[1]* "thread=main", 
>> nsk.jdb.eval.eval001.eval001a.runIt(), line=36 bci=53
>> 36            }
>>
>> *main[1]* eval nsk.jdb.eval.eval001.eval001a.myStaticField
> The part that uses "cont" is fine. The problems started when the step 
> command was issued. Rather than receiving:
>
> Step completed: "thread=main", nsk.jdb.eval.eval001.eval001a.runIt(), 
> line=36 bci=53
>
> We got:
>
> Step completed: main[1]
> *====================================*
> Sending command: eval nsk.jdb.eval.eval001.eval001a.myStaticField
> reply[0]: "thread=main", nsk.jdb.eval.eval001.eval001a.runIt(), 
> line=36 bci=53
>
> I don't see your change addressing this. The step command was 
> considered to be done as soon as the main[] prompt arrived, even 
> though it came right in the middle of the full "Step completed: ..." 
> message. Your delay won't change this. Yes, it will make sure the rest 
> of the reply has arrived, but the part after main[1] will not be 
> considered part of the step reply because it will terminate at the 
> main[1] prompt.
With the proposed change to delay after receiving a reply after command, 
we would see


*====================================*
Sending command: cont

reply[0]: >
*==================================== delay after command*
Breakpoint hit: "thread=main", 
nsk.jdb.eval.eval001.eval001a.lastBreak(), line=23 bci=0
23        static void lastBreak () {}*

*main[1]
*====================================*
Sending command: step

main[1]
*==================================== delay after command*
Step completed: *"thread=main", nsk.jdb.eval.eval001.eval001a.runIt(), 
line=36 bci=53*
36            }

*====================================*
Sending command: eval nsk.jdb.eval.eval001.eval001a.myStaticField

main[1]
*==================================== delay after command*

...

I'll try some testing over night.
>
> Chris
>
> On 6/6/18 10:44 AM, Gary Adams wrote:
>> On 6/6/18, 1:26 PM, Chris Plummer wrote:
>>> Hi Gary,
>>>
>>> I thought the issue was having 1 command like "cont" or "step" 
>>> producing two prompts, and the first prompt ending up in the middle 
>>> of the output for the breakpoint. So instead of:
>>>
>>> main[1] cont
>>> main[1]
>>> "Breakpoint hit:" <location>
>>> main[1]
>>>
>>> We sometimes get:
>>>
>>> main[1] cont
>>> "Breakpoint hit:" main[1] <location>
>>> main[1]
>> If you look at the commands sent :
>>
>> stop at location1
>> stop at location2
>> cont
>> cont
>>
>> the expectation is that breakpoint will be encountered and then
>> the test will continue til another breakpoint is encountered.
>>
>> The problem only occurs because the commands are sent
>> so quickly a race condition is introduced between the
>> command processing and the event handling in the debuggee
>> process. Since prompts are written both from the command
>> thread and the event handler, the output can end up interspersed.
>>
>>>
>>> I don't think a delay on the receiving end like you have added can 
>>> effect this. You need to make sure the main[1] produced for the 
>>> "cont" command is in the output stream before handling the 
>>> breakpoint event. Dan's solution was a bit different in that it 
>>> actually delayed reaching the breakpoint. I don't think your 
>>> solution does the same.
>> My goal is to delay the sending of the next command,
>> because the debuggee is not actually ready for the command.
>>>
>>> Chris
>>>
>>> On 6/6/18 10:07 AM, Gary Adams wrote:
>>>> It may be sufficient when the command prompt is delivered
>>>> to introduce a brief pause to allow an asynchronous event to
>>>> be processed before another command can be sent. e.g.
>>>> throttling the stream of commands being sent
>>>>
>>>> diff --git a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java 
>>>> b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>> --- a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>> +++ b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
>>>> @@ -348,7 +348,17 @@
>>>>
>>>>          int startPos = stdoutBuffer.length();
>>>>          sendCommand(command);
>>>> -        return receiveReply(startPos, compoundPromptOnly, count);
>>>> +        String[] reply = receiveReply(startPos, 
>>>> compoundPromptOnly, count);
>>>> +
>>>> +        if (asyncCommand(command)) {
>>>> +            // delay to allow debuggee time to process async command
>>>> +            try {
>>>> +                Thread.sleep(100);
>>>> +            } catch (InterruptedException e) {
>>>> +                // ignored
>>>> +            }
>>>> +        }
>>>> +        return reply;
>>>>      }
>>>>
>>>>
>>>> On 6/6/18, 11:51 AM, Daniel D. Daugherty wrote:
>>>>> The two prompts are for two different things so I don't think
>>>>> that removing one of the prompts is the right solution either.
>>>>>
>>>>> In the case of an asynchronous command like 'cont', you get
>>>>> the '<prompt=after-cont-command>' right away because the command
>>>>> processor is ready for another command. You will also get the
>>>>> '<prompt-after-breakpoint-event-location1>' prompt when the
>>>>> breakpoint is hit because that identifies the thread that posted
>>>>> the breakpoint event. If it takes a long time to hit the breakpoint,
>>>>> then it will be a long time before the breakpoint prompt is seen.
>>>>> Likewise, if the breakpoint is hit quickly, then the breakpoint
>>>>> prompt will be seen more quickly and will sometimes interfere with
>>>>> the command processor prompt. That's normal for an async command.
>>>>>
>>>>> One possible way to fix this kind of issue is to structure the
>>>>> test flow to slightly delay the target thread so it takes longer
>>>>> to get to the breakpoint. '<prompt=after-cont-command>' will have
>>>>> a better chance to be printed before (and not colliding with) the
>>>>> '<prompt-after-breakpoint-event-location1>' prompt.
>>>>>
>>>>> So if the test program has:
>>>>>
>>>>> <some line of code>;  // breakpoint here
>>>>>
>>>>> change it to something like:
>>>>>
>>>>>     try { Thread.sleep(100); } catch (InterruptedException ie) {};
>>>>> <some line of code>;  // breakpoint here
>>>>>
>>>>> Of course, I'm assuming here that the test driver code is looking
>>>>> for the breakpoint to be hit (and not for 
>>>>> '<prompt=after-cont-command>').
>>>>> Jerry and I did find a few jdb based tests looking for the wrong 
>>>>> prompt...
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On 6/2/18 7:02 AM, gary.adams at oracle.com wrote:
>>>>>> On 6/1/18 6:54 PM, Chris Plummer wrote:
>>>>>>> Hi Gary,
>>>>>>>
>>>>>>> I'm a bit unclear on one point. According to your description, 
>>>>>>> should we be seeing two prompts for commands like "cont", and 
>>>>>>> "step"? When you say we are expecting:
>>>>>>>
>>>>>>>   "Breakpoint hit:" <location>
>>>>>>> <prompt>
>>>>>>>
>>>>>>> Isn't it really:
>>>>>>>
>>>>>>> <prompt>
>>>>>>>   "Breakpoint hit:" <location>
>>>>>>> <prompt>
>>>>>> Let me stretch out the log a little bit
>>>>>> The <prompt> is main[1] or Thread-1[1], etc.
>>>>>> and not distinguished for command or event completion.
>>>>>>
>>>>>> <prompt>
>>>>>>    stop at <location1>
>>>>>> <prompt-after-stop-command>
>>>>>>    stop at <location2>
>>>>>> <prompt-after-stop-command>
>>>>>>    cont
>>>>>> <prompt=after-cont-command>
>>>>>>    cont
>>>>>>    "Breakpoint hit:" <location1>
>>>>>> <prompt-after-cont-command>
>>>>>>   "Breakpoint hit:"
>>>>>> <prompt-after-breakpoint-event-location1>
>>>>>> <location2>
>>>>>> <prompt-after-breakpoint-event-location2>
>>>>>>
>>>>>> The source of the command or event prompts
>>>>>> was verified with a stacktrace in the printPrompt()
>>>>>> method.
>>>>>>>
>>>>>>> I figure this must be the case, because your fix removes the 
>>>>>>> printing of a prompt, and we still need one prompt, so there 
>>>>>>> must have been two to start with. I think in the above the last 
>>>>>>> <prompt> is being produced sequentially right after the 
>>>>>>> "Breakpoint hit:" output, so is never an issue. The first 
>>>>>>> <prompt> comes from the printPrompt() call after issuing the 
>>>>>>> "cont" command, and it's possible to hit the breakpoint and 
>>>>>>> start producing more output before printPrompt() has completed, 
>>>>>>> thus causing the problem you are seeing.
>>>>>> printPrompt() is a very quick operation and is not being 
>>>>>> interrupted.
>>>>>>
>>>>>> But the output from the event handler is produced incrementally.
>>>>>>    - event header, e.g. "Breakpont hit:", "Step completed:", etc.
>>>>>>    - from vmInterrupted()
>>>>>>      - printCurrentLocation()
>>>>>>      - monitor command output (if any)
>>>>>>      - printPrompt()
>>>>>>>
>>>>>>> There seems to be some purpose to printing this first prompt and 
>>>>>>> I'm not so sure eliminating it is the right thing to do. I think 
>>>>>>> it might be there to indicate that the command has been 
>>>>>>> initiated, because there could be some delay before the 
>>>>>>> breakpoint is hit. So the issue then is how to make sure this 
>>>>>>> prompt appears in the input stream before any of the "Step 
>>>>>>> completed:" or "Breakpoint hit:" output. Possibly event 
>>>>>>> processing should be blocked until executeCommand() completes.
>>>>>> The primary purposes of prompts is for "user interactive" nudge 
>>>>>> to type another command.
>>>>>> But the failures are from tests that are using the presence of a 
>>>>>> prompt as an ACK of the
>>>>>> last command. This "prompt-protocol", does not include any 
>>>>>> sequence numbers ,
>>>>>> does not pair prompt to command or event, and does not block during
>>>>>> any multipart message formation.
>>>>>>
>>>>>> To prevent the error we are observing we would have to buffer all 
>>>>>> of the event output
>>>>>>
>>>>>>    "Breakpoint hit:" <location> <monitored-commands>
>>>>>> <prompt for breakpoint event>
>>>>>>
>>>>>> Another way to look at the current failure is to say the event 
>>>>>> handler output
>>>>>> has been interrupted by a command sent prematurely. This would never
>>>>>> be seen at user speeds typing in commands.
>>>>>>
>>>>>> I should also mention any other output from the user application 
>>>>>> can also
>>>>>> interfere with the command and event output streams. I have seen
>>>>>> stray prompts in the logs from System.out.println(). That was the
>>>>>> reason earlier I was looking at changing the tests to look for
>>>>>> specific prompts before continuing to the next command.
>>>>>>
>>>>>> \gra
>>>>>>> thanks,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On 5/26/18 3:50 AM, gary.adams at oracle.com wrote:
>>>>>>>> This is a review request for a previously closed test bug.
>>>>>>>> The test was recently moved to the open repos, and the
>>>>>>>> proposed fix is in the open code.
>>>>>>>>
>>>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8169718/webrev/
>>>>>>>>
>>>>>>>>
>>>>>>>> -------- Forwarded Message --------
>>>>>>>> Subject:     RFR: JDK-8169718: nsk/jdb/locals/locals002: ERROR: 
>>>>>>>> Cannot find boolVar with expected value: false
>>>>>>>> Date:     Fri, 25 May 2018 11:35:10 -0400
>>>>>>>> From:     Gary Adams <gary.adams at oracle.com>
>>>>>>>> Reply-To: gary.adams at oracle.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The jdb tests use stdin to send commands to a jdb process
>>>>>>>> and parses the stdout to determine if a command was
>>>>>>>> successful and when the process is prompting for new commands
>>>>>>>> to be sent.
>>>>>>>>
>>>>>>>> Some commands are synchronous, so when the command is completed
>>>>>>>> a new prompt is sent back immediately.
>>>>>>>>
>>>>>>>> Some commands are asynchronous, so there could be a delay
>>>>>>>> until a breakpoint is reached. The event handler then sends a 
>>>>>>>> prompt
>>>>>>>> when the application thread is stopped and new jdb commands can 
>>>>>>>> be sent.
>>>>>>>>
>>>>>>>> The problem causing the intermittent failures was a corruption 
>>>>>>>> in the
>>>>>>>> output stream when prompts were being sent at the wrong times.
>>>>>>>>
>>>>>>>> Instead of receiving
>>>>>>>>    "Breakpoint hit:" <location>
>>>>>>>> <prompt>
>>>>>>>>
>>>>>>>> the log contained
>>>>>>>>    "Breakpoint hit:" <prompt> <location>
>>>>>>>>
>>>>>>>> Once out of sync, jdb commands were being sent prematurely
>>>>>>>> and the wrong values were being compared against expected 
>>>>>>>> behavior.
>>>>>>>> The simple fix proposed here recognizes that commands like "cont",
>>>>>>>> "step" and "next" are asynchronous commands and should not send 
>>>>>>>> back
>>>>>>>> a prompt immediately. Instead. the event handler will deliver 
>>>>>>>> the next prompt
>>>>>>>> when the next "Breakpoint hit:" or "Step completed:" state 
>>>>>>>> change occurs.
>>>>>>>>
>>>>>>>> The bulk of the testing was done on windows-x64-debug builds 
>>>>>>>> where the
>>>>>>>> intermittent failures were observed in ~5 in 1000 testruns. The 
>>>>>>>> fix has
>>>>>>>> also been tested on linux-x64-debug, solaris-sparcv9-debug,
>>>>>>>> and macosx-x64-debug, even though the failures have never been 
>>>>>>>> reported
>>>>>>>> against those platforms.
>>>>>>>>
>>>>>>>> Failures have been observed in many of the nsk/jdb tests with 
>>>>>>>> similar corrupted
>>>>>>>> output streams, but never directly associated with this issue 
>>>>>>>> before.
>>>>>>>>
>>>>>>>>   redefine001, caught_exception002, locals002, eval001, next001,
>>>>>>>>   stop_at003, step002, print002, trace001, step_up001, read001,
>>>>>>>>   clear004, kill001, set001
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20180606/e1d1f290/attachment-0001.html>