RFR: JDK-8169718: nsk/jdb/locals/locals002: ERROR: Cannot find boolVar with expected value: false

Wed Jun 6 17:06:35 UTC 2018

On 6/6/18 8:51 AM, Daniel D. Daugherty wrote:
> The two prompts are for two different things so I don't think
> that removing one of the prompts is the right solution either.
>
> In the case of an asynchronous command like 'cont', you get
> the '<prompt=after-cont-command>' right away because the command
> processor is ready for another command. You will also get the
> '<prompt-after-breakpoint-event-location1>' prompt when the
> breakpoint is hit because that identifies the thread that posted
> the breakpoint event. If it takes a long time to hit the breakpoint,
> then it will be a long time before the breakpoint prompt is seen.
> Likewise, if the breakpoint is hit quickly, then the breakpoint
> prompt will be seen more quickly and will sometimes interfere with
> the command processor prompt. That's normal for an async command.
>
That was the point I was trying to make. I also felt that printing both 
prompts was appropriate.
> One possible way to fix this kind of issue is to structure the
> test flow to slightly delay the target thread so it takes longer
> to get to the breakpoint. '<prompt=after-cont-command>' will have
> a better chance to be printed before (and not colliding with) the
> '<prompt-after-breakpoint-event-location1>' prompt.
>
> So if the test program has:
>
>     <some line of code>;  // breakpoint here
>
> change it to something like:
>
>     try { Thread.sleep(100); } catch (InterruptedException ie) {};
>     <some line of code>;  // breakpoint here
>
> Of course, I'm assuming here that the test driver code is looking
> for the breakpoint to be hit (and not for '<prompt=after-cont-command>').
> Jerry and I did find a few jdb based tests looking for the wrong 
> prompt...
The issue is really with the delivery of the first prompt. It can be 
intermingled with the output from the breakpoint. Yes, your solution 
might work most of the time (still not 100% guaranteed) when dealing 
with breakpoints, but is not a solution when dealing with this same 
problem occurring while single stepping.

My suggestion was to have processing of breakpoints and single stepping 
events block until executeCommand() completed and had printed the first 
prompt. I'm not sure if this is feasible or practical.

thanks,

Chris
>
> Dan
>
>
> On 6/2/18 7:02 AM, gary.adams at oracle.com wrote:
>> On 6/1/18 6:54 PM, Chris Plummer wrote:
>>> Hi Gary,
>>>
>>> I'm a bit unclear on one point. According to your description, 
>>> should we be seeing two prompts for commands like "cont", and 
>>> "step"? When you say we are expecting:
>>>
>>>   "Breakpoint hit:" <location>
>>>    <prompt>
>>>
>>> Isn't it really:
>>>
>>>    <prompt>
>>>   "Breakpoint hit:" <location>
>>>    <prompt>
>> Let me stretch out the log a little bit
>> The <prompt> is main[1] or Thread-1[1], etc.
>> and not distinguished for command or event completion.
>>
>> <prompt>
>>    stop at <location1>
>> <prompt-after-stop-command>
>>    stop at <location2>
>> <prompt-after-stop-command>
>>    cont
>> <prompt=after-cont-command>
>>    cont
>>    "Breakpoint hit:" <location1>
>> <prompt-after-cont-command>
>>   "Breakpoint hit:"
>> <prompt-after-breakpoint-event-location1>
>>   <location2>
>> <prompt-after-breakpoint-event-location2>
>>
>> The source of the command or event prompts
>> was verified with a stacktrace in the printPrompt()
>> method.
>>>
>>> I figure this must be the case, because your fix removes the 
>>> printing of a prompt, and we still need one prompt, so there must 
>>> have been two to start with. I think in the above the last <prompt> 
>>> is being produced sequentially right after the "Breakpoint hit:" 
>>> output, so is never an issue. The first <prompt> comes from the 
>>> printPrompt() call after issuing the "cont" command, and it's 
>>> possible to hit the breakpoint and start producing more output 
>>> before printPrompt() has completed, thus causing the problem you are 
>>> seeing.
>> printPrompt() is a very quick operation and is not being interrupted.
>>
>> But the output from the event handler is produced incrementally.
>>    - event header, e.g. "Breakpont hit:", "Step completed:", etc.
>>    - from vmInterrupted()
>>      - printCurrentLocation()
>>      - monitor command output (if any)
>>      - printPrompt()
>>>
>>> There seems to be some purpose to printing this first prompt and I'm 
>>> not so sure eliminating it is the right thing to do. I think it 
>>> might be there to indicate that the command has been initiated, 
>>> because there could be some delay before the breakpoint is hit. So 
>>> the issue then is how to make sure this prompt appears in the input 
>>> stream before any of the "Step completed:" or "Breakpoint hit:" 
>>> output. Possibly event processing should be blocked until 
>>> executeCommand() completes.
>> The primary purposes of prompts is for "user interactive" nudge to 
>> type another command.
>> But the failures are from tests that are using the presence of a 
>> prompt as an ACK of the
>> last command. This "prompt-protocol", does not include any sequence 
>> numbers ,
>> does not pair prompt to command or event, and does not block during
>> any multipart message formation.
>>
>> To prevent the error we are observing we would have to buffer all of 
>> the event output
>>
>>    "Breakpoint hit:" <location> <monitored-commands>
>>     <prompt for breakpoint event>
>>
>> Another way to look at the current failure is to say the event 
>> handler output
>> has been interrupted by a command sent prematurely. This would never
>> be seen at user speeds typing in commands.
>>
>> I should also mention any other output from the user application can 
>> also
>> interfere with the command and event output streams. I have seen
>> stray prompts in the logs from System.out.println(). That was the
>> reason earlier I was looking at changing the tests to look for
>> specific prompts before continuing to the next command.
>>
>> \gra
>>> thanks,
>>>
>>> Chris
>>>
>>> On 5/26/18 3:50 AM, gary.adams at oracle.com wrote:
>>>> This is a review request for a previously closed test bug.
>>>> The test was recently moved to the open repos, and the
>>>> proposed fix is in the open code.
>>>>
>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8169718/webrev/
>>>>
>>>>
>>>> -------- Forwarded Message --------
>>>> Subject:     RFR: JDK-8169718: nsk/jdb/locals/locals002: ERROR: 
>>>> Cannot find boolVar with expected value: false
>>>> Date:     Fri, 25 May 2018 11:35:10 -0400
>>>> From:     Gary Adams <gary.adams at oracle.com>
>>>> Reply-To:     gary.adams at oracle.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> The jdb tests use stdin to send commands to a jdb process
>>>> and parses the stdout to determine if a command was
>>>> successful and when the process is prompting for new commands
>>>> to be sent.
>>>>
>>>> Some commands are synchronous, so when the command is completed
>>>> a new prompt is sent back immediately.
>>>>
>>>> Some commands are asynchronous, so there could be a delay
>>>> until a breakpoint is reached. The event handler then sends a prompt
>>>> when the application thread is stopped and new jdb commands can be 
>>>> sent.
>>>>
>>>> The problem causing the intermittent failures was a corruption in the
>>>> output stream when prompts were being sent at the wrong times.
>>>>
>>>> Instead of receiving
>>>>    "Breakpoint hit:" <location>
>>>>     <prompt>
>>>>
>>>> the log contained
>>>>    "Breakpoint hit:" <prompt> <location>
>>>>
>>>> Once out of sync, jdb commands were being sent prematurely
>>>> and the wrong values were being compared against expected behavior.
>>>> The simple fix proposed here recognizes that commands like "cont",
>>>> "step" and "next" are asynchronous commands and should not send back
>>>> a prompt immediately. Instead. the event handler will deliver the 
>>>> next prompt
>>>> when the next "Breakpoint hit:" or "Step completed:" state change 
>>>> occurs.
>>>>
>>>> The bulk of the testing was done on windows-x64-debug builds where the
>>>> intermittent failures were observed in ~5 in 1000 testruns. The fix 
>>>> has
>>>> also been tested on linux-x64-debug, solaris-sparcv9-debug,
>>>> and macosx-x64-debug, even though the failures have never been 
>>>> reported
>>>> against those platforms.
>>>>
>>>> Failures have been observed in many of the nsk/jdb tests with 
>>>> similar corrupted
>>>> output streams, but never directly associated with this issue before.
>>>>
>>>>   redefine001, caught_exception002, locals002, eval001, next001,
>>>>   stop_at003, step002, print002, trace001, step_up001, read001,
>>>>   clear004, kill001, set001
>>>>
>>>>
>>>
>>>
>>
>