RFR JDK-8170089: nsk/jdi/EventSet/resume/resume008: ERROR: suspendCounts don't match for : Common-Cleaner
Chris Plummer
chris.plummer at oracle.com
Wed Jul 18 20:47:09 UTC 2018
Hi Gary
Ok, so shouldRunAfterBreakpoint() is the code that does the
eventHandler.wait(), so it gets the eventHandler.notifyAll()
notification from the BreakpointEvent handler.
And as a side note, I see now that resumption of execution after the
breakpoint at main() is done by:
// after waitForClassPrepared() main debuggee thread is
suspended, resume it before test start
display("RESUME DEBUGGEE VM");
vm.resume();
testRun();
shouldRunAfterBreakpoint() is returning true until the end of the test
when the debuggee is executes "instruction = end". That's why runTests()
does a "break" when shouldRunAfterBreakpoint() returns false. So this
means the code that is checking shouldRunAfterBreakpoint() is not
resuming execution for the first few (probably 3)
methodForCommunication() breakpoints. However, it does make sure that
runTests() blocks until the BreakPointEvent has been processed.
You point out the vm.resume() at the bottom of the loop in runTests(),
but that's only after a bunch of ThreadStartEvent processing above it
has been done already. The ThreadStartEvent would never get generated if
there was not a resume some point earlier. I think it is happening
during the eventHandler.waitForRequestedEventSet() call, which does a
vm.resume().
So if I understand the order of things now:
-shouldRunAfterBreakpoint() returns after first methodForCommunication()
is hit. At this point we know the first thread has been created, but no
attempt to start it yet. The debuggee is suspended at this point.
-runTests() requests ThreadStartEvents with SUSPEND_NONE. This also does
a vm.resume().
-The debuggee starts the thread and then does another
methodForCommunication() (this 2nd one is actually after the 2nd thread
has been created, but not yet started). Now we have a race. Do we get
the ThreadStartEvent first or the BreakpointEvent. This is because when
the ThreadStartEvent is generated, the thread is not suspended due to
SUSPEND_NONE. Even if the ThreadStartEvent comes in first, the async
handling of the BreakpointEvent can cause problems during the
ThreadStartEvent processing.
-You added a 100ms delay after the thread has started, but before
methodForCommunication(), hoping it will make it so the ThreadStartEvent
can be received and fully processed before the BreakpointEvent is.
I think it would be preferable to fix this by doing better
sychronization. After all, that is the approach the test originally
took. It could have been written with a bunch of sleep() delays instead,
but that in general is not a very good approach.
What if you added a shouldRunAfterBreakpoint() call after getting the
ThreadStartEvent arrives. At this point you would know that the vm is
suspended due to the breakpoint, so no need for:
display("......checking up on EventSet.resume()");
display("......--> vm.suspend();");
vm.suspend();
You might then also need to add another methodForCommunication() call at
the end of case 0 and 1 in the debuggee, although I think you could
instead just change the shouldRunAfterBreakpoint() at the start of the
loop. I think that check actually belongs at the end of the loop, and
only for case 2. In fact it would be an error if
shouldRunAfterBreakpoint() did not return true in that case. Then you
also need to add a shouldRunAfterBreakpoint() at the start of case 0 to
get things rolling (and I think at the start of case 1 also).
Chris
On 7/18/18 12:45 PM, Gary Adams wrote:
> Answers below ...
>
> On 7/18/18, 2:50 PM, Chris Plummer wrote:
>> Hi Gary,
>>
>> Who does the resume for the breakpoint event?
>>
>> eventHandler.addListener(
>> new EventHandler.EventListener() {
>> public boolean eventReceived(Event event) {
>> if (event instanceof BreakpointEvent &&
>> bpRequest.equals(event.request())) {
>> synchronized(eventHandler) {
>> display("Received communication
>> breakpoint event.");
>> bpCount++;
>> eventHandler.notifyAll();
>> }
>> return true;
>> }
>> return false;
>> }
>> }
>> );
> I believe you are looking for this sequence.
> At the top of the loop a check is made if
> resume() should be called "shouldRunAfterBreakpoint".
> lines 96-99 is an early termination. And at the
> bottom of the loop, line 240, is the normal
> continue the test to the next case.
>
> resume008.java :
> ...
> 94 for (int i = 0; ; i++) {
> 95
>
> 96 if (!shouldRunAfterBreakpoint()) {
> 97 vm.resume();
> 98 break;
> 99 }
>
> 100
> 101
> 102 display(":::::: case: # " + i);
> 103
> 104 switch (i) {
> 105
> 106 case 0:
> 107 eventRequest = settingThreadStartRequest (
> 108 SUSPEND_NONE,
> "ThreadStartRequest1");
> ...
> 238
> 239 display("......--> vm.resume()");
> 240 vm.resume();
> 241 }
>>
>> Also:
>>
>>> 1. On a thread start event the debugee is suspended, line 141
>> That's not true for the first ThreadStartEvent since SUSPEND_NONE was
>> used.
> The thread start event is set to SUSPEND_NONE for thread0, but when
> the thread start event is observed the resume008 test suspends the vm
> immediately after fetching the "number" property.
My point is that the Debuggee continues to run after the
ThreadStartEvent is sent, and relies on the debugger to stop it after
receiving the event. But in the meantime the debuggee has advanced to
the next breakpoint, but only sometimes, thus the bug you are seeing.
>
> 132 if ( !(newEvent instanceof ThreadStartEvent)) {
> 133 setFailedStatus("ERROR: new event is not
> ThreadStartEvent");
> 134 } else {
> 135
> 136 String property = (String)
> newEvent.request().getProperty("number");
> 137 display(" got new ThreadStartEvent
> with propety 'number' == " + property);
> 138
> 139 display("......checking up on
> EventSet.resume()");
> 140 display("......--> vm.suspend();");
> 141 vm.suspend();
>
>
>>
>> Chris
>>
>> On 7/18/18 4:52 AM, Gary Adams wrote:
>>> There is nothing wrong with the breakpoint in methodForCommunication.
>>> The test uses it to make sure the threads are each tested separately.
>>> The breakpoint eventhandler just displays a message, increments a
>>> counter
>>> and returns.
>>>
>>> Let me step through resume008a the debugee to help clarify ...
>>>
>>> 1. The test thread is created and the synchronized break point is
>>> observed. lines 101-102
>>> 2. The thread is started. lines 104,135-137
>>> 2a. The main thread blocks on a local object. lines 133, 139
>>> 2b. The test thread is started. lines 137,
>>> A run entered message is displayed, line 159
>>> The main thread lock object is notified, line 167
>>> 2b1. The main thread continues. line 167, 146
>>> The next test thread is created. line 106
>>> The synchronized breakpoint is observed, line 107
>>> 2b2. A run exited message is displayed, line 169
>>>
>>> On the resume008 debugger side ...
>>> 1. On a thread start event the debugee is suspended, line 141
>>> 2. Messages are displayed and a first set of thread suspend counts
>>> is acquired. lines 143-151
>>> 3. The threads are resumed, line 152
>>> --->
>>> 4. Messages are displayed and a second set of thread suspend
>>> counts is acquired. lines 154-159
>>>
>>> The way the test is written the expectation is the debugger steps
>>> 2,3,4 will all happen
>>> while the test thread is running.
>>>
>>> When the debugger resumes the debuggee threads (debugger step 3)
>>> the debuggee continues from where it left off (debuggee steps
>>> 2b,2b1,2b2)
>>>
>>> If we complete debuggee step 2b1 (line 107) before the debugger
>>> completes step 4 line 159,
>>> then the synchronized breakpoint will suspend the vm and the counts
>>> will not match
>>> for the SUSPEND_NONE test thread start.
>>>
>>> resume008a.java:
>>>
>>> 100 case 0:
>>> 101 thread0 = new
>>> Threadresume008a("thread0");
>>> 102 methodForCommunication();
>>> 103
>>> 104 threadStart(thread0);
>>> 105
>>> 106 thread1 = new
>>> Threadresume008a("thread1");
>>> 107 methodForCommunication();
>>> 108 break;
>>>
>>> ...
>>> 135 static int threadStart(Thread t) {
>>> 136 synchronized (waitnotifyObj) {
>>> 137 t.start();
>>> 138 try {
>>> 139 waitnotifyObj.wait();
>>> 140 } catch ( Exception e) {
>>> 141 exitCode = FAILED;
>>> 142 logErr(" Exception : " + e );
>>> 143 return FAILED;
>>> 144 }
>>> 145 }
>>> 146 return PASSED;
>>> 147 }
>>>
>>> 149 static class Threadresume008a extends Thread {
>>> ...
>>> 157
>>> 158 public void run() {
>>> 159 log1(" 'run': enter :: threadName == " +
>>> tName);
>>>
>>> This is the proposed fix that will let the debugger complete it's
>>> second
>>> acquisition of suspend counts while the test thread is still running.
>>>
>>> 160 // Yield, so the start thread event processing
>>> can be completed.
>>> 161 try {
>>> 162 Thread.sleep(100);
>>> 163 } catch (InterruptedException e) {
>>> 164 // ignored
>>> 165 }
>>>
>>> 166 synchronized (waitnotifyObj) {
>>> 167 waitnotifyObj.notify();
>>> 168 }
>>> 169 log1(" 'run': exit :: threadName == " +
>>> tName);
>>> 170 return;
>>> 171 }
>>> 172 }
>>> 150
>>> 151 String tName = null;
>>> 152
>>> 153 public Threadresume008a(String threadName) {
>>> 154 super(threadName);
>>> 155 tName = threadName;
>>> 156 }
>>> 157
>>> 158 public void run() {
>>> 159 log1(" 'run': enter :: threadName == " +
>>> tName);
>>> 160 // Yield, so the start thread event processing
>>> can be completed.
>>> 161 try {
>>> 162 Thread.sleep(100);
>>> 163 } catch (InterruptedException e) {
>>> 164 // ignored
>>> 165 }
>>> 166 synchronized (waitnotifyObj) {
>>> 167 waitnotifyObj.notify();
>>> 168 }
>>> 169 log1(" 'run': exit :: threadName == " +
>>> tName);
>>> 170 return;
>>> 171 }
>>> 172 }
>>>
>>>
>>>
>>> On 7/18/18, 2:38 AM, Chris Plummer wrote:
>>>> Hi Gary,
>>>>
>>>> I've been having trouble following the control flow of this test.
>>>> One thing I've stumbled across is the following:
>>>>
>>>> /* A debuggee class must define 'methodForCommunication'
>>>> * method and invoke it in points of synchronization
>>>> * with a debugger.
>>>> */
>>>> setCommunicationBreakpoint(debuggeeClass,"methodForCommunication");
>>>>
>>>> So why isn't this mode of synchronization good enough? Is it
>>>> because it was not designed with the understanding that the
>>>> debugger might be doing suspended thread counts, and suspending all
>>>> threads at the breakpoint messes up the test?
>>>>
>>>> From what I can tell of the test, after the debuggee is started and
>>>> hits the default breakpoint at the start of main(), the debugger
>>>> then does a vm.resume() at the start of the for loop in the
>>>> runTest() method. The debuggee then creates a thread and calls
>>>> methodForCommunication(). There is already a breakpoint set there
>>>> by the above debuggee code. It's unclear to me what happens as a
>>>> result of this breakpoint and how it serves the test. Also unclear
>>>> to me who is responsible for the vm.resume() after the breakpoint
>>>> is hit.
>>>>
>>>> The debugger then requests all ThreadStart events, requesting that
>>>> no threads be disabled when it is sent. I think you are saying that
>>>> when the ThreadStart event comes in, sometimes we are at the
>>>> methodForCommunication breakpoint, with all threads disabled, and
>>>> this messes up the thread suspend counts. You want to delay 100ms
>>>> so the breakpoint event can be processed and threads resumed again
>>>> (although I can't see who actually resumes the thread after hitting
>>>> the methodForCommunication breakpoint).
>>>>
>>>> Chris
>>>>
>>>> On 7/17/18 8:33 AM, Gary Adams wrote:
>>>>> A race condition exists between the debugger and the debuggee.
>>>>>
>>>>> The first test thread is started with SUSPEND_NONE policy set.
>>>>> While processing the thread start event the debugger captures
>>>>> an initial set of thread suspend counts and resumes the
>>>>> debuggee vm. If the debuggee advances quickly it reaches
>>>>> the breakpoint set for methodForCommunication. Since the breakpoint
>>>>> carries with it SUSPEND_ALL policy, when the debugger captures a
>>>>> second
>>>>> set of suspend counts, it will not match the expected counts for
>>>>> a SUSPEND_NONE scenario.
>>>>>
>>>>> The proposed fix introduces a yield in the debuggee test thread
>>>>> run method
>>>>> to allow the debugger to get the expected sampled values.
>>>>>
>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8170089
>>>>> Webrev: http://cr.openjdk.java.net/~gadams/8170089/webrev.00/
>>>>>
>>>>>
>>>>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/TestDebuggerType1.java:
>>>>> ...
>>>>> 186 private void
>>>>> setCommunicationBreakpoint(ReferenceType refType, String
>>>>> methodName) {
>>>>> 187 Method method = debuggee.methodByName(refType,
>>>>> methodName);
>>>>> 188 Location location = null;
>>>>> 189 try {
>>>>> 190 location = method.allLineLocations().get(0);
>>>>> 191 } catch (AbsentInformationException e) {
>>>>> 192 throw new Failure(e);
>>>>> 193 }
>>>>> 194 bpRequest = debuggee.makeBreakpoint(location);
>>>>> 195
>>>>>
>>>>> 196 bpRequest.setSuspendPolicy(EventRequest.SUSPEND_ALL);
>>>>>
>>>>> 197 bpRequest.putProperty("number", "zero");
>>>>> 198 bpRequest.enable();
>>>>> 199
>>>>> 200 eventHandler.addListener(
>>>>> 201 new EventHandler.EventListener() {
>>>>> 202 public boolean eventReceived(Event
>>>>> event) {
>>>>> 203 if (event instanceof BreakpointEvent
>>>>> && bpRequest.equals(event.request())) {
>>>>> 204 synchronized(eventHandler) {
>>>>> 205 display("Received
>>>>> communication breakpoint event.");
>>>>> 206 bpCount++;
>>>>> 207 eventHandler.notifyAll();
>>>>> 208 }
>>>>> 209 return true;
>>>>> 210 }
>>>>> 211 return false;
>>>>> 212 }
>>>>> 213 }
>>>>> 214 );
>>>>> 215 }
>>>>>
>>>>>
>>>>> test/hotspot/jtreg/vmTestbase/nsk/jdi/EventSet/resume/resume008.java:
>>>>> ...
>>>>> 140 display("......--> vm.suspend();");
>>>>> 141 vm.suspend();
>>>>> 142
>>>>> 143 display(" getting : Map<String,
>>>>> Integer> suspendsCounts1");
>>>>> 144
>>>>> 145 Map<String, Integer> suspendsCounts1 =
>>>>> new HashMap<String, Integer>();
>>>>> 146 for (ThreadReference threadReference :
>>>>> vm.allThreads()) {
>>>>> 147 suspendsCounts1.put(threadReference.name(),
>>>>> threadReference.suspendCount());
>>>>> 148 }
>>>>> 149 display(suspendsCounts1.toString());
>>>>> 150
>>>>> 151 display(" eventSet.resume;");
>>>>> 152 eventSet.resume();
>>>>> 153
>>>>> 154 display(" getting : Map<String,
>>>>> Integer> suspendsCounts2");
>>>>>
>>>>> This is where the breakpoint is encountered before the second set
>>>>> of suspend counts is acquired.
>>>>>
>>>>> 155 Map<String, Integer> suspendsCounts2 =
>>>>> new HashMap<String, Integer>();
>>>>> 156 for (ThreadReference threadReference :
>>>>> vm.allThreads()) {
>>>>> 157 suspendsCounts2.put(threadReference.name(),
>>>>> threadReference.suspendCount());
>>>>> 158 }
>>>>> 159 display(suspendsCounts2.toString());
>>>>>
>>>>
>>>
>>
>
More information about the serviceability-dev
mailing list