RFR JDK-8170089: nsk/jdi/EventSet/resume/resume008: ERROR: suspendCounts don't match for : Common-Cleaner

Chris Plummer chris.plummer at oracle.com
Wed Jul 18 20:47:09 UTC 2018


Hi Gary

Ok, so shouldRunAfterBreakpoint() is the code that does the 
eventHandler.wait(), so it gets the eventHandler.notifyAll() 
notification from the BreakpointEvent handler.

And as a side note, I see now that resumption of execution after the 
breakpoint at main() is done by:

             // after waitForClassPrepared() main debuggee thread is 
suspended, resume it before test start
             display("RESUME DEBUGGEE VM");
             vm.resume();

             testRun();

shouldRunAfterBreakpoint() is returning true until the end of the test 
when the debuggee is executes "instruction = end". That's why runTests() 
does a "break" when shouldRunAfterBreakpoint() returns false. So this 
means the code that is checking shouldRunAfterBreakpoint() is not 
resuming execution for the first few (probably 3) 
methodForCommunication() breakpoints. However, it does make sure that 
runTests() blocks until the BreakPointEvent has been processed.

You point out the vm.resume() at the bottom of the loop in runTests(), 
but that's only after a bunch of ThreadStartEvent processing above it 
has been done already. The ThreadStartEvent would never get generated if 
there was not a resume some point earlier. I think it is happening 
during the eventHandler.waitForRequestedEventSet() call, which does a 
vm.resume().

So if I understand the order of things now:

-shouldRunAfterBreakpoint() returns after first methodForCommunication() 
is hit. At this point we know the first thread has been created, but no 
attempt to start it yet. The debuggee is suspended at this point.
-runTests() requests ThreadStartEvents with SUSPEND_NONE. This also does 
a vm.resume().
-The debuggee starts the thread and then does another 
methodForCommunication() (this 2nd one is actually after the 2nd thread 
has been created, but not yet started). Now we have a race. Do we get 
the ThreadStartEvent first or the BreakpointEvent. This is because when 
the ThreadStartEvent is generated, the thread is not suspended due to 
SUSPEND_NONE. Even if the ThreadStartEvent comes in first, the async 
handling of the BreakpointEvent can cause problems during the 
ThreadStartEvent processing.
-You added a 100ms delay after the thread has started, but before 
methodForCommunication(), hoping it will make it so the ThreadStartEvent 
can be received and fully processed before the BreakpointEvent is.

I think it would be preferable to fix this by doing better 
sychronization. After all, that is the approach the test originally 
took. It could have been written with a bunch of sleep() delays instead, 
but that in general is not a very good approach.

What if you added a shouldRunAfterBreakpoint() call after getting the 
ThreadStartEvent arrives. At this point you would know that the vm is 
suspended due to the breakpoint, so no need for:

                 display("......checking up on EventSet.resume()");
                 display("......--> vm.suspend();");
                 vm.suspend();

You might then also need to add another methodForCommunication() call at 
the end of case 0 and 1 in the debuggee, although I think you could 
instead just change the shouldRunAfterBreakpoint() at the start of the 
loop. I think that check actually belongs at the end of the loop, and 
only for case 2. In fact it would be an error if 
shouldRunAfterBreakpoint() did not return true in that case. Then you 
also need to add a shouldRunAfterBreakpoint() at the start of case 0 to 
get things rolling (and I think at the start of case 1 also).

Chris


On 7/18/18 12:45 PM, Gary Adams wrote:
> Answers below  ...
>
> On 7/18/18, 2:50 PM, Chris Plummer wrote:
>> Hi Gary,
>>
>> Who does the resume for the breakpoint event?
>>
>>         eventHandler.addListener(
>>              new EventHandler.EventListener() {
>>                  public boolean eventReceived(Event event) {
>>                     if (event instanceof BreakpointEvent && 
>> bpRequest.equals(event.request())) {
>>                         synchronized(eventHandler) {
>>                             display("Received communication 
>> breakpoint event.");
>>                             bpCount++;
>>                             eventHandler.notifyAll();
>>                         }
>>                         return true;
>>                     }
>>                     return false;
>>                  }
>>              }
>>         );
> I believe you are looking for this sequence.
> At the top of the loop a check is made if
> resume() should be called "shouldRunAfterBreakpoint".
> lines 96-99 is an early termination. And at the
> bottom of the loop, line 240, is the normal
> continue the test to the next case.
>
> resume008.java :
> ...
>     94            for (int i = 0; ; i++) {
>     95
>
>     96                if (!shouldRunAfterBreakpoint()) {
>     97                    vm.resume();
>     98                    break;
>     99                }
>
> 100
>    101
>    102                display(":::::: case: # " + i);
>    103
>    104                switch (i) {
>    105
>    106                    case 0:
>    107                    eventRequest = settingThreadStartRequest (
>    108                                           SUSPEND_NONE, 
> "ThreadStartRequest1");
> ...
>   238
>    239                display("......--> vm.resume()");
>    240                vm.resume();
>    241            }
>>
>> Also:
>>
>>>   1. On a thread start event the debugee is suspended, line 141 
>> That's not true for the first ThreadStartEvent since SUSPEND_NONE was 
>> used.
> The thread start event is set to SUSPEND_NONE for thread0, but when
> the thread start event is observed the resume008 test suspends the vm
> immediately after fetching the "number" property.
My point is that the Debuggee continues to run after the 
ThreadStartEvent is sent, and relies on the debugger to stop it after 
receiving the event. But in the meantime the debuggee has advanced to 
the next breakpoint, but only sometimes, thus the bug you are seeing.
>
>    132                if ( !(newEvent instanceof ThreadStartEvent)) {
>    133                    setFailedStatus("ERROR: new event is not 
> ThreadStartEvent");
>    134                } else {
>    135
>    136                    String property = (String) 
> newEvent.request().getProperty("number");
>    137                    display("       got new ThreadStartEvent 
> with propety 'number' == " + property);
>    138
>    139                    display("......checking up on 
> EventSet.resume()");
>    140                    display("......--> vm.suspend();");
>    141                    vm.suspend();
>
>
>>
>> Chris
>>
>> On 7/18/18 4:52 AM, Gary Adams wrote:
>>> There is nothing wrong with the breakpoint in methodForCommunication.
>>> The test uses it to make sure the threads are each tested separately.
>>> The breakpoint eventhandler just displays a message, increments a 
>>> counter
>>> and returns.
>>>
>>> Let me step through resume008a the debugee to help clarify ...
>>>
>>> 1. The test thread is created and the synchronized break point is 
>>> observed. lines 101-102
>>> 2. The thread is started. lines 104,135-137
>>>     2a. The main thread blocks on a local object. lines 133, 139
>>>     2b. The test thread is started. lines 137,
>>>            A run entered message is displayed, line 159
>>>            The main thread lock object is notified, line 167
>>>           2b1. The main thread continues. line 167, 146
>>>                   The next test thread is created. line 106
>>>                   The synchronized breakpoint is observed, line 107
>>>           2b2. A run exited message is displayed, line 169
>>>
>>> On the resume008 debugger side  ...
>>>   1. On a thread start event the debugee is suspended, line 141
>>>   2. Messages are displayed and a first set of thread suspend counts 
>>> is acquired. lines 143-151
>>>   3. The threads are resumed, line 152
>>> --->
>>>   4.  Messages are displayed and a second set of thread suspend 
>>> counts is acquired. lines 154-159
>>>
>>> The way the test is written the expectation is the debugger steps 
>>> 2,3,4 will all happen
>>> while the test thread is running.
>>>
>>> When the debugger resumes the debuggee threads (debugger step 3)
>>> the debuggee continues from where it left off (debuggee steps 
>>> 2b,2b1,2b2)
>>>
>>> If we complete debuggee step 2b1 (line 107) before the debugger 
>>> completes step 4 line 159,
>>> then the synchronized breakpoint will suspend the vm and the counts 
>>> will not match
>>> for the SUSPEND_NONE test thread start.
>>>
>>> resume008a.java:
>>>
>>>    100                        case 0:
>>>    101                                thread0 = new 
>>> Threadresume008a("thread0");
>>>    102 methodForCommunication();
>>>    103
>>>    104                                threadStart(thread0);
>>>    105
>>>    106                                thread1 = new 
>>> Threadresume008a("thread1");
>>>    107 methodForCommunication();
>>>    108                                break;
>>>
>>>    ...
>>>    135        static int threadStart(Thread t) {
>>>    136            synchronized (waitnotifyObj) {
>>>    137                t.start();
>>>    138                try {
>>>    139                    waitnotifyObj.wait();
>>>    140                } catch ( Exception e) {
>>>    141                    exitCode = FAILED;
>>>    142                    logErr("       Exception : " + e );
>>>    143                    return FAILED;
>>>    144                }
>>>    145            }
>>>    146            return PASSED;
>>>    147        }
>>>
>>>    149        static class Threadresume008a extends Thread {
>>>    ...
>>>    157
>>>    158            public void run() {
>>>    159                log1("  'run': enter  :: threadName == " + 
>>> tName);
>>>
>>> This is the proposed fix that will let the debugger complete it's 
>>> second
>>> acquisition of suspend counts while the test thread is still running.
>>>
>>>    160                // Yield, so the start thread event processing 
>>> can be completed.
>>>    161                try {
>>>    162                    Thread.sleep(100);
>>>    163                } catch (InterruptedException e) {
>>>    164                    // ignored
>>>    165                }
>>>
>>>    166                synchronized (waitnotifyObj) {
>>>    167                        waitnotifyObj.notify();
>>>    168                }
>>>    169                log1("  'run': exit   :: threadName == " + 
>>> tName);
>>>    170                return;
>>>    171            }
>>>    172        }
>>>    150
>>>    151            String tName = null;
>>>    152
>>>    153            public Threadresume008a(String threadName) {
>>>    154                super(threadName);
>>>    155                tName = threadName;
>>>    156            }
>>>    157
>>>    158            public void run() {
>>>    159                log1("  'run': enter  :: threadName == " + 
>>> tName);
>>>    160                // Yield, so the start thread event processing 
>>> can be completed.
>>>    161                try {
>>>    162                    Thread.sleep(100);
>>>    163                } catch (InterruptedException e) {
>>>    164                    // ignored
>>>    165                }
>>>    166                synchronized (waitnotifyObj) {
>>>    167                        waitnotifyObj.notify();
>>>    168                }
>>>    169                log1("  'run': exit   :: threadName == " + 
>>> tName);
>>>    170                return;
>>>    171            }
>>>    172        }
>>>
>>>
>>>
>>> On 7/18/18, 2:38 AM, Chris Plummer wrote:
>>>> Hi Gary,
>>>>
>>>> I've been having trouble following the control flow of this test. 
>>>> One thing I've stumbled across is the following:
>>>>
>>>>             /* A debuggee class must define 'methodForCommunication'
>>>>              * method and invoke it in points of synchronization
>>>>              * with a debugger.
>>>>              */
>>>> setCommunicationBreakpoint(debuggeeClass,"methodForCommunication");
>>>>
>>>> So why isn't this mode of synchronization good enough? Is it 
>>>> because it was not designed with the understanding that the 
>>>> debugger might be doing suspended thread counts, and suspending all 
>>>> threads at the breakpoint messes up the test?
>>>>
>>>> From what I can tell of the test, after the debuggee is started and 
>>>> hits the default breakpoint at the start of main(), the debugger 
>>>> then does a vm.resume() at the start of the for loop in the 
>>>> runTest() method. The debuggee then creates a thread and calls 
>>>> methodForCommunication(). There is already a breakpoint set there 
>>>> by the above debuggee code. It's unclear to me what happens as a 
>>>> result of this breakpoint and how it serves the test. Also unclear 
>>>> to me who is responsible for the vm.resume() after the breakpoint 
>>>> is hit.
>>>>
>>>> The debugger then requests all ThreadStart events, requesting that 
>>>> no threads be disabled when it is sent. I think you are saying that 
>>>> when the ThreadStart event comes in, sometimes we are at the 
>>>> methodForCommunication breakpoint, with all threads disabled, and 
>>>> this messes up the thread suspend counts. You want to delay 100ms 
>>>> so the breakpoint event can be processed and threads resumed again 
>>>> (although I can't see who actually resumes the thread after hitting 
>>>> the methodForCommunication breakpoint).
>>>>
>>>> Chris
>>>>
>>>> On 7/17/18 8:33 AM, Gary Adams wrote:
>>>>> A race condition exists between the debugger and the debuggee.
>>>>>
>>>>> The first test thread is started with SUSPEND_NONE policy set.
>>>>> While processing the thread start event the debugger captures
>>>>> an initial set of thread suspend counts and resumes the
>>>>> debuggee vm. If the debuggee advances quickly it reaches
>>>>> the breakpoint set for methodForCommunication. Since the breakpoint
>>>>> carries with it SUSPEND_ALL policy, when the debugger captures a 
>>>>> second
>>>>> set of suspend counts, it will not match the expected counts for
>>>>> a SUSPEND_NONE scenario.
>>>>>
>>>>> The proposed fix introduces a yield in the debuggee test thread 
>>>>> run method
>>>>> to allow the debugger to get the expected sampled values.
>>>>>
>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8170089
>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8170089/webrev.00/
>>>>>
>>>>>
>>>>> test/hotspot/jtreg/vmTestbase/nsk/share/jdi/TestDebuggerType1.java:
>>>>> ...
>>>>>    186        private void 
>>>>> setCommunicationBreakpoint(ReferenceType refType, String 
>>>>> methodName) {
>>>>>    187            Method method = debuggee.methodByName(refType, 
>>>>> methodName);
>>>>>    188            Location location = null;
>>>>>    189            try {
>>>>>    190                location = method.allLineLocations().get(0);
>>>>>    191            } catch (AbsentInformationException e) {
>>>>>    192                throw new Failure(e);
>>>>>    193            }
>>>>>    194            bpRequest = debuggee.makeBreakpoint(location);
>>>>>    195
>>>>>
>>>>>    196 bpRequest.setSuspendPolicy(EventRequest.SUSPEND_ALL);
>>>>>
>>>>>    197            bpRequest.putProperty("number", "zero");
>>>>>    198            bpRequest.enable();
>>>>>    199
>>>>>    200            eventHandler.addListener(
>>>>>    201                 new EventHandler.EventListener() {
>>>>>    202                     public boolean eventReceived(Event 
>>>>> event) {
>>>>>    203                        if (event instanceof BreakpointEvent 
>>>>> && bpRequest.equals(event.request())) {
>>>>>    204 synchronized(eventHandler) {
>>>>>    205                                display("Received 
>>>>> communication breakpoint event.");
>>>>>    206                                bpCount++;
>>>>>    207 eventHandler.notifyAll();
>>>>>    208                            }
>>>>>    209                            return true;
>>>>>    210                        }
>>>>>    211                        return false;
>>>>>    212                     }
>>>>>    213                 }
>>>>>    214            );
>>>>>    215        }
>>>>>
>>>>>
>>>>> test/hotspot/jtreg/vmTestbase/nsk/jdi/EventSet/resume/resume008.java:
>>>>> ...
>>>>>    140                    display("......--> vm.suspend();");
>>>>>    141                    vm.suspend();
>>>>>    142
>>>>>    143                    display("        getting : Map<String, 
>>>>> Integer> suspendsCounts1");
>>>>>    144
>>>>>    145                    Map<String, Integer> suspendsCounts1 = 
>>>>> new HashMap<String, Integer>();
>>>>>    146                    for (ThreadReference threadReference : 
>>>>> vm.allThreads()) {
>>>>>    147 suspendsCounts1.put(threadReference.name(), 
>>>>> threadReference.suspendCount());
>>>>>    148                    }
>>>>>    149 display(suspendsCounts1.toString());
>>>>>    150
>>>>>    151                    display(" eventSet.resume;");
>>>>>    152                    eventSet.resume();
>>>>>    153
>>>>>    154                    display("        getting : Map<String, 
>>>>> Integer> suspendsCounts2");
>>>>>
>>>>> This is where the breakpoint is encountered before the second set 
>>>>> of suspend counts is acquired.
>>>>>
>>>>>    155                    Map<String, Integer> suspendsCounts2 = 
>>>>> new HashMap<String, Integer>();
>>>>>    156                    for (ThreadReference threadReference : 
>>>>> vm.allThreads()) {
>>>>>    157 suspendsCounts2.put(threadReference.name(), 
>>>>> threadReference.suspendCount());
>>>>>    158                    }
>>>>>    159 display(suspendsCounts2.toString());
>>>>>
>>>>
>>>
>>
>



More information about the serviceability-dev mailing list