RFR: 6815126 intermittent SimulResumerTest.java failure

shanliang shanliang.jiang at oracle.com
Wed Apr 2 08:01:58 UTC 2014


I hope to get this fix reviewed and pushed:

1) This is a fix for a bug labeled "svc-nightly".

2) The current test is still useful. It cannot be 100% sure to hit the
conditions of JDK-6751643, but with its 2*10000 resume repetitions it
has a good chance of hitting them. The failure this patch fixes happened
in exactly the conditions under which JDK-6751643 can occur.

3) There may be some way to implement synchronization logic between the
thread invoking the operations and the thread resuming. I could imagine
adding code to the method "resume" that waits for this test, but I do
not see an easy and practical way to do that.

4) We can file a new bug to fix this synchronization issue if necessary.
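
To illustrate what the synchronization in point 3 might look like, here
is a minimal sketch in plain Java with no JDI involved. All names in it
(ResumeWindowSketch, inResumeWindow, runOnce) are hypothetical and do not
exist in the test or in the JDK; the "resume" is only simulated by a
flag. The idea is that the resuming thread announces that its window is
open and keeps it open until the checking thread has run, instead of the
checker sleeping and hoping to land inside the window.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: none of these names come from SimulResumerTest or JDI.
class ResumeWindowSketch {
    // True only while the simulated "resume" is in progress.
    static final AtomicBoolean inResumeWindow = new AtomicBoolean(false);

    // Runs one resume/check pair and reports whether the check
    // was performed while the resume window was open.
    static boolean runOnce() {
        CountDownLatch resumeStarted = new CountDownLatch(1);
        CountDownLatch checkDone = new CountDownLatch(1);
        AtomicBoolean overlapped = new AtomicBoolean(false);

        Thread resumer = new Thread(() -> {
            inResumeWindow.set(true);      // the simulated resume begins
            resumeStarted.countDown();     // tell the checker the window is open
            try {
                checkDone.await();         // hold the window open until the check ran
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            inResumeWindow.set(false);     // the simulated resume ends
        }, "resumer");

        Thread checker = new Thread(() -> {
            try {
                resumeStarted.await();     // wait for the window instead of sleeping
                overlapped.set(inResumeWindow.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                checkDone.countDown();     // always release the resumer
            }
        }, "checking thread");

        resumer.start();
        checker.start();
        try {
            resumer.join();
            checker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return overlapped.get();
    }

    public static void main(String[] args) {
        System.out.println("check overlapped resume: " + runOnce());
    }
}
```

In the real test the hard part is that the window lies inside the
"resume" implementation itself, so a handshake like this would need a
hook there. That is why I do not see an easy way to do it.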

Thanks,
Shanliang

shanliang wrote:
> Jaroslav Bachorik wrote:
>> Thanks Shanliang, it is clear now.
>>
>> The patch will get rid of the IOOBE but I have my doubts about what 
>> the test actually tests. It is supposed to make sure that certain 
>> operations will not throw NPE when the debugged thread is resumed 
>> (from a concurrent debugger thread) before the operation has managed 
>> to finish. However, there seems to be no synchronization logic 
>> between the thread invoking the operations and the thread resuming 
>> the paused debugged thread, relying only on hitting this condition by 
>> chance.
>>
>> This test seems to be a good candidate for a thorough revision/rewrite.
> I am not sure how to make the check happen during a "resuming" window.
> The test creates 2 threads, each of which repeats "resume" 10000 times,
> and another thread repeats the check with a 100ms sleep in between,
> just hoping that some check will fall into a resuming window.
>
> Shanliang
>>
>> -JB-
>>
>> On 31.3.2014 11:26, shanliang wrote:
>>> Erik Gahlin wrote:
>>>> I would also like to understand this better.
>>> Possibly my previous reply was not clear enough or I missed something
>>> there.
>>>
>>> The test was written for JDK-6751643, as I mentioned in my last mail.
>>> Here is the description from JDK-6751643, for which this test was
>>> developed:
>>> ------
>>> This bug can only occur if a debugger has multiple threads and calls
>>> any of the following methods in one thread while simultaneously
>>> resuming the same debuggee thread in a different debugger thread.
>>> Debuggers shouldn't do this because it is a race condition and the
>>> result returned by these methods will vary depending upon just where
>>> in the processing of these methods the resume takes effect. EG, the
>>> frameCount() method could return 6 in a case where the debuggee has
>>> already been resumed and there are no frames.
>>> ------
>>>
>>> To reproduce the bug, the test mainly did 2 things in different
>>> threads:
>>> 1) It received vm events and resumed the vm. This was done by the
>>> thread "Thread-1" in the class TestScaffold, which registered a
>>> listener and called the following method:
>>>     /**
>>>      * Events handled directly by scaffold always resume (well, almost
>>>      * always)
>>>      */
>>>     public void eventSetComplete(EventSet set) {
>>>         // The listener in connect(..) resumes after receiving our
>>>         // special VMDeathEvent.  We can't also do the resume
>>>         // here or we will probably get a VMDisconnectedException
>>>         if (!containsOurVMDeathRequest(set)) {
>>>             traceln("TS: set.resume() called");
>>>             set.resume();
>>>         }
>>>     }
>>>
>>> 2) It called the method "check" in the class SimulResumerTarg to see
>>> whether a NullPointerException was thrown. The thread name was "test
>>> resumer" (would "checking thread" be a better name?)
>>>
>>> So one thread was doing the resume while another thread was doing the
>>> check at the same time. I added code to print the different values of
>>> frames.size() at line 185:
>>>     for (i = 0; i < 10; i++) {
>>>         System.out.println("---frames.size(): " + frames.size());
>>>         Thread.sleep(200);
>>>     }
>>>
>>> When printing out the frames, we could sometimes see one more frame:
>>>     ------------------ java.lang.Thread.yield()+-1 in thread instance of SimulResumerTarg(name='Thread 2', id=109)
>>>
>>>
>>> Shanliang
>>>>
>>>> I looked at this failure before and I couldn't see what was wrong,
>>>> neither in the test nor in the product.
>>>>
>>>> Erik
>>>>
>>>> Jaroslav Bachorik wrote 3/27/14 4:49 PM:
>>>>> On 27.3.2014 15:49, shanliang wrote:
>>>>>> Hi,
>>>>>>
>>>>>> The call
>>>>>>     thr.frames(0, frames.size() - 1);
>>>>>> suffers from a synchronization issue: the size may change after
>>>>>> frames.size() returns.
>>>>>
>>>>> Any idea why there is a synchronization issue? The code seems to be
>>>>> intended to run only when a breakpoint is hit and the target thread
>>>>> is suspended.
>>>>>
>>>>> -JB-
>>>>>
>>>>>>
>>>>>> webrev:
>>>>>> http://cr.openjdk.java.net/~sjiang/JDK-6815126/00/
>>>>>>
>>>>>> bug:
>>>>>> https://bugs.openjdk.java.net/browse/JDK-6815126
>>>>>>
>>>>>> Shanliang
>>>>>
>>>>
>>>
>>
>
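
For reference, the size race on thr.frames(0, frames.size() - 1) that is
described in the first message of the quoted thread can be shown
deterministically in plain Java. This is only a sketch under stated
assumptions: FakeThread is a hypothetical stand-in and not the JDI
ThreadReference class, its stack is a list of strings, and "resume" is
simulated by clearing that list. It shows a size that was read while the
thread was suspended becoming stale once another thread resumes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: FakeThread only imitates the shape of the calls
// discussed in the thread; it is not the JDI API.
class FrameSizeRaceSketch {
    static class FakeThread {
        private final List<String> stack =
                new ArrayList<>(Arrays.asList("yield", "run", "main"));

        // Returns a snapshot of all frames.
        synchronized List<String> frames() {
            return new ArrayList<>(stack);
        }

        // Returns a sub-range of frames; rejects a range that no longer fits.
        synchronized List<String> frames(int start, int length) {
            if (start < 0 || length < 0 || start + length > stack.size()) {
                throw new IndexOutOfBoundsException(
                        "start=" + start + " length=" + length
                        + " size=" + stack.size());
            }
            return new ArrayList<>(stack.subList(start, start + length));
        }

        // Assumption for this sketch: a resumed thread exposes no frames.
        synchronized void resume() {
            stack.clear();
        }
    }

    // Reads the size, lets a "resume" slip in, then uses the stale size.
    static boolean staleSizeFails() {
        FakeThread thr = new FakeThread();
        int size = thr.frames().size();  // observed while "suspended"
        thr.resume();                    // the other debugger thread resumes here
        try {
            thr.frames(0, size - 1);     // the range is computed from the stale size
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("stale size caused IOOBE: " + staleSizeFails());
    }
}
```

In the real test the resume comes from a different thread at an
unpredictable moment, so the same interleaving only happens
intermittently.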


