RFR: 6815126 intermittent SimulResumerTest.java failure
Staffan Larsen
staffan.larsen at oracle.com
Thu Apr 3 11:03:25 UTC 2014
Looks ok.
Thanks,
/Staffan
On 2 apr 2014, at 10:09, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
> Hi Shanliang,
>
> I'm fine with the proposed fix, although not a reviewer.
>
> On 2.4.2014 10:01, shanliang wrote:
>> Hope to get reviewed and to push this fix:
>>
>> 1) this is a fix for a bug labeled with "svc-nightly"
>>
>> 2) The current test must be useful. Yes the test could not be 100% sure
>> to test the bug JDK-6751643, but with its 2*10000 resume repeatings it
>> would have big chance to hit the bug conditions, the failure the patch
>> to fix happened exactly in the condition the bug JDK-6751643 could happen.
>>
>> 3) there is possibly someway to realize the synchronization logic
>> between the thread invoking the operations and the thread resuming, I
>> could see to add code into the method "resume" to do waiting for this
>> test, but I could not see an easy and practical way to do that.
>
> Just a blind shot - using eg. phaser or cyclic barrier to orchestrate stepping through the code in both threads could help to increase the chances of hitting the problematic situation (but not making it certain, though)
>
>
>>
>> 4) we can create a new bug to fix this synchronization issue if necessary.
>
> Sounds like a very good idea.
>
> Thanks,
>
> -JB-
>
>
>>
>> Thanks,
>> Shanliang
>>
>> shanliang wrote:
>>> Jaroslav Bachorik wrote:
>>>> Thanks Shanliang, it is clear now.
>>>>
>>>> The patch will get rid off the IOOBE but I have my doubts about what
>>>> the test actually tests. It is supposed to make sure that certain
>>>> operations will not throw NPE when the debugged thread is resumed
>>>> (from a concurrent debugger thread) before the operation has managed
>>>> to finish. However, there seems to be no synchronization logic
>>>> between the thread invoking the operations and the thread resuming
>>>> the paused debugged thread, relying only on hitting this condition by
>>>> chance.
>>>>
>>>> This test seems to be a good candidate for a thorough revision/rewrite.
>>> Not sure how to make the checking happen during a "resuming" window,
>>> the test creates 2 threads and each repeats "resume"10000 times, and
>>> one another thread repeats checking with 100ms sleeping time, just
>>> hoping some checking would fail into a resuming window.
>>>
>>> Shanliang
>>>>
>>>> -JB-
>>>>
>>>> On 31.3.2014 11:26, shanliang wrote:
>>>>> Erik Gahlin wrote:
>>>>>> I also like to understand better.
>>>>> Possibly my previous reply was not clear enough or I missed something
>>>>> there.
>>>>>
>>>>> The test was to test JDK-6751643 as I cited in the last mail, here is
>>>>> the info from JDK-6751643 to which this test was developed:
>>>>> ------
>>>>> This bug can only occur if a debugger has multiple threads and calls
>>>>> any
>>>>> of the following methods in one thread while simultaneously resuming
>>>>> the
>>>>> same debuggee thread in a different debugger thread. Debuggers
>>>>> shouldn't
>>>>> do this because it is a race condition and the result returned by these
>>>>> methods will vary depending upon just where in the processing of these
>>>>> methods the resume takes effect. EG, the frameCount() method could
>>>>> return 6 in a case where the debuggee has already been resumed and
>>>>> there
>>>>> are no frames.
>>>>> ------
>>>>>
>>>>> To reproduce the bug, test did mainly 2 things by different threads:
>>>>> 1) received vm events and resumed vm, this was done by thread
>>>>> "Thread-1"
>>>>> in the class TestScaffold which registered a listener and called the
>>>>> following method:
>>>>> /**
>>>>> * Events handled directly by scaffold always resume (well, almost
>>>>> always)
>>>>> */
>>>>> public void eventSetComplete(EventSet set) {
>>>>> // The listener in connect(..) resumes after receiving our
>>>>> // special VMDeathEvent. We can't also do the resume
>>>>> // here or we will probably get a VMDisconnectedException
>>>>> if (!containsOurVMDeathRequest(set)) {
>>>>> traceln("TS: set.resume() called");
>>>>> set.resume();
>>>>> }
>>>>> }
>>>>>
>>>>> 2) called the method "check" in the class SimulResumerTarg, to see
>>>>> whether a NullPointerException was thrown, the thread name was "test
>>>>> resumer" (better to named as "checking thread"?)
>>>>>
>>>>> So one thread was doing resume, another thread was doing check at same.
>>>>> I added the code to see the different values of frames.size() at
>>>>> line 185:
>>>>> for (i=0; i<10:i++) {
>>>>> System.out.println("---frames.size(): "+frames.size());
>>>>> Thhread.sleep(200);
>>>>> }
>>>>>
>>>>> if printing out frames, sometime we could see one more frame:
>>>>> ------------------ java.lang.Thread.yield()+-1 in thread
>>>>> instance of
>>>>> SimulResumerTarg(name='Thread 2', id=109)
>>>>>
>>>>>
>>>>> Shanliang
>>>>>>
>>>>>> I looked at this failure before and I couldn't see what was wrong, not
>>>>>> in the test or product.
>>>>>>
>>>>>> Erik
>>>>>>
>>>>>> Jaroslav Bachorik skrev 3/27/14 4:49 PM:
>>>>>>> On 27.3.2014 15:49, shanliang wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> The call
>>>>>>>> thr.frames(0, frames.size() - 1);
>>>>>>>> suffers a synchronization issue, the size may be changed after
>>>>>>>> frames.size() returns.
>>>>>>>
>>>>>>> Any idea why there is a synchronization issue? The code seems to be
>>>>>>> intended to run only when a breakpoint is hit and the target thread
>>>>>>> is suspended.
>>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>>
>>>>>>>> webrev:
>>>>>>>> http://cr.openjdk.java.net/~sjiang/JDK-6815126/00/
>>>>>>>>
>>>>>>>> bug:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-6815126
>>>>>>>>
>>>>>>>> Shanliang
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20140403/800ded49/attachment-0001.html>
More information about the serviceability-dev
mailing list