RFR: JDK-8051349: nsk/jvmti/scenarios/sampling/SP06/sp06t003 fails in nightly

Mon Dec 17 23:38:43 UTC 2018

On 12/17/18 2:30 PM, David Holmes wrote:
> On 18/12/2018 8:22 am, Chris Plummer wrote:
>> I think you need to set interruptReady inside the synchronized block. 
>> Otherwise checkReady() can still get to the interrupt() call before 
>> testedMethod() gets to the wait().
>
> That is not an issue. You can interrupt at any time independent of 
> owning the monitor. The interrupt remains pending until the thread 
> explicitly clears it, or an API, like wait() sees it, throws IE and 
> clears it.
Ok. In that case LGTM.

Chris
>
> David
>
>> Chris
>>
>> On 12/15/18 12:24 PM, gary.adams at oracle.com wrote:
>>> Updated webrev:
>>>
>>> http://cr.openjdk.java.net/~gadams/8051349/webrev.02/index.html
>>>
>>> On 12/15/18 8:15 AM, gary.adams at oracle.com wrote:
>>>> On 12/14/18 4:43 PM, Chris Plummer wrote:
>>>>> On 12/14/18 5:28 AM, Gary Adams wrote:
>>>>>
>>>>> BTW, I forgot to ask if you can fix the typo in the following 
>>>>> comment:
>>>>>
>>>>>  345             /* check frame equalaty */
>>>>
>>>> Yes, I fixed that one on Fri.
>>>> I usually save the typos for the final webrev.
>>>>
>>>>>> On 12/13/18, 11:51 AM, Chris Plummer wrote:
>>>>>>> Hi Gary,
>>>>>>>
>>>>>>> Unfortunately GetStackTrace() returns the top frame as the first 
>>>>>>> frame in the array. Thus if the thread is not suspended and 
>>>>>>> later you call GetFrameLocation() for some depth, there's no way 
>>>>>>> to make sure the frame at that depth is the same frame at that 
>>>>>>> depth in the array returned by GetStackTrace() unless you first 
>>>>>>> suspend the frame before calling GetFrameLocation(). You would 
>>>>>>> also need to call GetFrameCount() while suspended, compare that 
>>>>>>> with the number of frames returned by GetStackTrace(), and make 
>>>>>>> any needed depth adjustments. But since I think the intent is to 
>>>>>>> not suspend the thread here, it probably does not make sense to 
>>>>>>> do that, and instead accept that there might be some errors as 
>>>>>>> you have done.
>>>>>> I'm not totally comfortable with these attempted operations and then
>>>>>> bypassing the test failure if the threads were not suspended.
>>>>>> Would it be better to only perform check  thread operations on the
>>>>>> suspended threads and to totally skip the operations that can not
>>>>>> be safely performed when the thread is not suspended.
>>>>> I was starting to think along those lines. What's the point of 
>>>>> testing something if you know it can fail and you can't 
>>>>> distinguish between expected and unexpected failure. The other 
>>>>> option is to suspend like I describe above to make sure you can 
>>>>> accurately verify the stack trace. I'm just not sure if it makes 
>>>>> sense to do that. Why is the test run unsuspended in the first place?
>>>> The test runs several separate threads and each is presenting
>>>> a different scenario. It probably is good to try the various functions
>>>> about stack trace, frame counts and locations on suspended and resumed
>>>> threads. It is just a bit harder to determine what a failed condition
>>>> would be for the running threads.
>>>>
>>>>>>>
>>>>>>> One improvement I would like to see for your fix is to only 
>>>>>>> ignore JVMTI_ERROR_NO_MORE_FRAMES rather than ignore all 
>>>>>>> failures when suspended == NSK_TRUE.
>>>>>> The current layering of the macros and verify routines do not 
>>>>>> lend themselves
>>>>>> to this sort of selective error checking. I'd probably file a 
>>>>>> follow up bug to
>>>>>> address specific error checking.
>>>>> Can't you just bypass the use of NSK_VERIFY? Isn't JC getting rid 
>>>>> of it anyway and using his JNI exception mechanism? In that case 
>>>>> you would need to use the version that does not fail if there was 
>>>>> an exception.
>>>> I do not know what set of error returns would be acceptable
>>>> or how to express that in the new mechanism. That's why I suggested
>>>> addressing it in a follow up bug. It simply would narrow the range
>>>> of error returns allowed.
>>>>>>>
>>>>>>> Also, I don't think you want the suspended check here:
>>>>>>>
>>>>>>>  348         /* check if expected method frame found */
>>>>>>>  349         if ((suspended == NSK_TRUE) && (found <= 0)) {
>>>>>>>
>>>>>>> The check for finding the method is:
>>>>>>>
>>>>>>>  341             if (frameStack[j].method == 
>>>>>>> threadsDesc[i].method) {
>>>>>>>
>>>>>>> Since frameStack[] is returned by GetStackTrace(), it is not 
>>>>>>> impacted when not suspended, and the expected method should 
>>>>>>> always be in frameStack[] somewhere. The issue is only with 
>>>>>>> using GetFrameLocation() to correlate what is in the result of 
>>>>>>> GetStackTrace().
>>>>>> Agreed. The early continuation was bypassing the found method 
>>>>>> checking.
>>>>>>
>>>>>> BTW, another failure has been detected in sp06t001. This time the 
>>>>>> threads are suspended,
>>>>>> but I believe there is a race between thread start and the call 
>>>>>> to interrupt() the thread.
>>>>>> I think there may be some confusion over which thread is invoking 
>>>>>> the interrupt() call.
>>>>>> It is running on the main thread from the code after a call to 
>>>>>> start the thread, but the
>>>>>> thread may not have run when the interrupt is requested.
>>>>>>
>>>>>> public class sp06t001 extends DebugeeClass {
>>>>>> ...
>>>>>>
>>>>>>         // create threads list
>>>>>>         threads = new sp06t001Thread[] {
>>>>>>             new sp06t001ThreadRunning("threadRunning", log),
>>>>>>             new sp06t001ThreadEntering("threadEntering", log),
>>>>>>             new sp06t001ThreadWaiting("threadWaiting", log),
>>>>>>             new sp06t001ThreadSleeping("threadSleeping", log),
>>>>>>             new 
>>>>>> sp06t001ThreadRunningInterrupted("threadRunningInterrupted", log),
>>>>>>             new 
>>>>>> sp06t001ThreadRunningNative("threadRunningNative", log)
>>>>>>         };
>>>>>>
>>>>>>         // run threads
>>>>>>         log.display("Starting tested threads");
>>>>>>         try {
>>>>>>             synchronized (endingMonitor) {
>>>>>>                 // start threads (except first one)
>>>>>>                 for (int i = 0; i < threads.length; i++) {
>>>>>>                     threads[i].start();
>>>>>>                     if (!threads[i].checkReady()) {
>>>>>>                         throw new Failure("Unable to prepare 
>>>>>> thread ..."
>>>>>> ...
>>>>>>
>>>>>> class sp06t001ThreadRunningInterrupted extends sp06t001Thread {
>>>>>> ...
>>>>>>     public boolean checkReady() {
>>>>>>         // interrupt thread on wait()
>>>>>>         synchronized (waitingMonitor) {
>>>>>>             interrupt();
>>>>>>         }
>>>>>>         return true;
>>>>>>     }
>>>>> Certainly checkReady() could be called before the target thread 
>>>>> has started the wait(). Then the thread does the wait() and 
>>>>> probably never exits (or maybe the early interrupted is resulting 
>>>>> in some other unexpected behavior).
>>>> If we get to the wait then the framecount would be fine.
>>>>
>>>> The debuggee class launches 6 threads. The agent is synced with the 
>>>> debuggee
>>>> starting. I don't see the native agent synchronizing on the 6 
>>>> started threads,
>>>> when the threads are suspended and checked.
>>>>
>>>> As far as I can tell the call to interrupted() just sets a flag and 
>>>> it is up to the thread
>>>> itself to decide how to interpret the isInterrupted() state.
>>>>>
>>>>> I think you need synchronize between the two threads. What if the 
>>>>> target thread does a synchronized (waitingMonitor), sets a flag, 
>>>>> and then a wait(). The main thread sleeps until the flag is set, 
>>>>> does a synchronized(waitingMonitor), and then the interrupt().
>>>> Since the test is already using a waiting monitor, I do not want to
>>>> disturb how it is being used in setting up the interrupted thread 
>>>> test case.
>>>> The test is also using a "volatile boolean threadReady" for signalling
>>>> conditions in the test.
>>>>
>>>> On Fri I was experimenting with another boolean flag to delay the
>>>> checkReady until the testMethod was at the waiting monitor,
>>>> e.g. ready to be interrupted. It seems to work but I was waiting
>>>> for enough testruns before posting a revised webrev.
>>>>
>>>>>
>>>>> Chris
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On 12/13/18 5:25 AM, Gary Adams wrote:
>>>>>>>> While testing I ran into another of the related failures that were
>>>>>>>> associated with the original bug.
>>>>>>>> The following fake exception stacktrace is for failure analysis.
>>>>>>>> nsk.share.Fake_Exception_for_RULE_Creation: (sp02t003.cpp:313) 
>>>>>>>> jvmti->GetFrameLocation(threadsDesc[i].thread, j, &qMethod, 
>>>>>>>> &qLocation)
>>>>>>>>     at nsk_lvcomplain(nsk_tools.cpp:172)
>>>>>>>> # ERROR: sp02t003.cpp, 313: 
>>>>>>>> jvmti->GetFrameLocation(threadsDesc[i].thread, j, &qMethod, 
>>>>>>>> &qLocation)
>>>>>>>> #   jvmti error: code=31, name=JVMTI_ERROR_NO_MORE_FRAMES
>>>>>>>> - sp02t003.cpp, 310:       3 frame: method: 0x7f225042b2d8, 
>>>>>>>> location: 16
>>>>>>>> # ERROR: sp02t003.cpp, 313: 
>>>>>>>> jvmti->GetFrameLocation(threadsDesc[i].thread, j, &qMethod, 
>>>>>>>> &qLocation)
>>>>>>>> #   jvmti error: code=31, name=JVMTI_ERROR_NO_MORE_FRAMES
>>>>>>>> # ERROR: sp02t003.cpp, 350: No expected method frame for not 
>>>>>>>> suspended thread #4 (threadRunningInterrupted)
>>>>>>>> In this particular failure, the GetFrameLocation call failed, 
>>>>>>>> because the frame was no longer
>>>>>>>> accessible.
>>>>>>>>
>>>>>>>> If the threads are not suspended, the GetFrameLocation may fail,
>>>>>>>> or if it passes, the qMethod and qLocation could belong to another
>>>>>>>> frame.
>>>>>>>>
>>>>>>>> I've updated the webrev to allow the call to GetFrameLocation, 
>>>>>>>> but to only
>>>>>>>> report an error if the threads are suspended. Similarly, the 
>>>>>>>> checks to compare
>>>>>>>> qMethod and qLocation will be skipped, if the threads are not 
>>>>>>>> suspended.
>>>>>>>> And the final comparison that the method was found will only be 
>>>>>>>> an error
>>>>>>>> if the threads are suspended.
>>>>>>>>
>>>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8051349/webrev.01/
>>>>>>>>
>>>>>>>> On 12/12/18, 11:54 AM, Gary Adams wrote:
>>>>>>>>> After some testing with nsk verbose messages enabled,
>>>>>>>>> it is clear that this test is failing in checkThreads when the
>>>>>>>>> location did not match between the call to GetStackTrace
>>>>>>>>> and GetFrameLocation. For the tests that are run when the threads
>>>>>>>>> have not been suspended, there is no guarantee the locations
>>>>>>>>> will match.
>>>>>>>>>
>>>>>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8051349
>>>>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8051349/webrev.00/
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>