RFR: JDK-8051349: nsk/jvmti/scenarios/sampling/SP06/sp06t003 fails in nightly

Mon Dec 17 22:30:03 UTC 2018

On 18/12/2018 8:22 am, Chris Plummer wrote:
> I think you need to set interruptReady inside the synchronized block. 
> Otherwise checkReady() can still get to the interrupt() call before 
> testedMethod() gets to the wait().

That is not an issue. You can interrupt at any time independent of 
owning the monitor. The interrupt remains pending until the thread 
explicitly clears it, or an API, like wait() sees it, throws IE and 
clears it.

David

> Chris
> 
> On 12/15/18 12:24 PM, gary.adams at oracle.com wrote:
>> Updated webrev:
>>
>> http://cr.openjdk.java.net/~gadams/8051349/webrev.02/index.html
>>
>> On 12/15/18 8:15 AM, gary.adams at oracle.com wrote:
>>> On 12/14/18 4:43 PM, Chris Plummer wrote:
>>>> On 12/14/18 5:28 AM, Gary Adams wrote:
>>>>
>>>> BTW, I forgot to ask if you can fix the typo in the following comment:
>>>>
>>>>  345             /* check frame equalaty */
>>>
>>> Yes, I fixed that one on Fri.
>>> I usually save the typos for the final webrev.
>>>
>>>>> On 12/13/18, 11:51 AM, Chris Plummer wrote:
>>>>>> Hi Gary,
>>>>>>
>>>>>> Unfortunately GetStackTrace() returns the top frame as the first 
>>>>>> frame in the array. Thus if the thread is not suspended and later 
>>>>>> you call GetFrameLocation() for some depth, there's no way to make 
>>>>>> sure the frame at that depth is the same frame at that depth in 
>>>>>> the array returned by GetStackTrace() unless you first suspend the 
>>>>>> frame before calling GetFrameLocation(). You would also need to 
>>>>>> call GetFrameCount() while suspended, compare that with the number 
>>>>>> of frames returned by GetStackTrace(), and make any needed depth 
>>>>>> adjustments. But since I think the intent is to not suspend the 
>>>>>> thread here, it probably does not make sense to do that, and 
>>>>>> instead accept that there might be some errors as you have done.
>>>>> I'm not totally comfortable with these attempted operations and then
>>>>> bypassing the test failure if the threads were not suspended.
>>>>> Would it be better to only perform check  thread operations on the
>>>>> suspended threads and to totally skip the operations that can not
>>>>> be safely performed when the thread is not suspended.
>>>> I was starting to think along those lines. What's the point of 
>>>> testing something if you know it can fail and you can't distinguish 
>>>> between expected and unexpected failure. The other option is to 
>>>> suspend like I describe above to make sure you can accurately verify 
>>>> the stack trace. I'm just not sure if it makes sense to do that. Why 
>>>> is the test run unsuspended in the first place?
>>> The test runs several separate threads and each is presenting
>>> a different scenario. It probably is good to try the various functions
>>> about stack trace, frame counts and locations on suspended and resumed
>>> threads. It is just a bit harder to determine what a failed condition
>>> would be for the running threads.
>>>
>>>>>>
>>>>>> One improvement I would like to see for your fix is to only ignore 
>>>>>> JVMTI_ERROR_NO_MORE_FRAMES rather than ignore all failures when 
>>>>>> suspended == NSK_TRUE.
>>>>> The current layering of the macros and verify routines do not lend 
>>>>> themselves
>>>>> to this sort of selective error checking. I'd probably file a 
>>>>> follow up bug to
>>>>> address specific error checking.
>>>> Can't you just bypass the use of NSK_VERIFY? Isn't JC getting rid of 
>>>> it anyway and using his JNI exception mechanism? In that case you 
>>>> would need to use the version that does not fail if there was an 
>>>> exception.
>>> I do not know what set of error returns would be acceptable
>>> or how to express that in the new mechanism. That's why I suggested
>>> addressing it in a follow up bug. It simply would narrow the range
>>> of error returns allowed.
>>>>>>
>>>>>> Also, I don't think you want the suspended check here:
>>>>>>
>>>>>>  348         /* check if expected method frame found */
>>>>>>  349         if ((suspended == NSK_TRUE) && (found <= 0)) {
>>>>>>
>>>>>> The check for finding the method is:
>>>>>>
>>>>>>  341             if (frameStack[j].method == threadsDesc[i].method) {
>>>>>>
>>>>>> Since frameStack[] is returned by GetStackTrace(), it is not 
>>>>>> impacted when not suspended, and the expected method should always 
>>>>>> be in frameStack[] somewhere. The issue is only with using 
>>>>>> GetFrameLocation() to correlate what is in the result of 
>>>>>> GetStackTrace().
>>>>> Agreed. The early continuation was bypassing the found method checking.
>>>>>
>>>>> BTW, another failure has been detected in sp06t001. This time the 
>>>>> threads are suspended,
>>>>> but I believe there is a race between thread start and the call to 
>>>>> interrupt() the thread.
>>>>> I think there may be some confusion over which thread is invoking 
>>>>> the interrupt() call.
>>>>> It is running on the main thread from the code after a call to 
>>>>> start the thread, but the
>>>>> thread may not have run when the interrupt is requested.
>>>>>
>>>>> public class sp06t001 extends DebugeeClass {
>>>>> ...
>>>>>
>>>>>         // create threads list
>>>>>         threads = new sp06t001Thread[] {
>>>>>             new sp06t001ThreadRunning("threadRunning", log),
>>>>>             new sp06t001ThreadEntering("threadEntering", log),
>>>>>             new sp06t001ThreadWaiting("threadWaiting", log),
>>>>>             new sp06t001ThreadSleeping("threadSleeping", log),
>>>>>             new 
>>>>> sp06t001ThreadRunningInterrupted("threadRunningInterrupted", log),
>>>>>             new sp06t001ThreadRunningNative("threadRunningNative", log)
>>>>>         };
>>>>>
>>>>>         // run threads
>>>>>         log.display("Starting tested threads");
>>>>>         try {
>>>>>             synchronized (endingMonitor) {
>>>>>                 // start threads (except first one)
>>>>>                 for (int i = 0; i < threads.length; i++) {
>>>>>                     threads[i].start();
>>>>>                     if (!threads[i].checkReady()) {
>>>>>                         throw new Failure("Unable to prepare thread 
>>>>> ..."
>>>>> ...
>>>>>
>>>>> class sp06t001ThreadRunningInterrupted extends sp06t001Thread {
>>>>> ...
>>>>>     public boolean checkReady() {
>>>>>         // interrupt thread on wait()
>>>>>         synchronized (waitingMonitor) {
>>>>>             interrupt();
>>>>>         }
>>>>>         return true;
>>>>>     }
>>>> Certainly checkReady() could be called before the target thread has 
>>>> started the wait(). Then the thread does the wait() and probably 
>>>> never exits (or maybe the early interrupted is resulting in some 
>>>> other unexpected behavior).
>>> If we get to the wait then the framecount would be fine.
>>>
>>> The debuggee class launches 6 threads. The agent is synced with the 
>>> debuggee
>>> starting. I don't see the native agent synchronizing on the 6 started 
>>> threads,
>>> when the threads are suspended and checked.
>>>
>>> As far as I can tell the call to interrupted() just sets a flag and 
>>> it is up to the thread
>>> itself to decide how to interpret the isInterrupted() state.
>>>>
>>>> I think you need synchronize between the two threads. What if the 
>>>> target thread does a synchronized (waitingMonitor), sets a flag, and 
>>>> then a wait(). The main thread sleeps until the flag is set, does a 
>>>> synchronized(waitingMonitor), and then the interrupt().
>>> Since the test is already using a waiting monitor, I do not want to
>>> disturb how it is being used in setting up the interrupted thread 
>>> test case.
>>> The test is also using a "volatile boolean threadReady" for signalling
>>> conditions in the test.
>>>
>>> On Fri I was experimenting with another boolean flag to delay the
>>> checkReady until the testMethod was at the waiting monitor,
>>> e.g. ready to be interrupted. It seems to work but I was waiting
>>> for enough testruns before posting a revised webrev.
>>>
>>>>
>>>> Chris
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Chris
>>>>>>
>>>>>> On 12/13/18 5:25 AM, Gary Adams wrote:
>>>>>>> While testing I ran into another of the related failures that were
>>>>>>> associated with the original bug.
>>>>>>> The following fake exception stacktrace is for failure analysis.
>>>>>>> nsk.share.Fake_Exception_for_RULE_Creation: (sp02t003.cpp:313) jvmti->GetFrameLocation(threadsDesc[i].thread, j, &qMethod, &qLocation)
>>>>>>> 	at nsk_lvcomplain(nsk_tools.cpp:172)
>>>>>>> # ERROR: sp02t003.cpp, 313: jvmti->GetFrameLocation(threadsDesc[i].thread, j, &qMethod, &qLocation)
>>>>>>> #   jvmti error: code=31, name=JVMTI_ERROR_NO_MORE_FRAMES
>>>>>>> - sp02t003.cpp, 310:       3 frame: method: 0x7f225042b2d8, location: 16
>>>>>>> # ERROR: sp02t003.cpp, 313: jvmti->GetFrameLocation(threadsDesc[i].thread, j, &qMethod, &qLocation)
>>>>>>> #   jvmti error: code=31, name=JVMTI_ERROR_NO_MORE_FRAMES
>>>>>>> # ERROR: sp02t003.cpp, 350: No expected method frame for not suspended thread #4 (threadRunningInterrupted)
>>>>>>> In this particular failure, the GetFrameLocation call failed, 
>>>>>>> because the frame was no longer
>>>>>>> accessible.
>>>>>>>
>>>>>>> If the threads are not suspended, the GetFrameLocation may fail,
>>>>>>> or if it passes, the qMethod and qLocation could belong to another
>>>>>>> frame.
>>>>>>>
>>>>>>> I've updated the webrev to allow the call to GetFrameLocation, 
>>>>>>> but to only
>>>>>>> report an error if the threads are suspended. Similarly, the 
>>>>>>> checks to compare
>>>>>>> qMethod and qLocation will be skipped, if the threads are not 
>>>>>>> suspended.
>>>>>>> And the final comparison that the method was found will only be 
>>>>>>> an error
>>>>>>> if the threads are suspended.
>>>>>>>
>>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8051349/webrev.01/
>>>>>>>
>>>>>>> On 12/12/18, 11:54 AM, Gary Adams wrote:
>>>>>>>> After some testing with nsk verbose messages enabled,
>>>>>>>> it is clear that this test is failing in checkThreads when the
>>>>>>>> location did not match between the call to GetStackTrace
>>>>>>>> and GetFrameLocation. For the tests that are run when the threads
>>>>>>>> have not been suspended, there is no guarantee the locations
>>>>>>>> will match.
>>>>>>>>
>>>>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8051349
>>>>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8051349/webrev.00/
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>