RFR: JDK-8051349: nsk/jvmti/scenarios/sampling/SP06/sp06t003 fails in nightly

Sat Dec 15 13:15:08 UTC 2018

On 12/14/18 4:43 PM, Chris Plummer wrote:
> On 12/14/18 5:28 AM, Gary Adams wrote:
>
> BTW, I forgot to ask if you can fix the typo in the following comment:
>
>  345             /* check frame equalaty */

Yes, I fixed that one on Fri.
I usually save the typos for the final webrev.

>> On 12/13/18, 11:51 AM, Chris Plummer wrote:
>>> Hi Gary,
>>>
>>> Unfortunately GetStackTrace() returns the top frame as the first 
>>> frame in the array. Thus if the thread is not suspended and later 
>>> you call GetFrameLocation() for some depth, there's no way to make 
>>> sure the frame at that depth is the same frame at that depth in the 
>>> array returned by GetStackTrace() unless you first suspend the frame 
>>> before calling GetFrameLocation(). You would also need to call 
>>> GetFrameCount() while suspended, compare that with the number of 
>>> frames returned by GetStackTrace(), and make any needed depth 
>>> adjustments. But since I think the intent is to not suspend the 
>>> thread here, it probably does not make sense to do that, and instead 
>>> accept that there might be some errors as you have done.
>> I'm not totally comfortable with these attempted operations and then
>> bypassing the test failure if the threads were not suspended.
>> Would it be better to only perform check  thread operations on the
>> suspended threads and to totally skip the operations that can not
>> be safely performed when the thread is not suspended.
> I was starting to think along those lines. What's the point of testing 
> something if you know it can fail and you can't distinguish between 
> expected and unexpected failure. The other option is to suspend like I 
> describe above to make sure you can accurately verify the stack trace. 
> I'm just not sure if it makes sense to do that. Why is the test run 
> unsuspended in the first place?
The test runs several separate threads and each is presenting
a different scenario. It probably is good to try the various functions
about stack trace, frame counts and locations on suspended and resumed
threads. It is just a bit harder to determine what a failed condition
would be for the running threads.

>>>
>>> One improvement I would like to see for your fix is to only ignore 
>>> JVMTI_ERROR_NO_MORE_FRAMES rather than ignore all failures when 
>>> suspended == NSK_TRUE.
>> The current layering of the macros and verify routines do not lend 
>> themselves
>> to this sort of selective error checking. I'd probably file a follow 
>> up bug to
>> address specific error checking.
> Can't you just bypass the use of NSK_VERIFY? Isn't JC getting rid of 
> it anyway and using his JNI exception mechanism? In that case you 
> would need to use the version that does not fail if there was an 
> exception.
I do not know what set of error returns would be acceptable
or how to express that in the new mechanism. That's why I suggested
addressing it in a follow up bug. It simply would narrow the range
of error returns allowed.
>>>
>>> Also, I don't think you want the suspended check here:
>>>
>>>  348         /* check if expected method frame found */
>>>  349         if ((suspended == NSK_TRUE) && (found <= 0)) {
>>>
>>> The check for finding the method is:
>>>
>>>  341             if (frameStack[j].method == threadsDesc[i].method) {
>>>
>>> Since frameStack[] is returned by GetStackTrace(), it is not 
>>> impacted when not suspended, and the expected method should always 
>>> be in frameStack[] somewhere. The issue is only with using 
>>> GetFrameLocation() to correlate what is in the result of 
>>> GetStackTrace().
>> Agreed. The early continuation was bypassing the found method checking.
>>
>> BTW, another failure has been detected in sp06t001. This time the 
>> threads are suspended,
>> but I believe there is a race between thread start and the call to 
>> interrupt() the thread.
>> I think there may be some confusion over which thread is invoking the 
>> interrupt() call.
>> It is running on the main thread from the code after a call to start 
>> the thread, but the
>> thread may not have run when the interrupt is requested.
>>
>> public class sp06t001 extends DebugeeClass {
>> ...
>>
>>         // create threads list
>>         threads = new sp06t001Thread[] {
>>             new sp06t001ThreadRunning("threadRunning", log),
>>             new sp06t001ThreadEntering("threadEntering", log),
>>             new sp06t001ThreadWaiting("threadWaiting", log),
>>             new sp06t001ThreadSleeping("threadSleeping", log),
>>             new 
>> sp06t001ThreadRunningInterrupted("threadRunningInterrupted", log),
>>             new sp06t001ThreadRunningNative("threadRunningNative", log)
>>         };
>>
>>         // run threads
>>         log.display("Starting tested threads");
>>         try {
>>             synchronized (endingMonitor) {
>>                 // start threads (except first one)
>>                 for (int i = 0; i < threads.length; i++) {
>>                     threads[i].start();
>>                     if (!threads[i].checkReady()) {
>>                         throw new Failure("Unable to prepare thread ..."
>> ...
>>
>> class sp06t001ThreadRunningInterrupted extends sp06t001Thread {
>> ...
>>     public boolean checkReady() {
>>         // interrupt thread on wait()
>>         synchronized (waitingMonitor) {
>>             interrupt();
>>         }
>>         return true;
>>     }
> Certainly checkReady() could be called before the target thread has 
> started the wait(). Then the thread does the wait() and probably never 
> exits (or maybe the early interrupted is resulting in some other 
> unexpected behavior).
If we get to the wait then the framecount would be fine.

The debuggee class launches 6 threads. The agent is synced with the debuggee
starting. I don't see the native agent synchronizing on the 6 started 
threads,
when the threads are suspended and checked.

As far as I can tell the call to interrupted() just sets a flag and it 
is up to the thread
itself to decide how to interpret the isInterrupted() state.
>
> I think you need synchronize between the two threads. What if the 
> target thread does a synchronized (waitingMonitor), sets a flag, and 
> then a wait(). The main thread sleeps until the flag is set, does a 
> synchronized(waitingMonitor), and then the interrupt().
Since the test is already using a waiting monitor, I do not want to
disturb how it is being used in setting up the interrupted thread test case.
The test is also using a "volatile boolean threadReady" for signalling
conditions in the test.

On Fri I was experimenting with another boolean flag to delay the
checkReady until the testMethod was at the waiting monitor,
e.g. ready to be interrupted. It seems to work but I was waiting
for enough testruns before posting a revised webrev.

>
> Chris
>>>
>>> thanks,
>>>
>>> Chris
>>>
>>> On 12/13/18 5:25 AM, Gary Adams wrote:
>>>> While testing I ran into another of the related failures that were
>>>> associated with the original bug.
>>>> The following fake exception stacktrace is for failure analysis.
>>>> nsk.share.Fake_Exception_for_RULE_Creation: (sp02t003.cpp:313) jvmti->GetFrameLocation(threadsDesc[i].thread, j, &qMethod, &qLocation)
>>>> 	at nsk_lvcomplain(nsk_tools.cpp:172)
>>>> # ERROR: sp02t003.cpp, 313: jvmti->GetFrameLocation(threadsDesc[i].thread, j, &qMethod, &qLocation)
>>>> #   jvmti error: code=31, name=JVMTI_ERROR_NO_MORE_FRAMES
>>>> - sp02t003.cpp, 310:       3 frame: method: 0x7f225042b2d8, location: 16
>>>> # ERROR: sp02t003.cpp, 313: jvmti->GetFrameLocation(threadsDesc[i].thread, j, &qMethod, &qLocation)
>>>> #   jvmti error: code=31, name=JVMTI_ERROR_NO_MORE_FRAMES
>>>> # ERROR: sp02t003.cpp, 350: No expected method frame for not suspended thread #4 (threadRunningInterrupted)
>>>> In this particular failure, the GetFrameLocation call failed, 
>>>> because the frame was no longer
>>>> accessible.
>>>>
>>>> If the threads are not suspended, the GetFrameLocation may fail,
>>>> or if it passes, the qMethod and qLocation could belong to another
>>>> frame.
>>>>
>>>> I've updated the webrev to allow the call to GetFrameLocation, but 
>>>> to only
>>>> report an error if the threads are suspended. Similarly, the checks 
>>>> to compare
>>>> qMethod and qLocation will be skipped, if the threads are not 
>>>> suspended.
>>>> And the final comparison that the method was found will only be an 
>>>> error
>>>> if the threads are suspended.
>>>>
>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8051349/webrev.01/
>>>>
>>>> On 12/12/18, 11:54 AM, Gary Adams wrote:
>>>>> After some testing with nsk verbose messages enabled,
>>>>> it is clear that this test is failing in checkThreads when the
>>>>> location did not match between the call to GetStackTrace
>>>>> and GetFrameLocation. For the tests that are run when the threads
>>>>> have not been suspended, there is no guarantee the locations
>>>>> will match.
>>>>>
>>>>>   Issue: https://bugs.openjdk.java.net/browse/JDK-8051349
>>>>>   Webrev: http://cr.openjdk.java.net/~gadams/8051349/webrev.00/
>>>>>
>>>>
>>>
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20181215/6f456eb7/attachment-0001.html>