JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events

Thu Aug 16 00:35:31 UTC 2018

+1

Thanks,
David

On 15/08/2018 8:41 AM, Alex Menkov wrote:
> Hi Gary,
> 
> I think check for RawMonitorWait error should be moved into the loop
> or exit loop condition can be updated to
> 
> 243     if (endsCount == endsExpected || err != JVMTI_ERROR_NONE) {
> 244         break;
> 245     }
> 
> --alex
> 
> On 08/14/2018 10:03, Gary Adams wrote:
>> I did 1000 testruns with these changes. There were 3 failures due to 
>> "executor timeout"
>> but those are not related to these changes (infrastructure).
>>
>>    Webrev: 
>> http://cr.openjdk.java.net/~gadams/8034084/webrev.00/index.html
>>    Issue: https://bugs.openjdk.java.net/browse/JDK-8034084
>>
>> On 8/12/18, 7:01 PM, David Holmes wrote:
>>> Hi Gary,
>>>
>>> Good analysis. I agree with your reasoning.
>>>
>>> I think these tests were written in a way that assumed that we have 
>>> to locally timeout to detect things not happening "quickly enough" 
>>> but as we well know "quickly enough" is highly dependent on a number 
>>> of factors. And now we rely on the test framework to manage timeouts.
>>>
>>> I think removing the timeout from the first wait() and simply 
>>> allowing the framework timeouts to take effect is a good solution.
>>>
>>> Repeating the wait in the second case with twice the timeout is less 
>>> clear. One second wasn't long enough, but perhaps 3 seconds will be. 
>>> And it probably will but is not guaranteed. The real guarantee is to 
>>> wait until you see the expected outcome (either by polling using the 
>>> timed-wait, or else by inserting a notify elsewhere - which itself 
>>> may be tricky). And once you start pushing on the wait/notify logic 
>>> the whole structure of the test potentially changes as you either 
>>> wait until you get the expected result, or else get timed out by the 
>>> framework.
>>>
>>> I think waiting a bit longer will probably suffice and cause minimum 
>>> disruption to the test logic, but I'd be included to put the wait 
>>> into a for-loop and use 1-second timeouts per iteration ie.
>>>
>>> // wait for up to 3 seconds
>>> for (int i = 0 ; i < 3 ; i++) {
>>>   err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, (jlong)WAIT_TIME);
>>>   if (endsCount == endsExpected)
>>>     break;
>>> }
>>>
>>> Thanks,
>>> David
>>> On 11/08/2018 4:15 AM, Gary Adams wrote:
>>>> There's a fairly old bug concerning jvmti thread start and end events.
>>>> I believe there are 2 possible race conditions in the threadstart003
>>>> test based on some arbitrary timeout values used in the test itself.
>>>>
>>>> Here are a few snippets from the test to illustrate the problem area.
>>>>
>>>> In the initial sequence RunAgentThread is used to start a thread
>>>> with the agentThreadProc entry point.
>>>>
>>>>     188        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>     189        if (err != JVMTI_ERROR_NONE) {
>>>>     190            printf("(RawMonitorEnter) unexpected error: %s 
>>>> (%d)\n",
>>>>     191                   TranslateError(err), err);
>>>>     192            result = FAILED;
>>>>     193        }
>>>>
>>>> *   194        err = (*jvmti)->RunAgentThread(jvmti, thr, 
>>>> agentThreadProc,*
>>>>     195            NULL, JVMTI_THREAD_MAX_PRIORITY);
>>>>     196        if (err != JVMTI_ERROR_NONE) {
>>>>     197            printf("(RunAgentThread) unexpected error: %s 
>>>> (%d)\n",
>>>>     198                   TranslateError(err), err);
>>>>     199            result = FAILED;
>>>>     200        }
>>>>
>>>> *   201        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>>> (jlong)WAIT_TIME);*
>>>>     202        if (err != JVMTI_ERROR_NONE) {
>>>>     203            printf("(RawMonitorWait) unexpected error: %s 
>>>> (%d)\n",
>>>>     204                   TranslateError(err), err);
>>>>     205            result = FAILED;
>>>>     206        }
>>>>     207        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>     208        if (err != JVMTI_ERROR_NONE) {
>>>>     209            printf("(RawMonitorExit) unexpected error: %s 
>>>> (%d)\n",
>>>>     210                   TranslateError(err), err);
>>>>     211            result = FAILED;
>>>>     212        }
>>>>
>>>> I believe the test expects to wake up based on the code in 
>>>> agentThreadProc
>>>> notification of the wait_lock.
>>>>
>>>>     121        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>     122        if (err != JVMTI_ERROR_NONE) {
>>>>     123            printf("(RawMonitorEnter) unexpected error: %s 
>>>> (%d)\n",
>>>>     124                   TranslateError(err), err);
>>>>     125            result = FAILED;
>>>>     126        }
>>>> *   127        err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
>>>>     128        if (err != JVMTI_ERROR_NONE) {
>>>>     129            printf("(RawMonitorNotify) unexpected error: %s 
>>>> (%d)\n",
>>>>     130                   TranslateError(err), err);
>>>>     131            result = FAILED;
>>>>     132        }
>>>>     133        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>     134        if (err != JVMTI_ERROR_NONE) {
>>>>     135            printf("(RawMonitorExit) unexpected error: %s 
>>>> (%d)\n",
>>>>     136                   TranslateError(err), err);
>>>>     137            result = FAILED;
>>>>     138        }
>>>>
>>>> A second block for the wait_lock does not have a corresponding
>>>> notify call. I'm guessing the monitor is expected to time out,
>>>> so it allows for one second to complete the delivery of the
>>>> thread end event.
>>>>
>>>>    213
>>>>     214        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>     215        if (err != JVMTI_ERROR_NONE) {
>>>>     216            printf("(RawMonitorEnter) unexpected error: %s 
>>>> (%d)\n",
>>>>     217                   TranslateError(err), err);
>>>>     218            result = FAILED;
>>>>     219        }
>>>> *   220        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>>> (jlong)WAIT_TIME);*
>>>>     221        if (err != JVMTI_ERROR_NONE) {
>>>>     222            printf("(RawMonitorWait) unexpected error: %s 
>>>> (%d)\n",
>>>>     223                   TranslateError(err), err);
>>>>     224            result = FAILED;
>>>>     225        }
>>>>     226        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>     227        if (err != JVMTI_ERROR_NONE) {
>>>>     228            printf("(RawMonitorExit) unexpected error: %s 
>>>> (%d)\n",
>>>>     229                   TranslateError(err), err);
>>>>     230            result = FAILED;
>>>>     231        }
>>>>
>>>> I see 2 possible race conditions here.
>>>>
>>>>    - if the thread is slow to start, then the first RawMonitorWait
>>>>      could timeout. The code would continue to the second 
>>>> RawMonitorWait
>>>>      which would wake up from the agentThreadProc call to 
>>>> RawMonitorNotify.
>>>>
>>>>    - if the thread ending is slow to complete then the second 
>>>> RawMonitorWait
>>>>      could timeout before the thread end event is processed.
>>>>
>>>> After the second RawMonitorWait completes, the event notification is 
>>>> disabled
>>>> for both thread start and end events. If we disable the end thread 
>>>> event before
>>>> the thread end event is delivered the counts will not match as 
>>>> expected.
>>>>
>>>>     233        err = (*jvmti)->SetEventNotificationMode(jvmti, 
>>>> JVMTI_DISABLE,
>>>>     234                JVMTI_EVENT_THREAD_START, NULL);
>>>>     235        if (err != JVMTI_ERROR_NONE) {
>>>>     236            printf("Failed to disable 
>>>> JVMTI_EVENT_THREAD_START: %s (%d)\n",
>>>>     237                   TranslateError(err), err);
>>>>     238            result = FAILED;
>>>>     239        }
>>>>     240 *
>>>>     241        err = (*jvmti)->SetEventNotificationMode(jvmti, 
>>>> JVMTI_DISABLE,
>>>>     242                JVMTI_EVENT_THREAD_END, NULL);*
>>>>     243        if (err != JVMTI_ERROR_NONE) {
>>>>     244            printf("Failed to disable JVMTI_EVENT_THREAD_END: 
>>>> %s (%d)\n",
>>>>     245                   TranslateError(err), err);
>>>>     246            result = FAILED;
>>>>     247        }
>>>>
>>>> My recommendation is to change the first RawMonitorWait to use 0
>>>> rather than WAIT_TIME. This means it will wait until notified.
>>>> For the second RawMonitorWait, we should retry the wait if
>>>> endsCount does not equal endsExpected.
>>>>
>>>>     201        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
>>>>
>>>>
>>>>     220        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>>> (jlong)WAIT_TIME);
>>>>                if (endsCount != endsExpected) {
>>>>                    err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>>> (jlong)WAIT_TIME * 2);
>>>>                }
>>>>     221        if (err != JVMTI_ERROR_NONE) {
>>>>
>>>> So we remove the first race condition by removing the timeout on the 
>>>> first wait call.
>>>> And we accommodate a possible second race condition by retrying the 
>>>> second wait with
>>>> an extended delay.
>>>>
>>>> An alternative to the retry could be to add a RawMonitorNotify call 
>>>> in the thread
>>>> end event callback ThreadEnd() after the endsCount has been 
>>>> incremented.
>>>>
>>>> This test fails very infrequently, so it may be difficult to prove 
>>>> that the specific
>>>> problem has been resolved. I'll try some testing with the proposed 
>>>> changes
>>>> and see if we can work towards getting this test off the ProblemList 
>>>> for jdk12.
>>>>
>>>>
>>