JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events

Wed Aug 22 16:54:01 UTC 2018

Updated webrev:
   http://cr.openjdk.java.net/~gadams/8034084/webrev.01/index.html

On 8/15/18, 8:35 PM, David Holmes wrote:
> +1
>
> Thanks,
> David
>
> On 15/08/2018 8:41 AM, Alex Menkov wrote:
>> Hi Gary,
>>
>> I think check for RawMonitorWait error should be moved into the loop
>> or exit loop condition can be updated to
>>
>> 243     if (endsCount == endsExpected || err != JVMTI_ERROR_NONE) {
>> 244         break;
>> 245     }
>>
>> --alex
>>
>> On 08/14/2018 10:03, Gary Adams wrote:
>>> I did 1000 testruns with these changes. There were 3 failures due to 
>>> "executor timeout"
>>> but those are not related to these changes (infrastructure).
>>>
>>>    Webrev: 
>>> http://cr.openjdk.java.net/~gadams/8034084/webrev.00/index.html
>>>    Issue: https://bugs.openjdk.java.net/browse/JDK-8034084
>>>
>>> On 8/12/18, 7:01 PM, David Holmes wrote:
>>>> Hi Gary,
>>>>
>>>> Good analysis. I agree with your reasoning.
>>>>
>>>> I think these tests were written in a way that assumed that we have 
>>>> to locally timeout to detect things not happening "quickly enough" 
>>>> but as we well know "quickly enough" is highly dependent on a 
>>>> number of factors. And now we rely on the test framework to manage 
>>>> timeouts.
>>>>
>>>> I think removing the timeout from the first wait() and simply 
>>>> allowing the framework timeouts to take effect is a good solution.
>>>>
>>>> Repeating the wait in the second case with twice the timeout is 
>>>> less clear. One second wasn't long enough, but perhaps 3 seconds 
>>>> will be. And it probably will but is not guaranteed. The real 
>>>> guarantee is to wait until you see the expected outcome (either by 
>>>> polling using the timed-wait, or else by inserting a notify 
>>>> elsewhere - which itself may be tricky). And once you start pushing 
>>>> on the wait/notify logic the whole structure of the test 
>>>> potentially changes as you either wait until you get the expected 
>>>> result, or else get timed out by the framework.
>>>>
>>>> I think waiting a bit longer will probably suffice and cause 
>>>> minimum disruption to the test logic, but I'd be included to put 
>>>> the wait into a for-loop and use 1-second timeouts per iteration ie.
>>>>
>>>> // wait for up to 3 seconds
>>>> for (int i = 0 ; i < 3 ; i++) {
>>>>   err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, (jlong)WAIT_TIME);
>>>>   if (endsCount == endsExpected)
>>>>     break;
>>>> }
>>>>
>>>> Thanks,
>>>> David
>>>> On 11/08/2018 4:15 AM, Gary Adams wrote:
>>>>> There's a fairly old bug concerning jvmti thread start and end 
>>>>> events.
>>>>> I believe there are 2 possible race conditions in the threadstart003
>>>>> test based on some arbitrary timeout values used in the test itself.
>>>>>
>>>>> Here are a few snippets from the test to illustrate the problem area.
>>>>>
>>>>> In the initial sequence RunAgentThread is used to start a thread
>>>>> with the agentThreadProc entry point.
>>>>>
>>>>>     188        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>>     189        if (err != JVMTI_ERROR_NONE) {
>>>>>     190            printf("(RawMonitorEnter) unexpected error: %s 
>>>>> (%d)\n",
>>>>>     191                   TranslateError(err), err);
>>>>>     192            result = FAILED;
>>>>>     193        }
>>>>>
>>>>> *   194        err = (*jvmti)->RunAgentThread(jvmti, thr, 
>>>>> agentThreadProc,*
>>>>>     195            NULL, JVMTI_THREAD_MAX_PRIORITY);
>>>>>     196        if (err != JVMTI_ERROR_NONE) {
>>>>>     197            printf("(RunAgentThread) unexpected error: %s 
>>>>> (%d)\n",
>>>>>     198                   TranslateError(err), err);
>>>>>     199            result = FAILED;
>>>>>     200        }
>>>>>
>>>>> *   201        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>>>> (jlong)WAIT_TIME);*
>>>>>     202        if (err != JVMTI_ERROR_NONE) {
>>>>>     203            printf("(RawMonitorWait) unexpected error: %s 
>>>>> (%d)\n",
>>>>>     204                   TranslateError(err), err);
>>>>>     205            result = FAILED;
>>>>>     206        }
>>>>>     207        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>>     208        if (err != JVMTI_ERROR_NONE) {
>>>>>     209            printf("(RawMonitorExit) unexpected error: %s 
>>>>> (%d)\n",
>>>>>     210                   TranslateError(err), err);
>>>>>     211            result = FAILED;
>>>>>     212        }
>>>>>
>>>>> I believe the test expects to wake up based on the code in 
>>>>> agentThreadProc
>>>>> notification of the wait_lock.
>>>>>
>>>>>     121        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>>     122        if (err != JVMTI_ERROR_NONE) {
>>>>>     123            printf("(RawMonitorEnter) unexpected error: %s 
>>>>> (%d)\n",
>>>>>     124                   TranslateError(err), err);
>>>>>     125            result = FAILED;
>>>>>     126        }
>>>>> *   127        err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
>>>>>     128        if (err != JVMTI_ERROR_NONE) {
>>>>>     129            printf("(RawMonitorNotify) unexpected error: %s 
>>>>> (%d)\n",
>>>>>     130                   TranslateError(err), err);
>>>>>     131            result = FAILED;
>>>>>     132        }
>>>>>     133        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>>     134        if (err != JVMTI_ERROR_NONE) {
>>>>>     135            printf("(RawMonitorExit) unexpected error: %s 
>>>>> (%d)\n",
>>>>>     136                   TranslateError(err), err);
>>>>>     137            result = FAILED;
>>>>>     138        }
>>>>>
>>>>> A second block for the wait_lock does not have a corresponding
>>>>> notify call. I'm guessing the monitor is expected to time out,
>>>>> so it allows for one second to complete the delivery of the
>>>>> thread end event.
>>>>>
>>>>>    213
>>>>>     214        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>>     215        if (err != JVMTI_ERROR_NONE) {
>>>>>     216            printf("(RawMonitorEnter) unexpected error: %s 
>>>>> (%d)\n",
>>>>>     217                   TranslateError(err), err);
>>>>>     218            result = FAILED;
>>>>>     219        }
>>>>> *   220        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>>>> (jlong)WAIT_TIME);*
>>>>>     221        if (err != JVMTI_ERROR_NONE) {
>>>>>     222            printf("(RawMonitorWait) unexpected error: %s 
>>>>> (%d)\n",
>>>>>     223                   TranslateError(err), err);
>>>>>     224            result = FAILED;
>>>>>     225        }
>>>>>     226        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>>     227        if (err != JVMTI_ERROR_NONE) {
>>>>>     228            printf("(RawMonitorExit) unexpected error: %s 
>>>>> (%d)\n",
>>>>>     229                   TranslateError(err), err);
>>>>>     230            result = FAILED;
>>>>>     231        }
>>>>>
>>>>> I see 2 possible race conditions here.
>>>>>
>>>>>    - if the thread is slow to start, then the first RawMonitorWait
>>>>>      could timeout. The code would continue to the second 
>>>>> RawMonitorWait
>>>>>      which would wake up from the agentThreadProc call to 
>>>>> RawMonitorNotify.
>>>>>
>>>>>    - if the thread ending is slow to complete then the second 
>>>>> RawMonitorWait
>>>>>      could timeout before the thread end event is processed.
>>>>>
>>>>> After the second RawMonitorWait completes, the event notification 
>>>>> is disabled
>>>>> for both thread start and end events. If we disable the end thread 
>>>>> event before
>>>>> the thread end event is delivered the counts will not match as 
>>>>> expected.
>>>>>
>>>>>     233        err = (*jvmti)->SetEventNotificationMode(jvmti, 
>>>>> JVMTI_DISABLE,
>>>>>     234                JVMTI_EVENT_THREAD_START, NULL);
>>>>>     235        if (err != JVMTI_ERROR_NONE) {
>>>>>     236            printf("Failed to disable 
>>>>> JVMTI_EVENT_THREAD_START: %s (%d)\n",
>>>>>     237                   TranslateError(err), err);
>>>>>     238            result = FAILED;
>>>>>     239        }
>>>>>     240 *
>>>>>     241        err = (*jvmti)->SetEventNotificationMode(jvmti, 
>>>>> JVMTI_DISABLE,
>>>>>     242                JVMTI_EVENT_THREAD_END, NULL);*
>>>>>     243        if (err != JVMTI_ERROR_NONE) {
>>>>>     244            printf("Failed to disable 
>>>>> JVMTI_EVENT_THREAD_END: %s (%d)\n",
>>>>>     245                   TranslateError(err), err);
>>>>>     246            result = FAILED;
>>>>>     247        }
>>>>>
>>>>> My recommendation is to change the first RawMonitorWait to use 0
>>>>> rather than WAIT_TIME. This means it will wait until notified.
>>>>> For the second RawMonitorWait, we should retry the wait if
>>>>> endsCount does not equal endsExpected.
>>>>>
>>>>>     201        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
>>>>>
>>>>>
>>>>>     220        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>>>> (jlong)WAIT_TIME);
>>>>>                if (endsCount != endsExpected) {
>>>>>                    err = (*jvmti)->RawMonitorWait(jvmti, 
>>>>> wait_lock, (jlong)WAIT_TIME * 2);
>>>>>                }
>>>>>     221        if (err != JVMTI_ERROR_NONE) {
>>>>>
>>>>> So we remove the first race condition by removing the timeout on 
>>>>> the first wait call.
>>>>> And we accommodate a possible second race condition by retrying 
>>>>> the second wait with
>>>>> an extended delay.
>>>>>
>>>>> An alternative to the retry could be to add a RawMonitorNotify 
>>>>> call in the thread
>>>>> end event callback ThreadEnd() after the endsCount has been 
>>>>> incremented.
>>>>>
>>>>> This test fails very infrequently, so it may be difficult to prove 
>>>>> that the specific
>>>>> problem has been resolved. I'll try some testing with the proposed 
>>>>> changes
>>>>> and see if we can work towards getting this test off the 
>>>>> ProblemList for jdk12.
>>>>>
>>>>>
>>>