JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events

Tue Aug 14 22:41:34 UTC 2018

Hi Gary,

I think check for RawMonitorWait error should be moved into the loop
or exit loop condition can be updated to

243     if (endsCount == endsExpected || err != JVMTI_ERROR_NONE) {
244         break;
245     }

--alex

On 08/14/2018 10:03, Gary Adams wrote:
> I did 1000 testruns with these changes. There were 3 failures due to 
> "executor timeout"
> but those are not related to these changes (infrastructure).
> 
>    Webrev: http://cr.openjdk.java.net/~gadams/8034084/webrev.00/index.html
>    Issue: https://bugs.openjdk.java.net/browse/JDK-8034084
> 
> On 8/12/18, 7:01 PM, David Holmes wrote:
>> Hi Gary,
>>
>> Good analysis. I agree with your reasoning.
>>
>> I think these tests were written in a way that assumed that we have to 
>> locally timeout to detect things not happening "quickly enough" but as 
>> we well know "quickly enough" is highly dependent on a number of 
>> factors. And now we rely on the test framework to manage timeouts.
>>
>> I think removing the timeout from the first wait() and simply allowing 
>> the framework timeouts to take effect is a good solution.
>>
>> Repeating the wait in the second case with twice the timeout is less 
>> clear. One second wasn't long enough, but perhaps 3 seconds will be. 
>> And it probably will but is not guaranteed. The real guarantee is to 
>> wait until you see the expected outcome (either by polling using the 
>> timed-wait, or else by inserting a notify elsewhere - which itself may 
>> be tricky). And once you start pushing on the wait/notify logic the 
>> whole structure of the test potentially changes as you either wait 
>> until you get the expected result, or else get timed out by the 
>> framework.
>>
>> I think waiting a bit longer will probably suffice and cause minimum 
>> disruption to the test logic, but I'd be included to put the wait into 
>> a for-loop and use 1-second timeouts per iteration ie.
>>
>> // wait for up to 3 seconds
>> for (int i = 0 ; i < 3 ; i++) {
>>   err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, (jlong)WAIT_TIME);
>>   if (endsCount == endsExpected)
>>     break;
>> }
>>
>> Thanks,
>> David
>> On 11/08/2018 4:15 AM, Gary Adams wrote:
>>> There's a fairly old bug concerning jvmti thread start and end events.
>>> I believe there are 2 possible race conditions in the threadstart003
>>> test based on some arbitrary timeout values used in the test itself.
>>>
>>> Here are a few snippets from the test to illustrate the problem area.
>>>
>>> In the initial sequence RunAgentThread is used to start a thread
>>> with the agentThreadProc entry point.
>>>
>>>     188        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>     189        if (err != JVMTI_ERROR_NONE) {
>>>     190            printf("(RawMonitorEnter) unexpected error: %s 
>>> (%d)\n",
>>>     191                   TranslateError(err), err);
>>>     192            result = FAILED;
>>>     193        }
>>>
>>> *   194        err = (*jvmti)->RunAgentThread(jvmti, thr, 
>>> agentThreadProc,*
>>>     195            NULL, JVMTI_THREAD_MAX_PRIORITY);
>>>     196        if (err != JVMTI_ERROR_NONE) {
>>>     197            printf("(RunAgentThread) unexpected error: %s 
>>> (%d)\n",
>>>     198                   TranslateError(err), err);
>>>     199            result = FAILED;
>>>     200        }
>>>
>>> *   201        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>> (jlong)WAIT_TIME);*
>>>     202        if (err != JVMTI_ERROR_NONE) {
>>>     203            printf("(RawMonitorWait) unexpected error: %s 
>>> (%d)\n",
>>>     204                   TranslateError(err), err);
>>>     205            result = FAILED;
>>>     206        }
>>>     207        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>     208        if (err != JVMTI_ERROR_NONE) {
>>>     209            printf("(RawMonitorExit) unexpected error: %s 
>>> (%d)\n",
>>>     210                   TranslateError(err), err);
>>>     211            result = FAILED;
>>>     212        }
>>>
>>> I believe the test expects to wake up based on the code in 
>>> agentThreadProc
>>> notification of the wait_lock.
>>>
>>>     121        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>     122        if (err != JVMTI_ERROR_NONE) {
>>>     123            printf("(RawMonitorEnter) unexpected error: %s 
>>> (%d)\n",
>>>     124                   TranslateError(err), err);
>>>     125            result = FAILED;
>>>     126        }
>>> *   127        err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
>>>     128        if (err != JVMTI_ERROR_NONE) {
>>>     129            printf("(RawMonitorNotify) unexpected error: %s 
>>> (%d)\n",
>>>     130                   TranslateError(err), err);
>>>     131            result = FAILED;
>>>     132        }
>>>     133        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>     134        if (err != JVMTI_ERROR_NONE) {
>>>     135            printf("(RawMonitorExit) unexpected error: %s 
>>> (%d)\n",
>>>     136                   TranslateError(err), err);
>>>     137            result = FAILED;
>>>     138        }
>>>
>>> A second block for the wait_lock does not have a corresponding
>>> notify call. I'm guessing the monitor is expected to time out,
>>> so it allows for one second to complete the delivery of the
>>> thread end event.
>>>
>>>    213
>>>     214        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>     215        if (err != JVMTI_ERROR_NONE) {
>>>     216            printf("(RawMonitorEnter) unexpected error: %s 
>>> (%d)\n",
>>>     217                   TranslateError(err), err);
>>>     218            result = FAILED;
>>>     219        }
>>> *   220        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>> (jlong)WAIT_TIME);*
>>>     221        if (err != JVMTI_ERROR_NONE) {
>>>     222            printf("(RawMonitorWait) unexpected error: %s 
>>> (%d)\n",
>>>     223                   TranslateError(err), err);
>>>     224            result = FAILED;
>>>     225        }
>>>     226        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>     227        if (err != JVMTI_ERROR_NONE) {
>>>     228            printf("(RawMonitorExit) unexpected error: %s 
>>> (%d)\n",
>>>     229                   TranslateError(err), err);
>>>     230            result = FAILED;
>>>     231        }
>>>
>>> I see 2 possible race conditions here.
>>>
>>>    - if the thread is slow to start, then the first RawMonitorWait
>>>      could timeout. The code would continue to the second RawMonitorWait
>>>      which would wake up from the agentThreadProc call to 
>>> RawMonitorNotify.
>>>
>>>    - if the thread ending is slow to complete then the second 
>>> RawMonitorWait
>>>      could timeout before the thread end event is processed.
>>>
>>> After the second RawMonitorWait completes, the event notification is 
>>> disabled
>>> for both thread start and end events. If we disable the end thread 
>>> event before
>>> the thread end event is delivered the counts will not match as expected.
>>>
>>>     233        err = (*jvmti)->SetEventNotificationMode(jvmti, 
>>> JVMTI_DISABLE,
>>>     234                JVMTI_EVENT_THREAD_START, NULL);
>>>     235        if (err != JVMTI_ERROR_NONE) {
>>>     236            printf("Failed to disable 
>>> JVMTI_EVENT_THREAD_START: %s (%d)\n",
>>>     237                   TranslateError(err), err);
>>>     238            result = FAILED;
>>>     239        }
>>>     240 *
>>>     241        err = (*jvmti)->SetEventNotificationMode(jvmti, 
>>> JVMTI_DISABLE,
>>>     242                JVMTI_EVENT_THREAD_END, NULL);*
>>>     243        if (err != JVMTI_ERROR_NONE) {
>>>     244            printf("Failed to disable JVMTI_EVENT_THREAD_END: 
>>> %s (%d)\n",
>>>     245                   TranslateError(err), err);
>>>     246            result = FAILED;
>>>     247        }
>>>
>>> My recommendation is to change the first RawMonitorWait to use 0
>>> rather than WAIT_TIME. This means it will wait until notified.
>>> For the second RawMonitorWait, we should retry the wait if
>>> endsCount does not equal endsExpected.
>>>
>>>     201        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
>>>
>>>
>>>     220        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>> (jlong)WAIT_TIME);
>>>                if (endsCount != endsExpected) {
>>>                    err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
>>> (jlong)WAIT_TIME * 2);
>>>                }
>>>     221        if (err != JVMTI_ERROR_NONE) {
>>>
>>> So we remove the first race condition by removing the timeout on the 
>>> first wait call.
>>> And we accommodate a possible second race condition by retrying the 
>>> second wait with
>>> an extended delay.
>>>
>>> An alternative to the retry could be to add a RawMonitorNotify call 
>>> in the thread
>>> end event callback ThreadEnd() after the endsCount has been incremented.
>>>
>>> This test fails very infrequently, so it may be difficult to prove 
>>> that the specific
>>> problem has been resolved. I'll try some testing with the proposed 
>>> changes
>>> and see if we can work towards getting this test off the ProblemList 
>>> for jdk12.
>>>
>>>
>